Nicholas Riegel d122b773d4 feat: integrate Runboat E2E testing and performance tests into CI/CD pipeline

Updated .gitlab-ci.yml with complete Phase 3 pipeline stages:

New Stages Added:
- preview: Runboat API call to create ephemeral preview instance
- e2e: Playwright E2E tests against Runboat preview
- performance: Server-side performance benchmarks (latency, queries, tokens)

Pipeline Changes:
- runboat_preview job: Requests preview build, extracts URL, posts MR comment
- e2e_tests job: Runs 19 Playwright scenarios against preview URL
- performance_tests job: Runs 7 performance benchmark tests locally
- All jobs include artifacts (HTML reports, traces) for debugging

Job Dependencies:
- e2e_tests needs runboat_preview (waits for preview URL)
- performance_tests runs in parallel with build stage
- All new jobs only on merge_requests (not main/daily)

New Required CI/CD Variables:
- RUNBOAT_API_URL: Runboat API endpoint (secret)
- RUNBOAT_TOKEN: Bearer token for Runboat (secret)
- GITLAB_BOT_TOKEN: GitLab bot token for MR comments (secret)

Updated PHASE3_ROADMAP.md with:
- Runboat setup instructions
- CI/CD variable requirements and how to obtain
- Complete YAML snippets (already in .gitlab-ci.yml)
- Pipeline flow diagram
- Estimated total pipeline time: ~35 minutes

Non-blocking failures:
- runboat_preview: allow_failure=true (Runboat might be unavailable)
- e2e_tests: allow_failure=true (E2E informational, doesn't block merge)
- performance_tests: allow_failure=false (must pass)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2026-05-30 00:54:59 -04:00

11 KiB

Raw Blame History

Phase 3: Runboat E2E Testing and Performance Benchmarks

Status: In Progress
Start: 2026-05-30
Target: E2E coverage + performance SLOs met

Goals

1. E2E Test Coverage (10–30 scenarios)

Critical user journeys verified via Playwright:

User generates blog post on-demand
User schedules daily blog generation
User views generation logs and retries failed attempts
User edits social media copy before publication
User views published post with correct SEO fields
User receives notification email with correct content
System recovers gracefully from LLM API errors
Multiple users generate posts concurrently (collision handling)

2. Performance Benchmarks

Establish baseline metrics for:

Generation Latency: Time from wizard click to post created
- Target: < 30 seconds (including LLM API call)
- Measure: P50, P95, P99
Token Efficiency: Tokens used per blog post
- Target: 800–1200 tokens for ~800-word post
- Baseline: Record for cost optimization
Database Query Count: N+1 detection
- Target: < 50 queries per generation
- Tool: assertQueryCount() on hot paths
Throughput: Concurrent generations
- Target: 5+ simultaneous posts without degradation
- Stress test: 10 parallel schedule slots
Memory Usage: Peak RSS during generation
- Target: < 500 MB per Odoo process

3. Load Testing

Simulate production scenarios:

100 pending topics in queue
3 active schedule slots all triggering within 5 minutes
5 concurrent users generating posts
Template DB priming time baseline

Implementation Plan

Layer 1: Runboat Setup & E2E Infrastructure

# 1. Create e2e/ directory structure
e2e/
├── conftest.py              # Session/auth fixtures, Runboat polling
├── test_generation.py       # On-demand generation workflow
├── test_scheduling.py       # Schedule slot execution
├── test_notifications.py    # Email and social copy
├── test_error_recovery.py   # API errors and retries
└── requirements.txt         # pytest, playwright

# 2. Set up conftest.py with:
# - wait_for_odoo(url) polling
# - auth_state fixture (admin login)
# - page fixture (authenticated Playwright context)
# - BASE_URL from env var or CI

# 3. Create .gitlab-ci.yml runboat stage:
runboat_preview:
  stage: preview
  script: |
    curl -X POST $RUNBOAT_URL/builds \
      -H "Authorization: Bearer $RUNBOAT_TOKEN" \
      -d "{\"repo\":\"$CI_PROJECT_PATH\",\"sha\":\"$CI_COMMIT_SHA\"}"

Layer 2: E2E Test Scenarios (10–20 tests)

Generation Workflow (3 tests):

def test_user_generates_blog_post_on_demand(page):
    # Navigate to wizard
    # Fill topic, select provider, set auto-publish
    # Click Generate
    # Assert blog.post created with title + body
    # Assert email sent to configured recipient

def test_user_saves_post_as_draft_for_review(page):
    # Same as above but auto_publish=False
    # Assert post is not published

def test_generation_fails_gracefully_with_api_error(page):
    # Trigger with invalid API key
    # Assert error message displayed
    # Assert "Retry" button visible on log

Scheduling Workflow (2 tests):

def test_user_configures_daily_schedule_slot(page):
    # Navigate to schedule slots
    # Create morning, afternoon, evening slots
    # Set LLM provider and model
    # Toggle auto-publish per slot
    # Save and verify all 3 slots active

def test_user_monitors_generation_logs(page):
    # View all generation logs
    # Filter by state (success/error)
    # Click retry on failed log
    # Verify retry increments attempt counter

Email & Social (2 tests):

def test_email_contains_post_title_and_social_copy(page):
    # Generate and publish post
    # Check generated email in outbox
    # Verify subject contains blog name + post title
    # Verify body contains social platforms (X, BlueSky, Mastodon, LinkedIn)

def test_user_edits_social_copy_before_publishing(page):
    # Generate as draft
    # Edit social media copy for each platform
    # Save and publish
    # Verify email uses edited copy

Error Recovery (2 tests):

def test_user_retries_failed_generation(page):
    # Trigger generation with bad API key
    # Log shows error state
    # Fix API key in Settings
    # Click Retry on log
    # Verify post created successfully

def test_schedule_slot_continues_after_api_error(page):
    # Set invalid API key on schedule slot
    # Slot executes, fails, logs error
    # Fix API key
    # Wait for next slot time
    # Verify next generation succeeds

Concurrency (1–2 tests):

def test_multiple_users_generate_posts_concurrently(page):
    # User1 generates on-demand
    # User2 generates on-demand simultaneously
    # Both posts created successfully
    # No database locks or conflicts

Layer 3: Performance Benchmarks

Latency Profiling:

def test_generation_latency_p50_under_30s(page):
    """Measure time from "Generate Now" click to blog.post created."""
    import time
    start = time.time()
    # ... navigate and generate ...
    elapsed = time.time() - start
    assert elapsed < 30, f"Generation took {elapsed}s, target <30s"
    # Record metric: elapsed_seconds_p50

Query Count Assertion:

def test_generation_uses_fewer_than_50_queries(page):
    """Verify no N+1 query patterns."""
    from odoo.tests import TransactionCase
    # In the server-side test, not E2E:
    with self.assertQueryCount(50):
        schedule.run_generation()

Stress Test (not Playwright, server-side):

def test_concurrent_schedule_slots_under_load():
    """3 slots × 5 iterations = 15 posts in rapid succession."""
    # Trigger all 3 schedule slots
    # Measure: peak memory, query count, token usage
    # Assert: all posts created, no failures

Runboat Integration

What is Runboat?

Runboat (by Acsone) provides:

Auto-deployed preview instances of Odoo per CI commit
Live URL for E2E testing (no local bootstrapping needed)
Fresh template DB with addon pre-installed
5-minute auto-cleanup after test run

CI/CD Variables Required

Add these to GitLab Project Settings → CI/CD Variables:

Variable	Type	Purpose	Example
`RUNBOAT_API_URL`	Secret	Runboat API endpoint	`https://api.runboat.dev`
`RUNBOAT_TOKEN`	Secret	Bearer token for Runboat API	`rbk_xxx...`
`GITLAB_BOT_TOKEN`	Secret	Personal/bot token for MR comments	`glpat_xxx...`

How to obtain:

RUNBOAT_API_URL & RUNBOAT_TOKEN: Request from Acsone/infrastructure team
GITLAB_BOT_TOKEN: Create via GitLab → Settings → Access Tokens
- Scopes: api, read_api, read_repository
- Save as CI/CD variable (marked as Protected, Masked)

CI/CD Integration (Already Added)

.gitlab-ci.yml now includes:

Stage: preview

runboat_preview:
  stage: preview
  image: curlimages/curl:latest
  script:
    # Request preview build from Runboat
    - RESP=$(curl -fsSL -X POST "${RUNBOAT_API_URL}/builds" \
        -H "Authorization: Bearer ${RUNBOAT_TOKEN}" \
        -d "{\"repo\":\"${CI_PROJECT_PATH}\",\"sha\":\"${CI_COMMIT_SHA}\"}")
    - BUILD_URL=$(echo "$RESP" | jq -r '.url')
    - echo "BUILD_URL=$BUILD_URL" >> build.env
    # Post comment to MR
    - curl -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes" \
        -H "PRIVATE-TOKEN: ${GITLAB_BOT_TOKEN}" \
        -d "body=🚀 [Preview](${BUILD_URL}/odoo) ready"
  artifacts:
    reports:
      dotenv: build.env

Stage: e2e

e2e_tests:
  stage: e2e
  image: mcr.microsoft.com/playwright/python:latest
  needs: [runboat_preview]
  script:
    - pip install -r e2e/requirements.txt
    - pytest e2e/ --base-url=$BUILD_URL -v --tracing=retain-on-failure
  artifacts:
    when: always
    paths:
      - e2e/traces/
    expire_in: 1 week

Stage: test (performance)

performance_tests:
  stage: test
  image: $ODOO_IMAGE
  script:
    - pytest addons/itsulu_blog_publisher/tests/test_performance.py \
        -m performance --odoo-database=$POSTGRES_DB

Pipeline Flow

Merge Request
    ↓
[lint] black, pylint-odoo (2 min)
    ↓
[test] unit + BDD + performance (10 min)
    ↓
[build] Docker image → registry (3 min)
    ↓
[preview] Runboat deploy (5 min)
    ↓
[e2e] Playwright against preview (15 min)
    ↓
Results → MR comment with preview URL

Total pipeline time: ~35 minutes

Unit/BDD/Performance tests run in parallel with Docker build
E2E tests run after preview is ready

Success Criteria

✅ Phase 3 Complete when:

10–20 E2E scenarios passing (Runboat)
Performance baseline established (latency, tokens, queries)
Concurrent generation verified (5+ simultaneous posts)
All E2E tests green on merge requests
Runboat integration in CI/CD
Performance metrics documented in README
No E2E test flakiness (< 2% failure rate)

Performance SLO Targets

Metric	Target	Rationale
Generation latency (P50)	< 30 seconds	User experience (wizard response time)
Generation latency (P99)	< 60 seconds	Outlier tolerance
Tokens per post	800–1200	Cost baseline for budget planning
Queries per generation	< 50	N+1 detection and DB load
Concurrent posts	5+	Peak capacity without degradation
Email send latency	< 5 seconds	Notification responsiveness
Template DB prime time	< 60 seconds	CI/CD pipeline efficiency

Implementation Timeline

Week	Task	Owner
W1	Set up e2e/ directory, conftest.py, Runboat polling	Claude
W1	Implement 3–5 core E2E scenarios (generation, scheduling)	Claude
W2	Add error recovery and email scenarios	Claude
W2	Set up performance measurement (latency, queries)	Claude
W3	Stress testing and concurrency verification	Claude
W3	Performance tuning if SLOs not met	Claude
W4	Runboat CI/CD integration	Claude
W4	Final verification and documentation	Claude

Known Constraints

Runboat Limitations

Cold start: First request may take 30–60s (instance startup)
Auto-cleanup: Instance removed 5 min after last request
No persistent storage: Data lost when instance cleaned up
Resource limits: CPU/memory capped per deployment tier

E2E Test Maintenance

Brittle selectors: Avoid .o_field_value (auto-generated)
Timing issues: Use page.wait_for_*() not time.sleep()
Flakiness: Run 3× locally before merging
Timeout: Set ≥ 30s for slow JS rendering

References

Next: Set up e2e/ directory and implement core scenarios

11 KiB Raw Blame History Unescape Escape