itsulu-blog-publisher/PHASE3_ROADMAP.md
Nicholas Riegel d122b773d4 feat: integrate Runboat E2E testing and performance tests into CI/CD pipeline
Updated .gitlab-ci.yml with complete Phase 3 pipeline stages:

New Stages Added:
- preview: Runboat API call to create ephemeral preview instance
- e2e: Playwright E2E tests against Runboat preview
- performance: Server-side performance benchmarks (latency, queries, tokens)

Pipeline Changes:
- runboat_preview job: Requests preview build, extracts URL, posts MR comment
- e2e_tests job: Runs 19 Playwright scenarios against preview URL
- performance_tests job: Runs 7 performance benchmark tests locally
- All jobs include artifacts (HTML reports, traces) for debugging

Job Dependencies:
- e2e_tests needs runboat_preview (waits for preview URL)
- performance_tests runs in parallel with build stage
- All new jobs only on merge_requests (not main/daily)

New Required CI/CD Variables:
- RUNBOAT_API_URL: Runboat API endpoint (secret)
- RUNBOAT_TOKEN: Bearer token for Runboat (secret)
- GITLAB_BOT_TOKEN: GitLab bot token for MR comments (secret)

Updated PHASE3_ROADMAP.md with:
- Runboat setup instructions
- CI/CD variable requirements and how to obtain
- Complete YAML snippets (already in .gitlab-ci.yml)
- Pipeline flow diagram
- Estimated total pipeline time: ~35 minutes

Non-blocking failures:
- runboat_preview: allow_failure=true (Runboat might be unavailable)
- e2e_tests: allow_failure=true (E2E informational, doesn't block merge)
- performance_tests: allow_failure=false (must pass)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-30 00:54:59 -04:00


Phase 3: Runboat E2E Testing and Performance Benchmarks

Status: In Progress
Start: 2026-05-30
Target: E2E coverage + performance SLOs met

Goals

1. E2E Test Coverage (10-30 scenarios)

Critical user journeys verified via Playwright:

  • User generates blog post on-demand
  • User schedules daily blog generation
  • User views generation logs and retries failed attempts
  • User edits social media copy before publication
  • User views published post with correct SEO fields
  • User receives notification email with correct content
  • System recovers gracefully from LLM API errors
  • Multiple users generate posts concurrently (collision handling)

2. Performance Benchmarks

Establish baseline metrics for:

  • Generation Latency: Time from wizard click to post created
    • Target: < 30 seconds (including LLM API call)
    • Measure: P50, P95, P99
  • Token Efficiency: Tokens used per blog post
    • Target: 800-1200 tokens for ~800-word post
    • Baseline: Record for cost optimization
  • Database Query Count: N+1 detection
    • Target: < 50 queries per generation
    • Tool: assertQueryCount() on hot paths
  • Throughput: Concurrent generations
    • Target: 5+ simultaneous posts without degradation
    • Stress test: 10 parallel schedule slots
  • Memory Usage: Peak RSS during generation
    • Target: < 500 MB per Odoo process
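
The latency target asks for P50, P95, and P99, which means collecting several timing samples rather than asserting on a single run. A minimal sketch of a nearest-rank percentile summary follows; the function names `percentile` and `summarize_latencies` are illustrative and not part of the addon.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile (pct in 1..100) of a list of numbers."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latencies(samples):
    """Summarize generation latencies (seconds) as P50/P95/P99."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    # Hypothetical timings in seconds from repeated generation runs.
    print(summarize_latencies([12.0, 31.5, 18.2, 22.4, 19.9]))
```

With only a handful of samples, P95 and P99 collapse onto the slowest run, so the tail percentiles only become meaningful once enough runs have been recorded.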

3. Load Testing

Simulate production scenarios:

  • 100 pending topics in queue
  • 3 active schedule slots all triggering within 5 minutes
  • 5 concurrent users generating posts
  • Template DB priming time baseline

Implementation Plan

Layer 1: Runboat Setup & E2E Infrastructure

# 1. Create e2e/ directory structure
e2e/
├── conftest.py              # Session/auth fixtures, Runboat polling
├── test_generation.py       # On-demand generation workflow
├── test_scheduling.py       # Schedule slot execution
├── test_notifications.py    # Email and social copy
├── test_error_recovery.py   # API errors and retries
└── requirements.txt         # pytest, playwright

# 2. Set up conftest.py with:
# - wait_for_odoo(url) polling
# - auth_state fixture (admin login)
# - page fixture (authenticated Playwright context)
# - BASE_URL from env var or CI

# 3. Create .gitlab-ci.yml runboat stage:
runboat_preview:
  stage: preview
  script: |
    curl -X POST "$RUNBOAT_API_URL/builds" \
      -H "Authorization: Bearer $RUNBOAT_TOKEN" \
      -d "{\"repo\":\"$CI_PROJECT_PATH\",\"sha\":\"$CI_COMMIT_SHA\"}"
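
The `wait_for_odoo(url)` polling helper named in step 2 could look roughly like the sketch below. It is an assumption about how the helper might be written, not the project's actual conftest.py: the `probe` and `sleep` parameters are hypothetical hooks added so the loop can be exercised without a live instance, and the default timeout and interval are placeholders.

```python
import time
import urllib.error
import urllib.request


def _http_ok(url):
    """Return True if `url` answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_odoo(url, timeout=120.0, interval=2.0, probe=None, sleep=time.sleep):
    """Poll `url` until it responds, or raise TimeoutError after `timeout` seconds."""
    probe = probe or _http_ok
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} not ready after {timeout}s")
        sleep(interval)
```

A session-scoped fixture could call something like `wait_for_odoo(BASE_URL + "/web/login")` before any test opens a page, which also absorbs the preview instance's cold start.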

Layer 2: E2E Test Scenarios (10-20 tests)

Generation Workflow (3 tests):

def test_user_generates_blog_post_on_demand(page):
    # Navigate to wizard
    # Fill topic, select provider, set auto-publish
    # Click Generate
    # Assert blog.post created with title + body
    # Assert email sent to configured recipient

def test_user_saves_post_as_draft_for_review(page):
    # Same as above but auto_publish=False
    # Assert post is not published

def test_generation_fails_gracefully_with_api_error(page):
    # Trigger with invalid API key
    # Assert error message displayed
    # Assert "Retry" button visible on log

Scheduling Workflow (2 tests):

def test_user_configures_daily_schedule_slot(page):
    # Navigate to schedule slots
    # Create morning, afternoon, evening slots
    # Set LLM provider and model
    # Toggle auto-publish per slot
    # Save and verify all 3 slots active

def test_user_monitors_generation_logs(page):
    # View all generation logs
    # Filter by state (success/error)
    # Click retry on failed log
    # Verify retry increments attempt counter

Email & Social (2 tests):

def test_email_contains_post_title_and_social_copy(page):
    # Generate and publish post
    # Check generated email in outbox
    # Verify subject contains blog name + post title
    # Verify body contains social platforms (X, BlueSky, Mastodon, LinkedIn)

def test_user_edits_social_copy_before_publishing(page):
    # Generate as draft
    # Edit social media copy for each platform
    # Save and publish
    # Verify email uses edited copy

Error Recovery (2 tests):

def test_user_retries_failed_generation(page):
    # Trigger generation with bad API key
    # Log shows error state
    # Fix API key in Settings
    # Click Retry on log
    # Verify post created successfully

def test_schedule_slot_continues_after_api_error(page):
    # Set invalid API key on schedule slot
    # Slot executes, fails, logs error
    # Fix API key
    # Wait for next slot time
    # Verify next generation succeeds

Concurrency (1-2 tests):

def test_multiple_users_generate_posts_concurrently(page):
    # User1 generates on-demand
    # User2 generates on-demand simultaneously
    # Both posts created successfully
    # No database locks or conflicts

Layer 3: Performance Benchmarks

Latency Profiling:

def test_generation_latency_p50_under_30s(page):
    """Measure time from "Generate Now" click to blog.post created."""
    import time
    start = time.perf_counter()  # monotonic clock, unaffected by system clock changes
    # ... navigate and generate ...
    elapsed = time.perf_counter() - start
    assert elapsed < 30, f"Generation took {elapsed:.1f}s, target <30s"
    # Record metric: one run yields one sample; a P50 needs several recorded runs

Query Count Assertion:

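from odoo.tests import TransactionCase


class TestGenerationQueryCount(TransactionCase):
    """Server-side test (not E2E): verify no N+1 query patterns."""

    def test_generation_uses_fewer_than_50_queries(self):
        # self.schedule: a schedule slot record created in setUp (not shown here)
        with self.assertQueryCount(50):
            self.schedule.run_generation()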

Stress Test (not Playwright, server-side):

def test_concurrent_schedule_slots_under_load():
    """3 slots × 5 iterations = 15 posts in rapid succession."""
    # Trigger all 3 schedule slots
    # Measure: peak memory, query count, token usage
    # Assert: all posts created, no failures
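
One way to drive the "3 slots × 5 iterations" load is a small thread-pool runner that records failures instead of stopping at the first one. This is a sketch under assumptions: `fake_generation` is a hypothetical stand-in for triggering a schedule slot, and a real server-side Odoo test would also need a separate database cursor per thread, which is not shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_concurrently(task, n_tasks, max_workers):
    """Run task(i) for i in range(n_tasks) on a thread pool.

    Returns (results, errors, elapsed_seconds). An exception raised by one
    task is recorded in `errors` and does not abort the remaining tasks.
    """
    results, errors = [], []
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, i): i for i in range(n_tasks)}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:  # collect the failure and keep going
                errors.append((futures[future], exc))
    return results, errors, time.perf_counter() - start


def fake_generation(i):
    """Hypothetical stand-in for one schedule slot run."""
    time.sleep(0.01)
    return f"post-{i}"


if __name__ == "__main__":
    posts, failures, elapsed = run_concurrently(fake_generation, n_tasks=15, max_workers=3)
    print(f"{len(posts)} posts, {len(failures)} failures in {elapsed:.2f}s")
```

Collecting errors per task makes the final assertion ("all posts created, no failures") report every failed slot at once rather than only the first.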

Runboat Integration

What is Runboat?

Runboat (by Acsone) provides:

  • Auto-deployed preview instances of Odoo per CI commit
  • Live URL for E2E testing (no local bootstrapping needed)
  • Fresh template DB with addon pre-installed
  • 5-minute auto-cleanup after test run

CI/CD Variables Required

Add these to GitLab Project Settings → CI/CD Variables:

| Variable | Type | Purpose | Example |
|---|---|---|---|
| RUNBOAT_API_URL | Secret | Runboat API endpoint | https://api.runboat.dev |
| RUNBOAT_TOKEN | Secret | Bearer token for Runboat API | rbk_xxx... |
| GITLAB_BOT_TOKEN | Secret | Personal/bot token for MR comments | glpat-xxx... |

How to obtain:

  1. RUNBOAT_API_URL & RUNBOAT_TOKEN: Request from Acsone/infrastructure team
  2. GITLAB_BOT_TOKEN: Create via GitLab → Settings → Access Tokens
    • Scopes: api, read_api, read_repository
    • Save as CI/CD variable (marked as Protected, Masked)
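
Because a missing variable otherwise surfaces as an opaque curl or authentication error mid-pipeline, a fail-fast check can be useful. The helper below is a hypothetical sketch (the name `missing_ci_variables` is not from the repository) that reports which of the three variables listed above are unset or empty.

```python
import os

# The three variables this roadmap lists as required.
REQUIRED_VARS = ("RUNBOAT_API_URL", "RUNBOAT_TOKEN", "GITLAB_BOT_TOKEN")


def missing_ci_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_ci_variables()
    if missing:
        raise SystemExit(f"Missing CI/CD variables: {', '.join(missing)}")
    print("All required CI/CD variables are set")
```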

CI/CD Integration (Already Added)

.gitlab-ci.yml now includes:

Stage: preview

runboat_preview:
  stage: preview
  image: curlimages/curl:latest  # jq is also used below; confirm the image provides it
  script:
    # Request preview build from Runboat.
    # Block scalars (|) keep each multi-line curl call a single shell command.
    - |
      RESP=$(curl -fsSL -X POST "${RUNBOAT_API_URL}/builds" \
        -H "Authorization: Bearer ${RUNBOAT_TOKEN}" \
        -d "{\"repo\":\"${CI_PROJECT_PATH}\",\"sha\":\"${CI_COMMIT_SHA}\"}")
    - BUILD_URL=$(echo "$RESP" | jq -r '.url')
    - echo "BUILD_URL=$BUILD_URL" >> build.env
    # Post comment to MR (--data-urlencode keeps the emoji and brackets intact)
    - |
      curl -fsS -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes" \
        -H "PRIVATE-TOKEN: ${GITLAB_BOT_TOKEN}" \
        --data-urlencode "body=🚀 [Preview](${BUILD_URL}/odoo) ready"
  artifacts:
    reports:
      dotenv: build.env
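
For readers who prefer to post the MR comment from Python instead of curl, the request can be built with the standard library alone. This is an illustrative sketch, not the pipeline's implementation: `build_mr_note_request` is a hypothetical helper that only constructs the request for the GitLab merge request notes endpoint and does not send it.

```python
import urllib.parse
import urllib.request


def build_mr_note_request(api_v4_url, project_id, mr_iid, token, preview_url):
    """Build (but do not send) the POST request that adds a note to a merge request."""
    url = f"{api_v4_url}/projects/{project_id}/merge_requests/{mr_iid}/notes"
    # URL-encode the body so brackets and parentheses in the Markdown link survive.
    body = urllib.parse.urlencode({"body": f"[Preview]({preview_url}/odoo) ready"})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"PRIVATE-TOKEN": token},
        method="POST",
    )


if __name__ == "__main__":
    req = build_mr_note_request(
        "https://gitlab.example.com/api/v4", 42, 7, "example-token",
        "https://preview.example.com",
    )
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would perform the call; the example values above are placeholders.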

Stage: e2e

e2e_tests:
  stage: e2e
  image: mcr.microsoft.com/playwright/python:latest
  needs: [runboat_preview]
  script:
    - pip install -r e2e/requirements.txt
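    # --output points pytest-playwright at the directory collected as an artifact below
    - pytest e2e/ --base-url=$BUILD_URL -v --tracing=retain-on-failure --output=e2e/traces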
  artifacts:
    when: always
    paths:
      - e2e/traces/
    expire_in: 1 week

Stage: test (performance)

performance_tests:
  stage: test
  image: $ODOO_IMAGE
  script:
    - pytest addons/itsulu_blog_publisher/tests/test_performance.py \
        -m performance --odoo-database=$POSTGRES_DB

Pipeline Flow

Merge Request
    ↓
[lint] black, pylint-odoo (2 min)
    ↓
[test] unit + BDD + performance (10 min)
    ↓
[build] Docker image → registry (3 min)
    ↓
[preview] Runboat deploy (5 min)
    ↓
[e2e] Playwright against preview (15 min)
    ↓
Results → MR comment with preview URL

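Total pipeline time: ~35 minutes (sum of the stage estimates above: 2 + 10 + 3 + 5 + 15)

  • Unit/BDD/Performance tests can run in parallel with the Docker build, which trims the total by up to the build time (~3 min)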
  • E2E tests run after preview is ready

Success Criteria

Phase 3 Complete when:

  • 10-20 E2E scenarios passing (Runboat)
  • Performance baseline established (latency, tokens, queries)
  • Concurrent generation verified (5+ simultaneous posts)
  • All E2E tests green on merge requests
  • Runboat integration in CI/CD
  • Performance metrics documented in README
  • No E2E test flakiness (< 2% failure rate)

Performance SLO Targets

| Metric | Target | Rationale |
|---|---|---|
| Generation latency (P50) | < 30 seconds | User experience (wizard response time) |
| Generation latency (P99) | < 60 seconds | Outlier tolerance |
| Tokens per post | 800-1200 | Cost baseline for budget planning |
| Queries per generation | < 50 | N+1 detection and DB load |
| Concurrent posts | 5+ | Peak capacity without degradation |
| Email send latency | < 5 seconds | Notification responsiveness |
| Template DB prime time | < 60 seconds | CI/CD pipeline efficiency |
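
The targets above can be encoded as data so a benchmark run reports every missed SLO at once. The sketch below assumes hypothetical metric key names (`latency_p50_s`, `tokens_per_post`, and so on) and reads the token target as a range of 800 to 1200; neither the keys nor the function come from the repository.

```python
# Upper bounds: a measured value must be strictly below the limit.
SLO_LIMITS = {
    "latency_p50_s": 30,
    "latency_p99_s": 60,
    "queries_per_generation": 50,
    "email_send_latency_s": 5,
    "template_db_prime_s": 60,
}
TOKEN_RANGE = (800, 1200)   # tokens per post, inclusive
MIN_CONCURRENT_POSTS = 5    # "5+" simultaneous posts


def slo_violations(metrics):
    """Return a list of human-readable SLO violations; metrics not supplied are skipped."""
    violations = []
    for name, limit in SLO_LIMITS.items():
        value = metrics.get(name)
        if value is not None and value >= limit:
            violations.append(f"{name}={value} (target < {limit})")
    tokens = metrics.get("tokens_per_post")
    if tokens is not None and not TOKEN_RANGE[0] <= tokens <= TOKEN_RANGE[1]:
        violations.append(f"tokens_per_post={tokens} (target {TOKEN_RANGE[0]}-{TOKEN_RANGE[1]})")
    concurrent = metrics.get("concurrent_posts")
    if concurrent is not None and concurrent < MIN_CONCURRENT_POSTS:
        violations.append(f"concurrent_posts={concurrent} (target {MIN_CONCURRENT_POSTS}+)")
    return violations


if __name__ == "__main__":
    # Hypothetical measurements from one benchmark run.
    print(slo_violations({"latency_p50_s": 24.1, "tokens_per_post": 1350, "concurrent_posts": 5}))
```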

Implementation Timeline

| Week | Task | Owner |
|---|---|---|
| W1 | Set up e2e/ directory, conftest.py, Runboat polling | Claude |
| W1 | Implement 3-5 core E2E scenarios (generation, scheduling) | Claude |
| W2 | Add error recovery and email scenarios | Claude |
| W2 | Set up performance measurement (latency, queries) | Claude |
| W3 | Stress testing and concurrency verification | Claude |
| W3 | Performance tuning if SLOs not met | Claude |
| W4 | Runboat CI/CD integration | Claude |
| W4 | Final verification and documentation | Claude |

Known Constraints

Runboat Limitations

  • Cold start: First request may take 30-60s (instance startup)
  • Auto-cleanup: Instance removed 5 min after last request
  • No persistent storage: Data lost when instance cleaned up
  • Resource limits: CPU/memory capped per deployment tier

E2E Test Maintenance

  • Brittle selectors: Avoid .o_field_value (auto-generated)
  • Timing issues: Use page.wait_for_*() not time.sleep()
  • Flakiness: Run 3× locally before merging
  • Timeout: Set ≥ 30s for slow JS rendering


Next: Set up e2e/ directory and implement core scenarios