Created comprehensive E2E test suite for ITSulu Blog Publisher using Playwright and Runboat.

Includes:

PHASE3_ROADMAP.md:
- Goals for E2E coverage (10-30 scenarios)
- Performance benchmark targets (latency, tokens, queries, throughput)
- Implementation plan with layer-by-layer breakdown
- Success criteria and SLO targets
- Runboat integration details for CI/CD

e2e/ directory structure:
- conftest.py: Runboat polling, auth fixtures, page fixture
- requirements.txt: pytest, playwright, requests
- test_generation.py: On-demand generation workflows (5 tests)
- test_scheduling.py: Schedule slot configuration and execution (6 tests)
- test_error_recovery.py: Error handling and email notifications (8 tests)

Total: 19 E2E test scenarios covering:
- On-demand post generation with auto-publish
- Scheduled generation with topic queue
- Error recovery and retry mechanism
- Email notifications with correct content
- Social media copy generation
- Concurrent post generation
- Progress feedback during API calls

Tests use:
- Playwright sync API with 30s timeout (Odoo JS rendering)
- Runboat polling with 180s timeout (instance cold-start)
- Session-scoped auth to avoid repeated 30s logins
- Data-test-id selectors where available, fallback to get_by_*
- Proper wait_for_load_state() for async operations

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Phase 3: Runboat E2E Testing and Performance Benchmarks
Status: In Progress
Start: 2026-05-30
Target: E2E coverage + performance SLOs met
Goals
1. E2E Test Coverage (10–30 scenarios)
Critical user journeys verified via Playwright:
- User generates blog post on-demand
- User schedules daily blog generation
- User views generation logs and retries failed attempts
- User edits social media copy before publication
- User views published post with correct SEO fields
- User receives notification email with correct content
- System recovers gracefully from LLM API errors
- Multiple users generate posts concurrently (collision handling)
2. Performance Benchmarks
Establish baseline metrics for:
- Generation Latency: Time from wizard click to post created
  - Target: < 30 seconds (including LLM API call)
  - Measure: P50, P95, P99
- Token Efficiency: Tokens used per blog post
  - Target: 800–1200 tokens for ~800-word post
  - Baseline: Record for cost optimization
- Database Query Count: N+1 detection
  - Target: < 50 queries per generation
  - Tool: assertQueryCount() on hot paths
- Throughput: Concurrent generations
  - Target: 5+ simultaneous posts without degradation
  - Stress test: 10 parallel schedule slots
- Memory Usage: Peak RSS during generation
  - Target: < 500 MB per Odoo process
3. Load Testing
Simulate production scenarios:
- 100 pending topics in queue
- 3 active schedule slots all triggering within 5 minutes
- 5 concurrent users generating posts
- Template DB priming time baseline
Implementation Plan
Layer 1: Runboat Setup & E2E Infrastructure
# 1. Create e2e/ directory structure
e2e/
├── conftest.py # Session/auth fixtures, Runboat polling
├── test_generation.py # On-demand generation workflow
├── test_scheduling.py # Schedule slot execution
├── test_notifications.py # Email and social copy
├── test_error_recovery.py # API errors and retries
└── requirements.txt # pytest, playwright
# 2. Set up conftest.py with:
# - wait_for_odoo(url) polling
# - auth_state fixture (admin login)
# - page fixture (authenticated Playwright context)
# - BASE_URL from env var or CI
# 3. Create .gitlab-ci.yml runboat stage:
runboat_preview:
  stage: preview
  script: |
    curl -X POST "$RUNBOAT_URL/builds" \
      -H "Authorization: Bearer $RUNBOAT_TOKEN" \
      -d "{\"repo\":\"$CI_PROJECT_PATH\",\"sha\":\"$CI_COMMIT_SHA\"}"
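The wait_for_odoo(url) polling step listed for conftest.py could look roughly like the sketch below. It is not the project's actual fixture code: the `http_ok` helper, the `probe` and `sleep` parameters, and the 5-second interval are assumptions made for illustration, while the 180-second default mirrors the cold-start timeout mentioned in the commit message.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on `url` succeeds (urllib raises on 4xx/5xx)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or HTTP error status
        return False


def wait_for_odoo(
    url: str,
    timeout: float = 180.0,
    interval: float = 5.0,
    probe: Callable[[str], bool] = http_ok,   # hypothetical hook, eases testing
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `url` until `probe` succeeds; return the seconds waited.

    Raises TimeoutError if the instance is still not ready after `timeout`.
    """
    start = time.monotonic()
    while True:
        if probe(url):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"{url} not ready after {timeout:.0f}s")
        sleep(interval)
```

A session-scoped fixture might call `wait_for_odoo(f"{BASE_URL}/web/login")` once before any test opens a page, with BASE_URL read from the environment as described above.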
Layer 2: E2E Test Scenarios (10–20 tests)
Generation Workflow (3 tests):
def test_user_generates_blog_post_on_demand(page):
    # Navigate to wizard
    # Fill topic, select provider, set auto-publish
    # Click Generate
    # Assert blog.post created with title + body
    # Assert email sent to configured recipient

def test_user_saves_post_as_draft_for_review(page):
    # Same as above but auto_publish=False
    # Assert post is not published

def test_generation_fails_gracefully_with_api_error(page):
    # Trigger with invalid API key
    # Assert error message displayed
    # Assert "Retry" button visible on log
Scheduling Workflow (2 tests):
def test_user_configures_daily_schedule_slot(page):
    # Navigate to schedule slots
    # Create morning, afternoon, evening slots
    # Set LLM provider and model
    # Toggle auto-publish per slot
    # Save and verify all 3 slots active

def test_user_monitors_generation_logs(page):
    # View all generation logs
    # Filter by state (success/error)
    # Click retry on failed log
    # Verify retry increments attempt counter
Email & Social (2 tests):
def test_email_contains_post_title_and_social_copy(page):
    # Generate and publish post
    # Check generated email in outbox
    # Verify subject contains blog name + post title
    # Verify body contains social platforms (X, BlueSky, Mastodon, LinkedIn)

def test_user_edits_social_copy_before_publishing(page):
    # Generate as draft
    # Edit social media copy for each platform
    # Save and publish
    # Verify email uses edited copy
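The "body contains social platforms" check can be separated from the browser steps as a small pure function. The sketch below is only one possible shape: the helper name `missing_platforms` and the whole-word matching rule are assumptions, and only the platform list comes from the outline above.

```python
import re
from typing import List

# Platform names taken from the test outline above
SOCIAL_PLATFORMS = ("X", "BlueSky", "Mastodon", "LinkedIn")


def missing_platforms(body: str) -> List[str]:
    """Return the platforms that are not mentioned as whole words in `body`.

    Whole-word matching matters for "X", which would otherwise match
    any word containing the letter x.
    """
    return [
        name
        for name in SOCIAL_PLATFORMS
        if not re.search(rf"\b{re.escape(name)}\b", body, re.IGNORECASE)
    ]
```

A test would then assert `missing_platforms(email_body) == []` after reading the mail body from the outbox.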
Error Recovery (2 tests):
def test_user_retries_failed_generation(page):
    # Trigger generation with bad API key
    # Log shows error state
    # Fix API key in Settings
    # Click Retry on log
    # Verify post created successfully

def test_schedule_slot_continues_after_api_error(page):
    # Set invalid API key on schedule slot
    # Slot executes, fails, logs error
    # Fix API key
    # Wait for next slot time
    # Verify next generation succeeds
Concurrency (1–2 tests):
def test_multiple_users_generate_posts_concurrently(page):
    # User1 generates on-demand
    # User2 generates on-demand simultaneously
    # Both posts created successfully
    # No database locks or conflicts
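One way to drive the "simultaneously" part of the concurrency scenario is a thread pool that runs one generation per user and collects failures instead of stopping at the first one. This is a sketch under assumptions: `run_concurrently` and the `task` callable are hypothetical names, and the task stands in for whatever triggers a generation for a given user.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_concurrently(
    task: Callable[[str], str], users: List[str]
) -> Tuple[List[str], List[Exception]]:
    """Run `task(user)` for all users in parallel threads.

    Returns (results, errors) so a test can assert that every
    generation succeeded and that no error was raised.
    """
    results: List[str] = []
    errors: List[Exception] = []
    with ThreadPoolExecutor(max_workers=len(users)) as pool:
        # Submit everything first so the tasks overlap in time
        futures = [pool.submit(task, user) for user in users]
        for future in futures:
            try:
                results.append(future.result())
            except Exception as exc:  # collect, do not abort the other users
                errors.append(exc)
    return results, errors
```

Playwright's sync API objects are not thread-safe, so each task would need to start its own Playwright instance and browser context (or use plain HTTP calls) rather than share the `page` fixture across threads.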
Layer 3: Performance Benchmarks
Latency Profiling:
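import time

def test_generation_latency_p50_under_30s(page):
    """Measure time from "Generate Now" click to blog.post created."""
    # perf_counter() is monotonic, so clock adjustments cannot skew the result
    start = time.perf_counter()
    # ... navigate and generate ...
    elapsed = time.perf_counter() - start
    assert elapsed < 30, f"Generation took {elapsed:.1f}s, target <30s"
    # Record metric: elapsed_seconds_p50
    # (one run is a single sample; P50 requires aggregating several runs)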
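A single timed run only yields one sample, so the P50, P95 and P99 figures have to be computed from a collection of recorded latencies. A minimal sketch using the standard library follows; the function name `latency_percentiles` and the choice of the "inclusive" interpolation method are assumptions, not something this roadmap prescribes.

```python
import statistics
from typing import Dict, List


def latency_percentiles(samples: List[float]) -> Dict[str, float]:
    """Summarise recorded latencies (seconds) as P50, P95 and P99.

    statistics.quantiles(n=100) returns 99 cut points, where index k-1
    is the k-th percentile. At least two samples are required.
    """
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

With only a handful of samples the P99 value is dominated by the slowest run, so the tail figures become meaningful only once enough runs have been collected.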
Query Count Assertion:
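from odoo.tests import TransactionCase

class TestGenerationQueryCount(TransactionCase):
    """Server-side test, not E2E: verify no N+1 query patterns."""

    def test_generation_uses_fewer_than_50_queries(self):
        # self.schedule: a schedule slot record assumed to be created in setUp (not shown)
        # assertQueryCount fails when more queries run than the given limit
        with self.assertQueryCount(50):
            self.schedule.run_generation()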
Stress Test (not Playwright, server-side):
def test_concurrent_schedule_slots_under_load():
    """3 slots × 5 iterations = 15 posts in rapid succession."""
    # Trigger all 3 schedule slots
    # Measure: peak memory, query count, token usage
    # Assert: all posts created, no failures
Runboat Integration
What is Runboat?
Runboat (by Acsone) provides:
- Auto-deployed preview instances of Odoo per CI commit
- Live URL for E2E testing (no local bootstrapping needed)
- Fresh template DB with addon pre-installed
- 5-minute auto-cleanup after test run
CI/CD Integration
# .gitlab-ci.yml
stages: [lint, test, build, preview, e2e]

# ... existing lint + test stages ...

runboat_preview:
  stage: preview
  image: alpine:3
  before_script:
    # The script needs both curl and jq; curlimages/curl does not ship jq
    - apk add --no-cache curl jq
  script:
    - |
      RUNBOAT_URL="${RUNBOAT_API_URL}/builds"
      RESP=$(curl -fsSL -X POST "$RUNBOAT_URL" \
        -H "Authorization: Bearer $RUNBOAT_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
          \"repo\": \"$CI_PROJECT_PATH\",
          \"sha\": \"$CI_COMMIT_SHA\",
          \"target_branch\": \"$CI_MERGE_REQUEST_TARGET_BRANCH_NAME\"
        }")
      BUILD_URL=$(echo "$RESP" | jq -r '.url')
      echo "BUILD_URL=$BUILD_URL" >> build.env
      # Post comment to MR
      curl -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes" \
        -H "PRIVATE-TOKEN: $GITLAB_BOT_TOKEN" \
        --data "body=🚀 [Preview](${BUILD_URL}/odoo) ready for testing"
  artifacts:
    reports:
      dotenv: build.env
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

e2e_tests:
  stage: e2e
  image: mcr.microsoft.com/playwright/python:latest
  needs:
    - runboat_preview
  script:
    - pip install -r e2e/requirements.txt
    # --base-url, --tracing and --output are provided by the pytest-playwright plugin;
    # --output sends traces to the directory collected as an artifact below
    - pytest e2e/ --base-url=$BUILD_URL -v --tracing=retain-on-failure --output=e2e/traces
  artifacts:
    when: on_failure
    paths:
      - e2e/traces/
    expire_in: 1 week
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
Success Criteria
✅ Phase 3 Complete when:
- 10–20 E2E scenarios passing (Runboat)
- Performance baseline established (latency, tokens, queries)
- Concurrent generation verified (5+ simultaneous posts)
- All E2E tests green on merge requests
- Runboat integration in CI/CD
- Performance metrics documented in README
- E2E test flakiness kept below a 2% failure rate
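The 2% flakiness criterion can be checked from repeated runs of the same unchanged suite. The helper below is a hypothetical sketch (the name `failure_rate` and the boolean-per-run input are assumptions) showing the arithmetic only.

```python
from typing import Sequence


def failure_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of runs that failed; each item is True for a passing run."""
    if not outcomes:
        raise ValueError("at least one run is required")
    failures = sum(1 for passed in outcomes if not passed)
    return failures / len(outcomes)
```

Because a single failure in 50 runs already equals 2%, a handful of runs cannot confirm this criterion; the rate is better taken from accumulated CI history.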
Performance SLO Targets
| Metric | Target | Rationale |
|---|---|---|
| Generation latency (P50) | < 30 seconds | User experience (wizard response time) |
| Generation latency (P99) | < 60 seconds | Outlier tolerance |
| Tokens per post | 800–1200 | Cost baseline for budget planning |
| Queries per generation | < 50 | N+1 detection and DB load |
| Concurrent posts | 5+ | Peak capacity without degradation |
| Email send latency | < 5 seconds | Notification responsiveness |
| Template DB prime time | < 60 seconds | CI/CD pipeline efficiency |
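The upper-bound rows of the table can be encoded as data so that a benchmark run reports which targets it missed. This is an illustrative sketch: the metric key names and the `slo_violations` helper are assumptions, and the token range and concurrent-post rows are left out because they are not simple upper bounds.

```python
from typing import Dict, List

# Upper bounds copied from the SLO table; every target is a strict "<"
SLO_LIMITS: Dict[str, float] = {
    "latency_p50_s": 30.0,
    "latency_p99_s": 60.0,
    "queries_per_generation": 50,
    "email_send_latency_s": 5.0,
    "template_db_prime_s": 60.0,
}


def slo_violations(measured: Dict[str, float]) -> List[str]:
    """Return the sorted names of measured metrics that miss their target.

    Metrics absent from `measured` are skipped rather than reported.
    """
    return sorted(
        name
        for name, limit in SLO_LIMITS.items()
        if name in measured and measured[name] >= limit
    )
```

For example, `slo_violations({"latency_p50_s": 12.4, "latency_p99_s": 75.0})` reports only the P99 latency.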
Implementation Timeline
| Week | Task | Owner |
|---|---|---|
| W1 | Set up e2e/ directory, conftest.py, Runboat polling | Claude |
| W1 | Implement 3–5 core E2E scenarios (generation, scheduling) | Claude |
| W2 | Add error recovery and email scenarios | Claude |
| W2 | Set up performance measurement (latency, queries) | Claude |
| W3 | Stress testing and concurrency verification | Claude |
| W3 | Performance tuning if SLOs not met | Claude |
| W4 | Runboat CI/CD integration | Claude |
| W4 | Final verification and documentation | Claude |
Known Constraints
Runboat Limitations
- Cold start: First request may take 30–60s (instance startup)
- Auto-cleanup: Instance removed 5 min after last request
- No persistent storage: Data lost when instance cleaned up
- Resource limits: CPU/memory capped per deployment tier
E2E Test Maintenance
- Brittle selectors: Avoid .o_field_value (auto-generated)
- Timing issues: Use page.wait_for_*() not time.sleep()
- Flakiness: Run 3× locally before merging
- Timeout: Set ≥ 30s for slow JS rendering
Next: Set up e2e/ directory and implement core scenarios