# Phase 3: Runboat E2E Testing and Performance Benchmarks

**Status**: In Progress

**Start**: 2026-05-30

**Target**: E2E coverage + performance SLOs met

## Goals

### 1. E2E Test Coverage (10–20 scenarios)

Critical user journeys verified via Playwright:

- [ ] User generates blog post on-demand
- [ ] User schedules daily blog generation
- [ ] User views generation logs and retries failed attempts
- [ ] User edits social media copy before publication
- [ ] User views published post with correct SEO fields
- [ ] User receives notification email with correct content
- [ ] System recovers gracefully from LLM API errors
- [ ] Multiple users generate posts concurrently (collision handling)

### 2. Performance Benchmarks

Establish baseline metrics for:

- **Generation Latency**: Time from wizard click to post created
  - Target: P50 < 30 seconds (including LLM API call)
  - Measure: P50, P95, P99
- **Token Efficiency**: Tokens used per blog post
  - Target: 800–1200 tokens for ~800-word post
  - Baseline: Record for cost optimization
- **Database Query Count**: N+1 detection
  - Target: < 50 queries per generation
  - Tool: `assertQueryCount()` on hot paths
- **Throughput**: Concurrent generations
  - Target: 5+ simultaneous posts without degradation
  - Stress test: 10 parallel schedule slots
- **Memory Usage**: Peak RSS during generation
  - Target: < 500 MB per Odoo process

### 3. Load Testing

Simulate production scenarios:

- [ ] 100 pending topics in queue
- [ ] 3 active schedule slots all triggering within 5 minutes
- [ ] 5 concurrent users generating posts
- [ ] Template DB priming time baseline

## Implementation Plan

### Layer 1: Runboat Setup & E2E Infrastructure

```bash
# 1. Create e2e/ directory structure
e2e/
├── conftest.py              # Session/auth fixtures, Runboat polling
├── test_generation.py       # On-demand generation workflow
├── test_scheduling.py       # Schedule slot execution
├── test_notifications.py    # Email and social copy
├── test_error_recovery.py   # API errors and retries
└── requirements.txt         # pytest, pytest-playwright

# 2. Set up conftest.py with:
#    - wait_for_odoo(url) polling
#    - auth_state fixture (admin login)
#    - page fixture (authenticated Playwright context)
#    - BASE_URL from env var or CI

# 3. Create .gitlab-ci.yml runboat stage:
runboat_preview:
  stage: preview
  script: |
    curl -X POST "$RUNBOAT_API_URL/builds" \
      -H "Authorization: Bearer $RUNBOAT_TOKEN" \
      -d "{\"repo\":\"$CI_PROJECT_PATH\",\"sha\":\"$CI_COMMIT_SHA\"}"
```

### Layer 2: E2E Test Scenarios (10–20 tests)

**Generation Workflow** (3 tests):

```python
# Scenario outlines: the trailing `...` keeps each stub importable until it is implemented.

def test_user_generates_blog_post_on_demand(page):
    # Navigate to wizard
    # Fill topic, select provider, set auto-publish
    # Click Generate
    # Assert blog.post created with title + body
    # Assert email sent to configured recipient
    ...

def test_user_saves_post_as_draft_for_review(page):
    # Same as above but auto_publish=False
    # Assert post is not published
    ...

def test_generation_fails_gracefully_with_api_error(page):
    # Trigger with invalid API key
    # Assert error message displayed
    # Assert "Retry" button visible on log
    ...
```

**Scheduling Workflow** (2 tests):

```python
def test_user_configures_daily_schedule_slot(page):
    # Navigate to schedule slots
    # Create morning, afternoon, evening slots
    # Set LLM provider and model
    # Toggle auto-publish per slot
    # Save and verify all 3 slots active
    ...

def test_user_monitors_generation_logs(page):
    # View all generation logs
    # Filter by state (success/error)
    # Click retry on failed log
    # Verify retry increments attempt counter
    ...
```

**Email & Social** (2 tests):

```python
def test_email_contains_post_title_and_social_copy(page):
    # Generate and publish post
    # Check generated email in outbox
    # Verify subject contains blog name + post title
    # Verify body contains social platforms (X, BlueSky, Mastodon, LinkedIn)
    ...

def test_user_edits_social_copy_before_publishing(page):
    # Generate as draft
    # Edit social media copy for each platform
    # Save and publish
    # Verify email uses edited copy
    ...
```

**Error Recovery** (2 tests):

```python
def test_user_retries_failed_generation(page):
    # Trigger generation with bad API key
    # Log shows error state
    # Fix API key in Settings
    # Click Retry on log
    # Verify post created successfully
    ...

def test_schedule_slot_continues_after_api_error(page):
    # Set invalid API key on schedule slot
    # Slot executes, fails, logs error
    # Fix API key
    # Wait for next slot time
    # Verify next generation succeeds
    ...
```

**Concurrency** (1–2 tests):

```python
def test_multiple_users_generate_posts_concurrently(browser):
    # One isolated context per user via browser.new_context();
    # a single shared page cannot hold two sessions at once
    # User1 generates on-demand
    # User2 generates on-demand simultaneously
    # Both posts created successfully
    # No database locks or conflicts
    ...
```

### Layer 3: Performance Benchmarks

**Latency Profiling**:

```python
import time


def test_generation_latency_p50_under_30s(page):
    """Measure time from "Generate Now" click to blog.post created."""
    start = time.perf_counter()  # monotonic, unaffected by system clock changes
    # ... navigate and generate ...
    elapsed = time.perf_counter() - start
    assert elapsed < 30, f"Generation took {elapsed:.1f}s, target <30s"
    # Record metric: elapsed_seconds_p50
    # (one run is a single sample; P50/P95/P99 come from aggregating repeated runs)
```

**Query Count Assertion**:

```python
from odoo.tests import TransactionCase


class TestGenerationQueryCount(TransactionCase):
    """Server-side test (not E2E): verify no N+1 query patterns."""

    def test_generation_uses_fewer_than_50_queries(self):
        # self.schedule: schedule slot record, assumed to be created in setUpClass (not shown)
        with self.assertQueryCount(50):  # fails if the block issues more than 50 queries
            self.schedule.run_generation()
```

**Stress Test** (not Playwright, server-side):

```python
def test_concurrent_schedule_slots_under_load():
    """3 slots × 5 iterations = 15 posts in rapid succession."""
    # Trigger all 3 schedule slots
    # Measure: peak memory, query count, token usage
    # Assert: all posts created, no failures
    ...
```

## Runboat Integration

### What is Runboat?

Runboat (by Acsone) provides:

- **Auto-deployed preview instances** of Odoo per CI commit
- **Live URL** for E2E testing (no local bootstrapping needed)
- **Fresh template DB** with addon pre-installed
- **5-minute auto-cleanup** after test run

### CI/CD Integration

```yaml
# .gitlab-ci.yml
stages: [lint, test, build, preview, e2e]

# ... existing lint + test stages ...

runboat_preview:
  stage: preview
  image: alpine:latest
  before_script:
    # The script needs both curl and jq; a curl-only image does not ship jq
    - apk add --no-cache curl jq
  script:
    - |
      RUNBOAT_URL="${RUNBOAT_API_URL}/builds"
      RESP=$(curl -fsSL -X POST "$RUNBOAT_URL" \
        -H "Authorization: Bearer $RUNBOAT_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
          \"repo\": \"$CI_PROJECT_PATH\",
          \"sha\": \"$CI_COMMIT_SHA\",
          \"target_branch\": \"$CI_MERGE_REQUEST_TARGET_BRANCH_NAME\"
        }")
      BUILD_URL=$(echo "$RESP" | jq -r '.url')
      echo "BUILD_URL=$BUILD_URL" >> build.env
      # Post comment to MR (URL-encode the body so spaces, brackets and emoji survive)
      curl -fsS -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes" \
        -H "PRIVATE-TOKEN: $GITLAB_BOT_TOKEN" \
        --data-urlencode "body=🚀 [Preview](${BUILD_URL}/odoo) ready for testing"
  artifacts:
    reports:
      dotenv: build.env
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

e2e_tests:
  stage: e2e
  image: mcr.microsoft.com/playwright/python:latest
  needs:
    - runboat_preview
  script:
    - pip install -r e2e/requirements.txt
    # --output points pytest-playwright at the directory collected as an artifact below
    - pytest e2e/ --base-url=$BUILD_URL -v --tracing=retain-on-failure --output=e2e/traces
  artifacts:
    when: on_failure
    paths:
      - e2e/traces/
    expire_in: 1 week
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
```

## Success Criteria

✅ **Phase 3 Complete when:**

- [ ] 10–20 E2E scenarios passing (Runboat)
- [ ] Performance baseline established (latency, tokens, queries)
- [ ] Concurrent generation verified (5+ simultaneous posts)
- [ ] All E2E tests green on merge requests
- [ ] Runboat integration in CI/CD
- [ ] Performance metrics documented in README
- [ ] E2E test flakiness below 2% failure rate

## Performance SLO Targets

| Metric | Target | Rationale |
|---|---|---|
| Generation latency (P50) | < 30 seconds | User experience (wizard response time) |
| Generation latency (P99) | < 60 seconds | Outlier tolerance |
| Tokens per post | 800–1200 | Cost baseline for budget planning |
| Queries per generation | < 50 | N+1 detection and DB load |
| Concurrent posts | 5+ | Peak capacity without degradation |
| Email send latency | < 5 seconds | Notification responsiveness |
| Template DB prime time | < 60 seconds | CI/CD pipeline efficiency |

## Implementation Timeline

| Week | Task | Owner |
|---|---|---|
| W1 | Set up e2e/ directory, conftest.py, Runboat polling | Claude |
| W1 | Implement 3–5 core E2E scenarios (generation, scheduling) | Claude |
| W2 | Add error recovery and email scenarios | Claude |
| W2 | Set up performance measurement (latency, queries) | Claude |
| W3 | Stress testing and concurrency verification | Claude |
| W3 | Performance tuning if SLOs not met | Claude |
| W4 | Runboat CI/CD integration | Claude |
| W4 | Final verification and documentation | Claude |

## Known Constraints

### Runboat Limitations

- **Cold start**: First request may take 30–60s (instance startup)
- **Auto-cleanup**: Instance removed 5 min after last request
- **No persistent storage**: Data lost when instance cleaned up
- **Resource limits**: CPU/memory capped per deployment tier

### E2E Test Maintenance

- **Brittle selectors**: Avoid `.o_field_value` (auto-generated)
- **Timing issues**: Use `page.wait_for_*()`, not `time.sleep()`
- **Flakiness**: Run 3× locally before merging
- **Timeout**: Set ≥ 30s for slow JS rendering

## References

- [Runboat Documentation](https://docs.acsone.eu/runboat/)
- [Playwright Python API](https://playwright.dev/python/)
- [Odoo E2E Best Practices](https://github.com/OCA/server-tools/tree/17.0#e2e-testing)

---

**Next**: Set up e2e/ directory and implement core scenarios
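
As a starting point for `conftest.py`, the `wait_for_odoo(url)` polling helper listed in Layer 1 could look roughly like the sketch below. It is a minimal illustration, not the project's implementation: the `http_probe` helper, the 120-second default timeout (chosen to sit above the 30–60s cold start noted under Runboat Limitations), the 2-second interval, and the injectable `probe`/`sleep`/`clock` parameters are all assumptions introduced here so the loop can be exercised without a live instance.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True when `url` answers with an HTTP 2xx status (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, or a 4xx/5xx response
        return False


def wait_for_odoo(
    url: str,
    timeout: float = 120.0,   # assumed default, above the 30-60s cold start
    interval: float = 2.0,    # assumed pause between attempts
    probe: Callable[[str], bool] = http_probe,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll `url` until `probe` succeeds; raise TimeoutError after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        if probe(url):
            return
        if clock() >= deadline:
            raise TimeoutError(f"Odoo at {url} not ready after {timeout:.0f}s")
        sleep(interval)
```

A session-scoped, autouse fixture could call `wait_for_odoo(f"{base_url}/web/login")` once before any test runs, so the cold start is absorbed in one place instead of inflating per-test timeouts. Polling the login page is an assumption; any endpoint that only responds once the instance is fully up would work.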