itsulu-blog-publisher/PHASE3_ROADMAP.md
Nicholas Riegel acfa1d93d7 feat: establish Phase 3 E2E testing infrastructure with Playwright
Created comprehensive E2E test suite for ITSulu Blog Publisher using Playwright
and Runboat. Includes:

PHASE3_ROADMAP.md:
- Goals for E2E coverage (10-30 scenarios)
- Performance benchmark targets (latency, tokens, queries, throughput)
- Implementation plan with layer-by-layer breakdown
- Success criteria and SLO targets
- Runboat integration details for CI/CD

e2e/ directory structure:
- conftest.py: Runboat polling, auth fixtures, page fixture
- requirements.txt: pytest, playwright, requests
- test_generation.py: On-demand generation workflows (5 tests)
- test_scheduling.py: Schedule slot configuration and execution (6 tests)
- test_error_recovery.py: Error handling and email notifications (8 tests)

Total: 19 E2E test scenarios covering:
- On-demand post generation with auto-publish
- Scheduled generation with topic queue
- Error recovery and retry mechanism
- Email notifications with correct content
- Social media copy generation
- Concurrent post generation
- Progress feedback during API calls

Tests use:
- Playwright sync API with 30s timeout (Odoo JS rendering)
- Runboat polling with 180s timeout (instance cold-start)
- Session-scoped auth to avoid repeated 30s logins
- Data-test-id selectors where available, fallback to get_by_*
- Proper wait_for_load_state() for async operations

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-30 00:50:43 -04:00


# Phase 3: Runboat E2E Testing and Performance Benchmarks
**Status**: In Progress

**Start**: 2026-05-30

**Target**: E2E coverage + performance SLOs met
## Goals
### 1. E2E Test Coverage (10-30 scenarios)
Critical user journeys verified via Playwright:
- [ ] User generates blog post on-demand
- [ ] User schedules daily blog generation
- [ ] User views generation logs and retries failed attempts
- [ ] User edits social media copy before publication
- [ ] User views published post with correct SEO fields
- [ ] User receives notification email with correct content
- [ ] System recovers gracefully from LLM API errors
- [ ] Multiple users generate posts concurrently (collision handling)
### 2. Performance Benchmarks
Establish baseline metrics for:
- **Generation Latency**: Time from wizard click to post created
  - Target: < 30 seconds (including LLM API call)
  - Measure: P50, P95, P99
- **Token Efficiency**: Tokens used per blog post
  - Target: 800-1200 tokens for ~800-word post
  - Baseline: Record for cost optimization
- **Database Query Count**: N+1 detection
  - Target: < 50 queries per generation
  - Tool: `assertQueryCount()` on hot paths
- **Throughput**: Concurrent generations
  - Target: 5+ simultaneous posts without degradation
  - Stress test: 10 parallel schedule slots
- **Memory Usage**: Peak RSS during generation
  - Target: < 500 MB per Odoo process
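
For the memory target, a server-side test (which runs inside the Odoo process) can read the process high-water mark from the standard library. The sketch below is a starting point under stated assumptions: it uses the Unix-only `resource` module, treats "MB" as MiB, and both helper names are hypothetical. Because `ru_maxrss` is a lifetime peak, it only isolates the cost of generation when read in a fresh worker, or when compared with a reading taken just before the run.

```python
import resource  # Unix-only standard-library module
import sys


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process so far, in MiB."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor


def check_memory_slo(limit_mib: float = 500.0) -> float:
    """Fail if the process high-water mark is not below the target."""
    peak = peak_rss_mib()
    assert peak < limit_mib, f"peak RSS {peak:.0f} MiB exceeds {limit_mib:.0f} MiB"
    return peak
```
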
### 3. Load Testing
Simulate production scenarios:
- [ ] 100 pending topics in queue
- [ ] 3 active schedule slots all triggering within 5 minutes
- [ ] 5 concurrent users generating posts
- [ ] Template DB priming time baseline
## Implementation Plan
### Layer 1: Runboat Setup & E2E Infrastructure
```text
# 1. Create e2e/ directory structure
e2e/
├── conftest.py            # Session/auth fixtures, Runboat polling
├── test_generation.py     # On-demand generation workflow
├── test_scheduling.py     # Schedule slot execution
├── test_notifications.py  # Email and social copy
├── test_error_recovery.py # API errors and retries
└── requirements.txt       # pytest, pytest-playwright, requests

# 2. Set up conftest.py with:
#    - wait_for_odoo(url) polling
#    - auth_state fixture (admin login)
#    - page fixture (authenticated Playwright context)
#    - BASE_URL from env var or CI

# 3. Create .gitlab-ci.yml runboat stage:
runboat_preview:
  stage: preview
  script:
    - |
      curl -X POST "$RUNBOAT_URL/builds" \
        -H "Authorization: Bearer $RUNBOAT_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{\"repo\":\"$CI_PROJECT_PATH\",\"sha\":\"$CI_COMMIT_SHA\"}"
```
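
A minimal sketch of the `wait_for_odoo(url)` polling helper planned for conftest.py follows. It assumes the instance is ready once `/web/login` answers HTTP 200, defaults to the 180 s budget mentioned in the commit notes to absorb Runboat cold starts, and adds a `probe` parameter (not in the original plan) so the readiness check can be replaced or faked in unit tests.

```python
import time
import urllib.request
from typing import Callable, Optional


def login_page_ready(base_url: str) -> bool:
    """Default probe: True once GET /web/login answers HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/web/login", timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError (e.g. a 502 while the instance boots) and socket
        # timeouts all subclass OSError: treat them as "not ready yet".
        return False


def wait_for_odoo(
    base_url: str,
    timeout: float = 180.0,
    interval: float = 5.0,
    probe: Optional[Callable[[str], bool]] = None,
) -> float:
    """Poll base_url until probe() succeeds; return the seconds waited."""
    probe = probe or login_page_ready
    start = time.monotonic()
    while not probe(base_url):
        waited = time.monotonic() - start
        if waited >= timeout:
            raise TimeoutError(f"Odoo at {base_url} not ready after {waited:.0f}s")
        # Never sleep past the deadline.
        time.sleep(min(interval, timeout - waited))
    return time.monotonic() - start
```

In conftest.py this would typically be called once from a session-scoped fixture, for example `wait_for_odoo(os.environ["BASE_URL"])`, before the auth fixture logs in.
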
### Layer 2: E2E Test Scenarios (10-20 tests)
**Generation Workflow** (3 tests):
```python
def test_user_generates_blog_post_on_demand(page):
    # Navigate to wizard
    # Fill topic, select provider, set auto-publish
    # Click Generate
    # Assert blog.post created with title + body
    # Assert email sent to configured recipient
    ...


def test_user_saves_post_as_draft_for_review(page):
    # Same as above but auto_publish=False
    # Assert post is not published
    ...


def test_generation_fails_gracefully_with_api_error(page):
    # Trigger with invalid API key
    # Assert error message displayed
    # Assert "Retry" button visible on log
    ...
```
**Scheduling Workflow** (2 tests):
```python
def test_user_configures_daily_schedule_slot(page):
    # Navigate to schedule slots
    # Create morning, afternoon, evening slots
    # Set LLM provider and model
    # Toggle auto-publish per slot
    # Save and verify all 3 slots active
    ...


def test_user_monitors_generation_logs(page):
    # View all generation logs
    # Filter by state (success/error)
    # Click retry on failed log
    # Verify retry increments attempt counter
    ...
```
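
Scenarios that wait for a slot to fire, such as the error-recovery test that waits for the next slot time, need to know when that will be. The helper below is a hypothetical sketch that assumes each slot fires once a day at a fixed hour and minute; pass `now` and the slot times in the same timezone (Odoo stores datetimes as naive UTC).

```python
from datetime import datetime, time, timedelta
from typing import Iterable


def next_slot_run(now: datetime, slot_times: Iterable[time]) -> datetime:
    """Earliest upcoming firing time across daily slots (same timezone as now)."""
    runs = []
    for slot in slot_times:
        run = now.replace(hour=slot.hour, minute=slot.minute, second=0, microsecond=0)
        if run <= now:
            # Today's firing already happened (or is happening now): next one is tomorrow.
            run += timedelta(days=1)
        runs.append(run)
    if not runs:
        raise ValueError("no schedule slots configured")
    return min(runs)
```
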
**Email & Social** (2 tests):
```python
def test_email_contains_post_title_and_social_copy(page):
    # Generate and publish post
    # Check generated email in outbox
    # Verify subject contains blog name + post title
    # Verify body contains social platforms (X, BlueSky, Mastodon, LinkedIn)
    ...


def test_user_edits_social_copy_before_publishing(page):
    # Generate as draft
    # Edit social media copy for each platform
    # Save and publish
    # Verify email uses edited copy
    ...
```
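
The email checks above can be folded into one helper that reports every mismatch at once instead of stopping at the first failed assertion. This is a sketch under assumptions: the helper name is hypothetical, the subject is assumed to contain the blog name and post title verbatim, and each platform is assumed to appear by name in the body. In Odoo, outgoing messages are `mail.mail` records, so their `subject` and `body_html` values are natural inputs.

```python
import re
from typing import Iterable, List

PLATFORMS = ("X", "BlueSky", "Mastodon", "LinkedIn")


def email_problems(subject: str, body: str, blog_name: str, post_title: str,
                   platforms: Iterable[str] = PLATFORMS) -> List[str]:
    """Return every mismatch between the notification and the expectations."""
    problems = []
    for label, needle in (("blog name", blog_name), ("post title", post_title)):
        if needle not in subject:
            problems.append(f"subject missing {label} {needle!r}")
    for platform in platforms:
        # Whole-word match so "X" is not satisfied by e.g. "Xenon".
        if not re.search(rf"\b{re.escape(platform)}\b", body):
            problems.append(f"body missing {platform} copy")
    return problems
```

A test can then assert `email_problems(...) == []` and get the full list of gaps in the failure message.
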
**Error Recovery** (2 tests):
```python
def test_user_retries_failed_generation(page):
    # Trigger generation with bad API key
    # Log shows error state
    # Fix API key in Settings
    # Click Retry on log
    # Verify post created successfully
    ...


def test_schedule_slot_continues_after_api_error(page):
    # Set invalid API key on schedule slot
    # Slot executes, fails, logs error
    # Fix API key
    # Wait for next slot time
    # Verify next generation succeeds
    ...
```
**Concurrency** (1-2 tests):
```python
def test_multiple_users_generate_posts_concurrently(page):
    # User1 generates on-demand
    # User2 generates on-demand simultaneously
    # Both posts created successfully
    # No database locks or conflicts
    ...
```
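
One way to drive the concurrency scenario is to start N generations at the same moment and collect every outcome, so a lock or serialization error in one worker is recorded without hiding the results of the others. In this sketch `generate(i)` is a hypothetical callable (for example, one that logs in as user `i` and triggers the wizard); if it uses Playwright, each thread must create its own `sync_playwright()` instance, because the sync API is not thread-safe.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def run_concurrently(generate: Callable[[int], T], users: int = 5) -> Tuple[List[T], List[BaseException]]:
    """Start `users` generations at once; split outcomes into results and errors."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(generate, i) for i in range(users)]
    # Leaving the with-block waits for every future to finish.
    results: List[T] = []
    errors: List[BaseException] = []
    for future in futures:
        exc = future.exception()
        if exc is None:
            results.append(future.result())
        else:
            errors.append(exc)
    return results, errors
```

The test then asserts `errors == []` and that `len(results)` matches the number of users, which directly encodes "both posts created successfully".
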
### Layer 3: Performance Benchmarks
**Latency Profiling**:
```python
import time


def test_generation_latency_p50_under_30s(page):
    """Measure time from "Generate Now" click to blog.post created."""
    start = time.perf_counter()  # monotonic clock, unaffected by wall-clock changes
    # ... navigate and generate ...
    elapsed = time.perf_counter() - start
    assert elapsed < 30, f"Generation took {elapsed:.1f}s, target <30s"
    # Record metric: one run is a single sample; P50 needs repeated runs
```
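
Since the latency test above times a single run, the P50/P95/P99 figures from the benchmark goals need repeated samples. A minimal nearest-rank percentile sketch (the function name is hypothetical) that turns a list of `elapsed` values into those three numbers:

```python
from typing import Dict, Sequence


def latency_percentiles(samples: Sequence[float]) -> Dict[str, float]:
    """Nearest-rank P50/P95/P99 of repeated latency samples (seconds)."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    n = len(ordered)
    result = {}
    for p in (50, 95, 99):
        # Nearest rank = ceil(p/100 * n), computed in integers to avoid float error.
        rank = max(1, (p * n + 99) // 100)
        result[f"p{p}"] = ordered[rank - 1]
    return result
```

With fewer than 100 samples the nearest-rank P99 is simply the slowest run, so collect enough iterations before treating P99 as meaningful.
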
**Query Count Assertion**:
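```python
# Server-side Odoo test, not E2E: it runs under the Odoo test runner.
from odoo.tests import TransactionCase, tagged


@tagged("post_install", "-at_install")
class TestGenerationQueryCount(TransactionCase):
    def test_generation_uses_at_most_50_queries(self):
        """Verify no N+1 query patterns."""
        # Hypothetical model name: use the addon's actual schedule slot model.
        schedule = self.env["blog.schedule.slot"].search([], limit=1)
        # In practice, mock the LLM call so the count reflects ORM work only.
        # assertQueryCount fails if the block runs more than 50 queries.
        with self.assertQueryCount(50):
            schedule.run_generation()
```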
**Stress Test** (not Playwright, server-side):
```python
def test_concurrent_schedule_slots_under_load():
    """3 slots × 5 iterations = 15 posts in rapid succession."""
    # Trigger all 3 schedule slots
    # Measure: peak memory, query count, token usage
    # Assert: all posts created, no failures
    ...
```
## Runboat Integration
### What is Runboat?
Runboat (by Acsone) provides:
- **Auto-deployed preview instances** of Odoo per CI commit
- **Live URL** for E2E testing (no local bootstrapping needed)
- **Fresh template DB** with addon pre-installed
- **5-minute auto-cleanup** after test run
### CI/CD Integration
```yaml
# .gitlab-ci.yml
stages: [lint, test, build, preview, e2e]

# ... existing lint + test stages ...

runboat_preview:
  stage: preview
  # The job needs both curl and jq (used below to parse the Runboat response)
  image: alpine:latest
  before_script:
    - apk add --no-cache curl jq
  script:
    - |
      RUNBOAT_URL="${RUNBOAT_API_URL}/builds"
      RESP=$(curl -fsSL -X POST "$RUNBOAT_URL" \
        -H "Authorization: Bearer $RUNBOAT_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
          \"repo\": \"$CI_PROJECT_PATH\",
          \"sha\": \"$CI_COMMIT_SHA\",
          \"target_branch\": \"$CI_MERGE_REQUEST_TARGET_BRANCH_NAME\"
        }")
      BUILD_URL=$(echo "$RESP" | jq -r '.url')
      echo "BUILD_URL=$BUILD_URL" >> build.env
      # Post comment to MR
      curl -fsS -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes" \
        -H "PRIVATE-TOKEN: $GITLAB_BOT_TOKEN" \
        --data-urlencode "body=🚀 [Preview](${BUILD_URL}/odoo) ready for testing"
  artifacts:
    reports:
      dotenv: build.env
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

e2e_tests:
  stage: e2e
  # Pin a tag matching the playwright version in e2e/requirements.txt
  image: mcr.microsoft.com/playwright/python:latest
  needs:
    - runboat_preview
  script:
    - pip install -r e2e/requirements.txt
    # --base-url, --tracing and --output are pytest-playwright options;
    # --output sends traces to the directory collected below
    - pytest e2e/ --base-url="$BUILD_URL" -v --tracing=retain-on-failure --output=e2e/traces
  artifacts:
    when: on_failure
    paths:
      - e2e/traces/
    expire_in: 1 week
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
```
## Success Criteria
**Phase 3 Complete when:**
- [ ] 10-20 E2E scenarios passing (Runboat)
- [ ] Performance baseline established (latency, tokens, queries)
- [ ] Concurrent generation verified (5+ simultaneous posts)
- [ ] All E2E tests green on merge requests
- [ ] Runboat integration in CI/CD
- [ ] Performance metrics documented in README
- [ ] E2E flakiness below a 2% failure rate
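
The 2% flakiness criterion needs a definition to be checkable. One possible reading, sketched below with a hypothetical helper, is failures divided by total runs across repeated executions (for example three runs per test, via pytest-repeat's `--count=3` or separate `pytest --junitxml=...` runs), with a test counted as flaky when it both passed and failed on the same commit.

```python
from typing import Dict, List, Tuple


def flakiness_report(runs: Dict[str, List[bool]]) -> Tuple[float, List[str]]:
    """runs maps a test id to the pass (True) / fail (False) result of each repeat."""
    total = sum(len(outcomes) for outcomes in runs.values())
    failures = sum(outcomes.count(False) for outcomes in runs.values())
    failure_rate = failures / total if total else 0.0
    # Flaky = the same test both passed and failed on unchanged code.
    flaky = sorted(name for name, outcomes in runs.items()
                   if True in outcomes and False in outcomes)
    return failure_rate, flaky
```

A test that fails on every repeat raises the failure rate but is reported as broken rather than flaky.
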
## Performance SLO Targets
| Metric | Target | Rationale |
|---|---|---|
| Generation latency (P50) | < 30 seconds | User experience (wizard response time) |
| Generation latency (P99) | < 60 seconds | Outlier tolerance |
| Tokens per post | 800-1200 | Cost baseline for budget planning |
| Queries per generation | < 50 | N+1 detection and DB load |
| Concurrent posts | 5+ | Peak capacity without degradation |
| Email send latency | < 5 seconds | Notification responsiveness |
| Template DB prime time | < 60 seconds | CI/CD pipeline efficiency |
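
To keep the table machine-checkable, the targets can be encoded once and compared against recorded measurements. The sketch below mirrors the table's thresholds; the metric key names are hypothetical, the "< N" targets are treated as strict bounds, and metrics that were not measured are skipped rather than reported.

```python
from typing import Dict, List

# Upper bounds from the SLO table (all strict "<" targets).
SLO_UPPER = {
    "latency_p50_s": 30.0,
    "latency_p99_s": 60.0,
    "queries_per_generation": 50,
    "email_send_s": 5.0,
    "template_prime_s": 60.0,
}
TOKENS_PER_POST = (800, 1200)  # inclusive range
MIN_CONCURRENT_POSTS = 5


def slo_violations(measured: Dict[str, float]) -> List[str]:
    """List every SLO that a measured metric misses; absent metrics are skipped."""
    violations = []
    for key, limit in SLO_UPPER.items():
        if key in measured and not measured[key] < limit:
            violations.append(f"{key}={measured[key]} (target < {limit})")
    tokens = measured.get("tokens_per_post")
    low, high = TOKENS_PER_POST
    if tokens is not None and not low <= tokens <= high:
        violations.append(f"tokens_per_post={tokens} (target {low}-{high})")
    concurrent = measured.get("concurrent_posts")
    if concurrent is not None and concurrent < MIN_CONCURRENT_POSTS:
        violations.append(f"concurrent_posts={concurrent} (target {MIN_CONCURRENT_POSTS}+)")
    return violations
```
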
## Implementation Timeline
| Week | Task | Owner |
|---|---|---|
| W1 | Set up e2e/ directory, conftest.py, Runboat polling | Claude |
| W1 | Implement 3-5 core E2E scenarios (generation, scheduling) | Claude |
| W2 | Add error recovery and email scenarios | Claude |
| W2 | Set up performance measurement (latency, queries) | Claude |
| W3 | Stress testing and concurrency verification | Claude |
| W3 | Performance tuning if SLOs not met | Claude |
| W4 | Runboat CI/CD integration | Claude |
| W4 | Final verification and documentation | Claude |
## Known Constraints
### Runboat Limitations
- **Cold start**: First request may take 30-60s (instance startup)
- **Auto-cleanup**: Instance removed 5 min after last request
- **No persistent storage**: Data lost when instance cleaned up
- **Resource limits**: CPU/memory capped per deployment tier
### E2E Test Maintenance
- **Brittle selectors**: Avoid `.o_field_value` (auto-generated)
- **Timing issues**: Use `page.wait_for_*()` not `time.sleep()`
- **Flakiness**: Run 3× locally before merging
- **Timeout**: Set 30s for slow JS rendering
## References
- [Runboat Documentation](https://docs.acsone.eu/runboat/)
- [Playwright Python API](https://playwright.dev/python/)
- [Odoo E2E Best Practices](https://github.com/OCA/server-tools/tree/17.0#e2e-testing)
---
**Next**: Set up e2e/ directory and implement core scenarios