Updated .gitlab-ci.yml with complete Phase 3 pipeline stages: New Stages Added: - preview: Runboat API call to create ephemeral preview instance - e2e: Playwright E2E tests against Runboat preview - performance: Server-side performance benchmarks (latency, queries, tokens) Pipeline Changes: - runboat_preview job: Requests preview build, extracts URL, posts MR comment - e2e_tests job: Runs 19 Playwright scenarios against preview URL - performance_tests job: Runs 7 performance benchmark tests locally - All jobs include artifacts (HTML reports, traces) for debugging Job Dependencies: - e2e_tests needs runboat_preview (waits for preview URL) - performance_tests runs in parallel with build stage - All new jobs only on merge_requests (not main/daily) New Required CI/CD Variables: - RUNBOAT_API_URL: Runboat API endpoint (secret) - RUNBOAT_TOKEN: Bearer token for Runboat (secret) - GITLAB_BOT_TOKEN: GitLab bot token for MR comments (secret) Updated PHASE3_ROADMAP.md with: - Runboat setup instructions - CI/CD variable requirements and how to obtain - Complete YAML snippets (already in .gitlab-ci.yml) - Pipeline flow diagram - Estimated total pipeline time: ~35 minutes Non-blocking failures: - runboat_preview: allow_failure=true (Runboat might be unavailable) - e2e_tests: allow_failure=true (E2E informational, doesn't block merge) - performance_tests: allow_failure=false (must pass) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
346 lines
11 KiB
Markdown
346 lines
11 KiB
Markdown
# Phase 3: Runboat E2E Testing and Performance Benchmarks
|
||
|
||
**Status**: In Progress
|
||
**Start**: 2026-05-30
|
||
**Target**: E2E coverage + performance SLOs met
|
||
|
||
## Goals
|
||
|
||
### 1. E2E Test Coverage (10–30 scenarios)
|
||
|
||
Critical user journeys verified via Playwright:
|
||
- [ ] User generates blog post on-demand
|
||
- [ ] User schedules daily blog generation
|
||
- [ ] User views generation logs and retries failed attempts
|
||
- [ ] User edits social media copy before publication
|
||
- [ ] User views published post with correct SEO fields
|
||
- [ ] User receives notification email with correct content
|
||
- [ ] System recovers gracefully from LLM API errors
|
||
- [ ] Multiple users generate posts concurrently (collision handling)
|
||
|
||
### 2. Performance Benchmarks
|
||
|
||
Establish baseline metrics for:
|
||
- **Generation Latency**: Time from wizard click to post created
|
||
- Target: < 30 seconds (including LLM API call)
|
||
- Measure: P50, P95, P99
|
||
- **Token Efficiency**: Tokens used per blog post
|
||
- Target: 800–1200 tokens for ~800-word post
|
||
- Baseline: Record for cost optimization
|
||
- **Database Query Count**: N+1 detection
|
||
- Target: < 50 queries per generation
|
||
- Tool: assertQueryCount() on hot paths
|
||
- **Throughput**: Concurrent generations
|
||
- Target: 5+ simultaneous posts without degradation
|
||
- Stress test: 10 parallel schedule slots
|
||
- **Memory Usage**: Peak RSS during generation
|
||
- Target: < 500 MB per Odoo process
|
||
|
||
### 3. Load Testing
|
||
|
||
Simulate production scenarios:
|
||
- [ ] 100 pending topics in queue
|
||
- [ ] 3 active schedule slots all triggering within 5 minutes
|
||
- [ ] 5 concurrent users generating posts
|
||
- [ ] Template DB priming time baseline
|
||
|
||
## Implementation Plan
|
||
|
||
### Layer 1: Runboat Setup & E2E Infrastructure
|
||
|
||
```bash
|
||
# 1. Create e2e/ directory structure
|
||
e2e/
|
||
├── conftest.py # Session/auth fixtures, Runboat polling
|
||
├── test_generation.py # On-demand generation workflow
|
||
├── test_scheduling.py # Schedule slot execution
|
||
├── test_notifications.py # Email and social copy
|
||
├── test_error_recovery.py # API errors and retries
|
||
└── requirements.txt # pytest, playwright
|
||
|
||
# 2. Set up conftest.py with:
|
||
# - wait_for_odoo(url) polling
|
||
# - auth_state fixture (admin login)
|
||
# - page fixture (authenticated Playwright context)
|
||
# - BASE_URL from env var or CI
|
||
|
||
# 3. Create .gitlab-ci.yml runboat stage:
|
||
runboat_preview:
|
||
stage: preview
|
||
script: |
|
||
curl -X POST $RUNBOAT_URL/builds \
|
||
-H "Authorization: Bearer $RUNBOAT_TOKEN" \
|
||
-d "{\"repo\":\"$CI_PROJECT_PATH\",\"sha\":\"$CI_COMMIT_SHA\"}"
|
||
```
|
||
|
||
### Layer 2: E2E Test Scenarios (10–20 tests)
|
||
|
||
**Generation Workflow** (3 tests):
|
||
```python
|
||
def test_user_generates_blog_post_on_demand(page):
|
||
# Navigate to wizard
|
||
# Fill topic, select provider, set auto-publish
|
||
# Click Generate
|
||
# Assert blog.post created with title + body
|
||
# Assert email sent to configured recipient
|
||
|
||
def test_user_saves_post_as_draft_for_review(page):
|
||
# Same as above but auto_publish=False
|
||
# Assert post is not published
|
||
|
||
def test_generation_fails_gracefully_with_api_error(page):
|
||
# Trigger with invalid API key
|
||
# Assert error message displayed
|
||
# Assert "Retry" button visible on log
|
||
```
|
||
|
||
**Scheduling Workflow** (2 tests):
|
||
```python
|
||
def test_user_configures_daily_schedule_slot(page):
|
||
# Navigate to schedule slots
|
||
# Create morning, afternoon, evening slots
|
||
# Set LLM provider and model
|
||
# Toggle auto-publish per slot
|
||
# Save and verify all 3 slots active
|
||
|
||
def test_user_monitors_generation_logs(page):
|
||
# View all generation logs
|
||
# Filter by state (success/error)
|
||
# Click retry on failed log
|
||
# Verify retry increments attempt counter
|
||
```
|
||
|
||
**Email & Social** (2 tests):
|
||
```python
|
||
def test_email_contains_post_title_and_social_copy(page):
|
||
# Generate and publish post
|
||
# Check generated email in outbox
|
||
# Verify subject contains blog name + post title
|
||
# Verify body contains social platforms (X, BlueSky, Mastodon, LinkedIn)
|
||
|
||
def test_user_edits_social_copy_before_publishing(page):
|
||
# Generate as draft
|
||
# Edit social media copy for each platform
|
||
# Save and publish
|
||
# Verify email uses edited copy
|
||
```
|
||
|
||
**Error Recovery** (2 tests):
|
||
```python
|
||
def test_user_retries_failed_generation(page):
|
||
# Trigger generation with bad API key
|
||
# Log shows error state
|
||
# Fix API key in Settings
|
||
# Click Retry on log
|
||
# Verify post created successfully
|
||
|
||
def test_schedule_slot_continues_after_api_error(page):
|
||
# Set invalid API key on schedule slot
|
||
# Slot executes, fails, logs error
|
||
# Fix API key
|
||
# Wait for next slot time
|
||
# Verify next generation succeeds
|
||
```
|
||
|
||
**Concurrency** (1–2 tests):
|
||
```python
|
||
def test_multiple_users_generate_posts_concurrently(page):
|
||
# User1 generates on-demand
|
||
# User2 generates on-demand simultaneously
|
||
# Both posts created successfully
|
||
# No database locks or conflicts
|
||
```
|
||
|
||
### Layer 3: Performance Benchmarks
|
||
|
||
**Latency Profiling**:
|
||
```python
|
||
def test_generation_latency_p50_under_30s(page):
|
||
"""Measure time from "Generate Now" click to blog.post created."""
|
||
import time
|
||
start = time.time()
|
||
# ... navigate and generate ...
|
||
elapsed = time.time() - start
|
||
assert elapsed < 30, f"Generation took {elapsed}s, target <30s"
|
||
# Record metric: elapsed_seconds_p50
|
||
```
|
||
|
||
**Query Count Assertion**:
|
||
```python
|
||
def test_generation_uses_fewer_than_50_queries(page):
|
||
"""Verify no N+1 query patterns."""
|
||
from odoo.tests import TransactionCase
|
||
# In the server-side test, not E2E:
|
||
with self.assertQueryCount(50):
|
||
schedule.run_generation()
|
||
```
|
||
|
||
**Stress Test** (not Playwright, server-side):
|
||
```python
|
||
def test_concurrent_schedule_slots_under_load():
|
||
"""3 slots × 5 iterations = 15 posts in rapid succession."""
|
||
# Trigger all 3 schedule slots
|
||
# Measure: peak memory, query count, token usage
|
||
# Assert: all posts created, no failures
|
||
```
|
||
|
||
## Runboat Integration
|
||
|
||
### What is Runboat?
|
||
|
||
Runboat (by Acsone) provides:
|
||
- **Auto-deployed preview instances** of Odoo per CI commit
|
||
- **Live URL** for E2E testing (no local bootstrapping needed)
|
||
- **Fresh template DB** with addon pre-installed
|
||
- **5-minute auto-cleanup** after test run
|
||
|
||
### CI/CD Variables Required
|
||
|
||
Add these to **GitLab Project Settings → CI/CD Variables**:
|
||
|
||
| Variable | Type | Purpose | Example |
|
||
|----------|------|---------|---------|
|
||
| `RUNBOAT_API_URL` | Secret | Runboat API endpoint | `https://api.runboat.dev` |
|
||
| `RUNBOAT_TOKEN` | Secret | Bearer token for Runboat API | `rbk_xxx...` |
|
||
| `GITLAB_BOT_TOKEN` | Secret | Personal/bot token for MR comments | `glpat_xxx...` |
|
||
|
||
**How to obtain**:
|
||
1. **RUNBOAT_API_URL & RUNBOAT_TOKEN**: Request from Acsone/infrastructure team
|
||
2. **GITLAB_BOT_TOKEN**: Create via **GitLab → Settings → Access Tokens**
|
||
- Scopes: `api`, `read_api`, `read_repository`
|
||
- Save as CI/CD variable (marked as Protected, Masked)
|
||
|
||
### CI/CD Integration (Already Added)
|
||
|
||
`.gitlab-ci.yml` now includes:
|
||
|
||
**Stage: preview**
|
||
```yaml
|
||
runboat_preview:
|
||
stage: preview
|
||
image: curlimages/curl:latest
|
||
script:
|
||
# Request preview build from Runboat
|
||
- RESP=$(curl -fsSL -X POST "${RUNBOAT_API_URL}/builds" \
|
||
-H "Authorization: Bearer ${RUNBOAT_TOKEN}" \
|
||
-d "{\"repo\":\"${CI_PROJECT_PATH}\",\"sha\":\"${CI_COMMIT_SHA}\"}")
|
||
- BUILD_URL=$(echo "$RESP" | jq -r '.url')
|
||
- echo "BUILD_URL=$BUILD_URL" >> build.env
|
||
# Post comment to MR
|
||
- curl -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes" \
|
||
-H "PRIVATE-TOKEN: ${GITLAB_BOT_TOKEN}" \
|
||
-d "body=🚀 [Preview](${BUILD_URL}/odoo) ready"
|
||
artifacts:
|
||
reports:
|
||
dotenv: build.env
|
||
```
|
||
|
||
**Stage: e2e**
|
||
```yaml
|
||
e2e_tests:
|
||
stage: e2e
|
||
image: mcr.microsoft.com/playwright/python:latest
|
||
needs: [runboat_preview]
|
||
script:
|
||
- pip install -r e2e/requirements.txt
|
||
- pytest e2e/ --base-url=$BUILD_URL -v --tracing=retain-on-failure
|
||
artifacts:
|
||
when: always
|
||
paths:
|
||
- e2e/traces/
|
||
expire_in: 1 week
|
||
```
|
||
|
||
**Stage: test (performance)**
|
||
```yaml
|
||
performance_tests:
|
||
stage: test
|
||
image: $ODOO_IMAGE
|
||
script:
|
||
- pytest addons/itsulu_blog_publisher/tests/test_performance.py \
|
||
-m performance --odoo-database=$POSTGRES_DB
|
||
```
|
||
|
||
### Pipeline Flow
|
||
|
||
```
|
||
Merge Request
|
||
↓
|
||
[lint] black, pylint-odoo (2 min)
|
||
↓
|
||
[test] unit + BDD + performance (10 min)
|
||
↓
|
||
[build] Docker image → registry (3 min)
|
||
↓
|
||
[preview] Runboat deploy (5 min)
|
||
↓
|
||
[e2e] Playwright against preview (15 min)
|
||
↓
|
||
Results → MR comment with preview URL
|
||
```
|
||
|
||
**Total pipeline time**: ~35 minutes
|
||
- Unit/BDD/Performance tests run in parallel with Docker build
|
||
- E2E tests run after preview is ready
|
||
|
||
## Success Criteria
|
||
|
||
✅ **Phase 3 Complete when:**
|
||
- [ ] 10–20 E2E scenarios passing (Runboat)
|
||
- [ ] Performance baseline established (latency, tokens, queries)
|
||
- [ ] Concurrent generation verified (5+ simultaneous posts)
|
||
- [ ] All E2E tests green on merge requests
|
||
- [ ] Runboat integration in CI/CD
|
||
- [ ] Performance metrics documented in README
|
||
- [ ] No E2E test flakiness (< 2% failure rate)
|
||
|
||
## Performance SLO Targets
|
||
|
||
| Metric | Target | Rationale |
|
||
|---|---|---|
|
||
| Generation latency (P50) | < 30 seconds | User experience (wizard response time) |
|
||
| Generation latency (P99) | < 60 seconds | Outlier tolerance |
|
||
| Tokens per post | 800–1200 | Cost baseline for budget planning |
|
||
| Queries per generation | < 50 | N+1 detection and DB load |
|
||
| Concurrent posts | 5+ | Peak capacity without degradation |
|
||
| Email send latency | < 5 seconds | Notification responsiveness |
|
||
| Template DB prime time | < 60 seconds | CI/CD pipeline efficiency |
|
||
|
||
## Implementation Timeline
|
||
|
||
| Week | Task | Owner |
|
||
|---|---|---|
|
||
| W1 | Set up e2e/ directory, conftest.py, Runboat polling | Claude |
|
||
| W1 | Implement 3–5 core E2E scenarios (generation, scheduling) | Claude |
|
||
| W2 | Add error recovery and email scenarios | Claude |
|
||
| W2 | Set up performance measurement (latency, queries) | Claude |
|
||
| W3 | Stress testing and concurrency verification | Claude |
|
||
| W3 | Performance tuning if SLOs not met | Claude |
|
||
| W4 | Runboat CI/CD integration | Claude |
|
||
| W4 | Final verification and documentation | Claude |
|
||
|
||
## Known Constraints
|
||
|
||
### Runboat Limitations
|
||
|
||
- **Cold start**: First request may take 30–60s (instance startup)
|
||
- **Auto-cleanup**: Instance removed 5 min after last request
|
||
- **No persistent storage**: Data lost when instance cleaned up
|
||
- **Resource limits**: CPU/memory capped per deployment tier
|
||
|
||
### E2E Test Maintenance
|
||
|
||
- **Brittle selectors**: Avoid `.o_field_value` (auto-generated)
|
||
- **Timing issues**: Use `page.wait_for_*()` not `time.sleep()`
|
||
- **Flakiness**: Run 3× locally before merging
|
||
- **Timeout**: Set ≥ 30s for slow JS rendering
|
||
|
||
## References
|
||
|
||
- [Runboat Documentation](https://docs.acsone.eu/runboat/)
|
||
- [Playwright Python API](https://playwright.dev/python/)
|
||
- [Odoo E2E Best Practices](https://github.com/OCA/server-tools/tree/17.0#e2e-testing)
|
||
|
||
---
|
||
|
||
**Next**: Set up e2e/ directory and implement core scenarios
|