# Phase 3: Runboat E2E Testing and Performance Benchmarks

**Status**: In Progress  
**Start**: 2026-05-30  
**Target**: E2E coverage + performance SLOs met

## Goals

### 1. E2E Test Coverage (10–20 scenarios)

Critical user journeys verified via Playwright:
- [ ] User generates blog post on-demand
- [ ] User schedules daily blog generation
- [ ] User views generation logs and retries failed attempts
- [ ] User edits social media copy before publication
- [ ] User views published post with correct SEO fields
- [ ] User receives notification email with correct content
- [ ] System recovers gracefully from LLM API errors
- [ ] Multiple users generate posts concurrently (collision handling)

### 2. Performance Benchmarks

Establish baseline metrics for:
- **Generation Latency**: Time from the wizard's Generate click until the `blog.post` record is created
  - Target: P50 < 30 seconds, P99 < 60 seconds (including LLM API call)
  - Measure: P50, P95, P99
- **Token Efficiency**: Tokens used per blog post
  - Target: 800–1200 tokens for ~800-word post
  - Baseline: Record for cost optimization
- **Database Query Count**: N+1 detection
  - Target: < 50 queries per generation
  - Tool: Odoo's `assertQueryCount()` on hot paths
- **Throughput**: Concurrent generations
  - Target: 5+ simultaneous posts without degradation
  - Stress test: 10 parallel schedule slots
- **Memory Usage**: Peak RSS during generation
  - Target: < 500 MB per Odoo process
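
On Linux, the peak RSS of a running process can be read from the `VmHWM` line of `/proc/<pid>/status`. The sketch below shows one possible way to collect that number for an Odoo worker. The function names are illustrative, locating the worker's PID is left out, and the kernel's `kB` unit is treated as 1024 bytes, so the result is in MiB.

```python
from pathlib import Path


def parse_vm_hwm_mib(status_text):
    """Return the VmHWM value (peak resident set size) in MiB, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            # Line format: "VmHWM:    123456 kB"
            return int(line.split()[1]) / 1024
    return None


def peak_rss_mib(pid):
    """Read the peak RSS of a running process from /proc (Linux only)."""
    return parse_vm_hwm_mib(Path(f"/proc/{pid}/status").read_text())
```

Sampling `peak_rss_mib(pid)` after a generation run and comparing it with the 500 MB target gives the memory baseline.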

### 3. Load Testing

Simulate production scenarios:
- [ ] 100 pending topics in queue
- [ ] 3 active schedule slots all triggering within 5 minutes
- [ ] 5 concurrent users generating posts
- [ ] Template DB priming time baseline

## Implementation Plan

### Layer 1: Runboat Setup & E2E Infrastructure

```bash
# 1. Create e2e/ directory structure
e2e/
├── conftest.py              # Session/auth fixtures, Runboat polling
├── test_generation.py       # On-demand generation workflow
├── test_scheduling.py       # Schedule slot execution
├── test_notifications.py    # Email and social copy
├── test_error_recovery.py   # API errors and retries
└── requirements.txt         # pytest, pytest-playwright (provides the page fixture)

# 2. Set up conftest.py with:
# - wait_for_odoo(url) polling
# - auth_state fixture (admin login)
# - page fixture (authenticated Playwright context)
# - BASE_URL from env var or CI

# 3. Create .gitlab-ci.yml runboat stage:
runboat_preview:
  stage: preview
  script: |
    curl -X POST $RUNBOAT_URL/builds \
      -H "Authorization: Bearer $RUNBOAT_TOKEN" \
      -d "{\"repo\":\"$CI_PROJECT_PATH\",\"sha\":\"$CI_COMMIT_SHA\"}"
```
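
The `wait_for_odoo(url)` polling listed for `conftest.py` could look roughly like the sketch below. It assumes the instance answers HTTP 200 on `/web/login` once it is ready. The `path`, `timeout`, and `interval` parameters and their defaults are illustrative choices, not part of the plan above.

```python
import time
import urllib.error
import urllib.request


def wait_for_odoo(url, path="/web/login", timeout=120.0, interval=2.0):
    """Poll url + path until it answers HTTP 200 or the timeout expires.

    Returns the number of seconds waited; raises TimeoutError otherwise.
    """
    target = url.rstrip("/") + path
    start = time.monotonic()
    while True:
        try:
            with urllib.request.urlopen(target, timeout=10) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except (urllib.error.URLError, OSError):
            pass  # instance still starting (connection refused, HTTP 5xx, ...)
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"{target} not ready after {timeout:.0f}s")
        time.sleep(interval)
```

A session-scoped fixture could call `wait_for_odoo(BASE_URL)` once before any test runs, so individual tests never race against instance startup.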

### Layer 2: E2E Test Scenarios (10–20 tests)

**Generation Workflow** (3 tests):
```python
import pytest


def test_user_generates_blog_post_on_demand(page):
    # Navigate to wizard
    # Fill topic, select provider, set auto-publish
    # Click Generate
    # Assert blog.post created with title + body
    # Assert email sent to configured recipient
    pytest.skip("scenario outline, not implemented yet")

def test_user_saves_post_as_draft_for_review(page):
    # Same as above but auto_publish=False
    # Assert post is not published
    pytest.skip("scenario outline, not implemented yet")

def test_generation_fails_gracefully_with_api_error(page):
    # Trigger with invalid API key
    # Assert error message displayed
    # Assert "Retry" button visible on log
    pytest.skip("scenario outline, not implemented yet")
```

**Scheduling Workflow** (2 tests):
```python
import pytest


def test_user_configures_daily_schedule_slot(page):
    # Navigate to schedule slots
    # Create morning, afternoon, evening slots
    # Set LLM provider and model
    # Toggle auto-publish per slot
    # Save and verify all 3 slots active
    pytest.skip("scenario outline, not implemented yet")

def test_user_monitors_generation_logs(page):
    # View all generation logs
    # Filter by state (success/error)
    # Click retry on failed log
    # Verify retry increments attempt counter
    pytest.skip("scenario outline, not implemented yet")
```

**Email & Social** (2 tests):
```python
import pytest


def test_email_contains_post_title_and_social_copy(page):
    # Generate and publish post
    # Check generated email in outbox
    # Verify subject contains blog name + post title
    # Verify body contains social platforms (X, BlueSky, Mastodon, LinkedIn)
    pytest.skip("scenario outline, not implemented yet")

def test_user_edits_social_copy_before_publishing(page):
    # Generate as draft
    # Edit social media copy for each platform
    # Save and publish
    # Verify email uses edited copy
    pytest.skip("scenario outline, not implemented yet")
```

**Error Recovery** (2 tests):
```python
import pytest


def test_user_retries_failed_generation(page):
    # Trigger generation with bad API key
    # Log shows error state
    # Fix API key in Settings
    # Click Retry on log
    # Verify post created successfully
    pytest.skip("scenario outline, not implemented yet")

def test_schedule_slot_continues_after_api_error(page):
    # Set invalid API key on schedule slot
    # Slot executes, fails, logs error
    # Fix API key
    # Wait for next slot time
    # Verify next generation succeeds
    pytest.skip("scenario outline, not implemented yet")
```

**Concurrency** (1–2 tests):
```python
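import pytest


def test_multiple_users_generate_posts_concurrently(browser):
    # Open one browser context per user so the two sessions stay isolated
    # User1 generates on-demand
    # User2 generates on-demand simultaneously
    # Both posts created successfully
    # No database locks or conflicts
    pytest.skip("scenario outline, not implemented yet")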
```

### Layer 3: Performance Benchmarks

**Latency Profiling**:
```python
import time


def test_generation_latency_p50_under_30s(page):
    """Measure time from "Generate Now" click to blog.post created."""
    # perf_counter() is monotonic, so clock adjustments cannot skew the result
    start = time.perf_counter()
    # ... navigate and generate ...
    elapsed = time.perf_counter() - start
    assert elapsed < 30, f"Generation took {elapsed:.1f}s, target <30s"
    # Record elapsed as one sample; the P50 is computed over many runs
```
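
A single timed run yields one sample, while the targets are stated as percentiles. A minimal sketch of turning a list of recorded latencies into P50/P95/P99 with the standard library follows. The function name and the choice of the `inclusive` interpolation method are assumptions for illustration.

```python
import statistics


def latency_percentiles(samples):
    """Return P50/P95/P99 of a list of latencies in seconds."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points P1..P99
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

With only a handful of runs the P95 and P99 values are interpolated between the slowest samples, so they become meaningful only once enough generations have been recorded.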

**Query Count Assertion**:
```python
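from odoo.tests import TransactionCase


class TestGenerationQueryCount(TransactionCase):
    def test_generation_uses_fewer_than_50_queries(self):
        """Server-side test (not E2E): verify no N+1 query patterns."""
        # Hypothetical helper: build the schedule record under test
        schedule = self._create_schedule()
        # Fails if the block issues more than 50 queries
        with self.assertQueryCount(50):
            schedule.run_generation()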
```

**Stress Test** (not Playwright, server-side):
```python
def test_concurrent_schedule_slots_under_load():
    """3 slots × 5 iterations = 15 posts in rapid succession."""
    # Trigger all 3 schedule slots
    # Measure: peak memory, query count, token usage
    # Assert: all posts created, no failures
```
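
Triggering several generations at once can be sketched with a thread pool. The helper below is a hypothetical harness, not part of the addon: `task` stands for whatever triggers one generation, for example an HTTP call against the running instance. Threads sharing one Odoo cursor inside a `TransactionCase` would not be safe, so the sketch assumes each task opens its own connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(task, count):
    """Run task(i) for i in range(count) in parallel threads.

    Returns (results, errors, elapsed_seconds); errors holds (index, exception).
    """
    results, errors = [], []
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(task, i) for i in range(count)]
        for index, future in enumerate(futures):
            try:
                results.append(future.result())
            except Exception as exc:  # collect failures instead of stopping
                errors.append((index, exc))
    return results, errors, time.perf_counter() - start
```

A stress test could then assert that `errors` is empty and that `len(results)` matches the number of posts expected.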

## Runboat Integration

### What is Runboat?

Runboat (by Acsone) provides:
- **Auto-deployed preview instances** of Odoo per CI commit
- **Live URL** for E2E testing (no local bootstrapping needed)
- **Fresh template DB** with addon pre-installed
- **5-minute auto-cleanup** after test run

### CI/CD Integration

```yaml
# .gitlab-ci.yml

stages: [lint, test, build, preview, e2e]

# ... existing lint + test stages ...

runboat_preview:
  stage: preview
  image: alpine:latest  # curlimages/curl ships without jq, which the script needs
  script:
    - apk add --no-cache curl jq
    - |
      RUNBOAT_URL="${RUNBOAT_API_URL}/builds"
      RESP=$(curl -fsSL -X POST "$RUNBOAT_URL" \
        -H "Authorization: Bearer $RUNBOAT_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
          \"repo\": \"$CI_PROJECT_PATH\",
          \"sha\": \"$CI_COMMIT_SHA\",
          \"target_branch\": \"$CI_MERGE_REQUEST_TARGET_BRANCH_NAME\"
        }")
      BUILD_URL=$(echo "$RESP" | jq -r '.url')
      echo "BUILD_URL=$BUILD_URL" >> build.env
      
      # Post comment to MR
      curl -fsS -X POST "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes" \
        -H "PRIVATE-TOKEN: $GITLAB_BOT_TOKEN" \
        --data-urlencode "body=🚀 [Preview](${BUILD_URL}/odoo) ready for testing"
  artifacts:
    reports:
      dotenv: build.env
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

e2e_tests:
  stage: e2e
  image: mcr.microsoft.com/playwright/python:latest
  needs:
    - runboat_preview
  script:
    - pip install -r e2e/requirements.txt
    - pytest e2e/ --base-url="$BUILD_URL" -v --tracing=retain-on-failure --output=e2e/traces
  artifacts:
    when: on_failure
    paths:
      - e2e/traces/
    expire_in: 1 week
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
```
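
The same build request can be assembled in Python, which avoids hand-escaping JSON inside a shell string. The sketch below only mirrors the endpoint (`/builds`), payload fields, and `url` response field used in the pipeline above. These are taken from this plan and should be checked against the API of the Runboat instance actually in use. The function names are illustrative.

```python
import json
import urllib.request


def build_preview_request(api_url, token, repo, sha, target_branch):
    """Build (but do not send) the POST request that asks for a preview build."""
    payload = {"repo": repo, "sha": sha, "target_branch": target_branch}
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/builds",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_build_url(response_body):
    """Return the `url` field of the JSON response, like `jq -r '.url'`."""
    return json.loads(response_body)["url"]
```

Passing the request to `urllib.request.urlopen()` and the response body to `extract_build_url()` would reproduce the two `curl` and `jq` steps.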

## Success Criteria

✅ **Phase 3 Complete when:**
- [ ] 10–20 E2E scenarios passing (Runboat)
- [ ] Performance baseline established (latency, tokens, queries)
- [ ] Concurrent generation verified (5+ simultaneous posts)
- [ ] All E2E tests green on merge requests
- [ ] Runboat integration in CI/CD
- [ ] Performance metrics documented in README
- [ ] E2E flakiness below 2% (share of runs in which an otherwise passing test fails)

## Performance SLO Targets

| Metric | Target | Rationale |
|---|---|---|
| Generation latency (P50) | < 30 seconds | User experience (wizard response time) |
| Generation latency (P99) | < 60 seconds | Outlier tolerance |
| Tokens per post | 800–1200 | Cost baseline for budget planning |
| Queries per generation | < 50 | N+1 detection and DB load |
| Concurrent posts | 5+ | Peak capacity without degradation |
| Email send latency | < 5 seconds | Notification responsiveness |
| Template DB prime time | < 60 seconds | CI/CD pipeline efficiency |
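
The targets in this table can be encoded as data so a benchmark run checks itself against them. The sketch below is one possible shape. The metric key names and the `slo_violations` function are hypothetical, and only the thresholds come from the table.

```python
SLO_TARGETS = {
    # metric name: (comparison, threshold), thresholds taken from the table
    "latency_p50_s": ("<", 30),
    "latency_p99_s": ("<", 60),
    "queries_per_generation": ("<", 50),
    "concurrent_posts": (">=", 5),
    "email_send_latency_s": ("<", 5),
    "template_db_prime_s": ("<", 60),
}
TOKENS_PER_POST_RANGE = (800, 1200)


def slo_violations(measured):
    """Return a list of human-readable violations for the measured metrics."""
    problems = []
    for name, (op, limit) in SLO_TARGETS.items():
        if name not in measured:
            continue  # metric not measured in this run
        value = measured[name]
        ok = value < limit if op == "<" else value >= limit
        if not ok:
            problems.append(f"{name}={value} violates {op} {limit}")
    low, high = TOKENS_PER_POST_RANGE
    tokens = measured.get("tokens_per_post")
    if tokens is not None and not low <= tokens <= high:
        problems.append(f"tokens_per_post={tokens} outside {low}-{high}")
    return problems
```

An empty list means every measured metric met its target; metrics missing from `measured` are skipped rather than treated as failures.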

## Implementation Timeline

| Week | Task | Owner |
|---|---|---|
| W1 | Set up e2e/ directory, conftest.py, Runboat polling | Claude |
| W1 | Implement 3–5 core E2E scenarios (generation, scheduling) | Claude |
| W2 | Add error recovery and email scenarios | Claude |
| W2 | Set up performance measurement (latency, queries) | Claude |
| W3 | Stress testing and concurrency verification | Claude |
| W3 | Performance tuning if SLOs not met | Claude |
| W4 | Runboat CI/CD integration | Claude |
| W4 | Final verification and documentation | Claude |

## Known Constraints

### Runboat Limitations

- **Cold start**: First request may take 30–60s (instance startup)
- **Auto-cleanup**: Instance removed 5 min after last request
- **No persistent storage**: Data lost when instance cleaned up
- **Resource limits**: CPU/memory capped per deployment tier

### E2E Test Maintenance

- **Brittle selectors**: Avoid `.o_field_value` (auto-generated)
- **Timing issues**: Use `page.wait_for_*()` not `time.sleep()`
- **Flakiness**: Run 3× locally before merging
- **Timeout**: Set ≥ 30s for slow JS rendering

## References

- [Runboat Documentation](https://docs.acsone.eu/runboat/)
- [Playwright Python API](https://playwright.dev/python/)
- [Odoo E2E Best Practices](https://github.com/OCA/server-tools/tree/17.0#e2e-testing)

---

**Next**: Set up e2e/ directory and implement core scenarios