# Tabby E2E Test Plan

> A proposal for end-to-end test coverage of the Tabby Chrome extension, combining **functional assertions** and **visual artifacts** (screenshots + videos).

---

## TL;DR

| Metric | Estimate |
|---|---|
| Total source LOC | **6,362** |
| Covered by plan | **~60–65% lines**, ~50% branches |
| Features exercised | **40 of 49** functionally, **~15 of 49** visually |
| Harness | Playwright + `launchPersistentContext` + unpacked extension (same pattern as existing `scripts/video-demo.mjs`) |

**Recommendation:** start with a golden-path smoke suite (~30% coverage, <2 min), then grow breadth.

---

## 1. Testing strategy

Two independent layers, run side-by-side:

### 1a. Functional tests
Playwright `expect()` assertions against the DOM **plus** `sw.evaluate()` against the service-worker singletons (`__tabbyStorage`, `__tabbyCapture`, `__tabbyBackup`) exposed by `src/background/index.ts`.

- **Binary pass/fail.** Exit code drives CI.
- Assert end-state, not mid-animation frames: spring animations are time-dependent, so assertions on intermediate values are flaky.
- Mock Supabase + Mixpanel network calls; everything else runs against real Chrome APIs.
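The mocking bullet above could be wired up roughly as follows. This is a minimal sketch, not the plan's actual implementation: `installNetworkMocks`, the host patterns, and the empty JSON bodies are all placeholders, and `context` is assumed to be the Playwright `BrowserContext` returned by `launchPersistentContext`.

```js
// Host patterns and response bodies are placeholders, not the
// extension's real endpoints or payloads.
const MOCKED_SERVICES = [
  { name: 'supabase', pattern: /^https:\/\/[^/]+\.supabase\.co\//, body: {} },
  { name: 'mixpanel', pattern: /^https:\/\/[^/]*\.mixpanel\.com\//, body: {} },
];

// Registers one route per mocked service on a Playwright BrowserContext and
// returns an array that fills up with every intercepted request.
async function installNetworkMocks(context) {
  const intercepted = [];
  for (const { name, pattern, body } of MOCKED_SERVICES) {
    await context.route(pattern, async (route) => {
      intercepted.push({ name, url: route.request().url() });
      await route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify(body),
      });
    });
  }
  return intercepted;
}
```

A test can then assert on the returned array, for example that an analytics event was sent at all. Requests issued from the service worker itself may only be routable when Playwright's experimental service-worker network events are enabled (`PW_EXPERIMENTAL_SERVICE_WORKER_NETWORK_EVENTS=1`, depending on the Playwright version), so that is worth verifying before relying on this.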

### 1b. Visual tests
- **PNG per stable state** (layouts, menus, dialogs) — diffable later with pixelmatch.
- **MP4 per animation** (zoom, pinch, DnD, welcome flow) — recorded under Xvfb + ffmpeg, uploaded via `scripts/dbx-upload.sh` for review.

Visual artifacts are **not a replacement** for functional assertions. They complement them as the only way to catch regressions in motion and layout, where the DOM state looks fine but the rendered product does not.
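For the MP4 path, the Xvfb + ffmpeg recording could be driven from Node along these lines. This is a sketch under stated assumptions: the display number, resolution, frame rate, and output file name are illustrative, and the helper names are hypothetical.

```js
// Display number, resolution and frame rate are illustrative defaults.
const capture = { display: ':99', width: 1280, height: 800, fps: 30 };

// Arguments for `Xvfb`: a virtual display with one 24-bit screen.
function xvfbArgs({ display, width, height }) {
  return [display, '-screen', '0', `${width}x${height}x24`];
}

// Arguments for `ffmpeg`: grab the X11 display and encode H.264 into an MP4.
function ffmpegArgs({ display, width, height, fps }, outputPath) {
  return [
    '-y',
    '-f', 'x11grab',
    '-video_size', `${width}x${height}`,
    '-framerate', String(fps),
    '-i', display,
    '-c:v', 'libx264',
    '-pix_fmt', 'yuv420p',
    outputPath,
  ];
}

console.log(['Xvfb', ...xvfbArgs(capture)].join(' '));
console.log(['ffmpeg', ...ffmpegArgs(capture, 'docs/videos/zoom-entrance.mp4')].join(' '));
```

The browser has to be launched with `DISPLAY` set to the same display, and ffmpeg should be stopped with `q` on stdin or SIGINT so the MP4 is finalized properly.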

---

## 2. Coverage by file

| Area | Files (LOC) | Coverage |
|---|---|---|
| App shell | `App.tsx` (73), `main.tsx` (58), `hooks.tsx` (457) | **~75%** |
| Header | `Header.tsx` (438) | ~75% — folder-picker blocked by user-gesture requirement |
| Grid | `TabGrid.tsx` (265), `TabCard.tsx` (229) | **~80%** |
| Stack | `StackView.tsx` (764) | ~70% — pinch cooldown branches hard to hit |
| Zoom entrance/exit | `ZoomEntrance.tsx` (384), `pre-overlay.ts` (33) | ~65% |
| Welcome overlay | `WelcomeOverlay.tsx` (254) | ~80% |
| Refresh popup | `RefreshPopup.tsx` (151) | ~80% |
| Content script | `content/zoom-out.ts` (67) | ~75% |
| Commands & icon | `background/index.ts` (162), `tab-ready.ts` (57) | ~70% |
| Capture/storage | `capture.ts` (981), `storage.ts` (279) | **~55%** — retry/rate-limit branches |
| Messages router | `messages.ts` (424) | ~65% |
| Backup | `backup.ts` (271) | ~50% — leans on existing `test:reinstall` |
| Clustering | `background/clustering.ts` (500) | ~30% — Supabase mocked |
| Analytics | `analytics.ts` (159) | ~30% — Mixpanel mocked |
| Shared types | `types.ts` (147) | 100% (static) |

**Weighted total: ~60–65% LOC, ~50% branch.**
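The weighted total can be reproduced from the table with a few lines of arithmetic. The sketch below takes each row's LOC and estimated coverage as written. The listed rows sum to 6,153 LOC, slightly under the 6,362 quoted in the TL;DR (presumably files not listed in the table), so the result is shown against both denominators.

```js
// [area, LOC, estimated line coverage] copied from the table above.
const areas = [
  ['App shell', 73 + 58 + 457, 0.75],
  ['Header', 438, 0.75],
  ['Grid', 265 + 229, 0.80],
  ['Stack', 764, 0.70],
  ['Zoom entrance/exit', 384 + 33, 0.65],
  ['Welcome overlay', 254, 0.80],
  ['Refresh popup', 151, 0.80],
  ['Content script', 67, 0.75],
  ['Commands & icon', 162 + 57, 0.70],
  ['Capture/storage', 981 + 279, 0.55],
  ['Messages router', 424, 0.65],
  ['Backup', 271, 0.50],
  ['Clustering', 500, 0.30],
  ['Analytics', 159, 0.30],
  ['Shared types', 147, 1.00],
];

const totalLoc = areas.reduce((sum, [, loc]) => sum + loc, 0);
const coveredLoc = areas.reduce((sum, [, loc, cov]) => sum + loc * cov, 0);

// Percentage rounded to one decimal place.
const pct = (covered, total) => Math.round((covered / total) * 1000) / 10;

console.log(totalLoc);                  // 6153
console.log(pct(coveredLoc, totalLoc)); // 64.1
console.log(pct(coveredLoc, 6362));     // 62
```

Both figures fall inside the ~60–65% range quoted above.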

---

## 3. Feature inventory (49 items)

### A. Bootstrap
1. Install / new-tab override loads `tab.html`
2. First-time welcome overlay (with bounce animation)
3. Congratulations + confetti (5 s loop)
4. Toolbar icon click — single-instance / multi-Tabby policy

### B. Header controls
5. Search input + clear button
6. Search shortcuts: `/`, `Cmd/Ctrl+F`, `Escape`
7. Layout toggle: grid ↔ stack
8. Overview toggle (spring scale-down)
9. Hamburger menu: open, outside-click close
10. Alt/Option-click reveals Debug submenu
11. About dialog

### C. Grid layout
12. Window sections with tab count + current marker
13. TabCard rendering (thumbnail, favicon, title, placeholder)
14. 3D tilt on hover
15. Close button (hover-reveal X)
16. Click → zoom-exit → activate
17. Drag-and-drop reorder / cross-window move
18. Cluster groups rendering
19. Ungroup all tabs

### D. Stack layout
20. 3D fanned stack geometry
21. Focused mode (viewport-scaled center card)
22. Hover → focus transition
23. Mouse-past-slot hand-off to next card
24. Pinch zoom-out (focused → non-focused)
25. Pinch zoom-in (non-focused → focused)
26. Pinch zoom-in while focused → open tab
27. 500 ms pinch cooldown
28. Horizontal wheel → horizontal scroll
29. Close window button
30. Close tab from card
31. DnD reorder (horizontal strategy)

### E. Zoom entrance
32. Pre-overlay placeholder before React mount
33. Zoom-entrance animation on NTP load
34. Zoom-exit animation on card click

### F. Refresh-all-screenshots
35. Launches 585×501 popup
36. Progress UI (catwalk video + bar)
37. Stop button halts loop
38. Completion returns focus to original window/tab

### G. Content-script zoom-out
39. Pinch-out on any page → opens Tabby (1500 ms cooldown)

### H. Keyboard commands
40. `Ctrl/Cmd+Shift+X` — toggle expose mode
41. `Ctrl/Cmd+Shift+A` — expose all tabs

### I. Capture + storage
42. Per-tab capture on activation / update
43. Restricted URLs skipped (`chrome://`, extension, webstore)
44. Thumbnail re-association after browser restart
45. Thumbnail healing after cross-window DnD

### J. Debug / backup
46. `getDebugSettings` / `setDebugSettings` round-trip
47. Multi-Tabby toggle (one per window)
48. Folder backup mirror + restore

### K. Analytics
49. `OnNewTabPageLoaded`, `First Time Usage` events

---

## 4. Functional vs visual split

| Kind | Items | How tested |
|---|---|---|
| **Functional only** | 4, 6, 19, 29, 30, 37, 38, 40, 41, 42, 43, 44, 46, 47, 48, 49 | DOM + `sw.evaluate` assertions |
| **Functional + screenshot** | 1, 5, 7, 9, 10, 11, 12, 13, 18, 35 | Asserts + PNG |
| **Functional + video** | 2, 3, 8, 14, 16, 17, 22, 23, 24, 25, 26, 28, 31, 32, 33, 34, 36, 39, 45 | Asserts + MP4 under Xvfb |
| **Video only** | — | (none — every visual test also asserts) |
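Because the split is maintained by hand, a small audit can confirm that every inventory item lands in exactly one row. The sketch below copies the item lists from the table; `auditSplit` is a hypothetical helper, not part of the existing scripts.

```js
// Item numbers copied from the table above.
const split = {
  functionalOnly: [4, 6, 19, 29, 30, 37, 38, 40, 41, 42, 43, 44, 46, 47, 48, 49],
  functionalPlusScreenshot: [1, 5, 7, 9, 10, 11, 12, 13, 18, 35],
  functionalPlusVideo: [2, 3, 8, 14, 16, 17, 22, 23, 24, 25, 26, 28, 31, 32, 33, 34, 36, 39, 45],
};

// Reports items that appear in no row, or in more than one row.
function auditSplit(rows, totalItems) {
  const kindsByItem = new Map();
  for (const [kind, items] of Object.entries(rows)) {
    for (const item of items) {
      kindsByItem.set(item, [...(kindsByItem.get(item) ?? []), kind]);
    }
  }
  const unassigned = [];
  const duplicated = [];
  for (let item = 1; item <= totalItems; item++) {
    const kinds = kindsByItem.get(item) ?? [];
    if (kinds.length === 0) unassigned.push(item);
    if (kinds.length > 1) duplicated.push(item);
  }
  return { unassigned, duplicated };
}

console.log(auditSplit(split, 49));
// { unassigned: [ 15, 20, 21, 27 ], duplicated: [] }
```

Items reported as unassigned still need a row in the table before the split covers all 49 features.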

### Example assertions

```js
// Item 5 — Search filters tabs
await page.fill('input[placeholder^="Search"]', 'alpha')
await expect(page.locator('[data-tab-id]')).toHaveCount(1)

// Item 15 — Close button removes tab
const countTabs = () => sw.evaluate(() => chrome.tabs.query({}).then(t => t.length))
const before = await countTabs()
await page.locator('[data-tab-id] button.close').first().click()
// Tab removal is asynchronous, so poll instead of reading the count once.
await expect.poll(countTabs).toBe(before - 1)

// Item 42 — Thumbnail captured on activation
await page.goto('http://127.0.0.1:17001/')
await waitForCaptures(sw, ['http://127.0.0.1:17001/'], 10_000)
const rows = await sw.evaluate(() => self.__tabbyStorage.getAllRaw())
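// Optional chaining turns a missing row into a failed assertion instead of a TypeError.
const row = rows.find(r => r.url === 'http://127.0.0.1:17001/')
expect(row?.dataUrl.length ?? 0).toBeGreaterThan(1000)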
```
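The `waitForCaptures` helper used in the item 42 example is not defined in this plan. One possible shape, assuming `__tabbyStorage.getAllRaw()` returns rows with `url` and `dataUrl` fields as the example implies, is a simple poll with a deadline. The real helper may differ, and the `intervalMs` parameter is an addition for illustration.

```js
// Hypothetical helper: poll the storage singleton until every URL has a
// thumbnail row, or throw once the timeout has elapsed.
async function waitForCaptures(sw, urls, timeoutMs, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const rows = await sw.evaluate(() => self.__tabbyStorage.getAllRaw());
    const captured = new Set(rows.filter((r) => r.dataUrl).map((r) => r.url));
    const missing = urls.filter((u) => !captured.has(u));
    if (missing.length === 0) return rows;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for captures: ${missing.join(', ')}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Naming the missing URLs in the timeout error keeps CI failures readable when only one of several seeded tabs fails to capture.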

---

## 5. What we do NOT test

Listed explicitly so the scope is clear:

- **User-gesture-gated dialogs**: `showDirectoryPicker` (backup folder), Chrome permission re-auth prompts
- **Network-backed features**: real Supabase clustering, real Mixpanel analytics (both mocked)
- **Frame-accurate animation timing**: we assert end-state, not "at 300 ms scale was 0.55"
- **Performance**: no FPS counters, no "no jank" checks
- **Accessibility, themes, multi-monitor**: not in scope
- **`visualViewport.scale` mechanics** in `content/zoom-out.ts`: we fire wheel events, but can't programmatically zoom the page
- **Capture retry / rate-limit edge branches** in `capture.ts`: hard to trigger deterministically

---

## 6. Proposed harness structure

Mirrors the existing `scripts/video-demo.mjs` pattern:

```
test/e2e/
  setup.mjs                    # shared helpers: launch(), getServiceWorker(), seedTabs(), local HTTP server
  functional/
    bootstrap.test.mjs         # items 1–4
    header.test.mjs            # items 5–11
    grid.test.mjs              # items 12–19
    stack.test.mjs             # items 20–31
    zoom.test.mjs              # items 32–34
    refresh.test.mjs           # items 35–38
    commands.test.mjs          # items 39–41
    capture.test.mjs           # items 42–45
    backup.test.mjs            # items 46–48
  visual/                      # generates PNGs into docs/e2e-screens/
    layouts.mjs
    menus.mjs
    welcome.mjs
    clusters.mjs
  video/                       # generates MP4s into docs/videos/ (via Xvfb wrapper)
    zoom-entrance.mjs
    pinch-gestures.mjs
    drag-drop.mjs
    welcome-flow.mjs
    refresh-progress.mjs
```
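Three of the helpers listed for `setup.mjs` could look roughly like this. It is a sketch assuming the standard Playwright pattern for unpacked extensions. `chromium` is passed in rather than imported so the helpers stay easy to fake (the real file would likely `import { chromium } from 'playwright'`), and the option names `extensionPath` and `userDataDir` are illustrative.

```js
// Launch options for loading an unpacked extension into Chromium.
function extensionLaunchOptions(extensionPath) {
  return {
    headless: false, // headed, matching the Xvfb-based recording in this plan
    args: [
      `--disable-extensions-except=${extensionPath}`,
      `--load-extension=${extensionPath}`,
    ],
  };
}

// Starts Chromium with a persistent profile and the extension loaded.
async function launch(chromium, { extensionPath, userDataDir }) {
  return chromium.launchPersistentContext(userDataDir, extensionLaunchOptions(extensionPath));
}

// Returns the extension's service worker, waiting for it if it has not started yet.
async function getServiceWorker(context) {
  const [running] = context.serviceWorkers();
  return running ?? (await context.waitForEvent('serviceworker'));
}

// Opens one tab per URL so tests start from a known set of tabs.
async function seedTabs(context, urls) {
  const pages = [];
  for (const url of urls) {
    const page = await context.newPage();
    await page.goto(url);
    pages.push(page);
  }
  return pages;
}
```

A test file would then call `launch`, `getServiceWorker`, and `seedTabs` in that order, pointing `seedTabs` at pages served by the local HTTP server.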

New npm scripts:
```json
"test:e2e":        "node test/e2e/run-functional.mjs",
"test:e2e:visual": "node test/e2e/run-visual.mjs",
"test:e2e:video":  "bash test/e2e/run-video.sh"
```

---

## 7. Phased rollout

| Phase | What | Coverage | Time to run |
|---|---|---|---|
| **1. Smoke** | Bootstrap, layout toggle, search, click-activate, close tab, DnD, capture, persistence | ~30% | <2 min |
| **2. Breadth** | All remaining functional items (menus, clusters, commands, backup) | ~55% | ~5 min |
| **3. Visual** | All PNGs + MP4s under Xvfb + Dropbox upload | ~65% | ~10 min |

---

## 8. Open questions for you

1. **Start with smoke (phase 1) or plan the whole thing upfront?**
2. **Video strategy** — per-feature short clips (easier to diff) or one long narrative?
3. **Should I add network mocks for Supabase + Mixpanel** to push capture/clustering coverage past 75%?
4. **Location** — `test/e2e/` (new, structured) vs extending `scripts/` alongside the existing persistence tests?

---

*Generated from analysis of `src/` as of 2026-04-24. See `scripts/video-demo.mjs` and `scripts/test-persistence.mjs` for existing test patterns this plan builds on.*