Tabby E2E Test Plan

Functional + visual coverage proposal
Chrome extension · 6,362 LOC · 49 features

TL;DR

  • ~62% line coverage
  • 40/49 features asserted
  • ~15 visual artifacts

Harness: Playwright launchPersistentContext with the unpacked extension loaded,
the same setup as the existing scripts/video-demo.mjs.
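A minimal launcher sketch for the shared setup, assuming Playwright is installed and the built extension sits at an absolute path you pass in. The flags and the service-worker lookup follow Playwright's documented pattern for testing Chrome extensions. The function names are hypothetical, not taken from scripts/video-demo.mjs.

```javascript
// Pure helper: Chromium flags that load exactly one unpacked extension.
// extensionDir should be an absolute path to the built extension (its
// location in the repo is an assumption).
function buildLaunchOptions(extensionDir, { headless = false } = {}) {
  return {
    headless, // headed by default; Xvfb supplies the display on CI
    args: [
      `--disable-extensions-except=${extensionDir}`,
      `--load-extension=${extensionDir}`,
    ],
  }
}

// Launches a persistent context and returns the extension's service worker.
async function launchTabby(userDataDir, extensionDir) {
  const { chromium } = await import('playwright') // loaded lazily
  const context = await chromium.launchPersistentContext(
    userDataDir,
    buildLaunchOptions(extensionDir),
  )
  // The MV3 service worker may register after launch, so wait for it if needed.
  let [sw] = context.serviceWorkers()
  if (!sw) sw = await context.waitForEvent('serviceworker')
  // Worker URL looks like chrome-extension://<id>/<script>.js
  const extensionId = sw.url().split('/')[2]
  return { context, sw, extensionId }
}
```

Splitting the flag builder out keeps the launch options checkable without starting a browser.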

Two-layer strategy

Functional

  • Playwright expect() on DOM
  • sw.evaluate() to inspect service-worker (SW) singletons
  • Binary pass/fail, drives CI
  • Assert end-state, not mid-animation
  • Mock Supabase + Mixpanel
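One way to mock Supabase and Mixpanel is to stub them at the network layer with Playwright's `context.route`. This is a sketch under assumptions. The host patterns, response bodies, and `mockBackends` name are hypothetical. Depending on the Playwright version, fetches issued from an MV3 service worker may not pass through `context.route`, so confirm that interception works before relying on it.

```javascript
// Hypothetical stubs: adjust patterns and bodies to the endpoints Tabby calls.
const BACKEND_STUBS = [
  { pattern: /\.supabase\.co\//, body: '[]' }, // assumed clustering API
  { pattern: /mixpanel\.com\//, body: '1' },   // assumed tracking endpoint
]

// Registers one route per stub on a Playwright BrowserContext and records each
// intercepted URL, so tests can also assert that analytics calls happened.
async function mockBackends(context, stubs = BACKEND_STUBS) {
  const hits = []
  for (const { pattern, body } of stubs) {
    await context.route(pattern, route => {
      hits.push(route.request().url())
      return route.fulfill({ status: 200, contentType: 'application/json', body })
    })
  }
  return hits
}
```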

Visual

  • PNG per stable state (layouts, menus, dialogs)
  • MP4 per animation (zoom, pinch, DnD)
  • Recorded under Xvfb + ffmpeg
  • Uploaded via scripts/dbx-upload.sh
  • For human review, not a CI gate
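A sketch of the MP4 side, assuming ffmpeg is on PATH and Xvfb runs on display `:99` at 1280×800. The display, size, frame rate, and helper names are assumptions. It records the whole X display with ffmpeg's `x11grab` input and stops by sending `q` on stdin, which makes ffmpeg finalize the file.

```javascript
// Pure helper: ffmpeg arguments for an H.264 MP4 of an X11 display.
function ffmpegArgs({ display = ':99.0', size = '1280x800', fps = 30, out }) {
  return [
    '-y',                      // overwrite an existing file
    '-f', 'x11grab',           // capture an X11 display
    '-video_size', size,       // must not exceed the Xvfb screen size
    '-framerate', String(fps),
    '-i', display,
    '-c:v', 'libx264',
    '-pix_fmt', 'yuv420p',     // widest player compatibility
    out,
  ]
}

// Starts ffmpeg and returns stop(), which resolves once ffmpeg has exited.
async function startRecording(opts) {
  const { spawn } = await import('node:child_process')
  const proc = spawn('ffmpeg', ffmpegArgs(opts), { stdio: ['pipe', 'ignore', 'inherit'] })
  return {
    stop: () => new Promise(resolve => {
      proc.once('exit', resolve)
      proc.stdin.write('q') // ffmpeg's quit command; it finalizes the MP4
    }),
  }
}
```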

Visuals do not replace assertions; they catch motion regressions that DOM state alone misses.

Coverage by file

Area · Files (LOC) · Coverage
App shell · App, main, hooks (588) · ~75%
Header · Header.tsx (438) · ~75%
Grid · TabGrid, TabCard (494) · ~80%
Stack · StackView.tsx (764) · ~70%
Zoom entrance · ZoomEntrance, pre-overlay (417) · ~65%
Welcome · WelcomeOverlay.tsx (254) · ~80%
Refresh popup · RefreshPopup.tsx (151) · ~80%
Content script · zoom-out.ts (67) · ~75%
Commands & icon · index, tab-ready (219) · ~70%
Capture/storage · capture, storage (1260) · ~55%
Messages · messages.ts (424) · ~65%
Backup · backup.ts (271) · ~50%
Clustering · clustering.ts (500) · ~30%
Analytics · analytics.ts (159) · ~30%

Weighted total: ~60–65% line coverage · ~50% branch coverage
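The weighted total is a LOC-weighted mean of the per-area estimates in the Coverage by file table. As a check on the arithmetic: the listed rows sum to 6,006 of the 6,362 LOC. The remaining ~356 LOC are not in the table and are ignored here.

```javascript
// [area, LOC, estimated line coverage] copied from the Coverage by file table.
const AREAS = [
  ['App shell', 588, 0.75], ['Header', 438, 0.75], ['Grid', 494, 0.80],
  ['Stack', 764, 0.70], ['Zoom entrance', 417, 0.65], ['Welcome', 254, 0.80],
  ['Refresh popup', 151, 0.80], ['Content script', 67, 0.75],
  ['Commands & icon', 219, 0.70], ['Capture/storage', 1260, 0.55],
  ['Messages', 424, 0.65], ['Backup', 271, 0.50],
  ['Clustering', 500, 0.30], ['Analytics', 159, 0.30],
]

// Sum of (LOC × coverage) divided by total LOC.
function weightedCoverage(areas) {
  let covered = 0
  let total = 0
  for (const [, loc, pct] of areas) {
    covered += loc * pct
    total += loc
  }
  return { total, ratio: covered / total }
}

const { total, ratio } = weightedCoverage(AREAS)
console.log(total, (ratio * 100).toFixed(1) + '%') // prints: 6006 63.3%
```

The result falls inside the quoted ~60–65% band.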

49 features, 11 areas

  • A. Bootstrap (4)
  • B. Header controls (7)
  • C. Grid layout (8)
  • D. Stack layout (12)
  • E. Zoom entrance/exit (3)
  • F. Refresh screenshots (4)
  • G. Content-script pinch (1)
  • H. Keyboard commands (2)
  • I. Capture + storage (4)
  • J. Debug + backup (3)
  • K. Analytics (1)

A – B: Bootstrap & Header

A. Bootstrap (4)

  1. Install / new-tab override loads tab.html
  2. First-time welcome overlay (bounce)
  3. Congratulations + confetti (5 s)
  4. Toolbar icon / single-instance policy

B. Header (7)

  1. Search input + clear
  2. Search shortcuts (/, Cmd+F, Esc)
  3. Layout toggle: grid ↔ stack
  4. Overview spring-zoom
  5. Hamburger menu + outside-click close
  6. Alt/Option reveals Debug submenu
  7. About dialog
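For A4, the single-instance policy could be checked from the service worker by counting Tabby tabs. This sketch rests on several assumptions. Tabby's page is taken to be tab.html. An overridden new tab is assumed to report either its extension URL or chrome://newtab/, so confirm what your Chrome build actually reports. The extension needs the "tabs" permission for `tab.url` to be populated. The helper names are hypothetical.

```javascript
// True if the tab looks like a Tabby page (both URL forms are assumptions).
function isTabbyTab(tab, extensionId) {
  const url = tab.url || tab.pendingUrl || ''
  return url.startsWith(`chrome-extension://${extensionId}/tab.html`) ||
    url.startsWith('chrome://newtab')
}

// Counts Tabby tabs across all windows via the extension's service worker.
async function countTabbyTabs(sw, extensionId) {
  const tabs = await sw.evaluate(() => chrome.tabs.query({}))
  return tabs.filter(t => isTabbyTab(t, extensionId)).length
}
```

After Tabby is opened a second time, a test would then expect `await expect.poll(() => countTabbyTabs(sw, id)).toBe(1)`.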

C – D: Grid & Stack

C. Grid (8)

  1. Window sections + current marker
  2. TabCard render (thumb, favicon, title)
  3. 3D tilt on hover
  4. Close (X) button
  5. Click → zoom-exit → activate
  6. DnD reorder, within and across windows
  7. Cluster groups render
  8. Ungroup all

D. Stack (12)

  1. 3D fanned stack geometry
  2. Focused mode (viewport-scaled)
  3. Hover → focus
  4. Mouse-past-slot hand-off
  5. Pinch zoom-out
  6. Pinch zoom-in
  7. Pinch-in focused → open tab
  8. 500 ms cooldown
  9. Horizontal wheel scroll
  10. Close window
  11. Close tab from card
  12. DnD reorder (horizontal)
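The pinch items (D5–D8) need a synthetic gesture. Chromium reports trackpad pinch to pages as `wheel` events with `ctrlKey` set, where negative `deltaY` means zoom in. The sketch below dispatches such events on an element. It assumes StackView listens for `wheel` on that element and does not reject untrusted (`isTrusted === false`) events; both are assumptions. The step count, delta, and helper names are hypothetical.

```javascript
// WheelEvent init dicts emulating a pinch; 'zoom-in' spreads, 'zoom-out' pinches.
function pinchWheelInits(direction, steps = 5, delta = 40) {
  const sign = direction === 'zoom-in' ? -1 : 1
  return Array.from({ length: steps }, () => ({
    deltaY: sign * delta, ctrlKey: true, bubbles: true, cancelable: true,
  }))
}

// Dispatches the events in the page on the element matched by selector.
async function pinch(page, selector, direction, steps) {
  const inits = pinchWheelInits(direction, steps)
  await page.locator(selector).evaluate((el, list) => {
    for (const init of list) el.dispatchEvent(new WheelEvent('wheel', init))
  }, inits)
}
```

For the 500 ms cooldown, one could fire two zoom-out pinches less than 500 ms apart and assert that the layout changed only once. How "changed once" shows up in the DOM depends on StackView.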

E – H: Zoom, Refresh, Commands

E. Zoom entrance (3)

  1. Pre-overlay before React mount
  2. Zoom-entrance on NTP load
  3. Zoom-exit on card click
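To assert the end state of the zoom entrance and exit rather than a mid-animation frame, a test can wait until no Web Animation is running. `document.getAnimations()` covers CSS animations, CSS transitions, and `element.animate()`. It does not see JS-driven springs that write styles each frame, and an infinite animation would never settle. Treat this as a partial sketch; the helper names are hypothetical.

```javascript
// In-page predicate: true once no CSS/Web Animation is in the 'running' state.
const animationsSettled = () =>
  document.getAnimations().every(a => a.playState !== 'running')

// Polls the predicate inside the page until it holds or the timeout expires.
async function waitForSettled(page, timeout = 5_000) {
  await page.waitForFunction(animationsSettled, undefined, { timeout })
}
```

Typical use: click the card, `await waitForSettled(page)`, then run the DOM assertions.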

F. Refresh (4)

  1. Launches 585×501 popup
  2. Progress UI (catwalk + bar)
  3. Stop button halts loop
  4. Completion returns focus
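For F1, the popup can be located from the service worker with `chrome.windows.getAll`. This sketch assumes the popup is created as `type: 'popup'` with width 585 and height 501, and that Chrome reports back the same outer size. Under Xvfb without a window manager that usually holds, but it is an assumption. The helper names are hypothetical.

```javascript
const REFRESH_POPUP = { width: 585, height: 501 }

// Picks the refresh popup out of a chrome.windows.getAll() result.
function findRefreshPopup(windows, size = REFRESH_POPUP) {
  return windows.find(
    w => w.type === 'popup' && w.width === size.width && w.height === size.height,
  )
}

// One-shot check via the service worker; wrap it in expect.poll to wait.
async function refreshPopupOpen(sw) {
  const windows = await sw.evaluate(() => chrome.windows.getAll({ windowTypes: ['popup'] }))
  return Boolean(findRefreshPopup(windows))
}
```

Usage: `await expect.poll(() => refreshPopupOpen(sw)).toBe(true)`.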

G. Content script (1)

  1. Pinch-out → open Tabby

H. Commands (2)

  1. Cmd+Shift+X — expose mode
  2. Cmd+Shift+A — expose all

I – K: Capture, Backup, Analytics

I. Capture + storage (4)

  1. Per-tab capture on activation / update
  2. Restricted URLs skipped
  3. Thumbnail re-association after restart
  4. Thumbnail healing after cross-window DnD

J. Debug + backup (3)

  1. getDebugSettings round-trip
  2. Multi-Tabby toggle
  3. Folder backup mirror + restore

K. Analytics (1)

  1. OnNewTabPageLoaded and First Time Usage events fire

Functional vs visual split

Kind · Count · Examples
Functional only · 16 · close window, commands, persistence, debug settings
Functional + PNG · 10 · layouts, menus, clusters, search empty state
Functional + MP4 · 19 · zoom, pinch, DnD, welcome flow, refresh progress
Visual only · 0 · none; every visual test also asserts

Example assertions

// Item 5: search filters tabs (toHaveCount retries until the filter applies)
await page.fill('input[placeholder^="Search"]', 'alpha')
await expect(page.locator('[data-tab-id]')).toHaveCount(1)

// Item 15: close button removes tab
const countTabs = () => sw.evaluate(() => chrome.tabs.query({}).then(t => t.length))
const before = await countTabs()
await page.locator('[data-tab-id] button.close').first().click()
// Tab removal completes asynchronously, so poll instead of reading once
await expect.poll(countTabs).toBe(before - 1)

// Item 42: thumbnail captured on activation
const url = 'http://127.0.0.1:17001/'
await page.goto(url)
await waitForCaptures(sw, [url], 10_000)
const rows = await sw.evaluate(() => self.__tabbyStorage.getAllRaw())
const row = rows.find(r => r.url === url)
expect(row).toBeDefined() // clearer failure than a TypeError on undefined
expect(row.dataUrl.length).toBeGreaterThan(1000)
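The Item 42 example calls a `waitForCaptures` helper. One possible implementation is sketched below. It is hypothetical and assumes `self.__tabbyStorage.getAllRaw()` resolves to rows shaped like `{ url, dataUrl }`, as the example implies. It polls the service worker until every URL has a non-empty thumbnail, or throws after the timeout.

```javascript
// Resolves with all rows once every URL in `urls` has a captured dataUrl.
async function waitForCaptures(sw, urls, timeoutMs = 10_000, intervalMs = 200) {
  const deadline = Date.now() + timeoutMs
  let missing = urls
  while (Date.now() < deadline) {
    const rows = await sw.evaluate(() => self.__tabbyStorage.getAllRaw())
    const captured = new Set(rows.filter(r => r.dataUrl).map(r => r.url))
    missing = urls.filter(u => !captured.has(u))
    if (missing.length === 0) return rows
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  throw new Error(`no capture within ${timeoutMs} ms for: ${missing.join(', ')}`)
}
```

Naming the missing URLs in the error makes a capture timeout easier to diagnose than a bare count mismatch.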

What we do NOT test

  • gesture: showDirectoryPicker, permission re-auth prompts
  • network: real Supabase clustering, real Mixpanel
  • timing: frame-accurate animation timing
  • perf: FPS counters, jank detection
  • a11y: accessibility, themes, multi-monitor
  • browser: visualViewport.scale in the content script
  • retry: capture rate-limit and retry edge branches

Explicitly out of scope so expectations stay clear.

Harness structure

test/e2e/
  setup.mjs                       # shared helpers
  functional/
    bootstrap.test.mjs            # items 1–4
    header.test.mjs               # items 5–11
    grid.test.mjs                 # items 12–19
    stack.test.mjs                # items 20–31
    zoom.test.mjs                 # items 32–34
    refresh.test.mjs              # items 35–38
    commands.test.mjs             # items 39–41
    capture.test.mjs              # items 42–45
    backup.test.mjs               # items 46–48
  visual/                         # PNGs → docs/e2e-screens/
  video/                          # MP4s → docs/videos/ (Xvfb wrapper)

package.json scripts:
"test:e2e":        "node test/e2e/run-functional.mjs",
"test:e2e:visual": "node test/e2e/run-visual.mjs",
"test:e2e:video":  "bash test/e2e/run-video.sh"

Phased rollout

Phase · Scope · Coverage · Runtime
1. Smoke · Bootstrap, layouts, search, click, close, DnD, capture, persistence · ~30% · <2 min
2. Breadth · Menus, clusters, commands, backup, all remaining functional · ~55% · ~5 min
3. Visual · PNGs + MP4s under Xvfb + Dropbox upload · ~65% · ~10 min

Each phase is independently shippable.

Four decisions

  1. Phase 1 only, or full plan upfront?
  2. Short per-feature clips, or one narrative video?
  3. Mock Supabase + Mixpanel to push past 75% coverage?
  4. Location: new test/e2e/ tree, or extend scripts/?

Pick an answer for each, and I'll start writing code.

Questions?

Full plan: docs/e2e-test-plan.md
Slides: docs/e2e-test-plan-slides.html