Automation testing
Automation is organized in layers so every change gets appropriate signal—fast smoke locally, deeper regression in CI, and selective production smoke where it adds value.
How automation is organized
- Smoke — Minimal path checks after each build: critical routes render, gateway health responds, no obvious breakage. Runs locally and in CI.
- Regression — Broader suites that cover demos, API contracts, and integration paths. Runs on merges and release candidates.
- Critical workflows — User-visible journeys (RAG query, eval dashboard, gateway playground) that must never silently break.
- Confidence gating — Tests block promotion when assertions fail; flaky tests are fixed or quarantined—not ignored.
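The confidence-gating rule above can be sketched as a small decision function. This is an illustrative sketch, not the real pipeline's API; the `TestResult` shape and function names are assumptions.

```typescript
// Sketch of a promotion gate. The shape and names here are illustrative
// assumptions, not the actual pipeline's types.
type TestResult = {
  name: string;
  passed: boolean;
  quarantined: boolean; // known-flaky tests are tracked separately
};

// Promotion is blocked when any non-quarantined test fails.
function promotionAllowed(results: TestResult[]): boolean {
  return results.filter((r) => !r.quarantined).every((r) => r.passed);
}

// Quarantined failures still get surfaced so they are fixed, not forgotten.
function quarantinedFailures(results: TestResult[]): string[] {
  return results.filter((r) => r.quarantined && !r.passed).map((r) => r.name);
}
```

The key property: quarantine removes a flaky test from the gate, but never from visibility, so "quarantined" can never quietly become "ignored".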
Coverage layers
- UI — Navigation, layout, and key interactions in the Next.js app.
- API — Gateway and service endpoints: shape, status codes, and error paths.
- Integration — Browser-driven or harness-driven tests run through the gateway to real or stubbed backends.
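An API-layer check typically pins the response shape and the error envelope rather than exact values. The sketch below assumes a hypothetical health endpoint schema (`status`, `uptimeSeconds`) and error body (`error`); the real gateway contract may differ.

```typescript
// Illustrative contract checks for a gateway endpoint. The field names are
// assumptions for the sketch, not the real gateway schema.
type HealthResponse = { status: string; uptimeSeconds: number };

// Narrow an unknown JSON body to the expected shape, failing loudly on
// missing or mistyped fields.
function isHealthResponse(body: unknown): body is HealthResponse {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return typeof b.status === "string" && typeof b.uptimeSeconds === "number";
}

// Error paths get the same treatment: assert the status code range and that
// the error envelope carries a human-readable message.
function isErrorEnvelope(status: number, body: unknown): boolean {
  if (status < 400) return false;
  if (typeof body !== "object" || body === null) return false;
  return typeof (body as Record<string, unknown>).error === "string";
}
```

In a real suite these guards would run against responses fetched from the gateway; keeping them as plain predicates makes them reusable from both API tests and integration harnesses.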
Testing and AI reliability
LLM-backed services are non-deterministic; automation complements eval dashboards and regression baselines. UI and API tests lock structure, latency envelopes, and failure modes; eval pipelines track output quality. Together they reduce surprise when models or prompts change.
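"Lock structure and latency, not exact output" can be made concrete. In this sketch the `RagAnswer` shape and the latency budget are assumptions for illustration; the answer text itself is deliberately left to eval pipelines.

```typescript
// Sketch: lock the response structure and a latency envelope for a
// non-deterministic LLM-backed endpoint. Field names and the budget value
// are assumptions, not the real service contract.
type RagAnswer = { answer: string; sources: string[] };

// Latency envelope: the test asserts a bound, never an exact timing.
function withinEnvelope(latencyMs: number, budgetMs: number): boolean {
  return latencyMs >= 0 && latencyMs <= budgetMs;
}

// Structure is asserted; output quality is scored by eval pipelines, so the
// automation layer never compares answer text against a golden string.
function hasLockedShape(body: unknown): body is RagAnswer {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b.answer === "string" &&
    Array.isArray(b.sources) &&
    b.sources.every((s) => typeof s === "string")
  );
}
```

This split is what lets model or prompt changes ship without rewriting UI/API tests: structure and latency stay stable even when wording does not.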
How automation fits into CI/CD and AI evaluation
One pipeline mindset: code, models, and prompts all need gates.
CI/CD — Commits trigger lint, typecheck, unit checks where applicable, then UI/API/integration tiers. Release branches can add longer runs or nightly full regression.
AI evaluation — The eval service and automated suites are documented under projects; automation ensures the app and gateway still serve those flows. Planned coverage includes wiring eval smoke into the same promotion gates as the web tier.
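The tiering described above can be summarized as a trigger-to-suites mapping. This is a hypothetical sketch of the selection logic; the trigger names, tier names, and the `eval-smoke` entry (noted in the doc as planned, not current) are assumptions.

```typescript
// Hypothetical tier selection: which suites run for which trigger. The
// mapping is an illustrative sketch, not the actual CI configuration.
type Trigger = "commit" | "merge" | "release" | "nightly";

const base = ["lint", "typecheck", "unit"];

const tierPlan: Record<Trigger, string[]> = {
  commit: [...base, "smoke"],
  merge: [...base, "smoke", "ui", "api", "integration"],
  // "eval-smoke" reflects the planned wiring of eval checks into the same
  // promotion gates as the web tier; it is an assumption here.
  release: [...base, "smoke", "ui", "api", "integration", "regression", "eval-smoke"],
  nightly: ["regression"], // full nightly regression runs standalone
};

function tiersFor(trigger: Trigger): string[] {
  return tierPlan[trigger];
}
```

Keeping the mapping in one table makes the "one pipeline mindset" auditable: adding a gate for models or prompts is a one-line change next to the code gates, not a separate system.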