Projects
Production-grade AI systems demonstrating real engineering patterns
Overview
A production-ready RAG system with document chunking, vector embeddings, semantic search, and LLM-generated answers with citations. Includes a strict mode that declines to answer when confidence is low.
Tech Stack
API Endpoints
- /ingest: Ingest documents
- /ask: Ask a question
- /sources: List sources
Key Features
- Document chunking with configurable overlap
- Vector embeddings using OpenAI text-embedding-3-small
- Top-k retrieval with cosine similarity (sketched after this list)
- Answers with inline citations
- Strict mode: returns "I don't know" on low confidence
- Confidence scoring for transparency
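A minimal sketch of the top-k retrieval step, assuming chunk embeddings are produced with OpenAI's text-embedding-3-small and held in memory as NumPy vectors; the function and variable names here are illustrative, not the service's actual code.

```python
# Sketch: embed a query and retrieve the top-k chunks by cosine similarity.
# Assumes chunk embeddings were precomputed with the same model.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def top_k_chunks(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 5):
    q = embed(query)
    # Cosine similarity: dot product of L2-normalized vectors.
    q = q / np.linalg.norm(q)
    normed = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = normed @ q
    best = np.argsort(sims)[::-1][:k]
    # Return (chunk, score) pairs; a strict mode could refuse if max(sims) is low.
    return [(chunks[i], float(sims[i])) for i in best]
```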
Overview
A comprehensive testing framework that helps prevent LLM regressions. Includes JSON schema validation, citation detection, consistency testing, and hallucination guards.
Tech Stack
API Endpoints
- /runs: Start evaluation run
- /runs/latest: Get latest run
- /runs/{id}: Get specific run
Key Features
- JSON schema validation for structured outputs (see the sketch after this list)
- Citation presence detection
- Consistency testing (same prompt, multiple runs)
- Hallucination guard (groundedness scoring)
- CI/CD integration with fail-on-regression
- Historical run tracking and comparison
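A minimal sketch of schema validation plus a fail-on-regression gate, using the jsonschema package; the schema, threshold, and function names are illustrative assumptions rather than the framework's real configuration.

```python
# Sketch: validate LLM structured outputs against a JSON schema
# and fail the CI run if the pass rate regresses below baseline.
import json
from jsonschema import validate, ValidationError

ANSWER_SCHEMA = {
    "type": "object",
    "required": ["answer", "citations"],
    "properties": {
        "answer": {"type": "string"},
        "citations": {"type": "array", "items": {"type": "string"}},
    },
}

def schema_pass_rate(raw_outputs: list[str]) -> float:
    passed = 0
    for raw in raw_outputs:
        try:
            validate(instance=json.loads(raw), schema=ANSWER_SCHEMA)
            passed += 1
        except (ValidationError, json.JSONDecodeError):
            pass
    return passed / len(raw_outputs) if raw_outputs else 0.0

def check_regression(current: float, baseline: float, tolerance: float = 0.02) -> None:
    # CI integration: exit non-zero so the pipeline fails on regression.
    if current < baseline - tolerance:
        raise SystemExit(f"Regression: pass rate {current:.2%} < baseline {baseline:.2%}")
```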
Overview
A middleware layer that wraps all AI calls with enterprise-grade controls. Handles rate limiting, prompt injection detection, PII redaction, and cost estimation.
Tech Stack
API Endpoints
- /rag/ask: Proxy to RAG service
- /eval/run: Proxy to Eval service
Key Features
- Token bucket rate limiting (per-IP or global; sketched after this list)
- Prompt injection detection (system override, jailbreak)
- PII redaction (email, phone, SSN, credit card)
- Cost estimation with token counting
- Structured JSON logging with request_id
- Request/response validation
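A minimal token-bucket sketch along the lines of the rate-limiting feature above; the per-IP keying and the capacity and refill numbers are assumptions for illustration.

```python
# Sketch: token bucket rate limiting keyed per client IP.
import time
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Bucket:
    capacity: float = 10.0     # maximum burst size
    refill_rate: float = 1.0   # tokens added per second
    tokens: float = 10.0
    last: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets: dict[str, Bucket] = defaultdict(Bucket)

def is_allowed(client_ip: str) -> bool:
    # The gateway would call this before proxying a request downstream.
    return buckets[client_ip].allow()
```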
Overview
An AI-powered incident investigation system that ingests artifacts (logs, alerts, deploy history), reconstructs timelines, and generates ranked root-cause hypotheses with evidence citations and confidence scores, refusing to speculate when evidence is weak.
Tech Stack
API Endpoints
- /incident/ingest: Ingest case artifacts
- /incident/analyze: Analyze incident
- /incident/cases: List cases
- /incident/cases/{id}/rerun: Rerun analysis
Key Features
- Timeline reconstruction from logs and alerts (see the sketch after this list)
- Evidence-based hypothesis generation
- Strict mode refusal when evidence is insufficient
- Counter-evidence surfacing for balanced analysis
- "What Changed" detection (deploys, configs)
- Interactive rerun with constraints
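A minimal sketch of timeline reconstruction: merge timestamped artifacts (log lines, alerts, deploys) into one ordered list. The field names and the ISO-8601 timestamp format are illustrative assumptions.

```python
# Sketch: merge heterogeneous artifacts into a single ordered incident timeline.
from datetime import datetime
from typing import Iterable

def build_timeline(logs: Iterable[dict], alerts: Iterable[dict], deploys: Iterable[dict]) -> list[dict]:
    events = []
    for source, items in (("log", logs), ("alert", alerts), ("deploy", deploys)):
        for item in items:
            events.append({
                "ts": datetime.fromisoformat(item["timestamp"]),  # assumes ISO-8601 timestamps
                "source": source,
                "summary": item.get("message", ""),
            })
    # Chronological order lets hypotheses cite what happened before what.
    return sorted(events, key=lambda e: e["ts"])
```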
Overview
An intelligent DevOps system that evaluates deployment risk before changes reach production. Analyzes CI/CD pipelines, deploy events, and historical incidents to provide risk scores, evidence-based explanations, and rollout recommendations.
Tech Stack
API Endpoints
- /devops/changes/ingest: Ingest a change
- /devops/changes/analyze: Analyze risk
- /devops/changes: List changes
- /devops/changes/{id}: Get change details
Key Features
- Risk scoring based on historical patterns (sketched after this list)
- Similar incident detection and citation
- Rollout recommendations (canary, feature flag, etc.)
- Blast radius analysis
- Change velocity tracking
- Strict mode for insufficient data
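A minimal sketch of a weighted risk score over change signals, with a strict-mode refusal; the signal names, weights, and thresholds are illustrative assumptions rather than the service's actual model.

```python
# Sketch: combine change signals into a 0-1 risk score with a strict-mode refusal.
def risk_score(change: dict) -> dict:
    if "lines_changed" not in change:
        # Strict mode: refuse to score when key data is missing.
        return {"risk": None, "recommendation": "insufficient data"}
    # Hypothetical signals with weights that sum to 1.0.
    signals = {
        "touches_critical_path": (0.4, change.get("touches_critical_path", False)),
        "similar_past_incident": (0.3, change.get("similar_past_incident", False)),
        "large_diff": (0.2, change["lines_changed"] > 500),
        "off_hours_deploy": (0.1, change.get("off_hours", False)),
    }
    score = sum(weight for weight, present in signals.values() if present)
    recommendation = "canary rollout" if score >= 0.5 else "standard rollout"
    return {"risk": round(score, 2), "recommendation": recommendation}
```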
Overview
A decision support system that analyzes problem statements and constraints to recommend a suitable AI architecture. Explicitly recommends non-AI solutions where they fit better and refuses when the information provided is insufficient.
Tech Stack
API Endpoints
- /architecture/review: Perform architecture review
- /architecture/reviews: List past reviews
- /architecture/reviews/{id}: Get review details
- /architecture/reviews/{id}/feedback: Submit feedback
Key Features
- RAG vs fine-tuning vs rules recommendations (see the sketch after this list)
- Constraint-aware analysis (latency, scale, compliance)
- Strict refusal when information is insufficient
- Tradeoff and risk analysis
- Alternative approaches evaluation
- Human feedback integration
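A minimal rule-based sketch of the RAG vs fine-tuning vs rules decision with strict refusal; the constraint fields, thresholds, and wording are assumptions for illustration, not the system's real logic.

```python
# Sketch: constraint-aware architecture recommendation with strict refusal.
def recommend_architecture(problem: dict) -> dict:
    required = ("goal", "data_volatility", "latency_budget_ms")
    missing = [f for f in required if f not in problem]
    if missing:
        # Strict refusal: do not guess when key constraints are absent.
        return {"recommendation": None, "refusal": f"missing constraints: {', '.join(missing)}"}

    if problem["goal"] == "deterministic policy enforcement":
        rec = "rules engine (no AI needed)"
    elif problem["data_volatility"] == "high":
        # Frequently changing knowledge favours retrieval over retraining.
        rec = "RAG over a maintained document index"
    elif problem["latency_budget_ms"] < 200:
        rec = "fine-tuned small model served close to the caller"
    else:
        rec = "RAG, with fine-tuning only if retrieval quality plateaus"
    return {"recommendation": rec, "tradeoffs": "latency, freshness, and maintenance cost differ per option"}
```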