Projects

Production-grade AI systems demonstrating real engineering patterns

AI Knowledge Retrieval (RAG System)

Evidence-based answers with citations and strict refusal.

Overview

A production-ready RAG system with document chunking, vector embeddings, semantic search, and LLM-generated answers with citations. A strict mode refuses to answer when retrieval confidence is low.
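
A minimal sketch of chunking with overlap follows. It assumes fixed-size character windows, though the project may split on tokens or sentences instead. The function name `chunk_text` and its default sizes are hypothetical and are not taken from the project's code.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of `chunk_size`, each sharing
    `overlap` characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.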

Tech Stack

FastAPI

ChromaDB

OpenAI

Pydantic

Python 3.11

API Endpoints

POST /ingest: Ingest documents
POST /ask: Ask a question
GET /sources: List sources

Key Features

Document chunking with configurable overlap
Vector embeddings using OpenAI text-embedding-3-small
Top-k retrieval with cosine similarity
Answers with inline citations
Strict mode: returns 'I don't know' on low confidence
Confidence scoring for transparency
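
Top-k cosine retrieval combined with strict-mode refusal can be sketched in plain Python. In the listed stack ChromaDB performs the vector search; this version scores an in-memory list instead. Treating the best hit's similarity as the confidence score, the 0.75 threshold, and the names `top_k` and `retrieve_or_refuse` are all assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[str, float]]:
    """Score every (chunk_id, vector) pair against the query and keep the best k."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def retrieve_or_refuse(query: list[float], index: list[tuple[str, list[float]]],
                       k: int = 3, min_score: float = 0.75) -> dict:
    """Strict mode (hypothetical threshold): refuse when even the best match is weak."""
    hits = top_k(query, index, k)
    confidence = hits[0][1] if hits else 0.0
    if confidence < min_score:
        return {"refused": True, "answer": "I don't know",
                "confidence": confidence, "citations": []}
    # Only chunks that clear the threshold are cited; the LLM call that writes
    # the cited answer would run after this step.
    citations = [chunk_id for chunk_id, score in hits if score >= min_score]
    return {"refused": False, "confidence": confidence, "citations": citations}
```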

LLM Evaluation & Regression Testing

Automated quality gates for LLM outputs in CI/CD.

Overview

A testing framework that catches LLM regressions before they reach production. It includes JSON schema validation, citation detection, consistency testing, and hallucination guards.
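
Two of these checks, citation detection and consistency testing, can be sketched with the standard library. This assumes citations are written as bracketed numbers such as `[1]` and that two runs "agree" when their outputs match after case and whitespace normalization; `has_citation` and `consistency_score` are hypothetical names.

```python
import re
from collections import Counter

# Assumed citation style: bracketed numbers such as [1] or [12].
CITATION_RE = re.compile(r"\[\d+\]")


def has_citation(output: str) -> bool:
    """True if the model output contains at least one citation marker."""
    return CITATION_RE.search(output) is not None


def consistency_score(outputs: list[str]) -> float:
    """Share of runs that match the most common output after normalizing
    case and whitespace (1.0 means every run agreed)."""
    if not outputs:
        return 0.0
    normalized = [" ".join(o.lower().split()) for o in outputs]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(outputs)
```

A test case could then require, for example, that every output cites a source and that five runs of the same prompt score at least 0.8 on consistency; those thresholds are illustrative.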

Tech Stack

FastAPI

Click CLI

Pydantic

JSONSchema

Python 3.11

API Endpoints

POST /runs: Start an evaluation run
GET /runs/latest: Get the latest run
GET /runs/{id}: Get a specific run

Key Features

JSON schema validation for structured outputs
Citation presence detection
Consistency testing (same prompt, multiple runs)
Hallucination guard (groundedness scoring)
CI/CD integration with fail-on-regression
Historical run tracking and comparison
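
A fail-on-regression gate can be sketched as a comparison between two runs. This assumes each run is summarized as a mapping from metric name to pass rate and that the baseline is the previous run; the metric names, the 0.02 tolerance, and the functions `find_regressions` and `gate` are hypothetical.

```python
import sys


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     tolerance: float = 0.02) -> list[str]:
    """Metrics whose pass rate fell more than `tolerance` below the baseline.
    A metric missing from the current run also counts as a regression."""
    regressions = []
    for metric, base_rate in baseline.items():
        rate = current.get(metric)
        if rate is None or rate < base_rate - tolerance:
            regressions.append(metric)
    return regressions


def gate(baseline: dict[str, float], current: dict[str, float],
         tolerance: float = 0.02) -> int:
    """Return a process exit code: 0 passes the pipeline, 1 fails it."""
    regressions = find_regressions(baseline, current, tolerance)
    for metric in regressions:
        print(f"REGRESSION: {metric}", file=sys.stderr)
    return 1 if regressions else 0
```

In a CI step, `sys.exit(gate(previous_run, latest_run))` would fail the job whenever any tracked metric drops.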

Secure AI Gateway

PII redaction, prompt injection defense, rate limiting, and cost tracking.

Overview

A middleware layer that wraps every AI call with access and safety controls: rate limiting, prompt injection detection, PII redaction, and cost estimation.
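
The token bucket behind the rate limiter can be sketched as follows. Each bucket holds up to `capacity` tokens, refills continuously at `refill_rate` tokens per second, and a request is allowed only if a token is available. Keying buckets by client IP gives per-IP limits, while a single fixed key gives a global limit. The class names and the injectable `clock` parameter (used to make the sketch testable) are assumptions, not the gateway's actual code.

```python
import time
from typing import Callable


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class KeyedRateLimiter:
    """One bucket per key: pass the client IP for per-IP limits,
    or the same constant key for every request for a global limit."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.capacity, self.refill_rate, self.clock)
        return self.buckets[key].allow()
```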

Tech Stack

FastAPI

Token Bucket

Tiktoken

HTTPX

Python 3.11

API Endpoints

POST /rag/ask: Proxy to the RAG service
POST /eval/run: Proxy to the Eval service

Key Features

Token bucket rate limiting (per-IP or global)
Prompt injection detection (system override, jailbreak)
PII redaction (email, phone, SSN, credit card)
Cost estimation with token counting
Structured JSON logging with request_id
Request/response validation
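
PII redaction and prompt injection detection can both be sketched as regular-expression passes. The patterns below are assumptions: they cover email addresses, US-style phone numbers and SSNs, and 13 to 16 digit card numbers, plus a few common injection phrasings. Pattern matching like this misses paraphrased attacks and unusual formats, so it is a first filter rather than a complete defense. `redact_pii` and `looks_like_injection` are hypothetical names.

```python
import re

# Order matters: card numbers (longest digit runs) are redacted before SSNs and phones.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("CARD", re.compile(r"(?<!\d)(?:\d[ -]?){12,15}\d(?!\d)")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"(?<!\d)(?:\+1[-.\s]?)?(?:\(\d{3}\)|\d{3})[-.\s]?\d{3}[-.\s]?\d{4}(?!\d)")),
]

# Heuristic phrasings for system-override and jailbreak attempts (illustrative, not exhaustive).
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"(?:reveal|print|show)\s+(?:your\s+|the\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"\byou\s+are\s+now\b", re.IGNORECASE),
    re.compile(r"\bjailbreak\b", re.IGNORECASE),
]


def redact_pii(text: str) -> str:
    """Replace each PII match with a typed placeholder such as [REDACTED_EMAIL]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


def looks_like_injection(prompt: str) -> bool:
    """True if any known injection phrasing appears in the prompt."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)
```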

AI Incident Investigation

Timeline reconstruction and root-cause analysis with evidence and human feedback.

Overview

An AI-powered incident investigation system that ingests artifacts (logs, alerts, deploy history), reconstructs timelines, and generates ranked root-cause hypotheses with evidence citations, confidence scoring, and strict refusal when evidence is weak.
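
Timeline reconstruction can be sketched as merging timestamped events from each artifact stream into one chronological list and then narrowing it to a window before the incident. The `Event` shape, the one-hour default lookback, and the function names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str    # e.g. "log", "alert", "deploy"
    message: str


def build_timeline(*streams: list[Event]) -> list[Event]:
    """Merge events from every artifact stream into one chronological list."""
    merged = [event for stream in streams for event in stream]
    # Sort by time; break ties on source so the output order is deterministic.
    return sorted(merged, key=lambda e: (e.timestamp, e.source))


def window(timeline: list[Event], incident_start: datetime,
           lookback: timedelta = timedelta(hours=1)) -> list[Event]:
    """Keep only events from `lookback` before the incident up to its start."""
    return [e for e in timeline
            if incident_start - lookback <= e.timestamp <= incident_start]
```

Placing a deploy event next to the first error logs in one ordered view is what lets later steps ask whether a change preceded the failure.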

Tech Stack

FastAPI

ChromaDB

OpenAI

RAG

Python 3.11

API Endpoints

POST /incident/ingest: Ingest case artifacts
POST /incident/analyze: Analyze an incident
GET /incident/cases: List cases
POST /incident/cases/{id}/rerun: Rerun analysis

Key Features

Timeline reconstruction from logs and alerts
Evidence-based hypothesis generation
Strict mode refusal when evidence is insufficient
Counter-evidence surfacing for balanced analysis
"What changed" detection (deploys, config changes)
Interactive rerun with constraints
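
Ranking hypotheses with counter-evidence and strict refusal can be sketched as follows. Here confidence is simply the share of cited evidence that supports a hypothesis, and a hypothesis needs at least two supporting items and 0.6 confidence to be reported. That scoring rule, the thresholds, and the `Hypothesis` and `rank_or_refuse` names are assumptions, not the project's actual method.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    cause: str
    supporting: list[str] = field(default_factory=list)     # evidence IDs
    contradicting: list[str] = field(default_factory=list)  # counter-evidence IDs

    @property
    def confidence(self) -> float:
        total = len(self.supporting) + len(self.contradicting)
        return len(self.supporting) / total if total else 0.0


def rank_or_refuse(hypotheses: list[Hypothesis], min_confidence: float = 0.6,
                   min_evidence: int = 2) -> dict:
    """Return ranked hypotheses, or refuse when none is backed by enough evidence."""
    qualified = [h for h in hypotheses
                 if len(h.supporting) >= min_evidence and h.confidence >= min_confidence]
    if not qualified:
        return {"status": "insufficient_evidence", "hypotheses": []}
    ranked = sorted(qualified, key=lambda h: (h.confidence, len(h.supporting)), reverse=True)
    return {"status": "ok", "hypotheses": [
        {"cause": h.cause, "confidence": round(h.confidence, 2),
         "evidence": h.supporting, "counter_evidence": h.contradicting}
        for h in ranked
    ]}
```

Returning the counter-evidence alongside each surviving hypothesis keeps the weaker side of the argument visible to the person reviewing the analysis.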

AI-Assisted DevOps Risk Analysis

Pre-deployment risk scoring, rollout recommendations, and change impact analysis.

Overview

A DevOps system that evaluates deployment risk before changes reach production. It analyzes CI/CD pipelines, deploy events, and historical incidents to produce risk scores, evidence-based explanations, and rollout recommendations.
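
A weighted risk score mapped to a rollout recommendation might look like the sketch below. The input features, weights, and cutoffs are illustrative placeholders, not tuned values from the project. The refusal when no incident history is available mirrors the strict mode described for this project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    files_changed: int
    touches_config: bool
    services_affected: int
    deploys_last_24h: int
    similar_past_incidents: Optional[int]  # None when no incident history is available


def risk_score(change: Change) -> Optional[float]:
    """Weighted 0..1 score; the weights are hypothetical and sum to 1.0."""
    if change.similar_past_incidents is None:
        return None  # strict mode: no history, no score
    score = (
        0.20 * min(change.files_changed / 50, 1.0)
        + 0.15 * (1.0 if change.touches_config else 0.0)
        + 0.25 * min(change.services_affected / 5, 1.0)
        + 0.30 * min(change.similar_past_incidents / 3, 1.0)
        + 0.10 * min(change.deploys_last_24h / 10, 1.0)
    )
    return round(score, 2)


def recommend_rollout(score: Optional[float]) -> str:
    """Map a score to a rollout strategy (cutoffs are hypothetical)."""
    if score is None:
        return "insufficient data: manual review required"
    if score >= 0.7:
        return "hold: require manual approval"
    if score >= 0.4:
        return "canary behind a feature flag"
    return "standard rollout"
```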

Tech Stack

FastAPI

ChromaDB

OpenAI

RAG

Python 3.11

API Endpoints

POST /devops/changes/ingest: Ingest a change
POST /devops/changes/analyze: Analyze risk
GET /devops/changes: List changes
GET /devops/changes/{id}: Get change details

Key Features

Risk scoring based on historical patterns
Similar incident detection and citation
Rollout recommendations (canary, feature flag, etc.)
Blast radius analysis
Change velocity tracking
Strict mode for insufficient data
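
Blast radius analysis can be sketched as a breadth-first search over a service dependency graph that collects every service directly or transitively depending on the changed one. The graph format, a mapping from each service to the services that call it, and the `blast_radius` name are assumptions.

```python
from collections import deque


def blast_radius(changed: str, dependents: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `changed`.
    `dependents` maps a service to the services that call it (hypothetical input)."""
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        service = queue.popleft()
        for caller in dependents.get(service, []):
            # Skip already-visited services so cycles in the graph terminate.
            if caller != changed and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected
```

The size of the returned set could feed a risk score as one more input, alongside file counts and incident history.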

AI Solution Architecture Review

Architecture recommendations with tradeoffs — including when AI should NOT be used.

Overview

A decision support system that analyzes problem statements and constraints to recommend appropriate AI architectures. Explicitly recommends non-AI solutions when appropriate and refuses when information is insufficient.

Tech Stack

FastAPI

ChromaDB

OpenAI

Pydantic

Python 3.11

API Endpoints

POST /architecture/review: Perform an architecture review
GET /architecture/reviews: List past reviews
GET /architecture/reviews/{id}: Get review details
POST /architecture/reviews/{id}/feedback: Submit feedback

Key Features

Recommendations across RAG, fine-tuning, and rule-based approaches
Constraint-aware analysis (latency, scale, compliance)
Strict refusal when information is insufficient
Tradeoff and risk analysis
Alternative approaches evaluation
Human feedback integration
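
The decision logic of such a review, including the non-AI option and the refusal path, can be sketched as a few explicit rules. The required fields, the 200 ms latency cutoff, the tradeoff wording, and the `review` function are hypothetical simplifications; the project itself analyzes free-form problem statements with an LLM and retrieval.

```python
REQUIRED_FIELDS = ("rules_are_enumerable", "knowledge_changes_often",
                   "needs_citations", "max_latency_ms")


def review(request: dict) -> dict:
    """Recommend rules, RAG, or fine-tuning, or refuse when inputs are missing."""
    missing = [f for f in REQUIRED_FIELDS if f not in request]
    if missing:
        return {"decision": "refuse", "missing": missing}

    if request["rules_are_enumerable"]:
        # Non-AI recommendation: deterministic logic is cheaper and auditable.
        decision, tradeoff = "rules", "cheap and auditable, but brittle when inputs drift"
    elif request["knowledge_changes_often"] or request["needs_citations"]:
        decision, tradeoff = "rag", "fresh, citable answers, but retrieval quality caps answer quality"
    else:
        decision, tradeoff = "fine-tuning", "consistent output style, but retraining is needed when knowledge changes"

    risks: list[str] = []
    if decision != "rules" and request["max_latency_ms"] < 200:
        risks.append("an LLM call may not fit the latency budget")
    return {"decision": decision, "tradeoff": tradeoff, "risks": risks}
```

Checking the rule-based option first reflects the project's stated goal of recommending non-AI solutions when they are sufficient.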