Projects

Production-grade AI systems demonstrating real engineering patterns

AI Knowledge Retrieval (RAG System)

Evidence-based answers with citations and strict refusal.

Overview

A production-ready RAG system with document chunking, vector embeddings, semantic search, and LLM-generated answers with inline citations. A strict mode refuses to answer when retrieval confidence is low.
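Chunking with overlap can be sketched as a sliding window. This minimal version assumes character-based windows and hypothetical defaults (`chunk_size=500`, `overlap=50`). The actual service may split by tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap keeps text near a boundary present in both neighboring chunks. A sentence cut by one window is therefore more likely to appear whole in the next, at the cost of storing some text twice.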

Tech Stack

FastAPI
ChromaDB
OpenAI
Pydantic
Python 3.11

API Endpoints

POST /ingest: Ingest documents
POST /ask: Ask a question
GET /sources: List sources

Key Features

  • Document chunking with configurable overlap
  • Vector embeddings using OpenAI text-embedding-3-small
  • Top-k retrieval with cosine similarity
  • Answers with inline citations
  • Strict mode: returns 'I don't know' on low confidence
  • Confidence scoring for transparency
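The sketch below shows how top-k cosine retrieval and strict-mode refusal fit together. It uses pure Python over in-memory vectors rather than ChromaDB's API. The `Chunk` type, the `answer_or_refuse` helper and the 0.75 threshold are hypothetical, and the LLM call that writes the cited answer is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str             # document id used for the citation
    text: str
    embedding: list[float]  # e.g. a text-embedding-3-small vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: list[float], chunks: list[Chunk], k: int = 3) -> list[tuple[float, Chunk]]:
    """Score every chunk against the query and keep the k most similar."""
    scored = [(cosine(query, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def answer_or_refuse(query: list[float], chunks: list[Chunk], k: int = 3,
                     min_score: float = 0.75) -> dict:
    """Strict mode: refuse unless the best match clears min_score (a hypothetical threshold)."""
    hits = retrieve(query, chunks, k)
    confidence = hits[0][0] if hits else 0.0
    if confidence < min_score:
        return {"refused": True, "answer": "I don't know", "confidence": confidence, "sources": []}
    # Only chunks above the threshold would be passed to the LLM and cited.
    context = [c for score, c in hits if score >= min_score]
    # The LLM call that writes the cited answer from `context` is omitted here.
    return {"refused": False, "answer": None, "confidence": confidence,
            "sources": [c.source for c in context]}
```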

LLM Evaluation & Regression Testing

Automated quality gates for LLM outputs in CI/CD.

Overview

A testing framework that helps catch LLM regressions. It includes JSON schema validation, citation detection, consistency testing, and hallucination guards.
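Two of these checks can be sketched with the standard library alone. The sketch makes two assumptions. First, answers cite sources with numeric markers such as `[1]`. Second, consistency means exact agreement after lowercasing and collapsing whitespace. The marker format and the helper names are illustrative, not the framework's actual interface.

```python
import re
from collections import Counter

# Assumes answers cite sources with numeric markers like "[1]".
CITATION_RE = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, num_sources: int) -> dict:
    """Pass only if the answer cites at least one source and every marker points to a real one."""
    cited = sorted({int(n) for n in CITATION_RE.findall(answer)})
    invalid = [n for n in cited if not 1 <= n <= num_sources]
    return {"cited": cited, "invalid": invalid, "passed": bool(cited) and not invalid}


def consistency_score(outputs: list[str]) -> float:
    """Fraction of runs that agree with the most common (case/whitespace-normalized) output."""
    if not outputs:
        return 0.0
    normalized = [" ".join(o.lower().split()) for o in outputs]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)
```

Exact-match agreement is strict. A fuller suite might compare extracted fields or embeddings instead.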

Tech Stack

FastAPI
Click CLI
Pydantic
JSONSchema
Python 3.11

API Endpoints

POST /runs: Start evaluation run
GET /runs/latest: Get latest run
GET /runs/{id}: Get specific run

Key Features

  • JSON schema validation for structured outputs
  • Citation presence detection
  • Consistency testing (same prompt, multiple runs)
  • Hallucination guard (groundedness scoring)
  • CI/CD integration with fail-on-regression
  • Historical run tracking and comparison
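A fail-on-regression gate in CI/CD typically compares the current run against a stored baseline and exits non-zero on any drop. This is a hypothetical sketch. The metric names, the 0.02 tolerance and the `find_regressions`/`gate` helpers are illustrative, and all metrics are assumed to be higher-is-better.

```python
import sys


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     tolerance: float = 0.02) -> list[str]:
    """Flag every metric that dropped more than `tolerance` below the baseline run.

    Assumes all metrics are higher-is-better scores in [0, 1].
    """
    regressions = []
    for metric, base in baseline.items():
        value = current.get(metric)
        if value is None:
            regressions.append(f"{metric}: missing from current run")
        elif value < base - tolerance:
            regressions.append(f"{metric}: {base:.3f} -> {value:.3f}")
    return regressions


def gate(baseline: dict[str, float], current: dict[str, float]) -> int:
    """Return a process exit code: 0 if no regressions, 1 otherwise."""
    regressions = find_regressions(baseline, current)
    for line in regressions:
        print(f"REGRESSION {line}", file=sys.stderr)
    return 1 if regressions else 0
```

A CI step could call `sys.exit(gate(baseline, current))` after loading both runs, so any regression fails the job.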

Secure AI Gateway

PII redaction, prompt injection defense, rate limiting, and cost tracking.

Overview

A middleware layer that wraps every AI call with security and cost controls: rate limiting, prompt injection detection, PII redaction, and cost estimation.

Tech Stack

FastAPI
Token Bucket
Tiktoken
HTTPX
Python 3.11

API Endpoints

POST /rag/ask: Proxy to RAG service
POST /eval/run: Proxy to Eval service

Key Features

  • Token bucket rate limiting (per-IP or global)
  • Prompt injection detection (system override, jailbreak)
  • PII redaction (email, phone, SSN, credit card)
  • Cost estimation with token counting
  • Structured JSON logging with request_id
  • Request/response validation
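The PII redaction and prompt injection checks above can be approximated with regular expressions. These patterns are simplified, US-centric and hypothetical. Pattern-based injection detection catches only known phrasings, so it is one layer of defense rather than a complete one.

```python
import re

# Simplified, US-centric patterns. Order matters: SSNs and card numbers are
# redacted before the looser phone pattern runs.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),  # 13 to 16 digits
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

# A few known system-override and jailbreak phrasings (illustrative only).
INJECTION_PATTERNS = [
    re.compile(r"ignore (?:all |any )?(?:previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
    re.compile(r"\b(?:reveal|print|show) (?:the |your )?system prompt\b", re.IGNORECASE),
    re.compile(r"\b(?:jailbreak|developer mode)\b", re.IGNORECASE),
]


def redact_pii(text: str) -> tuple[str, dict[str, int]]:
    """Replace each PII match with a [LABEL] placeholder and count redactions per label."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


def looks_like_injection(prompt: str) -> bool:
    """True if any known injection phrasing appears in the prompt."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)
```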

AI Incident Investigation

Timeline reconstruction and root-cause analysis with evidence and human feedback.

Overview

An AI-powered incident investigation system that ingests artifacts (logs, alerts, deploy history), reconstructs timelines, and generates ranked root-cause hypotheses with evidence citations, confidence scoring, and strict refusal when evidence is weak.
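At its core, timeline reconstruction merges timestamped events from every artifact and sorts them chronologically. This sketch assumes log lines of the form `<ISO-8601 timestamp> <message>`. The `Event` type and the helpers are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    kind: str     # "log", "alert", or "deploy"
    source: str   # artifact the event came from, kept for citations
    message: str


def parse_log_line(line: str, source: str) -> Event:
    """Parse an assumed '<ISO-8601 timestamp> <message>' log line."""
    ts, _, message = line.partition(" ")
    return Event(datetime.fromisoformat(ts), "log", source, message.strip())


def build_timeline(*streams: list[Event]) -> list[Event]:
    """Merge events from every artifact and order them chronologically."""
    merged = [event for stream in streams for event in stream]
    # Ties on timestamp are broken by kind, then source, so output is deterministic.
    return sorted(merged, key=lambda e: (e.timestamp, e.kind, e.source))
```

Keeping `source` on each event lets later hypotheses cite the artifact behind each step.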

Tech Stack

FastAPI
ChromaDB
OpenAI
RAG
Python 3.11

API Endpoints

POST /incident/ingest: Ingest case artifacts
POST /incident/analyze: Analyze incident
GET /incident/cases: List cases
POST /incident/cases/{id}/rerun: Rerun analysis

Key Features

  • Timeline reconstruction from logs and alerts
  • Evidence-based hypothesis generation
  • Strict mode refusal when evidence is insufficient
  • Counter-evidence surfacing for balanced analysis
  • "What changed" detection (deploys, configs)
  • Interactive rerun with constraints
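"What changed" detection and strict refusal can be sketched as a time-window filter plus a ranking that refuses when support is thin. The two-hour lookback, the net-evidence count and the threshold of 2 are hypothetical stand-ins for whatever scoring the system actually uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Change:
    timestamp: datetime
    kind: str         # "deploy" or "config"
    description: str


@dataclass
class Hypothesis:
    cause: str
    evidence: list[str] = field(default_factory=list)          # ids of supporting artifacts
    counter_evidence: list[str] = field(default_factory=list)  # ids of contradicting artifacts


def what_changed(changes: list[Change], incident_start: datetime,
                 lookback: timedelta = timedelta(hours=2)) -> list[Change]:
    """Changes inside the lookback window before the incident, most recent first."""
    window_start = incident_start - lookback
    recent = [c for c in changes if window_start <= c.timestamp <= incident_start]
    return sorted(recent, key=lambda c: c.timestamp, reverse=True)


def rank_hypotheses(hypotheses: list[Hypothesis], min_net_evidence: int = 2) -> dict:
    """Rank by supporting minus contradicting evidence; refuse if even the best is weak."""
    def net(h: Hypothesis) -> int:
        return len(h.evidence) - len(h.counter_evidence)

    ranked = sorted(hypotheses, key=net, reverse=True)
    if not ranked or net(ranked[0]) < min_net_evidence:
        return {"status": "insufficient_evidence", "hypotheses": []}
    return {"status": "ok", "hypotheses": ranked}
```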

AI-Assisted DevOps Risk Analysis

Pre-deployment risk scoring, rollout recommendations, and change impact analysis.

Overview

A DevOps system that evaluates deployment risk before changes reach production. It analyzes CI/CD pipelines, deploy events, and historical incidents to produce risk scores, evidence-based explanations, and rollout recommendations.

Tech Stack

FastAPI
ChromaDB
OpenAI
RAG
Python 3.11

API Endpoints

POST /devops/changes/ingest: Ingest a change
POST /devops/changes/analyze: Analyze risk
GET /devops/changes: List changes
GET /devops/changes/{id}: Get change details

Key Features

  • Risk scoring based on historical patterns
  • Similar incident detection and citation
  • Rollout recommendations (canary, feature flag, etc.)
  • Blast radius analysis
  • Change velocity tracking
  • Strict mode for insufficient data
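This kind of risk scoring can be sketched as a weighted sum of normalized signals, mapped to a rollout strategy. Every weight, normalization cap and threshold below is hypothetical, as is the `min_history` cutoff. The point is the shape: refuse without enough history, otherwise score and recommend.

```python
from dataclasses import dataclass


@dataclass
class ChangeSignals:
    files_changed: int
    touches_config: bool
    dependent_services: int      # blast radius proxy
    deploys_last_24h: int        # change velocity
    similar_past_incidents: int  # matches found against incident history


# Hypothetical weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {"size": 0.20, "config": 0.15, "blast_radius": 0.25, "velocity": 0.15, "history": 0.25}


def assess(signals: ChangeSignals, history_size: int, min_history: int = 10) -> dict:
    if history_size < min_history:
        # Strict mode: too little history to score responsibly.
        return {"status": "insufficient_data", "risk": None, "rollout": None}
    # Each factor is normalized to [0, 1] with a hypothetical cap.
    factors = {
        "size": min(signals.files_changed / 50, 1.0),
        "config": 1.0 if signals.touches_config else 0.0,
        "blast_radius": min(signals.dependent_services / 10, 1.0),
        "velocity": min(signals.deploys_last_24h / 5, 1.0),
        "history": min(signals.similar_past_incidents / 3, 1.0),
    }
    risk = round(sum(WEIGHTS[name] * value for name, value in factors.items()), 3)
    if risk >= 0.7:
        rollout = "feature flag + staged canary"
    elif risk >= 0.4:
        rollout = "canary"
    else:
        rollout = "standard rollout"
    return {"status": "ok", "risk": risk, "rollout": rollout, "factors": factors}
```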

AI Solution Architecture Review

Architecture recommendations with tradeoffs — including when AI should NOT be used.

Overview

A decision support system that analyzes problem statements and constraints to recommend appropriate AI architectures. It explicitly recommends non-AI solutions when they fit better, and it refuses to recommend anything when information is insufficient.

Tech Stack

FastAPI
ChromaDB
OpenAI
Pydantic
Python 3.11

API Endpoints

POST /architecture/review: Perform architecture review
GET /architecture/reviews: List past reviews
GET /architecture/reviews/{id}: Get review details
POST /architecture/reviews/{id}/feedback: Submit feedback

Key Features

  • RAG vs fine-tuning vs rules recommendations
  • Constraint-aware analysis (latency, scale, compliance)
  • Strict refusal when information is insufficient
  • Tradeoff and risk analysis
  • Alternative approaches evaluation
  • Human feedback integration
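The decision logic can be illustrated with a small rule table. It checks for missing inputs first and considers a non-AI answer before any model-based one. The field names, the 1,000-example and 500 ms thresholds, and the recommendations are hypothetical, and they cover only a few of the constraints listed above.

```python
# Hypothetical inputs a review needs before it will recommend anything.
REQUIRED = ("rules_expressible", "knowledge_changes_often", "needs_citations",
            "labeled_examples", "latency_budget_ms")


def review(req: dict) -> dict:
    missing = [key for key in REQUIRED if key not in req]
    if missing:
        # Strict refusal: name what is missing instead of guessing.
        return {"status": "insufficient_information", "missing": missing}

    risks: list[str] = []
    if req["rules_expressible"]:
        # Deterministic logic beats a model when the rules can be written down.
        return {"status": "ok", "recommendation": "rules (no AI)", "risks": risks}
    if req["knowledge_changes_often"] or req["needs_citations"]:
        recommendation = "rag"
        if req["latency_budget_ms"] < 500:
            risks.append("retrieval plus generation may exceed the latency budget")
    elif req["labeled_examples"] >= 1000:
        recommendation = "fine-tuning"
        risks.append("retraining is needed whenever the task definition shifts")
    else:
        recommendation = "rag"
        risks.append("little labeled data; start with retrieval and collect examples")
    return {"status": "ok", "recommendation": recommendation, "risks": risks}
```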