System Design

Architecture

The system is a microservices architecture in which a secure gateway is the single entry point for all AI operations.

System Overview

All traffic flows through the Secure AI Gateway for rate limiting, security checks, and observability.

┌─────────────────────────────────────────────────────────────────────────────┐
│                      Applied AI Engineering Portfolio                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│    ┌───────────────┐                                                        │
│    │ Web (Next.js) │                                                        │
│    │   Port 3000   │                                                        │
│    └───────┬───────┘                                                        │
│            │                                                                │
│            ▼                                                                │
│    ┌───────────────────────────────────────────────────────────────┐        │
│    │                   Secure AI Gateway (8000)                    │        │
│    │  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐  │        │
│    │  │   Rate   │  │   PII    │  │  Prompt   │  │    Cost      │  │        │
│    │  │  Limit   │─▶│  Redact  │─▶│ Injection │─▶│  Estimation  │  │        │
│    │  └──────────┘  └──────────┘  └───────────┘  └──────────────┘  │        │
│    └────────────────────────────┬──────────────────────────────────┘        │
│                                 │                                           │
│         ┌───────────────────────┼───────────────────────┐                   │
│         │                       │                       │                   │
│         ▼                       ▼                       ▼                   │
│  ┌─────────────┐         ┌─────────────┐         ┌─────────────┐            │
│  │     RAG     │         │    Eval     │         │  Incident   │            │
│  │   (8001)    │         │   (8002)    │         │   (8003)    │            │
│  │ + ChromaDB  │         │             │         │ + ChromaDB  │            │
│  └─────────────┘         └─────────────┘         └─────────────┘            │
│         │                                               │                   │
│         │                       ┌───────────────────────┘                   │
│         │                       ▼                                           │
│         │                ┌─────────────┐                                    │
│         │                │   DevOps    │                                    │
│         │                │   (8004)    │◀─── Cross-service incident lookup  │
│         │                │ + ChromaDB  │                                    │
│         │                └─────────────┘                                    │
│         │                       │                                           │
│         ▼                       ▼                                           │
│    ┌───────────────────────────────────────────────────────────────┐        │
│    │                     OpenAI API (External)                     │        │
│    │              Embeddings (text-embedding-3-small)              │        │
│    │              Chat Completions (gpt-4o-mini)                   │        │
│    └───────────────────────────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────────────────────────────┘

Request Flow

How requests are processed through the system

1. Web Frontend (Next.js)

The user interacts with the portfolio website. All API calls go to the Secure AI Gateway.

2. Secure AI Gateway

The gateway applies rate limiting, PII redaction, prompt injection detection, and cost estimation, in that order, before routing the request to a backend service.

3. Backend Services

The RAG, Eval, Incident, and DevOps services process requests. RAG, Incident, and DevOps use ChromaDB for vector storage; embeddings and chat completions come from the OpenAI API.

4. Response with Metadata

Responses are enriched with gateway metadata (request_id, latency, cost estimates) before returning to the client.
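The gateway steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the project's actual middleware: the names `RateLimiter`, `handle`, and `GatewayRejection`, the email-only PII pattern, the two injection marker phrases, the four-characters-per-token estimate, and the per-token price are all hypothetical placeholders.

```python
import re
import time
import uuid
from dataclasses import dataclass, field

# All constants below are illustrative assumptions, not values from the project.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")
ASSUMED_USD_PER_1K_TOKENS = 0.001  # placeholder price, not a real rate


class GatewayRejection(Exception):
    """Raised when a request is stopped before reaching a backend service."""


@dataclass
class RateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window_s` seconds."""
    limit: int
    window_s: float
    _hits: dict = field(default_factory=dict)

    def allow(self, client_id: str, now: float) -> bool:
        recent = [t for t in self._hits.get(client_id, []) if now - t < self.window_s]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self._hits[client_id] = recent
        return allowed


def redact_pii(text: str) -> str:
    # Only email addresses are handled here; real PII redaction covers far more.
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def looks_like_injection(text: str) -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)


def estimate_cost_usd(text: str) -> float:
    approx_tokens = max(1, len(text) // 4)  # rough heuristic, not a tokenizer
    return round(approx_tokens / 1000 * ASSUMED_USD_PER_1K_TOKENS, 6)


def handle(client_id: str, prompt: str, backend, limiter: RateLimiter) -> dict:
    """Run the checks in diagram order, call the backend, attach metadata."""
    started = time.monotonic()
    if not limiter.allow(client_id, started):
        raise GatewayRejection("rate limit exceeded")
    clean = redact_pii(prompt)
    if looks_like_injection(clean):
        raise GatewayRejection("possible prompt injection")
    cost = estimate_cost_usd(clean)
    result = backend(clean)  # `backend` stands in for a call to a downstream service
    return {
        "result": result,
        "gateway": {
            "request_id": str(uuid.uuid4()),
            "latency_ms": round((time.monotonic() - started) * 1000, 2),
            "estimated_cost_usd": cost,
        },
    }
```

Calling `handle("client-1", "Summarize this", some_backend, RateLimiter(limit=10, window_s=60.0))` returns the backend result alongside a `gateway` block carrying the three metadata fields named in step 4. Redaction runs before the injection check so that later stages only ever see the cleaned prompt.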

Services

Web Frontend (:3000, Next.js)

User interface for all demos and documentation.

Secure AI Gateway (:8000, FastAPI)

Central entry point with security middleware.

Knowledge Retrieval (:8001, FastAPI + ChromaDB)

RAG system with document chunking and citations.

Evaluation Service (:8002, FastAPI)

LLM testing and regression detection.

Incident Investigation (:8003, FastAPI + ChromaDB)

Root cause analysis with evidence retrieval.

DevOps Risk Analysis (:8004, FastAPI + ChromaDB)

Deployment risk scoring and recommendations.

Design Decisions

Key architectural choices and tradeoffs

Gateway Pattern

All traffic flows through a single gateway, enabling centralized security, observability, and rate limiting without duplicating logic in each service.
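One way to picture the routing half of this pattern is a small lookup from path prefix to backend port. The sketch below assumes path prefixes (`/rag`, `/eval`, `/incident`, `/devops`) and a `localhost` host that are not stated in this document; only the port numbers are taken from the service list. The function name `resolve_upstream` is hypothetical.

```python
# Ports come from the service list; the prefix names are assumed for illustration.
SERVICE_PORTS = {
    "rag": 8001,
    "eval": 8002,
    "incident": 8003,
    "devops": 8004,
}


def resolve_upstream(path: str, host: str = "localhost") -> str:
    """Map a gateway path such as /rag/query to the backend URL it would proxy to."""
    parts = path.strip("/").split("/", 1)
    service = parts[0]
    if service not in SERVICE_PORTS:
        raise KeyError(f"no backend registered for path {path!r}")
    rest = parts[1] if len(parts) > 1 else ""
    return f"http://{host}:{SERVICE_PORTS[service]}/{rest}"
```

Because the table lives in one place, adding a backend means adding one entry here rather than touching security or rate-limiting code in every service.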

Service Isolation

Each AI capability runs as an independent service, with its own data store where it needs one, so services can be scaled and deployed independently.

Evidence-Based AI

All LLM outputs must include evidence citations. In strict mode, the system refuses to answer when confidence is low, which reduces the risk of hallucinated answers reaching production.
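A minimal sketch of such a gate is shown below. How the services actually compute confidence or format refusals is not described here, so the `Answer` type, the `finalize` function, the refusal text, and the 0.6 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

REFUSAL_TEXT = "Not enough evidence to answer."  # assumed wording


@dataclass(frozen=True)
class Answer:
    text: str
    citations: tuple
    refused: bool = False


def finalize(draft: str, citations: list, confidence: float,
             strict: bool = True, min_confidence: float = 0.6) -> Answer:
    """Return the draft only if it is cited and, in strict mode, confident enough."""
    if not citations:
        # An uncited answer is never returned, regardless of mode.
        return Answer(REFUSAL_TEXT, (), refused=True)
    if strict and confidence < min_confidence:
        # Strict mode: refuse rather than return a low-confidence answer.
        return Answer(REFUSAL_TEXT, tuple(citations), refused=True)
    return Answer(draft, tuple(citations))
```

With `strict=False` a cited but low-confidence draft is still returned, which may suit exploratory use; an uncited draft is refused in both modes.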

Local-First Development

Docker Compose brings up the full stack for local development. ChromaDB provides vector storage locally, so no hosted vector database is needed; the OpenAI API is the only external dependency.

Explore the Systems

See these architectural patterns in action through the live demos.