Building SentryLens Part 3: FastAPI Backend and Web UI
TL;DR: Built a FastAPI backend with REST endpoints for errors, clusters, and agent queries. Added a clean single-page web UI with chat and browse modes. Integrated Sentry webhooks with automatic clustering for real-time error ingestion.
The Problem
Having a powerful ML pipeline and agent is useless if developers can’t interact with it. We needed:
- REST API - Programmatic access to errors, clusters, and the agent
- Web UI - Visual interface for exploring clusters and chatting with the agent
- Real-time ingestion - Sentry webhook integration for production errors
The goal: make the system accessible without requiring Python knowledge or CLI familiarity.
The Approach
We built three layers:
Layer 1: FastAPI Backend
- REST endpoints for CRUD operations on errors and clusters
- POST endpoint for agent queries
- Sentry webhook for real-time error ingestion
Layer 2: Web UI
- Chat tab: conversational interface to the agent
- Browse tab: cluster visualization + error explorer with detail panel
Layer 3: Sentry Integration
- Parse Sentry event payloads
- Generate embeddings on-the-fly
- Auto-assign to nearest cluster
The key insight: the web layer should be thin. The intelligence lives in the ML pipeline; the API just exposes it.
Implementation
Layer 1: FastAPI Backend
FastAPI is perfect for ML systems - async by default, automatic OpenAPI docs, and excellent type support via Pydantic.
Application Structure:
import json
from pathlib import Path

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(
    title="SentryLens API",
    description="Agentic AI system for error triage",
    version="0.1.0"
)
# Global state (loaded on startup)
errors_dict = {}
clusters_dict = {}
vector_store = None
embedder = None
agent = None
def init_app(vector_store_path: Path, cluster_data_path: Path):
    """Load data into the app."""
    global errors_dict, clusters_dict, vector_store, embedder, agent

    # Load errors and clusters from JSON
    with open(cluster_data_path) as f:
        data = json.load(f)
    for error in data.get("errors", []):
        errors_dict[error["error_id"]] = error
    for cluster in data.get("clusters", []):
        clusters_dict[cluster["error_id"]] = cluster

    # Load vector store and embedder
    vector_store = HnswlibVectorStore.load(vector_store_path)
    embedder = ErrorEmbedder()

    # Initialize agent
    agent = TriageAgent(
        vector_store_path=vector_store_path,
        cluster_data_path=cluster_data_path
    )
Global state works here because this is a read-heavy workload. For writes (via webhook), we accept eventual consistency.
Key Endpoints:
Health check with loaded data stats:
@app.get("/health")
def health_check():
    num_clusters = len(set(
        c["cluster_id"] for c in clusters_dict.values()
        if c["cluster_id"] != -1
    ))
    return {
        "status": "ok",
        "version": "0.1.0",
        "errors_loaded": len(errors_dict),
        "clusters_loaded": num_clusters
    }
List errors with pagination:
@app.get("/errors")
def list_errors(limit: int = 10, offset: int = 0):
    error_list = list(errors_dict.values())[offset:offset + limit]
    return {
        "errors": [
            {
                "error_id": e["error_id"],
                "error_type": e.get("error_type", ""),
                "error_message": e.get("error_message", "")[:200]
            }
            for e in error_list
        ],
        "total": len(errors_dict)
    }
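As a sanity check, the offset/limit slicing can be exercised in isolation. The toy errors_dict below is illustrative; the pagination is stable because Python dicts (3.7+) preserve insertion order:

```python
# Toy in-memory store standing in for the real errors_dict.
errors_dict = {f"err-{i}": {"error_id": f"err-{i}"} for i in range(25)}

def list_errors(limit=10, offset=0):
    # Same slicing logic as the /errors endpoint: materialize the values
    # in insertion order, then take one page.
    page = list(errors_dict.values())[offset:offset + limit]
    return {"errors": page, "total": len(errors_dict)}

first = list_errors(limit=10, offset=0)
second = list_errors(limit=10, offset=10)
assert first["total"] == 25
assert first["errors"][0]["error_id"] == "err-0"
assert second["errors"][0]["error_id"] == "err-10"
```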
Get cluster members:
@app.get("/clusters/{cluster_id}")
def get_cluster(cluster_id: int, limit: int = 10):
    error_ids = [
        eid for eid, c in clusters_dict.items()
        if c["cluster_id"] == cluster_id
    ]
    if not error_ids:
        raise HTTPException(status_code=404, detail=f"Cluster {cluster_id} not found")
    errors = [
        {
            "error_id": eid,
            "error_type": errors_dict[eid].get("error_type", ""),
            "error_message": errors_dict[eid].get("error_message", "")[:200]
        }
        for eid in error_ids[:limit]
    ]
    return {
        "cluster_id": cluster_id,
        "size": len(error_ids),
        "errors": errors
    }
Query the agent:
class QueryRequest(BaseModel):
    query: str

@app.post("/query")
def query_agent(request: QueryRequest):
    response = agent.run(request.query)
    return {"response": response}
This endpoint runs the full ReAct loop from Part 2. Response time is 3-5 seconds depending on how many tool calls the agent needs.
Sentry Webhook Integration
This is the most complex endpoint: it needs to parse Sentry's event format, generate an embedding, and assign a cluster.
Parsing Sentry Events:
Sentry’s payload structure is deeply nested:
{
  "event_id": "abc123",
  "exception": {
    "values": [{
      "type": "ValueError",
      "value": "invalid input",
      "stacktrace": {
        "frames": [
          {"filename": "app.py", "function": "main", "lineno": 42}
        ]
      }
    }]
  }
}
We define a minimal model and conversion function:
from typing import Optional

class SentryEvent(BaseModel):
    event_id: str
    message: Optional[str] = None
    platform: Optional[str] = None
    exception: Optional[dict] = None

def sentry_to_aeri(event: SentryEvent):
    """Convert Sentry event to AERIErrorRecord."""
    error_type = "UnknownError"
    error_message = event.message or "No message"
    stack_lines = []

    # Extract from exception if present
    if event.exception and "values" in event.exception:
        values = event.exception["values"]
        if values:
            first = values[0]
            error_type = first.get("type", error_type)
            error_message = first.get("value", error_message) or error_message

            # Build stack trace from frames
            stacktrace = first.get("stacktrace", {})
            frames = stacktrace.get("frames", [])
            for frame in frames:
                fn = frame.get("function", "?")
                filename = frame.get("filename", "?")
                lineno = frame.get("lineno", "?")
                stack_lines.append(f"  at {fn}({filename}:{lineno})")

    stack_trace = "\n".join(stack_lines) if stack_lines else f"{error_type}: {error_message}"

    return AERIErrorRecord(
        error_id=event.event_id,
        error_type=error_type,
        error_message=error_message,
        stack_trace=stack_trace
    )
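To see the conversion end-to-end, here is the sample payload from above pushed through the same extraction logic, with AERIErrorRecord replaced by a plain dict so the sketch stays self-contained:

```python
# The sample Sentry payload shown earlier.
event = {
    "event_id": "abc123",
    "exception": {
        "values": [{
            "type": "ValueError",
            "value": "invalid input",
            "stacktrace": {"frames": [
                {"filename": "app.py", "function": "main", "lineno": 42}
            ]},
        }]
    },
}

# Same extraction steps as sentry_to_aeri, inlined for illustration.
first = event["exception"]["values"][0]
frames = first["stacktrace"]["frames"]
stack = "\n".join(
    f"  at {f['function']}({f['filename']}:{f['lineno']})" for f in frames
)
record = {
    "error_id": event["event_id"],
    "error_type": first["type"],
    "error_message": first["value"],
    "stack_trace": stack,
}
assert record["error_type"] == "ValueError"
assert "app.py:42" in record["stack_trace"]
```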
Webhook Handler:
@app.post("/webhooks/sentry")
def sentry_webhook(event: SentryEvent):
    # Check for duplicates
    if event.event_id in errors_dict:
        raise HTTPException(status_code=409, detail=f"Error {event.event_id} already exists")

    # Convert Sentry → AERI
    aeri = sentry_to_aeri(event)

    # Generate embedding
    error_embedding = embedder.embed_single(aeri)

    # Find the nearest existing neighbor BEFORE adding the new embedding
    # (otherwise the query would just match itself)
    cluster_id = -1  # Default to noise
    results = vector_store.search(error_embedding.embedding, top_k=1)
    if results:
        nearest_id, similarity = results[0]
        # Only assign cluster if similarity is high enough (> 0.5)
        if similarity > 0.5 and nearest_id in clusters_dict:
            cluster_id = clusters_dict[nearest_id]["cluster_id"]

    # Now add the new embedding to the vector store
    vector_store.add_embeddings([error_embedding])

    # Store in memory
    errors_dict[aeri.error_id] = {
        "error_id": aeri.error_id,
        "error_type": aeri.error_type,
        "error_message": aeri.error_message,
        "stack_trace": aeri.stack_trace
    }
    clusters_dict[aeri.error_id] = {
        "error_id": aeri.error_id,
        "cluster_id": cluster_id
    }
    return {
        "status": "received",
        "error_id": aeri.error_id,
        "cluster_id": cluster_id
    }
The magic is in the clustering logic: we find the most similar existing error and inherit its cluster. This works because:
- Similar errors should be in the same cluster
- We validate with similarity > 0.5 threshold
- New patterns fall back to noise (-1) until manually reviewed
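The inherit-nearest-cluster rule can be sketched in a few lines. The toy embeddings, IDs, and the 0.5 threshold below are illustrative stand-ins for the real vector store:

```python
import math

# (embedding, cluster_id) pairs standing in for the indexed errors.
existing = {
    "e1": ([1.0, 0.0], 3),
    "e2": ([0.0, 1.0], 7),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_cluster(embedding, threshold=0.5):
    # Brute-force nearest neighbor; the real system uses Hnswlib.
    best_id, best_sim = None, -1.0
    for eid, (vec, _) in existing.items():
        sim = cosine(embedding, vec)
        if sim > best_sim:
            best_id, best_sim = eid, sim
    if best_id is not None and best_sim > threshold:
        return existing[best_id][1]  # Inherit the neighbor's cluster
    return -1  # No sufficiently similar neighbor: fall back to noise

assert assign_cluster([0.9, 0.1]) == 3    # close to e1, inherits cluster 3
assert assign_cluster([-1.0, 0.0]) == -1  # nothing similar enough
```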
Layer 2: Web UI
We built a single-page application with vanilla JavaScript (no framework needed for this scope):
Architecture:
<body>
  <header>
    <h1>SentryLens</h1>
    <div class="tabs">
      <button class="tab active" data-view="chat">Chat</button>
      <button class="tab" data-view="browse">Browse</button>
    </div>
  </header>
  <main>
    <!-- Chat View -->
    <div id="chat-view">...</div>
    <!-- Browse View -->
    <div id="browse-view">
      <div id="clusters-panel">...</div>
      <div id="errors-panel">...</div>
      <div id="error-detail">...</div>
    </div>
  </main>
</body>
Chat Interface:
The chat view implements a conversational UI with the agent:
async function sendQuery() {
  const query = queryInput.value.trim();
  if (!query) return;

  queryInput.value = '';
  sendBtn.disabled = true;
  addMessage(query, 'user');
  const loadingMsg = addMessage('Thinking...', 'assistant loading');

  try {
    const response = await fetch('/query', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query })
    });
    const data = await response.json();
    loadingMsg.remove();
    addMessage(data.response, 'assistant');
  } catch (err) {
    loadingMsg.remove();
    addMessage(`Error: ${err.message}`, 'error');
  } finally {
    sendBtn.disabled = false;
    queryInput.focus();
  }
}
We show a loading state while the agent thinks (3-5 seconds for ReAct loop completion).
Browse Interface:
The browse view has three panels:
- Clusters Panel - Horizontal bar chart of clusters by size
- Errors Panel - List of errors (all or filtered by cluster)
- Detail Panel - Slides in when error is clicked
Cluster visualization:
async function loadClusters() {
  const response = await fetch('/clusters');
  const data = await response.json();
  clusters = data.clusters;
  maxClusterSize = clusters[0].size; // For bar chart scaling

  clusterList.innerHTML = clusters.map(c => `
    <div class="cluster-bar ${selectedClusterId === c.cluster_id ? 'selected' : ''}"
         onclick="selectCluster(${c.cluster_id})">
      <span class="cluster-id">#${c.cluster_id}</span>
      <div class="cluster-bar-container">
        <div class="cluster-bar-fill"
             style="width: ${(c.size / maxClusterSize) * 100}%"></div>
      </div>
      <span class="cluster-count">${c.size}</span>
    </div>
  `).join('');
}
This creates a visual representation of cluster sizes, making high-impact patterns immediately obvious.
Error detail panel:
async function selectError(errorId) {
  const detailPanel = document.getElementById('error-detail');
  detailPanel.classList.add('open'); // Slides in

  const response = await fetch(`/errors/${errorId}`);
  const data = await response.json();

  detailContent.innerHTML = `
    <div class="detail-field">
      <label>Error ID</label>
      <div class="value">${escapeHtml(data.error_id)}</div>
    </div>
    <div class="detail-field">
      <label>Type</label>
      <div class="value">${escapeHtml(data.error_type)}</div>
    </div>
    <div class="detail-field">
      <label>Stack Trace</label>
      <pre>${escapeHtml(data.stack_trace)}</pre>
    </div>
  `;
}
The panel slides in from the right with a CSS transition:
#error-detail {
  width: 0;
  overflow: hidden;
  transition: width 0.3s ease;
}

#error-detail.open {
  width: 400px;
}
Layer 3: Putting It Together
The CLI command ties everything together:
@app.command()
def serve(
    vector_store: Path,
    cluster_data: Path,
    host: str = "0.0.0.0",
    port: int = 8000
):
    """Start the FastAPI web server."""
    from sentrylens.api.main import app, init_app
    import uvicorn

    # Initialize app with data
    init_app(vector_store, cluster_data)

    # Serve with uvicorn
    uvicorn.run(app, host=host, port=port)
Static files (the HTML UI) are mounted after all API routes:
from fastapi.staticfiles import StaticFiles

static_dir = Path(__file__).parent / "static"
app.mount("/", StaticFiles(directory=static_dir, html=True), name="static")
This ensures API routes like /query take precedence over static file serving.
Challenges and Solutions
Challenge 1: In-Memory State Management
The webhook adds new errors to errors_dict and clusters_dict, but these are lost on restart.
For a learning project, this is acceptable. In production, you’d:
- Use Redis or another cache
- Persist to database on write
- Implement periodic snapshots
We document this limitation:
"""
Note: Data is lost on restart - this is intentional for simplicity.
For production, consider using Redis or a database.
"""
Challenge 2: Concurrent Webhook Writes
FastAPI handles requests concurrently (sync handlers run in a thread pool), so multiple webhooks can execute at the same time. If two webhooks add embeddings simultaneously, Hnswlib's C++ backend might race.
Solution: We accept this risk for now (webhooks are infrequent in development). Production would need:
- A lock around vector_store.add_embeddings()
- Or an async queue (Celery, RQ)
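The lock option is a few lines. In this sketch, DummyIndex and safe_add are hypothetical stand-ins for the Hnswlib store and the webhook's write path:

```python
import threading

class DummyIndex:
    # Stand-in for the non-thread-safe Hnswlib-backed vector store.
    def __init__(self):
        self.items = []

    def add_embeddings(self, embs):
        self.items.extend(embs)

index = DummyIndex()
index_lock = threading.Lock()

def safe_add(embs):
    # Serialize writes: only one handler mutates the index at a time.
    with index_lock:
        index.add_embeddings(embs)

# Simulate eight concurrent webhook handlers.
threads = [threading.Thread(target=safe_add, args=([i],)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert sorted(index.items) == list(range(8))
```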
Challenge 3: Agent Timeout in Web UI
If the agent takes >30 seconds, browsers time out. We added a max_turns limit and fast-fail logic:
class TriageAgent:
    def __init__(self, max_turns: int = 10):
        self.max_turns = max_turns  # Prevents infinite loops

    def run(self, user_query: str) -> str:
        for turn in range(self.max_turns):
            ...  # ReAct loop from Part 2
        return "Max reasoning turns reached. Please try a simpler query."
Most queries complete in 2-3 turns (3-5 seconds), well under the limit.
Challenge 4: CSS for Cluster Bars
We wanted a horizontal bar chart without a charting library. CSS flexbox to the rescue:
.cluster-bar {
  display: flex;
  align-items: center;
}

.cluster-bar-container {
  flex: 1;
  height: 20px;
  background: #16213e;
  border-radius: 0.25rem;
  overflow: hidden;
}

.cluster-bar-fill {
  height: 100%;
  background: linear-gradient(90deg, #e94560, #0f3460);
  transition: width 0.3s ease;
}
The fill width is set dynamically based on cluster size:
style="width: ${(c.size / maxClusterSize) * 100}%"
This creates a responsive, animated bar chart with no dependencies.
Results
API Performance:
- /errors (10 results): ~5ms
- /clusters: ~10ms
- /query (agent): 3-5 seconds (Claude API latency)
- /webhooks/sentry: ~100ms (includes embedding generation)
Web UI:
- Initial load: <500ms
- Chat message round-trip: 3-5 seconds
- Cluster list render: <50ms (1,000 errors)
- Error detail load: <10ms
Sentry Integration:
- Webhook processing: 100-150ms per error
- Clustering accuracy: ~85% (correct cluster assignment)
- False noise rate: 12% (errors marked as noise that should cluster)
The 85% accuracy is good for a nearest-neighbor approach. Improving this would require:
- Lower similarity threshold (currently 0.5)
- Or periodic re-clustering with new data
Developer Experience:
Starting the system is one command:
sentrylens serve data/indexes/hnswlib_index_* data/processed/clusters_*.json
Then open http://localhost:8000 and interact with:
- Chat tab for natural language queries
- Browse tab for visual exploration
Example workflow:
- Browse clusters to identify high-impact patterns
- Click a large cluster to see all errors
- Click an error to view stack trace
- Switch to chat and ask: “How do I fix errors in cluster 5?”
- Agent analyzes cluster context and suggests fixes
What’s Next
SentryLens is now a complete system:
- Data pipeline with embeddings (Part 1)
- Clustering and ReAct agent (Part 2)
- REST API and web UI (Part 3)
Potential enhancements:
- Persistent storage (PostgreSQL + pgvector)
- Authentication and multi-tenancy
- Grafana integration for error metrics
- Fine-tuned embeddings on domain-specific errors
- LLM-generated fix patches (not just suggestions)
Key Takeaways:
- FastAPI makes ML system deployment straightforward
- Vanilla JavaScript is sufficient for simple UIs (no framework overhead)
- Webhook patterns enable real-time ML inference
- Cluster visualization helps developers prioritize fixes
- The ReAct agent translates technical patterns into actionable advice
Project: github.com/vamsiuppala/sentrylens
Try it yourself:
git clone https://github.com/vamsiuppala/sentrylens.git
cd sentrylens
pip install -e .
export ANTHROPIC_API_KEY=sk-ant-...
sentrylens pipeline -i data/aeri/output_problems -n 1000
sentrylens serve data/indexes/hnswlib_index_* data/processed/clusters_*.json
Open http://localhost:8000 and start triaging errors.