SentryLens - An attempt at developing an agentic AI system for error triage

This is the index for a series of technical blog posts documenting the development of SentryLens, an agentic AI system for error triage.

Series Overview

This three-part series walks through building a production-grade ML pipeline that combines semantic embeddings, clustering, and LLM agents to help developers triage errors at scale.

Part 1: Data Pipeline and Semantic Embeddings

Topics Covered:

Loading and validating Eclipse AERI error data with Pydantic
Generating semantic embeddings with sentence-transformers
Building a fast vector index with Hnswlib for similarity search
Handling nested JSON formats and data normalization

Key Takeaways:

Contract-first approach with Pydantic schemas
Batched embedding for throughput (batch sizes of 32-64)
HNSW graphs for sub-3ms search latency
Domain-aware text preparation (first 10 stack frames)
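
The text-preparation and batching takeaways above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the record keys (exception_type, message, stack_frames) and the helper names prepare_text and batched are assumptions made for the example. Each batch would then be handed to the embedding model.

```python
from typing import Iterator

MAX_FRAMES = 10  # the series keeps the first 10 stack frames


def prepare_text(error: dict, max_frames: int = MAX_FRAMES) -> str:
    """Build the string that gets embedded for one error record.

    The keys used here are hypothetical; the real schema may differ.
    """
    frames = error.get("stack_frames", [])[:max_frames]
    header = f'{error.get("exception_type", "UnknownError")}: {error.get("message", "")}'.strip()
    return "\n".join([header, *frames])


def batched(items: list[str], size: int = 32) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Toy input: 70 copies of one record with 25 stack frames.
errors = [
    {
        "exception_type": "java.lang.NullPointerException",
        "message": "",
        "stack_frames": [f"org.example.Frame{i}.run" for i in range(25)],
    }
] * 70

texts = [prepare_text(e) for e in errors]
batches = list(batched(texts, size=32))
print(len(texts[0].splitlines()), [len(b) for b in batches])
```

Truncating to the top frames keeps the embedded text focused on where the failure surfaced, and fixed-size batches keep memory use predictable during encoding.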

Metrics:

1,000 errors processed in ~30 seconds
99.8% validation success rate
Sub-3ms search for top-5 similar errors

Part 2: Clustering and the ReAct Agent

Topics Covered:

HDBSCAN density-based clustering for automatic error grouping
Building a ReAct agent with Claude’s native tool_use
Implementing three tools: search, analyze, suggest
Combining cluster context with LLM reasoning

Key Takeaways:

HDBSCAN automatically detects cluster count
ReAct pattern for agentic reasoning
Tool responses should be concise and structured
Cluster size is a proxy for fix priority (larger clusters cover more error reports)
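
As a rough sketch of the tool-execution half of such an agent loop, the snippet below maps a tool_use block to a tool_result block and keeps every payload small and structured. The three tool names come from the list above; the descriptions, parameters, toy data, and the run_tool helper are assumptions for illustration, and no API call is made here.

```python
import json

# Tool definitions in the name / description / input_schema shape used for
# Claude tool use. Descriptions and parameters are hypothetical.
TOOLS = [
    {
        "name": name,
        "description": desc,
        "input_schema": {
            "type": "object",
            "properties": {param: {"type": ptype}},
            "required": [param],
        },
    }
    for name, desc, param, ptype in [
        ("search", "Find errors similar to a query.", "query", "string"),
        ("analyze", "Summarize one cluster.", "cluster_id", "integer"),
        ("suggest", "Propose a fix for one cluster.", "cluster_id", "integer"),
    ]
]

CLUSTERS = {7: {"size": 120, "top_exception": "NullPointerException"}}  # toy data


def search(query: str) -> dict:
    # Stub: a real implementation would query the vector index.
    return {"matches": [{"error_id": "e1", "cluster_id": 7, "score": 0.91}]}


def analyze(cluster_id: int) -> dict:
    return {"cluster_id": cluster_id, **CLUSTERS[cluster_id]}


def suggest(cluster_id: int) -> dict:
    return {"cluster_id": cluster_id, "suggestion": "Add a null check before dereferencing."}


HANDLERS = {"search": search, "analyze": analyze, "suggest": suggest}


def run_tool(tool_use: dict) -> dict:
    """Turn one tool_use block into the matching tool_result block."""
    handler = HANDLERS.get(tool_use["name"])
    if handler is None:
        payload, is_error = {"error": f"unknown tool {tool_use['name']}"}, True
    else:
        try:
            payload, is_error = handler(**tool_use["input"]), False
        except (KeyError, TypeError) as exc:
            payload, is_error = {"error": str(exc)}, True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": json.dumps(payload),
        "is_error": is_error,
    }


result = run_tool(
    {"type": "tool_use", "id": "toolu_01", "name": "analyze", "input": {"cluster_id": 7}}
)
print(result["content"])
```

Returning errors as structured results, rather than raising, lets the model see what went wrong and retry with different arguments.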

Metrics:

1,000 errors → 32 clusters + 8% noise
Average query: 2-3 tool calls
92% success rate on user queries
3-5 second response time
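
Figures like the cluster count and noise share can be derived directly from the label array that HDBSCAN produces, where -1 marks noise. The following is a small sketch on toy labels (not the project's data), and the summarize helper is a hypothetical name; it also ranks clusters by size, which is one simple way to read size as fix priority.

```python
from collections import Counter

NOISE = -1  # HDBSCAN labels unclustered points as -1


def summarize(labels: list[int]) -> dict:
    """Count clusters, compute the noise share, and rank clusters by size."""
    counts = Counter(labels)
    noise = counts.pop(NOISE, 0)
    return {
        "n_clusters": len(counts),
        "noise_fraction": noise / len(labels) if labels else 0.0,
        "by_priority": counts.most_common(),  # largest cluster first
    }


labels = [0, 0, 0, 1, 1, 2, 2, 2, 2, -1]  # toy labels for ten errors
summary = summarize(labels)
print(summary)
```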

Part 3: FastAPI Backend and Web UI

Topics Covered:

FastAPI REST endpoints for errors, clusters, and agent queries
Single-page web UI with chat and browse modes
Sentry webhook integration with auto-clustering
Cluster visualization with CSS-only bar charts

Key Takeaways:

FastAPI for ML system deployment
Vanilla JavaScript for simple UIs
Webhook patterns for real-time inference
Nearest-neighbor lookup for assigning new errors to existing clusters
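
One way to picture the nearest-neighbor assignment for incoming webhook errors is the brute-force sketch below. It is an illustration under stated assumptions: the real system would presumably query the Hnswlib index instead of scanning every vector, and the assign_cluster helper and the 0.8 similarity threshold are hypothetical choices.

```python
import math

NOISE = -1  # same label HDBSCAN uses for unclustered points


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_cluster(new_vec, indexed, min_similarity: float = 0.8) -> int:
    """Return the label of the most similar indexed error.

    Falls back to NOISE when the best match is below `min_similarity`.
    `indexed` is a list of (embedding, cluster_label) pairs.
    """
    best_label, best_sim = NOISE, -1.0
    for vec, label in indexed:
        sim = cosine(new_vec, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= min_similarity else NOISE


# Toy 2-D embeddings with their cluster labels.
indexed = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1)]

near = assign_cluster([0.95, 0.05], indexed)   # close to cluster 0
far = assign_cluster([-1.0, -1.0], indexed)    # far from everything
print(near, far)
```

The threshold keeps a genuinely new kind of error from being forced into an unrelated cluster; such errors can wait for the next full clustering run.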

Metrics:

API latency: 5-10ms for CRUD, 3-5s for agent
Webhook processing: 100-150ms per error
85% clustering accuracy on new errors
UI load time: <500ms

Project Links

Repository: github.com/vamsiuppala/sentrylens
Documentation: See main README.md in project root

Stack

ML/AI: sentence-transformers, Hnswlib, HDBSCAN, Claude API
Backend: FastAPI, Uvicorn, Pydantic
Frontend: Vanilla JavaScript, HTML5, CSS3
Data: Eclipse AERI dataset, Sentry webhooks

Quick Start

# Clone and setup
git clone https://github.com/vamsiuppala/sentrylens.git
cd sentrylens
pip install -e .
export ANTHROPIC_API_KEY=sk-ant-...

# Run full pipeline
sentrylens pipeline -i data/aeri/output_problems -n 1000

# Start web server
sentrylens serve data/indexes/hnswlib_index_* data/processed/clusters_*.json

# Open http://localhost:8000