TL;DR: Built a FastAPI backend with REST endpoints for errors, clusters, and agent queries. Added a clean single-page web UI with chat and browse modes. Integrated Sentry webhooks with automatic clustering for real-time error ingestion.

The Problem

Having a powerful ML pipeline and agent is useless if developers can’t interact with it. We needed:

  1. REST API - Programmatic access to errors, clusters, and the agent
  2. Web UI - Visual interface for exploring clusters and chatting with the agent
  3. Real-time ingestion - Sentry webhook integration for production errors

The goal: make the system accessible without requiring Python knowledge or CLI familiarity.

The Approach

We built three layers:

Layer 1: FastAPI Backend

  • REST endpoints for CRUD operations on errors and clusters
  • POST endpoint for agent queries
  • Sentry webhook for real-time error ingestion

Layer 2: Web UI

  • Chat tab: conversational interface to the agent
  • Browse tab: cluster visualization + error explorer with detail panel

Layer 3: Sentry Integration

  • Parse Sentry event payloads
  • Generate embeddings on-the-fly
  • Auto-assign to nearest cluster

The key insight: the web layer should be thin. The intelligence lives in the ML pipeline; the API just exposes it.

Implementation

Layer 1: FastAPI Backend

FastAPI is perfect for ML systems - async by default, automatic OpenAPI docs, and excellent type support via Pydantic.

Application Structure:

import json
from pathlib import Path
from typing import Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(
    title="SentryLens API",
    description="Agentic AI system for error triage",
    version="0.1.0"
)

# Global state (loaded on startup)
errors_dict = {}
clusters_dict = {}
vector_store = None
embedder = None
agent = None

def init_app(vector_store_path: Path, cluster_data_path: Path):
    """Load data into the app."""
    global errors_dict, clusters_dict, vector_store, embedder, agent

    # Load errors and clusters from JSON
    with open(cluster_data_path) as f:
        data = json.load(f)
        for error in data.get("errors", []):
            errors_dict[error["error_id"]] = error
        # One record per error: {"error_id": ..., "cluster_id": ...}
        for assignment in data.get("clusters", []):
            clusters_dict[assignment["error_id"]] = assignment

    # Load vector store and embedder
    vector_store = HnswlibVectorStore.load(vector_store_path)
    embedder = ErrorEmbedder()

    # Initialize agent
    agent = TriageAgent(
        vector_store_path=vector_store_path,
        cluster_data_path=cluster_data_path
    )

Global state works here because this is a read-heavy workload. For writes (via the webhook), we accept weaker guarantees: new errors live only in memory and are lost on restart.

Key Endpoints:

Health check with loaded data stats:

@app.get("/health")
def health_check():
    num_clusters = len(set(
        c["cluster_id"] for c in clusters_dict.values()
        if c["cluster_id"] != -1
    ))

    return {
        "status": "ok",
        "version": "0.1.0",
        "errors_loaded": len(errors_dict),
        "clusters_loaded": num_clusters
    }

List errors with pagination:

@app.get("/errors")
def list_errors(limit: int = 10, offset: int = 0):
    error_list = list(errors_dict.values())[offset:offset + limit]
    return {
        "errors": [
            {
                "error_id": e["error_id"],
                "error_type": e.get("error_type", ""),
                "error_message": e.get("error_message", "")[:200]
            }
            for e in error_list
        ],
        "total": len(errors_dict)
    }

Get cluster members:

@app.get("/clusters/{cluster_id}")
def get_cluster(cluster_id: int, limit: int = 10):
    error_ids = [
        eid for eid, c in clusters_dict.items()
        if c["cluster_id"] == cluster_id
    ]

    if not error_ids:
        raise HTTPException(status_code=404, detail=f"Cluster {cluster_id} not found")

    errors = [
        {
            "error_id": eid,
            "error_type": errors_dict[eid].get("error_type", ""),
            "error_message": errors_dict[eid].get("error_message", "")[:200]
        }
        for eid in error_ids[:limit]
    ]

    return {
        "cluster_id": cluster_id,
        "size": len(error_ids),
        "errors": errors
    }

Query the agent:

class QueryRequest(BaseModel):
    query: str

@app.post("/query")
def query_agent(request: QueryRequest):
    response = agent.run(request.query)
    return {"response": response}

This endpoint runs the full ReAct loop from Part 2. Response time is 3-5 seconds depending on how many tool calls the agent needs.

Sentry Webhook Integration

The most complex endpoint - it needs to parse Sentry’s event format, generate embeddings, and assign clusters:

Parsing Sentry Events:

Sentry’s payload structure is deeply nested:

{
  "event_id": "abc123",
  "exception": {
    "values": [{
      "type": "ValueError",
      "value": "invalid input",
      "stacktrace": {
        "frames": [
          {"filename": "app.py", "function": "main", "lineno": 42}
        ]
      }
    }]
  }
}

We define a minimal model and conversion function:

class SentryEvent(BaseModel):
    event_id: str
    message: Optional[str] = None
    platform: Optional[str] = None
    exception: Optional[dict] = None

def sentry_to_aeri(event: SentryEvent):
    """Convert Sentry event to AERIErrorRecord."""
    error_type = "UnknownError"
    error_message = event.message or "No message"
    stack_lines = []

    # Extract from exception if present
    if event.exception and "values" in event.exception:
        values = event.exception["values"]
        if values:
            first = values[0]
            error_type = first.get("type", error_type)
            error_message = first.get("value", error_message) or error_message

            # Build stack trace from frames
            stacktrace = first.get("stacktrace", {})
            frames = stacktrace.get("frames", [])
            for frame in frames:
                fn = frame.get("function", "?")
                filename = frame.get("filename", "?")
                lineno = frame.get("lineno", "?")
                stack_lines.append(f"  at {fn}({filename}:{lineno})")

    stack_trace = "\n".join(stack_lines) if stack_lines else f"{error_type}: {error_message}"

    return AERIErrorRecord(
        error_id=event.event_id,
        error_type=error_type,
        error_message=error_message,
        stack_trace=stack_trace
    )
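A quick self-contained check of this conversion against the sample payload above - here SentryEvent and AERIErrorRecord are stubbed as plain dataclasses (the real ones are Pydantic models):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SentryEvent:          # stub of the Pydantic model
    event_id: str
    message: Optional[str] = None
    exception: Optional[dict] = None

@dataclass
class AERIErrorRecord:      # stub of the real record type
    error_id: str
    error_type: str
    error_message: str
    stack_trace: str

def sentry_to_aeri(event: SentryEvent) -> AERIErrorRecord:
    error_type = "UnknownError"
    error_message = event.message or "No message"
    stack_lines = []
    if event.exception and "values" in event.exception:
        values = event.exception["values"]
        if values:
            first = values[0]
            error_type = first.get("type", error_type)
            error_message = first.get("value", error_message) or error_message
            for frame in first.get("stacktrace", {}).get("frames", []):
                fn = frame.get("function", "?")
                filename = frame.get("filename", "?")
                lineno = frame.get("lineno", "?")
                stack_lines.append(f"  at {fn}({filename}:{lineno})")
    stack_trace = "\n".join(stack_lines) if stack_lines else f"{error_type}: {error_message}"
    return AERIErrorRecord(event.event_id, error_type, error_message, stack_trace)

event = SentryEvent(
    event_id="abc123",
    exception={"values": [{
        "type": "ValueError",
        "value": "invalid input",
        "stacktrace": {"frames": [
            {"filename": "app.py", "function": "main", "lineno": 42}
        ]},
    }]},
)
record = sentry_to_aeri(event)
print(record.error_type)   # ValueError
print(record.stack_trace)  # "  at main(app.py:42)"
```

Note how the top-level `message` is only a fallback - the exception's `type`/`value` win whenever they are present.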

Webhook Handler:

@app.post("/webhooks/sentry")
def sentry_webhook(event: SentryEvent):
    # Check for duplicates
    if event.event_id in errors_dict:
        raise HTTPException(status_code=409, detail=f"Error {event.event_id} already exists")

    # Convert Sentry → AERI
    aeri = sentry_to_aeri(event)

    # Generate the embedding, then search BEFORE adding it to the
    # store - otherwise the nearest neighbor is the new error itself
    error_embedding = embedder.embed_single(aeri)

    # Find nearest existing neighbor and inherit its cluster
    cluster_id = -1  # Default to noise
    results = vector_store.search(error_embedding.embedding, top_k=1)
    if results:
        nearest_id, similarity = results[0]
        # Only assign cluster if similarity is high enough (> 0.5)
        if similarity > 0.5 and nearest_id in clusters_dict:
            cluster_id = clusters_dict[nearest_id]["cluster_id"]

    # Now it is safe to add the new embedding to the vector store
    vector_store.add_embeddings([error_embedding])

    # Store in memory
    errors_dict[aeri.error_id] = {
        "error_id": aeri.error_id,
        "error_type": aeri.error_type,
        "error_message": aeri.error_message,
        "stack_trace": aeri.stack_trace
    }

    clusters_dict[aeri.error_id] = {
        "error_id": aeri.error_id,
        "cluster_id": cluster_id
    }

    return {
        "status": "received",
        "error_id": aeri.error_id,
        "cluster_id": cluster_id
    }

The magic is in the clustering logic: we find the most similar existing error and inherit its cluster. This works because:

  1. Similar errors should be in the same cluster
  2. We validate with similarity > 0.5 threshold
  3. New patterns fall back to noise (-1) until manually reviewed
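Stripped of the vector-store machinery, the rule is just "nearest neighbor above a threshold"; a dependency-free sketch with toy 2-D vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_cluster(new_vec, known, threshold=0.5):
    """known maps error_id -> (embedding, cluster_id)."""
    best_id, best_sim = None, -1.0
    for eid, (vec, _) in known.items():
        sim = cosine(new_vec, vec)
        if sim > best_sim:
            best_id, best_sim = eid, sim
    if best_id is not None and best_sim > threshold:
        return known[best_id][1]   # inherit the neighbor's cluster
    return -1                      # noise: nothing similar enough

known = {
    "e1": ([1.0, 0.0], 3),  # an error in cluster 3
    "e2": ([0.0, 1.0], 7),  # an error in cluster 7
}
print(assign_cluster([0.9, 0.1], known))   # 3  (close to e1)
print(assign_cluster([-1.0, 0.0], known))  # -1 (dissimilar to everything)
```

The real system replaces the linear scan with the Hnswlib index, but the threshold logic is identical.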

Layer 2: Web UI

We built a single-page application with vanilla JavaScript (no framework needed for this scope):

Architecture:

<body>
    <header>
        <h1>SentryLens</h1>
        <div class="tabs">
            <button class="tab active" data-view="chat">Chat</button>
            <button class="tab" data-view="browse">Browse</button>
        </div>
    </header>

    <main>
        <!-- Chat View -->
        <div id="chat-view">...</div>

        <!-- Browse View -->
        <div id="browse-view">
            <div id="clusters-panel">...</div>
            <div id="errors-panel">...</div>
            <div id="error-detail">...</div>
        </div>
    </main>
</body>

Chat Interface:

The chat view implements a conversational UI with the agent:

async function sendQuery() {
    const query = queryInput.value.trim();
    if (!query) return;

    queryInput.value = '';
    sendBtn.disabled = true;

    addMessage(query, 'user');
    const loadingMsg = addMessage('Thinking...', 'assistant loading');

    try {
        const response = await fetch('/query', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ query })
        });

        if (!response.ok) throw new Error(`HTTP ${response.status}`);
        const data = await response.json();
        loadingMsg.remove();
        addMessage(data.response, 'assistant');
    } catch (err) {
        loadingMsg.remove();
        addMessage(`Error: ${err.message}`, 'error');
    } finally {
        sendBtn.disabled = false;
        queryInput.focus();
    }
}

We show a loading state while the agent thinks (3-5 seconds for ReAct loop completion).

Browse Interface:

The browse view has three panels:

  1. Clusters Panel - Horizontal bar chart of clusters by size
  2. Errors Panel - List of errors (all or filtered by cluster)
  3. Detail Panel - Slides in when error is clicked

Cluster visualization:

async function loadClusters() {
    const response = await fetch('/clusters');
    const data = await response.json();
    clusters = data.clusters;
    // For bar chart scaling; don't assume the response is sorted by size
    maxClusterSize = Math.max(...clusters.map(c => c.size), 1);

    clusterList.innerHTML = clusters.map(c => `
        <div class="cluster-bar ${selectedClusterId === c.cluster_id ? 'selected' : ''}"
             onclick="selectCluster(${c.cluster_id})">
            <span class="cluster-id">#${c.cluster_id}</span>
            <div class="cluster-bar-container">
                <div class="cluster-bar-fill"
                     style="width: ${(c.size / maxClusterSize) * 100}%"></div>
            </div>
            <span class="cluster-count">${c.size}</span>
        </div>
    `).join('');
}

This creates a visual representation of cluster sizes, making high-impact patterns immediately obvious.

Error detail panel:

async function selectError(errorId) {
    const detailPanel = document.getElementById('error-detail');
    detailPanel.classList.add('open');  // Slides in

    const response = await fetch(`/errors/${errorId}`);
    const data = await response.json();

    detailContent.innerHTML = `
        <div class="detail-field">
            <label>Error ID</label>
            <div class="value">${escapeHtml(data.error_id)}</div>
        </div>
        <div class="detail-field">
            <label>Type</label>
            <div class="value">${escapeHtml(data.error_type)}</div>
        </div>
        <div class="detail-field">
            <label>Stack Trace</label>
            <pre>${escapeHtml(data.stack_trace)}</pre>
        </div>
    `;
}

The panel slides in from the right with a CSS transition:

#error-detail {
    width: 0;
    overflow: hidden;
    transition: width 0.3s ease;
}

#error-detail.open {
    width: 400px;
}

Layer 3: Putting It Together

The CLI command ties everything together:

@app.command()
def serve(
    vector_store: Path,
    cluster_data: Path,
    host: str = "0.0.0.0",
    port: int = 8000
):
    """Start the FastAPI web server."""
    # Alias the import: `app` in this module is the Typer CLI app
    from sentrylens.api.main import app as api_app, init_app
    import uvicorn

    # Initialize app with data
    init_app(vector_store, cluster_data)

    # Serve with uvicorn
    uvicorn.run(api_app, host=host, port=port)

Static files (the HTML UI) are mounted after all API routes:

static_dir = Path(__file__).parent / "static"
app.mount("/", StaticFiles(directory=static_dir, html=True), name="static")

This ensures API routes like /query take precedence over static file serving.

Challenges and Solutions

Challenge 1: In-Memory State Management

The webhook adds new errors to errors_dict and clusters_dict, but these are lost on restart.

For a learning project, this is acceptable. In production, you’d:

  • Use Redis or another cache
  • Persist to database on write
  • Implement periodic snapshots

We document this limitation:

"""
Note: Data is lost on restart - this is intentional for simplicity.
For production, consider using Redis or a database.
"""

Challenge 2: Concurrent Webhook Writes

FastAPI runs sync (`def`) endpoints in a threadpool, so multiple webhooks can execute concurrently. If two webhooks add embeddings at the same time, Hnswlib's C++ backend might race.

Solution: We accept this risk for now (webhooks are infrequent in development). Production would need:

  • Lock around vector_store.add_embeddings()
  • Or use an async queue (Celery, RQ)
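The lock option is a few lines; a sketch of a wrapper that serializes access (the stub store stands in for the real HnswlibVectorStore):

```python
import threading

class LockedVectorStore:
    """Serializes access to a store whose backend is not thread-safe."""

    def __init__(self, inner):
        self._inner = inner            # e.g. the HnswlibVectorStore
        self._lock = threading.Lock()

    def add_embeddings(self, embeddings):
        with self._lock:               # one writer at a time
            self._inner.add_embeddings(embeddings)

    def search(self, vector, top_k: int = 1):
        with self._lock:               # reads locked too, to be safe
            return self._inner.search(vector, top_k)  # while the index grows

class _StubStore:                      # stand-in backend for the demo
    def __init__(self):
        self.items = []
    def add_embeddings(self, embeddings):
        self.items.extend(embeddings)
    def search(self, vector, top_k=1):
        return []

store = LockedVectorStore(_StubStore())
threads = [threading.Thread(target=store.add_embeddings, args=([i],))
           for i in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(len(store._inner.items))  # 100 - no writes lost
```

The webhook handler would construct the wrapper once at startup and use it exactly like the unwrapped store.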

Challenge 3: Agent Timeout in Web UI

If the agent takes >30 seconds, browsers time out. We added a max_turns limit and fast-fail logic:

class TriageAgent:
    def __init__(self, max_turns: int = 10):
        self.max_turns = max_turns  # Prevents infinite loops

    def run(self, user_query: str) -> str:
        for turn in range(self.max_turns):
            # ... one ReAct step; returns early with the final answer

        return "Max reasoning turns reached. Please try a simpler query."

Most queries complete in 2-3 turns (3-5 seconds), well under the limit.
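Belt and braces, the HTTP layer can also enforce a wall-clock cap so a slow agent never hits the browser timeout; a sketch using a thread pool (the budget value is arbitrary, and the lambdas are stand-in agents):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_executor = ThreadPoolExecutor(max_workers=4)

def run_with_timeout(agent_run, query: str, timeout_s: float = 25.0) -> str:
    """Run a blocking agent call under a hard wall-clock budget.
    Note: a timed-out call keeps running in its worker thread."""
    future = _executor.submit(agent_run, query)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        return "The agent took too long. Please try a simpler query."

# Stand-in agents for the demo:
print(run_with_timeout(lambda q: f"answer to {q!r}", "why the OOMs?"))
slow = lambda q: (time.sleep(1), "slow answer")[1]
print(run_with_timeout(slow, "big query", timeout_s=0.05))  # times out
```

The /query endpoint would call `run_with_timeout(agent.run, request.query)` instead of `agent.run(request.query)`.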

Challenge 4: CSS for Cluster Bars

We wanted a horizontal bar chart without a charting library. CSS flexbox to the rescue:

.cluster-bar {
    display: flex;
    align-items: center;
}

.cluster-bar-container {
    flex: 1;
    height: 20px;
    background: #16213e;
    border-radius: 0.25rem;
    overflow: hidden;
}

.cluster-bar-fill {
    height: 100%;
    background: linear-gradient(90deg, #e94560, #0f3460);
    transition: width 0.3s ease;
}

The fill width is set dynamically based on cluster size:

style="width: ${(c.size / maxClusterSize) * 100}%"

This creates a responsive, animated bar chart with no dependencies.

Results

API Performance:

  • /errors (10 results): ~5ms
  • /clusters: ~10ms
  • /query (agent): 3-5 seconds (Claude API latency)
  • /webhooks/sentry: ~100ms (includes embedding generation)

Web UI:

  • Initial load: <500ms
  • Chat message round-trip: 3-5 seconds
  • Cluster list render: <50ms (1,000 errors)
  • Error detail load: <10ms

Sentry Integration:

  • Webhook processing: 100-150ms per error
  • Clustering accuracy: ~85% (correct cluster assignment)
  • False noise rate: 12% (errors marked as noise that should cluster)

The ~85% accuracy is reasonable for a pure nearest-neighbor approach. Improving it would require:

  • Tuning the similarity threshold (currently 0.5) - lower it to reduce false noise, raise it to reduce wrong assignments
  • Periodic re-clustering as new errors accumulate

Developer Experience:

Starting the system is one command:

sentrylens serve data/indexes/hnswlib_index_* data/processed/clusters_*.json

Then open http://localhost:8000 and interact with:

  • Chat tab for natural language queries
  • Browse tab for visual exploration

Example workflow:

  1. Browse clusters to identify high-impact patterns
  2. Click a large cluster to see all errors
  3. Click an error to view stack trace
  4. Switch to chat and ask: “How do I fix errors in cluster 5?”
  5. Agent analyzes cluster context and suggests fixes

What’s Next

SentryLens is now a complete system:

  • Data pipeline with embeddings (Part 1)
  • Clustering and ReAct agent (Part 2)
  • REST API and web UI (Part 3)

Potential enhancements:

  • Persistent storage (PostgreSQL + pgvector)
  • Authentication and multi-tenancy
  • Grafana integration for error metrics
  • Fine-tuned embeddings on domain-specific errors
  • LLM-generated fix patches (not just suggestions)

Key Takeaways:

  • FastAPI makes ML system deployment straightforward
  • Vanilla JavaScript is sufficient for simple UIs (no framework overhead)
  • Webhook patterns enable real-time ML inference
  • Cluster visualization helps developers prioritize fixes
  • The ReAct agent translates technical patterns into actionable advice

Project: github.com/vamsiuppala/sentrylens

Try it yourself:

git clone https://github.com/vamsiuppala/sentrylens.git
cd sentrylens
pip install -e .
export ANTHROPIC_API_KEY=sk-ant-...
sentrylens pipeline -i data/aeri/output_problems -n 1000
sentrylens serve data/indexes/hnswlib_index_* data/processed/clusters_*.json

Open http://localhost:8000 and start triaging errors.