Project: Evalgen

What is it?

  • Scrapes web pages and PDFs as input sources
  • Uses an LLM to generate questions, expected answers, and telecom category labels
  • Stores results in Firestore and serves them via a JSON API endpoint
  • Includes a clean web UI for browsing generated datasets
  • Deployed on GCP Cloud Run

The Evalgen interface

Backstory

Built during my internship at Capgemini/Telia, where the team needed evaluation data for an AI support assistant. The challenge: the assistant was being trained on a knowledge base of product documentation, but there was no structured dataset of the questions a real customer might ask, which made evaluation difficult.

Evalgen solves this by turning any web page or PDF into a set of plausible Q&A pairs, tagged with a broad support category. The outputs can feed directly into an evaluation framework like Langfuse to measure how well an agent handles different query types.
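The generation step can be sketched roughly like this: build a prompt that asks the LLM for structured JSON, then validate what comes back before storing it. Everything here (the category list, `QAPair`, the function names) is hypothetical illustration, not Evalgen's actual code:

```python
import json
from dataclasses import dataclass

# Hypothetical label set; the real categories were telecom/Telia-specific.
CATEGORIES = ["billing", "mobile", "broadband", "tv", "other"]

@dataclass
class QAPair:
    question: str
    answer: str
    category: str

def build_prompt(source_text: str, n: int = 5) -> str:
    """Ask the model for a JSON array of Q&A pairs grounded in the source text."""
    return (
        f"From the documentation below, write {n} questions a customer might ask, "
        "with concise expected answers. Respond with a JSON array of objects "
        f"with keys 'question', 'answer', and 'category' (one of {CATEGORIES}).\n\n"
        f"---\n{source_text}"
    )

def parse_response(raw: str) -> list[QAPair]:
    """Validate the model's JSON output; skip malformed or off-label items."""
    pairs = []
    for item in json.loads(raw):
        if item.get("category") in CATEGORIES and item.get("question") and item.get("answer"):
            pairs.append(QAPair(item["question"], item["answer"], item["category"]))
    return pairs
```

Validating against a fixed category list matters in practice: LLMs occasionally invent labels, and an evaluation framework downstream expects a closed set.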

The architecture is straightforward: a Python FastAPI app running on GCP Cloud Run, with Firestore as the storage backend. The scraping layer handles both HTML and PDF inputs. Results are available through a web UI and a JSON endpoint so they can be consumed by other tools in the pipeline.

The main technical challenge was IAM — making sure each GCP service (Cloud Run, Firestore, the scraping worker) had the right service account with the minimum necessary permissions. Getting that right without over-provisioning took most of the debugging time.
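The least-privilege setup boils down to a few gcloud commands. A hedged sketch, with placeholder project and account names; the Firestore role shown (`roles/datastore.user`, read/write without admin rights) is the standard minimal choice, but the actual roles used may have differed:

```shell
# Dedicated service account for the Cloud Run service (names are placeholders)
gcloud iam service-accounts create evalgen-run \
    --display-name="Evalgen Cloud Run service"

# Grant only Firestore read/write, not owner/editor on the project
gcloud projects add-iam-policy-binding MY_PROJECT \
    --member="serviceAccount:evalgen-run@MY_PROJECT.iam.gserviceaccount.com" \
    --role="roles/datastore.user"

# Deploy Cloud Run under that account instead of the default compute SA
gcloud run deploy evalgen \
    --service-account=evalgen-run@MY_PROJECT.iam.gserviceaccount.com
```

The trap this avoids is the default compute service account, which often carries broad Editor-level permissions; a per-service account keeps the blast radius small.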

Technical details

Stack

  • Python (FastAPI)
  • GCP Cloud Run
  • Firestore (document storage)
  • OpenAI API / Vertex AI (Q&A generation)
  • Web scraping (HTML + PDF)
  • IAM / Service Accounts