Project: Evalgen
What is it?
- Scrapes web pages and PDFs as input sources
- Uses an LLM to generate questions, expected answers, and telecom category labels
- Stores results in Firestore and serves them via a JSON API endpoint
- Includes a clean web UI for browsing generated datasets
- Deployed on GCP Cloud Run
The Evalgen interface
Backstory
Built during my internship at Capgemini/Telia, where the team needed evaluation data for an AI support assistant. The challenge: the assistant was being trained on a knowledge base of product documentation, but there was no structured dataset of questions a real customer might ask — which made evaluation difficult.
Evalgen solves this by turning any web page or PDF into a set of plausible Q&A pairs, tagged with a broad support category. The outputs can feed directly into an evaluation framework like Langfuse to measure how well an agent handles different query types.
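The generation step can be sketched as a prompt that asks the model for structured output, plus a parser that validates what comes back. This is a minimal illustration, not the production code: the category labels and field names are assumptions, and the prompt wording is invented.

```python
import json

# Hypothetical category set; the real Telia support taxonomy is not public.
CATEGORIES = ["billing", "mobile", "broadband", "tv", "other"]

def build_prompt(source_text: str, n_pairs: int = 5) -> str:
    """Ask the model for customer-style Q&A pairs as a JSON array."""
    return (
        f"From the documentation below, write {n_pairs} questions a customer "
        "might realistically ask, each with an expected answer and exactly one "
        f"category from {CATEGORIES}. Respond with a JSON array of objects "
        "with keys 'question', 'expected_answer', 'category'.\n\n" + source_text
    )

def parse_pairs(model_output: str) -> list[dict]:
    """Parse the model's JSON and drop entries missing fields or using an unknown category."""
    pairs = json.loads(model_output)
    required = {"question", "expected_answer", "category"}
    return [
        p for p in pairs
        if required <= p.keys() and p["category"] in CATEGORIES
    ]
```

Validating on the way in keeps malformed model output from ever reaching Firestore, so downstream consumers can trust the schema.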
The architecture is straightforward: a Python FastAPI app running on GCP Cloud Run, with Firestore as the storage backend. The scraping layer handles both HTML and PDF inputs. Results are available through a web UI and a JSON endpoint so they can be consumed by other tools in the pipeline.
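The stored documents and the JSON endpoint share one shape, which can be sketched with a dataclass. The field names here are assumptions for illustration; the actual Firestore schema may differ.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class QAPair:
    """One generated evaluation item. Field names are hypothetical."""
    question: str
    expected_answer: str
    category: str
    source_url: str  # the page or PDF the pair was generated from

def dataset_response(pairs: list[QAPair]) -> str:
    """Serialize a dataset the way the JSON endpoint might return it."""
    return json.dumps({
        "count": len(pairs),
        "items": [asdict(p) for p in pairs],
    })
```

Because Firestore documents are already dictionaries, the same dataclass can back both the write path and the API response without a separate mapping layer.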
The main technical challenge was IAM — making sure each GCP service (Cloud Run, Firestore, the scraping worker) had the right service account with the minimum necessary permissions. Getting that right without over-provisioning took most of the debugging time.
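A least-privilege setup like the one described might look as follows with the `gcloud` CLI. The project and account names are placeholders, and `IMAGE` is left unfilled; the point is the shape: a dedicated service account per service, bound only to the roles it needs.

```shell
# Dedicated service account for the Cloud Run service (names are hypothetical):
gcloud iam service-accounts create evalgen-api \
  --project=my-project --display-name="Evalgen API"

# Firestore read/write only (roles/datastore.user), no admin rights:
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:evalgen-api@my-project.iam.gserviceaccount.com" \
  --role="roles/datastore.user"

# Run the service as that account rather than the default compute identity:
gcloud run deploy evalgen --image=IMAGE \
  --service-account=evalgen-api@my-project.iam.gserviceaccount.com
```

Avoiding the default compute service account is what makes the permissions auditable per service, at the cost of the extra binding steps described above.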
Technical details
Stack
- Python (FastAPI)
- GCP Cloud Run
- Firestore (document storage)
- OpenAI API / Vertex AI (Q&A generation)
- Web scraping (HTML + PDF)
- IAM / Service Accounts