Retrieval systems · evaluation · automation
Gioia Zheng
Information Retrieval · RAG Evaluation · Reproducible ML · AI Infrastructure
B.Sc. student at Sapienza University of Rome building reproducible retrieve → rerank → generate systems and the evaluation machinery around large-scale QA pipelines.
- LOC Rome, IT
- EDU B.Sc. ACSAI · Sapienza
- DOMAIN IR · RAG evaluation · reproducible ML
- CODE open source
Featured project
msmarco-genqa
active · Reproducible retrieve → rerank → generate pipeline on MS MARCO, with paired-bootstrap evaluation and a manifest-enforced reproducibility contract.
Token-F1 Δ +0.171
95% CI [+0.163, +0.178]
Queries 6,980 paired
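For readers unfamiliar with paired-bootstrap evaluation, here is a minimal sketch of how a delta and a 95% confidence interval of this kind can be computed from per-query scores. It is an illustration under stated assumptions, not the project's actual code: the function name `paired_bootstrap_ci`, its parameters, and the toy scores are all hypothetical, and it uses a plain percentile interval.

```python
import random
from typing import List, Sequence, Tuple


def paired_bootstrap_ci(
    baseline: Sequence[float],
    system: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean delta, CI lower, CI upper) for system minus baseline.

    Hypothetical helper: scores are assumed to be per-query metric values
    (for example token-level F1) aligned by query, so index i in both
    sequences refers to the same query.
    """
    if len(baseline) != len(system) or len(baseline) == 0:
        raise ValueError("score lists must be non-empty and the same length")

    # Pairing: work on per-query differences, so each resample keeps
    # the two systems' scores for a query together.
    diffs: List[float] = [s - b for b, s in zip(baseline, system)]
    n = len(diffs)
    rng = random.Random(seed)

    # Resample queries with replacement and record the mean difference.
    means: List[float] = []
    for _ in range(n_resamples):
        sample = rng.choices(diffs, k=n)
        means.append(sum(sample) / n)
    means.sort()

    # Percentile interval over the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return sum(diffs) / n, means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Toy scores for illustration only, not results from the project.
    base = [0.20, 0.40, 0.50, 0.30, 0.60, 0.10]
    new = [0.35, 0.50, 0.70, 0.45, 0.65, 0.30]
    delta, lo, hi = paired_bootstrap_ci(base, new)
    print(f"delta={delta:+.3f}  95% CI [{lo:+.3f}, {hi:+.3f}]")
```

Resampling the per-query differences rather than each system's scores independently is what makes the bootstrap paired: query difficulty cancels out within each pair, which typically gives a tighter interval than an unpaired comparison.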
Recent writing
Failure analysis in retrieval-augmented generation
2026-05-28 · A single aggregate score is the wrong unit for evaluating a RAG pipeline. Per-category failure rates expose regressions that an aggregate hides.
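A small sketch of the idea, assuming each evaluated query carries a category label and a pass/fail flag. The helper `failure_rates`, the category names, and the toy runs are hypothetical and chosen only to show two runs with the same aggregate failure rate but different per-category behaviour.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, bool]  # (category label, failed?) - hypothetical schema


def failure_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Return the fraction of failed queries within each category."""
    totals: Counter = Counter()
    failures: Counter = Counter()
    for category, failed in records:
        totals[category] += 1
        if failed:
            failures[category] += 1
    return {c: failures[c] / totals[c] for c in totals}


def aggregate_failure_rate(records: Iterable[Record]) -> float:
    """Return the overall fraction of failed queries."""
    items: List[Record] = list(records)
    return sum(1 for _, failed in items if failed) / len(items)


def make_run(counts: Dict[str, Tuple[int, int]]) -> List[Record]:
    """Build toy records from {category: (n_queries, n_failures)}."""
    run: List[Record] = []
    for category, (n, k) in counts.items():
        run += [(category, True)] * k + [(category, False)] * (n - k)
    return run


if __name__ == "__main__":
    # Toy data for illustration only: both runs fail 4 of 20 queries.
    run_a = make_run({"numeric": (10, 2), "factoid": (10, 2)})
    run_b = make_run({"numeric": (10, 4), "factoid": (10, 0)})
    for name, run in (("A", run_a), ("B", run_b)):
        print(name, aggregate_failure_rate(run), failure_rates(run))
```

In this toy case the aggregate failure rate is identical for both runs, while the per-category view shows that run B doubled its failures on the `numeric` category.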