Retrieval systems · evaluation · automation

Gioia Zheng

Information Retrieval · RAG Evaluation · Reproducible ML · AI Infrastructure

B.Sc. student at Sapienza University of Rome building reproducible retrieve → rerank → generate systems and the evaluation machinery around large-scale QA pipelines.

LOC Rome, IT
EDU B.Sc. ACSAI · Sapienza
DOMAIN IR · RAG evaluation · reproducible ML
CODE open source

Featured project

msmarco-genqa

active

Reproducible retrieve → rerank → generate pipeline on MS MARCO with paired-bootstrap evaluation and a manifest-enforced reproducibility contract.

retrieval · rag · evaluation · reproducibility

Token-F1 Δ +0.171

95% CI [+0.163, +0.178]

Queries 6,980 paired

updated 2026-05-29
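
For readers unfamiliar with the method, a paired-bootstrap interval like the one reported above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `token_f1` and `paired_bootstrap_ci`, the whitespace tokenization, the resample count, and the percentile method are all assumptions.

```python
import random
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two strings (assumed: lowercase, whitespace tokens)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def paired_bootstrap_ci(baseline, system, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-query score difference.

    `baseline` and `system` hold per-query scores for the SAME queries in the
    same order. Resampling the per-query differences keeps the pairing intact.
    """
    if len(baseline) != len(system):
        raise ValueError("paired evaluation needs one score per query in each list")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    deltas = [s - b for b, s in zip(baseline, system)]
    n = len(deltas)
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(deltas, k=n)  # resample queries with replacement
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(deltas) / n, lower, upper
```

Calling `paired_bootstrap_ci(baseline_scores, system_scores)` returns the observed mean difference together with the lower and upper bounds of the interval. Resampling differences rather than the two score lists independently is what makes the comparison paired: per-query difficulty cancels out, which typically gives a tighter interval.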

Recent writing

Failure analysis in retrieval-augmented generation

2026-05-28

A single aggregate score is the wrong unit for evaluating a RAG pipeline. Reporting per-category failure rates makes regressions visible that aggregates hide.
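
As a rough illustration of that point, the sketch below computes failure rates per category for two runs and flags the categories that got worse. The record layout `(category, failed)`, the category names, and the helpers `failure_rates` and `regressions` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def failure_rates(records):
    """Map each category to its failure rate.

    `records` is an iterable of (category, failed) pairs, one per query.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for category, failed in records:
        totals[category] += 1
        failures[category] += int(failed)
    return {category: failures[category] / totals[category] for category in totals}


def regressions(before, after, tolerance=0.0):
    """Categories whose failure rate rose by more than `tolerance`."""
    return {
        category: after[category] - before[category]
        for category in after
        if category in before and after[category] - before[category] > tolerance
    }


# Hypothetical runs: the overall failure rate is 2/8 in both, yet one category
# improved while the other regressed.
run_a = [("lookup", True), ("lookup", True), ("lookup", False), ("lookup", False),
         ("multi_hop", False), ("multi_hop", False), ("multi_hop", False), ("multi_hop", False)]
run_b = [("lookup", False), ("lookup", False), ("lookup", False), ("lookup", False),
         ("multi_hop", True), ("multi_hop", True), ("multi_hop", False), ("multi_hop", False)]

worse = regressions(failure_rates(run_a), failure_rates(run_b))
```

Here `worse` contains only `multi_hop`, with its failure rate up by 0.5, even though the aggregate failure rate is identical across the two runs.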
