Protocol-Driven Evaluation of LLM Agents
Agent Evaluation · Research Infrastructure · Reproducible Workflows
Master's internship research on evaluating heterogeneous LLM-based agents through shared protocols, rubrics, and reproducible workflows.
Proof: Master's Internship · First-author Paper · NeurIPS LLM Evaluation Workshop
Context
LLM-based agents are difficult to compare because they often use different interfaces, tools, task structures, and evaluation assumptions.
My Role
During my master's internship, I worked on the research direction, from the literature review and benchmark analysis to the evaluation design and the platform concept.
Contribution
The project explored a protocol-driven approach for evaluating heterogeneous LLM agents through shared tasks, structured rubrics, and reproducible workflows.
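To make "protocol-driven" more concrete, here is a minimal, hypothetical Python sketch. It is not the interface or scoring scheme from the paper. It assumes that each agent is wrapped in an adapter exposing a single `run(prompt) -> str` method, and that a rubric is a list of weighted pass/fail checks. The names `Agent`, `Criterion`, `Task`, `score`, and `evaluate` are illustrative only.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Protocol


class Agent(Protocol):
    """Hypothetical shared interface: every agent adapter exposes run()."""
    name: str

    def run(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class Criterion:
    """One rubric item: a named pass/fail check with a weight (assumed format)."""
    name: str
    weight: float
    check: Callable[[str], bool]


@dataclass(frozen=True)
class Task:
    """A shared task: the same prompt and rubric are given to every agent."""
    task_id: str
    prompt: str
    rubric: tuple[Criterion, ...]


def score(output: str, rubric: tuple[Criterion, ...]) -> float:
    """Weighted fraction of rubric criteria the output satisfies, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total if total else 0.0


def evaluate(agents: list[Agent], tasks: list[Task]) -> list[dict]:
    """Run every agent on every task under the same protocol and rubric."""
    records = []
    for agent in agents:
        for task in tasks:
            output = agent.run(task.prompt)
            records.append({
                "agent": agent.name,
                "task": task.task_id,
                "score": score(output, task.rubric),
                # Hash of the raw output, so a rerun can be checked for drift.
                "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
            })
    return records


# Two toy agents with different internals but the same run() interface.
class EchoAgent:
    name = "echo"

    def run(self, prompt: str) -> str:
        return prompt


class AnswerAgent:
    name = "answer"

    def run(self, prompt: str) -> str:
        return "The answer is 4."


tasks = [
    Task("add", "What is 2 + 2?", (
        Criterion("mentions_4", 2.0, lambda out: "4" in out),
        Criterion("is_sentence", 1.0, lambda out: out.rstrip().endswith(".")),
    )),
]

results = evaluate([EchoAgent(), AnswerAgent()], tasks)
print(json.dumps(results, indent=2))
```

In this toy run, `EchoAgent` scores 0.0 and `AnswerAgent` scores 1.0. Fixing the interface, task, and rubric up front is what makes scores from otherwise incompatible agents comparable. In practice, rubric checks would more likely be human or model-based judgments than string tests.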
Outcome
The work formed part of my master's internship and was later developed into a first-author paper accepted at the NeurIPS LLM Evaluation Workshop.
Media
Poster · Pipeline diagram · Paper title screenshot · Internship report cover · Research timeline