A Protocol-Driven Platform for Agent-Agnostic Evaluation of LLM Agents
Abstract: We present an end-to-end, agent-agnostic framework for the systematic evaluation of Large Language Model (LLM) agents. By decoupling the execution and evaluation protocols from each agent's internal implementation, the platform enables reproducible, side-by-side benchmarking across diverse cognitive architectures and establishes consistent benchmarks and data-driven metrics for future agent development.