[GenAI Summit] Testing AI Agents: A Practical Framework for Reliability and Performance
Talk Abstract:
As AI agents powered by large language models (LLMs) become integral to production systems, ensuring their reliability and safety is both critical and uniquely challenging. Unlike traditional software, agentic systems are dynamic, probabilistic, and highly sensitive to subtle changes—making conventional testing approaches insufficient.
This talk presents a practical framework for testing AI agents, grounded in real-world experience developing and deploying production-grade agents at PagerDuty. The main focus will be on iterative regression testing: how to design, execute, and refine regression tests that catch failures and performance drift as agents evolve. We’ll walk through a real use case, highlighting the challenges and solutions encountered along the way.
Beyond regression testing, we’ll cover the additional layers of testing essential for agentic systems, including unit tests for individual tools, adversarial testing to probe robustness, and ethical testing to evaluate outputs for bias, fairness, and compliance. Finally, I’ll share how we’re building automated pipelines to streamline test execution, scoring, and benchmarking—enabling rapid iteration and continuous improvement.
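To make the regression idea concrete, here is a minimal sketch of what one such check could look like. It is not the pipeline described in the talk: the names AgentCase, score, pass_rates, and find_regressions, the phrase-matching scoring rule, and the two stub agents are all assumptions made for illustration. Because agent output is probabilistic, each case is run several times and compared against a baseline pass rate rather than a single pass/fail result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentCase:
    """One regression case (hypothetical structure)."""
    name: str
    prompt: str
    required_phrases: List[str]  # assumed scoring rule: all must appear


def score(output: str, case: AgentCase) -> bool:
    """Pass if every required phrase occurs in the output (case-insensitive)."""
    text = output.lower()
    return all(phrase.lower() in text for phrase in case.required_phrases)


def pass_rates(agent: Callable[[str], str],
               cases: List[AgentCase],
               runs: int = 5) -> Dict[str, float]:
    """Run each case several times and record the fraction of passing runs."""
    rates = {}
    for case in cases:
        passed = sum(score(agent(case.prompt), case) for _ in range(runs))
        rates[case.name] = passed / runs
    return rates


def find_regressions(baseline: Dict[str, float],
                     current: Dict[str, float],
                     tolerance: float = 0.1) -> List[str]:
    """Names of cases whose pass rate fell more than `tolerance` below baseline."""
    return sorted(
        name for name, base in baseline.items()
        if current.get(name, 0.0) < base - tolerance
    )


# Stub agents standing in for two versions of a real LLM-backed agent.
def agent_v1(prompt: str) -> str:
    return "Incident acknowledged and escalated to the on-call engineer."


def agent_v2(prompt: str) -> str:
    # Simulated drift: the newer version no longer reports the escalation.
    if "escalate" in prompt.lower():
        return "Incident acknowledged."
    return agent_v1(prompt)


cases = [
    AgentCase("ack", "Acknowledge incident", ["acknowledged"]),
    AgentCase("escalate", "Escalate incident", ["escalated"]),
]

baseline = pass_rates(agent_v1, cases)
current = pass_rates(agent_v2, cases)
regressions = find_regressions(baseline, current)
print(regressions)
```

In a real setup the stub agents would be replaced by calls to the agent under test, and the phrase check by whatever scoring method suits the task; the baseline-versus-current comparison is the part this sketch is meant to show.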
Attendees will leave with a practical, end-to-end framework for testing AI agents, actionable strategies for regression and beyond, and a deeper understanding of how to ensure their own AI systems are reliable, robust, and ready for real-world deployment.
What You’ll Learn:
Attendees will learn a practical, end-to-end framework for testing AI agents—covering correctness, robustness, and ethics—so they can confidently deploy reliable, high-performing LLM-based systems in production.
Presenter:
Irena Grabovitch-Zuyev, Staff Applied Scientist, PagerDuty
About the Presenter:
Irena Grabovitch-Zuyev is a Staff Applied Scientist at PagerDuty and a driving force behind PagerDuty Advance, the company’s generative AI capabilities. She leads the development of AI agents that are transforming how customers interact with PagerDuty, pushing the boundaries of incident response and automation.
With over 15 years of experience in machine learning, Irena specializes in generative AI, data mining, and information retrieval. At PagerDuty, she partners with stakeholders and customers to identify business challenges and deliver innovative, data-driven solutions.
Irena earned her graduate degree in Information Retrieval in Social Networks from the Technion – Israel Institute of Technology. Before joining PagerDuty, she spent five years at Yahoo Research as part of the Mail Mining team, where her machine learning solutions for automatic extraction and classification were deployed at scale, powering Yahoo Mail’s backend and processing hundreds of millions of messages daily.
She is the author of several academic articles published at top conferences and the inventor of multiple patents. Irena is also a passionate advocate for increasing representation in tech, believing that diversity and inclusion are essential to innovation.