AI Testing
SPONSOR TRACK TALK
The Rabbit Hole of Grammar: LLMs on Trial with QA Adventures
Traditional QA pipelines struggle with brittle scripts, dynamic requirements, and the absence of reliable test oracles for AI-generated outputs. The team has developed an automated, LLM-based pipeline for functional testing of web applications. The approach applies metamorphic testing to analyse LLMs, and uses Chomsky’s universal grammar rules for language-agnostic testing of the pipeline in the presence of textual mutations. The results show that this approach provides a theoretical backbone for checking semantic fidelity in natural languages, while Gherkin offers a structured, machine-readable counterpart for robust validation.
In this talk, Jalpa Soni will demonstrate an end-to-end LLM pipeline, covering:
1. An explanation of the universal grammar rules used to validate the cross-lingual performance of LLMs in language translation.
2. Validation checks with controlled textual mutations based on Universal Grammar for natural languages (English > Spanish).
3. The effect of the length of the text to be translated on the robustness of LLMs.
4. A demonstration of the limits of realistic text mutations by humans, and how LLMs can outperform even in extreme cases of text mutation.
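To make the idea behind items 1 and 2 concrete, a metamorphic test for translation can be sketched as follows. This is a minimal illustration, not the speaker's actual pipeline: `translate` is a hypothetical stub standing in for an LLM translation call, and the metamorphic relation (a meaning-preserving mutation of the source sentence should yield a translation that still contains the translation of the original core sentence) is one simple example of the kind of check Universal Grammar motivates.

```python
# Minimal sketch of a metamorphic test for English -> Spanish translation.
# `translate` is a hypothetical stub; in a real pipeline it would call an LLM.

def translate(text: str) -> str:
    """Hypothetical English->Spanish translator (lookup stub for illustration)."""
    table = {
        "the cat sleeps": "el gato duerme",
        "the cat sleeps quietly": "el gato duerme tranquilamente",
    }
    return table.get(text.lower(), "")

def mutate(text: str) -> str:
    """A controlled, meaning-preserving textual mutation:
    append an adverbial modifier to the sentence."""
    return text + " quietly"

def metamorphic_check(source: str) -> bool:
    """Metamorphic relation: translating the mutated sentence should
    preserve the translation of the original core sentence, so no
    explicit 'correct answer' oracle is needed."""
    base = translate(source)
    mutated = translate(mutate(source))
    return bool(base) and base in mutated

print(metamorphic_check("the cat sleeps"))  # expect True under this relation
```

The key point of the pattern is that the test oracle is the *relation* between the two outputs, not a ground-truth translation, which is what makes it usable for LLM outputs that lack reliable reference answers.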
Jalpa Soni
Jalpa Soni is a Senior Data Scientist at the AI Innovation Lab at Capitole, where she leads the research components of multiple projects. She has a background in physics and extensive experience applying data science across a wide range of applications in different sectors.