EDUCATION
May 23, 2024
4,800 legal exam questions: How we tested AI thoroughly.
LEXam is the first benchmark to measure the performance of legal AI under realistic conditions, using data from real Swiss law exams.
Before you delegate legal analysis to AI, you deserve more than promises: you deserve robust numbers.
In recent months, we have worked with leading institutions and experts to make the performance of AI models on legal tasks measurable. Our goal: a scientifically grounded standard that shows which systems are actually capable of answering complex legal questions today.
A Project with Strong Partners
For this project, Omnilex collaborated with the Swiss Federal Supreme Court, ETH Zurich, the University of Zurich, and other research partners. Together, this consortium collected a total of 340 exams from legal education, supplemented by thousands of essay responses and multiple-choice questions.
The result of this collaboration is LEXam, an open-source benchmark covering a range of legal areas and task formats. LEXam evaluates AI models not only on general language ability but specifically on the discipline that demands the most precision: legal reasoning.
What Makes LEXam Unique
While many AI studies rely on English-language datasets, LEXam primarily covers Swiss law in German, an area that remains largely uncharted territory for global models such as GPT or Gemini.
The benchmark data includes:
1,660 multiple-choice questions
2,867 open legal exam questions
Materials from various legal fields (civil law, criminal law, public law, etc.)
Original tasks from real legal state exams
This gives LEXam a degree of realism that no previous benchmark has offered for the application of AI in legal practice.
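For readers who want to get a feel for the data, here is a minimal sketch using the Hugging Face `datasets` library. The repository ID, configuration names, and record fields are illustrative assumptions, not confirmed details of the official LEXam release.

```python
# Minimal sketch: loading and inspecting the LEXam benchmark data.
# NOTE: the repository ID, config names, and field layout below are
# assumptions for illustration; check the official LEXam release for
# the actual identifiers.
from datasets import load_dataset

# Hypothetical dataset identifier and configuration names.
mcq = load_dataset("LEXam-Benchmark/LEXam", "mcq", split="test")
open_q = load_dataset("LEXam-Benchmark/LEXam", "open_question", split="test")

print(len(mcq), "multiple-choice questions")   # expected on the order of 1,660
print(len(open_q), "open exam questions")      # expected on the order of 2,867
print(mcq[0])  # one record: question text, answer choices, gold label
```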
How Current Models Performed
We tested 12 leading language models on LEXam. The results are clear:
✅ New models like Gemini-2.5-Pro significantly outperform previous generations in legal reasoning and structured argumentation.
❌ However, general-purpose AI models lose noticeable accuracy as soon as questions move into specific areas of law outside their training focus.
It is therefore not enough to rely on general benchmarks or marketing promises. Anyone who wants to use AI seriously in a law firm must verify its performance empirically and on a regular basis; a minimal spot check of this kind is sketched below.
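To make that concrete, here is a minimal sketch of such a spot check on multiple-choice questions. The `ask_model` wrapper and the record fields (`question`, `choices`, `gold`) are hypothetical placeholders rather than official LEXam tooling; wire them up to your own API client and data.

```python
import re

def ask_model(prompt: str) -> str:
    """Hypothetical wrapper around whichever LLM API your firm uses."""
    raise NotImplementedError("connect this to your API client")

def evaluate_mcq(records) -> float:
    """Score a model on multiple-choice records and return accuracy."""
    correct = 0
    for rec in records:
        # Format the answer options as A) ... B) ... etc.
        options = "\n".join(
            f"{label}) {text}" for label, text in zip("ABCD", rec["choices"])
        )
        prompt = (
            f"{rec['question']}\n{options}\n"
            "Answer with the letter of the correct option only."
        )
        reply = ask_model(prompt)
        match = re.search(r"\b([A-D])\b", reply)  # pull the chosen letter from the reply
        if match and match.group(1) == rec["gold"]:
            correct += 1
    return correct / len(records)
```

Run a check like this on a fixed sample whenever the model or the prompts change; a drop in accuracy flags a regression before it reaches client work.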
What This Means for Your Law Firm
For us at Omnilex, this work is not an academic exercise. It is the foundation on which we build our technology for you.
We have developed Omnilex from the very beginning specifically for legal work, with the aim of outperforming any general-purpose model on Swiss and German law. At the same time, we continuously test new versions of language models to ensure that you always work on the most precise and efficient foundation.
In short: we make sure your AI solution is backed by research, so that you stay a decisive step ahead in client meetings, compliance processes, and written submissions.
A Big Thank You
Special thanks go to our CTO Etienne Salimbeni, our Company Advisor Elliott Ash, and Joel Niklaus, who led the project. We would also like to thank the contributing team members:
Yu Fan, Jingwei Ni, Yoan Hermstrüwer, Yinya Huang, Mubashara Akhtar, Oliver Dreyer, Daniel Brunner, Markus Leippold, Mrinmaya Sachan, Alexander Stremitzer, Yang Tian, Jakob Merane, Florian Geering, and Christoph Engel.