Proceedings of the E-Learning and Smart Engineering Systems (ELSES 2025)

Evaluation of the Pedagogical Performance of Generative Artificial Intelligence Systems: Comparative Study Between ChatGPT-4.0 and DeepSeek-R1 in the Learning of Mathematics in the Moroccan Baccalaureate

Authors
Omar Oulad Ebrahim1, *, Abderrahim El Mhouti1, Mostafa Allaoui2
1ISISA, FS, Abdelmalek Essaadi University, Tetouan, Morocco
2ENIAD, Mohammed First University, Oujda, Morocco
*Corresponding author. Email: ouladebrahim.omar@etu.uae.ac.ma
Available Online 2 April 2026.
DOI
10.2991/978-94-6239-634-0_15
Keywords
Generative Artificial Intelligence; ChatGPT-4.0; DeepSeek-R1; pedagogical performance; mathematics education; baccalaureate exam
Abstract

Rapid advances in generative AI are revealing a disruptive potential for educational practice, raising new challenges while opening unprecedented prospects for innovation, particularly in scientific disciplines such as mathematics. This calls for an in-depth analysis of the pedagogical performance and limitations of this emerging technology. This study evaluates and compares the pedagogical performance of ChatGPT-4.0 and DeepSeek-R1 across the cognitive levels involved in learning mathematics at the Moroccan baccalaureate level. The corpus consists of 120 mathematical questions drawn from 11 monothematic exercises and three problems taken from the national scientific baccalaureate exams. These exercises were administered systematically to both generative AI models following a standardised methodological protocol designed to ensure the objectivity and reliability of the results. The answers were then collected, marked using an evaluation grid, and compared with an official answer key drawn up by the French Ministry of Education. Finally, a comparative statistical analysis was carried out to examine the pedagogical performance of each model. The results show that ChatGPT-4.0 outperforms DeepSeek-R1 overall, with a score of 16.12/20 (80.6%) against 12.80/20 (64.8%) for DeepSeek-R1. On the monothematic exercises, however, DeepSeek-R1 came close to ChatGPT-4.0, scoring 15.95/20 (79.75%) against 17.18/20 (85.9%). Both models nevertheless show limitations, including factual inaccuracies and a lack of contextual accuracy, which are most apparent in problems requiring deep mathematical reasoning. This study provides relevant data on the pedagogical performance of two generative AI models in secondary school mathematics education, and thus contributes to the discussion of the challenges these models raise for learning and for the integrity of assessments, particularly in online teaching.
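The abstract reports each mark on a 20-point scale together with a percentage. The following is a minimal sketch of that conversion, assuming the percentage is simply the mark divided by 20. The helper name `to_percent` is hypothetical and is not taken from the study. The sketch also computes the gap between the two models on the monothematic exercises, in marks and in percentage points.

```python
def to_percent(mark: float, scale: float = 20.0) -> float:
    """Convert a mark on a 0-to-`scale` grading scale into a percentage, rounded to 2 decimals."""
    if not 0 <= mark <= scale:
        raise ValueError(f"mark {mark} is outside the 0-{scale} range")
    return round(mark / scale * 100, 2)


# Marks out of 20 as reported in the abstract.
reported = {
    "ChatGPT-4.0, overall": 16.12,
    "ChatGPT-4.0, monothematic exercises": 17.18,
    "DeepSeek-R1, monothematic exercises": 15.95,
}

for label, mark in reported.items():
    print(f"{label}: {mark}/20 -> {to_percent(mark)}%")

# Gap between the two models on the monothematic exercises:
# once in marks (out of 20) and once in percentage points.
gap_marks = round(17.18 - 15.95, 2)
gap_points = round(to_percent(17.18) - to_percent(15.95), 2)
print(f"Monothematic gap: {gap_marks} marks, {gap_points} percentage points")
```

Rounding to two decimals keeps the printed values in the same form as the abstract and hides binary floating-point noise (for example, 15.95 / 20 * 100 is not exactly 79.75 in floating point).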

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the E-Learning and Smart Engineering Systems (ELSES 2025)
Series
Atlantis Highlights in Social Sciences, Education and Humanities
Publication Date
2 April 2026
ISBN
978-94-6239-634-0
ISSN
2667-128X

Cite this article

TY  - CONF
AU  - Omar Oulad Ebrahim
AU  - Abderrahim El Mhouti
AU  - Mostafa Allaoui
PY  - 2026
DA  - 2026/04/02
TI  - Evaluation of the Pedagogical Performance of Generative Artificial Intelligence Systems: Comparative Study Between chatgpt-4.0 and Deepseek-R1 in the Learning of Mathematics in the Moroccan Baccalaureate
BT  - Proceedings of the E-Learning and Smart Engineering Systems (ELSES 2025)
PB  - Atlantis Press
SP  - 186
EP  - 196
SN  - 2667-128X
UR  - https://doi.org/10.2991/978-94-6239-634-0_15
DO  - 10.2991/978-94-6239-634-0_15
ID  - Ebrahim2026
ER  -