Evaluation of the Pedagogical Performance of Generative Artificial Intelligence Systems: Comparative Study Between chatgpt-4.0 and Deepseek-R1 in the Learning of Mathematics in the Moroccan Baccalaureate

Omar Oulad Ebrahim; Abderrahim El Mhouti; Mostafa Allaoui

doi:10.2991/978-94-6239-634-0_15

Authors

Omar Oulad Ebrahim¹,*, Abderrahim El Mhouti¹, Mostafa Allaoui²

¹ISISA, FS, Abdelmalek Essaadi University, Tetouan, Morocco

²ENIAD, Mohammed First University, Oujda, Morocco

*Corresponding author. Email: ouladebrahim.omar@etu.uae.ac.ma

Available Online 2 April 2026.

DOI: 10.2991/978-94-6239-634-0_15
Keywords: Generative Artificial Intelligence; ChatGPT-4.0; DeepSeek-R1; pedagogical performance; mathematics education; baccalaureate exam
Abstract: Rapid advances in generative AI are revealing a disruptive potential for educational practice, raising new challenges while opening up unprecedented prospects for innovation, particularly in scientific disciplines such as mathematics. This calls for an in-depth analysis of the pedagogical performance and limitations of this emerging technology. This study evaluates and compares the pedagogical performance of ChatGPT-4.0 and DeepSeek-R1 across various cognitive levels of mathematics learning at the Moroccan baccalaureate level. The corpus comprises 120 mathematical questions drawn from 11 monothematic exercises and three problems from the national scientific baccalaureate exams. The exercises were administered systematically to both models under a standardised methodological protocol designed to guarantee the objectivity and reliability of the results. The answers were then collected, marked using an evaluation grid, and compared with an official answer key drawn up by the French Ministry of Education. Finally, a comparative statistical analysis was carried out to examine the pedagogical performance of each model. The results highlight the superiority of ChatGPT-4.0, with an overall score of 16.12/20 (80.6%), compared with 12.80/20 (64.8%) for DeepSeek-R1. DeepSeek-R1 nevertheless performed close to ChatGPT-4.0 on the monothematic exercises, scoring 15.95/20 (79.75%) against 17.18/20 (85.9%). Both models have limitations, however, including factual inaccuracies and a lack of contextual accuracy, which are particularly apparent in problems requiring deep mathematical reasoning. This study provides relevant data on the pedagogical performance of two generative AI models in secondary school mathematics education. It thus contributes to the discussion of the challenges associated with learning and the integrity of assessments, particularly in the context of online teaching.
Copyright: © 2026 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the E-Learning and Smart Engineering Systems (ELSES 2025)
Series: Atlantis Highlights in Social Sciences, Education and Humanities
Publication Date: 2 April 2026
ISBN: 978-94-6239-634-0
ISSN: 2667-128X

Cite this article

TY  - CONF
AU  - Omar Oulad Ebrahim
AU  - Abderrahim El Mhouti
AU  - Mostafa Allaoui
PY  - 2026
DA  - 2026/04/02
TI  - Evaluation of the Pedagogical Performance of Generative Artificial Intelligence Systems: Comparative Study Between chatgpt-4.0 and Deepseek-R1 in the Learning of Mathematics in the Moroccan Baccalaureate
BT  - Proceedings of the E-Learning and Smart Engineering Systems (ELSES 2025)
PB  - Atlantis Press
SP  - 186
EP  - 196
SN  - 2667-128X
UR  - https://doi.org/10.2991/978-94-6239-634-0_15
DO  - 10.2991/978-94-6239-634-0_15
ID  - Ebrahim2026
ER  -
