An Approach of Suspected Code Plagiarism Detection Based on XGBoost Incremental Learning
Qiubo Huang, Guozheng Fang, Keyuan Jiang
Available Online May 2019.
- https://doi.org/10.2991/cnci-19.2019.40
- Code Plagiarism Detection, Relevant Features, XGBoost, Incremental Learning.
- Code plagiarism is a serious problem in teaching evaluation, and because programming assignments count toward students' grades, it is especially important to detect plagiarism in the code students submit. All submitted code is kept in a database, and the data accumulate day by day. For this setting, we propose a detection approach based on relevant features and XGBoost incremental learning. First, we define the relevant features of a code submission record in the Online Judge system and describe the algorithms for computing them, such as code similarity, code style similarity, and the level of concentration of plagiarism targets. Then, we use information gain to filter out irrelevant features, and use performance metrics such as Accuracy, Macro F1-Score, AUC and the ROC curve to select the learning model. Finally, the XGBoost incremental learning algorithm is used to optimize the system implementation, and the model reaches an accuracy of 97.9% in the evaluation test.
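The information-gain filtering step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes discrete feature values (continuous scores such as code similarity would first have to be binned), and the feature names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def filter_features(columns, labels, threshold=0.01):
    """Keep features whose information gain exceeds a (hypothetical) threshold."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    kept = [name for name, s in scores.items() if s > threshold]
    return kept, scores


# Hypothetical toy data: 1 = plagiarized, 0 = original.
labels = [1, 1, 0, 0]
columns = {
    "similarity_bin": [1, 1, 0, 0],      # perfectly separates the classes
    "submit_hour_bucket": [0, 1, 0, 1],  # carries no information about the label
}
kept, scores = filter_features(columns, labels)
print(kept)  # ['similarity_bin']
```

A feature that splits the labels into pure groups gains the full 1 bit here, while one whose groups mirror the overall label mix gains nothing and is dropped.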
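The model-selection metrics named in the abstract (Accuracy, Macro F1-Score, AUC) have standard definitions, sketched below in plain Python for reference. The paper's own evaluation code is not given; this sketch computes AUC with the pairwise (Mann-Whitney) formulation, counting ties as one half, and the toy labels and scores are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
y_score = [0.9, 0.4, 0.35, 0.1]
print(accuracy(y_true, y_pred))  # 0.75
print(auc(y_true, y_score))      # 1.0
```

Macro F1 weights each class equally, which matters when plagiarized submissions are much rarer than original ones; in this toy case it is about 0.733 (class 0: 0.8, class 1: 2/3), lower than the 0.75 accuracy.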
- Open Access
- This is an open access article distributed under the CC BY-NC license.
Cite this article
TY - CONF
AU - Qiubo Huang
AU - Guozheng Fang
AU - Keyuan Jiang
PY - 2019/05
DA - 2019/05
TI - An Approach of Suspected Code Plagiarism Detection Based on XGBoost Incremental Learning
BT - Proceedings of the 2019 International Conference on Computer, Network, Communication and Information Systems (CNCI 2019)
PB - Atlantis Press
SP - 269
EP - 276
SN - 2352-538X
UR - https://doi.org/10.2991/cnci-19.2019.40
DO - https://doi.org/10.2991/cnci-19.2019.40
ID - Huang2019/05
ER -