Proceedings of the 2019 International Conference on Computer, Network, Communication and Information Systems (CNCI 2019)

An Approach of Suspected Code Plagiarism Detection Based on XGBoost Incremental Learning

Authors
Qiubo Huang, Guozheng Fang, Keyuan Jiang
Corresponding Author
Qiubo Huang
Available Online May 2019.
DOI
10.2991/cnci-19.2019.40
Keywords
Code Plagiarism Detection, Relevant Features, XGBoost, Incremental Learning.
Abstract

Code plagiarism is a serious problem in teaching evaluation because programming assignments count toward students' grades, so detecting plagiarism in student submissions is especially important. Because every submission is stored in the database, the data accumulate day by day. To handle this setting, we propose a detection approach based on relevant features and XGBoost incremental learning. First, we define the relevant features of a code submission record in the Online Judge system and describe the algorithms for computing them, including code similarity, code style similarity, and the degree of concentration of plagiarism targets. Then, we use information gain to filter out irrelevant features and use performance metrics such as Accuracy, Macro F1-Score, AUC, and the ROC curve to select the learning model. Finally, the XGBoost incremental learning algorithm is used to optimize the system implementation, and the model reaches an accuracy of up to 97.9% in the evaluation test.
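
The information-gain filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete feature values, and the feature names, toy data, and the 0.05 threshold are all hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after
    splitting the records on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, threshold=0.05):
    """Keep the names of features whose information gain reaches
    the threshold (the threshold value is an assumption)."""
    return [name for name, values in columns.items()
            if information_gain(values, labels) >= threshold]


# Hypothetical toy data: 1 = suspected plagiarism, 0 = not suspected.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    # Hypothetical feature that separates the two classes perfectly.
    "high_code_similarity": [1, 1, 1, 1, 0, 0, 0, 0],
    # Hypothetical feature that is independent of the label.
    "submitted_on_weekend": [1, 1, 0, 0, 1, 1, 0, 0],
}
selected = select_features(columns, labels)
```

In this toy setting the first feature removes all uncertainty about the label while the second removes none, so only the first passes the filter. Continuous features such as a similarity score would need to be binned, or scored with a different estimator, before a filter of this kind applies.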

Copyright
© 2019, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Series
Advances in Computer Science Research
Publication Date
May 2019
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Qiubo Huang
AU  - Guozheng Fang
AU  - Keyuan Jiang
PY  - 2019/05
DA  - 2019/05
TI  - An Approach of Suspected Code Plagiarism Detection Based on XGBoost Incremental Learning
BT  - Proceedings of the 2019 International Conference on Computer, Network, Communication and Information Systems (CNCI 2019)
PB  - Atlantis Press
SP  - 269
EP  - 276
SN  - 2352-538X
UR  - https://doi.org/10.2991/cnci-19.2019.40
DO  - 10.2991/cnci-19.2019.40
ID  - Huang2019/05
ER  -