Proceedings of the 2015 Conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology

Detecting similarity of R functions via a fusion of multiple heuristic methods

Authors
Maciej Bartoszuk, Marek Gagolewski
Corresponding author
Bartoszuk
Keywords
R, plagiarism and code cloning detection, fuzzy proximity relations, aggregation, program dependence graph, t-norms.
Abstract
In this paper we describe recent advances in our R code similarity detection algorithm. We propose a modification of the Program Dependence Graph (PDG) procedure used in the GPLAG system that better fits the nature of functional programming languages like R. The major strength of our approach lies in a proper aggregation of the outputs of multiple plagiarism detection methods, as it is well known that no single technique gives perfect results. Incorporating the PDG algorithm significantly improves recall, i.e., the system identifies a larger share of the true cases of plagiarism or code cloning. The implemented system is available as a web application at http://SimilaR.Rexamine.com/.
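As an illustration of the aggregation idea, the sketch below fuses similarity degrees in [0, 1] reported by several detection methods using three standard t-norms, and computes recall from counts of true positives and false negatives. This is only a minimal sketch under stated assumptions: the abstract does not say which aggregation function the system uses, and the function names and the example scores are hypothetical.

```python
from functools import reduce


def t_min(a, b):
    """Minimum (Goedel) t-norm."""
    return min(a, b)


def t_product(a, b):
    """Product t-norm."""
    return a * b


def t_lukasiewicz(a, b):
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)


def aggregate(scores, t_norm):
    """Fold a t-norm over similarity degrees, each expected in [0, 1].

    Assumption: one score per detection method; the choice of t-norm
    here is illustrative, not the one used by the described system.
    """
    if not scores:
        raise ValueError("at least one score is required")
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return reduce(t_norm, scores)


def recall(true_positives, false_negatives):
    """Share of actual similar pairs that were detected."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical outputs of three detection methods for one pair of functions.
scores = [0.9, 0.8, 0.7]
fused_min = aggregate(scores, t_min)
fused_product = aggregate(scores, t_product)
fused_lukasiewicz = aggregate(scores, t_lukasiewicz)
```

A t-norm is a conjunctive aggregation: the fused degree never exceeds the lowest individual score, so a pair is rated as highly similar only when all methods agree. Other aggregation functions, such as means or t-conorms, trade this strictness for higher recall.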