Proceedings of the 1st Engineering Data Analytics and Management Conference (EAMCON 2025)

Handling Skewed Online Click Data: A Performance Analysis of Fraud Detection via Resampling Strategies

Authors
Deepti Sisodia¹, Lokesh Singh¹,*
¹Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India
*Corresponding author. Email: singh.lokesh@manipal.edu
Available Online 31 December 2025.
DOI
10.2991/978-94-6463-978-0_46
Keywords
pay-per-click; click fraud detection; class imbalance; data-level resampling; ensemble classifiers; online advertising.
Abstract

In online advertising, particularly under the Pay-per-Click (PPC) model, identifying fraudulent entities remains a critical challenge for the data mining research community. The inherent class imbalance, in which fraudulent publishers make up only a small fraction of all publishers compared with legitimate ones, poses significant obstacles for predictive modeling: learning models trained on such skewed data tend to be biased toward the majority (legitimate) class, which weakens fraud detection. To address these class imbalance constraints systematically, this study proposes an empirical framework that combines nine supervised learning algorithms, eight data-level sampling techniques, and three evaluation metrics. The framework examines under-sampling, over-sampling, and hybrid sampling on imbalanced and noisy clickstream datasets, with the aim of identifying the sampling strategy that most improves classifier robustness under severe class imbalance. The hyperparameters of each classifier are tuned to improve generalization and precision, and performance is estimated with 10-fold cross-validation. Evaluation focuses on average precision, recall, and F1-score rather than overall accuracy, because accuracy can misrepresent model quality under class imbalance: if, for example, 1% of publishers were fraudulent, a model that labels every publisher legitimate would score 99% accuracy while detecting no fraud.
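The abstract's two technical points, that accuracy misleads on skewed labels and that resampling is evaluated under 10-fold cross-validation, can be illustrated with a small stdlib-only Python sketch. It is a hypothetical illustration, not the authors' code: the function names, the 1% fraud rate, and the use of plain random over- and under-sampling (rather than the eight techniques evaluated in the paper, which the abstract does not name) are all assumptions. It also assumes the common practice of resampling only the training folds, so that each stratified test fold keeps the original class ratio.

```python
import random


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` (fraud) as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class; 0.0 when a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def _group_by_label(X, y):
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return groups


def random_oversample(X, y, seed=0):
    """Over-sampling (assumed variant): duplicate random rows of smaller classes up to the majority count."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = list(X), list(y)
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Under-sampling (assumed variant): keep a random subset of each class equal to the minority count."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def stratified_folds(y, k=10, seed=0):
    """Split row indices into k folds that each keep roughly the original class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def resampled_cv_splits(X, y, resampler, k=10, seed=0):
    """Yield (X_train, y_train, test_idx); only the training part is resampled."""
    for f, test_idx in enumerate(stratified_folds(y, k, seed)):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = resampler([X[i] for i in train_idx],
                               [y[i] for i in train_idx], seed=seed + f)
        yield X_tr, y_tr, test_idx


if __name__ == "__main__":
    y = [0] * 990 + [1] * 10         # hypothetical data: 1% fraudulent publishers
    always_legit = [0] * len(y)      # degenerate model that never flags fraud
    print(accuracy(y, always_legit))             # 0.99
    print(precision_recall_f1(y, always_legit))  # (0.0, 0.0, 0.0)
```

In this sketch, passing `random_oversample` or `random_undersample` to `resampled_cv_splits` changes only the class distribution a classifier would be trained on; precision, recall, and F1 would then be computed on each untouched test fold and averaged over the ten folds.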

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Series
Advances in Engineering Research
Publication Date
31 December 2025
ISBN
978-94-6463-978-0
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Deepti Sisodia
AU  - Lokesh Singh
PY  - 2025
DA  - 2025/12/31
TI  - Handling Skewed Online Click Data: A Performance Analysis of Fraud Detection via Resampling Strategies
BT  - Proceedings of the 1st Engineering Data Analytics and Management Conference (EAMCON 2025)
PB  - Atlantis Press
SP  - 537
EP  - 547
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-978-0_46
DO  - 10.2991/978-94-6463-978-0_46
ID  - Sisodia2025
ER  -