Handling Skewed Online Click Data: A Performance Analysis of Fraud Detection via Resampling Strategies
- DOI
- 10.2991/978-94-6463-978-0_46
- Keywords
- pay-per-click; click fraud detection; class imbalance; data-level resampling; ensemble classifiers; online advertising
- Abstract
In online advertising, particularly within the Pay-per-Click (PPC) framework, identifying fraudulent entities remains a critical challenge for the data mining research community. The inherent class imbalance, in which fraudulent publishers represent a minor fraction compared to legitimate ones, poses significant obstacles for predictive modeling. This pronounced skew often results in biased learning models that favor the majority class, thereby compromising fraud detection efficacy. To systematically address these class imbalance constraints, this study proposes a comprehensive empirical framework that integrates nine supervised learning algorithms, eight distinct data-level sampling techniques, and three evaluation metrics. The approach explores under-sampling, over-sampling, and hybrid sampling methods on imbalanced and noisy clickstream datasets. The objective is to determine the most effective sampling strategy for enhancing classifier robustness under severe class imbalance. Each classifier's hyperparameters are tuned to improve generalizability and precision. To ensure reliable performance estimation, experiments are conducted using 10-fold cross-validation. Performance evaluation prioritizes average precision, recall, and F1-score, as overall accuracy is often inadequate for reflecting true model performance under class-imbalanced conditions.
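The abstract's two central points (accuracy can mislead on skewed data, and data-level resampling rebalances the classes) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption for illustration: the 95/5 class split, the `random_oversample` and `scores` helper names, and the choice of plain random over-sampling are not taken from the paper, which evaluates eight sampling techniques that are not named in this abstract.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until all classes match the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


def scores(y_true, y_pred, positive=1):
    """Return accuracy plus precision, recall and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels: 95 legitimate publishers (0) and 5 fraudulent ones (1).
y_true = [0] * 95 + [1] * 5
X = [[i] for i in range(100)]  # placeholder one-column features

# A model that always predicts "legitimate" looks accurate but finds no fraud.
baseline = scores(y_true, [0] * 100)

# Random over-sampling equalises the class counts in the resampled set.
X_bal, y_bal = random_oversample(X, y_true)
balanced_counts = Counter(y_bal)
```

In this toy setting the always-legitimate baseline reaches 0.95 accuracy with zero recall and zero F1 on the fraud class, which is why class-sensitive metrics are preferred. When resampling is combined with cross-validation, a common precaution is to resample only the training portion of each fold so that duplicated rows do not leak into the held-out data.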
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Deepti Sisodia
AU - Lokesh Singh
PY - 2025
DA - 2025/12/31
TI - Handling Skewed Online Click Data: A Performance Analysis of Fraud Detection via Resampling Strategies
BT - Proceedings of the 1st Engineering Data Analytics and Management Conference (EAMCON 2025)
PB - Atlantis Press
SP - 537
EP - 547
SN - 2352-5401
UR - https://doi.org/10.2991/978-94-6463-978-0_46
DO - 10.2991/978-94-6463-978-0_46
ID - Sisodia2025
ER -