Benchmarking TabPFN Against Traditional and Relational Models for Provider Fraud Detection
- DOI
- 10.2991/978-94-6239-693-7_77
- Keywords
- Healthcare fraud detection; TabPFN; Graph neural networks; Provider-level modeling; Cost-sensitive learning
- Abstract
Healthcare fraud detection is inherently difficult to model because the available data combine several challenges at once: confirmed fraudulent providers are typically very rare compared to legitimate ones, the features are heterogeneous, and providers are indirectly connected through shared patients and treatment patterns. Although gradient-boosted decision trees remain widely used in operational systems because of their reliability on structured data, recent developments in pretrained tabular models and graph-based learning suggest that alternative modelling strategies may capture these complexities more effectively. [1]
In this study, we evaluate several modelling approaches, including Logistic Regression, LightGBM, CatBoost, TabPFN, and a GraphSAGE-based provider classifier, on the Healthcare Provider Fraud Detection Analysis dataset. To ensure a fair comparison, we construct a provider-level representation by aggregating inpatient, outpatient, and beneficiary information into a single feature set. We further assess model performance with a cost-sensitive evaluation framework that jointly considers discrimination ability, probability calibration, and expected investigation utility. [2]
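The abstract names three evaluation criteria (discrimination, calibration, and expected investigation utility) but does not give their exact formulas. The following is only a minimal sketch under stated assumptions: ROC AUC stands in for discrimination, the Brier score for calibration, and utility is a hypothetical cost model in which investigating a provider costs `investigation_cost` and each confirmed fraudulent provider recovers `recovery_value`. All function and parameter names are illustrative and are not taken from the authors' pipeline.

```python
"""Sketch of a cost-sensitive evaluation for provider-level fraud scores.

Assumptions (hypothetical, not the paper's exact framework):
  * scores are predicted fraud probabilities in [0, 1];
  * labels are 1 for fraudulent providers and 0 otherwise;
  * investigating a provider costs `investigation_cost` and confirming
    a fraudulent one recovers `recovery_value` (arbitrary units).
"""
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Discrimination: chance a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Calibration: mean squared gap between probabilities and 0/1 outcomes (lower is better)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


def expected_utility(labels: Sequence[int], scores: Sequence[float], threshold: float,
                     recovery_value: float, investigation_cost: float) -> float:
    """Net value of investigating every provider whose score is at or above `threshold`."""
    total = 0.0
    for s, y in zip(scores, labels):
        if s >= threshold:
            total += recovery_value * y - investigation_cost
    return total


def best_threshold(labels: Sequence[int], scores: Sequence[float],
                   recovery_value: float, investigation_cost: float) -> Tuple[float, float]:
    """Try each observed score as a cut-off and keep the most profitable one."""
    candidates = sorted(set(scores))
    return max(
        ((t, expected_utility(labels, scores, t, recovery_value, investigation_cost))
         for t in candidates),
        key=lambda pair: pair[1],
    )
```

For example, with labels `[1, 0, 0, 1, 0, 0, 0, 0]`, scores `[0.9, 0.2, 0.4, 0.7, 0.1, 0.8, 0.3, 0.05]`, `recovery_value=10` and `investigation_cost=2`, this sketch gives an AUC of 11/12, a Brier score of about 0.130, and a best threshold of 0.7 with a utility of 14. Separating the three quantities makes it visible when a model ranks well but is poorly calibrated, which matters once a fixed threshold drives investigation decisions.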
Our experiments show that TabPFN consistently delivers the strongest overall ranking performance without task-specific tuning. At the same time, the lightweight GraphSAGE model remains competitive by incorporating provider-beneficiary relationships; this relational signal proves particularly helpful for identifying borderline cases where tabular signals alone are less decisive. To encourage reproducibility and follow-up work, we make our full Python pipeline publicly available for future research on relational and foundation models in insurance fraud detection. [3]
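GraphSAGE updates each node by combining its own features with an aggregate of its neighbours' features. The abstract does not state how provider-beneficiary relationships enter the graph, so this sketch assumes, hypothetically, that two providers are neighbours when they billed a common beneficiary, and shows a single mean-aggregator layer with a fixed weight matrix standing in for learned parameters. The names `provider_neighbours` and `sage_layer`, and the input layout, are illustrative only.

```python
"""One GraphSAGE-style layer (mean aggregator) over providers linked by shared beneficiaries.

Assumptions (hypothetical, for illustration only):
  * `claims` is a list of (provider_id, beneficiary_id) pairs;
  * two providers are neighbours if they billed at least one common beneficiary;
  * `features` maps each provider to an already-aggregated feature vector;
  * `weight` is a fixed matrix standing in for learned parameters.
"""
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def provider_neighbours(claims: Sequence[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Project the provider-beneficiary bipartite graph onto providers."""
    providers_by_bene: Dict[str, Set[str]] = defaultdict(set)
    for provider, bene in claims:
        providers_by_bene[bene].add(provider)
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for providers in providers_by_bene.values():
        for p in providers:
            # Touching neighbours[p] also registers isolated providers with an empty set.
            neighbours[p].update(providers - {p})
    return neighbours


def sage_layer(features: Dict[str, List[float]],
               neighbours: Dict[str, Set[str]],
               weight: List[List[float]]) -> Dict[str, List[float]]:
    """h_v' = ReLU(W @ concat(h_v, mean of neighbour h_u)); isolated nodes use a zero mean."""
    out: Dict[str, List[float]] = {}
    for node, h in features.items():
        nbrs = [features[u] for u in neighbours.get(node, set()) if u in features]
        if nbrs:
            agg = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        else:
            agg = [0.0] * len(h)
        z = list(h) + agg
        out[node] = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in weight]
    return out
```

Projecting onto a provider-only graph is just one option; a heterogeneous graph that keeps beneficiaries as their own node type is another, and the abstract does not say which the authors used. In a trained model the weight matrix would be learned and several such layers would typically be stacked.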
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Dipak Argade
AU - K. Gandeeban
PY - 2026
DA - 2026/06/16
TI - Benchmarking TabPFN Against Traditional and Relational Models for Provider Fraud Detection
BT - Proceedings of the International Conference on Intelligent Systems for a Sustainable Future (ISSF 2026)
PB - Atlantis Press
SP - 785
EP - 792
SN - 2589-4919
UR - https://doi.org/10.2991/978-94-6239-693-7_77
DO - 10.2991/978-94-6239-693-7_77
ID - Argade2026
ER -