International Journal of Computational Intelligence Systems

Volume 12, Issue 2, 2019, Pages 929 - 936

Distributed Synthetic Minority Oversampling Technique

Authors
Sakshi Hooda¹,*, Suman Mann²
¹Research Scholar, IPU, New Delhi, India
²Associate Professor, MSIT, New Delhi, India
*Corresponding author. Email: sakshihoodars@gmail.com
Received 1 May 2019, Accepted 10 July 2019, Available Online 30 July 2019.
DOI
10.2991/ijcis.d.190719.001
Keywords
SMOTE; Apache Spark; prediction; machine learning; imbalanced classification
Abstract

Real-world prediction problems often involve predicting rare occurrences. Standard classification algorithms are biased against these rare events because of the resulting data imbalance. Typical approaches to this imbalance involve oversampling the “rare events” or undersampling the majority events. Synthetic Minority Oversampling Technique (SMOTE) is one technique that addresses this class imbalance effectively. However, existing implementations of SMOTE fail when the data grows and can no longer be stored on a single machine. In this paper, we present our solution to this “big data challenge.” We provide a distributed version of SMOTE that uses scalable k-means++ and M-Trees. With this implementation of SMOTE, we were able to oversample the “rare events” and achieve results that are better than those of the existing Python version of SMOTE.
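
For readers unfamiliar with the oversampling step named in the abstract, the following is a minimal single-machine sketch of the standard SMOTE interpolation idea: pick a minority point, pick one of its k nearest minority neighbours, and place a synthetic point at a random position on the segment between them. It is not the authors' distributed implementation; the function name `smote_sketch`, its parameters, and the toy data are hypothetical, and the neighbour search is a plain brute-force scan rather than the scalable k-means++ and M-Tree machinery the abstract mentions.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Illustrative only: brute-force neighbour search, pure standard library.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the point itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point lies on the segment from `base` towards `neighbour`.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

The brute-force neighbour scan touches every minority point for each synthetic sample, which is the kind of step that becomes impractical once the data no longer fits on one machine.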

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

Publication Date
2019/07/30
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891

Cite this article

TY  - JOUR
AU  - Sakshi Hooda
AU  - Suman Mann
PY  - 2019
DA  - 2019/07/30
TI  - Distributed Synthetic Minority Oversampling Technique
JO  - International Journal of Computational Intelligence Systems
SP  - 929
EP  - 936
VL  - 12
IS  - 2
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.d.190719.001
DO  - 10.2991/ijcis.d.190719.001
ID  - Hooda2019
ER  -