Proceedings of the 2nd International Conference on Modelling, Identification and Control

Data Preprocessing and Classification for Taproot site data sets of Panax notoginseng

Authors
Huang Dao, He Jin
Corresponding Author
Huang Dao
Available Online August 2015.
DOI
10.2991/mic-15.2015.29
Keywords
AdaBoost.M1, authentic-region herbs, Random Forest, data preprocessing
Abstract

Herbs from different producing regions differ in their active constituents and efficacy, and herbs from the authentic region are of higher quality than those from other producing regions. Nowadays, many peddlers substitute non-authentic herbs for authentic-region herbs in order to make more money, so it is important to distinguish herbs by producing region. This paper studies the data preprocessing and classification of taproot site data sets of Panax notoginseng from three different producing regions. We compare the effects of data preprocessing steps, including data standardization, instance selection, and attribute selection, to find the best method and parameter settings for the data sets. We then apply different classification algorithms to the preprocessed data and compare their classification performance to find the optimal classification algorithm for the data sets. Classification performance in the experiment was evaluated by Percent Correct (PC), Mean Squared Error (MSE), Kappa Statistic (KS), Area Under ROC (AUR), and Mean Absolute Error (MAE). The results show that standardizing the data with decimal scaling and selecting the attribute subset {1,2,4,6,7,8} is suitable for the data, and that the Random Forest and AdaBoost.M1 algorithms are the optimal classification algorithms for these data sets, giving better classification performance than the other algorithms compared.
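
The two preprocessing choices the abstract reports as suitable, decimal scaling and the attribute subset {1,2,4,6,7,8}, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the textbook definition of decimal scaling (divide each attribute by 10^j, where j is the smallest non-negative integer that brings every absolute value below 1), assumes the attribute indices are 1-based, and uses made-up sample values with eight attributes. The function names are hypothetical.

```python
# Illustrative sketch only: the function names and sample values are assumptions,
# not taken from the paper.

SUBSET = (1, 2, 4, 6, 7, 8)  # attribute subset from the abstract, assumed 1-based


def decimal_scale(column):
    """Divide a column by 10**j, with j the smallest non-negative integer
    such that every scaled absolute value is below 1."""
    max_abs = max(abs(v) for v in column)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in column]


def select_attributes(rows, indices_1based):
    """Keep only the listed attributes (1-based positions) in every row."""
    return [[row[i - 1] for i in indices_1based] for row in rows]


def preprocess(rows, subset=SUBSET):
    """Scale each attribute column independently, then keep the chosen subset."""
    columns = list(zip(*rows))                       # rows -> columns
    scaled_columns = [decimal_scale(col) for col in columns]
    scaled_rows = [list(r) for r in zip(*scaled_columns)]  # columns -> rows
    return select_attributes(scaled_rows, subset)


# Hypothetical samples with eight attributes each (not real measurements).
samples = [
    [12.0, 350.0, 7.0, 0.8, 45.0, 1500.0, 3.2, 99.0],
    [15.0, 410.0, 9.0, 1.1, 52.0, 1720.0, 2.9, 87.0],
]

for row in preprocess(samples):
    print(row)
```

Each attribute is scaled on its own, so attributes with very different magnitudes all end up in the interval (-1, 1) before being passed to a classifier such as Random Forest or AdaBoost.M1.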

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2nd International Conference on Modelling, Identification and Control
Series
Advances in Intelligent Systems Research
Publication Date
August 2015
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Huang Dao
AU  - He Jin
PY  - 2015/08
DA  - 2015/08
TI  - Data Preprocessing and Classification for Taproot site data sets of Panax notoginseng
BT  - Proceedings of the 2nd International Conference on Modelling, Identification and Control
PB  - Atlantis Press
SP  - 131
EP  - 134
SN  - 1951-6851
UR  - https://doi.org/10.2991/mic-15.2015.29
DO  - 10.2991/mic-15.2015.29
ID  - Dao2015/08
ER  -