Development and Application of a Chinese Webpage Suicide Information Mining System (SIMS)
Penglai Chen, Jing Chai, Lu Zhang, Debin Wang
Available Online December 2013.
- https://doi.org/10.2991/icaiees-13.2013.37
- suicide, news, blogs, data mining, support system
- Background: Suicide is becoming a major public health issue worldwide and in China. With the growing popularity of the internet and the participation of ordinary people in posting and transmitting messages online, webpage coverage of reported suicide events, and its impact, has increased rapidly and will continue to do so. Objectives: This study aims to design and pilot a convenient Chinese webpage suicide information mining system (SIMS) that helps search and filter required data from the internet and discover potential features, trends and causality of suicide. Methods: SIMS uses Microsoft Visual Studio 2008 as the development platform, SQL Server 2008 as the database manager, and C# as the programming language. It collects raw Chinese webpage data via popular search engines; cleans the raw data through completeness, duplication and relevance checks using trained models, string or sentence comparison, and minimal manual help; translates the cleaned texts into quantitative data through models and supervised fuzzy recognition; and analyzes and visualizes suicide-related variables with self-programmed algorithms. Results: The SIMS developed comprises five main functions: collection of suicide news and blogs, data filtering and cleaning, data extraction and translation, blog suicide ideation estimation, and data analysis and presentation. Data collection provides a user-friendly interface for retrieving suicide-related news and blogs from Chinese webpages and downloading them into the SIMS database. Data filtering and cleaning performs completeness, duplication and relevance checks on the data gathered. Data extraction and translation derives structured data about suicide from unstructured texts. Suicide ideation estimation assigns suicide risk values to blogs of concern using default or user-built models.
SIMS produces a set of useful indicators, including the frequencies and compositions of webpage-reported suicide events and of blogs with high suicidal ideation, in total and broken down by gender, age, region, cause, method, year, month and hour, among others, presented in easily understandable diagrams. Conclusions: SIMS provides a novel and practical means of monitoring and understanding suicide. It offers useful perspectives and tools for analyzing the features and trends of suicide using data derived from the internet, as a supplement to traditional suicide reporting and epidemiological surveys. Although SIMS was designed specifically for suicide, its overall architecture, strategies and techniques can be readily adapted to similar systems addressing other diseases or health problems.
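The cleaning stage described in the abstract (completeness, duplication and relevance checks) might look roughly like the sketch below. It is written in Python for brevity rather than the C# the authors report using, and it is not the SIMS implementation: the field names, the keyword list standing in for the trained relevance model, and the 0.9 similarity threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field names, keywords and threshold; none come from the SIMS paper.
REQUIRED_FIELDS = ("title", "body", "url")
KEYWORDS = ("自杀", "suicide")  # keyword stand-in for the trained relevance model
DUPLICATE_THRESHOLD = 0.9


def is_complete(page):
    """Completeness check: every required field is present and non-blank."""
    return all(str(page.get(field) or "").strip() for field in REQUIRED_FIELDS)


def is_relevant(page):
    """Relevance check: crude keyword match on title and body."""
    text = (page["title"] + " " + page["body"]).lower()
    return any(keyword in text for keyword in KEYWORDS)


def is_duplicate(page, kept):
    """Duplication check: string comparison against pages already kept."""
    return any(
        SequenceMatcher(None, page["body"], other["body"]).ratio()
        >= DUPLICATE_THRESHOLD
        for other in kept
    )


def clean(pages):
    """Apply the three checks in order and return the surviving pages."""
    kept = []
    for page in pages:
        if not is_complete(page):
            continue
        if not is_relevant(page):
            continue
        if is_duplicate(page, kept):
            continue
        kept.append(page)
    return kept
```

Calling `clean` on a list of page dictionaries keeps the first copy of each relevant, complete page. A pairwise string comparison like this is quadratic in the number of kept pages, so a larger crawl would likely need hashing or shingling instead.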
- Open Access
- This is an open access article distributed under the CC BY-NC license.
Cite this article
TY  - CONF
AU  - Penglai Chen
AU  - Jing Chai
AU  - Lu Zhang
AU  - Debin Wang
PY  - 2013/12
DA  - 2013/12
TI  - Development and Application of a Chinese Webpage Suicide Information Mining System (SIMS)
BT  - Proceedings of the 2013 International Conference on Advanced Information Engineering and Education Science (ICAIEES 2013)
PB  - Atlantis Press
SP  - 136
EP  - 142
SN  - 1951-6851
UR  - https://doi.org/10.2991/icaiees-13.2013.37
DO  - https://doi.org/10.2991/icaiees-13.2013.37
ID  - Chen2013/12
ER  -