Comparison of Web Scraping Techniques : Regular Expression, HTML DOM and Xpath
- Rohmat Gunawan, Alam Rahmatulloh, Irfan Darmawan, Firman Firdaus
- Corresponding Author
- Rohmat Gunawan
Available Online March 2019.
- https://doi.org/10.2991/icoiese-18.2019.50
- DOM, Regex, Web Scraping, Xpath
- Data collection is the initial stage of research. Various data sources on the internet can be used in the research process. The process of extracting data or information from websites is called web scraping. Web scraping methods include Regular Expression (Regex), HTML DOM, and XPath. This study aims to determine the performance of these three methods. The comparison is done by testing each method when retrieving data from the target website, then measuring and comparing the performance of each process. Process time, memory usage, and data consumption are used as measurement parameters in the experiment. The results show that web scraping with the Regex method uses the least memory compared to the HTML DOM and XPath methods, while HTML DOM requires the least time and the smallest data consumption compared to the Regex and XPath methods.
- Open Access
- This is an open access article distributed under the CC BY-NC license.
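To make the comparison in the abstract concrete, the following is a minimal sketch of the three extraction styles and of how process time and memory usage could be measured. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a programming language or a target website, so Python's standard library is used here, and the sample page and the names `PAGE`, `scrape_regex`, `scrape_dom`, `scrape_xpath`, and `measure` are hypothetical. Data consumption (network transfer) is not measured because the sketch works on an in-memory string.

```python
import re
import time
import tracemalloc
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical, well-formed sample page (not the study's target website).
PAGE = """<html><body>
<ul>
  <li class="item">alpha</li>
  <li class="item">beta</li>
  <li class="other">gamma</li>
</ul>
</body></html>"""


def scrape_regex(page):
    # Regex: pattern matching directly on the raw markup, no tree is built.
    return re.findall(r'<li class="item">(.*?)</li>', page)


def scrape_dom(page):
    # DOM: parse the page into a tree, then walk the element nodes.
    doc = minidom.parseString(page)
    return [
        li.firstChild.data
        for li in doc.getElementsByTagName("li")
        if li.getAttribute("class") == "item"
    ]


def scrape_xpath(page):
    # XPath: ElementTree supports a limited XPath subset, enough for this query.
    root = ET.fromstring(page)
    return [li.text for li in root.findall(".//li[@class='item']")]


def measure(func, page):
    # Return (result, elapsed seconds, peak bytes allocated during the call).
    tracemalloc.start()
    start = time.perf_counter()
    result = func(page)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    for func in (scrape_regex, scrape_dom, scrape_xpath):
        result, elapsed, peak = measure(func, PAGE)
        print(f"{func.__name__}: {result} {elapsed:.6f}s peak={peak}B")
```

All three functions return the same items for this page, so any difference reported by `measure` reflects the extraction method rather than the extracted data. Timings and memory figures from such a small input are not representative of the results reported in the study.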
Cite this article
TY  - CONF
AU  - Rohmat Gunawan
AU  - Alam Rahmatulloh
AU  - Irfan Darmawan
AU  - Firman Firdaus
PY  - 2019/03
DA  - 2019/03
TI  - Comparison of Web Scraping Techniques : Regular Expression, HTML DOM and Xpath
BT  - 2018 International Conference on Industrial Enterprise and System Engineering (ICoIESE 2018)
PB  - Atlantis Press
SN  - 2589-4943
UR  - https://doi.org/10.2991/icoiese-18.2019.50
DO  - https://doi.org/10.2991/icoiese-18.2019.50
ID  - Gunawan2019/03
ER  -