Proceedings of the International Conference on Advances in Nano-Neuro-Bio-Quantum (ICAN 2023)

Natural Language Processing Approach to Extract Compound Information from PubChem

Authors
Rehan Khan1, *, Preenon Bagchi1, Krutanjali Patil1
1Institute of Biosciences and Technology, MGM University, Chht. Sambhajinagar, India
*Corresponding author. Email: khanrehan9395@gmail.com
Available Online 17 November 2023.
DOI
10.2991/978-94-6463-294-1_6
Keywords
Natural language processing; SMILES; PubChem; Compounds; Properties; Python
Abstract

PubChem is one of the largest and most comprehensive databases of its kind, containing information on millions of chemical compounds, including their structures, properties, and biological activities. A Natural Language Processing (NLP) approach to extracting compound information from PubChem has several advantages over manual methods, including improved accuracy and efficiency and the ability to handle large amounts of data. NLP supports this extraction by processing unstructured and semi-structured text, identifying chemical compound names, and extracting the relevant information associated with them. Simplified Molecular Input Line Entry System (SMILES) representations are also used in computational chemistry and drug discovery, where they can be used to predict properties of compounds such as their stability, reactivity, and toxicity. Researchers then use this information to design and optimize new drugs and chemical compounds. In this work, we extracted compound information from PubChem using natural language processing in several steps: defining the target information, data acquisition, text pre-processing, named entity recognition, relation extraction, entity linking, and output generation. The results of such an approach have the potential to greatly aid research efforts in chemistry, pharmacology, and other related fields.
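
The steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the small `LEXICON` table, the function names, and the dictionary-based entity recognition are assumptions made for illustration, and relation extraction is left out. Data acquisition is represented only by building a PubChem PUG REST property URL, which is not actually fetched here.

```python
import re
from urllib.parse import quote

# Illustrative stand-in for PubChem records (assumption for this sketch);
# a real pipeline would retrieve these values from PubChem itself.
LEXICON = {
    "aspirin": {"cid": 2244, "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"},
    "caffeine": {"cid": 2519, "smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"},
}

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def preprocess(text):
    """Text pre-processing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def recognize_entities(tokens):
    """Named entity recognition by dictionary lookup, keeping first-mention order."""
    found = []
    for token in tokens:
        if token in LEXICON and token not in found:
            found.append(token)
    return found


def link_entities(names):
    """Entity linking: attach the stored identifier and SMILES to each name."""
    return [{"name": name, **LEXICON[name]} for name in names]


def build_property_url(name, properties):
    """Data acquisition (URL only): PUG REST request for compound properties by name."""
    props = ",".join(properties)
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{props}/JSON"


def extract_compounds(text):
    """Output generation: one record per compound mentioned in the text."""
    return link_entities(recognize_entities(preprocess(text)))
```

For example, `extract_compounds("Aspirin and caffeine were compared.")` returns one record for each of the two compounds, and `build_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])` gives the URL that a real acquisition step could request.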

In conclusion, SMILES representations are a powerful tool for identifying chemical compounds. By representing the structure of a chemical compound as a string of characters, SMILES representations make it possible to process and analyze chemical compounds using computers, enabling scientists and researchers to make new discoveries and advancements in the field of chemistry.
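
To make the "string of characters" idea concrete, the following Python sketch splits a SMILES string into tokens and counts the heavy (non-hydrogen) atoms. The regular expression and the function names are assumptions for illustration; the tokenizer covers only common SMILES syntax and ignores implicit hydrogens, so it is not a substitute for a cheminformatics toolkit.

```python
import re
from collections import Counter

# Bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# ring-closure labels, then bond, branch, and other punctuation symbols.
TOKEN_RE = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#$:/\\\-+().@*~]"
)
BRACKET_ELEMENT_RE = re.compile(r"\[\d*([A-Za-z][a-z]?)")


def tokenize(smiles):
    """Split a SMILES string into tokens; raise if any character is not recognized."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def count_heavy_atoms(smiles):
    """Count atoms written explicitly in the string, grouped by element symbol."""
    counts = Counter()
    for token in tokenize(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT_RE.match(token)
            if match:
                symbol = match.group(1).capitalize()
                if symbol != "H":
                    counts[symbol] += 1
        elif token[0].isalpha():
            # Aromatic atoms are lowercase in SMILES; normalize to the element symbol.
            counts[token.capitalize()] += 1
    return dict(counts)
```

For aspirin, `count_heavy_atoms("CC(=O)OC1=CC=CC=C1C(=O)O")` returns `{"C": 9, "O": 4}`, which matches the carbon and oxygen counts in its molecular formula, C9H8O4.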

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Advances in Nano-Neuro-Bio-Quantum (ICAN 2023)
Series
Advances in Health Sciences Research
Publication Date
17 November 2023
ISBN
978-94-6463-294-1
ISSN
2468-5739

Cite this article

TY  - CONF
AU  - Rehan Khan
AU  - Preenon Bagchi
AU  - Krutanjali Patil
PY  - 2023
DA  - 2023/11/17
TI  - Natural Language Processing Approach to Extract Compound Information from PubChem
BT  - Proceedings of the International Conference on Advances in Nano-Neuro-Bio-Quantum (ICAN 2023)
PB  - Atlantis Press
SP  - 64
EP  - 71
SN  - 2468-5739
UR  - https://doi.org/10.2991/978-94-6463-294-1_6
DO  - 10.2991/978-94-6463-294-1_6
ID  - Khan2023
ER  -