International Journal of Computational Intelligence Systems

Volume 14, Issue 1, 2021, Pages 744 - 757

Exploring the Landscape, Hot Topics, and Trends of Electronic Health Records Literature with Topics Detection and Evolution Analysis

Authors
Yuxing Qian1, 2, 3, *, Zhenni Ni1, 2, Wenxuan Gui1, 2, Yunmei Liu1, 2
1School of Information Management, Wuhan University, Wuhan, 430072, China
2Center for Studies of Information Resources, Wuhan University, Wuhan, 430072, China
3Big Data Institute, Wuhan University, Wuhan, 430072, China
*Corresponding author. Email: qianyuxing@whu.edu.cn
Corresponding Author
Yuxing Qian
Received 1 October 2020, Accepted 31 January 2021, Available Online 10 February 2021.
DOI
10.2991/ijcis.d.210203.006
Keywords
Electronic health records; Bibliometrics; Social network analysis; lda2vec
Abstract

Publications related to electronic health records (EHRs) are growing rapidly. A clear view of the research landscape, hot topics, and trends of EHRs is helpful for experts and scholars in various disciplines. We collected bibliometric data for 13,438 records of EHRs research literature from the Web of Science. We performed descriptive statistical analysis, social network analysis, and topic modeling with lda2vec to reveal the publication growth trend, the distribution of research subjects, and the topics of EHRs research. EHRs research mainly included four topics: (i) population health, disease risk prediction, and primary care; (ii) technology, ethics, and privacy security; (iii) quality improvement, user acceptance, and engagement; (iv) information systems application and impact. EHRs research has gone through stages of establishment, utilization, and high-level development and application. Research topics emerging in recent years have primarily focused on the social determinants of health, the application of deep learning models, the development and utilization of the patient portal, the mining of explicit and tacit knowledge, and the provision of decision support.

Copyright
© 2021 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

Electronic health records (EHRs) are longitudinal collections of patient and population electronic health information [1]. In recent years, EHRs have attracted the attention of many scholars and have been used in disease diagnosis [2], prediction [3], and screening [4]. With the rapid growth of EHRs research literature, some scholars have summarized the existing literature and conducted qualitative reviews. However, these reviews mainly targeted specific diseases [5], technologies [6–8], or applications in different medical fields [9–11], and lacked large-scale literature analysis.

It is important to sort out and present the overall landscape and the development process of EHRs research, because the large scale and multidisciplinary nature of the EHRs literature make it difficult for researchers to grasp the profile and frontiers of the field. Previous reviews on EHRs were domain-specific and weak in presenting a comprehensive overview that could promote mutual understanding and cooperation among scholars from different disciplines. Our study aims to integrate the publications on EHRs from various disciplines and research fields and to perform a comprehensive, multi-dimensional analysis of large-scale EHRs publications.

Social network analysis (SNA) and topic modeling are efficient methods for dealing with large-scale bibliographic data. SNA is based on network and graph theory [12]. Bibliometric studies have taken advantage of SNA to visualize co-keywords networks for topic detection [13–16]. SNA can also identify hotspots in a field to provide general knowledge and direct future research [17]. However, previous studies that used SNA rarely considered subject co-occurrence or the temporal evolution of the co-keywords network.

The topic model is a statistical model for text mining, which can extract the latent semantic structure in the document collection. The topic model has also been widely used in the bibliometric analysis in many fields, such as computer science [18], management science [19], and medicine [20]. Latent Dirichlet allocation (LDA) is the most popular topic model [21]. In the medical field, LDA has been applied to literature analysis in mental health nursing [22], gastroenterology [23], etc. With the development of topic modeling, more complex models with better performance were proposed, such as lda2vec [24], an extension of LDA, that can not only mine the hidden semantics in the document but also grasp the context relationship well. Lda2vec has been applied to opinion mining and sentiment analysis [25,26] and also has good performance in medical topic mining such as urothelial cancer-related papers [27].

In summary, SNA is mainly based on co-occurrence relationships in the bibliographic data, while topic modeling focuses on titles and abstracts. When either method is used alone, the bibliographic data cannot be fully utilized. By combining SNA and topic modeling, we can better explore the EHRs research literature and gain insights into the present research landscape.

The novelty of our study includes the following aspects. First, compared with previous reviews on EHRs research, our study presents the EHRs research landscape, hot topics, and trends in a comprehensive overview from multiple perspectives, such as publication growth and the temporal evolution of subjects and topics. Because our study is not limited to a specific subject, it can fully reflect the interdisciplinary character of EHRs research. Second, compared with current bibliometric studies that used a single method, our study combines SNA and topic modeling to identify the current EHRs research topics from multiple perspectives, such as subjects, keywords, authors, titles, and abstracts. In terms of SNA, we consider subject co-occurrence to present the interdisciplinary character of EHRs research, and we present temporal change in the co-keywords network to reveal the development process of EHRs research. In terms of topic modeling, we use lda2vec, which involves more text features than LDA.

We mainly focus on the following three research questions:

  • RQ1: What is the growth trend of EHRs publications?

  • RQ2: What is the subject distribution of EHRs research? To answer this question, we consider the evolution and co-occurrence of subjects in EHRs research, as well as the journal distribution.

  • RQ3: What topics are included in EHRs research? To answer this question, we consider the topic distribution and key authors' contributions to different topics, and we also explore the trends of each topic.

2. MATERIALS AND METHODS

2.1. Data Sources

The bibliographic data was collected from the Web of Science (WOS) Core Collection on January 7, 2020. The search query was defined as follows:

TS (Topic) = (“electronic health records” OR “electronic health record”) AND LA (Languages) = (English) AND DT (Document Types) = (“Article” OR “Proceedings Paper” OR “Review”) AND PY (Year Published) = (1900–2019).

Then, 13,552 records were returned. After excluding papers of the early access document type, we obtained 13,438 records and downloaded bibliographic data of these records as our data set.

Our data set included 10,372 articles, 740 reviews, and 2,575 proceedings papers. Note that some papers had more than one document type, so these counts sum to more than 13,438.

2.2. Descriptive Statistics Analysis

First, we counted the number of publications each year and analyzed the overall growth trend. Second, based on the subject classification in WOS, the ten research subjects ranking highest in EHRs research were selected and a river map was created. In the river map, the width of each river indicates the percentage of research papers in that research subject in the given year. In addition, the publication volume of different journals was counted, and the distribution of the topics and subjects that the journals focus on was analyzed.
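The yearly shares behind such a river map can be sketched as follows. This is only an illustration under stated assumptions: the function name `subject_shares`, the record layout, and the choice of denominator (all papers published in that year) are hypothetical, since the text does not specify how the percentages were computed.

```python
from collections import Counter, defaultdict

def subject_shares(records, top_n=10):
    """Return {year: {subject: share}} for the top_n most frequent subjects.

    records: list of (year, [subjects]) pairs; a paper may carry several subjects.
    Assumption: share = papers in the subject that year / all papers that year.
    """
    totals = Counter(s for _, subjects in records for s in subjects)
    top = [s for s, _ in totals.most_common(top_n)]
    papers_per_year = Counter(year for year, _ in records)
    per_year = defaultdict(Counter)
    for year, subjects in records:
        for s in set(subjects):  # count each subject once per paper
            if s in top:
                per_year[year][s] += 1
    return {
        year: {s: per_year[year][s] / papers_per_year[year] for s in top}
        for year in sorted(papers_per_year)
    }

# Toy records, invented for illustration only.
records = [
    (2018, ["medical informatics", "computer science"]),
    (2018, ["medical informatics"]),
    (2019, ["computer science"]),
]
shares = subject_shares(records, top_n=2)
```

Because a paper can belong to several subjects, the shares within one year may sum to more than 1.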

2.3. Social Network Analysis

According to the classification of research subjects in WOS, we constructed and clustered the subject co-occurrence network to reflect the interdisciplinary status of EHRs research.

Then, after combining synonyms, singular and plural forms, such as “e-health” and “ehealth,” “electronic health records” and “EHRs,” “technology” and “technologies,” we constructed a co-keywords network for keywords with word frequency above 20. A timeline graph of topic clustering was drawn by VOSviewer. The topic clustering timeline graph was arranged horizontally for different topics, and the keyword under each topic was arranged vertically according to the average published year of the publication that contained this keyword. It can reflect the changes of different topics over time.
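The keyword merging and co-occurrence counting described above can be sketched in a few lines. The synonym map below is a hypothetical, abbreviated stand-in (the text names only a few merged pairs), and `co_keyword_edges` is an illustrative helper, not the procedure VOSviewer applies internally.

```python
from collections import Counter
from itertools import combinations

# Hypothetical synonym map built from the examples given in the text.
SYNONYMS = {
    "ehealth": "e-health",
    "ehrs": "electronic health records",
    "technologies": "technology",
}

def normalize(keyword):
    """Lower-case a keyword and map known variants to one canonical form."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)

def co_keyword_edges(papers, min_freq=20):
    """Count keyword pairs co-occurring in a paper, keeping only keywords
    whose frequency across papers is above min_freq."""
    keyword_sets = [{normalize(k) for k in keywords} for keywords in papers]
    freq = Counter(k for ks in keyword_sets for k in ks)
    kept = {k for k, n in freq.items() if n > min_freq}
    edges = Counter()
    for ks in keyword_sets:
        for a, b in combinations(sorted(ks & kept), 2):
            edges[(a, b)] += 1
    return edges

# Toy keyword lists, invented for illustration; a low threshold keeps them all.
papers = [
    ["EHRs", "Privacy"],
    ["electronic health records", "privacy", "ehealth"],
    ["e-health", "EHRs"],
]
edges = co_keyword_edges(papers, min_freq=1)
```

The resulting edge counts are the kind of weighted pair list that a network tool can take as input for clustering and layout.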

The frequency with which authors are cited in EHRs literature can reflect their contribution to this field. Based on the keywords of the papers and the authors of the cited references, we constructed and clustered a cited author-keyword two-mode network [28]. The connections between nodes follow three patterns: a co-citation relationship between authors (author to author), a co-occurrence relationship between keywords (keyword to keyword), and a co-occurrence relationship between authors and keywords (author to keyword).
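The three connection patterns can be made concrete with a small sketch. The record layout and the function name `two_mode_edges` are assumptions for illustration; the "@" and "#" prefixes simply mark author and keyword nodes so the two node types cannot collide.

```python
from collections import Counter
from itertools import combinations

def two_mode_edges(papers):
    """papers: list of (keywords, cited_first_authors) pairs.

    Returns a Counter keyed by (edge_type, node_a, node_b).
    """
    edges = Counter()
    for keywords, cited_authors in papers:
        kw_nodes = sorted({"#" + k for k in keywords})
        au_nodes = sorted({"@" + a for a in cited_authors})
        # Authors cited by the same paper: co-citation.
        for a, b in combinations(au_nodes, 2):
            edges[("author-author", a, b)] += 1
        # Keywords used by the same paper: co-occurrence.
        for a, b in combinations(kw_nodes, 2):
            edges[("keyword-keyword", a, b)] += 1
        # A cited author and a keyword appearing with the same paper.
        for a in au_nodes:
            for k in kw_nodes:
                edges[("author-keyword", a, k)] += 1
    return edges

# Toy input with placeholder author names, invented for illustration.
papers = [
    (["patient safety", "meaningful use"], ["Author A", "Author B"]),
    (["patient safety"], ["Author B"]),
]
edges = two_mode_edges(papers)
```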

2.4. Topic Detection with lda2vec

For the topic analysis, the title and abstract data were combined and clustered with the lda2vec algorithm, and then the topics under different clusters were analyzed.

Lda2vec is a topic model that combines LDA with word2vec. It builds representations over both words and documents by mixing word2vec's skip-gram architecture with a Dirichlet-optimized sparse topic matrix [24]. The total loss of the lda2vec model, $L$, is the sum of the skip-gram negative sampling (SGNS) loss $\sum_{ij} L_{ij}^{neg}$ and a Dirichlet-likelihood term over document weights, $L^{d}$ (1). In Equation (2), $c_j$ denotes context vector $j$ and $w_i$ denotes word vector $i$. The SGNS loss $\sum_{ij} L_{ij}^{neg}$ is minimized when the model can discriminate pairs $(c_j, w_i)$ that appear in the observed data from “negatively” sampled pairs $(c_j, w_l)$ drawn at random. The context vector $c_j$ in lda2vec is the sum of the word vector $w_j$ and the document vector $d_j$, which allows lda2vec to embed both word and document vectors into the same space (3). The document vector $d_j$ in (4) is decomposed into a set of latent topic vectors $t_k$, and each weight $p_{jk}$ denotes the membership of document $j$ in topic $k$. The Dirichlet likelihood for the weights is defined in (5). $\alpha$ and $\lambda$ are constants set to $n^{-1}$ and 200, respectively, where $n$ is the number of topics.

$$L = \sum_{ij} L_{ij}^{neg} + L^{d} \tag{1}$$
$$L_{ij}^{neg} = \log \sigma(c_j \cdot w_i) + \sum_{l=0}^{n} \log \sigma(-c_j \cdot w_l) \tag{2}$$
$$c_j = w_j + d_j \tag{3}$$
$$d_j = p_{j0} t_0 + p_{j1} t_1 + \cdots + p_{jk} t_k + \cdots + p_{jn} t_n \tag{4}$$
$$L^{d} = \lambda \sum_{jk} (\alpha - 1) \log p_{jk} \tag{5}$$
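To make the individual terms concrete, the following sketch evaluates Equations (2) to (5) on plain Python lists. It is not the lda2vec implementation: the function names are hypothetical, the vectors are toy values, and the terms are computed exactly as written above without any optimization step or sign convention for training.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def context_vector(word_vec, doc_weights, topic_vecs):
    """c_j = w_j + d_j, with d_j = sum_k p_jk * t_k (Equations 3 and 4)."""
    dim = len(word_vec)
    doc_vec = [sum(p * t[i] for p, t in zip(doc_weights, topic_vecs)) for i in range(dim)]
    return [w + d for w, d in zip(word_vec, doc_vec)]

def sgns_term(context, target_vec, negative_vecs):
    """Equation (2) for one observed pair and its negative samples."""
    positive = math.log(sigmoid(dot(context, target_vec)))
    negative = sum(math.log(sigmoid(-dot(context, w_l))) for w_l in negative_vecs)
    return positive + negative

def dirichlet_term(all_doc_weights, n_topics, lam=200.0):
    """Equation (5) with alpha = 1 / n_topics and lambda = 200."""
    alpha = 1.0 / n_topics
    return lam * sum(
        (alpha - 1.0) * math.log(p) for weights in all_doc_weights for p in weights
    )

# Toy values, invented for illustration: two topics in a two-dimensional space.
c = context_vector([0.0, 0.0], [0.25, 0.75], [[1.0, 0.0], [0.0, 1.0]])
l_neg = sgns_term([0.0, 0.0], [1.0, 1.0], [[1.0, 1.0]] * 3)
l_d = dirichlet_term([[0.5, 0.5]], n_topics=2)
```

With a zero context vector every sigmoid evaluates to 0.5, which makes the SGNS term easy to check by hand.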

A total of 13,438 items related to EHRs were retrieved from WOS. Each item contained a title and the corresponding abstract. The Natural Language Toolkit (NLTK) for Python was used to clean and normalize the raw data. First, stop words and general vocabulary in the abstracts, such as background, objective, method, result, and conclusion, were removed because these words carry little meaning without context. In addition, the DOI, copyright information, and non-letter characters were removed. Then, lemmatization was performed to reduce the inflectional forms of each word to a common base.
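A rough sketch of these cleaning steps is given below using only the Python standard library, so that it runs without downloading NLTK corpora. The stop-word list, the regular expressions, and the function name `clean_text` are assumptions for illustration; the actual word lists are not given in the text, and the lemmatization step is omitted here.

```python
import re

# Hypothetical, abbreviated lists; the lists actually used are not specified.
STOP_WORDS = {"the", "of", "and", "in", "to", "a", "is", "for", "with", "was", "were"}
SECTION_WORDS = {"background", "objective", "method", "result", "conclusion"}

# Assumed patterns for DOI strings and copyright statements.
DOI_PATTERN = re.compile(r"(?:doi:?\s*)?\b10\.\d{4,9}/\S+", re.IGNORECASE)
COPYRIGHT_PATTERN = re.compile(r"(?:©|\(c\)|copyright)[^.]*\.?", re.IGNORECASE)

def clean_text(title, abstract):
    """Combine title and abstract, strip residue, and return lower-case tokens."""
    text = f"{title} {abstract}"
    text = DOI_PATTERN.sub(" ", text)
    text = COPYRIGHT_PATTERN.sub(" ", text)
    text = re.sub(r"[^A-Za-z]+", " ", text).lower()  # drop non-letter characters
    return [
        tok for tok in text.split()
        if tok not in STOP_WORDS and tok not in SECTION_WORDS
    ]

tokens = clean_text(
    "EHR adoption",
    "Background: Adoption of EHRs was studied. doi: 10.1000/xyz123 © 2020 Elsevier.",
)
```

In a full pipeline, each remaining token would then be passed through a lemmatizer to merge inflected forms such as "ehrs" and "ehr".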

Figure 1

The trend of electronic health records (EHRs) publications growth.

Figure 2

Subject evolution river map.

Figure 3

Subject co-occurrence network in electronic health records (EHRs) research.

After preprocessing the titles and abstracts, we used the lda2vec algorithm to cluster them. Since there is no established method for determining the number of topics for the lda2vec model, LDA was first performed for exploratory analysis, and candidate numbers of topics were selected based on the perplexity curve. The lda2vec model was then run with each candidate number in turn, the topic clustering results were compared, and the optimal number of topics was determined. After that, we reviewed the most representative topic words and articles under each topic and summarized the topic content. Dimensionality reduction and visualization of the clustering results were then performed with t-Distributed Stochastic Neighbor Embedding (t-SNE). Finally, we assigned each document to a time window between 2015 and 2019, calculated the topic intensity in each time window, and obtained the topic heat map.
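The last step can be illustrated with a short sketch. The text does not define topic intensity, so the definition used here (the mean document-topic weight among documents published in a given year) is an assumption, as is the function name `topic_intensity`.

```python
from collections import defaultdict

def topic_intensity(doc_years, doc_topic_weights):
    """Mean topic weight per publication year (assumed definition of intensity).

    doc_years: publication year of each document.
    doc_topic_weights: one list of topic weights per document.
    """
    sums = {}
    counts = defaultdict(int)
    for year, weights in zip(doc_years, doc_topic_weights):
        if year not in sums:
            sums[year] = [0.0] * len(weights)
        sums[year] = [s + w for s, w in zip(sums[year], weights)]
        counts[year] += 1
    return {
        year: [s / counts[year] for s in totals]
        for year, totals in sorted(sums.items())
    }

# Toy weights for three documents and two topics, invented for illustration.
heat = topic_intensity(
    [2015, 2015, 2016],
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
)
```

Each row of the resulting dictionary corresponds to one column of a year-by-topic heat map.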

3. RESULTS

3.1. The Publications Growth Trend

The growth trend of EHRs publications is shown in Figure 1.

According to the moving average of the number of publications, the growth can be divided into three stages.

The first stage is from 1994 to 2008, when EHRs research was in the birth and construction stage. During this period, the number of related papers was relatively small, and most of them were conference papers. Papers and related conferences were mainly distributed in European countries; examples include the European Conference on the Construction of Electronic Health Records (held in 1997), the 16th Medical Informatics Europe Conference (held in 2000), and the 11th World Conference on Medical Informatics (held in 2004). These conferences included many EHRs research papers. After that, research on EHRs was no longer limited to the European region. In particular, there was a rapid growth in publications from authors in the United States.

The second stage is from 2009 to 2016, when EHRs research entered a rapid development stage. Nearly 70 EHRs-related papers were published at the Health Care Information Reform Conference held in Canada in 2009. In the same year, the US Congress and the Obama administration introduced the “Health Information Technology for Economic and Clinical Health Act,” which advocated the meaningful use of EHRs [29] and provided an important foundation for overcoming barriers to EHRs promotion in the United States and accelerating healthcare informatization reform. From 2010 to 2016, the growth rate of EHRs publications remained at about 20%, and in 2013 it reached nearly 40%.

The third stage is from 2017 to the present. EHRs research has entered a relatively mature stage. A considerable number of publications are still added each year, but the growth rate in this stage is slower than that in the previous stage.
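The two quantities used to separate the stages, a moving average of yearly publication counts and the year-over-year growth rate, can be computed as in the sketch below. The window length of three years and the sample counts are assumptions for illustration; the text does not state the window that was used.

```python
def moving_average(counts, window=3):
    """Trailing moving average over `window` consecutive years."""
    return [
        sum(counts[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(counts))
    ]

def growth_rates(counts):
    """Year-over-year growth: (current - previous) / previous."""
    return [(cur - prev) / prev for prev, cur in zip(counts, counts[1:])]

# Invented yearly counts, not taken from the data set.
smoothed = moving_average([10, 20, 30, 40], window=3)
rates = growth_rates([100, 120, 168])
```

A growth rate of 0.2 corresponds to the roughly 20% yearly increase described for the second stage.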

3.2. Research Subjects Evolution and Co-occurrence

A river map was created to present 10 research subjects ranking highest in EHRs research, as shown in Figure 2.

EHRs research involves multiple research subjects. The three most relevant research subjects are “health care sciences and services,” “medical informatics,” and “computer science.”

From 1994 to 1999, EHRs were first proposed in “general internal medicine,” but in the following years, there was not much continuous research in this research subject. During this period, “medical informatics” was the main research subject, and “information science and library science,” “engineering,” “nursing science” were involved.

From 2000 to 2007, related literature mainly focused on “computer science.” Computer science greatly promoted the development and application of EHRs in various research subjects. At that time, EHRs were applied in “public, environment and occupational health” for the first time. EHRs also became a stable and continuous research hot spot in “general internal medicine” and “engineering.”

From 2008 to 2019, EHRs were progressively applied to various subjects, focusing on “health care science and services,” and the number of subjects involved gradually increased, such as “pharmacology and pharmacy,” “pediatrics,” etc.

The co-occurrence relations between subjects are shown in Figure 3. The weight of each node is represented by its degree: the larger the node, the more subjects it connects to, indicating that the subject is more widely involved in EHRs research. The weight of each edge is represented by the co-occurrence frequency, so a thicker edge means a higher co-occurrence frequency between two subjects. Five clusters were obtained by cluster analysis.
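The node and edge weights described here can be derived from per-paper subject lists as in the following sketch. The function name `subject_network` and the toy input are hypothetical, and the clustering step itself is not reproduced.

```python
from collections import Counter, defaultdict
from itertools import combinations

def subject_network(papers):
    """papers: list of subject lists. Returns (edge_weights, degrees).

    Edge weight = number of papers in which two subjects co-occur.
    Degree = number of distinct subjects a subject is connected to.
    """
    edge_weights = Counter()
    for subjects in papers:
        for a, b in combinations(sorted(set(subjects)), 2):
            edge_weights[(a, b)] += 1
    neighbours = defaultdict(set)
    for a, b in edge_weights:
        neighbours[a].add(b)
        neighbours[b].add(a)
    degrees = {s: len(n) for s, n in neighbours.items()}
    return edge_weights, degrees

# Toy subject lists, invented for illustration.
edge_weights, degrees = subject_network([
    ["medical informatics", "computer science"],
    ["medical informatics", "computer science"],
    ["medical informatics", "nursing"],
])
```

In this toy input, "medical informatics" has the highest degree and would be drawn as the largest node.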

Cluster 1 includes “medical informatics,” “information science and library science,” and “health care science and services.” These three subjects are directly related to the birth of EHRs. “Medical informatics” and “information science & library science” regard computer science as technology support, thus they are closely related to computer science and its interdisciplinary applications.

Cluster 2 includes “biology” and its related subjects, such as “biotechnology and applied microbiology” and “mathematical and computational biology.” This type of research often utilizes the technology of computer science and the theories and methods in the field of probability and mathematical statistics.

Cluster 3 primarily includes all kinds of computer science and related subjects required for the realization and application of EHRs, such as information systems, artificial intelligence, telecommunication, automation, robots, etc.

Cluster 4 mainly includes social issues extended by EHRs, such as economics, management, ethics, and laws, as well as applications in social environments, such as education, psychiatry, psychology, and substance abuse.

Cluster 5 includes medicine and various related subjects, reflecting the main application areas of EHRs, including “public, environment and occupational health,” “nursing,” “pharmacology and pharmacy,” “pediatrics,” internal medicine, oncology, heart and cardiovascular systems, endocrinology, and metabolism, infectious diseases, etc.

3.3. Journal Distribution

The ten journals with the highest publication volume are shown in Figure 4.

Figure 4

Journals with top 10 publication volume.

Among the top ten journals, eight of them belong to “medical informatics,” five of them belong to “computer science” and five of them belong to “health care science and services.”

The journal with the highest publication volume is the Journal of the American Medical Informatics Association, which covers a wide range of topics and disciplines, including the evaluation and implementation of information systems and patient safety. Four journals, the International Journal of Medical Informatics, Applied Clinical Informatics, BMC Medical Informatics and Decision Making, and the Journal of Medical Systems, focus on the development, implementation, and evaluation of information systems. Three journals, PLOS ONE, the Journal of Medical Internet Research, and the Journal of General Internal Medicine, focus on the study of population health, prevention, and clinical care. In addition, the Journal of Biomedical Informatics leans toward information management and knowledge representation in biomedical science, such as ontology and semantic interoperability. CIN: Computers, Informatics, Nursing, a nursing science journal, covers the implementation of information systems and the integration and standardization of nursing language.

3.4. Topic Distribution

3.4.1. Co-keyword analysis

Four topics were clustered, and the timeline diagram of EHRs topic clusters is shown in Figure 5. Keywords in each topic were subdivided and we summarized the main content as shown in Table 1.

Figure 5

Co-keywords network clustering.

Topic 1 mainly focused on population health and risk prediction in primary care. Based on the EHRs, a large scale of patients' health information can be integrated and analyzed to strengthen the management of population health.

Topic 2 was related to medical informatics related technology. As an information resource, the integration of health data in EHRs involved issues such as knowledge management, information security, and privacy. Technologies such as machine learning were used to analyze health data in EHRs to provide decision support.

Topic 3 mainly reflected medical quality improvement and user acceptance. The early research in this topic focused on doctors' or nurses' adoption behavior of information technology. After the popularity of health information technologies, the research object has changed from medical staff to patients.

Topic 4 reflected the evaluation of medical information systems' performance and impact. Early research under this topic focused on the implementation of medical information systems. Later research focused on the impact and performance of these systems.

| Cluster (Nodes) | Representative Node | Main Content |
|---|---|---|
| 1 (238) | Diabetes, cardiovascular disease, cancer, Alzheimer disease, obesity, stroke, asthma, HIV, comorbidity, infection | Diseases |
| | Prevalence, morbidity, mortality, survival, incidence | Disease statistics |
| | Children, adult, women, adolescent, elderly patient, veteran | Research subjects |
| | Randomized controlled trial, meta-analysis, cohort, questionnaire, systematic review, follow up, trial, selection | Research methods |
| | Predictive modeling, predictor, trend, risk factor, risk assessment, association, validity, reliability | Risk prediction |
| | Regression, logistic regression | Data analysis methods |
| | Surgery, oncology, registry | Departments |
| | Administrative data, patient-reported outcome, general practice research database | Data types |
| 2 (140) | Natural language processing, machine learning, deep learning, text mining, artificial intelligence, de-identification, cloud computing, blockchain | Technologies |
| | Clinical data, clinical note, social media, biobank | Data types |
| | Semantic interoperability, ontology, knowledge representation, information extraction, information storage and retrieval, data integration, data warehouse | Application in information science |
| | Security, authentication, privacy, confidentiality, consent, ethics, informed consent, trust, policy | Ethical issues |
| | Biomedical informatics, clinical informatics, health informatics, medical informatics, genetics, genomics, pharmacogenomics | Subjects |
| | Drug safety, drug interaction, adverse drug reaction, pharmacovigilance | Drug safety |
| | Clinical trial, clinical research, comparative effectiveness research, review, survey | Research methods |
| 3 (86) | Adoption, satisfaction, attitude, perception, acceptance, perspective, behavior | Information systems research |
| | Physician, patient, clinician, general practitioner, primary care physician | Users |
| | Randomized trial, qualitative research, controlled trial, practice-based research, national survey | Research methods |
| | Patient portal, self-management, health services research, patient-centered care, patient participation, patient satisfaction | Patient engagement |
| | Ambulatory care, preventive care, communication, efficiency, decision-making, care coordination, productivity, burnout | Quality improvement |
| 4 (122) | Computerized medical records system, order entry system, hospital information system, clinical information system, health information system, decision support system, medical education, education EHR | Medical systems and applications |
| | Costs, benefits, performance, time, feasibility, usability, unintended consequence, workflow | Factors of information systems |
| | Pediatric, intensive care unit, emergency department, intensive care | Departments |
| | Nurse, hospitalized patient, hospitals, student | Users |
| | Medication error, medical error, patient outcome, adverse drug event, patient safety, alert fatigue | Impact |

Table 1

Main content of topic clustering.

Figure 6

Author-keyword two-mode network in electronic health records (EHRs) citations.

3.4.2. Two-mode network of keyword-author

The two-mode network of keyword-author is shown in Figure 6. The nodes starting with an “@” in the figure represent the first author of the publication cited by EHR literature, and the node size represents the author's cited frequency. The nodes starting with a “#” represent keywords, and the size of the node indicates the frequency of the keywords.

The two most-cited authors in EHRs research were Blumenthal D and Bates DW. Blumenthal D's research [29–31] was of great significance for patient-centered research, clinical research, and quality improvement, while Bates et al. [32–34] made outstanding contributions to the clinical decision support system and the computerization of electronic medical records.

Through the co-occurrence analysis between keywords, we discovered the popular topics of EHRs, such as patient-centered-related content (patient safety, patient satisfaction, patient participation, patient portal), privacy security (ethics, privacy, informed, security, access control), research on chronic diseases (diabetes, hypertension), clinical decision support (clinical decision system, clinical trials), drug safety (adverse drug reactions, pharmacovigilance, adverse drugs events), precision medicine (genomics, personalized medicine, pharmacogenomics), nursing informatization (nursing informatization, nursing literature), medical ontology (information storage and retrieval, semantics, systematic clinical term sets, knowledge representation).

As for the co-occurrence relationship between cited authors and keywords, papers related to personal health records usually refer to Tang's research [35]. The meaningful use of EHRs is closely related to DesRoches [36] and Blumenthal and Tavenner [29]. Singh et al. [37,38] and Sittig and Singh [39] have made important contributions to patient safety. The research of Adler-Milstein et al. [40,41] was of great significance to the exchange of hospital and health information. Hripcsak and Albers [42] and Denny et al. [43,44] are closely related to phenotypes. Ammenwerth et al. [45–47] have made outstanding contributions to research on EHRs evaluation. The research of Beale [48] and Beale et al. [49] is important for open electronic medical records, prototype research, and semantic interoperability.

| Topics | Keywords | Content |
|---|---|---|
| Topic 1 | Injury, readmission, ICU, AKI, LOS, inpatient, hospitalization, Braden, predictor, logistic | Risk assessment |
| Topic 2 | Technique, mining, image, deep, NLP, entity, representation, machine, neural, temporal | Artificial intelligence applications |
| Topic 3 | Genomic, informatics, genetic, variant, sequencing, genomics, pharmacogenomics, PCEHR, PHRS, biobanks | Bioinformatics |
| Topic 4 | Portal, online, literacy, social, intention, perceived, engagement, acceptance, caregiver, coordination | User intention |
| Topic 5 | Resident, incentive, satisfaction, meaningful, burnout, student, ambulatory, payment, teaching, productivity | Adoption |
| Topic 6 | Antibiotic, antimicrobial, prescribing, ARI, stewardship, infection, culture, inappropriate, animal, ASP | Infection control |
| Topic 7 | Asthma, infant, HIV, pregnancy, birth, HPV, maternal, vaccination, mother, symptom | Maternal and child health |
| Topic 8 | Referral, PCP, cessation, tobacco, smoking, consultation, palliative, hospice, screen, SBIRT | Medical service |
| Topic 9 | CPOE, usability, pharmacist, computerized, reconciliation, handoff, search, errors, DDI, ADR | Application and impact |
| Topic 10 | Archetype, terminology, openehr, SNOMED, mapping, semantic, ontology, interoperability, LOINC, standardized | Knowledge representation |
| Topic 11 | Hypertension, blood, cardiovascular, obesity, incidence, BMI, weight, stroke, men, CVD | Cardiovascular disease |
| Topic 12 | Ehealth, exchange, device, mobile, hie, sharing, country, infrastructure, telemedicine, ICT | Health information exchange |

Table 2

Topics and keywords of EHRs.

3.4.3. Topic modeling

We used the lda2vec algorithm to cluster the titles and abstracts, with the number of topics set to 12. The topics and keywords of EHRs are shown in Table 2. The t-SNE algorithm was used to reduce the dimensionality of the results for visualization, as shown in Figure 7.

Figure 7

Topic clustering scatter diagram of electronic health records (EHRs).

Among the 12 topics, topic 1, topic 6, topic 7, topic 8, and topic 11 all belong to population health and risk prediction in primary care. Topic 1 focuses on risk prediction and prevention. The most concerning disease is acute kidney injury (AKI), while logistic regression is the most commonly used algorithm for prediction. Topic 6 involves infection control, especially research related to antibiotic prescribing in primary care practices. For example, Meeker et al. [50] analyzed the effect of behavioral interventions on inappropriate antibiotic prescribing; Gerber et al. [51] analyzed the effect of an outpatient antimicrobial stewardship intervention on broad-spectrum antibiotic prescribing.

Topic 7 focuses on diseases related to women and children, such as cervical cancer. Topic 8 involves medical services in primary health care. Terms in topic 11 are related to cardiovascular diseases, such as hypertension, heart disease, and stroke. Studies in topic 11 focus on using EHR data to analyze the incidence and trends of diseases [52,53].

Topic 2, topic 3, topic 10, and topic 12 all belong to medical informatics and related technologies. These topics focus on the application of advanced technologies in computer science. Words in topic 2 are mainly related to the application of technologies such as natural language processing and deep learning. Nadkarni et al. provided an introduction to natural language processing and a tutorial on system design [54], and Ching et al. discussed opportunities and obstacles for deep learning in biology and medicine, such as patient classification, fundamental biological processes, and treatment of patients [7]. Topic 3 mainly contains words related to bioinformatics, such as genomics, pharmacogenomics, bioinformatics base, etc. Studies in this topic involve using EHRs to drive discovery in disease genomics [55,56] and pharmacogenomics [57,58]. This topic also involves personal health records and the corresponding privacy and informed consent issues. Topic 10 focuses on knowledge representation, including terminology, ontology, semantics, and standardization. Topic 12 takes health information exchange as its main content, involving telemedicine, sharing, and security.

Topics 4 and 5 are both related to system quality improvement and acceptance. Topic 4 is mainly about patients' intentions, such as patient activation [59], patient participation, and acceptance. It primarily studies user behavior through qualitative analysis, and research on the electronic patient portal is an important component of this topic. Goldzweig et al. [60] systematically reviewed the effect of patient portals on clinical care, including patients' health outcomes, satisfaction, efficiency, and attitudes. Topic 5 centers on the adoption of information systems, including the use, cost, and satisfaction of EHRs and their impact on productivity and health worker burnout. For example, Jha et al. estimated the prevalence of adoption of EHRs in US hospitals [61], and DesRoches et al. assessed physicians' adoption of EHRs, their satisfaction with related information systems, the perceived effectiveness of these systems, and possible barriers to adoption [36]. This topic also deals with the application of EHRs in student education.

Topic 9 belongs to the application of information systems and the avoidance of nursing errors. The words of this topic were mainly related to drug research, such as adverse drug reactions, drug interactions. Related studies involve detecting pharmacovigilance signals by mining data in EHRs database [62,63]. The application and impact of EHRs and related information systems are the main content of topic 9, such as its impact on time efficiency of physicians and nurses [64], healthcare quality [65], structure, process, and outcomes within primary care [66], and reduction in medication errors in hospitals [67].

A Heatmap of EHRs-related topics from 2015 to 2019 is shown in Figure 8.

Figure 8

Heatmap of electronic health records (EHRs)-related topics from 2015 to 2019.

As can be seen from Figure 8 and Table 2, topic 2 “artificial intelligence applications” has fewer fluctuations and maintains high popularity. It shows that the advanced algorithm technology in computer science, especially deep learning related technology, has drawn high attention in the past five years.

Some topics have gradually increased in popularity, such as topic 1 “risk assessment,” topic 7 “maternal and child health,” topic 8 “medical service,” and topic 11 “cardiovascular disease.” This shows that population health and risk prediction have attracted much attention in the field of EHRs in the past five years. In contrast, topic 10 “knowledge representation” and topic 12 “health information exchange” have shown a decreasing trend in popularity, which might indicate that the standards and specifications for health information representation and sharing had largely been developed. The number of papers on some topics has been stable in the last five years, including topic 3 “bioinformatics,” topic 4 “user intention,” topic 5 “adoption,” topic 6 “infection control,” and topic 9 “application and impact.”
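
One way to make the increasing, decreasing, and stable groupings concrete is to fit a straight line to each topic's yearly share of papers and read the sign of the slope. The sketch below is an illustration under stated assumptions only: the study does not state how the trends were judged, and the yearly shares, the tolerance `tol`, and the function names are hypothetical rather than values taken from Figure 8.

```python
from statistics import mean


def slope(years, values):
    """Ordinary least-squares slope of values regressed on years."""
    y_bar, v_bar = mean(years), mean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den


def classify_trend(years, shares, tol=0.005):
    """Label a topic by the sign of its fitted slope.

    tol is a hypothetical threshold (share change per year) below
    which a topic is treated as stable.
    """
    s = slope(years, shares)
    if s > tol:
        return "increasing"
    if s < -tol:
        return "decreasing"
    return "stable"


YEARS = [2015, 2016, 2017, 2018, 2019]

# Hypothetical yearly shares of papers per topic (not data from the study).
toy_shares = {
    "risk assessment": [0.06, 0.07, 0.08, 0.09, 0.10],
    "health information exchange": [0.10, 0.09, 0.08, 0.07, 0.06],
    "adoption": [0.08, 0.08, 0.08, 0.08, 0.08],
}

trends = {topic: classify_trend(YEARS, s) for topic, s in toy_shares.items()}
```

A slope-based rule is only one possible choice; a rank correlation or a comparison of the first and last years would serve the same purpose on a five-point series.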

4. DISCUSSION

4.1. Main Findings

In this study, we explored the landscape of EHRs through publications growth, distributions of EHRs research subjects, journals, authors, and topics. Research on EHRs started in 1994 and was mainly concentrated in European countries in the early days, with conference papers as the main focus. With the rise of medical informatics and the gradual expansion of information technology to the world, the United States has taken a leading position in this field.

EHRs research covers a wide range of disciplines. It not only involves medical informatics, information science and library science, health care science and services, computer science, and other disciplines, but is also widely applied in medicine, pharmacy, and biology-related disciplines. Furthermore, economics, management, law, education, psychology, and other social disciplines are also pertinent.

We performed both SNA and topic modeling to detect topics. Although the number of topics is different depending on the method used, based on our research results, we summarize the current EHRs research topics and their trends into the following four points: (i) population health, disease risk prediction, and primary care; (ii) technology, ethics, and privacy security; (iii) quality improvement, user acceptance, and engagement; (iv) information systems application and impact.

4.1.1. Population health, disease risk prediction, and primary care

Research under this topic mainly focuses on disease screening and risk prediction for target populations based on EHRs, and the disciplines involved are mainly epidemiology and pharmacoepidemiology. The diseases screened and predicted are mostly serious diseases that are difficult to overcome, such as cancer [68], diabetes [69], and Alzheimer's disease [70]. The study populations cover people of different ages, genders, and occupations, including not only children [71], youth [72], and the elderly [73], but also women [74] and veterans [75].

Traditional population health management research was mainly based on questionnaire surveys and selected specific populations for randomized controlled trials to study the burden of disease in public health. With the popularization of EHRs, researchers have turned to large-scale data analysis and predict disease risks through methods such as logistic regression and deep learning, to better monitor the prevalence and distribution of diseases.
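
As a minimal sketch of what risk prediction with logistic regression involves, the example below fits a two-feature model by batch gradient descent and returns a probability for a new patient. Everything in it is an assumption made for illustration: the toy cohort, the features (age in decades and a smoker flag), the learning rate, and the function names are hypothetical and do not come from any study cited here.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the score
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_risk(w, x):
    """Predicted probability of the outcome for one feature vector."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical toy cohort: [age in decades, smoker flag]; 1 = disease observed.
X = [[3.0, 0], [4.0, 0], [5.0, 1], [6.0, 1],
     [7.0, 1], [3.5, 0], [6.5, 0], [4.5, 1]]
y = [0, 0, 1, 1, 1, 0, 1, 0]

weights = fit_logistic(X, y)
low = predict_risk(weights, [3.0, 0])   # younger non-smoker
high = predict_risk(weights, [7.0, 1])  # older smoker
```

In practice such models are fitted with an established statistics or machine learning library on far larger cohorts, with validation on held-out data; the hand-written loop only exposes the mechanics.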

The latest research hotspot under this topic is the social determinants of health [76–78]. Social determinants of health, such as economy, education, and safety, have a significant impact on morbidity and mortality. At present, doctors cannot obtain the relevant data for patient care and population health management [79]. How to integrate community data with EHRs has therefore become a hot topic.

4.1.2. Technology, ethics, and privacy security

Under this topic, research mainly focuses on using technologies and methods from computer science and information science to handle data from EHRs as well as clinical records, social media data, and biological databases. The tasks include the construction of medical ontologies, data integration, knowledge management, information storage and retrieval, and data warehouse construction.

Early research focused on the standardization of EHRs and the development of terminology and prototypes. At the same time, to reduce medical errors, researchers began to discuss how to develop shared computer-interpretable clinical guidelines. Subsequently, medical data were integrated and managed as an information resource, which involves information interoperability, sharing, storage, retrieval, and extraction. Building on the integrated medical information, machine learning, deep learning, and other techniques are used to analyze medical text and image data to provide further decision support.

Different research directions have developed according to the data sources used, such as drug safety, imaging diagnosis, and the basic biological processes of human diseases. With the development of technologies such as image recognition and natural language processing and the large-scale integration of EHRs, the frontiers of EHRs research in recent years are mainly predictive models and precision medicine [80]. Various statistical or deep learning-based models are applied to EHRs, especially for learning embedded representations of medical concepts, such as diseases, medicines, and laboratory tests, to support disease analysis and prediction. How to link and activate medical databases and knowledge in the literature to support clinical decision-making has become a research hotspot.
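
To illustrate what an embedded representation of medical concepts makes possible, the sketch below compares concept vectors by cosine similarity and looks up the nearest neighbor of a concept. The three-dimensional vectors and concept names are hypothetical values chosen by hand for the example; real embeddings are learned from EHRs data and have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(concept, embeddings):
    """Return the other concept whose vector is most similar to `concept`."""
    target = embeddings[concept]
    others = (name for name in embeddings if name != concept)
    return max(others, key=lambda name: cosine(target, embeddings[name]))


# Hypothetical hand-made vectors, not learned from any dataset.
toy_embeddings = {
    "type 2 diabetes": [0.9, 0.1, 0.2],
    "metformin": [0.8, 0.2, 0.1],
    "hip fracture": [0.1, 0.9, 0.3],
}

closest = nearest("type 2 diabetes", toy_embeddings)
```

With vectors of this kind, a disease, a medicine, and a laboratory test that tend to appear in the same records end up close together, which is what downstream prediction models exploit.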

Related policy is an important factor in motivating and regulating the implementation of health information technology [29,30], playing a key role in health information exchange, the secondary use of EHRs, and the development of genetic pharmacology. Security and privacy issues involve not only patients' informed consent and trust, but also related technical implementations, such as cloud computing, blockchain technology, and access control. Personal health information is considered among the most private kinds of personal information, so the privacy and security protection of patient data is the key research content under this topic. In terms of privacy, access to patient medical data must be controlled to ensure confidentiality, and how to define clear policies and access control mechanisms is another important research question. In terms of security, blockchain technology has attracted much attention. Blockchain can provide a safer mechanism for exchanging health information in the healthcare industry by securing it on a decentralized peer-to-peer network, which may fundamentally change how patient EHRs are shared and stored.
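
The tamper evidence that makes blockchain attractive for health information exchange comes from hash linking: each block stores the hash of the previous one, so altering an earlier entry invalidates the chain. The sketch below shows only that linking step, using Python's standard `hashlib` and `json` modules. It is a simplified illustration, not a blockchain: it assumes a single in-memory list and omits the peer-to-peer network, consensus, and encryption, and the record identifiers and event strings are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that commits to the hash of its predecessor."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical exchange events for a made-up record identifier.
ledger = []
append_block(ledger, {"record_id": "demo-001", "event": "shared with clinic A"})
append_block(ledger, {"record_id": "demo-001", "event": "viewed by clinic A"})

valid_before = is_valid(ledger)
ledger[0]["payload"]["event"] = "shared with clinic B"  # simulate tampering
valid_after = is_valid(ledger)
```

Storing only event metadata or hashes in the chain, rather than the clinical data itself, is a common design choice because entries cannot later be deleted.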

4.1.3. Quality improvement, user acceptance, and engagement

The popularization and utilization of EHRs information systems have greatly improved quality in medical practice. This topic mainly reflects the improvement in medical quality brought by EHRs and doctors' acceptance of EHRs and related information systems. Early-stage research under this topic mainly focused on how information technology improves healthcare and disease prevention. With the “meaningful use” policy incentives in the United States [29,81,82], research turned to doctors' and nurses' adoption of information technology and analyzed the factors affecting their acceptance of and satisfaction with health information technology. After health information technology became widespread, the research focus shifted from doctors to patients, as in health service research, inpatient portal research [83], and studies of how to improve patient engagement [84].

4.1.4. Information systems application and impact

This topic reflects the application and evaluation of medical information systems and the impact on error avoidance. The application scenarios are mostly in pediatrics, intensive care units, and emergency departments. The main research objects include nurses, hospitals, institutions, and students.

Early research on the application of medical information systems focused on computerized medical record systems. The terms electronic medical records and EHRs are often used interchangeably, but EHRs contain more content than electronic medical records, and electronic medical records serve as data sources for EHRs [85]. At this stage, research emphasized the impact of computerizing medical records in different settings, such as nursing documentation [86], and the single-purpose functions of medical order entry systems [87] and hospital information systems [88]. With the advancement of information technology and the popularization and standardization of medical record systems, decision support systems emerged. Decision support systems do more than computerize medical information: they draw knowledge from large patient databases and the medical literature to support clinical decision-making. Research under this topic then turned to evaluating medical information systems in different settings, covering both system implementation considerations, such as availability, feasibility, cost, benefit, time, and workflow, and the impact on nursing quality, such as reducing medical errors, reducing medication errors, and improving patient prognosis. As medical system applications matured, EHRs began to be used in medical education, such as the application of the academic electronic medical record (AEMR) system to improve nursing students' grasp of health care decision-making and nursing informatics [89].

With the development of EHRs, their application has gradually extended to various fields and multiple stages of medical care, and commercial EHRs and personal health records have emerged. On the one hand, the adoption of EHRs requires huge costs; on the other hand, it also brings potential improvements in nursing quality. Therefore, the research at this stage focuses on the adoption and quality measurement of EHRs. At the same time, policymakers have played an important role in the promotion of EHRs. Under the financial incentives of policy [29], EHRs have achieved higher-level applications and integration, realizing the ordering, analysis, and integration of EHRs.

4.2. Contributions

In this study, we performed an in-depth exploration of EHRs research based on bibliographic data and evaluated publication growth trends, subjects, journals, authors, and topics to provide basic knowledge of the development, application, and practice of EHRs.

In terms of theoretical contributions, the analysis methods in this paper can be extended to other areas with interdisciplinary characteristics to conduct similar research.

In terms of practical contributions, firstly, our study enables researchers to fully understand the origin, evolution, and frontier of EHRs research, and guides their research topic selection, literature collection, and literature review. Secondly, based on the analysis of the journals in our study, personalized literature recommendations can be provided to EHRs researchers according to the topics of each journal, helping them to collect literature and select journals for submission. Thirdly, it contributes to their understanding of, learning from, and communication with people from different disciplines, from which new insights or ideas may emerge. Furthermore, it can provide researchers with opportunities for interdisciplinary cooperation and ultimately promote the development of EHRs.

4.3. Limitations

There are some limitations to this study. Firstly, the WOS database was the only source of bibliographic data, so the study cannot include all EHRs publications. In the future, data sources should be expanded. Secondly, owing to the limited paper length, we could not cover the full analysis of cooperation (authors, institutions, and countries) or in-depth citation analysis. Future research can examine cooperation and citations at a finer granularity to improve the understanding of the status of EHRs research.

5. CONCLUSION

EHRs publications have grown rapidly in the past 10 years, and the current annual growth has been stable. The development of EHRs research requires the cooperation of multiple disciplines, whether technology, medicine, law, or ethics. EHRs research topics and their trends can be summarized into the following four points: (i) population health, disease risk prediction, and primary care; (ii) technology, ethics, and privacy security; (iii) quality improvement, user acceptance, and engagement; (iv) information systems application and impact. From the perspective of the development process, EHRs have gone through establishment, utilization, and high-level development and application. Emerging research topics in recent years have primarily focused on the social determinants of health, the application of deep learning models, the development and utilization of patient portals, the mining of explicit and tacit knowledge, and the provision of decision support.

CONFLICTS OF INTEREST

The authors declare that they have no competing interests.

AUTHORS' CONTRIBUTIONS

Yuxing Qian and Zhenni Ni contributed equally to this work. The study was conceived and designed by Yuxing Qian. Data collection and analysis were performed by Zhenni Ni. The manuscript was written by Yuxing Qian and Zhenni Ni, and edited by Wenxuan Gui and Yunmei Liu.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (Nos. 71661167007 and 71420107026), the National Key Research and Development Program of China (No. 2018YFC0806904-03), and the Key Research and Development Program of Hubei Province (No. 2020BAB026).

REFERENCES

24. C.E. Moody, Mixing Dirichlet topic models and word embeddings to make lda2vec, 2016. arXiv preprint https://arxiv.org/abs/1605.02019
48. T. Beale, Archetypes and the EHR, B. Blobel and P. Pharow (editors), Advanced Health Telematics and Telemedicine: The Magdeburg Expert Summit Textbook, IOS Press, Amsterdam, Netherlands, 2003, pp. 238-244. https://www.iospress.nl/book/advanced-health-telematics-and-telemedicine/
49. T. Beale et al., OpenEHR architecture overview, OpenEHR Found, Vol. 7, 2006. https://specifications.openehr.org/releases/BASE/latest/architecture_overview.html#_architecture_overview
85. J.L. Habib, EHRs, meaningful use, and a model EMR, Drug Benefit Trends, Vol. 22, 2010, pp. 99-101. https://www.patientcareonline.com/view/ehrs-meaningful-use-and-model-emr
Journal
International Journal of Computational Intelligence Systems
Volume-Issue
14 - 1
Pages
744 - 757
Publication Date
2021/02/10
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891
DOI
10.2991/ijcis.d.210203.006
Copyright
© 2021 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
