ABEE: automated bio entity extraction from biomedical text documents


DOI	https://doi.org/10.1108/DTA-04-2022-0151
Published date	21 April 2023
Date	21 April 2023
Pages	222-244
Subject Matter	Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Metadata, Information & knowledge management, Information & communications technology, Internet
Author	Ashutosh Kumar, Aakanksha Sharaff


Ashutosh Kumar and Aakanksha Sharaff

Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, India

Abstract

Purpose – The purpose of this study was to design a multitask learning model so that biomedical entities can be extracted from biomedical texts without ambiguity.

Design/methodology/approach – In the proposed automated bio entity extraction (ABEE) model, a multitask learning model is built by combining single-task learning models. Bidirectional Encoder Representations from Transformers (BERT) was used to train each single-task learning model, and the models’ outputs were then combined so that a variety of entities can be found in biomedical text.

Findings – The proposed ABEE model targets unique gene/protein, chemical and disease entities in biomedical text. These findings matter for biomedical research such as drug finding and clinical trials. The research helps reduce not only the researcher’s effort but also the cost of new drug discoveries and new treatments.

Research limitations/implications – The authors report no specific limitations of the model, but the research team plans to test the model on gigabytes of data and to establish a knowledge graph so that researchers can easily estimate the entities of similar groups.

Practical implications – The ABEE model will be helpful in various natural language processing tasks. In information extraction (IE), it plays an important role in biomedical named entity recognition and biomedical relation extraction; in information retrieval, it supports tasks such as literature-based knowledge discovery.

Social implications – During the COVID-19 pandemic, the demand for this type of work increased because of the rise in clinical trials at that time. Had this type of research been introduced earlier, it would have reduced the time and effort needed for new drug discoveries in this area.

Originality/value – This work proposes a novel multitask learning model that is capable of extracting biomedical entities from biomedical text without ambiguity. The proposed model achieved state-of-the-art performance in terms of precision, recall and F1 score.

Keywords Biomedical entity extraction, Neural network, Single-task learning, Multitask learning, Bio data mining

Paper type Research paper

1. Introduction

Biomedical named entity recognition (BioNER) is a critical task for obtaining biomedical insights from unstructured biomedical texts. In recent research, BioNER has played a vital role in the identification of biological entities and their associated extraction tasks.

The authors gratefully acknowledge the Department of Computer Science and Engineering of the National Institute of Technology Raipur for providing infrastructure and facilities necessary for this work.

Funding: This research is not funded by any financial institution.

Authors’ contributions: A.K. and A.S. hypothesized and designed the idea of the ABEE model. A.K. developed ABEE. A.K. and A.S. experimented and analyzed the results. A.S., as the supervisor of A.K., guided this research work. All authors read the final manuscript carefully and approved it.

Availability of data: All the corpora are openly licensed and available at https://github.com/cambridgeltl/MTL-Bioinformatics-2016/tree/master/data and https://github.com/SKumarAshutosh/ABEE.

Declaration of competing interests: The authors declare that they have no competing interests.

The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/2514-9288.htm

Received 11 April 2022
Revised 2 September 2022
Accepted 19 September 2022

Data Technologies and Applications, Vol. 57 No. 2, 2023, pp. 222-244, ISSN 2514-9288, DOI 10.1108/DTA-04-2022-0151

The key task of BioNER is to recognize biological entities such as genes, proteins, chemicals, symptoms and diseases. Most research has focused only on extracting biomedical named entities because most biomedical systems depend heavily on these entities, and direct access to such biomedical information is possible only after BioNER. Building such a BioNER system is also very difficult because of the richness of the biomedical literature. For training and evaluation, a highly accurate BioNER system requires manually annotated biomedical data. Most of the annotated biomedical datasets have been developed and provided openly for BioNER research. In essence, BioNER is the task of extracting biomedical entities from medical text documents. Earlier studies investigating BioNER fall under three categories: statistical machine learning, dictionary-based and rule-based methods (Wang et al., 2018; Yao et al., 2015). Rule-based methods are highly dependent on a variety of separate class rules. Rule-based approaches can be defined simply as “Just go ahead and write the rules” (Chiticariu et al., 2013). Earlier rule-based systems used many handcrafted and heuristic rules to identify combinations of named entities and their context (Li et al., 2020a). These techniques were predominant in early, as well as recent, BioNER systems (Alfred et al., 2014). Although listing all the structural rules of a BioNER model is a very important task, handcrafted techniques of this magnitude always entail a high cost of system engineering.
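As an illustration of what "writing the rules" can look like, the following minimal sketch applies two handcrafted rules that combine an entity pattern with its context. Both regular expressions, the labels and the function name rule_based_ner are hypothetical and invented here for illustration; they are not taken from any of the cited systems.

```python
import re

# Hypothetical handcrafted rules (illustrative only, not from any cited system).
RULES = [
    # Rule 1: a capitalized word (optionally possessive) followed by
    # "disease" or "syndrome" is labeled as a disease.
    ("Disease", re.compile(r"\b[A-Z][a-z]+(?:'s)? (?:disease|syndrome)\b")),
    # Rule 2: two or more capitals plus digits count as a gene/protein,
    # but only when the context word "gene" or "protein" follows.
    ("Gene/Protein", re.compile(r"\b[A-Z]{2,}[0-9]+\b(?= (?:gene|protein)\b)")),
]


def rule_based_ner(text):
    """Apply every rule and return (surface form, label) pairs in text order."""
    entities = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            entities.append((m.start(), m.group(0), label))
    return [(surface, label) for _, surface, label in sorted(entities)]


text = "Mutations in the TP53 gene are not linked to Parkinson's disease."
print(rule_based_ner(text))
```

Every new entity class or naming pattern needs another rule of this kind, which is one way to see where the engineering cost comes from: the same mention "TP53" goes unrecognized as soon as the expected context word is missing.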

Dictionary-based approaches are considered the basic approach, and they are highly dependent on existing biological vocabularies and lexicons. Generally, the dictionary-based method is used to identify biomedical entities hidden in the text: if a term in a list matches a word or group of words in a document, it is identified as an entity. Their performance and simplicity make these systems widely usable. Though this method is found to be extremely reliable, it has weak recall. A BioNER framework based on dictionary methods can extract from biological text only those biological entities that are described in a dictionary. These methods are incapable of handling biological entities that are not present in the dictionary, which usually causes low recall (Tasneem and Archana, 2016). Tuason et al. (2004) reported that spelling errors and character- and word-level differences caused low recall. The problem with such a vocabulary is its fixed size. New terms are added very rapidly by research and scientific communities across the world, so keeping such a vocabulary from becoming obsolete is quite difficult.
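The matching step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the three-entry lexicon, the entity type names and the function dictionary_ner are hypothetical and are not part of ABEE or of any cited system.

```python
import re

# Hypothetical toy lexicon: surface form -> entity type (not from the paper).
LEXICON = {
    "breast cancer": "Disease",
    "brca1": "Gene/Protein",
    "tamoxifen": "Chemical",
}


def dictionary_ner(text, lexicon=LEXICON):
    """Return (start, end, surface form, type) for every exact lexicon match."""
    matches = []
    for term, etype in lexicon.items():
        # Word boundaries keep a term from matching inside a longer token.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), m.group(0), etype))
    return sorted(matches)


sentence = "Tamoxifen is used in BRCA1-mutated breast cancer; BRCA-1 status matters."
found = dictionary_ner(sentence)
for start, end, surface, etype in found:
    print(start, end, surface, etype)
```

In this sketch the spelling variant "BRCA-1" is missed because only "brca1" is listed, a small instance of the character-level differences that lower recall.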

The low precision and recall reported for dictionary-based methods have prompted several improvements. An example is the creation of orthographic variations of the terms in a biomedical resource and their incorporation into the primary lists (Tsuruoka and Tsujii, 2003). The extended list can then be used for exact string matching. While most of these improvements were tested, dictionary-based methods are frequently paired with advanced methods of named entity recognition (NER). Statistical machine learning methods treat BioNER as a sequence labeling problem, where the goal is to determine the right sequence of labels for a specific input sentence. Theoretically, hidden Markov models (HMMs), established by Deng et al. (2017) and promoted by others, have strong modeling ability for time-signal analysis, so much so that they have become a research hotspot. HMMs generally deal with time-series data and have been used successfully in speech recognition, behavior recognition, character recognition and fault diagnosis. Maximum entropy models, or maximum entropy Markov models, provide a probabilistic framework that can combine diverse pieces of contextual evidence to estimate the probability of a certain class. The essential principle behind the maximum entropy approach is to create a model that satisfies all known constraints while treating the unknowns uniformly (Dong et al., 2005). The conditional random fields (CRFs) approach is used for sequence labeling: CRFs act as the model, and the probability distribution is a function of variables that depend on both observation features and state transitions. This model predicts the most likely label sequence for a given observation set and, under conditional independence between observations, it can use any arbitrary observational feature (Lee et al., 2018). Support
