Power your Literature Surveillance with AI-driven automation
arrow-left Back to Blogs
Nov 29th, 2023
Power your Literature Surveillance with AI-driven Automation 

In the highly regulated landscape of pharmacovigilance (PV), ensuring patient safety and health outcomes is paramount. With increasing focus on post-marketing drug surveillance and drug safety, literature monitoring has taken a center spot in PV. Literature surveillance involves monitoring scientific literature on adverse drug reactions and safety concerns related to pharmaceutical products.

Challenges in Medical Literature Monitoring

According to a report by Elsevier, literature is quoted as the fourth largest source for Adverse Events Reporting. As per the SJR dataset, more than 1.4 million medical articles were published in 2022, averaging 4000 articles per day. The ever-growing volume of scientific literature presents challenges in monitoring all pertinent publications for a product. Moreover, collecting relevant articles from multiple databases leads to duplicate references. These duplicate entries can skew data analysis and reporting, therefore vigilant deduplication processes become mandatory to maintain data accuracy and integrity.    

Further, the regulatory authorities mandate specific timelines for reporting the cases; in such a scenario, logging receipts of references is critical. Tagging day zero for newly identified references is essential for tracking safety signals accurately. However, defining the start date can be ambiguous, especially when references become available at different times through various sources. 

Addressing these challenges and ensuring comprehensive literature surveillance demands significant manual efforts. The initial phase of examining articles that may contain pertinent information involves creating search strategies within extensive literature databases. This process ensures that pharmacovigilance teams can have a high level of certainty in capturing all pertinent literature. Subsequently, an individual evaluates the abstracts to determine if obtaining the full article is necessary. PV experts have grappled with labor-intensive tasks, including manual search strategy creation, database integration, de-duplication, logging receipts, case identification and much more. These tasks are not only cumbersome but also susceptible to human error. 

DF Literature Monitor – Datafoundry’s AI and ML-Powered Solution for Literature Surveillance

At Datafoundry, we have harnessed the power of Artificial Intelligence (AI) to automate the literature monitoring process through our product – DF Literature Monitor. Datafoundry has developed algorithms for Optical Character Recognition (OCR), Named Entity Recognition (NER), Relationship Extraction (RE), Natural Language Understanding (NLU), and Natural Language Generation (NLG) using a collection of biomedical datasets comprising more than 30 million records.   

In an earlier article, we spoke about the various aspects of literature monitoring where we have achieved accuracy and efficiency by automating the process with minimal manual intervention. Here, we will delve into the critical task of identifying the most relevant articles for case creation.

When conducting a drug-related search across multiple literature databases, it consistently yields results numbering in the thousands, and occasionally even tens of thousands of articles. Identifying a valid case within the extensive corpus of published literature can be as challenging a task as locating a needle within a haystack. 

To overcome this challenge, one creates a search strategy. There are several references available on the European Medicines Agency (EMA) website, which guide the creation of the most optimal search strategy. One of the most crucial tasks in literature surveillance is the creation and maintenance of search strategies for multiple databases, as the syntax for the search string in each database varies. These strategies serve as the compass guiding PV experts towards relevant articles. However, they demand constant assessment and updates to ensure that no critical information slips through the cracks. In our quest to simplify this process, while ensuring workflow efficiency and robustness, we built: 

The Intelligent Search Engine in DF Literature Monitor

At its core, DF Literature Monitor’s Intelligent Search Engine has a three-level curated process to ensure minimal manual intervention and maximum efficiency: 

1. Multi-Database Article Ingestion

  • a. Intelligent Search Algorithm
  • b. Article Ingestion Frequency Management
  • c. Ingested Article Deduplication

2. ICSR-Relevant Article Identification 

  • a. Named Entity Recognition (NER) Implementation
  • b. Relation Extraction (RE) Model Usage
  • c. Article Tagging and Display Ranking

3. Continuous Learning and Improvement 


Multi-Database Article Ingestion

Natural Language Processing (NLP) techniques are used to understand and extract valuable insights from medical literature. The process initiates with minimal input from the user by triggering the search engine to identify and extract relevant articles. The seamless integration with 20+ open access scientific and medical literature data bases covers 1000s of journals world-wide and creates a consolidated view of the PV-relevant articles. The databases include PubMed Central, BioMed Central and Elsevier, among others. 

Intelligent Search Engine   

The AI-driven system ensures that the hassle of search strategy creation can be replaced by a smart search engines that extracts articles and assigns a relevancy-based ranking to these articles. Behind this process lies a Machine Learning (ML) algorithm with the ability to fetch articles relevant to a product from all configured databases. The algorithm leverages minimal input data, such as drug indications and known adverse events, to ingest pertinent articles. It navigates this vast sea of data with precision and speed, ensuring maximum coverage while drastically reducing the risk of missing critical information buried within the digital haystack. 

Article Ingestion Frequency Management

Regulatory bodies mandate article searches to be conducted at specified intervals, for example, EMA mandates article searches to be conducted no less frequently than once a week for specific products. DF Literature Monitor ingests articles daily from all the configured sources. It allows users to schedule article listing on a timeline that suits their regulatory needs, ranging from daily to yearly. It ensures that the automated process of AI-processing and listing the latest articles is conducted at user-defined intervals. This not only enhances efficiency but also ensures that PV experts are consistently armed with the most up-to-date information, supporting proactive decision-making in the specified time limit.



Article Deduplication 

Multi-pronged article deduplication technique is used in DF Literature Monitor to identify and remove duplicate articles from multiple databases. While ingesting the latest relevant articles, the algorithm extracts descriptive metadata such as title, abstract and authors. The metadata information is utilised to eliminate duplicate articles while indexing only the unique articles. Automating de-duplication ensures that the burden of manual de-duplication is eliminated, allowing safety surveillance teams to make the most of available scientific literature.  

ICSR-relevant Article Identification

DF Literature Monitor processes the ingested articles to identify minimum safety information (MSI) such as identifiable patients, suspect drugs, adverse events, and a primary reporter. The article processing uses a custom ML algorithm that utilizes Named Entity Recognition (NER) and Relation Extraction pipeline to ensure that the articles with potential adverse events are tagged and listed on the top of the listing screen. 

Named Entity Recognition (NER) Implementation

NER involves the identification and classification of specific entities within the text, such as names of suspect drugs, patient’s information (age, age group and gender), reported adverse events and primary reporter name. In this case, NER is applied to thoroughly scrutinize the article’s title, abstract, and full text, effectively extracting critical MSI. This process ensures that relevant entities are adequately tagged, laying the foundation for identifying potential adverse events. 

Named Entity Recognition

Relation Extraction (RE) Model Usage

RE operates in tandem with NER. RE focuses on understanding the relationships between the identified entities. For instance, it helps establish the connection between a suspect drug and an adverse event, using hypergraph-based machine learning. Hypergraphs model higher-order relationships where multiple elements can be connected simultaneously.  

Article Tagging and Display Ranking

Utilizing the NER and RE models, DF Literature Monitor tags the articles with extracted entities and determines relevancy. As a result, articles that provide the most relevant and crucial safety information rise to the top of the listing, thereby optimizing the efficiency and productivity of PV experts by prioritizing their review. 

Article Tagging and Display Ranking

Continuous Learning and Improvement

Datafoundry’s AI/ML models are built with advanced continuous learning capabilities. The self-learning models incrementally update their internal parameters and knowledge base when they are exposed to new data. They employ online learning techniques to efficiently integrate incoming data streams, adapt to evolving data distributions, and address concept drift. Additionally, memory management mechanisms ensure that pertinent information is retained while discarding obsolete data. This empowers our models to enhance their performance over time, making them particularly suited for applications where data is dynamic and ever evolving, such as medical literature. 

Datafoundry’s commitment to data-driven excellence in patient safety and health outcomes is evident in its pioneering use of AI and ML technologies in various aspects of Pharmacovigilance. By utilising intelligent search algorithm, integrating a multitude of databases, and identifying ICSR-relevant articles, Datafoundry is transforming the landscape of medical literature monitoring. PV experts can redirect their focus from strategy creation to higher-value tasks like data interpretation and decision-making. With the precision and efficiency of DF Literature Monitor, safety vigilance teams can truly transform literature surveillance.  

Understanding and Harnessing the Exponential Growth of Life Sciences Data  

Read Next