It is estimated that top pharma companies and research institutions each commission around 20 literature review projects per year on average. Regulatory agencies expect a literature review section in key regulatory documents such as aggregate safety reports and Clinical Evaluation Reports (CERs) for devices. Literature survey findings strengthen regulatory submissions for drug, biologic and medical device authorizations, and US FDA guidance requires literature surveys as sources of information on safety, efficacy and 'other known uses'.
With thousands of articles published across hundreds of journals each year, medical reviewers spend days on title and abstract review, followed by scope refinement, and then weeks on full-article review, data extraction and synthesis of information before they even begin writing the literature review report. The reports can be lengthy (depending on the number of articles under review) and often require summarisation with charts and a synopsis for inclusion in a regulatory submission and for executive review.
Literature databases such as PubMed, Embase, Google Scholar, Cochrane Reviews, Ovid and ScienceDirect require the researcher or librarian to create search strategies using database-specific query syntax and vocabulary. Conference abstracts and dissertations are additional sources that must be handled separately. Researchers generally create a search strategy, review the results, refine the strategy and then review the titles and abstracts collected from multiple sources. Literature search software and reference/citation management tools support the process, yet the bulk of it remains manual. With multiple contributors and reviewers, the process can take several weeks for a single post-marketing annual report, not to mention the challenges of constantly monitoring the databases for new articles and asking procurement to buy articles from journals behind a paywall.
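To make the idea of a database-specific search strategy concrete, here is a minimal sketch of how a PubMed-style boolean query can be assembled from concept groups: synonyms within a concept are ORed together, and concepts are ANDed. The concept labels, terms and field tags below are invented for illustration, not a validated search strategy.

```python
# Illustrative sketch: building a PubMed-style boolean query from concept
# groups. Terms and field tags are hypothetical examples.

def build_pubmed_query(concepts):
    """OR the synonyms within each concept, then AND the concepts together.

    `concepts` maps a concept label to a list of (term, field_tag) pairs,
    e.g. ("ibuprofen", "tiab") becomes ibuprofen[tiab].
    """
    groups = []
    for terms in concepts.values():
        clause = " OR ".join(f"{term}[{tag}]" for term, tag in terms)
        groups.append(f"({clause})")
    return " AND ".join(groups)

concepts = {
    "drug": [("ibuprofen", "tiab"), ("Ibuprofen", "MeSH Terms")],
    "outcome": [("gastrointestinal bleeding", "tiab"), ("safety", "tiab")],
}
query = build_pubmed_query(concepts)
print(query)
# (ibuprofen[tiab] OR Ibuprofen[MeSH Terms]) AND (gastrointestinal bleeding[tiab] OR safety[tiab])
```

Each database (Embase, Ovid, Cochrane) uses its own syntax and controlled vocabulary, so in practice a separate translation of the same concept groups is needed per source.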
This is where AI/ML technologies can significantly reduce manual effort and sharpen the focus on research quality. Activities such as searching multiple literature databases on a frequent or real-time basis, screening abstracts and full-text articles for relevance, extracting data from selected sections of specific articles, and compiling the extracted data into summary tables and charts can be automated using powerful deep learning models trained on biomedical data sets. Add workflow automation and a collaborative authoring platform to the mix, and one could give medical affairs personnel an entirely new user experience: with AI-assisted functionality, they could spend their time more productively, reviewing the results generated by the algorithms using dynamic filters and writing a narrative that connects the data-driven synthesis offered by the program.
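The abstract-screening step above can be sketched with a toy relevance score. Production systems use biomedical language models; this keyword-overlap heuristic only illustrates the triage idea, and the sample abstracts and keywords are invented.

```python
# Toy illustration of abstract-relevance triage. Real screening uses
# deep learning models; this keyword-overlap score just shows the step.

def relevance_score(abstract, keywords):
    """Fraction of query keywords appearing in the abstract (case-insensitive)."""
    text = abstract.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)

def screen(abstracts, keywords, threshold=0.5):
    """Return abstracts whose score meets the threshold, highest score first."""
    scored = [(relevance_score(a, keywords), a) for a in abstracts]
    return [a for score, a in sorted(scored, reverse=True) if score >= threshold]

keywords = ["ibuprofen", "bleeding"]
abstracts = [
    "Ibuprofen use and gastrointestinal bleeding risk in adults.",
    "A survey of vaccine hesitancy in rural clinics.",
]
selected = screen(abstracts, keywords)
print(selected)
# ['Ibuprofen use and gastrointestinal bleeding risk in adults.']
```

In a real pipeline the score would come from a trained classifier, and borderline articles would be routed to a human reviewer rather than dropped.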
At Datafoundry, we have made a promising start on using AI/ML to improve literature monitoring and review. One of our customers, a leading consumer health products company, has terabytes of clinical data. They wished to explore whether data from internal documents could be combined with literature articles from external sources to enable a more systematic and efficient literature review process. The Datafoundry machine learning group took up the challenge and designed a comprehensive literature search and review solution in which most of the heavy lifting is performed by the deep learning models we call MediLP (Medical NLP). The OCR models used to digitize data from scanned PDFs and images achieved 96% accuracy, and the NER models extracted and structured the data set with close to 100% accuracy. The pilot application has been successful, and the team is now gearing up to build it into an enterprise-grade SaaS product.
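To show what "extract and structure" means in the NER step, here is a toy rule-based stand-in. MediLP itself uses deep learning NER models; the regex patterns and sample sentence below are invented purely to illustrate how extracted entities become rows for a summary table.

```python
import re

# Toy stand-in for the entity-extraction step. The actual system uses
# deep learning NER; these regex patterns and the sample text are
# invented for illustration only.

PATTERNS = {
    "dose": re.compile(r"\b\d+(?:\.\d+)?\s?mg\b"),
    "sample_size": re.compile(r"\bn\s?=\s?\d+\b", re.IGNORECASE),
}

def extract_entities(text):
    """Return {label: [matches]} for each pattern found in the text."""
    return {label: pat.findall(text) for label, pat in PATTERNS.items()}

sentence = "Patients (n=120) received 400 mg ibuprofen twice daily."
entities = extract_entities(sentence)
print(entities)
# {'dose': ['400 mg'], 'sample_size': ['n=120']}
```

Structured output like this is what feeds the summary tables and charts mentioned earlier; a model-based extractor replaces the patterns but produces rows of the same shape.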