Alejandro Rodríguez-González*, Roberto Costumero, Marcos Martinez-Romero, Mark D. Wilkinson and Ernestina Menasalvas-Ruiz Pages 573 - 582 ( 10 )
Background: The development of diagnostic decision support systems (DDSS) requires a reliable and consistent knowledge base of diseases and their symptoms, signs, and diagnostic tests. Physicians are typically the source of this knowledge, but it is not always possible to obtain all the desired information from them. Other valuable sources are medical books and articles describing the diagnosis of diseases, but extracting this information is again a hard and time-consuming task.
Objective: In this paper we present the results of our research comparing two well-known tools used to perform NLP in the medical domain. We used these tools to perform Named Entity Recognition in order to extract diagnostic terms from the texts of MedLine Plus articles.
Method: We have used Web scraping, natural language processing (NLP) techniques, a variety of publicly available sources of diagnostic knowledge and two widely known medical concept identifiers, MetaMap and cTAKES, to extract diagnostic criteria for infectious diseases from MedLine Plus articles.
Results: We present a performance comparison of MetaMap and cTAKES. Although the differences between the two systems are not really significant, there are some noticeable differences in the results they provide.
Conclusion: The extraction of diagnostic terms is a very important task for the creation of databases containing this information. NLP systems capable of extracting those terms from texts are very valuable tools that need to be implemented and evaluated in order to obtain the maximum accuracy in this process.
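As an illustration of how a comparison of this kind could be scored, the following minimal sketch assumes that each tool's output has been reduced to a set of normalized terms and that a manually curated gold-standard set is available. The choice of metrics (precision, recall, F1), the function name `score_terms`, the labels `tool_a` and `tool_b`, and all sample terms are assumptions made for illustration; they are not taken from the paper.

```python
def score_terms(extracted, gold):
    """Return (precision, recall, f1) for extracted terms against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical data, invented purely for illustration (not results from the paper).
gold_terms = {"fever", "cough", "chest x-ray", "sputum culture"}
tool_outputs = {
    "tool_a": {"fever", "cough", "chest x-ray", "headache"},
    "tool_b": {"fever", "sputum culture"},
}

# Score each tool's term set against the same gold standard.
scores = {name: score_terms(terms, gold_terms) for name, terms in tool_outputs.items()}
for name, (precision, recall, f1) in scores.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Scoring both tools against the same gold set keeps the comparison like-for-like; in practice the terms would probably need normalizing (for example, mapping to shared concept identifiers) before set intersection is meaningful.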
Keywords: Diagnostic knowledge, information extraction, CDSS, DDSS, NLP, MetaMap, cTAKES.
Centro de Tecnologia Biomedica, Universidad Politecnica de Madrid, Pozuelo de Alarcon, Centro de Tecnologia Biomedica, Universidad Politecnica de Madrid, Pozuelo de Alarcon, Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, Centro de Biotecnologia y Genomica de Plantas UPM – INIA, Centro de Tecnologia Biomedica, Universidad Politecnica de Madrid, Pozuelo de Alarcon