natural language processing


Summary: Computer processing of a language with rules that reflect and describe current usage rather than prescribed usage.

Top Publications

  1. Ananthakrishnan A, Cai T, Savova G, Cheng S, Chen P, Perez R, et al. Improving case definition of Crohn's disease and ulcerative colitis in electronic medical records using natural language processing: a novel informatics approach. Inflamm Bowel Dis. 2013;19:1411-20 pubmed publisher
    ..disease leveraging the combination of codified data and information from clinical text notes using natural language processing. Using the electronic medical records of 2 large academic centers, we created data marts for Crohn's ..
  2. Speier W, Arnold C, Lu J, Taira R, Pouratian N. Natural language processing with dynamic classification improves P300 speller accuracy and bit rate. J Neural Eng. 2012;9:016004 pubmed publisher
    ..With integration of natural language processing, we observed significant improvements in accuracy and 40-60% increases in bit rate for all six subjects ..
  3. Peissig P, Rasmussen L, Berg R, Linneman J, McCarty C, Waudby C, et al. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records. J Am Med Inform Assoc. 2012;19:225-34 pubmed publisher
    ..We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify ..
  4. Bada M, Eckert M, Evans D, Garcia K, Shipley K, Sitnikov D, et al. Concept annotation in the CRAFT corpus. BMC Bioinformatics. 2012;13:161 pubmed publisher
    ..The corpus, annotation guidelines, and other associated resources are freely available at ..
  5. Elkin P, Froehling D, Wahner Roedler D, Brown S, Bailey K. Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes. Ann Intern Med. 2012;156:11-8 pubmed publisher
    ..influenza virus infection in patients with upper respiratory tract symptoms, and the ability of a natural language processing technique to identify definitional clinical features from free-text encounter notes...
  6. Li Y, Salmasian H, Harpaz R, Chase H, Friedman C. Determining the reasons for medication prescriptions in the EHR using knowledge and natural language processing. AMIA Annu Symp Proc. 2011;2011:768-76 pubmed
    ..The method utilizes drug-indication knowledge that we acquired, and natural language processing. Evaluation showed the method obtained a sensitivity of 62.8%, specificity of 93...
  7. Ferraro J, Daumé H, DuVall S, Chapman W, Harkema H, Haug P. Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation. J Am Med Inform Assoc. 2013;20:931-9 pubmed publisher
    Natural language processing (NLP) tasks are commonly decomposed into subtasks, chained together to form processing pipelines. The residual error produced in these subtasks propagates, adversely affecting the end objectives...
  8. Bejan C, Vanderwende L, Wurfel M, Yetisgen Yildiz M. Assessing pneumonia identification from time-ordered narrative reports. AMIA Annu Symp Proc. 2012;2012:1119-28 pubmed
    In this paper, we present a natural language processing system that can be used in hospital surveillance applications with the purpose of identifying patients with pneumonia...
  9. Garla V, Brandt C. Ontology-guided feature engineering for clinical text classification. J Biomed Inform. 2012;45:992-8 pubmed publisher
    ..We have released all tools developed as part of this study as open source, available at ..

More Information


  1. Ferrández O, South B, Shen S, Friedlin F, Samore M, Meystre S. BoB, a best-of-breed automated text de-identification system for VHA clinical documents. J Am Med Inform Assoc. 2013;20:77-83 pubmed publisher
    ..Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods...
  2. Kang N, Singh B, Afzal Z, van Mulligen E, Kors J. Using rule-based natural language processing to improve disease normalization in biomedical text. J Am Med Inform Assoc. 2013;20:876-81 pubmed publisher
    ..In this study we investigate the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization...
  3. Zheng K, Mei Q, Yang L, Manion F, Balis U, Hanauer D. Voice-dictated versus typed-in clinician notes: linguistic properties and the potential implications on natural language processing. AMIA Annu Symp Proc. 2011;2011:1630-8 pubmed
    ..Such differences could have a significant impact on the performance of natural language processing tools, necessitating these two different types of documents being differentially treated.
  4. Grouin C, Deleger L, Rosier A, Temal L, Dameron O, Van Hille P, et al. Automatic computation of CHA2DS2-VASc score: information extraction from clinical texts for thromboembolism risk assessment. AMIA Annu Symp Proc. 2011;2011:501-10 pubmed
    ..In this article, we present a system based on natural language processing (lexicon and linguistic modules), including negation and speculation handling, which extracts medical ..
  5. Lehman L, Saeed M, Long W, Lee J, Mark R. Risk stratification of ICU patients using topic models inferred from unstructured progress notes. AMIA Annu Symp Proc. 2012;2012:505-11 pubmed
    ..72 was achieved. Thus, the clinical topics that were extracted and used to augment the SAPS-I algorithm significantly improved the performance of the baseline algorithm. ..
  6. Liu M, Shah A, Jiang M, Peterson N, Dai Q, Aldrich M, et al. A study of transportability of an existing smoking status detection module across institutions. AMIA Annu Symp Proc. 2012;2012:577-86 pubmed
    ..Natural language processing (NLP) systems have been developed for this specific task, such as the smoking status detection module ..
  7. Thompson W, Rasmussen L, Pacheco J, Peissig P, Denny J, Kho A, et al. An evaluation of the NQF Quality Data Model for representing Electronic Health Record driven phenotyping algorithms. AMIA Annu Symp Proc. 2012;2012:911-20 pubmed
    ..However, we also found areas in which the QDM could be usefully extended, such as representing information extracted from clinical text, and the ability to handle algorithms that do not consist of Boolean combinations of criteria. ..
  8. Zhai H, Lingren T, Deleger L, Li Q, Kaiser M, Stoutenborough L, et al. Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing. J Med Internet Res. 2013;15:e73 pubmed publisher
    A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard...
  9. Hunter J, Freer Y, Gatt A, Reiter E, Sripada S, Sykes C. Automatic generation of natural language nursing shift summaries in neonatal intensive care: BT-Nurse. Artif Intell Med. 2012;56:157-72 pubmed publisher
    ..However, it proved difficult to handle electronic data that was intended primarily for display to the medical staff, and considerable engineering effort would be required to create a deployable system from our proof-of-concept software. ..
  10. Toyabe S. Detecting inpatient falls by using natural language processing of electronic medical records. BMC Health Serv Res. 2012;12:448 pubmed publisher
    ..determine whether it is possible to promptly detect serious injuries after inpatient falls by using a natural language processing method and to determine which data source is the most suitable for this purpose...
  11. Kilicoglu H, Bergler S. Biological event composition. BMC Bioinformatics. 2012;13 Suppl 11:S7 pubmed publisher
    In recent years, biological event extraction has emerged as a key natural language processing task, aiming to address the information overload problem in accessing the molecular biology literature...
  12. Garvin J, DuVall S, South B, Bray B, Bolton D, Heavirland J, et al. Automated extraction of ejection fraction for quality measurement using regular expressions in Unstructured Information Management Architecture (UIMA) for heart failure. J Am Med Inform Assoc. 2012;19:859-66 pubmed publisher
    ..Our goals were to build a natural language processing system to extract the EF from free-text echocardiogram reports to automate measurement reporting and to ..
  13. Percha B, Nassif H, Lipson J, Burnside E, Rubin D. Automatic classification of mammography reports by BI-RADS breast tissue composition class. J Am Med Inform Assoc. 2012;19:913-6 pubmed publisher
    ..Since large-scale studies of breast cancer rely heavily on breast tissue composition information, this method could facilitate this research by helping mine large datasets to correlate breast composition with other covariates. ..
  14. Dublin S, Baldwin E, Walker R, Christensen L, Haug P, Jackson M, et al. Natural Language Processing to identify pneumonia from radiology reports. Pharmacoepidemiol Drug Saf. 2013;22:834-41 pubmed publisher
    This study aimed to develop Natural Language Processing (NLP) approaches to supplement manual outcome validation, specifically to validate pneumonia cases from chest radiograph reports...
  15. Workman T, Stoddart J. Rethinking information delivery: using a natural language processing application for point-of-care data discovery. J Med Libr Assoc. 2012;100:113-20 pubmed publisher
    This paper examines the use of Semantic MEDLINE, a natural language processing application enhanced with a statistical algorithm known as Combo, as a potential decision support tool for clinicians...
  16. Rocktäschel T, Weidlich M, Leser U. ChemSpot: a hybrid system for chemical named entity recognition. Bioinformatics. 2012;28:1633-40 pubmed publisher
    ..1% on the SCAI corpus, outperforming the only other freely available chemical NER tool, OSCAR4, by 10.8 percentage points. ChemSpot is freely available at: ..
  17. Cui L, Bozorgi A, Lhatoo S, Zhang G, Sahoo S. EpiDEA: extracting structured epilepsy and seizure information from patient discharge summaries for cohort identification. AMIA Annu Symp Proc. 2012;2012:1191-200 pubmed
    ..By extending the cTAKES natural language processing tool developed at the Mayo Clinic, EpiDEA implements specialized functions to address the unique ..
  18. Salmasian H, Freedberg D, Friedman C. Deriving comorbidities from medical records using natural language processing. J Am Med Inform Assoc. 2013;20:e239-42 pubmed publisher
    ..We processed the notes using the MedLEE natural language processing system, and wrote queries to extract comorbidities automatically from its structured output...
  19. Rea S, Pathak J, Savova G, Oniki T, Westberg L, Beebe C, et al. Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project. J Biomed Inform. 2012;45:763-71 pubmed publisher
    ..Based on the demonstration, observed challenges for standardization of EHR data for interoperable secondary use are discussed. ..
  20. Bejan C, Xia F, Vanderwende L, Wurfel M, Yetisgen Yildiz M. Pneumonia identification using statistical feature selection. J Am Med Inform Assoc. 2012;19:817-23 pubmed publisher
    This paper describes a natural language processing system for the task of pneumonia identification...
  21. Tablan V, Roberts I, Cunningham H, Bontcheva K. a platform for large-scale, open-source text processing on the cloud. Philos Trans A Math Phys Eng Sci. 2013;371:20120071 pubmed publisher
    ..In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud ..
  22. Albright D, Lanfranchi A, Fredriksen A, Styler W, Warner C, Hwang J, et al. Towards comprehensive syntactic and semantic annotations of the clinical narrative. J Am Med Inform Assoc. 2013;20:922-30 pubmed publisher
    ..clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components...
  23. Wagholikar K, MacLaughlin K, Henry M, Greenes R, Hankey R, Liu H, et al. Clinical decision support with automated text processing for cervical cancer screening. J Am Med Inform Assoc. 2012;19:833-9 pubmed publisher
    ..Overall, the study demonstrates that free text in the EMR can be effectively utilized through natural language processing to develop clinical decision support tools.
  24. Xu Y, Liu J, Wu J, Wang Y, Tu Z, Sun J, et al. A classification approach to coreference in discharge summaries: 2011 i2b2 challenge. J Am Med Inform Assoc. 2012;19:897-905 pubmed publisher
    ..The Pronoun system can automatically detect whether a Pronoun mention is coreferent to that of the other four types. This study demonstrates that it is feasible to accomplish the coreference task in discharge summaries. ..
  25. Wu S, Liu H, Li D, Tao C, Musen M, Chute C, et al. Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis. J Am Med Inform Assoc. 2012;19:e149-56 pubmed
    ..The semantic groups of mapped terms may differ slightly from institution to institution, but they differ greatly when moving to the biomedical literature domain. ..
  26. Xu H, Fu Z, Shah A, Chen Y, Peterson N, Chen Q, et al. Extracting and integrating data from entire electronic health records for detecting colorectal cancer cases. AMIA Annu Symp Proc. 2011;2011:1564-72 pubmed
    ..This paper describes an algorithm combining machine learning and natural language processing to detect patients with colorectal cancer (CRC) from entire EHRs at Vanderbilt University Hospital...
  27. Sun W, Rumshisky A, Uzuner O. Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J Am Med Inform Assoc. 2013;20:806-13 pubmed publisher
    The Sixth Informatics for Integrating Biology and the Bedside (i2b2) Natural Language Processing Challenge for Clinical Records focused on the temporal relations in clinical narratives...
  28. Carroll R, Thompson W, Eyler A, Mandelin A, Cai T, Zink R, et al. Portability of an algorithm to identify rheumatoid arthritis in electronic health records. J Am Med Inform Assoc. 2012;19:e162-9 pubmed
    ..the EHR, including codified data and clinical narratives, which were searched using one of two natural language processing (NLP) systems. The performance of the published model was compared with locally retrained models...
  29. Kim J, Nguyen N, Wang Y, Tsujii J, Takagi T, Yonezawa A. The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011. BMC Bioinformatics. 2012;13 Suppl 11:S1 pubmed publisher
    ..Particularly, in terms of protein coreference resolution the best system achieved 34% in F-score. Detailed analysis performed on the results improves our insight into the problem and suggests the directions for further improvements. ..
  30. Verspoor K, Cohen K, Lanfranchi A, Warner C, Johnson H, Roeder C, et al. A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools. BMC Bioinformatics. 2012;13:207 pubmed publisher
    ..Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance ..
  31. Kang N, van Mulligen E, Kors J. Comparing and combining chunkers of biomedical text. J Biomed Inform. 2011;44:354-60 pubmed publisher
    ..The combination of chunker results by a simple voting scheme can further improve performance and allows for different precision-recall settings. ..
  32. Zweigenbaum P, Demner Fushman D, Yu H, Cohen K. Frontiers of biomedical text mining: current progress. Brief Bioinform. 2007;8:358-75 pubmed
    ..In this article we review the current state of the art in biomedical text mining or 'BioNLP' in general, focusing primarily on papers published within the past year. ..
  33. Morrison F, Li L, Lai A, Hripcsak G. Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes? J Am Med Inform Assoc. 2009;16:37-9 pubmed publisher
    ..The MedLEE processor may be a good enhancement to other de-identification systems, both removing PHI and providing coded data from clinical text. ..
  34. Airola A, Pyysalo S, Björne J, Pahikkala T, Ginter F, Salakoski T. All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics. 2008;9 Suppl 11:S2 pubmed publisher
    ..These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided. ..
  35. Rink B, Harabagiu S, Roberts K. Automatic extraction of relations between medical concepts in clinical texts. J Am Med Inform Assoc. 2011;18:594-600 pubmed publisher
    ..When they are not available to the classifier, the F1 score decreases by 3.7%. In addition, features based on similarity contribute to a decrease of 1.1% when they are not available. ..
  36. Pakhomov S, Jacobsen S, Chute C, Roger V. Agreement between patient-reported symptoms and their documentation in the medical record. Am J Manag Care. 2008;14:530-9 pubmed
    ..information forms between January 1, 2006, and June 30, 2006, were compared with those identified by natural language processing of the text of clinical notes from care providers...
  37. Kim H, Park H, Drake B. Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations. BMC Bioinformatics. 2007;8 Suppl 9:S6 pubmed
    ..Using known gene relationships of a given gene, we can determine the number of factors used in the reduced rank matrix and retrieve unrecognized genes related with the given gene by LSI/SVD or GR/NMF. ..
  38. Cohen K, Palmer M, Hunter L. Nominalization and alternations in biomedical language. PLoS ONE. 2008;3:e3158 pubmed publisher
    ..Nonetheless, the sublanguage model applies to biomedical language. We also report on a previously undescribed alternation involving an adjectival present participle. ..
  39. Doan S, Kawazoe A, Conway M, Collier N. Towards role-based filtering of disease outbreak reports. J Biomed Inform. 2009;42:773-80 pubmed publisher
    ..We discuss in detail the effects of roles on each NE and on semantic categories of noun and verb features in terms of accuracy, precision/recall and F-score measures for the text classification task. ..
  40. D'Avolio L, Nguyen T, Farwell W, Chen Y, Fitzmeyer F, Harris O, et al. Evaluation of a generalizable approach to clinical information retrieval using the automated retrieval console (ARC). J Am Med Inform Assoc. 2010;17:375-82 pubmed publisher
    ..set, the automated retrieval console (ARC) iteratively calculated performance of combinations of natural language processing-derived features and supervised classification algorithms...
  41. Agarwal S, Yu H. Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion. Bioinformatics. 2009;25:3174-80 pubmed publisher
    ..95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at- ..
  42. Aramaki E, Miura Y, Tonoike M, Ohkuma T, Masuichi H, Waki K, et al. Extraction of adverse drug effects from clinical records. Stud Health Technol Inform. 2010;160:739-43 pubmed
    ..information is contained in records, and (2) automatic extracting accuracy of the current standard Natural Language Processing (NLP) system. Results revealed that 7...
  43. Jimeno A, Jimenez Ruiz E, Lee V, Gaudan S, Berlanga R, Rebholz Schuhmann D. Assessment of disease named entity recognition on a corpus of annotated sentences. BMC Bioinformatics. 2008;9 Suppl 3:S3 pubmed publisher
    ..Bioinformatics 2008, 24:296-298). ..
  44. Demner Fushman D, Chapman W, McDonald C. What can natural language processing do for clinical decision support? J Biomed Inform. 2009;42:760-72 pubmed publisher
    ..natural language processing (NLP) is instrumental in using free-text information to drive CDS, representing clinical knowledge and ..
  45. Wang X, Matthews M. Distinguishing the species of biomedical named entities for term identification. BMC Bioinformatics. 2008;9 Suppl 11:S6 pubmed publisher
    ..6%. This paper shows that, in the context of identifying terms involving multiple model organisms, integration of an accurate species disambiguation system can significantly improve the performance of term identification systems. ..
  46. Uzuner O, Solti I, Cadag E. Extracting medication information from clinical text. J Am Med Inform Assoc. 2010;17:514-8 pubmed publisher
    The Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records focused on the identification of medications, their dosages, modes (routes) of administration, frequencies, durations, and reasons for administration ..
  47. Zhang H, Fiszman M, Shin D, Miller C, Rosemblat G, Rindflesch T. Degree centrality for semantic abstraction summarization of therapeutic studies. J Biomed Inform. 2011;44:830-8 pubmed publisher
    ..The results showed that recall for system results was 72%, precision was 73%, and F-score was 0.72. The system F-score was considerably higher than that for the baseline (0.47). ..
  48. Chung G. Sentence retrieval for abstracts of randomized controlled trials. BMC Med Inform Decis Mak. 2009;9:10 pubmed publisher
    ..Using Conditional Random Fields (CRFs), a popular and successful method for natural language processing problems, sentences referring to Intervention, Participants and Outcome Measures are automatically ..
  49. Childs L, Enelow R, Simonsen L, Heintzelman N, Kowalski K, Taylor R. Description of a rule-based system for the i2b2 challenge in natural language processing for clinical data. J Am Med Inform Assoc. 2009;16:571-5 pubmed publisher
    ..The authors describe their methodology and discuss the results of applying Lockheed Martin's rule-based natural language processing (NLP) capability, ClinREAD...
  50. Stevenson M, Guo Y. Disambiguation in the biomedical domain: the role of ambiguity type. J Biomed Inform. 2010;43:972-81 pubmed publisher
    ..Analysis reveals that these differences are caused by the nature of each ambiguity type. These results should be taken into account when deciding which information to use for WSD and the level of performance that can be expected. ..
  51. Fan J, Friedman C. Semantic reclassification of the UMLS concepts. Bioinformatics. 2008;24:1971-3 pubmed publisher
    ..The new classification is useful for auditing the original UMLS semantic classification and for building biomedical text mining applications. ..
  52. Bundschus M, Dejori M, Stetter M, Tresp V, Kriegel H. Extraction of semantic biomedical relations from text using conditional random fields. BMC Bioinformatics. 2008;9:207 pubmed publisher
    ..Current work is focused on improving the accuracy of detection of entities as well as entity boundaries, which will also greatly improve the relation extraction performance. ..
  53. Nadkarni P, Ohno Machado L, Chapman W. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18:544-51 pubmed publisher
    To provide an overview and tutorial of natural language processing (NLP) and modern NLP-system design...