BMC Medical Informatics and Decision Making - Latest Articles
The latest research articles published by BMC Medical Informatics and Decision Making

  • Identification of pneumonia and influenza deaths using the death certificate pipeline
    Background: Death records are a rich source of data that can be used to assist with public health surveillance and decision support. However, to use this type of data for such purposes, it has to be transformed into a coded format to make it computable. Because the cause of death on the certificates is reported as free text, encoding the data is currently the single largest barrier to using death certificates for surveillance. The purpose of this study was therefore to demonstrate the feasibility of using a pipeline, composed of a detection rule and a natural language processor, for the real-time encoding of death certificates, using the identification of pneumonia and influenza cases as an example, and to demonstrate that its accuracy is comparable to existing methods.
    Results: A Death Certificates Pipeline (DCP) was developed to automatically code death certificates and identify pneumonia and influenza cases. The pipeline used MetaMap to code death certificates from the Utah Department of Health for the year 2008. The output of MetaMap was then accessed by detection rules, which flagged pneumonia and influenza cases based on the Centers for Disease Control and Prevention (CDC) case definition. The output from the DCP was compared with the current method used by the CDC and with a keyword search. Recall, precision (positive predictive value) and F-measure with respect to the CDC method were calculated for the two other methods considered here. The two techniques compared with the CDC method showed the following recall/precision results: DCP, 0.998/0.98; keyword searching, 0.96/0.96. The F-measures were 0.99 and 0.96, respectively. Both the keyword search and the DCP can run interactively with modest computer resources, but the DCP shows superior performance.
    Conclusion: The pipeline proposed here for coding death certificates and detecting cases is feasible and can be extended to other conditions. This method provides an alternative that allows free-text death certificates to be coded in real time, which may increase their utilization not only in the public health domain but also among biomedical researchers and developers.
    Trial Registration: This study did not involve any clinical trials.
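
The reported F-measures can be checked against the recall and precision figures in the abstract. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function and variable names are illustrative and not taken from the study.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall/precision pairs as reported in the abstract (relative to the CDC method).
reported = {
    "DCP": {"recall": 0.998, "precision": 0.98},
    "keyword search": {"recall": 0.96, "precision": 0.96},
}

# Assumption: the study used the balanced (F1) form of the F-measure.
f_scores = {
    name: f_measure(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, score in f_scores.items():
    print(f"{name}: F = {score:.2f}")
```

Rounded to two decimals, both values agree with the 0.99 and 0.96 given in the abstract.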

  • Recognition of medication information from discharge summaries using ensembles of classifiers
    Background: Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities to improve performance in such NLP tasks.
    Methods: We investigated ensemble classifiers that used different voting strategies to combine outputs from three individual classifiers: a rule-based system, a support vector machine (SVM) based system, and a conditional random field (CRF) based system. Three voting methods were proposed and evaluated using the annotated data sets from the 2009 i2b2 NLP challenge: simple majority, local SVM-based voting, and local CRF-based voting.
    Results: Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score of 90.84% (94.11% Precision, 87.81% Recall) for 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge by using the same training and test sets. Our system based on majority voting achieved a better F-score of 89.65% (93.91% Precision, 85.76% Recall) than the previously reported F-score of 89.19% (93.78% Precision, 85.03% Recall) by the first-ranked system in the challenge.
    Conclusions: Our experimental results using the 2009 i2b2 challenge datasets showed that ensemble classifiers that combine individual classifiers into a voting system could achieve better performance than a single classifier in recognizing medication information from clinical text. This suggests that simple, easily implemented strategies such as majority voting have the potential to significantly improve clinical entity recognition.
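
The simple majority strategy named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-token labels (`MED`, `DOSE`, `FREQ`, `O`), the fallback to `O` when no label wins a strict majority, and all function names are assumptions made for the example.

```python
from collections import Counter


def majority_vote(labels, fallback="O"):
    """Return the label chosen by more than half of the classifiers.

    Assumption: when no label has a strict majority, fall back to "O"
    (no entity); the abstract does not describe the tie-breaking rule.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else fallback


def ensemble_predict(*predictions):
    """Combine per-token label sequences from several classifiers."""
    return [majority_vote(token_labels) for token_labels in zip(*predictions)]


# Hypothetical per-token outputs from the three kinds of system in the abstract.
rule_based = ["O", "MED", "DOSE", "O", "FREQ"]
svm_based = ["O", "MED", "O", "O", "FREQ"]
crf_based = ["O", "MED", "DOSE", "MED", "O"]

combined = ensemble_predict(rule_based, svm_based, crf_based)
print(combined)
```

With three voters, a label is kept only when at least two systems agree, so an error made by a single classifier is outvoted by the other two.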

  • Leveraging H1N1 infection transmission modeling with proximity sensor microdata
    Background: The contact networks between individuals can have a profound impact on the evolution of an infectious outbreak within a network. The impact of the interaction between contact network and disease dynamics on infection spread has been investigated using both synthetic and empirically gathered micro-contact data, establishing the utility of micro-contact data for epidemiological insight. However, the infection models tied to empirical contact data were highly stylized and were not calibrated or compared against temporally coincident infection rates, or omitted critical non-network based risk factors such as age or vaccination status.
    Methods: In this paper we present an agent-based simulation model firmly grounded in disease dynamics, incorporating a detailed characterization of the natural history of infection, and 13 weeks' worth of micro-contact and participant health and risk factor information gathered during the 2009 H1N1 flu pandemic.
    Results: We demonstrate that the micro-contact data-based model yields results consistent with the case counts observed in the study population, derive novel metrics based on the logarithm of the time degree for evaluating individual risk based on contact dynamic properties, and present preliminary findings pertaining to the impact of internal network structures on the spread of disease at an individual level.
    Conclusions: Through the analysis of detailed output of Monte Carlo ensembles of agent-based simulations, we were able to recreate many possible scenarios of infection transmission using an empirically grounded dynamic contact network, providing a validated and grounded simulation framework and methodology. We confirmed recent findings on the importance of contact dynamics, and extended the analysis to new measures of the relative risk of different contact dynamics. Because exponentially more time spent with others correlates to a linear increase in infection probability, we conclude that network dynamics have an important, but not dominant impact on infection transmission for H1N1 transmission in our study population.
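
A metric based on the logarithm of the time degree might be computed along the following lines. The abstract does not define "time degree"; this sketch assumes it means a participant's total contact time summed over all recorded contacts. The record layout, participant identifiers, and function name are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical contact records: (participant_a, participant_b, duration_minutes).
contacts = [
    ("p1", "p2", 30.0),
    ("p1", "p3", 90.0),
    ("p2", "p3", 10.0),
]


def log_time_degree(records):
    """Natural log of each participant's total contact time.

    Assumption: "time degree" is the sum of contact durations per participant.
    """
    total = defaultdict(float)
    for a, b, duration in records:
        # Each contact counts toward both participants' totals.
        total[a] += duration
        total[b] += duration
    return {person: math.log(minutes) for person, minutes in total.items()}


scores = log_time_degree(contacts)
for person, score in sorted(scores.items()):
    print(f"{person}: log time degree = {score:.3f}")
```

On a logarithmic scale, doubling a participant's contact time adds a constant (log 2) to the score rather than doubling it, which matches the abstract's statement that exponentially more contact time corresponds to a linear increase in infection probability.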

  • Identification of methicillin-resistant Staphylococcus aureus within the Nation's Veterans Affairs Medical Centers using natural language processing
    Background: Accurate information is needed to direct healthcare systems' efforts to control methicillin-resistant Staphylococcus aureus (MRSA). Assembling complete and correct microbiology data is vital to understanding and addressing the multiple drug-resistant organisms in our hospitals.
    Methods: Herein, we describe a system that securely gathers microbiology data from the Department of Veterans Affairs (VA) network of databases. Using natural language processing methods, we applied an information extraction process to extract organisms and susceptibilities from the free-text data. We then validated the extraction against independently derived electronic data and expert annotation.
    Results: We estimate that the collected microbiology data are 98.5% complete and that methicillin-resistant Staphylococcus aureus was extracted accurately 99.7% of the time.
    Conclusions: Applying natural language processing methods to microbiology records appears to be a promising way to extract accurate and useful nosocomial pathogen surveillance data. Both scientific inquiry and the data's reliability will be dependent on the surveillance system's capability to compare from multiple sources and circumvent systematic error. The dataset constructed and methods used for this investigation could contribute to a comprehensive infectious disease surveillance system or other pressing needs.

  • Studying the potential impact of automated document classification on scheduling a systematic review update
    Background: Systematic Reviews (SRs) are an essential part of evidence-based medicine, providing support for clinical practice and policy on a wide range of medical topics. However, producing SRs is resource-intensive, and progress in the research they review leads to SRs becoming outdated, requiring updates. Although the question of how and when to update SRs has been studied, the best method for determining when to update is still unclear, necessitating further research.
    Methods: In this work we study the potential impact of a machine learning-based automated system for providing alerts when new publications become available within an SR topic. Some of these new publications are especially important, as they report findings that are more likely to initiate a review update. To this end, we have designed a classification algorithm to identify articles that are likely to be included in an SR update, along with an annotation scheme designed to identify the most important publications in a topic area. Using an SR database containing over 70,000 articles, we annotated articles from 9 topics that had received an update during the study period. The algorithm was then evaluated in terms of the overall correct and incorrect alert rate for publications meeting the topic inclusion criteria, as well as in terms of its ability to identify important, update-motivating publications in a topic area.
    Results: Our initial approach, based on our previous work in topic-specific SR publication classification, identifies over 70% of the most important new publications, while maintaining a low overall alert rate.
    Conclusions: We performed an initial analysis of the opportunities and challenges in aiding the SR update planning process with an informatics-based machine learning approach. Alerts could be a useful tool in the planning, scheduling, and allocation of resources for SR updates, providing an improvement in timeliness and coverage for the large number of medical topics needing SRs. While the performance of this initial method is not perfect, it could be a useful supplement to current approaches to scheduling an SR update. Approaches specifically targeting the types of important publications identified by this work are likely to improve results.

