Drug-induced Adverse Events

BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations.
Database (Oxford). 2016;2016:
Authors: Lee K, Lee S, Park S, Kim S, Kim S, Choi K, Tan AC, Kang J
Abstract
Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature. Many researchers focus on creating an improved automated biomedical natural language processing (BioNLP) method that extracts useful variants and their functional information from the literature. However, there is no gold-standard data set that contains texts annotated with variants and their related functions. To overcome these limitations, we introduce a Biomedical entity Relation ONcology COrpus (BRONCO) that contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research. The variants and their relations were manually extracted from 108 full-text articles. BRONCO can be utilized to evaluate and train new methods used for extracting biomedical entity relations from full-text publications, and thus be a valuable resource to the biomedical text mining research community. Using BRONCO, we quantitatively and qualitatively evaluated the performance of three state-of-the-art BioNLP methods. We also identified their shortcomings, and suggested remedies for each method. We implemented post-processing modules for the three BioNLP methods, which improved their performance.
Database URL: http://infos.korea.ac.kr/bronco
PMID: 27074804 [PubMed - indexed for MEDLINE]
Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall.
Database (Oxford). 2016;2016:
Authors: Lowe DM, O'Boyle NM, Sayle RA
Abstract
Awareness of the adverse effects of chemicals is important in biomedical research and healthcare. Text mining can allow timely and low-cost extraction of this knowledge from the biomedical literature. We extended our text mining solution, LeadMine, to identify diseases and chemical-induced disease relationships (CIDs). LeadMine is a dictionary/grammar-based entity recognizer and was used to recognize and normalize both chemicals and diseases to Medical Subject Headings (MeSH) IDs. The disease lexicon was obtained from three sources: MeSH, the Disease Ontology and Wikipedia. The Wikipedia dictionary was derived from pages with a disease/symptom box, or those where the page title appeared in the lexicon. Composite entities (e.g. heart and lung disease) were detected and mapped to their composite MeSH IDs. For CIDs, we developed a simple pattern-based system to find relationships within the same sentence. Our system was evaluated in the BioCreative V Chemical-Disease Relation task and achieved very good results for both disease concept ID recognition (F1-score: 86.12%) and CIDs (F1-score: 52.20%) on the test set. As our system was over an order of magnitude faster than other solutions evaluated on the task, we were able to apply the same system to the entirety of MEDLINE allowing us to extract a collection of over 250 000 distinct CIDs.
PMID: 27060160 [PubMed - indexed for MEDLINE]
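The same-sentence, pattern-based idea for chemical-induced disease relationships described in the abstract above can be illustrated with a minimal Python sketch. The toy lexicons, the trigger pattern, and the function name `extract_cids` are hypothetical stand-ins chosen for illustration; they are not LeadMine's dictionaries, grammars, or API.

```python
import re

# Hypothetical toy lexicons; a real system would use large dictionaries
# and normalize every mention to a MeSH ID.
CHEMICALS = {"cisplatin", "lithium"}
DISEASES = {"nephrotoxicity", "tremor"}

# Assumed rule: a causal trigger verb must appear somewhere in the sentence.
TRIGGER = re.compile(r"\b(induced|induces|caused|causes)\b", re.IGNORECASE)


def extract_cids(text):
    """Return sorted (chemical, disease) pairs that share a trigger sentence."""
    pairs = set()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not TRIGGER.search(sentence):
            continue
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for chemical in CHEMICALS & tokens:
            for disease in DISEASES & tokens:
                pairs.add((chemical, disease))
    return sorted(pairs)
```

Restricting matches to a single sentence keeps the rule cheap to evaluate, which is plausibly one reason an approach of this kind can be run over a very large corpus; relations stated across sentence boundaries are missed by design.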
Natural language processing to ascertain two key variables from operative reports in ophthalmology.
Pharmacoepidemiol Drug Saf. 2017 Jan 03;:
Authors: Liu L, Shorstein NH, Amsden LB, Herrinton LJ
Abstract
PURPOSE: Antibiotic prophylaxis is critical to ophthalmology and other surgical specialties. We performed natural language processing (NLP) of 743 838 operative notes recorded for 315 246 surgeries to ascertain two variables needed to study the comparative effectiveness of antibiotic prophylaxis in cataract surgery. The first key variable was an exposure variable, intracameral antibiotic injection. The second was an intraoperative complication, posterior capsular rupture (PCR), which functioned as a potential confounder. To help other researchers use NLP in their settings, we describe our NLP protocol and lessons learned.
METHODS: For each of the two variables, we used SAS Text Miner and other SAS text-processing modules with a training set of 10 000 (1.3%) operative notes to develop a lexicon. The lexica identified misspellings, abbreviations, and negations, and linked words into concepts (e.g. "antibiotic" linked with "injection"). We confirmed the NLP tools by iteratively obtaining random samples of 2000 (0.3%) notes, with replacement.
RESULTS: The NLP tools identified approximately 60 000 intracameral antibiotic injections and 3500 cases of PCR. The positive and negative predictive values for intracameral antibiotic injection exceeded 99%. For the intraoperative complication, they exceeded 94%.
CONCLUSION: NLP was a valid and feasible method for obtaining critical variables needed for a research study of surgical safety. These NLP tools were intended for use in the study sample. Use with external datasets or future datasets in our own setting would require further testing. Copyright © 2017 John Wiley & Sons, Ltd.
PMID: 28052483 [PubMed - as supplied by publisher]
An Evolving Ecosystem for Natural Language Processing in Department of Veterans Affairs.
J Med Syst. 2017 Feb;41(2):32
Authors: Garvin JH, Kalsy M, Brandt C, Luther SL, Divita G, Coronado G, Redd D, Christensen C, Hill B, Kelly N, Treitler QZ
Abstract
In an ideal clinical Natural Language Processing (NLP) ecosystem, researchers and developers would be able to collaborate with others, undertake validation of NLP systems, components, and related resources, and disseminate them. We captured requirements and formative evaluation data from the Veterans Affairs (VA) Clinical NLP Ecosystem stakeholders using semi-structured interviews and meeting discussions. We developed a coding rubric to code interviews. We assessed inter-coder reliability using percent agreement and the kappa statistic. We undertook 15 interviews and held two workshop discussions. The main areas of requirements related to design and functionality, resources, and information. Stakeholders also confirmed the vision of the second generation of the Ecosystem, and recommendations included adding mechanisms to better understand terms, measuring collaboration to demonstrate value, and datasets/tools to navigate spelling errors with consumer language, among others. Stakeholders also recommended capability to communicate with developers working on the next version of the VA electronic health record (VistA Evolution), to provide a mechanism to automatically monitor download of tools, and to automatically provide a summary of the downloads to Ecosystem contributors and funders. After three rounds of coding and discussion, we determined the percent agreement of two coders to be 97.2% and the kappa to be 0.7851. The vision of the VA Clinical NLP Ecosystem met stakeholder needs. Interviews and discussion provided key requirements that inform the design of the VA Clinical NLP Ecosystem.
PMID: 28050745 [PubMed - in process]
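The two inter-coder reliability measures named in the abstract above, percent agreement and the kappa statistic, can be computed as in the following minimal Python sketch. It assumes two coders and the standard Cohen's kappa formula; the abstract does not say which kappa variant was used, and the function names and example labels are illustrative only.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Fraction of items on which the two coders assigned the same code."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: both coders pick the same label independently.
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up labels for four coded segments.
a = ["x", "x", "y", "y"]
b = ["x", "x", "y", "x"]
print(percent_agreement(a, b))  # 0.75
print(cohens_kappa(a, b))       # 0.5
```

Because kappa subtracts the agreement expected by chance, it can sit well below percent agreement when one code dominates, which is consistent with a study reporting both figures.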
Characterization of Change and Significance for Clinical Findings in Radiology Reports Through Natural Language Processing.
J Digit Imaging. 2017 Jan 03;:
Authors: Hassanpour S, Bay G, Langlotz CP
Abstract
We built a natural language processing (NLP) method to automatically extract clinical findings in radiology reports and characterize their level of change and significance according to a radiology-specific information model. We utilized a combination of machine learning and rule-based approaches for this purpose. Our method is unique in capturing different features and levels of abstractions at surface, entity, and discourse levels in text analysis. This combination has enabled us to recognize the underlying semantics of radiology report narratives for this task. We evaluated our method on radiology reports from four major healthcare organizations. Our evaluation showed the efficacy of our method in highlighting important changes (accuracy 99.2%, precision 96.3%, recall 93.5%, and F1 score 94.7%) and identifying significant observations (accuracy 75.8%, precision 75.2%, recall 75.7%, and F1 score 75.3%) to characterize radiology reports. This method can help clinicians quickly understand the key observations in radiology reports and facilitate clinical decision support, review prioritization, and disease surveillance.
PMID: 28050714 [PubMed - as supplied by publisher]
Classification of clinically useful sentences in clinical evidence resources.
J Biomed Inform. 2016 Apr;60:14-22
Authors: Morid MA, Fiszman M, Raja K, Jonnalagadda SR, Del Fiol G
Abstract
UNLABELLED: Most patient care questions raised by clinicians can be answered by online clinical knowledge resources. However, important barriers still challenge the use of these resources at the point of care.
OBJECTIVE: To design and assess a method for extracting clinically useful sentences from synthesized online clinical resources that represent the most clinically useful information for directly answering clinicians' information needs.
MATERIALS AND METHODS: We developed a Kernel-based Bayesian Network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient population identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure.
RESULTS: The feature-rich approach yielded an F-measure of 74% versus 37% for a feature co-occurrence method (p<0.001). Excluding predication, population, semantic concept or text-based features reduced the F-measure to 62%, 66%, 58% and 69% respectively (p<0.01). The classifier applied to Medline sentences reached an F-measure of 73%, which is equivalent to the performance of the classifier on UpToDate sentences (p=0.62).
CONCLUSIONS: The feature-rich approach significantly outperformed general baseline methods. This approach significantly outperformed classifiers based on a single type of feature. Different types of semantic features provided a unique contribution to overall classification performance. The classifier's model and features used for UpToDate generalized well to Medline abstracts.
PMID: 26774763 [PubMed - indexed for MEDLINE]
Investigation of the cutaneous penetration behavior of dexamethasone loaded to nano-sized lipid particles by EPR spectroscopy, and confocal Raman and laser scanning microscopy.
Eur J Pharm Biopharm. 2016 Dec 30;:
Authors: Lohan SB, Saeidpour S, Solik A, Schanzer S, Richter H, Dong P, Darvin ME, Bodmeier R, Patzelt A, Zoubari G, Unbehauen M, Haag R, Lademann J, Teutloff C, Bittl R, Meinke MC
Abstract
An improvement of the penetration efficiency combined with the controlled release of actives in the skin can facilitate the medical treatment of skin diseases immensely. Dexamethasone (Dx), a synthetic glucocorticoid, is frequently used for the treatment of inflammatory skin diseases. To investigate the penetration of nano-sized lipid particles (NLP) loaded with Dx in comparison to a commercially available base cream, different techniques were applied. Electron paramagnetic resonance (EPR) spectroscopy was used to monitor the penetration of Dx, which was covalently labeled with the spin probe 3-(Carboxy)-2,2,5,5-tetramethyl-1-pyrrolidinyloxy (PCA). The penetration into hair follicles was studied using confocal laser scanning microscopy (CLSM) with curcumin-loaded NLP. The penetration of the vehicle was followed by confocal Raman microscopy (CRM). Penetration studies using excised porcine skin revealed a more than twofold higher penetration efficiency for DxPCA into the stratum corneum (SC) after 24 h incubation compared to 4 h incubation when loaded to the NLP, whereas when applied in the base cream, almost no further penetration was observed beyond 4 h. The distribution of DxPCA within the SC was investigated by consecutive tape stripping. The release of DxPCA from the base cream after 24 h in deeper SC layers and the viable epidermis was shown by EPR. For NLP, no release from the carrier was observed, although DxPCA was detectable in the skin after the complete SC was removed. This phenomenon can be explained by the penetration of the NLP into the hair follicles. However, penetration profiles measured by CRM indicate that NLP did not penetrate as deep into the SC as the base cream formulation. In conclusion, NLP can improve the accumulation of Dx in the skin and provide a reservoir within the SC and in the follicular infundibula.
PMID: 28043865 [PubMed - as supplied by publisher]
Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.
Biomed Res Int. 2016;2016:6598307
Authors: Wang Y, Li R, Zhou Y, Ling Z, Guo X, Xie L, Liu L
Abstract
BACKGROUND: Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty.
RESULTS: Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods.
CONCLUSIONS: We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.
PMID: 27057545 [PubMed - indexed for MEDLINE]
Enhancing Risk Assessment in Patients Receiving Chronic Opioid Analgesic Therapy Using Natural Language Processing.
Pain Med. 2016 Dec 29;:
Authors: Haller IV, Renier CM, Juusola M, Hitz P, Steffen W, Asmus MJ, Craig T, Mardekian J, Masters ET, Elliott TE
Abstract
OBJECTIVES: Clinical guidelines for the use of opioids in chronic noncancer pain recommend assessing risk for aberrant drug-related behaviors prior to initiating opioid therapy. Despite recent dramatic increases in prescription opioid misuse and abuse, screening tools continue to be underutilized by clinicians. This research evaluated natural language processing (NLP) together with other data extraction techniques for risk assessment of patients considered for opioid therapy as a means of predicting opioid abuse.
DESIGN: Using a retrospective cohort of 3,668 chronic noncancer pain patients with at least one opioid agreement between January 1, 2007, and December 31, 2012, we examined the availability of electronic health record structured and unstructured data to populate the Opioid Risk Tool (ORT) and other selected outcomes. Clinician-documented opioid agreement violations in the clinical notes were determined using NLP techniques followed by manual review of the notes.
RESULTS: Confirmed through manual review, the NLP algorithm had 96.1% sensitivity, 92.8% specificity, and 92.6% positive predictive value in identifying opioid agreement violation. At the time of most recent opioid agreement, automated ORT identified 42.8% of patients as at low risk, 28.2% as at moderate risk, and 29.0% as at high risk for opioid abuse. During a year following the agreement, 22.5% of patients had opioid agreement violations. Patients classified as high risk were three times more likely to violate opioid agreements compared with those with low/moderate risk.
CONCLUSION: Our findings suggest that NLP techniques have potential utility to support clinicians in screening chronic noncancer pain patients considered for long-term opioid therapy.
PMID: 28034982 [PubMed - as supplied by publisher]
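Sensitivity, specificity, and positive predictive value, as reported in the abstract above, all derive from the four cells of a confusion matrix built by comparing NLP output with manual review. The Python sketch below shows the standard definitions; the function name and the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard validity measures from confusion-matrix counts.

    tp/fp: notes flagged by NLP that manual review confirmed / rejected.
    tn/fn: notes not flagged that review found negative / positive.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true violations detected
        "specificity": tn / (tn + fp),  # share of non-violations left unflagged
        "ppv": tp / (tp + fp),          # share of flagged notes that are real
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=90, fp=10, tn=80, fn=10)
print(metrics["sensitivity"], metrics["ppv"])  # 0.9 0.9
```

Unlike sensitivity and specificity, positive predictive value depends on how common the condition is in the reviewed sample, so it does not transfer directly to a population with a different violation rate.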
Temporal data representation, normalization, extraction, and reasoning: A review from clinical domain.
Comput Methods Programs Biomed. 2016 May;128:52-68
Authors: Madkour M, Benhaddou D, Tao C
Abstract
BACKGROUND AND OBJECTIVE: We live our lives by the calendar and the clock, but time is also an abstraction, even an illusion. The sense of time can be both domain-specific and complex, and is often left implicit, requiring significant domain knowledge to accurately recognize and harness. In the clinical domain, the momentum gained from recent advances in infrastructure and governance practices has enabled the collection of a tremendous amount of data at each moment in time. Electronic health records (EHRs) have paved the way to making these data available for practitioners and researchers. However, temporal data representation, normalization, extraction and reasoning are very important in order to mine such massive data and therefore for constructing the clinical timeline. The objective of this work is to provide an overview of the problem of constructing a timeline at the clinical point of care and to summarize the state-of-the-art in processing temporal information of clinical narratives.
METHODS: This review surveys the methods used in three important areas: modeling and representing of time, medical NLP methods for extracting time, and methods of time reasoning and processing. The review emphasizes the existing gap between present methods and semantic web technologies and considers the possible combinations.
RESULTS: The main findings of this review reveal the importance of time processing not only in constructing timelines and clinical decision support systems but also as a vital component of EHR data models and operations.
CONCLUSIONS: Extracting temporal information in clinical narratives is a challenging task. The inclusion of ontologies and semantic web will lead to better assessment of the annotation task and, together with medical NLP techniques, will help resolve granularity and co-reference resolution problems.
PMID: 27040831 [PubMed - indexed for MEDLINE]
Blending water- and nutrient-source wastewaters for cost-effective cultivation of high lipid content microalgal species Micractinium inermum NLP-F014.
Bioresour Technol. 2015 Dec;198:388-94
Authors: Park S, Kim J, Yoon Y, Park Y, Lee T
Abstract
The possibility of utilizing blended wastewaters from different streams was investigated for cost-efficient microalgal cultivation. The influent of a domestic wastewater treatment plant and the liquid fertilizer from a swine wastewater treatment plant were selected as water- and nutrient-source wastewaters, respectively. The growth of Micractinium inermum NLP-F014 in the blended wastewater medium without any pretreatment was comparable to that in Bold's Basal Medium. The optimum blending ratio of 5-15% (v/v) facilitated biomass production up to 5.7 g dry cell weight (DCW) L⁻¹, and the maximum biomass productivity (1.03 g DCW L⁻¹ d⁻¹) was achieved after three days of cultivation. Nutrient depletion induced lipid accumulation in the cell up to 39.1% (w/w) and the maximum lipid productivity was 0.19 g FAME L⁻¹ d⁻¹. These results suggest that blending water- and nutrient-source wastewaters at a proper ratio without pretreatment can significantly cut costs in microalgae cultivation for biodiesel production.
PMID: 26409109 [PubMed - indexed for MEDLINE]
Public Understanding of Science in turbulent times III: Deficit to dialogue, champions to critics.
Public Underst Sci. 2016 Feb;25(2):186-97
Authors: Smallman M
Abstract
As part of the 20th Anniversary of the Public Understanding of Science journal, the journal has been reflecting on how the field and journal have developed. This research note takes a closer look at some of the trends, considering the journal's 50 most cited papers and using IRaMuTeQ, an open-source computer text analysis technique. The research note presents data that show that the move within public engagement from deficit to dialogue has been followed by a further shift from championing dialogue to criticising its practice. This shift has taken place alongside a continued, but changing, interest in media coverage, surveys and models of public understanding.
PMID: 25234052 [PubMed - indexed for MEDLINE]
Detection of clinically important colorectal surgical site infection using Bayesian network.
J Surg Res. 2016 Oct 05;209:168-173
Authors: Sohn S, Larson DW, Habermann EB, Naessens JM, Alabbad JY, Liu H
Abstract
BACKGROUND: Despite extensive efforts to monitor and prevent surgical site infections (SSIs), real-time surveillance of clinical practice has been sparse and expensive or nonexistent. However, natural language processing (NLP) and machine learning (i.e., Bayesian network analysis) may provide the methodology necessary to approach this issue in a new way. We investigated the ability to identify SSIs after colorectal surgery (CRS) through an automated detection system using a Bayesian network.
MATERIALS AND METHODS: Patients who underwent CRS from 2010 to 2012 and were captured in our institutional American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) comprised our cohort. A Bayesian network was applied to detect SSIs using risk factors from ACS-NSQIP data and keywords extracted from clinical notes by NLP. Two surgeons provided expertise informing the Bayesian network to identify clinically meaningful SSIs (CM-SSIs) occurring within 30 d after surgery.
RESULTS: We used data from 751 CRS cases experiencing 67 (8.9%) SSIs and 78 (10.4%) CM-SSIs. Our Bayesian network detected ACS-NSQIP-captured SSIs with a receiver operating characteristic area under the curve of 0.827, but this value increased to 0.892 when using surgeon-identified CM-SSIs.
CONCLUSIONS: A Bayesian network coupled with NLP has the potential to be used in real-time SSI surveillance. Moreover, surgeons identified CM-SSI not captured under current NSQIP definitions. Future efforts to expand CM-SSI identification may lead to improved and potentially automated approaches to survey for postoperative SSI in clinical practice.
PMID: 28032554 [PubMed - as supplied by publisher]
Seqenv: linking sequences to environments through text mining.
PeerJ. 2016;4:e2690
Authors: Sinclair L, Ijaz UZ, Jensen LJ, Coolen MJ, Gubry-Rangin C, Chroňáková A, Oulas A, Pavloudi C, Schnetzer J, Weimann A, Ijaz A, Eiler A, Quince C, Pafilis E
Abstract
Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts (if it is available) the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
PMID: 28028456 [PubMed]
Machine learning to assist risk-of-bias assessments in systematic reviews.
Int J Epidemiol. 2016 Feb;45(1):266-77
Authors: Millard LA, Flach PA, Higgins JP
Abstract
BACKGROUND: Risk-of-bias assessments are now a standard component of systematic reviews. At present, reviewers need to manually identify relevant parts of research articles for a set of methodological elements that affect the risk of bias, in order to make a risk-of-bias judgement for each of these elements. We investigate the use of text mining methods to automate risk-of-bias assessments in systematic reviews. We aim to identify relevant sentences within the text of included articles, to rank articles by risk of bias and to reduce the number of risk-of-bias assessments that the reviewers need to perform by hand.
METHODS: We use supervised machine learning to train two types of models, for each of the three risk-of-bias properties of sequence generation, allocation concealment and blinding. The first model predicts whether a sentence in a research article contains relevant information. The second model predicts a risk-of-bias value for each research article. We use logistic regression, where each independent variable is the frequency of a word in a sentence or article, respectively.
RESULTS: We found that sentences can be successfully ranked by relevance with area under the receiver operating characteristic (ROC) curve (AUC) > 0.98. Articles can be ranked by risk of bias with AUC > 0.72. We estimate that more than 33% of articles can be assessed by just one reviewer, where two reviewers are normally required.
CONCLUSIONS: We show that text mining can be used to assist risk-of-bias assessments.
PMID: 26659355 [PubMed - indexed for MEDLINE]
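The ranking results in the abstract above are reported as area under the ROC curve (AUC). One standard way to read AUC is as the probability that a randomly chosen positive item is scored higher than a randomly chosen negative item. The Python sketch below computes it directly from that pairwise definition; the function name, scores, and labels are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up relevance scores for four sentences (1 = relevant).
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The double loop is quadratic in the number of items, which is fine for a sketch; rank-based formulations give the same value more efficiently on large sets.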
Scaling-up NLP Pipelines to Process Large Corpora of Clinical Notes.
Methods Inf Med. 2015;54(6):548-52
Authors: Divita G, Carter M, Redd A, Zeng Q, Gupta K, Trautner B, Samore M, Gundlapalli A
Abstract
INTRODUCTION: This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare".
OBJECTIVES: This paper describes the scale-up efforts at the VA Salt Lake City Health Care System to address processing large corpora of clinical notes through a natural language processing (NLP) pipeline. The use case described is a current project focused on detecting the presence of an indwelling urinary catheter in hospitalized patients and subsequent catheter-associated urinary tract infections.
METHODS: An NLP algorithm using v3NLP was developed to detect the presence of an indwelling urinary catheter in hospitalized patients. The algorithm was tested on a small corpus of notes on patients for whom the presence or absence of a catheter was already known (reference standard). In planning for a scale-up, we estimated that the original algorithm would have taken 2.4 days to run on a larger corpus of notes for this project (550,000 notes), and 27 days for a corpus of 6 million records representative of a national sample of notes. We approached scaling-up NLP pipelines through three techniques: pipeline replication via multi-threading, intra-annotator threading for tasks that can be further decomposed, and remote annotator services which enable annotator scale-out.
RESULTS: The scale-up resulted in reducing the average time to process a record from 206 milliseconds to 17 milliseconds, or a 12-fold increase in performance, when applied to a corpus of 550,000 notes.
CONCLUSIONS: Purposely simplistic in nature, these scale-up efforts are the straightforward evolution from small scale NLP processing to larger scale extraction without incurring associated complexities that are inherited by the use of the underlying UIMA framework. These efforts represent generalizable and widely applicable techniques that will aid other computationally complex NLP pipelines that need to be scaled out for processing and analyzing big data.
PMID: 26534722 [PubMed - indexed for MEDLINE]
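The first technique named in the abstract above, pipeline replication via multi-threading, can be sketched in a few lines of Python, together with the arithmetic behind the reported 12-fold figure. This is an illustrative sketch only: `annotate` is a hypothetical stand-in annotator, and nothing here reflects v3NLP's or UIMA's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate(note):
    """Hypothetical stand-in annotator: flag notes that mention a catheter."""
    return {"text": note, "catheter": "catheter" in note.lower()}


def run_pipeline(notes, workers=4):
    """Run the same annotator on several worker threads at once.

    Executor.map returns results in the order of the input notes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(annotate, notes))


def speedup(before_ms, after_ms):
    """Ratio of per-record processing times before and after scale-up."""
    return before_ms / after_ms


results = run_pipeline(["Foley catheter in place", "No devices noted"])
print([r["catheter"] for r in results])  # [True, False]
print(round(speedup(206, 17), 1))        # 12.1
```

In CPython, threads mainly help when the annotator waits on I/O or calls code that releases the interpreter lock; for CPU-bound annotators a process pool would be the closer analogue of the replication described.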
Understanding the Relationship between Social Cognition and Word Difficulty. A Language Based Analysis of Individuals with Autism Spectrum Disorder.
Methods Inf Med. 2015;54(6):522-9
Authors: Aramaki E, Shikata S, Miyabe M, Usuda Y, Asada K, Ayaya S, Kumagaya S
Abstract
BACKGROUND: Few quantitative studies have been conducted on the relationship between society and its languages. Individuals with autistic spectrum disorder (ASD) are known to experience social hardships, and a wide range of clinical information about their quality of life has been provided through numerous narrative analyses. However, the narratives of ASD patients have thus far been examined mainly through qualitative approaches.
OBJECTIVES: In this study, we analyzed adults with ASD to quantitatively examine the relationship between language abilities and ASD severity scores.
METHODS: We generated phonetic transcriptions of speeches by 16 ASD adults at an ASD workshop, and divided the participants into 2 groups according to their Social Responsiveness Scale(TM), 2nd Edition (SRS(TM)-2) scores (where higher scores represent more severe ASD): Group A comprised high-scoring ASD adults (SRS(TM)-2 score: ≥ 76) and Group B comprised low- and intermediate-scoring ASD adults (SRS(TM)-2 score: < 76). Using natural language processing (NLP)-based analytical methods, the narratives were converted into numerical data according to four language ability indicators, and the relationships between the language ability scores and ASD severity scores were compared.
RESULTS AND DISCUSSION: Group A showed a marginally negative correlation with the level of Japanese word difficulty (p < .10), while the "social cognition" subscale of the SRS(TM)-2 score showed a significantly negative correlation (p < .05) with word difficulty. When comparing only male participants, Group A demonstrated a significantly lower correlation with word difficulty level than Group B (p < .10).
CONCLUSION: Social communication was found to be strongly associated with the level of word difficulty in speech. The clinical applications of these findings may be available in the near future, and there is a need for further detailed study on language metrics designed for ASD adults.
PMID: 26391807 [PubMed - indexed for MEDLINE]
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges.
Database (Oxford). 2016;2016:
Authors: Singhal A, Leaman R, Catlett N, Lemberger T, McEntyre J, Polson S, Xenarios I, Arighi C, Lu Z
Abstract
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large-scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions, including (i) the 'scalability' issue arising from the increasing need to mine information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue of applying trained systems to text genres not seen during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.
PMID: 28025348 [PubMed - in process]
Direct transcriptional activation of BT genes by NLP transcription factors is a key component of the nitrate response in Arabidopsis.
Biochem Biophys Res Commun. 2016 Dec 23;:
Authors: Sato T, Maekawa S, Konishi M, Yoshioka N, Sasaki Y, Maeda H, Ishida T, Kato Y, Yamaguchi J, Yanagisawa S
Abstract
Nitrate modulates growth and development, functioning as a nutrient signal in plants. Although many changes in physiological processes in response to nitrate have been well characterized as nitrate responses, the molecular mechanisms underlying the nitrate response are not yet fully understood. Here, we show that NLP transcription factors, which are key regulators of the nitrate response, directly activate the nitrate-inducible expression of BT1 and BT2 encoding putative scaffold proteins with a plant-specific domain structure in Arabidopsis. Interestingly, the 35S promoter-driven expression of BT2 partially rescued growth inhibition caused by reductions in NLP activity in Arabidopsis. Furthermore, simultaneous disruption of BT1 and BT2 affected nitrate-dependent lateral root development. These results suggest that direct activation of BT1 and BT2 by NLP transcriptional activators is a key component of the molecular mechanism underlying the nitrate response in Arabidopsis.
PMID: 28025145 [PubMed - as supplied by publisher]
[Pharmacists' Behavior in Clinical Practice: Results from a Questionnaire Survey of Pharmacy Students].
Yakugaku Zasshi. 2016;136(2):351-8
Authors: Nakada A, Akagawa K, Yamamoto H, Kato Y, Yamamoto T
Abstract
A questionnaire survey was performed to obtain pharmacy students' impressions of pharmacists' behavior, to classify these based on professionalism, and to analyze the relationship between these experiences and students' satisfaction with their clinical practice in Japan. The questionnaire was answered by 327 5th-year pharmacy school students upon completing clinical practice at community pharmacies from 2011 to 2012. They rated their satisfaction with their clinical practice using a 6-point Likert scale, and provided descriptions of their experience such as, "This health provider is professional", or "What a great person he/she is as a health provider". We counted the words and then categorized the responses into 10 traits, as defined by the American Pharmaceutical Association Academy of Students of Pharmacy-American Association of Colleges of Pharmacy, Council of Deans Task Force on Professionalism 1999, using text mining. We analyzed the relationship between their experiences with respectful persons, and satisfaction, using the Mann-Whitney U-test (significance level<0.05). Most students (337 of 364, 92.6%) reported experiences with respectful health providers. These students experienced significantly more satisfaction than did other students (p<0.001). We analyzed 343 sentences written by 261 students, using text mining analysis after excluding unsuitable responses. The word most used was "patient" (121 times). Many students noted their impression that the pharmacists had answered patients' questions. Of the 10 trait categories, "professional knowledge and skills" was mentioned most often (151 students).
PMID: 26831812 [PubMed - indexed for MEDLINE]