Drug-induced Adverse Events

Community challenges in biomedical text mining over 10 years: success, failure and the future.

Tue, 2016-11-01 07:43

Brief Bioinform. 2016 Jan;17(1):132-44

Authors: Huang CC, Lu Z

Abstract
One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations.

PMID: 25935162 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Literature mining, gene-set enrichment and pathway analysis for target identification in Behçet's disease.

Sun, 2016-10-30 07:20

Clin Exp Rheumatol. 2016 Sep-Oct;34 Suppl 102(6):101-110

Authors: Wilson P, Larminie C, Smith R

Abstract
OBJECTIVES: To use literature mining to catalogue Behçet's associated genes, and advanced computational methods to improve the understanding of the pathways and signalling mechanisms that lead to the typical clinical characteristics of Behçet's patients. To extend this technique to identify potential treatment targets for further experimental validation.
METHODS: Text mining methods combined with gene enrichment tools, pathway analysis and causal analysis algorithms.
RESULTS: This approach identified 247 human genes associated with Behçet's disease and the resulting disease map, comprising 644 nodes and 19220 edges, captured important details of the relationships between these genes and their associated pathways, as described in diverse data repositories. Pathway analysis has identified how Behçet's associated genes are likely to participate in innate and adaptive immune responses. Causal analysis algorithms have identified a number of potential therapeutic strategies for further investigation.
CONCLUSIONS: Computational methods have captured pertinent features of the prominent disease characteristics presented in Behçet's disease and have highlighted NOD2, ICOS and IL18 signalling as potential therapeutic strategies.

PMID: 27791955 [PubMed - in process]

CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database.

Sun, 2016-10-30 07:20

Nucleic Acids Res. 2016 Oct 26;:

Authors: Jia B, Raphenya AR, Alcock B, Waglechner N, Guo P, Tsang KK, Lago BA, Dave BM, Pereira S, Sharma AN, Doshi S, Courtot M, Lo R, Williams LE, Frye JG, Elsayegh T, Sardar D, Westman EL, Pawlowski AC, Johnson TA, Brinkman FS, Wright GD, McArthur AG

Abstract
The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis.

PMID: 27789705 [PubMed - as supplied by publisher]

DrugCentral: online drug compendium.

Sun, 2016-10-30 07:20

Nucleic Acids Res. 2016 Oct 26;:

Authors: Ursu O, Holmes J, Knockel J, Bologa CG, Yang JJ, Mathias SL, Nelson SJ, Oprea TI

Abstract
DrugCentral (http://drugcentral.org) is an open-access online drug compendium. DrugCentral integrates structure, bioactivity, regulatory, pharmacologic actions and indications for active pharmaceutical ingredients approved by FDA and other regulatory agencies. Monitoring of regulatory agencies for new drug approvals ensures the resource is up-to-date. DrugCentral integrates content for active ingredients with pharmaceutical formulations, indexing drugs and drug label annotations, complementing similar resources available online. Its complementarity with other online resources is facilitated by cross referencing to external resources. At the molecular level, DrugCentral bridges drug-target interactions with pharmacological action and indications. The integration with FDA drug labels enables text mining applications for drug adverse events and clinical trial information. Chemical structure overlap between DrugCentral and five online drug resources, and the overlap between DrugCentral FDA-approved drugs and their presence in four different chemical collections, are discussed. DrugCentral can be accessed via the web application or downloaded in relational database format.

PMID: 27789690 [PubMed - as supplied by publisher]

Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

Thu, 2016-10-27 06:37

J Biomed Semantics. 2016;7:22

Authors: Han X, Kim JJ, Kwoh CK

Abstract
BACKGROUND: Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to the extension of the mining targets is the cost of manual construction of labeled data, which are required for state-of-the-art supervised learning systems. Active learning chooses the most informative documents for supervised learning in order to reduce the amount of manual annotation required. Previous work on active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types. It also did not consider the evidence of event participants, which might be a clue for the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents in terms of informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems.
METHODS: Our method is based on a committee of two systems as follows: We first employ an event extraction system to filter potential false negatives among unlabeled documents, from which the system does not extract any event. We then develop a statistical method to rank the potential false negatives of unlabeled documents 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method for the task of named entity recognition.
RESULTS AND CONCLUSION: We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than such previous methods as entropy and Gibbs error based methods and a conventional committee-based method. We also show that the incorporation of named entity recognition into the active learning for event extraction and the unknown word handling further improve the active learning method. In addition, the adaptation of the active learning method into named entity recognition tasks also improves the document selection for manual annotation of named entities.

PMID: 27127603 [PubMed - indexed for MEDLINE]

MicrO: an ontology of phenotypic and metabolic characters, assays, and culture media found in prokaryotic taxonomic descriptions.

Thu, 2016-10-27 06:37

J Biomed Semantics. 2016;7:18

Authors: Blank CE, Cui H, Moore LR, Walls RL

Abstract
BACKGROUND: MicrO is an ontology of microbiological terms, including prokaryotic qualities and processes, material entities (such as cell components), chemical entities (such as microbiological culture media and medium ingredients), and assays. The ontology was built to support the ongoing development of a natural language processing algorithm, MicroPIE (or, Microbial Phenomics Information Extractor). During the MicroPIE design process, we realized there was a need for a prokaryotic ontology which would capture the evolutionary diversity of phenotypes and metabolic processes across the tree of life, capture the diversity of synonyms and information contained in the taxonomic literature, and relate microbiological entities and processes to terms in a large number of other ontologies, most particularly the Gene Ontology (GO), the Phenotypic Quality Ontology (PATO), and the Chemical Entities of Biological Interest (ChEBI). We thus constructed MicrO to be rich in logical axioms and synonyms gathered from the taxonomic literature.
RESULTS: MicrO currently has ~14,550 classes (~2,550 of which are new, the remainder being microbiologically-relevant classes imported from other ontologies), connected by ~24,130 logical axioms (5,446 of which are new), and is available at http://purl.obolibrary.org/obo/MicrO.owl and on the project website at https://github.com/carrineblank/MicrO. MicrO has been integrated into the OBO Foundry Library (http://www.obofoundry.org/ontology/micro.html), so that other ontologies can borrow and re-use classes. Term requests and user feedback can be made using MicrO's Issue Tracker in GitHub. We designed MicrO such that it can support the ongoing and future development of algorithms that can leverage the controlled vocabulary and logical inference power provided by the ontology.
CONCLUSIONS: By connecting microbial classes with large numbers of chemical entities, material entities, biological processes, molecular functions, and qualities using a dense array of logical axioms, we intend MicrO to be a powerful new tool to increase the computing power of bioinformatics tools such as the automated text mining of prokaryotic taxonomic descriptions using natural language processing. We also intend MicrO to support the development of new bioinformatics tools that aim to develop new connections between microbial phenotypes and genotypes (i.e., the gene content in genomes). Future ontology development will include incorporation of pathogenic phenotypes and prokaryotic habitats.

PMID: 27076900 [PubMed - indexed for MEDLINE]

EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation.

Thu, 2016-10-27 06:37

Database (Oxford). 2016;2016:

Authors: Pafilis E, Buttigieg PL, Ferrell B, Pereira E, Schnetzer J, Arvanitidis C, Jensen LJ

Abstract
The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, manual sample annotation is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/.

PMID: 26896844 [PubMed - indexed for MEDLINE]

Classification of radiology reports for falls in an HIV study cohort.

Wed, 2016-10-26 06:17

J Am Med Inform Assoc. 2016 Apr;23(e1):e113-7

Authors: Bates J, Fodeh SJ, Brandt CA, Womack JA

Abstract
OBJECTIVE: To identify patients in a human immunodeficiency virus (HIV) study cohort who have fallen by applying supervised machine learning methods to radiology reports of the cohort.
METHODS: We used the Veterans Aging Cohort Study Virtual Cohort (VACS-VC), an electronic health record-based cohort of 146 530 veterans for whom radiology reports were available (N=2 977 739). We created a reference standard of radiology reports, represented each report by a feature set of words and Unified Medical Language System concepts, and then developed several support vector machine (SVM) classifiers for falls. We compared mutual information (MI) ranking and embedded feature selection approaches. The SVM classifier with MI feature selection was chosen to classify all radiology reports in VACS-VC.
RESULTS: Our SVM classifier with MI feature selection achieved an area under the curve score of 97.04 on the test set. When applied to all the radiology reports in VACS-VC, 80 416 of these reports were classified as positive for a fall. Of these, 11 484 were associated with a fall-related external cause of injury code (E-code) and 68 932 were not, corresponding to 29 280 patients with potential fall-related injuries who could not have been found using E-codes.
DISCUSSION: Feature selection was crucial to improving the classifier's performance. Feature selection with MI allowed us to select the number of discriminative features to use for classification, in contrast to the embedded feature selection method, in which the number of features is chosen automatically.
CONCLUSION: Machine learning is an effective method of identifying patients who have suffered a fall. The development of this classifier supplements the clinical researcher's toolkit and reduces dependence on under-coded structured electronic health record data.

PMID: 26567329 [PubMed - indexed for MEDLINE]
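
Illustrative note: the abstract above ranks features by mutual information (MI) and keeps a chosen number of them before training an SVM. The sketch below shows only the MI ranking step on a toy bag-of-words corpus. The reports, labels and function names are invented for illustration and are not taken from the study, which also used Unified Medical Language System concepts as features.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """MI in bits between two equally long sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def rank_features(docs, labels, k):
    """Return the k words whose presence shares the most information with the label."""
    vocab = sorted({word for doc in docs for word in doc})
    scored = [
        (mutual_information([int(word in doc) for doc in docs], labels), word)
        for word in vocab
    ]
    # Highest MI first; ties broken alphabetically so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


# Hypothetical toy reports (as word sets); label 1 = fall-related, 0 = not.
docs = [
    {"patient", "fell", "fracture"},
    {"patient", "fell", "wrist"},
    {"patient", "chest", "clear"},
    {"patient", "routine", "chest"},
]
labels = [1, 1, 0, 0]

top_features = rank_features(docs, labels, k=2)
```

Because the number of retained features `k` is an explicit argument, it can be tuned, which is the property the Discussion contrasts with embedded feature selection.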

Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task.

Tue, 2016-10-25 06:04

Database (Oxford). 2016;2016:

Authors: Wei CH, Peng Y, Leaman R, Davis AP, Mattingly CJ, Li J, Wiegers TC, Lu Z

Abstract
Manually curating chemicals, diseases and their relationships is highly important to biomedical research, but it is plagued by its high cost and the rapid growth of the biomedical literature. In recent years, there has been a growing interest in developing computational approaches for automatic chemical-disease relation (CDR) extraction. Despite these attempts, the lack of a comprehensive benchmarking dataset has limited the comparison of different techniques in order to assess and advance the current state-of-the-art. To this end, we organized a challenge task through BioCreative V to automatically extract CDRs from the literature. We designed two challenge tasks: disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. To assist system development and assessment, we created a large annotated text corpus that consisted of human annotations of chemicals, diseases and their interactions from 1500 PubMed articles. A total of 34 teams worldwide participated in the CDR task: 16 (DNER) and 18 (CID). The best systems achieved an F-score of 86.46% for the DNER task--a result that approaches the human inter-annotator agreement (0.8875)--and an F-score of 57.03% for the CID task, the highest results ever reported for such tasks. When combining team results via machine learning, the ensemble system was able to further improve over the best team results by achieving 88.89% and 62.80% in F-score for the DNER and CID task, respectively. Additionally, another novel aspect of our evaluation is to test each participating system's ability to return real-time results: the average response times for each team's DNER and CID web service systems were 5.6 and 9.3 s, respectively. Most teams used hybrid systems based on machine learning for their submissions. Given the level of participation and results, we found our task to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/.

PMID: 26994911 [PubMed - indexed for MEDLINE]
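
Illustrative note: the results above are reported as F-scores. As a reminder of how such a score is derived, the sketch below computes precision, recall and F1 from a set of predicted chemical-disease pairs and a gold-standard set. The identifiers are placeholders invented for this example, and the official task scoring may differ in details such as how entities are normalized.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 (harmonic mean), as fractions in [0, 1]."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (chemical, disease) pairs; not real annotations.
gold = {
    ("chem_A", "dis_X"),
    ("chem_A", "dis_Y"),
    ("chem_B", "dis_Z"),
    ("chem_C", "dis_X"),
}
predicted = {
    ("chem_A", "dis_X"),  # correct
    ("chem_B", "dis_Z"),  # correct
    ("chem_B", "dis_X"),  # spurious
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Multiplying the returned fractions by 100 gives percentages on the same scale as the figures quoted in the abstract.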

HerDing: herb recommendation system to treat diseases using genes and chemicals.

Tue, 2016-10-25 06:04

Database (Oxford). 2016;2016:

Authors: Choi W, Choi CH, Kim YR, Kim SJ, Na CS, Lee H

Abstract
In recent years, herbs have been researched for new drug candidates because they have a long empirical history of treating diseases and are relatively free from side effects. Studies to scientifically prove the medical efficacy of herbs for target diseases often spend a considerable amount of time and effort in choosing candidate herbs and in performing experiments to measure changes of marker genes after treatment with herbs. A computational approach to recommend herbs for treating diseases might be helpful to promote efficiency in the early stage of such studies. Although several databases related to traditional Chinese medicine have been already developed, there is not yet a specialized Web tool that recommends herbs to treat diseases based on disease-related genes. Therefore, we developed a novel search engine, HerDing, focused on retrieving candidate herb-related information with user search terms (a list of genes, a disease name, a chemical name or an herb name). HerDing was built by integrating public databases and by applying a text-mining method. The HerDing website is free and open to all users, and there is no login requirement. Database URL: http://combio.gist.ac.kr/herding.

PMID: 26980517 [PubMed - indexed for MEDLINE]

Application of dynamic topic models to toxicogenomics data.

Sat, 2016-10-22 08:22

BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):368

Authors: Lee M, Liu Z, Huang R, Tong W

Abstract
BACKGROUND: All biological processes are inherently dynamic. Biological systems evolve transiently or sustainably according to sequential time points after perturbation by environmental insults, drugs and chemicals. Investigating the temporal behavior of molecular events has been an important subject for understanding the underlying mechanisms governing the biological system in response to, for example, drug treatment. The intrinsic complexity of time series data requires appropriate computational algorithms for data interpretation. In this study, we propose, for the first time, the application of dynamic topic models (DTM) for analyzing time-series gene expression data.
RESULTS: A large time-series toxicogenomics dataset was studied. It contains over 3144 microarrays of gene expression data corresponding to rat livers treated with 131 compounds (most are drugs) at two doses (control and high dose) in a repeated schedule containing four separate time points (4-, 8-, 15- and 29-day). We analyzed, with DTM, the topics (consisting of a set of genes) and their biological interpretations over these four time points. We identified hidden patterns embedded in these time-series gene expression profiles. From the topic distribution for compound-time condition, a number of drugs were successfully clustered by their shared mode-of-action such as PPARα agonists and COX inhibitors. The biological meaning underlying each topic was interpreted using diverse sources of information such as functional analysis of the pathways and therapeutic uses of the drugs. Additionally, we found that sample clusters produced by DTM are much more coherent in terms of functional categories when compared to traditional clustering algorithms.
CONCLUSIONS: We demonstrated that DTM, a text mining technique, can be a powerful computational approach for clustering time-series gene expression profiles with the probabilistic representation of their dynamic features along sequential time frames. The method offers an alternative way for uncovering hidden patterns embedded in time series gene expression profiles to gain enhanced understanding of dynamic behavior of gene regulation in the biological system.

PMID: 27766956 [PubMed - in process]

Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts.

Sat, 2016-10-22 08:22

BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):350

Authors: Roy S, Curry BC, Madahian B, Homayouni R

Abstract
BACKGROUND: The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotation of miRNAs.
RESULTS: For approximately 900 human miRNAs indexed in miRBase, text documents were created by concatenating titles and abstracts of MEDLINE citations which refer to the miRNAs. The documents were parsed and a weighted term-by-miRNA frequency matrix was created, which was subsequently factorized via singular value decomposition to extract pair-wise cosine values between the term (keyword) and miRNA vectors in reduced rank semantic space. LSI enables derivation of both explicit and implicit associations between entities based on word usage patterns. Using miR2Disease as a gold standard, we found that LSI identified keyword-to-miRNA relationships with high accuracy. In addition, we demonstrate that pair-wise associations between miRNAs can be used to group them into categories which are functionally aligned. Finally, term ranking by querying the LSI space with a group of miRNAs enabled annotation of the clusters with functionally related terms.
CONCLUSIONS: LSI modeling of MEDLINE abstracts provides a robust and automated method for miRNA related knowledge discovery. The latest collection of miRNA abstracts and LSI model can be accessed through the web tool miRNA Literature Network (miRLiN) at http://bioinfo.memphis.edu/mirlin .

PMID: 27766940 [PubMed - in process]

Leveraging graph topology and semantic context for pharmacovigilance through twitter-streams.

Sat, 2016-10-22 08:22

BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):335

Authors: Eshleman R, Singh R

Abstract
BACKGROUND: Adverse drug events (ADEs) constitute one of the leading causes of post-therapeutic death and their identification constitutes an important challenge of modern precision medicine. Unfortunately, the onset and effects of ADEs are often underreported complicating timely intervention. At over 500 million posts per day, Twitter is a commonly used social media platform. The ubiquity of day-to-day personal information exchange on Twitter makes it a promising target for data mining for ADE identification and intervention. Three technical challenges are central to this problem: (1) identification of salient medical keywords in (noisy) tweets, (2) mapping drug-effect relationships, and (3) classification of such relationships as adverse or non-adverse.
METHODS: We use a bipartite graph-theoretic representation called a drug-effect graph (DEG) for modeling drug and side effect relationships by representing the drugs and side effects as vertices. We construct individual DEGs on two data sources. The first DEG is constructed from the drug-effect relationships found in FDA package inserts as recorded in the SIDER database. The second DEG is constructed by mining the history of Twitter users. We use dictionary-based information extraction to identify medically-relevant concepts in tweets. Drugs, along with co-occurring symptoms, are connected with edges weighted by temporal distance and frequency. Finally, information from the SIDER DEG is integrated with the Twitter DEG and edges are classified as either adverse or non-adverse using supervised machine learning.
RESULTS: We examine both graph-theoretic and semantic features for the classification task. The proposed approach can identify adverse drug effects with high accuracy, with precision exceeding 85% and F1 exceeding 81%. When compared with leading state-of-the-art methods, which employ un-enriched graph-theoretic analysis alone, our method leads to improvements ranging between 5% and 8% in terms of the aforementioned measures. Additionally, we employ our method to discover several ADEs which, though present in medical literature and Twitter-streams, are not represented in the SIDER databases.
CONCLUSIONS: We present a DEG integration model as a powerful formalism for the analysis of drug-effect relationships that is general enough to accommodate diverse data sources, yet rigorous enough to provide a strong mechanism for ADE identification.

PMID: 27766937 [PubMed - in process]
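
Illustrative note: the Methods above describe a bipartite drug-effect graph whose edges are weighted by temporal distance and frequency, without giving the formula in the abstract. The sketch below is one plausible reading, not the authors' implementation: each time a user mentions an effect within a window after mentioning a drug, the edge gains `1 / (1 + gap_in_days)`, so closer and more frequent co-occurrences weigh more. The window size, the weighting formula and all data are assumptions made for this example.

```python
from collections import defaultdict


def build_deg(mentions, max_gap_days=7):
    """Build a weighted bipartite drug-effect graph from per-user mentions.

    mentions: iterable of (user, day, kind, term), kind is "drug" or "effect".
    Returns {(drug, effect): weight}.
    """
    by_user = defaultdict(list)
    for user, day, kind, term in mentions:
        by_user[user].append((day, kind, term))

    edges = defaultdict(float)
    for events in by_user.values():
        drugs = [(day, term) for day, kind, term in events if kind == "drug"]
        effects = [(day, term) for day, kind, term in events if kind == "effect"]
        for drug_day, drug in drugs:
            for effect_day, effect in effects:
                gap = effect_day - drug_day
                # Assumed rule: only effects on or after the drug mention,
                # within the window, contribute to the edge weight.
                if 0 <= gap <= max_gap_days:
                    edges[(drug, effect)] += 1.0 / (1 + gap)
    return dict(edges)


# Hypothetical dictionary-matched mentions from two users' timelines.
mentions = [
    ("u1", 0, "drug", "drug_a"),
    ("u1", 1, "effect", "nausea"),
    ("u2", 3, "drug", "drug_a"),
    ("u2", 3, "effect", "nausea"),
    ("u2", 20, "effect", "headache"),  # outside the 7-day window
]

deg = build_deg(mentions)
```

In the approach described in the abstract, edges like these would then be enriched with SIDER information and passed to a supervised classifier; that step is not shown here.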

Prioritization of Susceptibility Genes for Ectopic Pregnancy by Gene Network Analysis.

Fri, 2016-10-21 14:10

Int J Mol Sci. 2016 Feb 01;17(2):

Authors: Liu JL, Zhao M

Abstract
Ectopic pregnancy is a very dangerous complication of pregnancy, affecting 1%-2% of all reported pregnancies. Due to ethical constraints on human biopsies and the lack of suitable animal models, there has been little success in identifying functionally important genes in the pathogenesis of ectopic pregnancy. In the present study, we developed a random walk-based computational method named TM-rank to prioritize ectopic pregnancy-related genes based on text mining data and gene network information. Using a defined threshold value, we identified five top-ranked genes: VEGFA (vascular endothelial growth factor A), IL8 (interleukin 8), IL6 (interleukin 6), ESR1 (estrogen receptor 1) and EGFR (epidermal growth factor receptor). These genes are promising candidate genes that can serve as useful diagnostic biomarkers and therapeutic targets. Our approach represents a novel strategy for prioritizing disease susceptibility genes.

PMID: 26840308 [PubMed - indexed for MEDLINE]
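
Illustrative note: the abstract above describes TM-rank only as a random walk-based method that combines text mining data with gene network information. The sketch below shows a generic random walk with restart on a toy network, with text-mining scores used as the restart distribution. It is an assumption about the general family of techniques, not a reproduction of TM-rank; the gene labels, scores and restart probability are invented.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    adjacency: {node: [neighbours]} for an undirected graph.
    seeds: {node: non-negative score}, normalised into the restart vector p0.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.get(node, 0.0) for node in nodes)
    p0 = {node: seeds.get(node, 0.0) / total for node in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {node: restart * p0[node] for node in nodes}
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                continue
            # Spread the walking mass evenly over the node's neighbours.
            share = (1 - restart) * p[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        delta = sum(abs(nxt[node] - p[node]) for node in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical gene network and text-mining seed scores.
network = {
    "g1": ["g2", "g3"],
    "g2": ["g1", "g3"],
    "g3": ["g1", "g2", "g4"],
    "g4": ["g3", "g5"],
    "g5": ["g4"],
}
text_mining_scores = {"g1": 3.0, "g2": 1.0}

scores = random_walk_with_restart(network, text_mining_scores)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes near well-supported seeds receive higher steady-state probability, which is how a threshold on the final scores can yield a short list of top-ranked candidates.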

A novel strategy of profiling the mechanism of herbal medicines by combining network pharmacology with plasma concentration determination and affinity constant measurement.

Wed, 2016-10-19 19:38

Mol Biosyst. 2016 Oct 18;12(11):3347-3356

Authors: Chen L, Lv D, Wang D, Chen X, Zhu Z, Cao Y, Chai Y

Abstract
Herbal medicines have long been widely used in the treatment of various complex diseases in China. However, the active constituents and therapeutic mechanisms of many herbal medicines remain undefined. Therefore, the identification of the active components and target proteins in these herbal medicines is a formidable task in herbal medicine research. In this study, we proposed a strategy that integrates network pharmacology with biomedical analysis and surface plasmon resonance (SPR) to predict the active ingredients and potential targets of the herbal medicine Sophora flavescens (Kushen in Chinese), and to evaluate its anti-fibrosis activity. First, we applied a virtual HTDocking platform to predict the potential targets of Kushen related to liver fibrosis by selecting five crucial protein targets based on network parameters and text mining. Then, we identified nine components in mouse plasma after oral administration of Kushen extract and determined the plasma concentration of each compound. Binding affinities between the nine potential active compounds and five target proteins were detected by SPR assays. Finally, we constructed a multi-parameter network model on the basis of three important parameters to tentatively explain the anti-fibrosis mechanism of Kushen. The results not only provide evidence for the therapeutic mechanism of Kushen but also shed new light on the activity-based analysis of other Chinese herbal medicines.

PMID: 27754507 [PubMed - in process]

Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision.

Tue, 2016-10-18 07:17

J Mach Learn Res. 2016;17:

Authors: Wallace BC, Kuiper J, Sharma A, Zhu MB, Marshall IJ

Abstract
Systematic reviews underpin Evidence Based Medicine (EBM) by addressing precise clinical questions via comprehensive synthesis of all relevant published evidence. Authors of systematic reviews typically define a Population/Problem, Intervention, Comparator, and Outcome (the PICO criteria) of interest, and then retrieve, appraise and synthesize results from all reports of clinical trials that meet these criteria. Identifying PICO elements in the full-texts of trial reports is thus a critical yet time-consuming step in the systematic review process. We seek to expedite evidence synthesis by developing machine learning models to automatically extract sentences from articles relevant to PICO elements. Collecting a large corpus of training data for this task would be prohibitively expensive. Therefore, we derive distant supervision (DS) with which to train models using previously conducted reviews. DS entails heuristically deriving 'soft' labels from an available structured resource. However, we have access only to unstructured, free-text summaries of PICO elements for corresponding articles; we must derive from these the desired sentence-level annotations. To this end, we propose a novel method - supervised distant supervision (SDS) - that uses a small amount of direct supervision to better exploit a large corpus of distantly labeled instances by learning to pseudo-annotate articles using the available DS. We show that this approach tends to outperform existing methods with respect to automated PICO extraction.

PMID: 27746703 [PubMed - in process]

Categories: Literature Watch

A PubMed-wide study of endometriosis.

Tue, 2016-10-18 07:17

Genomics. 2016 Oct 13;:

Authors: Liu JL, Zhao M

Abstract
Endometriosis affects 5-10% of women of reproductive age, leading to dysmenorrhea, pelvic pain and infertility; however, our understanding of the pathogenesis of this disease remains incomplete. In the present study, we performed a systematic analysis of endometriosis-related genes using text mining. Taking the text mining results as input, we subsequently generated a filtered gene set by computing the likelihood of finding more than the expected number of occurrences for every gene across the disease-centered subset of the PubMed database. Characterization of this filtered gene set by gene ontology, pathway and network analysis provides clues to the multiple mechanisms hypothesized to be responsible for the establishment of ectopic endometrial tissues, including the migration, implantation, survival and proliferation of ectopic endometrial cells. Finally, using this gene set as a "seed", we scanned the human genome to predict novel candidate genes based on gene annotations from multiple databases. Our study provides in-depth insights into the pathogenesis of endometriosis.
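
One common way to score "more than expected occurrences" of a gene in a disease-centered subset of the literature is a hypergeometric tail probability. The abstract does not say which statistic the authors used, so the sketch below is an assumption rather than their method, and the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_sf(k, total, marked, drawn):
    """P(X >= k) for X ~ Hypergeometric(total, marked, drawn).

    total  : all abstracts in the corpus
    marked : abstracts in the corpus that mention the gene
    drawn  : abstracts in the disease-centered subset
    k      : abstracts in the subset that mention the gene
    """
    upper = min(marked, drawn)
    numerator = sum(
        comb(marked, i) * comb(total - marked, drawn - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(total, drawn)

# Hypothetical counts: a gene is mentioned in 200 of 10,000 abstracts
# overall and in 10 of the 100 abstracts in the disease subset.
p_value = hypergeom_sf(10, 10_000, 200, 100)
```

A small tail probability suggests the gene is over-represented in the disease subset; in practice a multiple-testing correction would be needed when every gene is scored this way.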

PMID: 27746014 [PubMed - as supplied by publisher]

Categories: Literature Watch

Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis.

Mon, 2016-10-17 07:02

Comput Biol Chem. 2016 Sep 30;65:37-44

Authors: Vyas R, Bapat S, Jain E, Karthikeyan M, Tambe S, Kulkarni BD

Abstract
In order to understand the molecular mechanism underlying any disease, knowledge about the interacting proteins in the disease pathway is essential. The number of revealed protein-protein interactions (PPI) is still very limited compared to the available protein sequences of different organisms. Although experiment-based high-throughput technologies provide some data about these interactions, those data are often fairly noisy. Computational techniques for predicting protein-protein interactions therefore assume significance. A total of 1296 binary fingerprints that encode a combination of structural and geometric properties were developed using the crystallographic data of 15,000 protein complexes in the PDB server. In a case study, these fingerprints were created for proteins implicated in Type 2 diabetes mellitus. The fingerprints were input into an SVM-based model for discriminating disease proteins from non-disease proteins, yielding a classification accuracy of 78.2% (AUC value of 0.78) on an external data set composed of proteins retrieved via text mining of diabetes-related literature. A PPI network was constructed and analysed to explore new disease targets. The integrated approach exemplified here has potential for identifying disease-related proteins, functional annotation and other proteomics studies.

PMID: 27744173 [PubMed - as supplied by publisher]

Categories: Literature Watch

Text mining electronic hospital records to automatically classify admissions against disease: Measuring the impact of linking data sources.

Sun, 2016-10-16 06:46

J Biomed Inform. 2016 Oct 11;:

Authors: Kocbek S, Cavedon L, Martinez D, Bain C, Mac Manus C, Haffari G, Zukerman I, Verspoor K

Abstract
OBJECTIVE: Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine the effect of linking multiple data sources on text classification performance.
METHODS: Support Vector Machine classifiers are built for eight data source combinations, and evaluated using the metrics of Precision, Recall and F-Score. Sub-sampling techniques are used to address unbalanced datasets of medical records. We use radiology reports as an initial data source and add other sources, such as pathology reports and patient and hospital admission data, in order to assess the research question regarding the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A second set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. We explore the impact of feature selection; analyse the learning curve; examine the effect of restricting admissions to only those containing reports from all data sources; and examine the impact of reducing the sub-sampling. These experiments provide a better understanding of how best to apply text classification in the context of imbalanced data of variable completeness.
RESULTS: Radiology questions plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports.
CONCLUSION: Overall, linking data sources significantly improved classification performance for all the diseases examined. However, there is no single approach that suits all scenarios; the choice of the most effective combination of data sources depends on the specific disease to be classified.
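
The evaluation metrics and the sub-sampling step named in the METHODS above can be sketched in a few lines. This is an illustration only: the abstract does not give the sub-sampling ratio or procedure, so the random under-sampling of the negative class, the `ratio` parameter, and both function names are assumptions.

```python
import random

def precision_recall_f1(y_true, y_pred):
    """Compute Precision, Recall and F-Score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def subsample_negatives(records, labels, ratio=1.0, seed=0):
    """Keep every positive and at most ratio * (number of positives) negatives.

    Assumes positives are the minority class; `ratio` is a hypothetical knob.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), int(ratio * len(pos))))
    keep = sorted(pos + keep_neg)
    return [records[i] for i in keep], [labels[i] for i in keep]

# Toy data invented for illustration only.
scores = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
admissions = ["a", "b", "c", "d", "e", "f", "g", "h"]
flags = [1, 0, 0, 0, 0, 1, 0, 0]
balanced_records, balanced_labels = subsample_negatives(admissions, flags)
```

Sub-sampling would normally be applied to the training split only, so that the reported metrics still reflect the original class balance.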

PMID: 27742349 [PubMed - as supplied by publisher]

Categories: Literature Watch

How users adopt healthcare information: An empirical study of an online Q&A community.

Fri, 2016-10-14 06:12

Int J Med Inform. 2016 Feb;86:91-103

Authors: Jin J, Yan X, Li Y, Li Y

Abstract
OBJECTIVES: The emergence of social media technology has led to the creation of many online healthcare communities, where patients can easily share and look for healthcare-related information from peers who have experienced a similar problem. However, with increased user-generated content, there is a need to constantly analyse which content should be trusted as one sifts through enormous amounts of healthcare information. This study aims to explore patients' healthcare information seeking behavior in online communities.
METHODS: Based on dual-process theory and the knowledge adoption model, we proposed a healthcare information adoption model for online communities. This model highlights that information quality, emotional support, and source credibility are antecedent variables of the adoption likelihood of healthcare information, and that competition among repliers and involvement of recipients moderate the relationship between the antecedent variables and adoption likelihood. Empirical data were collected from the healthcare module of China's biggest Q&A community, Baidu Knows. Text mining techniques were adopted to calculate the information quality and emotional support contained in each reply text. A binary logistic regression model and a hierarchical regression approach were employed to test the proposed conceptual model.
RESULTS: Information quality, emotional support, and source credibility have a significant and positive impact on healthcare information adoption likelihood, and among these factors, information quality has the biggest impact on a patient's adoption decision. In addition, competition among repliers and involvement of recipients were tested as moderating effects between these antecedent factors and the adoption likelihood. Results indicate that competition among repliers positively moderates the relationship between source credibility and adoption likelihood, and that recipients' involvement positively moderates the relationship between information quality, source credibility, and adoption decision.
CONCLUSIONS: In addition to information quality and source credibility, emotional support has a significant positive impact on individuals' healthcare information adoption decisions. Moreover, the relationships between information quality, source credibility, emotional support, and adoption decision are moderated by competition among repliers and involvement of recipients.

PMID: 26616406 [PubMed - indexed for MEDLINE]

Categories: Literature Watch
