Drug-induced Adverse Events

Two Similarity Metrics for Medical Subject Headings (MeSH): An Aid to Biomedical Text Mining and Author Name Disambiguation.
J Biomed Discov Collab. 2016;7:e1
Authors: Smalheiser NR, Bonifield G
Abstract
In the present paper, we have created and characterized several similarity metrics for relating any two Medical Subject Headings (MeSH terms) to each other. The article-based metric measures the tendency of two MeSH terms to appear in the MEDLINE record of the same article. The author-based metric measures the tendency of two MeSH terms to appear in the body of articles written by the same individual (using the 2009 Author-ity author name disambiguation dataset as a gold standard). The two metrics are only modestly correlated with each other (r = 0.50), indicating that they capture different aspects of term usage. The article-based metric provides a measure of semantic relatedness, and MeSH term pairs that co-occur more often than expected by chance may reflect relations between the two terms. In contrast, the author metric is indicative of how individuals practice science, and may have value for author name disambiguation and studies of scientific discovery. We have calculated article metrics for all MeSH terms appearing in at least 25 articles in MEDLINE (as of 2014) and author metrics for MeSH terms published as of 2009. The dataset is freely available for download and can be queried at http://arrowsmith.psych.uic.edu/arrowsmith_uic/mesh_pair_metrics.html. Handling editor: Elizabeth Workman, MLIS, PhD.
PMID: 27213780 [PubMed - as supplied by publisher]
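The idea of term pairs that "co-occur more often than expected by chance" can be illustrated with a small sketch. The abstract does not give the formula behind the article-based metric, so the observed/expected ratio below is a generic stand-in, not the authors' method; the function name `cooccurrence_ratios` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_ratios(records):
    """Return {(term_a, term_b): observed / expected} for co-occurring pairs.

    `records` is an iterable of term collections, one per article.
    Expected counts assume the two terms are assigned independently.
    This is an illustrative ratio, not the metric defined in the paper.
    """
    records = list(records)
    n = len(records)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in records:
        unique = sorted(set(terms))  # sort so each pair has one canonical order
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    ratios = {}
    for (a, b), observed in pair_counts.items():
        expected = term_counts[a] * term_counts[b] / n
        ratios[(a, b)] = observed / expected
    return ratios


# Hypothetical toy records: each set stands for the MeSH terms of one article.
records = [
    {"Asthma", "Adrenal Cortex Hormones"},
    {"Asthma", "Adrenal Cortex Hormones", "Humans"},
    {"Neoplasms", "Humans"},
    {"Neoplasms", "Chemokines", "Humans"},
]
ratios = cooccurrence_ratios(records)
print(ratios[("Adrenal Cortex Hormones", "Asthma")])
```

A ratio above 1 means the pair appears together more often than independence would predict; a ratio below 1 means less often.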
Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years.
Comput Biol Med. 2016 Apr 20;73:173-185
Authors: Wagner M, Vicinus B, Muthra ST, Richards TA, Linder R, Frick VO, Groh A, Rubie C, Weichert F
Abstract
BACKGROUND: The continuous growth of medical sciences literature indicates the need for automated text analysis. Scientific writing which is neither unitary, transcending social situation nor defined by a timeless idea is subject to constant change as it develops in response to evolving knowledge, aims at different goals, and embodies different assumptions about nature and communication. The objective of this study was to evaluate whether publication dates should be considered when performing text mining.
METHODS: A search of PUBMED for combined references to chemokine identifiers and particular cancer related terms was conducted to detect changes over the past 36 years. Text analyses were performed using freeware available from the World Wide Web. TOEFL Scores of territories hosting institutional affiliations as well as various readability indices were investigated. Further assessment was conducted using Principal Component Analysis. Laboratory examination was performed to evaluate the quality of attempts to extract content from the examined linguistic features.
RESULTS: The PUBMED search yielded a total of 14,420 abstracts (3,190,219 words). The range of findings in laboratory experimentation was coherent with the variability of the results described in the analyzed body of literature. Increased concurrence of chemokine identifiers together with cancer related terms was found at the abstract and sentence level, whereas complexity of sentences remained fairly stable.
CONCLUSIONS: The findings of the present study indicate that concurrent references to chemokines and cancer increased over time whereas text complexity remained stable.
PMID: 27208610 [PubMed - as supplied by publisher]
Text Mining of Journal Articles for Sleep Disorder Terminologies.
PLoS One. 2016;11(5):e0156031
Authors: Lam C, Lai FC, Wang CH, Lai MH, Hsu N, Chung MH
Abstract
OBJECTIVE: Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings.
METHODS: SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms.
RESULTS: MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms.
CONCLUSION: Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
PMID: 27203858 [PubMed - as supplied by publisher]
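For readers unfamiliar with the three evaluation figures quoted in the RESULTS above, the following sketch shows how precision, recall, and false positive rate are conventionally derived from confusion-matrix counts. The counts are hypothetical and are not taken from the study; the function name `extraction_metrics` is likewise an assumption.

```python
def extraction_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and false positive rate from raw counts.

    tp: terms correctly extracted      fp: terms extracted in error
    fn: terms missed                   tn: non-terms correctly ignored
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, recall, false_positive_rate


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
precision, recall, fpr = extraction_metrics(tp=70, fp=30, fn=20, tn=230)
print(f"precision={precision:.2f} recall={recall:.2f} FPR={fpr:.2%}")
```

Note that the false positive rate depends on the number of true negatives, so it can only be computed when the evaluation defines what counts as a correctly ignored candidate.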
EPC Methods: An Exploration of the Use of Text-Mining Software in Systematic Reviews
Book. 2016 04
Authors: Paynter R, Bañez LL, Berliner E, Erinoff E, Lege-Matsuura J, Potter S, Uhl S
Abstract
OBJECTIVE: This project's goal was to provide a preliminary sketch of the use of text-mining tools as an emerging methodology within a number of systematic review processes. We sought to provide information addressing pressing questions individuals and organizations face when considering utilizing text-mining tools.
METHODS: We searched the literature to identify and summarize research on the use of text-mining tools within the systematic review context. We conducted telephone interviews with Key Informants (KIs; n=8) using a semi-structured instrument and subsequent qualitative analysis to explore issues surrounding the implementation and use of text-mining tools. Lastly, we compiled a list of text-mining tools to support systematic review methods and evaluated the tools using an informal descriptive appraisal tool.
RESULTS: The literature review identified 122 articles that met inclusion criteria, including two recent systematic reviews on the use of text-mining tools in the screening and data abstraction steps of systematic reviews. In addition to these two steps, a preliminary exploration of the literature on searching and other less-studied steps is presented. Support for the use of text-mining was strong amongst the KIs overall, though most KIs noted some performance caveats and/or areas in which further research is necessary. We evaluated 111 text-mining tools identified from the literature review and KI interviews.
CONCLUSIONS: Text-mining tools are currently being used within several systematic review organizations for a variety of review processes (e.g., searching, screening abstracts), and the published evidence-base is growing fairly rapidly in breadth and levels of evidence. Several outstanding questions remain for future empirical research to address regarding the reliability and validity of using these emerging technologies across a variety of review processes and whether these generalize across the scope of review topics. Guidance on reporting the use of these tools would be useful.
PMID: 27195359
Argo: enabling the development of bespoke workflows and services for disease annotation.
Database (Oxford). 2016;2016
Authors: Batista-Navarro R, Carter J, Ananiadou S
Abstract
Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo's capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V's User Interactive Track (IAT), we demonstrated and evaluated Argo's suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track's top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database.
In this work, we highlight Argo's support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models, through to user-interactive ones which allow human curators to manually provide their corrections to automatically generated annotations. Our participation in the BioCreative V challenges shows Argo's potential as an enabling technology for curating disease and phenotypic information from literature. Database URL: http://argo.nactem.ac.uk.
PMID: 27189607 [PubMed - as supplied by publisher]
Identifying a biomarker network for corticosteroid resistance in asthma from bronchoalveolar lavage samples.
Mol Biol Rep. 2016 May 17;
Authors: Vargas JE, Porto BN, Puga R, Stein RT, Pitrez PM
Abstract
Corticosteroid resistance (CR) is a major barrier to the effective treatment of severe asthma. Hence, a better understanding of the molecular mechanisms involved in this condition is a priority. Network analysis is an emerging strategy to explore this complex heterogeneous disorder at the system level; here it was used to identify a small, specific network for CR in asthma. The gene expression profile GSE7368, from bronchoalveolar lavage (BAL) of subjects with CR asthma, was downloaded from the Gene Expression Omnibus (GEO) database and compared to BAL of corticosteroid-sensitive (CS) patients. Differentially expressed genes (DEGs) were identified with the Limma package in R. In addition, DEGs were mapped to STRING to acquire protein-protein interaction (PPI) pairs. Topological properties of the PPI network were calculated with Centiscape, ClusterOne and BINGO. Subsequently, text-mining tools were applied to design a specific cell signalling network for CR in asthma. Thirty-five PPI networks were obtained, including a major network consisting of 370 nodes connected by 777 edges. After topological analysis, a minor PPI network of 48 nodes was identified, composed of the most relevant nodes of the major PPI network. In this subnetwork, several receptors (EGFR, EGR1, ESR2, PGR), transcription factors (MYC, JAK), cytokines (IL8, IL6, IL1B), one chemokine (CXCL1), one kinase (SRC) and one cyclooxygenase (PTGS2) were described as associated with the inflammatory environment and steroid resistance in asthma. We suggest a biomarker network of 48 nodes that could potentially be explored for diagnostic or therapeutic use.
PMID: 27188427 [PubMed - as supplied by publisher]
On the unsupervised analysis of domain-specific Chinese texts.
Proc Natl Acad Sci U S A. 2016 May 16;
Authors: Deng K, Bol PK, Li KJ, Liu JS
Abstract
With the growing availability of digitized text data both publicly and privately, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than those obtained using outputs of a supervised segmentation method.
PMID: 27185919 [PubMed - as supplied by publisher]
Exploring mechanisms of Panax notoginseng saponins in treating coronary heart disease by integrating gene interaction network and functional enrichment analysis.
Chin J Integr Med. 2016 May 16;
Authors: Yu G, Wang J
Abstract
OBJECTIVE: To investigate the mechanisms of Panax notoginseng saponins (PNS) in treating coronary heart disease (CHD) by integrating gene interaction network and functional enrichment analysis.
METHODS: Text mining was used to get CHD and PNS associated genes. Gene-gene interaction networks of CHD and PNS were built by the GeneMANIA Cytoscape plugin. The Advanced Network Merge Cytoscape plugin was used to analyze the two networks. Their functions were analyzed by gene functional enrichment analysis via DAVID Bioinformatics. The joint subnetwork of the CHD network and the PNS network was identified by network analysis.
RESULTS: The 11 genes of the joint subnetwork were the direct targets of PNS in the CHD network and were enriched in the cytokine-cytokine receptor interaction pathway. PNS could affect 85 other genes through the gene-gene interactions of the joint subnetwork, and these genes were enriched in 7 other pathways. The direct mechanisms of PNS in treating CHD involve targeting cytokines to relieve inflammation, and the indirect mechanisms involve affecting the 7 other pathways through the interaction of the joint subnetwork of the PNS and CHD networks. The genes in the 7 pathways could be potential targets for the immunologic adjuvant, anticoagulant, hypolipidemic, anti-platelet and anti-hypertrophic activities of PNS.
CONCLUSION: The key mechanisms of PNS in treating CHD could be anticoagulant and hypolipidemic which are indicated by analyzing biological functions of hubs in the merged network.
PMID: 27184904 [PubMed - as supplied by publisher]
Celastrol targets IRAKs to block Toll-like receptor 4-mediated nuclear factor-κB activation.
J Integr Med. 2016 May;14(3):203-8
Authors: Shen YF, Zhang X, Wang Y, Cao FF, Uzan G, Peng B, Zhang DH
Abstract
OBJECTIVE: Celastrol has been established as a nuclear factor-κB (NF-κB) activation inhibitor; however, the exact mechanism behind this action is still unknown. Using text-mining technology, the authors predicted that interleukin-1 receptor-associated kinases (IRAKs) are potential celastrol targets, and hypothesized that targeting IRAKs might be one way that celastrol inhibits NF-κB. This is because IRAKs are key molecules for some crucial pathways to activate NF-κB (e.g., the interleukin-1 receptor (IL-1R)/Toll-like receptor (TLR) superfamily).
METHODS: The human hepatocellular cell line (HepG2) treated with palmitic acid (PA) was used as a model for stimulating TLR4/NF-κB activation, in order to observe the potential effects of celastrol in IRAK regulation and NF-κB inhibition. The transfection of small interfering RNA was used for down-regulating TLR4, IRAK1 and IRAK4, and the Western blot method was used to detect changes in the protein expressions.
RESULTS: The results showed that celastrol could effectively inhibit PA-caused TLR4-dependent NF-κB activation in the HepG2 cells; PA also activated IRAKs, which were inhibited by celastrol. Knocking down IRAKs abolished PA-caused NF-κB activation.
CONCLUSION: The results for the first time show that targeting IRAKs is one way in which celastrol inhibits NF-κB activation.
PMID: 27181127 [PubMed - in process]
Text mining patents for biomedical knowledge.
Drug Discov Today. 2016 May 11;
Authors: Rodriguez-Esteban R, Bundschus M
Abstract
Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery.
PMID: 27179985 [PubMed - as supplied by publisher]
A novel procedure on next generation sequencing data analysis using text mining algorithm.
BMC Bioinformatics. 2016;17(1):213
Authors: Zhao W, Chen JJ, Perkins R, Wang Y, Liu Z, Hong H, Tong W, Zou W
Abstract
BACKGROUND: Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large scale comparative and evolutional studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining.
METHODS: We report a novel procedure to analyse NGS data using topic modeling. It consists of four major procedures: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella enterica strains were used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure.
RESULTS: The output topics by LDA algorithms could be treated as features of Salmonella strains to accurately describe the genetic diversity of fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in NGS data analysis procedure provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.
CONCLUSION: The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.
PMID: 27177941 [PubMed - as supplied by publisher]
Filtering large-scale event collections using a combination of supervised and unsupervised learning for event trigger classification.
J Biomed Semantics. 2016;7:27
Authors: Mehryary F, Kaewphan S, Hakala K, Ginter F
Abstract
BACKGROUND: Biomedical event extraction is one of the key tasks in biomedical text mining, supporting various applications such as database curation and hypothesis generation. Several systems, some of which have been applied at a large scale, have been introduced to solve this task. Past studies have shown that the identification of the phrases describing biological processes, also known as trigger detection, is a crucial part of event extraction, and notable overall performance gains can be obtained by solely focusing on this sub-task. In this paper we propose a novel approach for filtering falsely identified triggers from large-scale event databases, thus improving the quality of knowledge extraction.
METHODS: Our method relies on state-of-the-art word embeddings, event statistics gathered from the whole biomedical literature, and both supervised and unsupervised machine learning techniques. We focus on EVEX, an event database covering the whole PubMed and PubMed Central Open Access literature containing more than 40 million extracted events. The top most frequent EVEX trigger words are hierarchically clustered, and the resulting cluster tree is pruned to identify words that can never act as triggers regardless of their context. For rarely occurring trigger words we introduce a supervised approach trained on the combination of trigger word classification produced by the unsupervised clustering method and manual annotation.
RESULTS: The method is evaluated on the official test set of the BioNLP Shared Task on Event Extraction. The evaluation shows that the method can be used to improve the performance of state-of-the-art event extraction systems. This successful effort also translates into removing 1,338,075 potentially incorrect events from EVEX, thus greatly improving the quality of the data. The method is not solely bound to the EVEX resource and can thus be used to improve the quality of any event extraction system or database.
AVAILABILITY: The data and source code for this work are available at: http://bionlp-www.utu.fi/trigger-clustering/.
PMID: 27175227 [PubMed - as supplied by publisher]
Mining chemical patents with an ensemble of open systems.
Database (Oxford). 2016;2016
Authors: Leaman R, Wei CH, Zou C, Lu Z
Abstract
The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. In this manuscript, we describe systems for named entity recognition (NER) of chemicals and genes/proteins in patents, using the CEMP (for chemicals) and GPRO (for genes/proteins) corpora provided by the CHEMDNER task at BioCreative V. Our chemical NER system is an ensemble of five open systems, including both versions of tmChem, our previous work on chemical NER. Their output is combined using a machine learning classification approach. Our chemical NER system obtained 0.8752 precision and 0.9129 recall, for 0.8937 f-score on the CEMP task. Our gene/protein NER system is an extension of our previous work for gene and protein NER, GNormPlus. This system obtained a performance of 0.8143 precision and 0.8141 recall, for 0.8137 f-score on the GPRO task. Both systems achieved the highest performance in their respective tasks at BioCreative V. We conclude that an ensemble of independently-created open systems is sufficiently diverse to significantly improve performance over any individual system, even when they use a similar approach. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/.
PMID: 27173521 [PubMed - as supplied by publisher]
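The f-score quoted for the CEMP task can be related to its precision and recall with a short calculation. The sketch below assumes the reported figure is the balanced F-score (F1, the harmonic mean of precision and recall); the abstract does not state which variant or averaging scheme was used, and the function name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the CEMP task in the abstract above.
cemp_f = f_score(0.8752, 0.9129)
print(round(cemp_f, 4))
```

Under that assumption the result agrees with the reported CEMP f-score to four decimal places.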
The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews.
J Med Internet Res. 2016;18(5):e108
Authors: Hao H, Zhang K
Abstract
BACKGROUND: Many Web-based health care platforms allow patients to evaluate physicians by posting open-end textual reviews based on their experiences. These reviews are helpful resources for other patients to choose high-quality doctors, especially in countries like China where no doctor referral systems exist. Analyzing such a large amount of user-generated content to understand the voice of health consumers has attracted much attention from health care providers and health care researchers.
OBJECTIVE: The aim of this paper is to automatically extract hidden topics from Web-based physician reviews using text-mining techniques to examine what Chinese patients have said about their doctors and whether these topics differ across various specialties. This knowledge will help health care consumers, providers, and researchers better understand this information.
METHODS: We conducted two-fold analyses on the data collected from the "Good Doctor Online" platform, the largest online health community in China. First, we explored all reviews from 2006-2014 using descriptive statistics. Second, we applied the well-known topic extraction algorithm Latent Dirichlet Allocation to more than 500,000 textual reviews from over 75,000 Chinese doctors across four major specialty areas to understand what Chinese health consumers said online about their doctor visits.
RESULTS: On the "Good Doctor Online" platform, 112,873 out of 314,624 doctors had been reviewed at least once by April 11, 2014. Among the 772,979 textual reviews, we chose to focus on four major specialty areas that received the most reviews: Internal Medicine, Surgery, Obstetrics/Gynecology and Pediatrics, and Chinese Traditional Medicine. Among the doctors who received reviews from those four medical specialties, two-thirds of them received more than two reviews and in a few extreme cases, some doctors received more than 500 reviews. Across the four major areas, the most popular topics reviewers found were the experience of finding doctors, doctors' technical skills and bedside manner, general appreciation from patients, and description of various symptoms.
CONCLUSIONS: To the best of our knowledge, our work is the first study using an automated text-mining approach to analyze a large amount of unstructured textual data of Web-based physician reviews in China. Based on our analysis, we found that Chinese reviewers mainly concentrate on a few popular topics. This is consistent with the goal of Chinese online health platforms and demonstrates the health care focus in China's health care system. Our text-mining approach reveals a new research area on how to use big data to help health care providers, health care administrators, and policy makers hear patient voices, target patient concerns, and improve the quality of care in this age of patient-centered care. Also, on the health care consumer side, our text mining technique helps patients make more informed decisions about which specialists to see without reading thousands of reviews, which is simply not feasible. In addition, our comparison analysis of Web-based physician reviews in China and the United States also indicates some cultural differences.
PMID: 27165558 [PubMed - in process]
Prioritizing Chemicals for Risk Assessment Using Chemoinformatics: Examples from the IARC Monographs on Pesticides.
Environ Health Perspect. 2016 May 10;
Authors: Guha N, Guyton KZ, Loomis D, Barupal DK
Abstract
BACKGROUND: Identifying cancer hazards is the first step towards cancer prevention. The IARC Monographs Programme, which has evaluated nearly 1000 agents for carcinogenic potential since 1971, typically selects agents for hazard identification on the basis of public nominations, expert advice, published data on carcinogenicity, and public health importance.
OBJECTIVES: Here we present a novel and complementary strategy for identifying agents for hazard evaluation using chemoinformatics, database integration and automated text mining.
DISCUSSION: To inform selection among a broad range of pesticides nominated for evaluation, we identified and screened nearly 6000 relevant chemical structures, thereafter systematically compiled information on 980 pesticides, creating chemical similarity network maps that allowed cluster visualization by chemical similarity, pesticide class, and publicly available information concerning cancer epidemiology, cancer bioassays, and carcinogenic mechanisms. For the IARC Monograph meetings that took place in March and June 2015, this approach supported high priority evaluation of glyphosate, malathion, parathion, tetrachlorvinphos, diazinon, DDT, lindane, and 2,4-D.
CONCLUSIONS: This systematic approach, accounting for chemical similarity and overlaying multiple data sources, can be used by risk assessors as well as researchers to systematize, inform and increase efficiency in selecting and prioritizing agents for hazard identification, risk assessment, regulation or further investigation. This approach could be extended to an array of outcomes and agents, including occupational carcinogens, drugs, and foods.
PMID: 27164621 [PubMed - as supplied by publisher]
Convex biclustering.
Biometrics. 2016 May 10;
Authors: Chi EC, Allen GI, Baraniuk RG
Abstract
In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, the problem of identifying structure in high-dimensional genomic data motivates this work. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees, features that arguably are lacking in the current alternative algorithms. We demonstrate the advantages of our approach, which include stably and reproducibly identifying biclusterings, on simulated and real microarray data.
PMID: 27163413 [PubMed - as supplied by publisher]
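As a rough illustration of what a convex biclustering formulation with a single tuning parameter can look like, the following is a sketch of a fused-penalty objective. The symbols (data matrix X, estimate U, tuning parameter γ, nonnegative weights w and w̃) are assumptions introduced here for illustration and are not given in the abstract above, so the published formulation may differ in its details.

```latex
% X: p-by-n data matrix; U: estimate of the same shape (assumed notation)
% gamma >= 0: the single tuning parameter; w_{ij}, \tilde{w}_{kl} >= 0: assumed weights
\min_{U \in \mathbb{R}^{p \times n}} \;
  \frac{1}{2} \lVert X - U \rVert_F^2
  + \gamma \Big[
      \sum_{i<j} w_{ij} \, \lVert U_{\cdot i} - U_{\cdot j} \rVert_2
    + \sum_{k<l} \tilde{w}_{kl} \, \lVert U_{k \cdot} - U_{l \cdot} \rVert_2
  \Big]
```

In a formulation of this kind, the squared Frobenius term is strictly convex and the penalties are convex, which is what yields a unique global minimizer. At γ = 0 the solution is U = X, and as γ grows the penalties fuse columns and rows of U together, tracing out a path of progressively coarser biclusterings.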
BioCreative V CDR task corpus: a resource for chemical disease relation extraction.
Database (Oxford). 2016;2016
Authors: Li J, Sun Y, Johnson RJ, Sciaky D, Wei CH, Leaman R, Davis AP, Mattingly CJ, Wiegers TC, Lu Z
Abstract
Community-run, formal evaluations and manually annotated text corpora are critically important for advancing biomedical text-mining research. Recently in BioCreative V, a new challenge was organized for the tasks of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. Given the nature of both tasks, a test collection is required to contain both disease/chemical annotations and relation annotations in the same set of articles. Despite previous efforts in biomedical corpus construction, none was found to be sufficient for the task. Thus, we developed our own corpus called BC5CDR during the challenge by inviting a team of Medical Subject Headings (MeSH) indexers for disease/chemical entity annotation and Comparative Toxicogenomics Database (CTD) curators for CID relation annotation. To ensure high annotation quality and productivity, detailed annotation guidelines and automatic annotation tools were provided. The resulting BC5CDR corpus consists of 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases and 3116 chemical-disease interactions. Each entity annotation includes both the mention text spans and normalized concept identifiers, using MeSH as the controlled vocabulary. To ensure accuracy, the entities were first captured independently by two annotators followed by a consensus annotation: The average inter-annotator agreement (IAA) scores were 87.49% and 96.05% for the disease and chemicals, respectively, in the test set according to the Jaccard similarity coefficient. Our corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/.
PMID: 27161011 [PubMed - in process]
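The BC5CDR abstract above reports inter-annotator agreement using the Jaccard similarity coefficient. A minimal sketch of that calculation follows, assuming each annotation is represented as a (start offset, end offset, concept identifier) tuple; that representation, the function name, and the sample annotations are hypothetical and are not taken from the corpus.

```python
def jaccard(annotations_a, annotations_b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        # Convention assumed here: two empty annotation sets agree fully.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical annotations: (start offset, end offset, placeholder concept ID).
annotator_1 = {(0, 8, "C1"), (15, 27, "C2"), (40, 52, "C3")}
annotator_2 = {(0, 8, "C1"), (15, 27, "C2"), (60, 70, "C4")}

# Two shared annotations out of four distinct ones.
agreement = jaccard(annotator_1, annotator_2)
print(f"IAA (Jaccard): {agreement:.2%}")
```

Treating an annotation as matching only when span and identifier both agree is a strict choice; a looser variant could compare identifiers alone.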
Polypharmacology in Drug Development: A Minireview of Current Technologies.
ChemMedChem. 2016 May 6;
Authors: Tan Z, Chaudhai R, Zhang S
Abstract
Polypharmacology, the process in which a single drug is able to bind to multiple targets specifically and simultaneously, is an emerging paradigm in drug development. The potency of a given drug can be increased through the engagement of multiple targets involved in a certain disease. Polypharmacology may also help identify novel applications of existing drugs through drug repositioning. However, many problems and challenges remain in this field. Rather than covering all aspects of polypharmacology, this Minireview is focused primarily on recently reported techniques, from bioinformatics technologies to cheminformatics approaches as well as text-mining-based methods, all of which have made significant contributions to the research of polypharmacology.
PMID: 27154144 [PubMed - as supplied by publisher]
Building a glaucoma interaction network using a text mining approach.
BioData Min. 2016;9:17
Authors: Soliman M, Nasraoui O, Cooper NG
Abstract
BACKGROUND: The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease.
RESULTS: A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx.
CONCLUSIONS: This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. The major findings were a set of relations that could not be found in existing interaction databases and that were found to be new, in addition to a smaller subnetwork consisting of interconnected clusters of seven glaucoma genes. Future improvements can be applied towards obtaining a better version of this network.
PMID: 27152122 [PubMed]
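The glaucoma network abstract above describes the result as having small world characteristics. Small-world networks are usually identified by a high average clustering coefficient combined with a short average path length. The sketch below computes both quantities on a toy graph; the node names and edges are placeholders invented for illustration, not genes or relations from the study, and this is not necessarily the analysis the authors ran.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        nb = list(nbrs)
        k = len(nb)
        if k < 2:
            continue
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Placeholder network: a triangle (G1, G2, G3) with a two-node tail (G4, G5).
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]
toy_adj = build_adjacency(toy_edges)

print("average clustering:", round(average_clustering(toy_adj), 3))
print("average path length:", round(average_path_length(toy_adj), 3))
```

In practice both values would be compared against a random graph with the same number of nodes and edges, since the small-world label depends on that comparison and not on the raw numbers.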
Data-based Reconstruction of Gene Regulatory Networks of Fungal Pathogens.
Front Microbiol. 2016;7:570
Authors: Guthke R, Gerber S, Conrad T, Vlaic S, Durmuş S, Çakır T, Sevilgen FE, Shelest E, Linde J
Abstract
In the emerging field of systems biology of fungal infection, one of the central roles belongs to the modeling of gene regulatory networks (GRNs). Utilizing omics-data, GRNs can be predicted by mathematical modeling. Here, we review current advances of data-based reconstruction of both small-scale and large-scale GRNs for human pathogenic fungi. The advantage of large-scale genome-wide modeling is the possibility to predict central (hub) genes and thereby indicate potential biomarkers and drug targets. In contrast, small-scale GRN models provide hypotheses on the mode of gene regulatory interactions, which have to be validated experimentally. Due to the lack of sufficient quantity and quality of both experimental data and prior knowledge about regulator-target gene relations, the genome-wide modeling still remains problematic for fungal pathogens. While a first genome-wide GRN model has already been published for Candida albicans, the feasibility of such modeling for Aspergillus fumigatus is evaluated in the present article. Based on this evaluation, opinions are drawn on future directions of GRN modeling of fungal pathogens. The crucial point of genome-wide GRN modeling is the experimental evidence, both used for inferring the networks (omics 'first-hand' data as well as literature data used as prior knowledge) and for validation and evaluation of the inferred network models.
PMID: 27148247 [PubMed]