Drug-induced Adverse Events

Application of the EVEX resource to event extraction and network construction: Shared Task entry and result analysis.
BMC Bioinformatics. 2015;16 Suppl 16:S3
Authors: Hakala K, Van Landeghem S, Salakoski T, Van de Peer Y, Ginter F
Abstract
BACKGROUND: Modern methods for mining biomolecular interactions from literature typically make predictions based solely on the immediate textual context, in effect a single sentence. No prior work has been published on extending this context to the information automatically gathered from the whole biomedical literature. Thus, our motivation for this study is to explore whether mutually supporting evidence, aggregated across several documents, can be utilized to improve the performance of state-of-the-art event extraction systems.
RESULTS: In the GE task, our re-ranking approach led to a modest performance increase and resulted in the first rank of the official Shared Task results with 50.97% F-score. Additionally, in this paper we explore and evaluate the usage of distributed vector representations for this challenge.
CONCLUSIONS: For the GRN task, we were able to produce a gene regulatory network from the EVEX data, warranting the use of such generic large-scale text mining data in network biology settings. A detailed performance and error analysis provides more insight into the relatively low recall rates.
PMID: 26551766 [PubMed - indexed for MEDLINE]
SynLethDB: synthetic lethality database toward discovery of selective and sensitive anticancer drug targets.
Nucleic Acids Res. 2016 Jan 4;44(D1):D1011-7
Authors: Guo J, Liu H, Zheng J
Abstract
Synthetic lethality (SL) is a type of genetic interaction between two genes such that simultaneous perturbations of the two genes result in cell death or a dramatic decrease of cell viability, while a perturbation of either gene alone is not lethal. SL reflects the biologically endogenous difference between cancer cells and normal cells, and thus the inhibition of SL partners of genes with cancer-specific mutations could selectively kill cancer cells but spare normal cells. Therefore, SL is emerging as a promising anticancer strategy that could potentially overcome the drawbacks of traditional chemotherapies by reducing severe side effects. Researchers have developed experimental technologies and computational prediction methods to identify SL gene pairs on human and a few model species. However, there has not been a comprehensive database dedicated to collecting SL pairs and related knowledge. In this paper, we propose a comprehensive database, SynLethDB (http://histone.sce.ntu.edu.sg/SynLethDB/), which contains SL pairs collected from biochemical assays, other related databases, computational predictions and text mining results on human and four model species, i.e. mouse, fruit fly, worm and yeast. For each SL pair, a confidence score was calculated by integrating individual scores derived from different evidence sources. We also developed a statistical analysis module to estimate the druggability and sensitivity of cancer cells upon drug treatments targeting human SL partners, based on large-scale genomic data, gene expression profiles and drug sensitivity profiles on more than 1000 cancer cell lines. To help users access and mine the wealth of the data, we developed other practical functionalities, such as search and filtering, orthology search, gene set enrichment analysis. Furthermore, a user-friendly web interface has been implemented to facilitate data analysis and interpretation. 
With the integrated data sets and analytics functionalities, SynLethDB would be a useful resource for biomedical research community and pharmaceutical industry.
PMID: 26516187 [PubMed - indexed for MEDLINE]
TaggerOne: Joint Named Entity Recognition and Normalization with Semi-Markov Models.
Bioinformatics. 2016 Jun 9;
Authors: Leaman R, Lu Z
Abstract
MOTIVATION: Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization.
METHODS: We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput.
RESULTS: We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance.
AVAILABILITY: TaggerOne will be made open source upon acceptance. Demonstration available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/demo/TaggerOne/demo.cgi CONTACT: zhiyong.lu@nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PMID: 27283952 [PubMed - as supplied by publisher]
Mining clinical attributes of genomic variants through assisted literature curation in Egas.
Database (Oxford). 2016;2016
Authors: Matos S, Campos D, Pinho R, Silva RM, Mort M, Cooper DN, Oliveira JL
Abstract
The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/.
Database URL: https://demo.bmd-software.com/egas/
PMID: 27278817 [PubMed - in process]
Overlap in drug-disease associations between clinical practice guidelines and drug structured product label indications.
J Biomed Semantics. 2016;7:37
Authors: Leung TI, Dumontier M
Abstract
BACKGROUND: Clinical practice guidelines (CPGs) recommend pharmacologic treatments for clinical conditions, and drug structured product labels (SPLs) summarize approved treatment indications. Both resources are intended to promote evidence-based medical practices and guide clinicians' prescribing decisions. However, it is unclear how well CPG recommendations about pharmacologic therapies match SPL indications for recommended drugs. In this study, we perform text mining of CPG summaries to examine drug-disease associations in CPG recommendations and in SPL treatment indications for 15 common chronic conditions.
METHODS: We constructed an initial text corpus of guideline summaries from the National Guideline Clearinghouse (NGC) from a set of manually selected ICD-9 codes for each of the 15 conditions. We obtained 377 relevant guideline summaries and their Major Recommendations section, which excludes guidelines for pediatric patients, pregnant or breastfeeding women, or for medical diagnoses not meeting inclusion criteria. A vocabulary of drug terms was derived from five medical taxonomies. We used named entity recognition, in combination with dictionary-based and ontology-based methods, to identify drug term occurrences in the text corpus and construct drug-disease associations. The ATC (Anatomical Therapeutic Chemical Classification) was utilized to perform drug name and drug class matching to construct the drug-disease associations from CPGs. We then obtained drug-disease associations from SPLs using conditions mentioned in their Indications section in SIDER. The primary outcomes were the frequency of drug-disease associations in CPGs and SPLs, and the frequency of overlap between the two sets of drug-disease associations, with and without using taxonomic information from ATC.
RESULTS: Without taxonomic information, we identified 1444 drug-disease associations across CPGs and SPLs for 15 common chronic conditions. Of these, 195 drug-disease associations overlapped between CPGs and SPLs, 917 associations occurred in CPGs only and 332 associations occurred in SPLs only. With taxonomic information, 859 unique drug-disease associations were identified, of which 152 of these drug-disease associations overlapped between CPGs and SPLs, 541 associations occurred in CPGs only, and 166 associations occurred in SPLs only.
CONCLUSIONS: Our results suggest that CPG-recommended pharmacologic therapies and SPL indications do not overlap frequently when identifying drug-disease associations using named entity recognition, although incorporating taxonomic relationships between drug names and drug classes into the approach improves the overlap. This has important implications in practice because conflicting or inconsistent evidence may complicate clinical decision making and implementation or measurement of best practices.
PMID: 27277160 [PubMed - in process]
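Note on the overlap figures in the entry above (Leung and Dumontier): the reported counts partition each set of drug-disease associations into three disjoint groups, so the totals and the overlapping share can be checked with simple arithmetic. The sketch below is only an illustration; the function name `overlap_summary` is hypothetical, and the "share" it computes (overlap divided by the union) is an assumed way of quantifying the overlap that the abstract itself does not report.

```python
def overlap_summary(both, cpg_only, spl_only):
    """Return (total associations, share found in both CPGs and SPLs).

    The three counts are disjoint, so their sum is the size of the union.
    """
    total = both + cpg_only + spl_only
    return total, both / total


# Counts copied from the RESULTS paragraph of the abstract.
total_plain, share_plain = overlap_summary(195, 917, 332)  # without ATC taxonomy
total_atc, share_atc = overlap_summary(152, 541, 166)      # with ATC taxonomy

print(total_plain, round(share_plain, 3))  # 1444 0.135
print(total_atc, round(share_atc, 3))      # 859 0.177
```

Under this assumed measure, the overlapping share rises from roughly 13.5% to roughly 17.7% once taxonomic information is used, which is consistent with the CONCLUSIONS statement that the taxonomy improves the overlap.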
Systematic Analysis of Endometrial Cancer-Associated Hub Proteins Based on Text Mining.
Biomed Res Int. 2015;2015:615825
Authors: Gao H, Zhang Z
Abstract
OBJECTIVE: The aim of this study was to systematically characterize the expression of endometrial cancer- (EC-) associated genes and to analyze the functions, pathways, and networks of EC-associated hub proteins.
METHODS: Gene data for EC were extracted from the PubMed (MEDLINE) database using text mining based on NLP. PPI networks and pathways were integrated and obtained from the KEGG and other databases. Proteins that interacted with at least 10 other proteins were identified as the hub proteins of the EC-related genes network.
RESULTS: A total of 489 genes were identified as EC-related with P < 0.05, and 32 pathways were identified as significant (P < 0.05, FDR < 0.05). A network of EC-related proteins that included 271 interactions was constructed. The 17 proteins that interact with 10 or more other proteins (P < 0.05, FDR < 0.05) were identified as the hub proteins of this PPI network of EC-related genes. These 17 proteins are EGFR, MET, PDGFRB, CCND1, JUN, FGFR2, MYC, PIK3CA, PIK3R1, PIK3R2, KRAS, MAPK3, CTNNB1, RELA, JAK2, AKT1, and AKT2.
CONCLUSION: Our data may help to reveal the molecular mechanisms of EC development and provide implications for targeted therapy for EC. However, correlations between certain proteins and EC continue to require additional exploration.
PMID: 26366417 [PubMed - indexed for MEDLINE]
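Note on the hub criterion in the entry above (Gao and Zhang): the abstract defines a hub protein as one that interacts with at least 10 other proteins. A minimal sketch of that degree threshold follows. The function `find_hubs`, the edge-list input format, and the toy network with placeholder partners `P0`..`P9` are all hypothetical and do not reproduce the authors' data or their statistical filtering (P and FDR values).

```python
from collections import defaultdict


def find_hubs(edges, min_degree=10):
    """Return proteins with at least `min_degree` distinct interaction partners."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions when counting partners
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sorted(p for p, partners in neighbours.items()
                  if len(partners) >= min_degree)


# Hypothetical toy network: AKT1 has 10 placeholder partners, MYC has 3.
toy_edges = [("AKT1", f"P{i}") for i in range(10)]
toy_edges += [("MYC", f"P{i}") for i in range(3)]

print(find_hubs(toy_edges))  # ['AKT1']
```

Using sets for the partner lists means a repeated edge is counted once, which matches the "interacts with N other proteins" wording.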
Passage-Based Bibliographic Coupling: An Inter-Article Similarity Measure for Biomedical Articles.
PLoS One. 2015;10(10):e0139245
Authors: Liu RL
Abstract
Biomedical literature is an essential source of biomedical evidence. To translate the evidence for biomedicine study, researchers often need to carefully read multiple articles about specific biomedical issues. These articles thus need to be highly related to each other. They should share similar core contents, including research goals, methods, and findings. However, given an article r, it is challenging for search engines to retrieve highly related articles for r. In this paper, we present a technique PBC (Passage-based Bibliographic Coupling) that estimates inter-article similarity by seamlessly integrating bibliographic coupling with the information collected from context passages around important out-link citations (references) in each article. Empirical evaluation shows that PBC can significantly improve the retrieval of those articles that biomedical experts believe to be highly related to specific articles about gene-disease associations. PBC can thus be used to improve search engines in retrieving the highly related articles for any given article r, even when r is cited by very few (or even no) articles. The contribution is essential for those researchers and text mining systems that aim at cross-validating the evidence about specific gene-disease associations.
PMID: 26440794 [PubMed - indexed for MEDLINE]
Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases.
Sci Rep. 2015;5:10888
Authors: Hoehndorf R, Schofield PN, Gkoutos GV
Abstract
Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.
PMID: 26051359 [PubMed - indexed for MEDLINE]
Xenbase: Core features, data acquisition, and data processing.
Genesis. 2015 Aug;53(8):486-97
Authors: James-Zorn C, Ponferrada VG, Burns KA, Fortriede JD, Lotay VS, Liu Y, Brad Karpinka J, Karimi K, Zorn AM, Vize PD
Abstract
Xenbase, the Xenopus model organism database (www.xenbase.org), is a cloud-based, web-accessible resource that integrates the diverse genomic and biological data from Xenopus research. Xenopus frogs are one of the major vertebrate animal models used for biomedical research, and Xenbase is the central repository for the enormous amount of data generated using this model tetrapod. The goal of Xenbase is to accelerate discovery by enabling investigators to make novel connections between molecular pathways in Xenopus and human disease. Our relational database and user-friendly interface make these data easy to query and allow investigators to quickly interrogate and link different data types in ways that would otherwise be difficult, time consuming, or impossible. Xenbase also enhances the value of these data through high-quality gene expression curation and data integration, by providing bioinformatics tools optimized for Xenopus experiments, and by linking Xenopus data to other model organisms and to human data. Xenbase draws in data via pipelines that download data, parse the content, and save them into appropriate files and database tables. Furthermore, Xenbase makes these data accessible to the broader biomedical community by continually providing annotated data updates to organizations such as NCBI, UniProtKB, and Ensembl. Here, we describe our bioinformatics, genome-browsing tools, data acquisition and sharing, our community submitted and literature curation pipelines, text-mining support, gene page features, and the curation of gene nomenclature and gene models.
PMID: 26150211 [PubMed - indexed for MEDLINE]
Systematic analysis of the molecular mechanism underlying atherosclerosis using a text mining approach.
Hum Genomics. 2016;10(1):14
Authors: Xi D, Zhao J, Lai W, Guo Z
Abstract
BACKGROUND: Atherosclerosis is one of the common health threats all over the world. It is a complex heritable disease that affects arterial blood vessels. Chronic inflammatory response plays an important role in atherogenesis. There has been little success in fully identifying functionally important genes in the pathogenesis of atherosclerosis.
RESULTS: In the present study, we performed a systematic analysis of atherosclerosis-related genes using text mining. We identified a total of 1312 genes. Gene ontology (GO) analysis revealed that a total of 35 terms exhibited significance (p < 0.05) as overrepresented terms, indicating that atherosclerosis invokes many genes with a wide range of different functions. Pathway analysis demonstrated that the most highly enriched pathway is the Toll-like receptor signaling pathway. Finally, through gene network analysis, we prioritized 48 genes using the hub gene method.
CONCLUSIONS: Our study provides a valuable resource for the in-depth understanding of the mechanism underlying atherosclerosis.
PMID: 27251057 [PubMed - in process]
On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions.
J Biomed Inform. 2015 Aug;56:318-32
Authors: Oronoz M, Gojenola K, Pérez A, de Ilarraza AD, Casillas A
Abstract
The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.
PMID: 26141794 [PubMed - indexed for MEDLINE]
Identifying synonymy between relational phrases using word embeddings.
J Biomed Inform. 2015 Aug;56:94-102
Authors: Nguyen NT, Miwa M, Tsuruoka Y, Tojo S
Abstract
Many text mining applications in the biomedical domain benefit from automatic clustering of relational phrases into synonymous groups, since it alleviates the problem of spurious mismatches caused by the diversity of natural language expressions. Most of the previous work that has addressed this task of synonymy resolution uses similarity metrics between relational phrases based on textual strings or dependency paths, which, for the most part, ignore the context around the relations. To overcome this shortcoming, we employ a word embedding technique to encode relational phrases. We then apply the k-means algorithm on top of the distributional representations to cluster the phrases. Our experimental results show that this approach outperforms state-of-the-art statistical models including latent Dirichlet allocation and Markov logic networks.
PMID: 26004792 [PubMed - indexed for MEDLINE]
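Note on the clustering step in the entry above (Nguyen et al.): the abstract describes encoding relational phrases with word embeddings and then applying k-means to the resulting vectors. The sketch below illustrates only the k-means step, written as plain Lloyd's algorithm with deterministic seeding from the first k points. The 2-dimensional vectors are made-up stand-ins for real embeddings, and the function name `kmeans` and its parameters are hypothetical rather than the authors' implementation.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


# Hypothetical 2-d "embeddings" for four relational phrases.
phrases = ["activates", "inhibits", "stimulates", "suppresses"]
vectors = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]

labels = kmeans(vectors, k=2)
clusters = {}
for phrase, label in zip(phrases, labels):
    clusters.setdefault(label, []).append(phrase)

print(clusters)  # {0: ['activates', 'stimulates'], 1: ['inhibits', 'suppresses']}
```

In practice the vectors would come from a trained embedding model and would have far more dimensions, and the seeding strategy would usually be randomized or k-means++ rather than "first k points".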
Automatic endpoint detection to support the systematic review process.
J Biomed Inform. 2015 Aug;56:42-56
Authors: Blake C, Lucic A
Abstract
Preparing a systematic review can take hundreds of hours to complete, but the process of reconciling different results from multiple studies is the bedrock of evidence-based medicine. We introduce a two-step approach to automatically extract three facets - two entities (the agent and object) and the way in which the entities are compared (the endpoint) - from direct comparative sentences in full-text articles. The system does not require a user to predefine entities in advance and thus can be used in domains where entity recognition is difficult or unavailable. As with a systematic review, the tabular summary produced using the automatically extracted facets shows how experimental results differ between studies. Experiments were conducted using a collection of more than 2 million sentences from three journals (Diabetes, Carcinogenesis and Endocrinology) and two machine learning algorithms, support vector machines (SVM) and a general linear model (GLM). F1 and accuracy measures for the SVM and GLM differed by only 0.01 across all three comparison facets in a randomly selected set of test sentences. The system achieved the best accuracy of 92% for objects, whereas the accuracy for both agents and endpoints was 73%. F1 scores were higher for objects (0.77) than for endpoints (0.51) or agents (0.47). A situated evaluation of Metformin, a drug to treat diabetes, showed system accuracy of 95%, 83% and 79% for the object, endpoint and agent respectively. The situated evaluation had higher F1 scores of 0.88, 0.64 and 0.62 for object, endpoint, and agent respectively. On average, only 5.31% of the sentences in a full-text article are direct comparisons, but the tabular summaries suggest that these sentences provide a rich source of currently underutilized information that can be used to accelerate the systematic review process and identify gaps where future research should be focused.
PMID: 26003938 [PubMed - indexed for MEDLINE]
MET network in PubMed: a text-mined network visualization and curation system.
Database (Oxford). 2016;2016
Authors: Dai HJ, Su CH, Lai PT, Huang MS, Jonnagaddala J, Rose Jue T, Rao S, Chou HJ, Milacic M, Singh O, Syed-Abdul S, Hsu WL
Abstract
Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to assist metastasis researchers in retrieving network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET.
Database URL: http://btm.tmu.edu.tw/metastasisway
PMID: 27242035 [PubMed - in process]
Comparative proteomics analysis of the antitumor effect of CIGB-552 peptide in HT-29 colon adenocarcinoma cells.
J Proteomics. 2015 Aug 3;126:163-71
Authors: Núñez de Villavicencio-Díaz T, Ramos Gómez Y, Oliva Argüelles B, Fernández Masso JR, Rodríguez-Ulloa A, Cruz García Y, Guirola-Cruz O, Perez-Riverol Y, Javier González L, Tiscornia I, Victoria S, Bollati-Fogolín M, Besada Pérez V, Guerra Vallespi M
Abstract
The second generation peptide CIGB-552 has a pro-apoptotic effect on H460 non-small cell lung cancer cells and displays a potent cytotoxic effect in HT-29 colon adenocarcinoma cells though its action mechanism is ill defined. Here, we present the first proteomic study of peptide effect in HT-29 cells using subcellular fractionation, protein and peptide fractionation by DF-PAGE and LC-MS/MS peptide identification. In particular, we explored the nuclear proteome of HT-29 cells at a 5h treatment identifying a total of 68 differentially modulated proteins, 49 of which localize to the nucleus. The differentially modulated proteins were analyzed following a system biology approach. Results pointed to a modulation of apoptosis, oxidative damage removal, NF-κB activation, inflammatory signaling and of cell adhesion and motility. Further Western blot and flow-cytometry experiments confirmed both pro-apoptotic and anti-inflammatory effects of CIGB-552 peptide in HT-29 cells.
PMID: 26013411 [PubMed - indexed for MEDLINE]
Leveraging Social Media to Promote Public Health Knowledge: Example of Cancer Awareness via Twitter.
JMIR Public Health Surveill. 2016 Jan-Jun;2(1):e17
Authors: Xu S, Markson C, Costello KL, Xing CY, Demissie K, Llanos AA
Abstract
BACKGROUND: As social media become increasingly popular online venues for engaging in communication about public health issues, it is important to understand how users promote knowledge and awareness about specific topics.
OBJECTIVE: The aim of this study is to examine the frequency of discussion and differences by race and ethnicity of cancer-related topics among unique users via Twitter.
METHODS: Tweets were collected from April 1, 2014 through January 21, 2015 using the Twitter public streaming Application Programming Interface (API) to collect 1% of public tweets. Twitter users were classified into racial and ethnic groups using a new text mining approach applied to English-only tweets. Each ethnic group was then analyzed for frequency in cancer-related terms within user timelines, investigated for changes over time and across groups, and measured for statistical significance.
RESULTS: Observable usage patterns of the terms "cancer", "breast cancer", "prostate cancer", and "lung cancer" between Caucasian and African American groups were evident across the study period. We observed some variation in the frequency of term usage during months known to be labeled as cancer awareness months, particularly September, October, and November. Interestingly, we found that of the terms studied, "colorectal cancer" received the least Twitter attention.
CONCLUSIONS: The findings of the study provide evidence that social media can serve as a very powerful and important tool in implementing and disseminating critical prevention, screening, and treatment messages to the community in real-time. The study also introduced and tested a new methodology of identifying race and ethnicity among users of the social media. Study findings highlight the potential benefits of social media as a tool in reducing racial and ethnic disparities.
PMID: 27227152 [PubMed]
Bioinformatic Studies to Predict MicroRNAs with the Potential of Uncoupling RECK Expression from Epithelial-Mesenchymal Transition in Cancer Cells.
Cancer Inform. 2016;15:91-102
Authors: Wang Z, Murakami R, Yuki K, Yoshida Y, Noda M
Abstract
RECK is downregulated in many tumors, and forced RECK expression in tumor cells often results in suppression of malignant phenotypes. Recent findings suggest that RECK is upregulated after epithelial-mesenchymal transition (EMT) in normal epithelium-derived cells but not in cancer cells. Since several microRNAs (miRs) are known to target RECK mRNA, we hypothesized that certain miR(s) may be involved in this suppression of RECK upregulation after EMT in cancer cells. To test this hypothesis, we used three approaches: (1) text mining to find miRs relevant to EMT in cancer cells, (2) predicting miR targets using four algorithms, and (3) comparing miR-seq data and RECK mRNA data using a novel non-parametric method. These approaches identified the miR-183-96-182 cluster as a strong candidate. We also looked for transcription factors and signaling molecules that may promote cancer EMT, miR-183-96-182 upregulation, and RECK downregulation. Here we describe our methods, findings, and a testable hypothesis on how RECK expression could be regulated in cancer cells after EMT.
PMID: 27226706 [PubMed]
SWIFT-Review: a text-mining workbench for systematic review.
Syst Rev. 2016;5(1):87
Authors: Howard BE, Phillips J, Miller K, Tandon A, Mav D, Shah MR, Holmgren S, Pelch KE, Walker V, Rooney AA, Macleod M, Shah RR, Thayer K
Abstract
BACKGROUND: There is growing interest in using machine learning approaches to priority rank studies and reduce human burden in screening literature when conducting systematic reviews. In addition, identifying addressable questions during the problem formulation phase of systematic review can be challenging, especially for topics having a large literature base. Here, we assess the performance of the SWIFT-Review priority ranking algorithm for identifying studies relevant to a given research question. We also explore the use of SWIFT-Review during problem formulation to identify, categorize, and visualize research areas that are data rich/data poor within a large literature corpus.
METHODS: Twenty case studies, including 15 public data sets, representing a range of complexity and size, were used to assess the priority ranking performance of SWIFT-Review. For each study, seed sets of manually annotated included and excluded titles and abstracts were used for machine training. The remaining references were then ranked for relevance using an algorithm that considers term frequency and latent Dirichlet allocation (LDA) topic modeling. This ranking was evaluated with respect to (1) the number of studies screened in order to identify 95 % of known relevant studies and (2) the "Work Saved over Sampling" (WSS) performance metric. To assess SWIFT-Review for use in problem formulation, PubMed literature search results for 171 chemicals implicated as EDCs were uploaded into SWIFT-Review (264,588 studies) and categorized based on evidence stream and health outcome. Patterns of search results were surveyed and visualized using a variety of interactive graphics.
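The two ranking measures named above can be illustrated with a minimal sketch. It assumes the commonly used definition of Work Saved over Sampling at a recall level R, WSS@R = (N - k)/N - (1 - R), where N is the number of ranked references and k is the number that must be screened, in ranked order, to find a fraction R of the relevant ones. The function names and the toy ranking are hypothetical and are not taken from SWIFT-Review.

```python
import math


def screened_to_reach_recall(ranked_labels, target_recall=0.95):
    """Count how many top-ranked items must be screened to reach target_recall.

    ranked_labels: list of 0/1 relevance labels, best-ranked item first.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("ranking contains no relevant items")
    # Small tolerance guards against floating-point noise before rounding up.
    needed = math.ceil(target_recall * total_relevant - 1e-9)
    found = 0
    for k, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return k
    return len(ranked_labels)


def work_saved_over_sampling(ranked_labels, target_recall=0.95):
    """WSS@R = (N - k) / N - (1 - R), assuming the standard definition."""
    n = len(ranked_labels)
    k = screened_to_reach_recall(ranked_labels, target_recall)
    return (n - k) / n - (1.0 - target_recall)


# Toy ranking (hypothetical): 3 relevant items among 10 references.
toy_ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
k_95 = screened_to_reach_recall(toy_ranking)
wss_95 = work_saved_over_sampling(toy_ranking)
```

In this toy case all three relevant references are found after screening four items, so the remaining six need not be read; a random ordering would be expected to give a WSS near zero.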
RESULTS: Compared with the reported performance of other tools using the same datasets, the SWIFT-Review ranking procedure obtained the highest scores on 11 out of 15 of the public datasets. Overall, these results suggest that using machine learning to triage documents for screening has the potential to save, on average, more than 50 % of the screening effort ordinarily required when using un-ordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation.
CONCLUSIONS: Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.
PMID: 27216467 [PubMed - in process]
miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.
J Biomed Semantics. 2016;7(1):9
Authors: Gupta S, Ross KE, Tudor CO, Wu CH, Schmidt CJ, Vijay-Shanker K
Abstract
BACKGROUND: MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation.
METHODS: We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence.
RESULTS: miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD. We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. When we expanded the evaluation to include sentences with a wide range of microRNA-disease information that may be of interest to biomedical researchers, miRiaD also performed very well, with an F-score of 89.4. The informativeness ranking of sentences was evaluated in terms of nDCG (0.977) and correlation metrics (0.678-0.727) when compared to an annotator's ranked list.
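For readers unfamiliar with nDCG, the following is a minimal sketch of one common variant (linear gain, log2 rank discount). The abstract does not say which variant or which relevance grades were used, so the formula choice and the example grades here are assumptions for illustration only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """Normalize DCG by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical informativeness grades for sentences, in system-ranked order.
system_order = [3, 2, 3, 0, 1]
score = ndcg(system_order)
```

A score of 1.0 means the system ordering matches the ideal ordering of the graded sentences; swapping highly informative sentences toward the bottom of the list lowers the score.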
CONCLUSIONS: miRiaD, a high performance system that can capture a wide variety of microRNA-disease related information, extends beyond the scope of existing microRNA-disease resources. It can be incorporated into manual curation pipelines and serve as a resource for biomedical researchers interested in the role of microRNAs in disease. In our ongoing work we are developing an improved miRiaD web interface that will facilitate complex queries about microRNA-disease relationships, such as "In what diseases does microRNA regulation of apoptosis play a role?" or "Is there overlap in the sets of genes targeted by microRNAs in different types of dementia?"
PMID: 27216254 [PubMed - in process]
Improving Biochemical Named Entity Recognition Performance Using PSO Classifier Selection and Bayesian Combination Method.
IEEE/ACM Trans Comput Biol Bioinform. 2016 May 18;
Authors: Akkasi A, Varoglu E
Abstract
Named Entity Recognition (NER) is a basic step for a large number of subsequent text mining tasks in the biochemical domain. Increasing the performance of such recognition systems is of high importance and always poses a challenge. In this study, a new community based decision making system is proposed which aims at increasing the efficiency of NER systems in the chemical/drug name context. The Particle Swarm Optimization (PSO) algorithm is chosen as the expert selection strategy, along with the Bayesian combination method to merge the outputs of the selected classifiers as well as evaluate the fitness of the selected candidates. The proposed system operates in two steps. The first step focuses on creating various baseline classifiers for NER with different feature sets using Conditional Random Fields (CRFs). The second step involves the selection and efficient combination of the classifiers using PSO and Bayesian combination. Two comprehensive corpora from BioCreative events, namely ChemDNER and CEMP, are used for the experiments. Results show that the ensemble of classifiers selected by means of the proposed approach performs better than the single best classifier as well as ensembles formed using other popular selection/combination strategies for both corpora. Furthermore, the proposed method outperforms the best performing system at the BioCreative IV ChemDNER track by achieving an F-score of 87.95%.
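To make the combination step concrete, the sketch below shows a naive Bayes style combiner, one common form of Bayesian combination of classifier outputs. The abstract does not give the exact formulation, so this is an assumption rather than the authors' method, and the PSO selection step is not shown. The tag set ("B", "O"), the validation data, and all function names are hypothetical.

```python
def fit_confusion_tables(val_predictions, val_labels, classes, smoothing=1.0):
    """Estimate P(predicted tag | true tag) for each classifier.

    val_predictions: one list of predicted tags per classifier.
    val_labels: gold tags for the same validation tokens.
    Laplace smoothing avoids zero probabilities for unseen pairs.
    """
    tables = []
    for preds in val_predictions:
        counts = {t: {p: smoothing for p in classes} for t in classes}
        for pred, true in zip(preds, val_labels):
            counts[true][pred] += 1
        table = {}
        for true in classes:
            row_total = sum(counts[true].values())
            table[true] = {p: counts[true][p] / row_total for p in classes}
        tables.append(table)
    return tables


def combine(tables, priors, outputs):
    """Posterior over true tags given one output tag per classifier.

    Assumes classifier outputs are conditionally independent given the true tag.
    """
    scores = {}
    for true, prior in priors.items():
        score = prior
        for table, out in zip(tables, outputs):
            score *= table[true][out]
        scores[true] = score
    total = sum(scores.values())
    return {true: s / total for true, s in scores.items()}


# Hypothetical validation data: classifier 1 is accurate, classifier 2 is at chance.
classes = ["B", "O"]
gold = ["B", "B", "O", "O"]
tables = fit_confusion_tables([["B", "B", "O", "O"], ["B", "O", "O", "B"]], gold, classes)
posterior = combine(tables, {"B": 0.5, "O": 0.5}, ("B", "O"))
best_tag = max(posterior, key=posterior.get)
```

When the two classifiers disagree, the combined decision follows the one whose validation confusion table is more reliable, which is the behaviour an ensemble selection procedure would try to exploit.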
PMID: 27214909 [PubMed - as supplied by publisher]