Drug-induced Adverse Events

A Relation Extraction Framework for Biomedical Text Using Hybrid Feature Set.

Wed, 2016-06-15 09:02

Comput Math Methods Med. 2015;2015:910423

Authors: Muzaffar AW, Azam F, Qamar U

Abstract
Information extraction from unstructured text segments is a complex task. Although manual information extraction often produces the best results, it is harder to manage biomedical data extraction manually because of the exponential increase in data size. Thus, there is a need for automatic tools and techniques for information extraction in biomedical text mining. Relation extraction is a significant area under biomedical information extraction that has gained much importance in the last two decades. A lot of work has been done on biomedical relation extraction focusing on rule-based and machine learning techniques. In the last decade, the focus has changed to hybrid approaches showing better results. This research presents a hybrid feature set for classification of relations between biomedical entities. The main contribution of this research lies in the semantic feature set, where verb phrases are ranked using the Unified Medical Language System (UMLS) and a ranking algorithm. Support Vector Machine and Naïve Bayes, two effective machine learning techniques, are used to classify these relations. Our approach has been validated on the standard biomedical text corpus obtained from MEDLINE 2001. In conclusion, our framework outperforms all state-of-the-art approaches used for relation extraction on the same corpus.

PMID: 26347797 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Negative symptoms in schizophrenia: a study in a large clinical sample of patients using a novel automated method.

Wed, 2016-06-15 09:02

BMJ Open. 2015;5(9):e007619

Authors: Patel R, Jayatilleke N, Broadbent M, Chang CK, Foskett N, Gorrell G, Hayes RD, Jackson R, Johnston C, Shetty H, Roberts A, McGuire P, Stewart R

Abstract
OBJECTIVES: To identify negative symptoms in the clinical records of a large sample of patients with schizophrenia using natural language processing and assess their relationship with clinical outcomes.
DESIGN: Observational study using an anonymised electronic health record case register.
SETTING: South London and Maudsley NHS Trust (SLaM), a large provider of inpatient and community mental healthcare in the UK.
PARTICIPANTS: 7678 patients with schizophrenia receiving care during 2011.
MAIN OUTCOME MEASURES: Hospital admission, readmission and duration of admission.
RESULTS: 10 different negative symptoms were ascertained with precision statistics above 0.80. 41% of patients had 2 or more negative symptoms. Negative symptoms were associated with younger age, male gender and single marital status, and with increased likelihood of hospital admission (OR 1.24, 95% CI 1.10 to 1.39), longer duration of admission (β-coefficient 20.5 days, 7.6-33.5), and increased likelihood of readmission following discharge (OR 1.58, 1.28 to 1.95).
CONCLUSIONS: Negative symptoms were common and associated with adverse clinical outcomes, consistent with evidence that these symptoms account for much of the disability associated with schizophrenia. Natural language processing provides a means of conducting research in large representative samples of patients, using data recorded during routine clinical practice.

PMID: 26346872 [PubMed - indexed for MEDLINE]

DrugQuest - a text mining workflow for drug association discovery.

Tue, 2016-06-14 08:47

BMC Bioinformatics. 2016;17(Suppl 5):182

Authors: Papanikolaou N, Pavlopoulos GA, Theodosiou T, Vizirianakis IS, Iliopoulos I

Abstract
BACKGROUND: Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases.
RESULTS: Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways, diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views such as clustered chemicals based on their textual information, tag clouds consisting of Significant Terms along with the terms that were used for clustering are delivered to the user through a user-friendly web interface.
CONCLUSIONS: DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .

PMID: 27295093 [PubMed - as supplied by publisher]

TEES 2.2: Biomedical Event Extraction for Diverse Corpora.

Tue, 2016-06-14 08:47

BMC Bioinformatics. 2015;16 Suppl 16:S4

Authors: Björne J, Salakoski T

Abstract
BACKGROUND: The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks.
RESULTS: The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets.
CONCLUSIONS: The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented.

PMID: 26551925 [PubMed - indexed for MEDLINE]

Application of the EVEX resource to event extraction and network construction: Shared Task entry and result analysis.

Tue, 2016-06-14 08:47

BMC Bioinformatics. 2015;16 Suppl 16:S3

Authors: Hakala K, Van Landeghem S, Salakoski T, Van de Peer Y, Ginter F

Abstract
BACKGROUND: Modern methods for mining biomolecular interactions from literature typically make predictions based solely on the immediate textual context, in effect a single sentence. No prior work has been published on extending this context to the information automatically gathered from the whole biomedical literature. Thus, our motivation for this study is to explore whether mutually supporting evidence, aggregated across several documents can be utilized to improve the performance of the state-of-the-art event extraction systems.
RESULTS: In the GE task, our re-ranking approach led to a modest performance increase and resulted in the first rank of the official Shared Task results with 50.97% F-score. Additionally, in this paper we explore and evaluate the usage of distributed vector representations for this challenge.
CONCLUSIONS: For the GRN task, we were able to produce a gene regulatory network from the EVEX data, warranting the use of such generic large-scale text mining data in network biology settings. A detailed performance and error analysis provides more insight into the relatively low recall rates.

PMID: 26551766 [PubMed - indexed for MEDLINE]

SynLethDB: synthetic lethality database toward discovery of selective and sensitive anticancer drug targets.

Tue, 2016-06-14 08:47

Nucleic Acids Res. 2016 Jan 4;44(D1):D1011-7

Authors: Guo J, Liu H, Zheng J

Abstract
Synthetic lethality (SL) is a type of genetic interaction between two genes such that simultaneous perturbations of the two genes result in cell death or a dramatic decrease of cell viability, while a perturbation of either gene alone is not lethal. SL reflects the biologically endogenous difference between cancer cells and normal cells, and thus the inhibition of SL partners of genes with cancer-specific mutations could selectively kill cancer cells but spare normal cells. Therefore, SL is emerging as a promising anticancer strategy that could potentially overcome the drawbacks of traditional chemotherapies by reducing severe side effects. Researchers have developed experimental technologies and computational prediction methods to identify SL gene pairs on human and a few model species. However, there has not been a comprehensive database dedicated to collecting SL pairs and related knowledge. In this paper, we propose a comprehensive database, SynLethDB (http://histone.sce.ntu.edu.sg/SynLethDB/), which contains SL pairs collected from biochemical assays, other related databases, computational predictions and text mining results on human and four model species, i.e. mouse, fruit fly, worm and yeast. For each SL pair, a confidence score was calculated by integrating individual scores derived from different evidence sources. We also developed a statistical analysis module to estimate the druggability and sensitivity of cancer cells upon drug treatments targeting human SL partners, based on large-scale genomic data, gene expression profiles and drug sensitivity profiles on more than 1000 cancer cell lines. To help users access and mine the wealth of the data, we developed other practical functionalities, such as search and filtering, orthology search, and gene set enrichment analysis. Furthermore, a user-friendly web interface has been implemented to facilitate data analysis and interpretation. With the integrated data sets and analytics functionalities, SynLethDB would be a useful resource for the biomedical research community and the pharmaceutical industry.

PMID: 26516187 [PubMed - indexed for MEDLINE]

TaggerOne: Joint Named Entity Recognition and Normalization with Semi-Markov Models.

Sat, 2016-06-11 07:56

Bioinformatics. 2016 Jun 9;

Authors: Leaman R, Lu Z

Abstract
MOTIVATION: Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization.
METHODS: We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput.
RESULTS: We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance.
AVAILABILITY: TaggerOne will be made open source upon acceptance. Demonstration available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/demo/TaggerOne/demo.cgi
CONTACT: zhiyong.lu@nih.gov
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID: 27283952 [PubMed - as supplied by publisher]

Mining clinical attributes of genomic variants through assisted literature curation in Egas.

Fri, 2016-06-10 07:43

Database (Oxford). 2016;2016

Authors: Matos S, Campos D, Pinho R, Silva RM, Mort M, Cooper DN, Oliveira JL

Abstract
The veritable deluge of biological data over recent years has led to the establishment of a considerable number of knowledge resources that compile curated information extracted from the literature and store it in structured form, facilitating its use and exploitation. In this article, we focus on the curation of inherited genetic variants and associated clinical attributes, such as zygosity, penetrance or inheritance mode, and describe the use of Egas for this task. Egas is a web-based platform for text-mining assisted literature curation that focuses on usability through modern design solutions and simple user interactions. Egas offers a flexible and customizable tool that allows defining the concept types and relations of interest for a given annotation task, as well as the ontologies used for normalizing each concept type. Further, annotations may be performed on raw documents or on the results of automated concept identification and relation extraction tools. Users can inspect, correct or remove automatic text-mining results, manually add new annotations, and export the results to standard formats. Egas is compatible with the most recent versions of Google Chrome, Mozilla Firefox, Internet Explorer and Safari and is available for use at https://demo.bmd-software.com/egas/.
Database URL: https://demo.bmd-software.com/egas/

PMID: 27278817 [PubMed - in process]

Overlap in drug-disease associations between clinical practice guidelines and drug structured product label indications.

Fri, 2016-06-10 07:43

J Biomed Semantics. 2016;7:37

Authors: Leung TI, Dumontier M

Abstract
BACKGROUND: Clinical practice guidelines (CPGs) recommend pharmacologic treatments for clinical conditions, and drug structured product labels (SPLs) summarize approved treatment indications. Both resources are intended to promote evidence-based medical practices and guide clinicians' prescribing decisions. However, it is unclear how well CPG recommendations about pharmacologic therapies match SPL indications for recommended drugs. In this study, we perform text mining of CPG summaries to examine drug-disease associations in CPG recommendations and in SPL treatment indications for 15 common chronic conditions.
METHODS: We constructed an initial text corpus of guideline summaries from the National Guideline Clearinghouse (NGC) from a set of manually selected ICD-9 codes for each of the 15 conditions. We obtained 377 relevant guideline summaries and their Major Recommendations section, which excludes guidelines for pediatric patients, pregnant or breastfeeding women, or for medical diagnoses not meeting inclusion criteria. A vocabulary of drug terms was derived from five medical taxonomies. We used named entity recognition, in combination with dictionary-based and ontology-based methods, to identify drug term occurrences in the text corpus and construct drug-disease associations. The ATC (Anatomical Therapeutic Chemical Classification) was utilized to perform drug name and drug class matching to construct the drug-disease associations from CPGs. We then obtained drug-disease associations from SPLs using conditions mentioned in their Indications section in SIDER. The primary outcomes were the frequency of drug-disease associations in CPGs and SPLs, and the frequency of overlap between the two sets of drug-disease associations, with and without using taxonomic information from ATC.
RESULTS: Without taxonomic information, we identified 1444 drug-disease associations across CPGs and SPLs for 15 common chronic conditions. Of these, 195 drug-disease associations overlapped between CPGs and SPLs, 917 associations occurred in CPGs only and 332 associations occurred in SPLs only. With taxonomic information, 859 unique drug-disease associations were identified, of which 152 of these drug-disease associations overlapped between CPGs and SPLs, 541 associations occurred in CPGs only, and 166 associations occurred in SPLs only.
CONCLUSIONS: Our results suggest that CPG-recommended pharmacologic therapies and SPL indications do not overlap frequently when identifying drug-disease associations using named entity recognition, although incorporating taxonomic relationships between drug names and drug classes into the approach improves the overlap. This has important implications in practice because conflicting or inconsistent evidence may complicate clinical decision making and implementation or measurement of best practices.
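The counts reported in RESULTS can be checked with simple arithmetic: the three disjoint groups (both sources, CPGs only, SPLs only) should sum to the stated totals, and the overlapping share follows directly. The Python sketch below does this; the function name overlap_share is illustrative and does not come from the paper, and the percentage it prints is derived here rather than reported by the authors.

```python
def overlap_share(both, cpg_only, spl_only):
    """Return (total associations, fraction of associations found in both sources)."""
    total = both + cpg_only + spl_only
    return total, both / total

# Counts copied from the RESULTS paragraph of the abstract.
without_atc = overlap_share(both=195, cpg_only=917, spl_only=332)
with_atc = overlap_share(both=152, cpg_only=541, spl_only=166)

for label, (total, share) in [("without ATC", without_atc), ("with ATC", with_atc)]:
    print(f"{label}: total={total}, overlap={100 * share:.1f}%")
```

The totals reproduce the 1444 and 859 associations stated in the abstract, and the overlapping share rises from roughly 13.5% to roughly 17.7% once ATC class matching is used, which is consistent with the conclusion that taxonomic information improves the overlap.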

PMID: 27277160 [PubMed - in process]

Systematic Analysis of Endometrial Cancer-Associated Hub Proteins Based on Text Mining.

Fri, 2016-06-10 07:43

Biomed Res Int. 2015;2015:615825

Authors: Gao H, Zhang Z

Abstract
OBJECTIVE: The aim of this study was to systematically characterize the expression of endometrial cancer- (EC-) associated genes and to analyze the functions, pathways, and networks of EC-associated hub proteins.
METHODS: Gene data for EC were extracted from the PubMed (MEDLINE) database using text mining based on NLP. PPI networks and pathways were integrated and obtained from the KEGG and other databases. Proteins that interacted with at least 10 other proteins were identified as the hub proteins of the EC-related genes network.
RESULTS: A total of 489 genes were identified as EC-related with P < 0.05, and 32 pathways were identified as significant (P < 0.05, FDR < 0.05). A network of EC-related proteins that included 271 interactions was constructed. The 17 proteins that interact with 10 or more other proteins (P < 0.05, FDR < 0.05) were identified as the hub proteins of this PPI network of EC-related genes. These 17 proteins are EGFR, MET, PDGFRB, CCND1, JUN, FGFR2, MYC, PIK3CA, PIK3R1, PIK3R2, KRAS, MAPK3, CTNNB1, RELA, JAK2, AKT1, and AKT2.
CONCLUSION: Our data may help to reveal the molecular mechanisms of EC development and provide implications for targeted therapy for EC. However, correlations between certain proteins and EC continue to require additional exploration.
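The hub criterion used in METHODS (a protein that interacts with at least 10 other proteins) amounts to a degree threshold on the interaction network. A minimal Python sketch of that rule follows; the function find_hubs, the min_partners parameter name, and the toy edge list are assumptions made for illustration and are not the authors' pipeline or data.

```python
from collections import defaultdict

def find_hubs(interactions, min_partners=10):
    """Return proteins with at least `min_partners` distinct interaction partners."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        partners[a].add(b)
        partners[b].add(a)
    return sorted(p for p, others in partners.items() if len(others) >= min_partners)

# Toy network (hypothetical data): HUB1 has 10 partners, HUB2 only 3.
toy_edges = [("HUB1", f"P{i}") for i in range(10)]
toy_edges += [("HUB2", f"P{i}") for i in range(3)]

hubs = find_hubs(toy_edges)
print(hubs)
```

Storing partners in sets means a repeated edge is counted once, so the threshold applies to distinct partners rather than to the number of reported interactions.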

PMID: 26366417 [PubMed - indexed for MEDLINE]

Passage-Based Bibliographic Coupling: An Inter-Article Similarity Measure for Biomedical Articles.

Thu, 2016-06-09 16:35

PLoS One. 2015;10(10):e0139245

Authors: Liu RL

Abstract
Biomedical literature is an essential source of biomedical evidence. To translate the evidence for biomedicine study, researchers often need to carefully read multiple articles about specific biomedical issues. These articles thus need to be highly related to each other. They should share similar core contents, including research goals, methods, and findings. However, given an article r, it is challenging for search engines to retrieve highly related articles for r. In this paper, we present a technique PBC (Passage-based Bibliographic Coupling) that estimates inter-article similarity by seamlessly integrating bibliographic coupling with the information collected from context passages around important out-link citations (references) in each article. Empirical evaluation shows that PBC can significantly improve the retrieval of those articles that biomedical experts believe to be highly related to specific articles about gene-disease associations. PBC can thus be used to improve search engines in retrieving the highly related articles for any given article r, even when r is cited by very few (or even no) articles. The contribution is essential for those researchers and text mining systems that aim at cross-validating the evidence about specific gene-disease associations.
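For orientation, classical bibliographic coupling scores two articles by the references they share. The Python sketch below shows only that baseline, plus a Jaccard-normalized variant chosen here as an assumption; it does not implement the passage-based weighting that distinguishes PBC, and the function names and reference identifiers are hypothetical.

```python
def coupling_strength(refs_a, refs_b):
    """Classical bibliographic coupling: number of references cited by both articles."""
    return len(set(refs_a) & set(refs_b))

def normalized_coupling(refs_a, refs_b):
    """Shared references divided by all distinct references (Jaccard index)."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical reference lists for two articles.
article_1 = ["ref1", "ref2", "ref3"]
article_2 = ["ref2", "ref3", "ref4"]

print(coupling_strength(article_1, article_2))
print(normalized_coupling(article_1, article_2))
```

Because both measures depend only on each article's own reference list, they can be computed for an article that has not yet been cited, which matches the abstract's point about articles cited by very few or no others.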

PMID: 26440794 [PubMed - indexed for MEDLINE]

Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases.

Thu, 2016-06-09 16:35

Sci Rep. 2015;5:10888

Authors: Hoehndorf R, Schofield PN, Gkoutos GV

Abstract
Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.

PMID: 26051359 [PubMed - indexed for MEDLINE]

Xenbase: Core features, data acquisition, and data processing.

Sat, 2016-06-04 06:07

Genesis. 2015 Aug;53(8):486-97

Authors: James-Zorn C, Ponferrada VG, Burns KA, Fortriede JD, Lotay VS, Liu Y, Brad Karpinka J, Karimi K, Zorn AM, Vize PD

Abstract
Xenbase, the Xenopus model organism database (www.xenbase.org), is a cloud-based, web-accessible resource that integrates the diverse genomic and biological data from Xenopus research. Xenopus frogs are one of the major vertebrate animal models used for biomedical research, and Xenbase is the central repository for the enormous amount of data generated using this model tetrapod. The goal of Xenbase is to accelerate discovery by enabling investigators to make novel connections between molecular pathways in Xenopus and human disease. Our relational database and user-friendly interface make these data easy to query and allow investigators to quickly interrogate and link different data types in ways that would otherwise be difficult, time consuming, or impossible. Xenbase also enhances the value of these data through high-quality gene expression curation and data integration, by providing bioinformatics tools optimized for Xenopus experiments, and by linking Xenopus data to other model organisms and to human data. Xenbase draws in data via pipelines that download data, parse the content, and save them into appropriate files and database tables. Furthermore, Xenbase makes these data accessible to the broader biomedical community by continually providing annotated data updates to organizations such as NCBI, UniProtKB, and Ensembl. Here, we describe our bioinformatics, genome-browsing tools, data acquisition and sharing, our community submitted and literature curation pipelines, text-mining support, gene page features, and the curation of gene nomenclature and gene models.

PMID: 26150211 [PubMed - indexed for MEDLINE]

Systematic analysis of the molecular mechanism underlying atherosclerosis using a text mining approach.

Fri, 2016-06-03 08:52

Hum Genomics. 2016;10(1):14

Authors: Xi D, Zhao J, Lai W, Guo Z

Abstract
BACKGROUND: Atherosclerosis is one of the common health threats all over the world. It is a complex heritable disease that affects arterial blood vessels. Chronic inflammatory response plays an important role in atherogenesis. There has been little success in fully identifying functionally important genes in the pathogenesis of atherosclerosis.
RESULTS: In the present study, we performed a systematic analysis of atherosclerosis-related genes using text mining. We identified a total of 1312 genes. Gene ontology (GO) analysis revealed that a total of 35 terms exhibited significance (p < 0.05) as overrepresented terms, indicating that atherosclerosis invokes many genes with a wide range of different functions. Pathway analysis demonstrated that the most highly enriched pathway is the Toll-like receptor signaling pathway. Finally, through gene network analysis, we prioritized 48 genes using the hub gene method.
CONCLUSIONS: Our study provides a valuable resource for the in-depth understanding of the mechanism underlying atherosclerosis.

PMID: 27251057 [PubMed - in process]

On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions.

Thu, 2016-06-02 08:35

J Biomed Inform. 2015 Aug;56:318-32

Authors: Oronoz M, Gojenola K, Pérez A, de Ilarraza AD, Casillas A

Abstract
The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.

PMID: 26141794 [PubMed - indexed for MEDLINE]

Identifying synonymy between relational phrases using word embeddings.

Thu, 2016-06-02 08:35

J Biomed Inform. 2015 Aug;56:94-102

Authors: Nguyen NT, Miwa M, Tsuruoka Y, Tojo S

Abstract
Many text mining applications in the biomedical domain benefit from automatic clustering of relational phrases into synonymous groups, since it alleviates the problem of spurious mismatches caused by the diversity of natural language expressions. Most of the previous work that has addressed this task of synonymy resolution uses similarity metrics between relational phrases based on textual strings or dependency paths, which, for the most part, ignore the context around the relations. To overcome this shortcoming, we employ a word embedding technique to encode relational phrases. We then apply the k-means algorithm on top of the distributional representations to cluster the phrases. Our experimental results show that this approach outperforms state-of-the-art statistical models including latent Dirichlet allocation and Markov logic networks.
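The clustering step described above (k-means applied to vector representations of relational phrases) can be sketched in plain Python. Everything concrete here is an assumption for illustration: the two-dimensional vectors are made up rather than learned embeddings, the phrases are invented examples, and the initial centroids are fixed by hand so the run is deterministic.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Mean of each cluster; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return [nearest(p, centroids) for p in points], centroids

# Hypothetical 2-D "embeddings" for four relational phrases.
vectors = {
    "inhibits": (0.9, 0.1),
    "suppresses": (0.8, 0.2),
    "activates": (0.1, 0.9),
    "induces": (0.2, 0.8),
}
points = list(vectors.values())
labels, final_centroids = kmeans(points, centroids=[points[0], points[2]])
print(dict(zip(vectors, labels)))
```

In this toy run the two inhibition-like phrases share one label and the two activation-like phrases share the other, which is the kind of synonym grouping the abstract describes.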

PMID: 26004792 [PubMed - indexed for MEDLINE]

Automatic endpoint detection to support the systematic review process.

Thu, 2016-06-02 08:35

J Biomed Inform. 2015 Aug;56:42-56

Authors: Blake C, Lucic A

Abstract
Preparing a systematic review can take hundreds of hours to complete, but the process of reconciling different results from multiple studies is the bedrock of evidence-based medicine. We introduce a two-step approach to automatically extract three facets - two entities (the agent and object) and the way in which the entities are compared (the endpoint) - from direct comparative sentences in full-text articles. The system does not require a user to predefine entities and thus can be used in domains where entity recognition is difficult or unavailable. As with a systematic review, the tabular summary produced using the automatically extracted facets shows how experimental results differ between studies. Experiments were conducted using a collection of more than 2 million sentences from three journals (Diabetes, Carcinogenesis and Endocrinology) and two machine learning algorithms, support vector machines (SVM) and a general linear model (GLM). F1 and accuracy measures for the SVM and GLM differed by only 0.01 across all three comparison facets in a randomly selected set of test sentences. The system achieved the best performance of 92% for objects, whereas the accuracy for both agent and endpoints was 73%. F1 scores were higher for objects (0.77) than for endpoints (0.51) or agents (0.47). A situated evaluation of Metformin, a drug to treat diabetes, showed system accuracy of 95%, 83% and 79% for the object, endpoint and agent respectively. The situated evaluation had higher F1 scores of 0.88, 0.64 and 0.62 for object, endpoint, and agent respectively. On average, only 5.31% of the sentences in a full-text article are direct comparisons, but the tabular summaries suggest that these sentences provide a rich source of currently underutilized information that can be used to accelerate the systematic review process and identify gaps where future research should be focused.
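The abstract reports accuracy and F1 side by side, and the two can differ substantially when the positive class is rare. The Python sketch below computes both from a confusion matrix using the standard definitions; the counts are hypothetical and are not taken from the study.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion-matrix counts with many true negatives.
tp, fp, fn, tn = 10, 5, 5, 80

print(f"accuracy = {accuracy(tp, fp, fn, tn):.2f}")
print(f"F1       = {f1_score(tp, fp, fn):.2f}")
```

With these counts accuracy is 0.90 while F1 is about 0.67, because true negatives raise accuracy but do not enter the F1 calculation.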

PMID: 26003938 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

MET network in PubMed: a text-mined network visualization and curation system.

Wed, 2016-06-01 08:22

Database (Oxford). 2016;2016

Authors: Dai HJ, Su CH, Lai PT, Huang MS, Jonnagaddala J, Rose Jue T, Rao S, Chou HJ, Milacic M, Singh O, Syed-Abdul S, Hsu WL

Abstract
Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to help metastasis researchers retrieve network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks from PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway.

PMID: 27242035 [PubMed - in process]

Categories: Literature Watch

Comparative proteomics analysis of the antitumor effect of CIGB-552 peptide in HT-29 colon adenocarcinoma cells.

Tue, 2016-05-31 08:07

J Proteomics. 2015 Aug 3;126:163-71

Authors: Núñez de Villavicencio-Díaz T, Ramos Gómez Y, Oliva Argüelles B, Fernández Masso JR, Rodríguez-Ulloa A, Cruz García Y, Guirola-Cruz O, Perez-Riverol Y, Javier González L, Tiscornia I, Victoria S, Bollati-Fogolín M, Besada Pérez V, Guerra Vallespi M

Abstract
The second-generation peptide CIGB-552 has a pro-apoptotic effect on H460 non-small cell lung cancer cells and displays a potent cytotoxic effect in HT-29 colon adenocarcinoma cells, although its mechanism of action is ill defined. Here, we present the first proteomic study of the peptide's effect in HT-29 cells using subcellular fractionation, protein and peptide fractionation by DF-PAGE and LC-MS/MS peptide identification. In particular, we explored the nuclear proteome of HT-29 cells after 5 h of treatment, identifying a total of 68 differentially modulated proteins, 49 of which localize to the nucleus. The differentially modulated proteins were analyzed following a systems biology approach. Results pointed to a modulation of apoptosis, oxidative damage removal, NF-κB activation, inflammatory signaling, and cell adhesion and motility. Further Western blot and flow-cytometry experiments confirmed both pro-apoptotic and anti-inflammatory effects of CIGB-552 peptide in HT-29 cells.

PMID: 26013411 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Leveraging Social Media to Promote Public Health Knowledge: Example of Cancer Awareness via Twitter.

Fri, 2016-05-27 07:02

JMIR Public Health Surveill. 2016 Jan-Jun;2(1):e17

Authors: Xu S, Markson C, Costello KL, Xing CY, Demissie K, Llanos AA

Abstract
BACKGROUND: As social media become increasingly popular online venues for engaging in communication about public health issues, it is important to understand how users promote knowledge and awareness about specific topics.
OBJECTIVE: The aim of this study is to examine how frequently unique Twitter users discuss cancer-related topics and how that frequency differs by race and ethnicity.
METHODS: Tweets were collected from April 1, 2014 through January 21, 2015 using the Twitter public streaming Application Programming Interface (API) to collect 1% of public tweets. Twitter users were classified into racial and ethnic groups using a new text mining approach applied to English-only tweets. Each ethnic group was then analyzed for frequency of cancer-related terms within user timelines, investigated for changes over time and across groups, and measured for statistical significance.
RESULTS: Observable usage patterns of the terms "cancer", "breast cancer", "prostate cancer", and "lung cancer" between Caucasian and African American groups were evident across the study period. We observed some variation in the frequency of term usage during months known to be labeled as cancer awareness months, particularly September, October, and November. Interestingly, we found that of the terms studied, "colorectal cancer" received the least Twitter attention.
CONCLUSIONS: The findings of the study provide evidence that social media can serve as a powerful tool for implementing and disseminating critical prevention, screening, and treatment messages to the community in real time. The study also introduced and tested a new methodology for identifying race and ethnicity among social media users. Study findings highlight the potential benefits of social media as a tool in reducing racial and ethnic disparities.

PMID: 27227152 [PubMed]

Categories: Literature Watch
