Drug-induced Adverse Events

Supporting systematic reviews using LDA-based document representations.
Syst Rev. 2015;4:172
Authors: Mo Y, Kontonatsios G, Ananiadou S
Abstract
BACKGROUND: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW).
METHODS: We explore the use of topic modelling methods to derive a more informative representation of studies. We apply Latent Dirichlet allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution of LDA topics. Additionally, we enrich topics derived using LDA with multi-word terms identified by using an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers that employ both our novel topic-based representation and the BOW representation.
RESULTS: Our results show that the SVM classifier is able to identify a greater number of relevant studies when using the LDA representation than the BOW representation. These observations hold for two systematic reviews of the clinical domain and three reviews of the social science domain.
CONCLUSIONS: A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers.
PMID: 26612232 [PubMed - indexed for MEDLINE]
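The contrast between the two document representations in this abstract can be illustrated with a minimal sketch. It does not fit an LDA model: the word-to-topic map `WORD_TOPIC`, the two topics, and the example sentence are all hypothetical stand-ins, and the hard word-to-topic assignment is a simplification of the soft topic distributions that LDA infers.

```python
from collections import Counter

# Hypothetical word-to-topic map standing in for a fitted LDA model
# (assumption: a real model would give soft, per-document topic weights).
WORD_TOPIC = {
    "trial": 0, "randomized": 0, "placebo": 0,
    "gene": 1, "protein": 1, "expression": 1,
}
N_TOPICS = 2


def bow_vector(text, vocabulary):
    """Bag-of-words: one count per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def topic_vector(text):
    """Topic representation: share of mapped words falling in each topic."""
    totals = [0] * N_TOPICS
    for word in text.lower().split():
        if word in WORD_TOPIC:
            totals[WORD_TOPIC[word]] += 1
    n = sum(totals)
    # Fall back to a uniform distribution when no word is mapped.
    return [t / n for t in totals] if n else [1 / N_TOPICS] * N_TOPICS


doc = "Randomized placebo trial of gene expression"
vocab = sorted(WORD_TOPIC)
print(bow_vector(doc, vocab))   # one dimension per vocabulary word
print(topic_vector(doc))        # one dimension per topic
```

The topic vector has as many dimensions as there are topics, regardless of vocabulary size, which is the kind of compact representation the abstract compares against BOW.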
Textual Analysis and Data Mining: An Interpreting Research on Nursing.
Stud Health Technol Inform. 2016;225:948
Authors: De Caro W, Mitello L, Marucci AR, Lancia L, Sansoni J
Abstract
Every day there is a data explosion on the web. In 2013, 5 exabytes of content were created each day. Every hour, internet networks carry a quantity of text equivalent to twenty billion books. It is a huge mass of information on the linguistic behavior of people and society that was unthinkable until a few years ago. It is an opportunity for valuable analysis to understand social phenomena, including in the nursing and health care sector. This poster shows the steps of an ideal strategy for textual statistical analysis and the process of extracting useful information about health care, especially nursing care, from journal and web information. We show the potential of web-based text mining tools (DTM, Wordle, Voyant Tools, Taltac 2.10, Treecloud and other web 2.0 apps) for analyzing text data and extracting information about sentiment, perception, scientific activities and visibility of nursing. This specific analysis was conducted on "Repubblica", the first newspaper in Italy (years of analysis: 2012-14), and one Italian scientific nursing journal (years: 2012-14).
PMID: 27332424 [PubMed - as supplied by publisher]
Application of Text Mining in Cancer Symptom Management.
Stud Health Technol Inform. 2016;225:930-931
Authors: Lee YJ, Donovan H
Abstract
Fatigue continues to be one of the main symptoms that afflict ovarian cancer patients and negatively affects their functional status and quality of life. To manage fatigue effectively, the symptom must be understood from the perspective of patients. We utilized text mining to understand the symptom experiences and strategies that were associated with fatigue among ovarian cancer patients. Through text analysis, we determined that descriptors such as energetic, challenging, frustrating, struggling, unmanageable, and agony were associated with fatigue. Descriptors such as decadron, encourager, grocery, massage, relaxing, shower, sleep, zoloft, and church were associated with strategies to ameliorate fatigue. This study demonstrates the potential of applying text mining in cancer research to understand patients' perspective on symptom management. Future study will consider various factors to refine the results.
PMID: 27332415 [PubMed - as supplied by publisher]
Mining Clinicians' Electronic Documentation to Identify Heart Failure Patients with Ineffective Self-Management: A Pilot Text-Mining Study.
Stud Health Technol Inform. 2016;225:856-857
Authors: Topaz M, Radhakrishnan K, Lei V, Zhou L
Abstract
Effective self-management can decrease up to 50% of heart failure hospitalizations. Unfortunately, self-management by patients with heart failure remains poor. This pilot study aimed to explore the use of text-mining to identify heart failure patients with ineffective self-management. We first built a comprehensive self-management vocabulary based on the literature and clinical notes review. We then randomly selected 545 heart failure patients treated within Partners Healthcare hospitals (Boston, MA, USA) and conducted a regular expression search with the compiled vocabulary within 43,107 interdisciplinary clinical notes of these patients. We found that 38.2% (n = 208) of patients had documentation of ineffective heart failure self-management in the domains of poor diet adherence (28.4%), missed medical encounters (26.4%), poor medication adherence (20.2%) and non-specified self-management issues (e.g., "compliance issues", 34.6%). We showed the feasibility of using text-mining to identify patients with ineffective self-management. More natural language processing algorithms are needed to help busy clinicians identify these patients.
PMID: 27332377 [PubMed - as supplied by publisher]
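A vocabulary-driven regular expression search of the kind this abstract describes might look like the sketch below. The patterns, domain names, patient identifiers, and note texts are all hypothetical; the study's actual vocabulary is not given in the abstract.

```python
import re

# Hypothetical vocabulary grouped by self-management domain
# (assumption: illustrative patterns only, not the study's term list).
VOCABULARY = {
    "diet": [r"dietary indiscretion", r"non-?adheren\w+ (?:to|with) diet"],
    "medication": [r"missed (?:doses?|medications?)",
                   r"non-?complian\w+ with medications?"],
    "encounters": [r"no[- ]show", r"missed appointments?"],
}

# One compiled, case-insensitive alternation per domain.
PATTERNS = {
    domain: re.compile("|".join(patterns), re.IGNORECASE)
    for domain, patterns in VOCABULARY.items()
}


def flag_domains(note):
    """Return the set of domains whose vocabulary matches the note."""
    return {domain for domain, pat in PATTERNS.items() if pat.search(note)}


def patients_flagged(notes_by_patient):
    """Map each patient with at least one match to the matched domains."""
    flagged = {}
    for patient_id, notes in notes_by_patient.items():
        domains = set()
        for note in notes:
            domains |= flag_domains(note)
        if domains:
            flagged[patient_id] = domains
    return flagged


notes = {
    "p1": ["Pt admits dietary indiscretion over the holidays.",
           "No-show for cardiology follow-up."],
    "p2": ["Taking medications as prescribed; weight stable."],
    "p3": ["Reports missed doses of furosemide."],
}
flagged = patients_flagged(notes)
share = len(flagged) / len(notes)
print(flagged, f"{share:.1%}")
```

The share is computed per patient rather than per note, matching how the abstract reports its headline figure: 208 of 545 patients is 38.2%.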
Using a Text-Mining Approach to Evaluate the Quality of Nursing Records.
Stud Health Technol Inform. 2016;225:813-814
Authors: Chang HM, Chiou SF, Liu HY, Yu HC
Abstract
Nursing records in Taiwan have been computerized, but their quality has rarely been discussed. Therefore, this study employed a text-mining approach and a cross-sectional retrospective research design to evaluate the quality of electronic nursing records at a medical center in Northern Taiwan. SAS Text Miner software Version 13.2 was employed to analyze unstructured nursing event records. The results show that SAS Text Miner is suitable for developing a text-mining model for validating nursing records. The sensitivity of SAS Text Miner was approximately 0.94, and the specificity and accuracy were 0.99. Thus, SAS Text Miner software is an effective tool for auditing unstructured electronic nursing records.
PMID: 27332355 [PubMed - as supplied by publisher]
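For readers unfamiliar with the three measures reported in this abstract, the following sketch shows how they are derived from a confusion matrix. The counts are invented for illustration and are not the study's data.

```python
def audit_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # share of true problems detected
    specificity = tn / (tn + fp)          # share of sound records passed
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts (assumption: not taken from the paper).
sens, spec, acc = audit_metrics(tp=45, fp=10, tn=90, fn=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```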
A Survey of Bioinformatics Database and Software Usage through Mining the Literature.
PLoS One. 2016;11(6):e0157989
Authors: Duck G, Nenadic G, Filannino M, Brass A, Robertson DL, Stevens R
Abstract
Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever-expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.
PMID: 27331905 [PubMed - as supplied by publisher]
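The two imbalance statistics quoted in this abstract (share of usage held by the top fraction of names, and share of names mentioned only once) can be computed from a list of extracted mentions roughly as follows. The resource names and counts are placeholders, and the rounding rule used to pick the top fraction is an assumption.

```python
from collections import Counter


def usage_concentration(mentions, top_fraction):
    """Return (usage share of the top names, share of names seen once)."""
    ranked = sorted(Counter(mentions).values(), reverse=True)
    # Assumption: the top group is the nearest whole number of names, minimum 1.
    k = max(1, round(len(ranked) * top_fraction))
    top_share = sum(ranked[:k]) / sum(ranked)
    singleton_share = sum(1 for c in ranked if c == 1) / len(ranked)
    return top_share, singleton_share


# Placeholder mentions (assumption: toy data, not from the study's dataset).
mentions = ["res_a"] * 6 + ["res_b"] * 2 + ["res_c", "res_d"]
top_share, singleton_share = usage_concentration(mentions, top_fraction=0.25)
print(f"top 25% of names: {top_share:.0%} of usage; "
      f"{singleton_share:.0%} of names mentioned once")
```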
GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains.
Biomed Res Int. 2015;2015:918710
Authors: Wei CH, Kao HY, Lu Z
Abstract
The automatic recognition of gene names and their associated database identifiers from biomedical text has been widely studied in recent years, as these tasks play an important role in many downstream text-mining applications. Despite significant previous research, only a small number of tools are publicly available and these tools are typically restricted to detecting only mention level gene names or only document level gene identifiers. In this work, we report GNormPlus: an end-to-end and open source system that handles both gene mention and identifier detection. We created a new corpus of 694 PubMed articles to support our development of GNormPlus, containing manual annotations for not only gene names and their identifiers, but also closely related concepts useful for gene name disambiguation, such as gene families and protein domains. GNormPlus integrates several advanced text-mining techniques, including SimConcept for resolving composite gene names. As a result, GNormPlus compares favorably to other state-of-the-art methods when evaluated on two widely used public benchmarking datasets, achieving 86.7% F1-score on the BioCreative II Gene Normalization task dataset and 50.1% F1-score on the BioCreative III Gene Normalization task dataset. The GNormPlus source code and its annotated corpus are freely available, and the results of applying GNormPlus to the entire PubMed are freely accessible through our web-based tool PubTator.
PMID: 26380306 [PubMed - indexed for MEDLINE]
Identification and Progression of Heart Disease Risk Factors in Diabetic Patients from Longitudinal Electronic Health Records.
Biomed Res Int. 2015;2015:636371
Authors: Jonnagaddala J, Liaw ST, Ray P, Kumar M, Dai HJ, Hsu CY
Abstract
Heart disease is the leading cause of death worldwide. Therefore, assessing the risk of its occurrence is a crucial step in predicting serious cardiac events. Identifying heart disease risk factors and tracking their progression is a preliminary step in heart disease risk assessment. A large number of studies have reported the use of risk factor data collected prospectively. Electronic health record systems are a great resource of the required risk factor data. Unfortunately, most of the valuable information on risk factor data is buried in the form of unstructured clinical notes in electronic health records. In this study, we present an information extraction system to extract related information on heart disease risk factors from unstructured clinical notes using a hybrid approach. The hybrid approach employs both machine learning and rule-based clinical text mining techniques. The developed system achieved an overall microaveraged F-score of 0.8302.
PMID: 26380290 [PubMed - indexed for MEDLINE]
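The micro-averaged F-score reported in this abstract pools true positives, false positives and false negatives over all documents before computing precision and recall. A minimal sketch, with invented gold and predicted risk-factor labels:

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over parallel lists of label sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels correctly extracted
        fp += len(p - g)   # labels extracted but not in the gold standard
        fn += len(g - p)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for three clinical notes (assumption: toy data).
gold = [{"hypertension", "diabetes"}, {"smoker"}, {"obesity", "hyperlipidemia"}]
pred = [{"hypertension"}, {"smoker", "cad"}, {"obesity", "hyperlipidemia"}]
score = micro_f1(gold, pred)
print(f"micro F1 = {score:.4f}")
```

Because counts are pooled, notes with many labels weigh more than notes with few, which distinguishes micro-averaging from macro-averaging.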
Text Mining for Translational Bioinformatics.
Biomed Res Int. 2015;2015:368264
Authors: Dai HJ, Wei CH, Kao HY, Liu RL, Tsai RT, Lu Z
PMID: 26380272 [PubMed - indexed for MEDLINE]
TRRUST: a reference database of human transcriptional regulatory interactions.
Sci Rep. 2015;5:11432
Authors: Han H, Shim H, Shin D, Shim JE, Ko Y, Shin J, Kim H, Cho A, Kim E, Lee T, Kim H, Kim K, Yang S, Bae D, Yun A, Kim S, Kim CY, Cho HJ, Kang B, Shin S, Lee I
Abstract
The reconstruction of transcriptional regulatory networks (TRNs) is a long-standing challenge in human genetics. Numerous computational methods have been developed to infer regulatory interactions between human transcriptional factors (TFs) and target genes from high-throughput data, and their performance evaluation requires gold-standard interactions. Here we present a database of literature-curated human TF-target interactions, TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust), which currently contains 8,015 interactions between 748 TF genes and 1,975 non-TF genes. A sentence-based text-mining approach was employed for efficient manual curation of regulatory interactions from approximately 20 million Medline abstracts. To the best of our knowledge, TRRUST is the largest publicly available database of literature-curated human TF-target interactions to date. TRRUST also has several useful features: i) information about the mode-of-regulation; ii) tests for target modularity of a query TF; iii) tests for TF cooperativity of a query target; iv) inferences about cooperating TFs of a query TF; and v) prioritizing associated pathways and diseases with a query TF. We observed high enrichment of TF-target pairs in TRRUST for top-scored interactions inferred from high-throughput data, which suggests that TRRUST provides a reliable benchmark for the computational reconstruction of human TRNs.
PMID: 26066708 [PubMed - indexed for MEDLINE]
Using Literature-Based Discovery to Explain Adverse Drug Effects.
J Med Syst. 2016 Aug;40(8):185
Authors: Hristovski D, Kastrin A, Dinevski D, Burgun A, Žiberna L, Rindflesch TC
Abstract
We report on our research in using literature-based discovery (LBD) to provide pharmacological and/or pharmacogenomic explanations for reported adverse drug effects. The goal of LBD is to generate novel and potentially useful hypotheses by analyzing the scientific literature and optionally some additional resources. Our assumption is that drugs have effects on some genes or proteins and that these genes or proteins are associated with the observed adverse effects. Therefore, by using LBD we try to find genes or proteins that link the drugs with the reported adverse effects. These genes or proteins can be used to provide insight into the processes causing the adverse effects. Initial results show that our method has the potential to assist in explaining reported adverse drug effects.
PMID: 27318993 [PubMed - as supplied by publisher]
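The linking idea in this abstract (genes or proteins that connect a drug to an adverse effect) reduces, in its simplest form, to intersecting two sets of literature-derived relations. The sketch below assumes the relations have already been extracted as pairs; every drug, gene and effect name is a placeholder, and the authors' actual method is likely more elaborate.

```python
def linking_genes(drug_gene, gene_effect, drug, effect):
    """Genes related both to the drug and to the adverse effect."""
    genes_of_drug = {g for d, g in drug_gene if d == drug}
    genes_of_effect = {g for g, e in gene_effect if e == effect}
    return sorted(genes_of_drug & genes_of_effect)


# Placeholder relations (assumption: not real pharmacological facts).
drug_gene = [("drugX", "GENE1"), ("drugX", "GENE2"), ("drugY", "GENE3")]
gene_effect = [("GENE2", "effectA"), ("GENE3", "effectA"), ("GENE1", "effectB")]

links = linking_genes(drug_gene, gene_effect, "drugX", "effectA")
print(links)
```

Each returned gene is a candidate explanation to be reviewed, not a confirmed mechanism.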
@MInter: Automated Text-mining of Microbial Interactions.
Bioinformatics. 2016 Jun 16;
Authors: Lim KM, Li C, Chng KR, Nagarajan N
Abstract
MOTIVATION: Microbial consortia are frequently defined by numerous interactions within the community that are key to understanding their function. While microbial interactions have been extensively studied experimentally, information regarding them is dispersed in the scientific literature. As manual collation is an infeasible option, automated data processing tools are needed to make this information easily accessible.
RESULTS: We present @MInter, an automated information extraction system based on Support Vector Machines to analyze paper abstracts and infer microbial interactions. @MInter was trained and tested on a manually curated gold standard dataset of 735 species interactions and 3,917 annotated abstracts, constructed as part of this study. Cross-validation analysis showed that @MInter is able to detect abstracts pertaining to one or more microbial interactions with high specificity (specificity = 95%, AUC = 0.97). Despite challenges in identifying specific microbial interactions in an abstract (interaction level recall = 95%, precision = 25%), @MInter was shown to reduce annotator workload 13-fold compared to alternate approaches. Applying @MInter to 175 bacterial species abundant on human skin, we identified a network of 357 literature-reported microbial interactions, demonstrating its utility for the study of microbial communities.
AVAILABILITY: @MInter is freely available at https://github.com/CSB5/atminter
CONTACT: nagarajann@gis.a-star.edu.sg
SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online.
PMID: 27312413 [PubMed - as supplied by publisher]
DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.
Bioinformatics. 2016 Jun 15;32(12):i70-i79
Authors: Peng S, You R, Wang H, Zhai C, Mamitsuka H, Zhu S
Abstract
MOTIVATION: Medical Subject Headings (MeSH) indexing, which is to assign a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and MeSH side. For the citation side, all existing methods, including Medical Text Indexer (MTI) by National Library of Medicine and the state-of-the-art method, MeSHLabeler, deal with text by bag-of-words, which cannot capture semantic and context-dependent information well.
METHODS: We propose DeepMeSH that incorporates deep semantic information for large-scale MeSH indexing. It addresses the two challenges in both citation and MeSH sides. The citation side challenge is solved by a new deep semantic representation, D2V-TFIDF, which concatenates both sparse and dense semantic representations. The MeSH side challenge is solved by using the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation.
RESULTS: DeepMeSH achieved a Micro F-measure of 0.6323, 2% higher than 0.6218 of MeSHLabeler and 12% higher than 0.5637 of MTI, for BioASQ3 challenge data with 6000 citations.
AVAILABILITY AND IMPLEMENTATION: The software is available upon request.
CONTACT: zhusf@fudan.edu.cn
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PMID: 27307646 [PubMed - in process]
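The "2% higher" and "12% higher" figures in the RESULTS of this abstract appear to be relative rather than absolute differences, which a short calculation on the reported Micro F-measures bears out. The helper name `relative_gain` is introduced here for illustration.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


# Micro F-measures as reported in the abstract.
deepmesh, meshlabeler, mti = 0.6323, 0.6218, 0.5637

gain_vs_meshlabeler = relative_gain(deepmesh, meshlabeler)
gain_vs_mti = relative_gain(deepmesh, mti)
print(f"vs MeSHLabeler: {gain_vs_meshlabeler:.1%} relative, "
      f"{deepmesh - meshlabeler:.4f} absolute")
print(f"vs MTI: {gain_vs_mti:.1%} relative, {deepmesh - mti:.4f} absolute")
```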
Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text.
Database (Oxford). 2016;2016
Authors: Bravo À, Li TS, Su AI, Good BM, Furlong LI
Abstract
Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for the identification of drug side effects from free text are key for the development of up-to-date knowledge sources on drug adverse reactions. We present a new system for identification of drug side effects from the literature that combines three approaches: machine learning, rule- and knowledge-based approaches. This system has been developed to address the Task 3.B of Biocreative V challenge (BC5) dealing with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence-level, while the knowledge-based approach is applied both at sentence and abstract levels. The machine learning method is based on the BeFree system using two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest Recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassifications and areas for improvement of our system, and highlighted the need for consistent gold standard data sets for advancing the state of the art in text mining of drug side effects. Database URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V.
PMID: 27307137 [PubMed - in process]
Design of Automatic Extraction Algorithm of Knowledge Points for MOOCs.
Comput Intell Neurosci. 2015;2015:123028
Authors: Chen H, Han D, Dai Y, Zhao L
Abstract
In recent years, Massive Open Online Courses (MOOCs) have become very popular among college students and have had a powerful impact on academic institutions. In the MOOC environment, knowledge discovery and knowledge sharing are very important, and are currently often achieved by ontology techniques. In building an ontology, automatic extraction technology is crucial. Because general text mining algorithms have no obvious effect on online courses, we designed an automatic extracting course knowledge points (AECKP) algorithm for online courses. It includes document classification, Chinese word segmentation, and POS tagging for each document. The Vector Space Model (VSM) is used to calculate similarity and to design weights that optimize the TF-IDF output values, and the terms with higher scores are selected as knowledge points. Course documents of "C programming language" are selected for the experiment in this study. The results show that the proposed approach can achieve satisfactory accuracy and recall rates.
PMID: 26448738 [PubMed - indexed for MEDLINE]
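The TF-IDF scoring and vector-space similarity mentioned in this abstract can be sketched as below. This is plain TF-IDF with cosine similarity, not the AECKP algorithm: the abstract does not specify the authors' weight adjustment, and the pre-segmented toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Per-document {term: weight} using relative TF and log(N/df) IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_terms(weights, k):
    """The k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy course documents, assumed already segmented into tokens.
docs = [
    ["pointer", "array", "pointer", "loop"],
    ["loop", "function", "variable"],
    ["loop", "array", "struct"],
]
weights = tf_idf(docs)
print(top_terms(weights[0], 1))          # candidate knowledge point
print(cosine(weights[0], weights[2]))    # documents sharing "array"
```

A term that occurs in every document ("loop" here) gets zero weight, so it cannot be selected as a knowledge point and contributes nothing to similarity.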
DermO; an ontology for the description of dermatologic disease.
J Biomed Semantics. 2016;7:38
Authors: Fisher HM, Hoehndorf R, Bazelato BS, Dadras SS, King LE, Gkoutos GV, Sundberg JP, Schofield PN
Abstract
BACKGROUND: There have been repeated initiatives to produce standard nosologies and terminologies for cutaneous disease, some dedicated to the domain and some part of bigger terminologies such as ICD-10. Recently, formally structured terminologies, ontologies, have been widely developed in many areas of biomedical research. Primarily, these address the aim of providing comprehensive working terminologies for domains of knowledge, but because of the knowledge contained in the relationships between terms they can also be used computationally for many purposes.
RESULTS: We have developed an ontology of cutaneous disease, constructed manually by domain experts. With more than 3000 terms, DermO represents the most comprehensive formal dermatological disease terminology available. The disease entities are categorized in 20 upper level terms, which use a variety of features such as anatomical location, heritability, affected cell or tissue type, or etiology, as the features for classification, in line with professional practice and nosology in dermatology. Available in OBO flatfile and OWL 2 formats, it is integrated semantically with other ontologies and terminologies describing diseases and phenotypes. We demonstrate the application of DermO to text mining the biomedical literature and in the creation of a network describing the phenotypic relationships between cutaneous diseases.
CONCLUSIONS: DermO is an ontology with broad coverage of the domain of dermatologic disease and we demonstrate here its utility for text mining and investigation of phenotypic relationships between dermatologic disorders. We envision that in the future it may be applied to the creation and mining of electronic health records, clinical training and basic research, as it supports automated inference and reasoning, and for the broader integration of skin disease information with that from other domains.
PMID: 27296450 [PubMed - in process]
A Relation Extraction Framework for Biomedical Text Using Hybrid Feature Set.
Comput Math Methods Med. 2015;2015:910423
Authors: Muzaffar AW, Azam F, Qamar U
Abstract
Information extraction from unstructured text segments is a complex task. Although manual information extraction often produces the best results, it is harder to manage biomedical data extraction manually because of the exponential increase in data size. Thus, there is a need for automatic tools and techniques for information extraction in biomedical text mining. Relation extraction is a significant area under biomedical information extraction that has gained much importance in the last two decades. A lot of work has been done on biomedical relation extraction focusing on rule-based and machine learning techniques. In the last decade, the focus has changed to hybrid approaches showing better results. This research presents a hybrid feature set for classification of relations between biomedical entities. The main contribution of this research is in the semantic feature set, where verb phrases are ranked using the Unified Medical Language System (UMLS) and a ranking algorithm. Support Vector Machine and Naïve Bayes, two effective machine learning techniques, are used to classify these relations. Our approach has been validated on the standard biomedical text corpus obtained from MEDLINE 2001. In conclusion, our framework outperforms all state-of-the-art approaches used for relation extraction on the same corpus.
PMID: 26347797 [PubMed - indexed for MEDLINE]
Negative symptoms in schizophrenia: a study in a large clinical sample of patients using a novel automated method.
BMJ Open. 2015;5(9):e007619
Authors: Patel R, Jayatilleke N, Broadbent M, Chang CK, Foskett N, Gorrell G, Hayes RD, Jackson R, Johnston C, Shetty H, Roberts A, McGuire P, Stewart R
Abstract
OBJECTIVES: To identify negative symptoms in the clinical records of a large sample of patients with schizophrenia using natural language processing and assess their relationship with clinical outcomes.
DESIGN: Observational study using an anonymised electronic health record case register.
SETTING: South London and Maudsley NHS Trust (SLaM), a large provider of inpatient and community mental healthcare in the UK.
PARTICIPANTS: 7678 patients with schizophrenia receiving care during 2011.
MAIN OUTCOME MEASURES: Hospital admission, readmission and duration of admission.
RESULTS: 10 different negative symptoms were ascertained with precision statistics above 0.80. 41% of patients had 2 or more negative symptoms. Negative symptoms were associated with younger age, male gender and single marital status, and with increased likelihood of hospital admission (OR 1.24, 95% CI 1.10 to 1.39), longer duration of admission (β-coefficient 20.5 days, 7.6-33.5), and increased likelihood of readmission following discharge (OR 1.58, 1.28 to 1.95).
CONCLUSIONS: Negative symptoms were common and associated with adverse clinical outcomes, consistent with evidence that these symptoms account for much of the disability associated with schizophrenia. Natural language processing provides a means of conducting research in large representative samples of patients, using data recorded during routine clinical practice.
PMID: 26346872 [PubMed - indexed for MEDLINE]
DrugQuest - a text mining workflow for drug association discovery.
BMC Bioinformatics. 2016;17(Suppl 5):182
Authors: Papanikolaou N, Pavlopoulos GA, Theodosiou T, Vizirianakis IS, Iliopoulos I
Abstract
BACKGROUND: Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases.
RESULTS: Herein, we apply a text mining approach on the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques on these fields to identify chemicals, proteins, genes, pathways, diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views such as clustered chemicals based on their textual information, tag clouds consisting of Significant Terms along with the terms that were used for clustering are delivered to the user through a user-friendly web interface.
CONCLUSIONS: DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .
PMID: 27295093 [PubMed - as supplied by publisher]
TEES 2.2: Biomedical Event Extraction for Diverse Corpora.
BMC Bioinformatics. 2015;16 Suppl 16:S4
Authors: Björne J, Salakoski T
Abstract
BACKGROUND: The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks.
RESULTS: The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets.
CONCLUSIONS: The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented.
PMID: 26551925 [PubMed - indexed for MEDLINE]