Drug-induced Adverse Events

Xenbase: Core features, data acquisition, and data processing.

Sat, 2016-06-04 06:07

Genesis. 2015 Aug;53(8):486-97

Authors: James-Zorn C, Ponferrada VG, Burns KA, Fortriede JD, Lotay VS, Liu Y, Brad Karpinka J, Karimi K, Zorn AM, Vize PD

Abstract
Xenbase, the Xenopus model organism database (www.xenbase.org), is a cloud-based, web-accessible resource that integrates the diverse genomic and biological data from Xenopus research. Xenopus frogs are one of the major vertebrate animal models used for biomedical research, and Xenbase is the central repository for the enormous amount of data generated using this model tetrapod. The goal of Xenbase is to accelerate discovery by enabling investigators to make novel connections between molecular pathways in Xenopus and human disease. Our relational database and user-friendly interface make these data easy to query and allow investigators to quickly interrogate and link different data types in ways that would otherwise be difficult, time consuming, or impossible. Xenbase also enhances the value of these data through high-quality gene expression curation and data integration, by providing bioinformatics tools optimized for Xenopus experiments, and by linking Xenopus data to other model organisms and to human data. Xenbase draws in data via pipelines that download data, parse the content, and save them into appropriate files and database tables. Furthermore, Xenbase makes these data accessible to the broader biomedical community by continually providing annotated data updates to organizations such as NCBI, UniProtKB, and Ensembl. Here, we describe our bioinformatics, genome-browsing tools, data acquisition and sharing, our community submitted and literature curation pipelines, text-mining support, gene page features, and the curation of gene nomenclature and gene models.

PMID: 26150211 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Systematic analysis of the molecular mechanism underlying atherosclerosis using a text mining approach.

Fri, 2016-06-03 08:52

Hum Genomics. 2016;10(1):14

Authors: Xi D, Zhao J, Lai W, Guo Z

Abstract
BACKGROUND: Atherosclerosis is one of the common health threats all over the world. It is a complex heritable disease that affects arterial blood vessels. Chronic inflammatory response plays an important role in atherogenesis. There has been little success in fully identifying functionally important genes in the pathogenesis of atherosclerosis.
RESULTS: In the present study, we performed a systematic analysis of atherosclerosis-related genes using text mining. We identified a total of 1312 genes. Gene ontology (GO) analysis revealed that a total of 35 terms exhibited significance (p < 0.05) as overrepresented terms, indicating that atherosclerosis invokes many genes with a wide range of different functions. Pathway analysis demonstrated that the most highly enriched pathway is the Toll-like receptor signaling pathway. Finally, through gene network analysis, we prioritized 48 genes using the hub gene method.
CONCLUSIONS: Our study provides a valuable resource for the in-depth understanding of the mechanism underlying atherosclerosis.
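
The abstract does not spell out what its "hub gene method" involves. A common reading is that genes are ranked by how many network neighbours they have, and the following minimal sketch assumes that reading. The gene names, the edges, and the `top_hubs` helper are hypothetical placeholders, not data or code from the study.

```python
from collections import Counter

def top_hubs(edges, top_n):
    """Rank genes by degree (number of incident edges) and return the top_n."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    # Highest degree first; ties are broken alphabetically for a stable result.
    return sorted(degree, key=lambda gene: (-degree[gene], gene))[:top_n]

# Hypothetical toy interaction network (placeholder gene names, invented edges).
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_C", "GENE_D"),
    ("GENE_E", "GENE_F"),
]
hubs = top_hubs(edges, top_n=2)
```

Degree is only one possible centrality measure; the study may have used a different criterion to select its 48 genes.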

PMID: 27251057 [PubMed - in process]

On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions.

Thu, 2016-06-02 08:35

J Biomed Inform. 2015 Aug;56:318-32

Authors: Oronoz M, Gojenola K, Pérez A, de Ilarraza AD, Casillas A

Abstract
The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.

PMID: 26141794 [PubMed - indexed for MEDLINE]

Identifying synonymy between relational phrases using word embeddings.

Thu, 2016-06-02 08:35

J Biomed Inform. 2015 Aug;56:94-102

Authors: Nguyen NT, Miwa M, Tsuruoka Y, Tojo S

Abstract
Many text mining applications in the biomedical domain benefit from automatic clustering of relational phrases into synonymous groups, since it alleviates the problem of spurious mismatches caused by the diversity of natural language expressions. Most of the previous work that has addressed this task of synonymy resolution uses similarity metrics between relational phrases based on textual strings or dependency paths, which, for the most part, ignore the context around the relations. To overcome this shortcoming, we employ a word embedding technique to encode relational phrases. We then apply the k-means algorithm on top of the distributional representations to cluster the phrases. Our experimental results show that this approach outperforms state-of-the-art statistical models including latent Dirichlet allocation and Markov logic networks.
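
As a rough illustration of the clustering step described in this abstract, the sketch below runs plain k-means over a handful of made-up 2-D vectors standing in for phrase embeddings. The vectors, the phrase labels, the `kmeans` helper, and the fixed initial centroids are all assumptions for illustration; the paper's actual embeddings and settings are not given here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd-style k-means; returns (cluster labels, final centroids)."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical toy "embeddings" for four relational phrases (not real data).
phrases = ["activates", "stimulates", "inhibits", "suppresses"]
vectors = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]

# Initial centroids are fixed here so the run is deterministic.
labels, _ = kmeans(vectors, centroids=[vectors[0], vectors[2]])
clusters = {}
for phrase, label in zip(phrases, labels):
    clusters.setdefault(label, []).append(phrase)
```

Real phrase embeddings would have hundreds of dimensions and would normally use random or k-means++ initialisation; the fixed start here only keeps the toy run reproducible.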

PMID: 26004792 [PubMed - indexed for MEDLINE]

Automatic endpoint detection to support the systematic review process.

Thu, 2016-06-02 08:35

J Biomed Inform. 2015 Aug;56:42-56

Authors: Blake C, Lucic A

Abstract
Preparing a systematic review can take hundreds of hours to complete, but the process of reconciling different results from multiple studies is the bedrock of evidence-based medicine. We introduce a two-step approach to automatically extract three facets - two entities (the agent and object) and the way in which the entities are compared (the endpoint) - from direct comparative sentences in full-text articles. The system does not require a user to predefine entities and thus can be used in domains where entity recognition is difficult or unavailable. As with a systematic review, the tabular summary produced using the automatically extracted facets shows how experimental results differ between studies. Experiments were conducted using a collection of more than 2 million sentences from three journals (Diabetes, Carcinogenesis, and Endocrinology) and two machine learning algorithms: support vector machines (SVM) and a general linear model (GLM). F1 and accuracy measures for the SVM and GLM differed by only 0.01 across all three comparison facets in a randomly selected set of test sentences. The system achieved its best accuracy of 92% for objects, whereas the accuracy for both agents and endpoints was 73%. F1 scores were higher for objects (0.77) than for endpoints (0.51) or agents (0.47). A situated evaluation of Metformin, a drug to treat diabetes, showed system accuracy of 95%, 83% and 79% for the object, endpoint and agent respectively. The situated evaluation had higher F1 scores of 0.88, 0.64 and 0.62 for object, endpoint, and agent respectively. On average, only 5.31% of the sentences in a full-text article are direct comparisons, but the tabular summaries suggest that these sentences provide a rich source of currently underutilized information that can be used to accelerate the systematic review process and identify gaps where future research should be focused.
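
The abstract reports accuracy and F1 side by side, and the two differ noticeably (for objects, 92% accuracy against an F1 of 0.77). The sketch below shows how both measures are computed from the same confusion counts and why they can diverge when positives are rare. The counts are invented for illustration and are not taken from the study.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all decisions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; ignores true negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 1000 items, of which only 100 are truly positive.
tp, fp, fn, tn = 60, 20, 40, 880

acc = accuracy(tp, fp, fn, tn)  # dominated by the 880 easy negatives
f1 = f1_score(tp, fp, fn)       # driven only by the positive class
```

Because true negatives count towards accuracy but not towards F1, a system can score well on accuracy while still missing many of the items of interest.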

PMID: 26003938 [PubMed - indexed for MEDLINE]

MET network in PubMed: a text-mined network visualization and curation system.

Wed, 2016-06-01 08:22

Database (Oxford). 2016;2016

Authors: Dai HJ, Su CH, Lai PT, Huang MS, Jonnagaddala J, Rose Jue T, Rao S, Chou HJ, Milacic M, Singh O, Syed-Abdul S, Hsu WL

Abstract
Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to help metastasis researchers retrieve network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway.

PMID: 27242035 [PubMed - in process]

Comparative proteomics analysis of the antitumor effect of CIGB-552 peptide in HT-29 colon adenocarcinoma cells.

Tue, 2016-05-31 08:07

J Proteomics. 2015 Aug 3;126:163-71

Authors: Núñez de Villavicencio-Díaz T, Ramos Gómez Y, Oliva Argüelles B, Fernández Masso JR, Rodríguez-Ulloa A, Cruz García Y, Guirola-Cruz O, Perez-Riverol Y, Javier González L, Tiscornia I, Victoria S, Bollati-Fogolín M, Besada Pérez V, Guerra Vallespi M

Abstract
The second generation peptide CIGB-552 has a pro-apoptotic effect on H460 non-small cell lung cancer cells and displays a potent cytotoxic effect in HT-29 colon adenocarcinoma cells though its action mechanism is ill defined. Here, we present the first proteomic study of peptide effect in HT-29 cells using subcellular fractionation, protein and peptide fractionation by DF-PAGE and LC-MS/MS peptide identification. In particular, we explored the nuclear proteome of HT-29 cells at a 5h treatment identifying a total of 68 differentially modulated proteins, 49 of which localize to the nucleus. The differentially modulated proteins were analyzed following a system biology approach. Results pointed to a modulation of apoptosis, oxidative damage removal, NF-κB activation, inflammatory signaling and of cell adhesion and motility. Further Western blot and flow-cytometry experiments confirmed both pro-apoptotic and anti-inflammatory effects of CIGB-552 peptide in HT-29 cells.

PMID: 26013411 [PubMed - indexed for MEDLINE]

Leveraging Social Media to Promote Public Health Knowledge: Example of Cancer Awareness via Twitter.

Fri, 2016-05-27 07:02

JMIR Public Health Surveill. 2016 Jan-Jun;2(1):e17

Authors: Xu S, Markson C, Costello KL, Xing CY, Demissie K, Llanos AA

Abstract
BACKGROUND: As social media become increasingly popular online venues for engaging in communication about public health issues, it is important to understand how users promote knowledge and awareness about specific topics.
OBJECTIVE: The aim of this study is to examine the frequency of discussion and differences by race and ethnicity of cancer-related topics among unique users via Twitter.
METHODS: Tweets were collected from April 1, 2014 through January 21, 2015 using the Twitter public streaming Application Programming Interface (API) to collect 1% of public tweets. Twitter users were classified into racial and ethnic groups using a new text mining approach applied to English-only tweets. Each ethnic group was then analyzed for frequency in cancer-related terms within user timelines, investigated for changes over time and across groups, and measured for statistical significance.
RESULTS: Observable usage patterns of the terms "cancer", "breast cancer", "prostate cancer", and "lung cancer" between Caucasian and African American groups were evident across the study period. We observed some variation in the frequency of term usage during months known to be labeled as cancer awareness months, particularly September, October, and November. Interestingly, we found that of the terms studied, "colorectal cancer" received the least Twitter attention.
CONCLUSIONS: The findings of the study provide evidence that social media can serve as a very powerful and important tool in implementing and disseminating critical prevention, screening, and treatment messages to the community in real-time. The study also introduced and tested a new methodology of identifying race and ethnicity among users of the social media. Study findings highlight the potential benefits of social media as a tool in reducing racial and ethnic disparities.

PMID: 27227152 [PubMed]

Bioinformatic Studies to Predict MicroRNAs with the Potential of Uncoupling RECK Expression from Epithelial-Mesenchymal Transition in Cancer Cells.

Fri, 2016-05-27 07:02

Cancer Inform. 2016;15:91-102

Authors: Wang Z, Murakami R, Yuki K, Yoshida Y, Noda M

Abstract
RECK is downregulated in many tumors, and forced RECK expression in tumor cells often results in suppression of malignant phenotypes. Recent findings suggest that RECK is upregulated after epithelial-mesenchymal transition (EMT) in normal epithelium-derived cells but not in cancer cells. Since several microRNAs (miRs) are known to target RECK mRNA, we hypothesized that certain miR(s) may be involved in this suppression of RECK upregulation after EMT in cancer cells. To test this hypothesis, we used three approaches: (1) text mining to find miRs relevant to EMT in cancer cells, (2) predicting miR targets using four algorithms, and (3) comparing miR-seq data and RECK mRNA data using a novel non-parametric method. These approaches identified the miR-183-96-182 cluster as a strong candidate. We also looked for transcription factors and signaling molecules that may promote cancer EMT, miR-183-96-182 upregulation, and RECK downregulation. Here we describe our methods, findings, and a testable hypothesis on how RECK expression could be regulated in cancer cells after EMT.

PMID: 27226706 [PubMed]

SWIFT-Review: a text-mining workbench for systematic review.

Wed, 2016-05-25 06:35

Syst Rev. 2016;5(1):87

Authors: Howard BE, Phillips J, Miller K, Tandon A, Mav D, Shah MR, Holmgren S, Pelch KE, Walker V, Rooney AA, Macleod M, Shah RR, Thayer K

Abstract
BACKGROUND: There is growing interest in using machine learning approaches to priority rank studies and reduce human burden in screening literature when conducting systematic reviews. In addition, identifying addressable questions during the problem formulation phase of systematic review can be challenging, especially for topics having a large literature base. Here, we assess the performance of the SWIFT-Review priority ranking algorithm for identifying studies relevant to a given research question. We also explore the use of SWIFT-Review during problem formulation to identify, categorize, and visualize research areas that are data rich/data poor within a large literature corpus.
METHODS: Twenty case studies, including 15 public data sets, representing a range of complexity and size, were used to assess the priority ranking performance of SWIFT-Review. For each study, seed sets of manually annotated included and excluded titles and abstracts were used for machine training. The remaining references were then ranked for relevance using an algorithm that considers term frequency and latent Dirichlet allocation (LDA) topic modeling. This ranking was evaluated with respect to (1) the number of studies screened in order to identify 95 % of known relevant studies and (2) the "Work Saved over Sampling" (WSS) performance metric. To assess SWIFT-Review for use in problem formulation, PubMed literature search results for 171 chemicals implicated as EDCs were uploaded into SWIFT-Review (264,588 studies) and categorized based on evidence stream and health outcome. Patterns of search results were surveyed and visualized using a variety of interactive graphics.
RESULTS: Compared with the reported performance of other tools using the same datasets, the SWIFT-Review ranking procedure obtained the highest scores on 11 out of 15 of the public datasets. Overall, these results suggest that using machine learning to triage documents for screening has the potential to save, on average, more than 50 % of the screening effort ordinarily required when using un-ordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation.
CONCLUSIONS: Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.
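
The two evaluation measures named in METHODS can be sketched as follows. "Work Saved over Sampling" is commonly defined as the fraction of documents left unscreened when the target recall is reached, minus the fraction that random sampling would leave unscreened at that recall, that is (N - screened) / N - (1 - recall). The ranked list below is invented, and the helper names are assumptions rather than anything from SWIFT-Review itself.

```python
import math

def screened_to_reach_recall(ranked_labels, recall=0.95):
    """Count how many top-ranked items must be screened to find `recall` of the relevant ones.

    ranked_labels: 1 for a relevant study, 0 otherwise, in ranked order.
    """
    total_relevant = sum(ranked_labels)
    # The small epsilon guards against floating-point overshoot before ceil().
    needed = math.ceil(recall * total_relevant - 1e-9)
    found = 0
    for position, is_relevant in enumerate(ranked_labels, start=1):
        found += is_relevant
        if found >= needed:
            return position
    return len(ranked_labels)

def wss(ranked_labels, recall=0.95):
    """Work Saved over Sampling at the given recall level."""
    n = len(ranked_labels)
    screened = screened_to_reach_recall(ranked_labels, recall)
    return (n - screened) / n - (1 - recall)

# Hypothetical ranking of 100 studies, 20 of them relevant (not real data).
ranking = [1] * 15 + [0] * 5 + [1] * 4 + [0] * 75 + [1]

screened = screened_to_reach_recall(ranking)  # studies read to reach 95% recall
saved = wss(ranking)                          # WSS@95 for this toy ranking
```

A WSS@95 near zero means the ranking saves no more work than screening in random order, while values above 0.5 correspond to the "more than 50 %" savings mentioned in RESULTS.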

PMID: 27216467 [PubMed - in process]

miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.

Wed, 2016-05-25 06:35

J Biomed Semantics. 2016;7(1):9

Authors: Gupta S, Ross KE, Tudor CO, Wu CH, Schmidt CJ, Vijay-Shanker K

Abstract
BACKGROUND: MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation.
METHODS: We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence.
RESULTS: miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD . We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. When we expanded the evaluation to include sentences with a wide range of microRNA-disease information that may be of interest to biomedical researchers, miRiaD also performed very well with an F-score of 89.4. The informativeness ranking of sentences was evaluated in terms of nDCG (0.977) and correlation metrics (0.678-0.727) when compared to an annotator's ranked list.
CONCLUSIONS: miRiaD, a high-performance system that can capture a wide variety of microRNA-disease related information, extends beyond the scope of existing microRNA-disease resources. It can be incorporated into manual curation pipelines and serve as a resource for biomedical researchers interested in the role of microRNAs in disease. In our ongoing work we are developing an improved miRiaD web interface that will facilitate complex queries about microRNA-disease relationships, such as "In what diseases does microRNA regulation of apoptosis play a role?" or "Is there overlap in the sets of genes targeted by microRNAs in different types of dementia?"
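
The nDCG figure in RESULTS compares a system's sentence ranking with an ideal ordering of the same items. The sketch below uses one common formulation, a linear gain with a log2 position discount; the abstract does not say which variant was used, and the relevance grades shown are invented.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each grade is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalised by the DCG of the best possible ordering of the same grades."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical annotator grades (0-3), listed in the order the system ranked them.
system_order = [3, 2, 3, 0, 1]
score = ndcg(system_order)
```

A score of 1.0 means the system's order matches the ideal order; swapping highly graded items further down the list lowers the score.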

PMID: 27216254 [PubMed - in process]

Improving Biochemical Named Entity Recognition Performance Using PSO Classifier Selection and Bayesian Combination Method.

Tue, 2016-05-24 06:17

IEEE/ACM Trans Comput Biol Bioinform. 2016 May 18;

Authors: Akkasi A, Varoglu E

Abstract
Named Entity Recognition (NER) is a basic step for a large number of subsequent text mining tasks in the biochemical domain. Increasing the performance of such recognition systems is of high importance and always poses a challenge. In this study, a new community based decision making system is proposed which aims at increasing the efficiency of NER systems in the chemical/drug name context. Particle Swarm Optimization (PSO) algorithm is chosen as the expert selection strategy along with the Bayesian combination method to merge the outputs of the selected classifiers as well as evaluate the fitness of the selected candidates. The proposed system performs in two steps. The first step focuses on creating various numbers of baseline classifiers for NER with different feature sets using the Conditional Random Fields (CRFs). The second step involves the selection and efficient combination of the classifiers using PSO and Bayesian combination. Two comprehensive corpora from BioCreative events, namely ChemDNER and CEMP, are used for the experiments conducted. Results show that the ensemble of classifiers selected by means of the proposed approach performs better than the single best classifier as well as ensembles formed using other popular selection/combination strategies for both corpora. Furthermore, the proposed method outperforms the best performing system at the BioCreative IV ChemDNER track by achieving an F-score of 87.95%.

PMID: 27214909 [PubMed - as supplied by publisher]

Two Similarity Metrics for Medical Subject Headings (MeSH): An Aid to Biomedical Text Mining and Author Name Disambiguation.

Tue, 2016-05-24 06:17

J Biomed Discov Collab. 2016;7:e1

Authors: Smalheiser NR, Bonifield G

Abstract
In the present paper, we have created and characterized several similarity metrics for relating any two Medical Subject Headings (MeSH terms) to each other. The article-based metric measures the tendency of two MeSH terms to appear in the MEDLINE record of the same article. The author-based metric measures the tendency of two MeSH terms to appear in the body of articles written by the same individual (using the 2009 Author-ity author name disambiguation dataset as a gold standard). The two metrics are only modestly correlated with each other (r = 0.50), indicating that they capture different aspects of term usage. The article-based metric provides a measure of semantic relatedness, and MeSH term pairs that co-occur more often than expected by chance may reflect relations between the two terms. In contrast, the author metric is indicative of how individuals practice science, and may have value for author name disambiguation and studies of scientific discovery. We have calculated article metrics for all MeSH terms appearing in at least 25 articles in MEDLINE (as of 2014) and author metrics for MeSH terms published as of 2009. The dataset is freely available for download and can be queried at http://arrowsmith.psych.uic.edu/arrowsmith_uic/mesh_pair_metrics.html. Handling editor: Elizabeth Workman, MLIS, PhD.
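
To make the idea of term pairs that "co-occur more often than expected by chance" concrete, the sketch below computes a simple observed-to-expected co-occurrence ratio over a toy set of articles. This is an illustrative stand-in under an independence assumption, not the article-based metric defined in the paper; the article sets and the `cooccurrence_ratios` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_ratios(articles):
    """Return observed/expected co-occurrence for every term pair seen together.

    articles: iterable of term collections, one per article.
    Expected counts assume the two terms are assigned independently.
    """
    n = len(articles)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in articles:
        unique = sorted(set(terms))  # sorted so each pair has one canonical key
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    ratios = {}
    for (term_a, term_b), observed in pair_counts.items():
        expected = term_counts[term_a] * term_counts[term_b] / n
        ratios[(term_a, term_b)] = observed / expected
    return ratios

# Hypothetical indexing of four articles (invented for illustration).
articles = [
    {"Neoplasms", "Chemokines"},
    {"Neoplasms", "Chemokines"},
    {"Neoplasms", "Sleep"},
    {"Sleep", "Insomnia"},
]
ratios = cooccurrence_ratios(articles)
```

Ratios above 1 indicate pairs that appear together more often than independence would predict; an author-based variant would group term sets by author instead of by article.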

PMID: 27213780 [PubMed - as supplied by publisher]

Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years.

Sun, 2016-05-22 08:47

Comput Biol Med. 2016 Apr 20;73:173-185

Authors: Wagner M, Vicinus B, Muthra ST, Richards TA, Linder R, Frick VO, Groh A, Rubie C, Weichert F

Abstract
BACKGROUND: The continuous growth of medical sciences literature indicates the need for automated text analysis. Scientific writing, which is neither unitary, transcending social situation, nor defined by a timeless idea, is subject to constant change as it develops in response to evolving knowledge, aims at different goals, and embodies different assumptions about nature and communication. The objective of this study was to evaluate whether publication dates should be considered when performing text mining.
METHODS: A search of PUBMED for combined references to chemokine identifiers and particular cancer related terms was conducted to detect changes over the past 36 years. Text analyses were performed using freeware available from the World Wide Web. TOEFL Scores of territories hosting institutional affiliations as well as various readability indices were investigated. Further assessment was conducted using Principal Component Analysis. Laboratory examination was performed to evaluate the quality of attempts to extract content from the examined linguistic features.
RESULTS: The PUBMED search yielded a total of 14,420 abstracts (3,190,219 words). The range of findings in laboratory experimentation were coherent with the variability of the results described in the analyzed body of literature. Increased concurrence of chemokine identifiers together with cancer related terms was found at the abstract and sentence level, whereas complexity of sentences remained fairly stable.
CONCLUSIONS: The findings of the present study indicate that concurrent references to chemokines and cancer increased over time whereas text complexity remained stable.

PMID: 27208610 [PubMed - as supplied by publisher]

Text Mining of Journal Articles for Sleep Disorder Terminologies.

Sat, 2016-05-21 08:32

PLoS One. 2016;11(5):e0156031

Authors: Lam C, Lai FC, Wang CH, Lai MH, Hsu N, Chung MH

Abstract
OBJECTIVE: Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013, to identify associations between SD and methodology terms, and to conduct statistical analyses of the text mining findings.
METHODS: SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms.
RESULTS: MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms.
CONCLUSION: Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.

PMID: 27203858 [PubMed - as supplied by publisher]

EPC Methods: An Exploration of the Use of Text-Mining Software in Systematic Reviews

Fri, 2016-05-20 08:19

Book. 2016 04

Authors: Paynter R, Bañez LL, Berliner E, Erinoff E, Lege-Matsuura J, Potter S, Uhl S

Abstract
OBJECTIVE: This project's goal was to provide a preliminary sketch of the use of text-mining tools as an emerging methodology within a number of systematic review processes. We sought to provide information addressing pressing questions individuals and organizations face when considering utilizing text-mining tools.
METHODS: We searched the literature to identify and summarize research on the use of text-mining tools within the systematic review context. We conducted telephone interviews with Key Informants (KIs; n=8) using a semi-structured instrument and subsequent qualitative analysis to explore issues surrounding the implementation and use of text-mining tools. Lastly, we compiled a list of text-mining tools to support systematic review methods and evaluated the tools using an informal descriptive appraisal tool.
RESULTS: The literature review identified 122 articles that met inclusion criteria, including two recent systematic reviews on the use of text-mining tools in the screening and data abstraction steps of systematic reviews. In addition to these two steps, a preliminary exploration of the literature on searching and other less-studied steps is presented. Support for the use of text-mining was strong amongst the KIs overall, though most KIs noted some performance caveats and/or areas in which further research is necessary. We evaluated 111 text-mining tools identified from the literature review and KI interviews.
CONCLUSIONS: Text-mining tools are currently being used within several systematic review organizations for a variety of review processes (e.g., searching, screening abstracts), and the published evidence-base is growing fairly rapidly in breadth and levels of evidence. Several outstanding questions remain for future empirical research to address regarding the reliability and validity of using these emerging technologies across a variety of review processes and whether these generalize across the scope of review topics. Guidance on reporting the use of these tools would be useful.


PMID: 27195359

Categories: Literature Watch

Argo: enabling the development of bespoke workflows and services for disease annotation.

Wed, 2016-05-18 16:52

Database (Oxford). 2016;2016

Authors: Batista-Navarro R, Carter J, Ananiadou S

Abstract
Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo's capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V's User Interactive Track (IAT), we demonstrated and evaluated Argo's suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track's top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database.
In this work, we highlight Argo's support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models, through to user-interactive ones which allow human curators to manually provide their corrections to automatically generated annotations. Our participation in the BioCreative V challenges shows Argo's potential as an enabling technology for curating disease and phenotypic information from literature.
Database URL: http://argo.nactem.ac.uk.

PMID: 27189607 [PubMed - as supplied by publisher]

Identifying a biomarker network for corticosteroid resistance in asthma from bronchoalveolar lavage samples.

Wed, 2016-05-18 16:52

Mol Biol Rep. 2016 May 17;

Authors: Vargas JE, Porto BN, Puga R, Stein RT, Pitrez PM

Abstract
Corticosteroid resistance (CR) is a major barrier to the effective treatment of severe asthma. Hence, a better understanding of the molecular mechanisms involved in this condition is a priority. Network analysis is an emerging strategy for exploring this complex, heterogeneous disorder at the system level; here it was used to identify a small network specific to CR in asthma. The gene expression profile GSE7368, from bronchoalveolar lavage (BAL) of subjects with CR asthma, was downloaded from the Gene Expression Omnibus (GEO) database and compared with BAL from corticosteroid-sensitive (CS) patients. Differentially expressed genes (DEGs) were identified with the limma package in R. In addition, DEGs were mapped to STRING to acquire protein-protein interaction (PPI) pairs. Topological properties of the PPI network were calculated with Centiscape, ClusterOne and BINGO. Subsequently, text-mining tools were applied to construct a dedicated cell signalling network for CR in asthma. Thirty-five PPI networks were obtained, including a major network consisting of 370 nodes connected by 777 edges. After topological analysis, a minor PPI network of 48 nodes was identified, composed of the most relevant nodes of the major PPI network. In this subnetwork, several receptors (EGFR, EGR1, ESR2, PGR), transcription factors (MYC, JAK), cytokines (IL8, IL6, IL1B), one chemokine (CXCL1), one kinase (SRC) and one cyclooxygenase (PTGS2) have been described as associated with the inflammatory environment and steroid resistance in asthma. We suggest a biomarker network of 48 nodes that could potentially be explored for diagnostic or therapeutic use.
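As a rough illustration of the topological step described above (reducing a larger PPI network to its most connected nodes), the sketch below ranks nodes by degree and keeps the edges among the top-ranked ones. The edge list is a made-up toy example that only borrows gene symbols from the abstract; it is not the study's data, and plain degree ranking is an assumed stand-in for the centrality and clustering measures the authors computed with their tools.

```python
from collections import defaultdict

# Toy, invented edge list for illustration only (not taken from the study).
edges = [
    ("EGFR", "SRC"),
    ("EGFR", "MYC"),
    ("SRC", "MYC"),
    ("IL6", "IL1B"),
    ("IL6", "MYC"),
    ("CXCL1", "IL6"),
]

def degree_table(edge_list):
    """Count the distinct neighbours of each node in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(ns) for node, ns in neighbours.items()}

def top_hubs(edge_list, k):
    """Return the k highest-degree nodes; ties are broken alphabetically."""
    degrees = degree_table(edge_list)
    return sorted(degrees, key=lambda n: (-degrees[n], n))[:k]

def induced_subnetwork(edge_list, nodes):
    """Keep only the edges whose two endpoints are both in `nodes`."""
    keep = set(nodes)
    return [(a, b) for a, b in edge_list if a in keep and b in keep]

hubs = top_hubs(edges, 2)
print(hubs, induced_subnetwork(edges, hubs))
```

Degree is the simplest centrality measure; a real analysis would typically combine several measures before selecting a subnetwork.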

PMID: 27188427 [PubMed - as supplied by publisher]

On the unsupervised analysis of domain-specific Chinese texts.

Wed, 2016-05-18 16:52

Proc Natl Acad Sci U S A. 2016 May 16;

Authors: Deng K, Bol PK, Li KJ, Liu JS

Abstract
With the growing availability of digitized text data, both public and private, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than those obtained using outputs of a supervised segmentation method.

PMID: 27185919 [PubMed - as supplied by publisher]

Exploring mechanisms of Panax notoginseng saponins in treating coronary heart disease by integrating gene interaction network and functional enrichment analysis.

Wed, 2016-05-18 16:52

Chin J Integr Med. 2016 May 16;

Authors: Yu G, Wang J

Abstract
OBJECTIVE: To investigate the mechanisms of Panax notoginseng saponins (PNS) in treating coronary heart disease (CHD) by integrating gene interaction network and functional enrichment analysis.
METHODS: Text mining was used to obtain CHD- and PNS-associated genes. Gene-gene interaction networks of CHD and PNS were built with the GeneMANIA Cytoscape plugin. The Advanced Network Merge Cytoscape plugin was used to analyze the two networks. Their functions were analyzed by gene functional enrichment analysis via DAVID Bioinformatics. The joint subnetwork of the CHD network and the PNS network was identified by network analysis.
RESULTS: The 11 genes of the joint subnetwork were the direct targets of PNS in the CHD network and were enriched in the cytokine-cytokine receptor interaction pathway. PNS could affect 85 other genes through the gene-gene interactions of the joint subnetwork, and these genes were enriched in 7 other pathways. The direct mechanism of PNS in treating CHD is targeting cytokines to relieve inflammation, and the indirect mechanism is affecting the 7 other pathways through the interactions of the joint subnetwork of the PNS and CHD networks. The genes in the 7 pathways could be potential targets for the immunologic adjuvant, anticoagulant, hypolipidemic, anti-platelet and anti-hypertrophic activities of PNS.
CONCLUSION: The key mechanisms of PNS in treating CHD could be anticoagulant and hypolipidemic effects, as indicated by analysis of the biological functions of hubs in the merged network.
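The joint-subnetwork idea in this abstract can be sketched in a few lines, under the assumption that "joint" means the genes present in both networks and that indirectly affected genes are those directly connected to a joint gene in either network. The gene names G1 to G6 and both edge lists are hypothetical placeholders, not results from the study, and the function is not the Cytoscape plugin's algorithm.

```python
# Hypothetical toy networks; G1..G6 are placeholder gene names.
chd_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
pns_edges = [("G2", "G3"), ("G3", "G5"), ("G6", "G5")]

def nodes_of(edges):
    """Collect every gene that appears in at least one edge."""
    return {gene for edge in edges for gene in edge}

def joint_and_neighbours(edges_a, edges_b):
    """Return (genes shared by both networks, genes adjacent to a shared gene)."""
    joint = nodes_of(edges_a) & nodes_of(edges_b)
    neighbours = set()
    # Scan the merged edge list for edges with exactly one endpoint in the joint set.
    for a, b in list(edges_a) + list(edges_b):
        if a in joint and b not in joint:
            neighbours.add(b)
        elif b in joint and a not in joint:
            neighbours.add(a)
    return joint, neighbours

shared, neighbours = joint_and_neighbours(chd_edges, pns_edges)
print(sorted(shared), sorted(neighbours))
```

In this toy case G6 is excluded because it touches only G5, which is not itself in the joint set; a real analysis would follow this step with pathway enrichment on each gene group.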

PMID: 27184904 [PubMed - as supplied by publisher]
