Drug-induced Adverse Events

MET network in PubMed: a text-mined network visualization and curation system.

Wed, 2016-06-01 08:22

Database (Oxford). 2016;2016

Authors: Dai HJ, Su CH, Lai PT, Huang MS, Jonnagaddala J, Rose Jue T, Rao S, Chou HJ, Milacic M, Singh O, Syed-Abdul S, Hsu WL

Abstract
Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to help metastasis researchers retrieve network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway.

PMID: 27242035 [PubMed - in process]

Categories: Literature Watch

Comparative proteomics analysis of the antitumor effect of CIGB-552 peptide in HT-29 colon adenocarcinoma cells.

Tue, 2016-05-31 08:07

J Proteomics. 2015 Aug 3;126:163-71

Authors: Núñez de Villavicencio-Díaz T, Ramos Gómez Y, Oliva Argüelles B, Fernández Masso JR, Rodríguez-Ulloa A, Cruz García Y, Guirola-Cruz O, Perez-Riverol Y, Javier González L, Tiscornia I, Victoria S, Bollati-Fogolín M, Besada Pérez V, Guerra Vallespi M

Abstract
The second generation peptide CIGB-552 has a pro-apoptotic effect on H460 non-small cell lung cancer cells and displays a potent cytotoxic effect in HT-29 colon adenocarcinoma cells, although its mechanism of action is ill defined. Here, we present the first proteomic study of the peptide's effect in HT-29 cells using subcellular fractionation, protein and peptide fractionation by DF-PAGE and LC-MS/MS peptide identification. In particular, we explored the nuclear proteome of HT-29 cells after 5 h of treatment, identifying a total of 68 differentially modulated proteins, 49 of which localize to the nucleus. The differentially modulated proteins were analyzed following a systems biology approach. Results pointed to a modulation of apoptosis, oxidative damage removal, NF-κB activation, inflammatory signaling and of cell adhesion and motility. Further Western blot and flow-cytometry experiments confirmed both pro-apoptotic and anti-inflammatory effects of CIGB-552 peptide in HT-29 cells.

PMID: 26013411 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Leveraging Social Media to Promote Public Health Knowledge: Example of Cancer Awareness via Twitter.

Fri, 2016-05-27 07:02

JMIR Public Health Surveill. 2016 Jan-Jun;2(1):e17

Authors: Xu S, Markson C, Costello KL, Xing CY, Demissie K, Llanos AA

Abstract
BACKGROUND: As social media become increasingly popular online venues for engaging in communication about public health issues, it is important to understand how users promote knowledge and awareness about specific topics.
OBJECTIVE: The aim of this study is to examine the frequency of discussion and differences by race and ethnicity of cancer-related topics among unique users via Twitter.
METHODS: Tweets were collected from April 1, 2014 through January 21, 2015 using the Twitter public streaming Application Programming Interface (API) to collect 1% of public tweets. Twitter users were classified into racial and ethnic groups using a new text mining approach applied to English-only tweets. Each ethnic group was then analyzed for frequency in cancer-related terms within user timelines, investigated for changes over time and across groups, and measured for statistical significance.
RESULTS: Observable usage patterns of the terms "cancer", "breast cancer", "prostate cancer", and "lung cancer" between Caucasian and African American groups were evident across the study period. We observed some variation in the frequency of term usage during months known to be labeled as cancer awareness months, particularly September, October, and November. Interestingly, we found that of the terms studied, "colorectal cancer" received the least Twitter attention.
CONCLUSIONS: The findings of the study provide evidence that social media can serve as a very powerful and important tool in implementing and disseminating critical prevention, screening, and treatment messages to the community in real-time. The study also introduced and tested a new methodology of identifying race and ethnicity among users of the social media. Study findings highlight the potential benefits of social media as a tool in reducing racial and ethnic disparities.

PMID: 27227152 [PubMed]

Categories: Literature Watch

Bioinformatic Studies to Predict MicroRNAs with the Potential of Uncoupling RECK Expression from Epithelial-Mesenchymal Transition in Cancer Cells.

Fri, 2016-05-27 07:02

Cancer Inform. 2016;15:91-102

Authors: Wang Z, Murakami R, Yuki K, Yoshida Y, Noda M

Abstract
RECK is downregulated in many tumors, and forced RECK expression in tumor cells often results in suppression of malignant phenotypes. Recent findings suggest that RECK is upregulated after epithelial-mesenchymal transition (EMT) in normal epithelium-derived cells but not in cancer cells. Since several microRNAs (miRs) are known to target RECK mRNA, we hypothesized that certain miR(s) may be involved in this suppression of RECK upregulation after EMT in cancer cells. To test this hypothesis, we used three approaches: (1) text mining to find miRs relevant to EMT in cancer cells, (2) predicting miR targets using four algorithms, and (3) comparing miR-seq data and RECK mRNA data using a novel non-parametric method. These approaches identified the miR-183-96-182 cluster as a strong candidate. We also looked for transcription factors and signaling molecules that may promote cancer EMT, miR-183-96-182 upregulation, and RECK downregulation. Here we describe our methods, findings, and a testable hypothesis on how RECK expression could be regulated in cancer cells after EMT.

PMID: 27226706 [PubMed]

Categories: Literature Watch

SWIFT-Review: a text-mining workbench for systematic review.

Wed, 2016-05-25 06:35

Syst Rev. 2016;5(1):87

Authors: Howard BE, Phillips J, Miller K, Tandon A, Mav D, Shah MR, Holmgren S, Pelch KE, Walker V, Rooney AA, Macleod M, Shah RR, Thayer K

Abstract
BACKGROUND: There is growing interest in using machine learning approaches to priority rank studies and reduce human burden in screening literature when conducting systematic reviews. In addition, identifying addressable questions during the problem formulation phase of systematic review can be challenging, especially for topics having a large literature base. Here, we assess the performance of the SWIFT-Review priority ranking algorithm for identifying studies relevant to a given research question. We also explore the use of SWIFT-Review during problem formulation to identify, categorize, and visualize research areas that are data rich/data poor within a large literature corpus.
METHODS: Twenty case studies, including 15 public data sets, representing a range of complexity and size, were used to assess the priority ranking performance of SWIFT-Review. For each study, seed sets of manually annotated included and excluded titles and abstracts were used for machine training. The remaining references were then ranked for relevance using an algorithm that considers term frequency and latent Dirichlet allocation (LDA) topic modeling. This ranking was evaluated with respect to (1) the number of studies screened in order to identify 95% of known relevant studies and (2) the "Work Saved over Sampling" (WSS) performance metric. To assess SWIFT-Review for use in problem formulation, PubMed literature search results for 171 chemicals implicated as endocrine-disrupting chemicals (EDCs) were uploaded into SWIFT-Review (264,588 studies) and categorized based on evidence stream and health outcome. Patterns of search results were surveyed and visualized using a variety of interactive graphics.
RESULTS: Compared with the reported performance of other tools using the same datasets, the SWIFT-Review ranking procedure obtained the highest scores on 11 out of 15 of the public datasets. Overall, these results suggest that using machine learning to triage documents for screening has the potential to save, on average, more than 50% of the screening effort ordinarily required when using unordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation.
CONCLUSIONS: Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.
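
The abstract names the "Work Saved over Sampling" (WSS) metric without giving its formula. The following is a minimal sketch, assuming the commonly used formulation WSS@R = (N - screened) / N - (1 - R), where "screened" is the number of ranked documents a reviewer reads before reaching recall R. The function name and the toy ranking are illustrative and are not taken from SWIFT-Review.

```python
import math


def wss_at_recall(ranked_labels, recall_target=0.95):
    """Work Saved over Sampling for a ranked list of 0/1 relevance labels.

    Assumes the common definition: the fraction of documents left unread
    when the recall target is reached, minus the fraction that random
    sampling would leave unread at the same recall.
    """
    total = len(ranked_labels)
    relevant = sum(ranked_labels)
    if total == 0 or relevant == 0:
        raise ValueError("need at least one document and one relevant document")

    # Number of relevant documents that must be found to hit the target.
    needed = math.ceil(recall_target * relevant)

    found = 0
    screened = 0
    for label in ranked_labels:
        screened += 1
        found += label
        if found >= needed:
            break

    return (total - screened) / total - (1.0 - recall_target)


# Toy ranking (hypothetical): both relevant documents are ranked first.
good_ranking = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(wss_at_recall(good_ranking))
```

A ranking that places every relevant document at the very end yields a slightly negative value under this definition, which is why WSS is usually read as "savings relative to random order".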

PMID: 27216467 [PubMed - in process]

Categories: Literature Watch

miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.

Wed, 2016-05-25 06:35

J Biomed Semantics. 2016;7(1):9

Authors: Gupta S, Ross KE, Tudor CO, Wu CH, Schmidt CJ, Vijay-Shanker K

Abstract
BACKGROUND: MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation.
METHODS: We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence.
RESULTS: miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD . We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. When we expanded the evaluation to include sentences with a wide range of microRNA-disease information that may be of interest to biomedical researchers, miRiaD also performed very well with an F-score of 89.4. The informativeness ranking of sentences was evaluated in terms of nDCG (0.977) and correlation metrics (0.678-0.727) when compared to an annotator's ranked list.
CONCLUSIONS: miRiaD, a high performance system that can capture a wide variety of microRNA-disease related information, extends beyond the scope of existing microRNA-disease resources. It can be incorporated into manual curation pipelines and serve as a resource for biomedical researchers interested in the role of microRNAs in disease. In our ongoing work we are developing an improved miRiaD web interface that will facilitate complex queries about microRNA-disease relationships, such as "In what diseases does microRNA regulation of apoptosis play a role?" or "Is there overlap in the sets of genes targeted by microRNAs in different types of dementia?"
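
The sentence ranking above is scored with nDCG, which the abstract does not define. As a rough illustration, the sketch below computes nDCG with linear gain and a log2 rank discount. That variant is an assumption, since the abstract does not say which gain or discount the authors used, and the relevance grades are invented.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG of the given order divided by DCG of the ideal (sorted) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical annotator grades for sentences, in the system's ranked order.
system_order = [3, 1, 2, 0]
print(ndcg(system_order))
```

A score of 1.0 means the system's order matches the annotator's ideal order; swapping informative and uninformative sentences lowers the score, with mistakes near the top of the list penalised most.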

PMID: 27216254 [PubMed - in process]

Categories: Literature Watch

Improving Biochemical Named Entity Recognition Performance Using PSO Classifier Selection and Bayesian Combination Method.

Tue, 2016-05-24 06:17

IEEE/ACM Trans Comput Biol Bioinform. 2016 May 18;

Authors: Akkasi A, Varoglu E

Abstract
Named Entity Recognition (NER) is a basic step for a large number of subsequent text mining tasks in the biochemical domain. Increasing the performance of such recognition systems is of high importance and always poses a challenge. In this study, a new community based decision making system is proposed which aims at increasing the efficiency of NER systems in the chemical/drug name context. The Particle Swarm Optimization (PSO) algorithm is chosen as the expert selection strategy, along with the Bayesian combination method to merge the outputs of the selected classifiers as well as evaluate the fitness of the selected candidates. The proposed system performs in two steps. The first step focuses on creating various numbers of baseline classifiers for NER with different feature sets using Conditional Random Fields (CRFs). The second step involves the selection and efficient combination of the classifiers using PSO and Bayesian combination. Two comprehensive corpora from BioCreative events, namely ChemDNER and CEMP, are used for the experiments conducted. Results show that the ensemble of classifiers selected by means of the proposed approach performs better than the single best classifier as well as ensembles formed using other popular selection/combination strategies for both corpora. Furthermore, the proposed method outperforms the best performing system at the BioCreative IV ChemDNER track by achieving an F-score of 87.95%.

PMID: 27214909 [PubMed - as supplied by publisher]

Categories: Literature Watch

Two Similarity Metrics for Medical Subject Headings (MeSH): An Aid to Biomedical Text Mining and Author Name Disambiguation.

Tue, 2016-05-24 06:17

J Biomed Discov Collab. 2016;7:e1

Authors: Smalheiser NR, Bonifield G

Abstract
In the present paper, we have created and characterized several similarity metrics for relating any two Medical Subject Headings (MeSH terms) to each other. The article-based metric measures the tendency of two MeSH terms to appear in the MEDLINE record of the same article. The author-based metric measures the tendency of two MeSH terms to appear in the body of articles written by the same individual (using the 2009 Author-ity author name disambiguation dataset as a gold standard). The two metrics are only modestly correlated with each other (r = 0.50), indicating that they capture different aspects of term usage. The article-based metric provides a measure of semantic relatedness, and MeSH term pairs that co-occur more often than expected by chance may reflect relations between the two terms. In contrast, the author metric is indicative of how individuals practice science, and may have value for author name disambiguation and studies of scientific discovery. We have calculated article metrics for all MeSH terms appearing in at least 25 articles in MEDLINE (as of 2014) and author metrics for MeSH terms published as of 2009. The dataset is freely available for download and can be queried at http://arrowsmith.psych.uic.edu/arrowsmith_uic/mesh_pair_metrics.html. Handling editor: Elizabeth Workman, MLIS, PhD.
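
The abstract describes the article-based metric only as the tendency of two MeSH terms to appear in the same MEDLINE record, relative to chance. One simple way to picture this is an observed-to-expected co-occurrence ratio, sketched below. This is not the authors' metric: the formula, the function name and the toy records are all assumptions made for illustration.

```python
def cooccurrence_ratio(articles, term_a, term_b):
    """Observed / expected co-occurrence of two terms across articles.

    `articles` is a list of sets of MeSH terms, one set per article.
    Under independence the expected count of joint appearances is
    n_a * n_b / n; values above 1 suggest co-occurrence beyond chance.
    """
    n = len(articles)
    n_a = sum(1 for terms in articles if term_a in terms)
    n_b = sum(1 for terms in articles if term_b in terms)
    n_ab = sum(1 for terms in articles if term_a in terms and term_b in terms)
    if n == 0 or n_a == 0 or n_b == 0:
        return 0.0
    expected = n_a * n_b / n
    return n_ab / expected


# Toy records (hypothetical), each a set of MeSH headings.
records = [
    {"Neoplasms", "Apoptosis"},
    {"Neoplasms", "Apoptosis"},
    {"Sleep"},
    {"Asthma"},
]
print(cooccurrence_ratio(records, "Neoplasms", "Apoptosis"))
```

An author-based variant could reuse the same function by passing one set of terms per disambiguated author instead of one per article, which is one way the two metrics could end up only modestly correlated.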

PMID: 27213780 [PubMed - as supplied by publisher]

Categories: Literature Watch

Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years.

Sun, 2016-05-22 08:47

Comput Biol Med. 2016 Apr 20;73:173-185

Authors: Wagner M, Vicinus B, Muthra ST, Richards TA, Linder R, Frick VO, Groh A, Rubie C, Weichert F

Abstract
BACKGROUND: The continuous growth of medical sciences literature indicates the need for automated text analysis. Scientific writing which is neither unitary, transcending social situation nor defined by a timeless idea is subject to constant change as it develops in response to evolving knowledge, aims at different goals, and embodies different assumptions about nature and communication. The objective of this study was to evaluate whether publication dates should be considered when performing text mining.
METHODS: A search of PUBMED for combined references to chemokine identifiers and particular cancer related terms was conducted to detect changes over the past 36 years. Text analyses were performed using freeware available from the World Wide Web. TOEFL Scores of territories hosting institutional affiliations as well as various readability indices were investigated. Further assessment was conducted using Principal Component Analysis. Laboratory examination was performed to evaluate the quality of attempts to extract content from the examined linguistic features.
RESULTS: The PUBMED search yielded a total of 14,420 abstracts (3,190,219 words). The range of findings in laboratory experimentation was consistent with the variability of the results described in the analyzed body of literature. Increased concurrence of chemokine identifiers together with cancer related terms was found at the abstract and sentence level, whereas complexity of sentences remained fairly stable.
CONCLUSIONS: The findings of the present study indicate that concurrent references to chemokines and cancer increased over time whereas text complexity remained stable.

PMID: 27208610 [PubMed - as supplied by publisher]

Categories: Literature Watch

Text Mining of Journal Articles for Sleep Disorder Terminologies.

Sat, 2016-05-21 08:32

PLoS One. 2016;11(5):e0156031

Authors: Lam C, Lai FC, Wang CH, Lai MH, Hsu N, Chung MH

Abstract
OBJECTIVE: Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study used text mining to determine the publication trends in sleep-related journal articles published during 2000-2013, to identify associations between SD and methodology terms, and to conduct statistical analyses of the text mining findings.
METHODS: SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms.
RESULTS: MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms.
CONCLUSION: Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
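
The RESULTS above report MetaMap's precision (0.70) and recall (0.77) but no combined score. As a worked example, the sketch below computes the standard F-measure from those two figures. The F1 value is derived here and is not reported in the abstract.

```python
def f_beta(precision, recall, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for MetaMap in the abstract.
f1 = f_beta(0.70, 0.77)
print(round(f1, 3))  # roughly 0.733
```

Setting `beta` above 1 weights recall more heavily, which may suit term extraction tasks where missing a sleep disorder term is costlier than reviewing a false positive.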

PMID: 27203858 [PubMed - as supplied by publisher]

Categories: Literature Watch

EPC Methods: An Exploration of the Use of Text-Mining Software in Systematic Reviews

Fri, 2016-05-20 08:19

Book. 2016 04

Authors: Paynter R, Bañez LL, Berliner E, Erinoff E, Lege-Matsuura J, Potter S, Uhl S

Abstract
OBJECTIVE: This project's goal was to provide a preliminary sketch of the use of text-mining tools as an emerging methodology within a number of systematic review processes. We sought to provide information addressing pressing questions individuals and organizations face when considering utilizing text-mining tools.
METHODS: We searched the literature to identify and summarize research on the use of text-mining tools within the systematic review context. We conducted telephone interviews with Key Informants (KIs; n=8) using a semi-structured instrument and subsequent qualitative analysis to explore issues surrounding the implementation and use of text-mining tools. Lastly, we compiled a list of text-mining tools to support systematic review methods and evaluated the tools using an informal descriptive appraisal tool.
RESULTS: The literature review identified 122 articles that met inclusion criteria, including two recent systematic reviews on the use of text-mining tools in the screening and data abstraction steps of systematic reviews. In addition to these two steps, a preliminary exploration of the literature on searching and other less-studied steps is presented. Support for the use of text-mining was strong amongst the KIs overall, though most KIs noted some performance caveats and/or areas in which further research is necessary. We evaluated 111 text-mining tools identified from the literature review and KI interviews.
CONCLUSIONS: Text-mining tools are currently being used within several systematic review organizations for a variety of review processes (e.g., searching, screening abstracts), and the published evidence-base is growing fairly rapidly in breadth and levels of evidence. Several outstanding questions remain for future empirical research to address regarding the reliability and validity of using these emerging technologies across a variety of review processes and whether these generalize across the scope of review topics. Guidance on reporting the use of these tools would be useful.


PMID: 27195359

Categories: Literature Watch

Argo: enabling the development of bespoke workflows and services for disease annotation.

Wed, 2016-05-18 16:52

Database (Oxford). 2016;2016

Authors: Batista-Navarro R, Carter J, Ananiadou S

Abstract
Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo's capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V's User Interactive Track (IAT), we demonstrated and evaluated Argo's suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track's top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo's support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models, through to user-interactive ones which allow human curators to manually provide their corrections to automatically generated annotations. Our participation in the BioCreative V challenges shows Argo's potential as an enabling technology for curating disease and phenotypic information from literature. Database URL: http://argo.nactem.ac.uk.

PMID: 27189607 [PubMed - as supplied by publisher]

Categories: Literature Watch

Identifying a biomarker network for corticosteroid resistance in asthma from bronchoalveolar lavage samples.

Wed, 2016-05-18 16:52

Mol Biol Rep. 2016 May 17;

Authors: Vargas JE, Porto BN, Puga R, Stein RT, Pitrez PM

Abstract
Corticosteroid resistance (CR) is a major barrier to the effective treatment of severe asthma. Hence, a better understanding of the molecular mechanisms involved in this condition is a priority. Network analysis is an emerging strategy to explore this complex heterogeneous disorder at the system level; here it was used to identify a small network specific to CR in asthma. The gene expression profile GSE7368, from bronchoalveolar lavage (BAL) of subjects with CR asthma, was downloaded from the Gene Expression Omnibus (GEO) database and compared to BAL of corticosteroid-sensitive (CS) patients. Differentially expressed genes (DEGs) were identified with the Limma package in R. In addition, DEGs were mapped to STRING to acquire protein-protein interaction (PPI) pairs. Topological properties of the PPI network were calculated with Centiscape, ClusterOne and BINGO. Subsequently, text-mining tools were applied to design a cell signalling network specific to CR in asthma. Thirty-five PPI networks were obtained, including a major network consisting of 370 nodes connected by 777 edges. After topological analysis, a minor PPI network composed of 48 nodes was identified, which comprises the most relevant nodes of the major PPI network. In this subnetwork, several receptors (EGFR, EGR1, ESR2, PGR), transcription factors (MYC, JAK), cytokines (IL8, IL6, IL1B), one chemokine (CXCL1), one kinase (SRC) and one cyclooxygenase (PTGS2) were described to be associated with inflammatory environment and steroid resistance in asthma. We suggest a biomarker network composed of 48 nodes that could potentially be explored for diagnostic or therapeutic use.

PMID: 27188427 [PubMed - as supplied by publisher]

Categories: Literature Watch

On the unsupervised analysis of domain-specific Chinese texts.

Wed, 2016-05-18 16:52

Proc Natl Acad Sci U S A. 2016 May 16;

Authors: Deng K, Bol PK, Li KJ, Liu JS

Abstract
With the growing availability of digitized text data both publicly and privately, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than those obtained using outputs of a supervised segmentation method.

PMID: 27185919 [PubMed - as supplied by publisher]

Categories: Literature Watch

Exploring mechanisms of Panax notoginseng saponins in treating coronary heart disease by integrating gene interaction network and functional enrichment analysis.

Wed, 2016-05-18 16:52

Chin J Integr Med. 2016 May 16;

Authors: Yu G, Wang J

Abstract
OBJECTIVE: To investigate the mechanisms of Panax notoginseng saponins (PNS) in treating coronary heart disease (CHD) by integrating gene interaction network and functional enrichment analysis.
METHODS: Text mining was used to get CHD and PNS associated genes. Gene-gene interaction networks of CHD and PNS were built by the GeneMANIA Cytoscape plugin. Advanced Network Merge Cytoscape plugin was used to analyze the two networks. Their functions were analyzed by gene functional enrichment analysis via DAVID Bioinformatics. Joint subnetwork of CHD network and PNS network was identified by network analysis.
RESULTS: The 11 genes of the joint subnetwork were the direct targets of PNS in the CHD network and were enriched in the cytokine-cytokine receptor interaction pathway. PNS could affect 85 other genes through the gene-gene interactions of the joint subnetwork, and these genes were enriched in 7 other pathways. The direct mechanism of PNS in treating CHD is targeting cytokines to relieve inflammation, and the indirect mechanism is affecting the 7 other pathways through the interaction of the joint subnetwork of the PNS and CHD networks. The genes in the 7 pathways could be potential targets for the immunologic adjuvant, anticoagulant, hypolipidemic, anti-platelet and anti-hypertrophic activities of PNS.
CONCLUSION: The key mechanisms of PNS in treating CHD could be anticoagulant and hypolipidemic which are indicated by analyzing biological functions of hubs in the merged network.
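The network step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not define how the joint subnetwork was computed, so the sketch takes it to be the genes present in both networks, and treats "indirectly affected" genes as their first neighbours in the CHD network. The gene labels G1 to G5 and both function names are hypothetical placeholders, not data from the study.

```python
def genes(edges):
    """All genes that appear in a list of (gene, gene) interaction pairs."""
    return {g for pair in edges for g in pair}

def joint_genes(edges_a, edges_b):
    """Genes shared by two interaction networks (assumed 'joint subnetwork')."""
    return genes(edges_a) & genes(edges_b)

def first_neighbours(edges, core):
    """Genes outside `core` that interact directly with a gene in `core`."""
    found = set()
    for a, b in edges:
        if a in core and b not in core:
            found.add(b)
        if b in core and a not in core:
            found.add(a)
    return found

# Placeholder networks (illustration only).
chd_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
pns_edges = [("G2", "G5"), ("G3", "G5")]

direct = joint_genes(chd_edges, pns_edges)       # shared genes
indirect = first_neighbours(chd_edges, direct)   # reached via interactions
print(sorted(direct), sorted(indirect))
```

Each resulting gene set would then be passed to an enrichment tool to find the pathways it is over-represented in.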

PMID: 27184904 [PubMed - as supplied by publisher]

Categories: Literature Watch

Celastrol targets IRAKs to block Toll-like receptor 4-mediated nuclear factor-κB activation.

Wed, 2016-05-18 16:52

J Integr Med. 2016 May;14(3):203-8

Authors: Shen YF, Zhang X, Wang Y, Cao FF, Uzan G, Peng B, Zhang DH

Abstract
OBJECTIVE: Celastrol has been established as a nuclear factor-κB (NF-κB) activation inhibitor; however, the exact mechanism behind this action is still unknown. Using text-mining technology, the authors predicted that interleukin-1 receptor-associated kinases (IRAKs) are potential celastrol targets, and hypothesized that targeting IRAKs might be one way that celastrol inhibits NF-κB. This is because IRAKs are key molecules for some crucial pathways to activate NF-κB (e.g., the interleukin-1 receptor (IL-1R)/Toll-like receptor (TLR) superfamily).
METHODS: The human hepatocellular cell line (HepG2) treated with palmitic acid (PA) was used as a model for stimulating TLR4/NF-κB activation, in order to observe the potential effects of celastrol on IRAK regulation and NF-κB inhibition. Transfection of small interfering RNA was used to down-regulate TLR4, IRAK1 and IRAK4, and Western blotting was used to detect changes in protein expression.
RESULTS: The results showed that celastrol could effectively inhibit PA-induced, TLR4-dependent NF-κB activation in HepG2 cells; PA also activated IRAKs, which were inhibited by celastrol. Knocking down IRAKs abolished PA-induced NF-κB activation.
CONCLUSION: The results show for the first time that targeting IRAKs is one way in which celastrol inhibits NF-κB activation.

PMID: 27181127 [PubMed - in process]

Categories: Literature Watch

Text mining patents for biomedical knowledge.

Wed, 2016-05-18 16:52

Drug Discov Today. 2016 May 11;

Authors: Rodriguez-Esteban R, Bundschus M

Abstract
Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery.

PMID: 27179985 [PubMed - as supplied by publisher]

Categories: Literature Watch

A novel procedure on next generation sequencing data analysis using text mining algorithm.

Sun, 2016-05-15 07:12

BMC Bioinformatics. 2016;17(1):213

Authors: Zhao W, Chen JJ, Perkins R, Wang Y, Liu Z, Hong H, Tong W, Zou W

Abstract
BACKGROUND: Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large-scale comparative and evolutionary studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining.
METHODS: We report a novel procedure to analyse NGS data using topic modeling. It consists of four major steps: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity for different numbers of topics and the convergence efficiency of Gibbs sampling were calculated and discussed in order to achieve the best result from the proposed procedure.
RESULTS: The topics output by the LDA algorithm could be treated as features of Salmonella strains to accurately describe the genetic diversity of the fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in the NGS data analysis procedure provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.
CONCLUSION: The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.
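The abstract mentions perplexity as the measure used to compare topic numbers but gives no formula. A minimal sketch, assuming the standard definition (the exponential of the negative mean log-likelihood per token), is shown below; the function name and the example probabilities are illustrative and not taken from the paper.

```python
import math

def perplexity(token_probs):
    """Perplexity of a model over held-out tokens.

    `token_probs` holds the probability the fitted model assigns to each
    observed token. Lower perplexity means the model predicts the data better.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_log = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log)

# A model that spreads probability evenly over 4 outcomes has perplexity 4.
print(perplexity([0.25] * 8))
```

In practice one would fit the topic model for several candidate topic numbers and compare the perplexity each achieves on held-out data.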

PMID: 27177941 [PubMed - as supplied by publisher]

Categories: Literature Watch

Filtering large-scale event collections using a combination of supervised and unsupervised learning for event trigger classification.

Sat, 2016-05-14 06:56

J Biomed Semantics. 2016;7:27

Authors: Mehryary F, Kaewphan S, Hakala K, Ginter F

Abstract
BACKGROUND: Biomedical event extraction is one of the key tasks in biomedical text mining, supporting various applications such as database curation and hypothesis generation. Several systems, some of which have been applied at a large scale, have been introduced to solve this task. Past studies have shown that the identification of the phrases describing biological processes, also known as trigger detection, is a crucial part of event extraction, and notable overall performance gains can be obtained by solely focusing on this sub-task. In this paper we propose a novel approach for filtering falsely identified triggers from large-scale event databases, thus improving the quality of knowledge extraction.
METHODS: Our method relies on state-of-the-art word embeddings, event statistics gathered from the whole biomedical literature, and both supervised and unsupervised machine learning techniques. We focus on EVEX, an event database covering the whole PubMed and PubMed Central Open Access literature and containing more than 40 million extracted events. The most frequent EVEX trigger words are hierarchically clustered, and the resulting cluster tree is pruned to identify words that can never act as triggers regardless of their context. For rarely occurring trigger words we introduce a supervised approach trained on the combination of trigger word classification produced by the unsupervised clustering method and manual annotation.
RESULTS: The method is evaluated on the official test set of the BioNLP Shared Task on Event Extraction. The evaluation shows that the method can be used to improve the performance of state-of-the-art event extraction systems. This successful effort also translates into removing 1,338,075 potentially incorrect events from EVEX, thus greatly improving the quality of the data. The method is not bound solely to the EVEX resource and can thus be used to improve the quality of any event extraction system or database.
AVAILABILITY: The data and source code for this work are available at: http://bionlp-www.utu.fi/trigger-clustering/.

PMID: 27175227 [PubMed - as supplied by publisher]

Categories: Literature Watch

Mining chemical patents with an ensemble of open systems.

Sat, 2016-05-14 06:56

Database (Oxford). 2016;2016

Authors: Leaman R, Wei CH, Zou C, Lu Z

Abstract
The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. In this manuscript, we describe systems for named entity recognition (NER) of chemicals and genes/proteins in patents, using the CEMP (for chemicals) and GPRO (for genes/proteins) corpora provided by the CHEMDNER task at BioCreative V. Our chemical NER system is an ensemble of five open systems, including both versions of tmChem, our previous work on chemical NER. Their output is combined using a machine learning classification approach. Our chemical NER system obtained 0.8752 precision and 0.9129 recall, for 0.8937 f-score on the CEMP task. Our gene/protein NER system is an extension of our previous work for gene and protein NER, GNormPlus. This system obtained a performance of 0.8143 precision and 0.8141 recall, for 0.8137 f-score on the GPRO task. Both systems achieved the highest performance in their respective tasks at BioCreative V. We conclude that an ensemble of independently-created open systems is sufficiently diverse to significantly improve performance over any individual system, even when they use a similar approach. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/.
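For readers unfamiliar with the metric, the f-score quoted for the CEMP task can be related to its precision and recall. The sketch below assumes the usual balanced F-measure (the harmonic mean of precision and recall); the abstract does not state which variant or averaging scheme the task used, and the function name is illustrative.

```python
def f_score(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# CEMP figures quoted in the abstract.
print(round(f_score(0.8752, 0.9129), 4))
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a system cannot reach a high f-score on recall alone.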

PMID: 27173521 [PubMed - as supplied by publisher]

Categories: Literature Watch
