Drug-induced Adverse Events

[Studies Using Text Mining on the Differences in Learning Effects between the KJ and World Café Method as Learning Strategies].

Tue, 2016-09-13 08:00

Yakugaku Zasshi. 2015;135(5):753-9

Authors: Yasuhara T, Sone T, Konishi M, Kushihata T, Nishikawa T, Yamamoto Y, Kurio W, Kohno T

Abstract
The KJ method (named for developer Jiro Kawakita; also known as affinity diagramming) is widely used in participatory learning as a means to collect and organize information. In addition, the World Café (WC) has recently become popular. However, differences in the information obtained using each method have not been studied comprehensively. To determine the appropriate information selection criteria, we analyzed differences in the information generated by the WC and KJ methods. Two groups engaged in sessions to collect and organize information using either the WC or KJ method, and small group discussions were held to create "proposals to improve first-year education". Both groups answered two pre- and post-session questionnaires that asked for free descriptions. Key words were extracted from the results of the two questionnaires and categorized using text mining. In the responses to questionnaire 1, which was directly related to the session theme, a significant increase in the number of key words was observed in the WC group (p=0.0050, Fisher's exact test). However, there was no significant increase in the number of key words in the responses to questionnaire 2, which was not directly related to the session theme (p=0.8347, Fisher's exact test). In the KJ method, participants extracted the most notable issues and progressed to a detailed discussion, whereas in the WC method, various information and problems were spread among the participants. The choice between the WC and KJ method should be made to reflect the educational objective and desired direction of discussion.
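
The abstract reports p-values from Fisher's exact test but not the underlying contingency tables. As a minimal sketch of how such a test can be computed for a 2x2 table, the Python below sums hypergeometric probabilities; the function name and the example counts are illustrative assumptions, not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts (NOT from the paper): rows are post- and pre-session,
# columns are responses with and without a given key word.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 3))  # 0.035
```

Whether the authors used a one-sided or two-sided test is not stated in the abstract; the sketch assumes the two-sided form.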

PMID: 25948313 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition.

Sun, 2016-09-11 07:28

J Biomed Semantics. 2016;7(1):52

Authors: Funk CS, Cohen KB, Hunter LE, Verspoor KM

Abstract
BACKGROUND: Gene Ontology (GO) terms represent the standard for annotation and representation of molecular functions, biological processes and cellular compartments, but a large gap exists between the way concepts are represented in the ontology and how they are expressed in natural language text. The construction of highly specific GO terms is formulaic, consisting of parts and pieces from more simple terms.
RESULTS: We present two different types of manually generated rules to help capture the variation of how GO terms can appear in natural language text. The first set of rules takes into account the compositional nature of GO and recursively decomposes the terms into their smallest constituent parts. The second set of rules generates derivational variations of these smaller terms and compositionally combines all generated variants to form the original term. By applying both types of rules, new synonyms are generated for two-thirds of all GO terms and an increase in F-measure performance for recognition of GO on the CRAFT corpus from 0.498 to 0.636 is observed. Additionally, we evaluated the combination of both types of rules over one million full text documents from Elsevier; manual validation and error analysis show we are able to recognize GO concepts with reasonable accuracy (88 %) based on random sampling of annotations.
CONCLUSIONS: In this work we present a set of simple synonym generation rules that utilize the highly compositional and formulaic nature of the Gene Ontology concepts. We illustrate how the generated synonyms aid in improving recognition of GO concepts on two different biomedical corpora. We discuss other applications of our rules for GO ontology quality assurance, explore the issue of overgeneration, and provide examples of how similar methodologies could be applied to other biomedical terminologies. Additionally, we provide all generated synonyms for use by the text-mining community.
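
The two rule types described above (recursive decomposition, then derivational variation and recombination) can be illustrated with a small Python sketch. The variant table, the "X of Y" splitting rule, and the "Y X" reordering are hypothetical stand-ins chosen for illustration; they are not the authors' published rules.

```python
from itertools import product

# Hypothetical derivational variants; a real rule set would be far larger.
VARIANTS = {
    "regulation": ["regulation", "regulating", "regulates"],
    "migration": ["migration", "migrating"],
}


def decompose(term):
    """Recursively split an 'X of Y' term into its constituent parts."""
    head, sep, tail = term.partition(" of ")
    if not sep:
        return [term]
    return [head] + decompose(tail)


def variants(part):
    """Return word-level derivational variants of one constituent."""
    options = [VARIANTS.get(word, [word]) for word in part.split()]
    return [" ".join(combo) for combo in product(*options)]


def synonyms(term):
    """Recombine variants of the parts into candidate synonyms of the term."""
    parts = decompose(term)
    result = set()
    for combo in product(*(variants(p) for p in parts)):
        result.add(" of ".join(combo))  # original word order
        if len(combo) == 2:
            result.add(f"{combo[1]} {combo[0]}")  # 'Y X' reordering
    result.discard(term)
    return sorted(result)


for candidate in synonyms("regulation of cell migration"):
    print(candidate)
```

The output includes useful candidates such as "cell migration regulation" alongside implausible ones such as "regulates of cell migrating", which is a small-scale instance of the overgeneration issue the abstract mentions.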

PMID: 27613112 [PubMed - as supplied by publisher]

Review and Literature Mining on Proteostasis Factors and Cancer.

Sun, 2016-09-11 07:28

Methods Mol Biol. 2016;1449:71-84

Authors: Carvalho AS, Rodríguez MS, Matthiesen R

Abstract
Automatic analysis of ever-growing literature repositories, including data integration with other databases, is a powerful tool for proposing hypotheses that can guide experiments designed to validate or disprove them. Furthermore, it provides a means to evaluate the redundancy of a line of research in comparison to the published literature. This is potentially beneficial for researchers working on a specific disease who are interested in exploring a particular pathway or set of genes/proteins. Within the scope of this book, a case is made addressing proteostasis factors in cancer. The maintenance of proteome homeostasis, known as proteostasis, is a process by which cells regulate protein translation, degradation, subcellular localization, and protein folding, and it consists of an integrated network of proteins. The ubiquitin-proteasome system plays a key role in essential biological processes such as cell cycle, DNA damage repair, membrane trafficking, and maintaining protein homeostasis. Aberrant proteostasis leads to loss-of-function diseases (cystic fibrosis) and gain-of-toxic-function diseases (Alzheimer's, Parkinson's, and Huntington's disease). Cancer therapy, on the other hand, explores inhibition of proteostasis factors to trigger endoplasmic reticulum stress with subsequent apoptosis. Alternatively, therapies target deubiquitinases and thereby regulate tumor promoters or suppressors. Furthermore, mutations in specific proteostasis factors are associated with higher risk for specific cancers, e.g., BRCA mutations in breast cancer. This chapter discusses the association of proteostasis protein factors with cancer from a literature mining perspective.

PMID: 27613028 [PubMed - in process]

Social representation of "music" in young adults: a cross-cultural study.

Sat, 2016-09-10 07:10

Int J Audiol. 2016 Sep 9;:1-9

Authors: Manchaiah V, Zhao F, Widén S, Auzenne J, Beukes EW, Ahmadi T, Tomé D, Mahadeva D, Krishna R, Germundsson P

Abstract
OBJECTIVE: This study aimed to explore perceptions of and reactions to music in young adults (18-25 years) using the theory of social representations (TSR).
DESIGN: The study used a cross-sectional survey design and included participants from India, Iran, Portugal, USA and UK. Data were analysed using various qualitative and quantitative methods.
STUDY SAMPLE: The study sample included 534 young adults.
RESULTS: The Chi-square analysis showed significant differences between the countries regarding the informants' perception of music. The most positive connotations about music were found in the responses obtained from Iranian participants (82.2%), followed by Portuguese participants (80.6%), while the most negative connotations about music were found in the responses obtained from Indian participants (18.2%), followed by Iranian participants (7.3%). The participants' responses fell into 19 main categories based on their meaning; however, not all categories were found in all five countries. The co-occurrence analysis results generally indicate that the category "positive emotions or actions" was the most frequent category occurring in all five countries.
CONCLUSIONS: The results indicate that music is generally considered to bring positive emotions for people within these societies, although a small percentage of responses indicate some negative consequences of music.

PMID: 27609441 [PubMed - as supplied by publisher]

Genome-wide protein-protein interactions and protein function exploration in cyanobacteria.

Fri, 2016-09-09 06:52

Sci Rep. 2015;5:15519

Authors: Lv Q, Ma W, Liu H, Li J, Wang H, Lu F, Zhao C, Shi T

Abstract
Genome-wide network analysis is well established for studying proteins of unknown function. Here, we explored protein functions and biological mechanisms based on an inferred high-confidence protein-protein interaction (PPI) network in cyanobacteria. We integrated data from seven different sources and predicted 1,997 PPIs, which were evaluated by experiments (molecular mechanism), text mining of the literature (direct or indirect evidence), and "interologs" (conservation). Combining the predicted PPIs with known PPIs, we obtained 4,715 non-redundant PPIs (involving 3,231 proteins covering over 90% of the genome) to generate the PPI network. Based on the PPI network, Gene Ontology (GO) terms were assigned to proteins of unknown function. Functional modules were identified by dissecting the PPI network into sub-networks and analyzing pathway enrichment, with which we investigated novel functions of the underlying proteins in protein complexes and pathways. Examples from photosynthesis and DNA repair indicate that the network approach is a powerful tool in protein function analysis. Overall, this systems biology approach provides new insight into posterior functional analysis of PPIs in cyanobacteria.

PMID: 26490033 [PubMed - indexed for MEDLINE]

Automatic classification of communication logs into implementation stages via text analysis.

Thu, 2016-09-08 06:36

Implement Sci. 2016;11(1):119

Authors: Wang D, Ogihara M, Gallo C, Villamar JA, Smith JD, Vermeer W, Cruden G, Benbow N, Brown CH

Abstract
BACKGROUND: To improve the quality, quantity, and speed of implementation, careful monitoring of the implementation process is required. However, some health organizations have limited capacity to collect, organize, and synthesize information relevant to their decision to implement an evidence-based program, the preparation steps necessary for successful program adoption, the fidelity of program delivery, and the sustainment of this program over time. When a large health system implements an evidence-based program across multiple sites, a trained intermediary or broker may provide such monitoring and feedback, but this task is labor intensive and not easily scaled up for large numbers of sites. We present a novel approach to producing an automated system of monitoring implementation stage entrances and exits based on a computational analysis of communication log notes generated by implementation brokers. Potentially discriminating keywords are identified using the definitions of the stages and experts' coding of a portion of the log notes. A machine learning algorithm produces a decision rule to classify remaining, unclassified log notes.
RESULTS: We applied this procedure to log notes in the implementation trial of multidimensional treatment foster care in the California 40-county implementation trial (CAL-40) project, using the stages of implementation completion (SIC) measure. We found that a semi-supervised non-negative matrix factorization method accurately identified most stage transitions. Another computational model was built for determining the start and the end of each stage.
CONCLUSIONS: This automated system demonstrated feasibility in this proof of concept challenge. We provide suggestions on how such a system can be used to improve the speed, quality, quantity, and sustainment of implementation. The innovative methods presented here are not intended to replace the expertise and judgement of an expert rater already in place. Rather, these can be used when human monitoring and feedback is too expensive to use or maintain. These methods rely on digitized text that already exists or can be collected with minimal to no intrusiveness and can signal when additional attention or remediation is required during implementation. Thus, resources can be allocated according to need rather than universally applied, or worse, not applied at all due to their cost.

PMID: 27600612 [PubMed - in process]

Identifying Multi-Dimensional Co-Clusters in Tensors Based on Hyperplane Detection in Singular Vector Spaces.

Wed, 2016-09-07 06:12

PLoS One. 2016;11(9):e0162293

Authors: Zhao H, Wang DD, Chen L, Liu X, Yan H

Abstract
Co-clustering, often called biclustering for two-dimensional data, has found many applications, such as gene expression data analysis and text mining. Nowadays, a variety of multi-dimensional arrays (tensors) frequently occur in data analysis tasks, and co-clustering techniques play a key role in dealing with such datasets. Co-clusters represent coherent patterns and exhibit important properties along all the modes. Development of robust co-clustering techniques is important for the detection and analysis of these patterns. In this paper, a co-clustering method based on hyperplane detection in singular vector spaces (HDSVS) is proposed. Specifically, in this method, higher-order singular value decomposition (HOSVD) transforms a tensor into a core part and a singular vector matrix along each mode, whose row vectors can be clustered by a linear grouping algorithm (LGA). Meanwhile, hyperplanar patterns are extracted and support the identification of multi-dimensional co-clusters. To validate HDSVS, a number of synthetic and biological tensors were used. The synthetic tensors demonstrated favorable performance of this algorithm on noisy or overlapping data. Experiments with gene expression data and lineage data of embryonic cells further verified the reliability of HDSVS on practical problems. Moreover, the detected co-clusters are consistent with important genetic pathways and gene ontology annotations. Finally, a series of comparisons between HDSVS and state-of-the-art methods on synthetic tensors and a yeast gene expression tensor was carried out, verifying the robust and stable performance of our method.

PMID: 27598575 [PubMed - as supplied by publisher]

BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID.

Sun, 2016-09-04 07:33

Database (Oxford). 2016;2016

Authors: Kim S, Islamaj Doğan R, Chatr-Aryamontri A, Chang CS, Oughtred R, Rust J, Batista-Navarro R, Carter J, Ananiadou S, Matos S, Santos A, Campos D, Oliveira JL, Singh O, Jonnagaddala J, Dai HJ, Su EC, Chang YC, Su YC, Chu CH, Chen CC, Hsu WL, Peng Y, Arighi C, Wu CH, Vijay-Shanker K, Aydın F, Hüsünbeyi ZM, Özgür A, Shin SY, Kwon D, Dolinski K, Tyers M, Wilbur WJ, Comeau DC

Abstract
BioC is a simple XML format for text, annotations and relations, and was developed to achieve interoperability for biomedical text processing. Following the success of BioC in BioCreative IV, the BioCreative V BioC track addressed a collaborative task to build an assistant system for BioGRID curation. In this paper, we describe the framework of the collaborative BioC task and discuss our findings based on the user survey. This track consisted of eight subtasks including gene/protein/organism named entity recognition, protein-protein/genetic interaction passage identification and annotation visualization. Using BioC as their data-sharing and communication medium, nine teams, world-wide, participated and contributed either new methods or improvements of existing tools to address different subtasks of the BioC track. Results from different teams were shared in BioC and made available to other teams as they addressed different subtasks of the track. In the end, all submitted runs were merged using a machine learning classifier to produce an optimized output. The biocurator assistant system was evaluated by four BioGRID curators in terms of practical usability. The curators' feedback was overall positive and highlighted the user-friendly design and the convenient gene/protein curation tool based on text mining. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-1-bioc/.
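
To give a feel for how a BioC-style file carries text and annotations, the following Python sketch parses a tiny hand-written sample with the standard library. The sample document, its identifiers, and the helper name read_annotations are made up for illustration, and the element layout (collection, document, passage, annotation, infon, location) follows the BioC format as commonly described; treat it as an assumption rather than a reference for the schema.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in a BioC-like layout (illustrative, not track data).
BIOC_SAMPLE = """<collection>
  <source>example</source>
  <date>20160904</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>BRCA1 interacts with BARD1.</text>
      <annotation id="T1">
        <infon key="type">Gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
      <annotation id="T2">
        <infon key="type">Gene</infon>
        <location offset="21" length="5"/>
        <text>BARD1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (document id, annotation type, covered text) for each annotation."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for passage in doc.iter("passage"):
            base = int(passage.findtext("offset"))
            text = passage.findtext("text")
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                # Location offsets are assumed to be document-level, so
                # subtract the passage offset to index into the passage text.
                start = int(loc.get("offset")) - base
                end = start + int(loc.get("length"))
                kind = ann.findtext("infon[@key='type']")
                rows.append((doc_id, kind, text[start:end]))
    return rows


for row in read_annotations(BIOC_SAMPLE):
    print(row)
```

Recovering the covered text from the offsets, rather than trusting the annotation's own text element, is a simple consistency check when results from several teams' tools are merged.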

PMID: 27589962 [PubMed - as supplied by publisher]

Overview of the interactive task in BioCreative V.

Sun, 2016-09-04 07:33

Database (Oxford). 2016;2016

Authors: Wang Q, S Abdul S, Almeida L, Ananiadou S, Balderas-Martínez YI, Batista-Navarro R, Campos D, Chilton L, Chou HJ, Contreras G, Cooper L, Dai HJ, Ferrell B, Fluck J, Gama-Castro S, George N, Gkoutos G, Irin AK, Jensen LJ, Jimenez S, Jue TR, Keseler I, Madan S, Matos S, McQuilton P, Milacic M, Mort M, Natarajan J, Pafilis E, Pereira E, Rao S, Rinaldi F, Rothfels K, Salgado D, Silva RM, Singh O, Stefancsik R, Su CH, Subramani S, Tadepally HD, Tsaprouni L, Vasilevsky N, Wang X, Chatr-Aryamontri A, Laulederkind SJ, Matis-Mitchell S, McEntyre J, Orchard S, Pundir S, Rodriguez-Esteban R, Van Auken K, Lu Z, Schaeffer M, Wu CH, Hirschman L, Arighi CN

Abstract
Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and the level of difficulty in completing the task. In this manuscript, we describe the development of the interactive task, from planning to execution, and discuss major findings for the systems tested. Database URL: http://www.biocreative.org.

PMID: 27589961 [PubMed - as supplied by publisher]

Use of large-scale veterinary data for the investigation of antimicrobial prescribing practices in equine medicine.

Sat, 2016-09-03 06:52

Equine Vet J. 2016 Sep 2;

Authors: Welsh CE, Parkin TD, Marshall JF

Abstract
BACKGROUND: As antimicrobial resistant bacterial strains continue to emerge and spread in human and animal populations, understanding prescription practices is key in benchmarking current performance and setting goals. Antimicrobial prescription in companion veterinary species is widespread, but is neither monitored nor restricted in the US and Canada. The veterinary use of certain antimicrobial classes is discouraged in some countries, in the hope of preserving efficacy for serious human infections.
OBJECTIVES: The aim of this study was to ascertain the rate of prescription of a number of 'reserved' antimicrobials in a first-opinion US and Canadian horse cohort, and identify trends in their empirical use.
STUDY DESIGN: Retrospective cohort study.
METHODS: A large convenience sample of electronic medical records (2006 to 2012) was interrogated using text mining to identify enrofloxacin, clarithromycin and ceftiofur prescriptions. Time series analysis and logistic regression were used to identify trends and risk factors for prescription.
RESULTS: Prescription of these antimicrobials as a first-line intervention, without culture and sensitivity testing, was common in this population. Enrofloxacin prescriptions were found to increase over the study period, and there was evidence of either a reducing, or static trend in the proportion of reserved antimicrobial prescriptions informed by culture and sensitivity testing.
MAIN LIMITATIONS: Dose adequacy could not be included due to the nature of the data used.
CONCLUSIONS: Empirical use of reserved antimicrobials was common in this population, and further advice and guidance should be issued to first-opinion veterinarians to safeguard antimicrobial efficacy.

PMID: 27589226 [PubMed - as supplied by publisher]

Leveraging syntactic and semantic graph kernels to extract pharmacokinetic drug drug interactions from biomedical literature.

Sat, 2016-09-03 06:52

BMC Syst Biol. 2016;10 Suppl 3:67

Authors: Zhang Y, Wu HY, Xu J, Wang J, Soysal E, Li L, Xu H

Abstract
BACKGROUND: Information about drug-drug interactions (DDIs) supported by scientific evidence is crucial for establishing computational knowledge bases for applications like pharmacovigilance. Since new reports of DDIs are rapidly accumulating in the scientific literature, text-mining techniques for automatic DDI extraction are critical. We propose a novel approach for automated pharmacokinetic (PK) DDI detection that incorporates syntactic and semantic information into graph kernels, to address the problem of sparseness associated with syntactic-structural approaches. First, we used a novel all-path graph kernel using shallow semantic representation of sentences. Next, we statistically integrated fine-granular semantic classes into the dependency and shallow semantic graphs.
RESULTS: When evaluated on the PK DDI corpus, our approach significantly outperformed the original all-path graph kernel that is based on dependency structure. Our system that combined dependency graph kernel with semantic classes achieved the best F-scores of 81.94 % for in vivo PK DDIs and 69.34 % for in vitro PK DDIs, respectively. Further, combining shallow semantic graph kernel with semantic classes achieved the highest precisions of 84.88 % for in vivo PK DDIs and 74.83 % for in vitro PK DDIs, respectively.
CONCLUSIONS: We presented a graph kernel-based approach that combines syntactic and semantic information for extracting pharmacokinetic DDIs from the biomedical literature. Experimental results showed that our proposed approach could extract PK DDIs from the literature effectively and significantly enhanced the performance of the original all-path graph kernel based on dependency structure.
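
For readers less familiar with the metrics quoted above, this short Python sketch shows how precision, recall, and the balanced F-score relate. The counts are hypothetical; the abstract does not report the underlying true positive, false positive, or false negative counts, and it does not state which F-score weighting was used (F1 is assumed here).

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and their harmonic mean (F1) from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical extraction result: 80 correct DDIs, 20 spurious, 20 missed.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"P={precision:.2%} R={recall:.2%} F1={f1:.2%}")
```

Because F1 is a harmonic mean, a system can reach the highest precision without reaching the highest F-score, which is consistent with the abstract reporting its best F-scores and best precisions for different kernel combinations.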

PMID: 27585838 [PubMed - in process]

Meta-generalis: A novel method for structuring information from radiology reports.

Fri, 2016-09-02 06:17

Appl Clin Inform. 2016;7(3):803-16

Authors: Barbosa F, Traina AJ, Muglia VF

Abstract
BACKGROUND: A structured report for imaging exams aims at increasing the precision in information retrieval and communication between physicians. However, it is more concise than free text and may limit specialists' descriptions of important findings not covered by pre-defined structures. A computational ontological structure derived from free texts designed by specialists may be a solution for this problem. Therefore, the goal of our study was to develop a methodology for structuring information in radiology reports covering specifications required for the Brazilian Portuguese language, including the terminology to be used.
METHODS: We gathered 1,701 radiological reports of magnetic resonance imaging (MRI) studies of the lumbosacral spine from three different institutions. Techniques of text mining and ontological conceptualization of lexical units extracted were used to structure information. Ten radiologists, specialists in lumbosacral MRI, evaluated the textual superstructure and terminology extracted using an electronic questionnaire.
RESULTS: The established methodology consists of six steps: 1) collection of radiology reports of a specific MRI examination; 2) textual decomposition; 3) normalization of lexical units; 4) identification of textual superstructures; 5) conceptualization of candidate-terms; and 6) evaluation of superstructures and extracted terminology by experts using an electronic questionnaire. Three different textual superstructures were identified, with terminological variations in the names of their textual categories. The number of candidate-terms conceptualized was 4,183, yielding 727 concepts. There were a total of 13,963 relationships between candidate-terms and concepts and 789 relationships among concepts.
CONCLUSIONS: The proposed methodology allowed structuring information in a more intuitive and practical way. Identification of three textual superstructures, extraction of lexical units, and their normalization and ontological conceptualization were achieved while maintaining references to their respective categories and to the free-text radiology reports.

PMID: 27580980 [PubMed - in process]

MapReduce in the Cloud: A Use Case Study for Efficient Co-Occurrence Processing of MEDLINE Annotations with MeSH.

Thu, 2016-09-01 08:45

Stud Health Technol Inform. 2016;228:582-6

Authors: Kreuzthaler M, Miñarro-Giménez JA, Schulz S

Abstract
Big data resources are difficult to process without a scaled hardware environment that is specifically adapted to the problem. The emergence of flexible cloud-based virtualization techniques promises solutions to this problem. This paper demonstrates how a billion lines can be processed in a reasonable amount of time in a cloud-based environment. Our use case addresses the accumulation of concept co-occurrence data in MEDLINE annotations as a series of MapReduce jobs, which can be scaled and executed in the cloud. Besides showing an efficient way of solving this problem, we generated an additional resource for the scientific community to be used for advanced text mining approaches.
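
The map and reduce steps for co-occurrence counting can be sketched in plain Python on a single machine. This is only an illustration of the general pattern, not the authors' cloud pipeline; the function names and the three sample citations are assumptions.

```python
from collections import Counter
from itertools import combinations


def map_phase(record):
    """Emit ((term_a, term_b), 1) for every unordered term pair on one citation."""
    terms = sorted(set(record))
    return [(pair, 1) for pair in combinations(terms, 2)]


def reduce_phase(emitted):
    """Sum the emitted counts per term pair."""
    counts = Counter()
    for pair, n in emitted:
        counts[pair] += n
    return counts


def cooccurrence(records):
    """Run the map phase over all records, then reduce the emitted pairs."""
    emitted = (kv for record in records for kv in map_phase(record))
    return reduce_phase(emitted)


# Hypothetical annotation sets for three citations.
records = [
    ["Humans", "Neoplasms", "Data Mining"],
    ["Humans", "Data Mining"],
    ["Neoplasms", "Humans"],
]
for pair, n in sorted(cooccurrence(records).items()):
    print(pair, n)
```

Sorting the terms before pairing gives each pair a single canonical key, so counts for the same pair emitted by different mappers land in the same reducer group.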

PMID: 27577450 [PubMed - in process]

Ion Channel ElectroPhysiology Ontology (ICEPO) - a case study of text mining assisted ontology development.

Tue, 2016-08-30 08:13

AMIA Jt Summits Transl Sci Proc. 2016;2016:42-51

Authors: Elayavilli RK, Liu H

Abstract
BACKGROUND: Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source of quantitative information. Gathering quantitative parameters and values from biomedical text is a significant challenge in the early steps of computational modeling, as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain impedes normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to domain experts.
METHODS: In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining into a formal representation that may help in constructing an ontology for ion channel events using a rule-based approach. We have developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies, such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), with the knowledge provided by domain experts.
RESULTS: The rule-based system achieved an overall F-measure of 68.93% in extracting quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers potential to facilitate the integration of text mining into the ontological workflow, a novel aspect of this study.
CONCLUSIONS: This work is a case study in which we created a platform that provides formal interaction between ontology development and text mining. We have achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in an ontological framework.
AVAILABILITY: The ICEPO ontology is available for download at http://openbionlp.org/mutd/supplementarydata/ICEPO/ICEPO.owl.
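
A rule-based extractor of quantitative assertions can be approximated with a regular expression over a small lexicon. The Python sketch below is a toy illustration only: the parameter names, units, connecting words, and example sentence are assumptions and do not reproduce the rules used to build ICEPO.

```python
import re

# Hypothetical lexicon of electrophysiology parameters and units.
PARAMETERS = ["half-activation voltage", "time constant", "peak current"]
UNITS = ["mV", "ms", "pA"]

PATTERN = re.compile(
    r"(?P<parameter>" + "|".join(map(re.escape, PARAMETERS)) + r")"
    r"\s+(?:of|was|is)\s+"
    r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*"
    r"(?P<unit>" + "|".join(map(re.escape, UNITS)) + r")\b"
)


def extract_assertions(sentence):
    """Return (parameter, value, unit) triples found in one sentence."""
    return [
        (m.group("parameter"), float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(sentence)
    ]


# Made-up sentence in the style of an electrophysiology report.
sentence = (
    "The half-activation voltage was -35.2 mV and "
    "the time constant was 4.1 ms."
)
for triple in extract_assertions(sentence):
    print(triple)
```

Each extracted triple is already in a structured form, which is the kind of output that could then be mapped onto ontology classes and properties.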

PMID: 27570648 [PubMed]

Survey of Natural Language Processing Techniques in Bioinformatics.

Tue, 2016-08-30 08:13

Comput Math Methods Med. 2015;2015:674296

Authors: Zeng Z, Shi H, Wu Y, Hong Z

Abstract
Informatics methods, such as text mining and natural language processing, are widely used in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

PMID: 26525745 [PubMed - indexed for MEDLINE]

Microarray-based identification of genes associated with cancer progression and prognosis in hepatocellular carcinoma.

Mon, 2016-08-29 08:02

J Exp Clin Cancer Res. 2016;35(1):127

Authors: Yin F, Shu L, Liu X, Li T, Peng T, Nan Y, Li S, Zeng X, Qiu X

Abstract
BACKGROUND: Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related deaths. The average survival and 5-year survival rates of HCC patients remain poor. Thus, there is an urgent need to better understand the mechanisms of cancer progression in HCC and to identify useful biomarkers to predict prognosis.
METHODS: Public data portals including Oncomine, The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) profiles were used to retrieve the HCC-related microarrays and to identify potential genes contributing to cancer progression. Bioinformatics analyses including pathway enrichment, protein/gene interaction and text mining were used to explain the potential roles of the identified genes in HCC. Quantitative real-time polymerase chain reaction analysis and Western blotting were used to measure the expression of the targets. The data were analysed using SPSS 20.0 software.
RESULTS: We identified 80 genes that were significantly dysregulated in HCC according to four independent microarrays covering 386 cases of HCC and 327 normal liver tissues. Twenty genes were consistently and stably dysregulated in the four microarrays by at least 2-fold and detection of gene expression by RT-qPCR and western blotting showed consistent expression profiles in 11 HCC tissues compared with corresponding paracancerous tissues. Eleven of these 20 genes were associated with disease-free survival (DFS) or overall survival (OS) in a cohort of 157 HCC patients, and eight genes were associated with tumour pathologic PT, tumour stage or vital status. Potential roles of those 20 genes in regulation of HCC progression were predicted, primarily in association with metastasis. INTS8 was specifically correlated with most clinical characteristics including DFS, OS, stage, metastasis, invasiveness, diagnosis, and age.
CONCLUSION: The significantly dysregulated genes identified in this study were associated with cancer progression and prognosis in HCC, and might be potential therapeutic targets for HCC treatment or potential biomarkers for diagnosis and prognosis.
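The selection rule described in RESULTS (genes dysregulated by at least 2-fold in every one of the four microarrays) can be illustrated with a minimal Python sketch. The function name, the gene labels and the fold-change values below are hypothetical and are not taken from the study; the sketch also assumes that "consistently" means the same direction of change in all datasets, which the abstract does not state explicitly.

```python
import math


def consistently_dysregulated(fold_changes, min_fold=2.0):
    """Return genes whose tumour/normal expression ratio is at least
    `min_fold` up, or at least `min_fold` down, in every dataset.

    fold_changes: dict mapping a gene label to a list of tumour/normal
    ratios, one per dataset (hypothetical input layout).
    """
    log_cut = math.log2(min_fold)
    selected = {}
    for gene, ratios in fold_changes.items():
        # Skip genes with no data or non-positive ratios (log undefined).
        if not ratios or any(r <= 0 for r in ratios):
            continue
        logs = [math.log2(r) for r in ratios]
        if all(x >= log_cut for x in logs):
            selected[gene] = "up"
        elif all(x <= -log_cut for x in logs):
            selected[gene] = "down"
    return selected


# Hypothetical ratios for four datasets per gene (illustration only).
example = {
    "GENE_A": [2.5, 3.0, 2.0, 4.1],    # up in all four
    "GENE_B": [0.4, 0.5, 0.3, 0.25],   # down in all four
    "GENE_C": [2.5, 0.4, 3.0, 2.2],    # direction not consistent
    "GENE_D": [1.5, 2.5, 2.1, 3.0],    # below threshold in one dataset
}
print(consistently_dysregulated(example))
```

Working on the log2 scale treats a 2-fold increase and a 2-fold decrease symmetrically, which is why the down-regulation cut-off is a ratio of 0.5.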

PMID: 27567667 [PubMed - as supplied by publisher]

Categories: Literature Watch

pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

Fri, 2016-08-26 07:18

J Biosci. 2015 Oct;40(4):671-82

Authors: Rani J, Shah AB, Ramachandran S

Abstract
The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature, with more than 24 million citations. Data mining of such voluminous literature is a challenging task. Although several text-mining algorithms with a focus on data visualization have been developed in recent years, they have limitations: they are slow, rigid and not available as open source. We have developed an R package, pubmed.mineR, in which we have combined the advantages of existing algorithms, overcome their limitations, and offered user flexibility and links with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand the user's capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity', to illustrate the use of pubmed.mineR. The package generally runs fast, with small elapsed times on regular workstations even with large corpus sizes and compute-intensive functions. pubmed.mineR is available at http://cran.r-project.org/web/packages/pubmed.mineR.

PMID: 26564970 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

DiMeX: A Text Mining System for Mutation-Disease Association Extraction.

Thu, 2016-08-25 07:02

PLoS One. 2016;11(4):e0152725

Authors: Mahmood AS, Wu TJ, Mazumder R, Vijay-Shanker K

Abstract
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall, with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low-precision problems suffered by most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.
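For readers unfamiliar with the metric, the F-scores quoted above combine precision and recall into a single number. The short Python sketch below shows the standard definition; the counts in the usage example are hypothetical and are not DiMeX results, and the abstract does not say how precision and recall were weighted, so the balanced F1 form (beta = 1) is an assumption.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and
    false-negative counts. Returns 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.
    beta = 1 gives the balanced F1 score."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


# Hypothetical extraction counts (illustration only).
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} F1={f_score(p, r):.2f}")
```

Because the F-score is a harmonic mean, it is pulled towards the lower of the two values, so a system cannot reach a high F-score by maximising precision at the expense of recall or vice versa.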

PMID: 27073839 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Prevalence, survival analysis and multimorbidity of chronic diseases in the general veterinarian-attended horse population of the UK.

Mon, 2016-08-22 06:17

Prev Vet Med. 2016 Sep 1;131:137-145

Authors: Welsh CE, Duz M, Parkin TD, Marshall JF

Abstract
The average age of the global human population is increasing, leading to increased interest in the effects of chronic disease and multimorbidity on health resources and patient welfare. It has been posited that the average age of the general veterinarian-attended horse population of the UK is also increasing, and therefore it could be assumed that chronic diseases and multimorbidity would pose an increasing risk here also. However, evidence for this trend in ageing is very limited, and the current prevalence of many chronic diseases, and of multimorbidity, is unknown. Using text mining of first-opinion electronic medical records from seven veterinary practices around the UK, together with Kaplan-Meier and Cox proportional hazards modelling, we were able to estimate the apparent prevalence among veterinarian-attended horses of nine chronic diseases, and to assess their relative effects on median life expectancy following diagnosis. With these methods we found evidence of increasing population age. Multimorbidity affected 1.2% of the study population and had a significant effect upon survival times: co-occurrence of two diseases, and of three or more diseases, led to hazard ratios of 6.6 and 21.3, respectively, compared to no chronic disease. Laminitis was involved in 74% of cases of multimorbidity. The population of horses attended by UK veterinarians appears to be ageing, and chronic diseases and their co-occurrence are common features, and as such warrant further investigation.
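The Kaplan-Meier method mentioned above estimates a survival curve from follow-up times in which some subjects are censored (still alive, or lost to follow-up, at last contact). The following minimal Python sketch shows the product-limit calculation; the function name and the follow-up data are hypothetical and are not drawn from the study, and the sketch omits confidence intervals and the Cox modelling step.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Product-limit survival estimate.

    durations: follow-up time for each subject.
    observed:  True if the event (death) was seen at that time,
               False if the subject was censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have equal length")
    events = Counter(t for t, e in zip(durations, observed) if e)
    leaving = Counter(durations)  # events plus censorings at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        deaths = events.get(t, 0)
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone with this follow-up time leaves the risk set afterwards.
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up times in years (illustration only).
times = [1, 2, 2, 3, 4]
died = [True, True, False, True, False]
for t, s in kaplan_meier(times, died):
    print(f"t={t}: S(t)={s:.2f}")
```

Censored subjects contribute to the number at risk up to their last follow-up time but never trigger a drop in the curve, which is what lets the estimator use incomplete records.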

PMID: 27544263 [PubMed - as supplied by publisher]

Categories: Literature Watch

The Markyt visualisation, prediction and benchmark platform for chemical and gene entity recognition at BioCreative/CHEMDNER challenge.

Sun, 2016-08-21 06:02

Database (Oxford). 2016;2016

Authors: Pérez-Pérez M, Pérez-Rodríguez G, Rabal O, Vazquez M, Oyarzabal J, Fdez-Riverola F, Valencia A, Krallinger M, Lourenço A

Abstract
Biomedical text mining methods and technologies have improved significantly in the last decade. Considerable efforts have been invested in understanding the main challenges of biomedical literature retrieval and extraction and in proposing solutions to problems of practical interest. Most notably, community-oriented initiatives such as the BioCreative challenge have enabled controlled environments for the comparison of automatic systems while pursuing practical biomedical tasks. Under this scenario, the present work describes the Markyt Web-based document curation platform, which has been implemented to support the visualisation, prediction and benchmarking of chemical and gene mention annotations at the BioCreative/CHEMDNER challenge. Creating this platform is an important step for the systematic and public evaluation of automatic prediction systems and the reusability of the knowledge compiled for the challenge. Markyt was not only critical in supporting the manual annotation and annotation revision process but also facilitated the comparative visualisation of automated results against the manually generated Gold Standard annotations and the comparative assessment of generated results. We expect that future biomedical text mining challenges and the text mining community may benefit from the Markyt platform to better explore and interpret annotations and improve automatic system predictions. Database URL: http://www.markyt.org, https://github.com/sing-group/Markyt.

PMID: 27542845 [PubMed - as supplied by publisher]

Categories: Literature Watch
