Semantic Web

Knowledge Author: facilitating user-driven, domain content development to support clinical information extraction.

Sat, 2016-06-25 07:04

J Biomed Semantics. 2016;7(1):42

Authors: Scuba W, Tharp M, Mowery D, Tseytlin E, Liu Y, Drews FA, Chapman WW

Abstract
BACKGROUND: Clinical Natural Language Processing (NLP) systems require a semantic schema composed of domain-specific concepts, their lexical variants, and associated modifiers to accurately extract information from clinical texts. An NLP system leverages this schema to structure concepts and extract meaning from free text. In the clinical domain, creating a semantic schema typically requires input from both a domain expert, such as a clinician, and an NLP expert who translates the clinician's domain expertise into a computable format usable by an NLP system. The goal of this work is to develop a web-based tool, Knowledge Author, that bridges the gap between the clinical domain expert and NLP system development by facilitating the development of domain content represented in a semantic schema for extracting information from clinical free text.
RESULTS: Knowledge Author is a web-based recommendation system that supports users in developing the domain content necessary for clinical NLP applications. Knowledge Author's schematic model leverages a set of semantic types derived from the Secondary Use Clinical Element Models and the Common Type System to allow the user to quickly create and modify domain-related concepts. Features such as collaborative development and domain content suggestions, generated by mapping concepts to the Unified Medical Language System Metathesaurus database, further support the domain content creation process. Two proof-of-concept studies were performed to evaluate the system's performance. The first study evaluated Knowledge Author's flexibility to create a broad range of concepts. A dataset of 115 concepts was assembled, of which 87 (76 %) could be created using Knowledge Author. The second study evaluated the effectiveness of Knowledge Author's output in an NLP system by extracting concepts and associated modifiers representing a clinical element, carotid stenosis, from 34 clinical free-text radiology reports using Knowledge Author and an NLP system, pyConText. Knowledge Author's domain content produced high recall for concepts (targeted findings: 86 %) and varied recall for modifiers (certainty: 91 %, sidedness: 80 %, neurovascular anatomy: 46 %).
CONCLUSION: Knowledge Author can support clinical domain content development for information extraction by enabling domain experts to create semantic schemas themselves.
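The kind of semantic-schema entry described above — a concept with lexical variants and associated modifiers — can be sketched as plain data. The field names and the helper below are illustrative assumptions, not Knowledge Author's actual output format, and the variant strings are invented examples.

```python
# Hypothetical sketch of a semantic-schema entry: a clinical concept bundled
# with its lexical variants and modifier classes. Field names are illustrative,
# not Knowledge Author's real serialization.

def make_concept(name, variants, modifiers, cui=None):
    """Bundle a clinical concept with lexical variants and modifiers."""
    return {
        "name": name,
        "variants": sorted(set(v.lower() for v in variants)),
        "modifiers": modifiers,   # e.g. certainty, sidedness
        "umls_cui": cui,          # would hold a UMLS Metathesaurus CUI
    }

carotid_stenosis = make_concept(
    "carotid stenosis",
    ["carotid stenosis", "stenosis of the carotid artery", "carotid narrowing"],
    {"certainty": ["definite", "probable", "negated"],
     "sidedness": ["left", "right", "bilateral"]},
)
```

An NLP engine such as pyConText would consume entries like this to match variants in text and attach the modifier values it finds nearby.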

PMID: 27338146 [PubMed - in process]

Categories: Literature Watch

Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources.

Fri, 2016-06-24 06:48

PLoS Comput Biol. 2016 Jun;12(6):e1004989

Authors: Waagmeester A, Kutmon M, Riutta A, Miller R, Willighagen EL, Evelo CT, Pico AR

Abstract
The diversity of online resources storing biological data in different formats provides a challenge for bioinformaticians to integrate and analyse their biological data. The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint at http://sparql.wikipathways.org. Having biological pathways in the semantic web allows rapid integration with data from other resources that contain information about elements present in pathways using SPARQL queries. In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web. WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API (https://dev.openphacts.org/docs) to be used in various tools for drug development. We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web.
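The SPARQL endpoint mentioned above can be queried programmatically. The sketch below builds a query for pathways containing a given gene product; the wp: class and property names follow the WikiPathways RDF vocabularies, but the exact terms (e.g. dcterms:title) and the `format` parameter are assumptions to check against the endpoint documentation.

```python
# Minimal sketch of building a query for the WikiPathways SPARQL endpoint.
# Vocabulary terms are assumptions based on the WikiPathways RDF docs.
from urllib.parse import urlencode

ENDPOINT = "http://sparql.wikipathways.org/sparql"

def build_pathway_query(gene_label, limit=10):
    """Return a SPARQL query for pathways containing a given gene product."""
    return f"""
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?pathway ?title WHERE {{
  ?gene a wp:GeneProduct ;
        rdfs:label ?label ;
        dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway ;
           dcterms:title ?title .
  FILTER (UCASE(STR(?label)) = UCASE("{gene_label}"))
}} LIMIT {limit}"""

# The query would then be sent as an HTTP GET request, e.g.:
url = ENDPOINT + "?" + urlencode({"query": build_pathway_query("TP53"),
                                  "format": "json"})
```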

PMID: 27336457 [PubMed - as supplied by publisher]

Categories: Literature Watch

stringgaussnet: from differentially expressed genes to semantic and Gaussian networks generation.

Sat, 2016-06-18 06:35

Bioinformatics. 2015 Dec 1;31(23):3865-7

Authors: Chaplais E, Garchon HJ

Abstract
MOTIVATION: Knowledge-based and co-expression networks are two kinds of gene networks that can be currently implemented by sophisticated but distinct tools. We developed stringgaussnet, an R package that integrates both approaches, starting from a list of differentially expressed genes.
CONTACT: henri-jean.garchon@inserm.fr.
AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://cran.r-project.org/web/packages/stringgaussnet.

PMID: 26231430 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

DNA data bank of Japan (DDBJ) progress report.

Thu, 2016-06-16 06:11

Nucleic Acids Res. 2016 Jan 4;44(D1):D51-7

Authors: Mashima J, Kodama Y, Kosuge T, Fujisawa T, Katayama T, Nagasaki H, Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y, Takagi T

Abstract
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Since 2013, the DDBJ Center has been operating the Japanese Genotype-phenotype Archive (JGA) in collaboration with the National Bioscience Database Center (NBDC) in Japan. In addition, the DDBJ Center develops semantic web technologies for data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan. This paper briefly reports on the activities of the DDBJ Center over the past year including submissions to databases and improvements in our services for data retrieval, analysis, and integration.

PMID: 26578571 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

Wed, 2016-06-15 09:02

J Biomed Semantics. 2016;7:39

Authors: Bolleman JT, Mungall CJ, Strozzi F, Baran J, Dumontier M, Bonnal RJ, Buels R, Hoehndorf R, Fujisawa T, Katayama T, Cock PJ

Abstract
BACKGROUND: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic levels. However, when using Semantic Web technologies to query biological annotations, there was previously no standard for describing this potentially complex location information as subject-predicate-object triples.
DESCRIPTION: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations.
CONCLUSIONS: Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
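A FALDO location is a region whose begin and end positions each carry a coordinate, a strand type, and the reference sequence. The class names below (faldo:Region, faldo:ExactPosition, faldo:ForwardStrandPosition) follow the published FALDO vocabulary, but the dict layout and the `ex:chr1` reference are our own illustration, not an RDF serialization.

```python
# Sketch of a FALDO-style feature location as plain Python data.
# Real FALDO data would be RDF triples; this mirrors the structure only.

def exact_position(coord, strand, reference):
    """One position: coordinate + strand class + reference sequence."""
    return {
        "type": ["faldo:ExactPosition", strand],
        "faldo:position": coord,
        "faldo:reference": reference,
    }

def region(begin, end, strand, reference):
    """A faldo:Region bounded by begin and end positions."""
    return {
        "type": "faldo:Region",
        "faldo:begin": exact_position(begin, strand, reference),
        "faldo:end": exact_position(end, strand, reference),
    }

# A feature on the forward strand of a hypothetical chromosome record:
loc = region(1000, 2000, "faldo:ForwardStrandPosition", "ex:chr1")
```

Reverse-strand or circular-sequence features would swap in the corresponding FALDO position classes while keeping the same region structure.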

PMID: 27296299 [PubMed - in process]

Categories: Literature Watch

PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context.

Sat, 2016-06-11 07:56

Sci Rep. 2016;6:27653

Authors: Zhou J, Xu R, He Y, Lu Q, Wang H, Kong B

Abstract
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining them yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between neighbouring binding sites are important for protein-DNA recognition and binding ability. The comparison between our proposed PDNAsite method and the existing methods indicates that PDNAsite outperforms most existing methods and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely available to the biological research community.
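The two context types described above can be sketched for a single target residue: the sequence context is a window along the chain, while the spatial context is the set of residues within a distance cutoff in 3-D. The window size, cutoff, and coordinates below are illustrative choices, not the paper's parameters.

```python
# Sketch of sequence-context vs spatial-context feature gathering for a
# target residue. Values are illustrative, not PDNAsite's real settings.
import math

def sequence_context(seq, i, w=2):
    """Residues within +/- w positions along the chain (padded with '-')."""
    return [seq[j] if 0 <= j < len(seq) else "-"
            for j in range(i - w, i + w + 1)]

def spatial_context(coords, i, cutoff=8.0):
    """Indices of residues whose C-alpha lies within `cutoff` angstroms."""
    target = coords[i]
    return [j for j, c in enumerate(coords)
            if j != i and math.dist(target, c) <= cutoff]
```

In a full pipeline, vectors built from both contexts would be concatenated, reduced with LSA, and fed to the SVM ensemble.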

PMID: 27282833 [PubMed - in process]

Categories: Literature Watch

The SBOL Stack: A Platform for Storing, Publishing, and Sharing Synthetic Biology Designs.

Thu, 2016-06-09 16:35

ACS Synth Biol. 2016 Jun 7;

Authors: Madsen C, McLaughlin JA, Misirli G, Pocock M, Flanagan K, Hallinan J, Wipat A

Abstract
Recently, synthetic biologists have developed the Synthetic Biology Open Language (SBOL), a data exchange standard for descriptions of genetic parts, devices, modules, and systems. The goals of this standard are to allow scientists to exchange designs of biological parts and systems, to facilitate the storage of genetic designs in repositories, and to facilitate the description of genetic designs in publications. In order to achieve these goals, the development of an infrastructure to store, retrieve, and exchange SBOL data is necessary. To address this problem, we have developed the SBOL Stack, a Resource Description Framework (RDF) database specifically designed for the storage, integration, and publication of SBOL data. This database allows users to define a library of synthetic parts and designs as a service, to share SBOL data with collaborators, and to store designs of biological systems locally. The database also allows external data sources to be integrated by mapping them to the SBOL data model. The SBOL Stack includes two Web interfaces: the SBOL Stack API and SynBioHub. While the former is designed for developers, the latter allows users to upload new SBOL biological designs, download SBOL documents, search by keyword, and visualize SBOL data. Since the SBOL Stack is based on semantic Web technology, the inherent distributed querying functionality of RDF databases can be used to allow different SBOL stack databases to be queried simultaneously, and therefore, data can be shared between different institutes, centers, or other users.
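The distributed-query idea above can be illustrated with two toy triple stores queried with the same pattern and merged, as a federated RDF query would do across SBOL Stack instances. The triples are invented examples (though sbol:ComponentDefinition is a real SBOL 2 class); real stores would be SPARQL endpoints, not Python sets.

```python
# Toy illustration of federated triple-pattern matching over two stores.
# Real SBOL Stack instances expose SPARQL; this mimics only the idea.

def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [(ts, tp, to) for (ts, tp, to) in store
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

store_a = {("part:pTetR", "rdf:type", "sbol:ComponentDefinition")}
store_b = {("part:pLacI", "rdf:type", "sbol:ComponentDefinition")}

# Query both stores with one pattern and merge the results:
parts = sorted(match(store_a, p="rdf:type", o="sbol:ComponentDefinition")
               + match(store_b, p="rdf:type", o="sbol:ComponentDefinition"))
```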

PMID: 27268205 [PubMed - as supplied by publisher]

Categories: Literature Watch

The Orthology Ontology: development and applications.

Sun, 2016-06-05 06:27

J Biomed Semantics. 2016;7(1):34

Authors: Fernández-Breis JT, Chiba H, Legaz-García MD, Uchiyama I

Abstract
BACKGROUND: Computational comparative analysis of multiple genomes provides valuable opportunities for biomedical research. In particular, orthology analysis plays a central role in comparative genomics: it helps establish evolutionary relationships among genes of different organisms and allows functional inference of gene products. However, the wide variation among current orthology databases necessitates research into making their content shareable, since that content is generated by different tools and stored in different structures. Exchanging the content with other research communities requires making its meaning explicit.
DESCRIPTION: The need for a common ontology has led to the creation of the Orthology Ontology (ORTH) following the best practices in ontology construction. Here, we describe our model and the major entities of the ontology, which is implemented in the Web Ontology Language (OWL), followed by an assessment of the quality of the ontology and the application of ORTH to existing orthology datasets. This shareable ontology enables the development of Linked Orthology Datasets and of a meta-predictor of orthology by standardizing the representation of orthology databases. ORTH is freely available in OWL format to all users at http://purl.org/net/orth .
CONCLUSIONS: The Orthology Ontology can serve as a framework for the semantic standardization of orthology content and it will contribute to a better exploitation of orthology resources in biomedical research. The results demonstrate the feasibility of developing shareable datasets using this ontology. Further applications will maximize the usefulness of this ontology.

PMID: 27259657 [PubMed - as supplied by publisher]

Categories: Literature Watch

Generation of open biomedical datasets through ontology-driven transformation and integration processes.

Sat, 2016-06-04 06:07

J Biomed Semantics. 2016;7:32

Authors: Carmen Legaz-García MD, Miñarro-Giménez JA, Menárguez-Tortosa M, Fernández-Breis JT

Abstract
BACKGROUND: Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes the integrated exploitation of such data difficult. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating content readable by machines. Linked Open Data is a Semantic Web initiative that promotes the publication and sharing of data in machine-readable semantic formats.
METHODS: We present an approach for the transformation and integration of heterogeneous biomedical data with the objective of generating open biomedical datasets in Semantic Web formats. The transformation of the data is based on the mappings between the entities of the data schema and the ontological infrastructure that provides the meaning to the content. Our approach permits different types of mappings and includes the possibility of defining complex transformation patterns. Once the mappings are defined, they can be automatically applied to datasets to generate logically consistent content and the mappings can be reused in further transformation processes.
RESULTS: The results of our research are (1) a common transformation and integration process for heterogeneous biomedical data; (2) the application of Linked Open Data principles to generate interoperable, open, biomedical datasets; (3) a software tool, called SWIT, that implements the approach. In this paper we also describe how we have applied SWIT in different biomedical scenarios and some lessons learned.
CONCLUSIONS: We have presented an approach that is able to generate open biomedical repositories in Semantic Web formats. SWIT is able to apply the Linked Open Data principles in the generation of the datasets, so allowing for linking their content to external repositories and creating linked open datasets. SWIT datasets may contain data from multiple sources and schemas, thus becoming integrated datasets.
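The mapping-driven transformation described in the methods can be sketched as a declarative table from source-schema fields to ontology properties, applied to each record to emit triples. The mapping entries, `ex:` terms, and record below are invented for illustration; SWIT's real mapping language also supports complex transformation patterns.

```python
# Sketch of ontology-driven transformation: a field-to-property mapping is
# applied to source records to generate triples. All terms are illustrative,
# not SWIT's actual mapping syntax.

MAPPING = {
    "patient_id": "ex:hasIdentifier",
    "diagnosis":  "ex:hasDiagnosis",
}

def transform(record, subject_prefix="ex:patient/"):
    """Emit (subject, property, value) triples for one source record."""
    subject = subject_prefix + str(record["patient_id"])
    return [(subject, prop, record[field])
            for field, prop in MAPPING.items() if field in record]

triples = transform({"patient_id": 42, "diagnosis": "ex:Melanoma"})
```

Because the mapping is data rather than code, the same table can be reused across datasets and further transformation runs, which is the reuse property the abstract highlights.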

PMID: 27255189 [PubMed - in process]

Categories: Literature Watch

P-Finder: Reconstruction of Signaling Networks from Protein-Protein Interactions and GO Annotations.

Fri, 2016-06-03 08:52

IEEE/ACM Trans Comput Biol Bioinform. 2015 Mar-Apr;12(2):309-21

Authors: Cho YR, Xin Y, Speegle G

Abstract
Because most complex genetic diseases are caused by defects of cell signaling, illuminating a signaling cascade is essential for understanding their mechanisms. We present three novel computational algorithms to reconstruct signaling networks between a starting protein and an ending protein using genome-wide protein-protein interaction (PPI) networks and gene ontology (GO) annotation data. A signaling network is represented as a directed acyclic graph in a merged form of multiple linear pathways. An advanced semantic similarity metric is applied for weighting PPIs as the preprocessing of all three methods. The first algorithm repeatedly extends the list of nodes based on path frequency towards an ending protein. The second algorithm repeatedly appends edges based on the occurrence of network motifs which indicate the link patterns more frequently appearing in a PPI network than in a random graph. The last algorithm uses the information propagation technique which iteratively updates edge orientations based on the path strength and merges the selected directed edges. Our experimental results demonstrate that the proposed algorithms achieve higher accuracy than previous methods when they are tested on well-studied pathways of S. cerevisiae. Furthermore, we introduce an interactive web application tool, called P-Finder, to visualize reconstructed signaling networks.
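The first strategy above (extension based on path frequency) can be sketched on a toy graph: enumerate simple paths from the starting protein to the ending protein, then score intermediate nodes by how often they occur on those paths. The graph and protein names are invented; real inputs are genome-wide PPI networks weighted by GO-based semantic similarity.

```python
# Sketch of path-frequency scoring between a start and an end protein.
# Toy unweighted graph; P-Finder additionally weights edges by semantic
# similarity and uses two further algorithms (motifs, propagation).
from collections import Counter

PPI = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}

def simple_paths(graph, start, end, path=None):
    """Enumerate all acyclic paths from start to end."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    out = []
    for nxt in graph.get(start, []):
        if nxt not in path:          # keep paths acyclic
            out += simple_paths(graph, nxt, end, path)
    return out

def path_frequency(graph, start, end):
    """Count how often each interior node appears on start-to-end paths."""
    counts = Counter()
    for p in simple_paths(graph, start, end):
        counts.update(p[1:-1])
    return counts

freq = path_frequency(PPI, "A", "E")   # D lies on both paths, B and C on one
```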

PMID: 26357219 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

The BioHub Knowledge Base: Ontology and Repository for Sustainable Biosourcing.

Thu, 2016-06-02 08:35

J Biomed Semantics. 2016;7(1):30

Authors: Read WJ, Demetriou G, Nenadic G, Ruddock N, Stevens R, Winter J

Abstract
BACKGROUND: The motivation for the BioHub project is to create an Integrated Knowledge Management System (IKMS) that will enable chemists to source ingredients from bio-renewables, rather than from non-sustainable sources such as fossil oil and its derivatives.
METHOD: The BioHubKB is the data repository of the IKMS; it employs Semantic Web technologies, especially OWL, to host data about chemical transformations, bio-renewable feedstocks, co-product streams and their chemical components. Access to this knowledge base is provided to other modules within the IKMS through a set of RESTful web services, driven by SPARQL queries to a Sesame back-end. The BioHubKB re-uses several bio-ontologies and bespoke extensions, primarily for chemical feedstocks and products, to form its knowledge organisation schema.
RESULTS: Parts of plants form feedstocks, while various processes generate co-product streams that contain certain chemicals. Both chemicals and transformations are associated with certain qualities, which the BioHubKB also attempts to capture. Of immediate commercial and industrial importance is to estimate the cost of particular sets of chemical transformations (leading to candidate surfactants) performed in sequence, and these costs too are captured. Data are sourced from companies' internal knowledge and document stores, and from the publicly available literature. Both text analytics and manual curation play their part in populating the ontology. We describe the prototype IKMS, the BioHubKB and the services that it supports for the IKMS.
AVAILABILITY: The BioHubKB can be found via http://biohub.cs.manchester.ac.uk/ontology/biohub-kb.owl .

PMID: 27246819 [PubMed - in process]

Categories: Literature Watch

ODMedit: uniform semantic annotation for data integration in medicine based on a public metadata repository.

Thu, 2016-06-02 08:35

BMC Med Res Methodol. 2016;16(1):65

Authors: Dugas M, Meidt A, Neuhaus P, Storck M, Varghese J

Abstract
BACKGROUND: The volume and complexity of patient data - especially in personalised medicine - is steadily increasing, both regarding clinical data and genomic profiles: Typically more than 1,000 items (e.g., laboratory values, vital signs, diagnostic tests etc.) are collected per patient in clinical trials. In oncology hundreds of mutations can potentially be detected for each patient by genomic profiling. Therefore data integration from multiple sources constitutes a key challenge for medical research and healthcare.
METHODS: Semantic annotation of data elements can facilitate the identification of matching data elements in different sources and thereby supports data integration. Millions of different annotations are required due to the semantic richness of patient data. These annotations should be uniform, i.e., two matching data elements shall contain the same annotations. However, large terminologies like SNOMED CT or UMLS do not provide uniform coding. It is proposed to develop semantic annotations of medical data elements based on a large-scale public metadata repository. To achieve uniform codes, semantic annotations shall be re-used if a matching data element is available in the metadata repository.
RESULTS: A web-based tool called ODMedit ( https://odmeditor.uni-muenster.de/ ) was developed to create data models with uniform semantic annotations. It contains ~800,000 terms with semantic annotations which were derived from ~5,800 models from the portal of medical data models (MDM). The tool was successfully applied to manually annotate 22 forms with 292 data items from CDISC and to update 1,495 data models of the MDM portal.
CONCLUSION: Uniform manual semantic annotation of data models is feasible in principle, but requires a large-scale collaborative effort due to the semantic richness of patient data. A web-based tool for these annotations is available, which is linked to a public metadata repository.
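The reuse rule described in the methods can be sketched directly: if a data element with the same normalized name already exists in the repository, its codes are reused, so matching elements receive identical annotations. The repository contents and the code shown are invented examples, not MDM portal data.

```python
# Sketch of annotation reuse against a metadata repository. The repository
# entry and its code are illustrative examples only.

REPOSITORY = {
    "heart rate": ["UMLS:C0018810"],   # example pre-curated annotation
}

def annotate(element_name, repository=REPOSITORY):
    """Reuse existing codes for a matching element, else register new ones."""
    key = element_name.strip().lower()
    if key in repository:
        return repository[key]      # reuse: identical codes for matches
    codes = []                      # else a curator would assign codes here...
    repository[key] = codes         # ...and register them for future reuse
    return codes
```

Normalizing on the element name alone is a simplification; ODMedit matches whole data elements, but the reuse principle is the same.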

PMID: 27245222 [PubMed - in process]

Categories: Literature Watch

Acceptability of a mobile health exercise-based cardiac rehabilitation intervention: a randomized trial.

Thu, 2016-06-02 08:35

J Cardiopulm Rehabil Prev. 2015 Sep-Oct;35(5):312-9

Authors: Pfaeffli Dale L, Whittaker R, Dixon R, Stewart R, Jiang Y, Carter K, Maddison R

Abstract
BACKGROUND: Mobile technologies (mHealth) have recently been used to deliver behavior change interventions; however, few have investigated the application of mHealth for treatment of ischemic heart disease (IHD). The Heart Exercise And Remote Technologies trial examined the effectiveness of an mHealth intervention to increase exercise behavior in adults with IHD. As a part of this trial, a process evaluation was conducted.
METHODS: One hundred seventy-one adults with IHD were randomized to receive a 6-month mHealth intervention (n = 85) plus usual care or usual care alone (n = 86). The intervention delivered a theory-based, automated package of exercise prescription and behavior change text messages and a supporting Web site. Three sources of data were triangulated to assess intervention participant perceptions: (1) Web site usage statistics; (2) feedback surveys; and (3) semistructured exit interviews. Descriptive information from survey and Web data were merged with qualitative data and analyzed using a semantic thematic approach.
RESULTS: At 24 weeks, all intervention participants provided Web usage statistics, 75 completed the feedback survey, and 17 were interviewed. Participants reported reading the text messages (70/75; 93%) and liked the content (55/75; 73%). The program motivated participants to exercise. Several suggestions to improve the program included further tailoring of the content (7/75; 7%) and increased personal contact (10/75; 13%).
CONCLUSIONS: Adults with IHD were able to use an mHealth program and reported that text messaging is a good way to deliver exercise information. While mHealth is designed to be automated, programs might be improved if content and delivery were tailored to individual needs.

PMID: 26181037 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

An ontology for Autism Spectrum Disorder (ASD) to infer ASD phenotypes from Autism Diagnostic Interview-Revised data.

Thu, 2016-06-02 08:35

J Biomed Inform. 2015 Aug;56:333-47

Authors: Mugzach O, Peleg M, Bagley SC, Guter SJ, Cook EH, Altman RB

Abstract
OBJECTIVE: Our goal is to create an ontology that will allow data integration and reasoning with subject data to classify subjects, and based on this classification, to infer new knowledge on Autism Spectrum Disorder (ASD) and related neurodevelopmental disorders (NDD). We take a first step toward this goal by extending an existing autism ontology to allow automatic inference of ASD phenotypes and Diagnostic & Statistical Manual of Mental Disorders (DSM) criteria based on subjects' Autism Diagnostic Interview-Revised (ADI-R) assessment data.
MATERIALS AND METHODS: Knowledge regarding diagnostic instruments, ASD phenotypes and risk factors was added to augment an existing autism ontology via Web Ontology Language (OWL) class definitions and semantic web rules. We developed a custom Protégé plugin for enumerating combinatorial OWL axioms to support the many-to-many relations of ADI-R items to diagnostic categories in the DSM. We utilized a reasoner to infer whether 2642 subjects, whose data was obtained from the Simons Foundation Autism Research Initiative, meet DSM-IV-TR (DSM-IV) and DSM-5 diagnostic criteria based on their ADI-R data.
RESULTS: We extended the ontology by adding 443 classes and 632 rules that represent phenotypes, along with their synonyms, environmental risk factors, and frequency of comorbidities. Applying the rules on the data set showed that the method produced accurate results: the true positive and true negative rates for inferring autistic disorder diagnosis according to DSM-IV criteria were 1 and 0.065, respectively; the true positive rate for inferring ASD based on DSM-5 criteria was 0.94.
DISCUSSION: The ontology allows automatic inference of subjects' disease phenotypes and diagnosis with high accuracy.
CONCLUSION: The ontology may benefit future studies by serving as a knowledge base for ASD. In addition, by adding knowledge of related NDDs, commonalities and differences in manifestations and risk factors could be automatically inferred, contributing to the understanding of ASD pathophysiology.
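The combinatorial inference the methods describe can be sketched procedurally: ADI-R items map many-to-many onto diagnostic criteria, and a criterion counts as met when enough of its items score above a cutoff. The item names, mappings, and thresholds below are invented; the paper encodes such combinations as OWL axioms and semantic web rules evaluated by a reasoner.

```python
# Sketch of threshold-based criterion inference over item scores.
# Criteria, item mappings, and cutoffs are illustrative only.

CRITERIA = {
    "social_deficits":      {"items": ["adi_q1", "adi_q2", "adi_q3"],
                             "min_met": 2},
    "repetitive_behaviour": {"items": ["adi_q4", "adi_q5"],
                             "min_met": 1},
}

def met_criteria(scores, cutoff=2):
    """Return the criteria whose item-score threshold is reached."""
    met = []
    for name, spec in CRITERIA.items():
        n = sum(1 for item in spec["items"] if scores.get(item, 0) >= cutoff)
        if n >= spec["min_met"]:
            met.append(name)
    return met

subject = {"adi_q1": 3, "adi_q2": 2, "adi_q4": 0}
```

Encoding the same logic as OWL axioms is what required the combinatorial enumeration plugin, since every qualifying item subset becomes an axiom.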

PMID: 26151311 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Comparing image search behaviour in the ARRS GoldMiner search engine and a clinical PACS/RIS.

Thu, 2016-06-02 08:35

J Biomed Inform. 2015 Aug;56:57-64

Authors: De-Arteaga M, Eggel I, Do B, Rubin D, Kahn CE, Müller H

Abstract
Information search has changed the way we manage knowledge, and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different, and much research has been devoted to analyzing the way in which physicians aim to access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics from search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital-internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. The objectives are to identify similarities and differences in the search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search, called radTF, containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so the semantic content in the queries and the links between terms could be analysed, and synonyms for the same concept could be detected. RadLex was mainly created for use in radiology reports, to aid structured reporting and the preparation of educational material (Langlotz, 2006) [1].
In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System), specific radiology terms are often underrepresented; RadLex was therefore considered the best option for this task. The results show a surprising similarity between the usage behaviour in the two systems, but several subtle differences can also be noted. The average number of terms per query is 2.21 for GoldMiner and 2.07 for radTF; the RadLex axes used (anatomy, pathology, findings, …) have almost the same distribution, with clinical findings the most frequent and anatomical entities second; also, combinations of RadLex axes are extremely similar between the two systems. Differences include longer sessions in radTF than in GoldMiner (3.4 vs. 1.9 queries per session on average). Several frequent search terms overlap, but some strong differences exist in the details. In radTF the term "normal" is frequent, whereas in GoldMiner it is not. This makes intuitive sense, as normal cases are rarely described in the literature, whereas in clinical work the comparison with normal cases is often a first step. The general similarity in many points is likely due to the fact that users of the two systems are influenced by their daily behaviour in using standard web search engines and follow this behaviour in their professional search. This means that many results and insights gained from standard web search can likely be transferred to more specialized search systems. Still, specialized log files can be used to find out more about reformulations and the detailed strategies users follow to find the right content.
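The core log-analysis step above — mapping query terms to a controlled lexicon with synonym folding, then computing statistics such as terms per query — can be sketched with a toy lexicon. The term-to-ID entries below are illustrative stand-ins, not actual RadLex records; the study used the full RadLex terminology.

```python
# Sketch of query-log term mapping with synonym folding. The mini-lexicon
# maps both "lung" and "pulmonary" to one concept ID; IDs are illustrative.

LEXICON = {"lung": "RID-LUNG", "pulmonary": "RID-LUNG",
           "fracture": "RID-FRACTURE"}

def map_query(query):
    """Map each recognised term in a query to its lexicon concept ID."""
    return [LEXICON[t] for t in query.lower().split() if t in LEXICON]

def mean_terms(queries):
    """Average number of terms per query, as reported in the study."""
    return sum(len(q.split()) for q in queries) / len(queries)

logs = ["pulmonary fracture", "lung nodule normal"]
```

Folding synonyms onto one ID is what lets two differently-worded queries be counted as searches for the same concept.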

PMID: 26002820 [PubMed - indexed for MEDLINE]

Categories: Literature Watch

Building the Ferretome.

Wed, 2016-06-01 08:22

Building the Ferretome.

Front Neuroinform. 2016;10:16

Authors: Sukhinin DI, Engel AK, Manger P, Hilgetag CC

Abstract
Databases of structural connections of the mammalian brain, such as CoCoMac (cocomac.g-node.org) or BAMS (https://bams1.org), are valuable resources for the analysis of brain connectivity and the modeling of brain dynamics in species such as the non-human primate or the rodent, and have also contributed to the computational modeling of the human brain. Another animal model that is widely used in electrophysiological or developmental studies is the ferret; however, no systematic compilation of brain connectivity is currently available for this species. Thus, we have started developing a database of anatomical connections and architectonic features of the ferret brain, the Ferret(connect)ome, www.Ferretome.org. The Ferretome database has adapted essential features of the CoCoMac methodology and legacy, such as the CoCoMac data model. This data model was simplified and extended in order to accommodate new data modalities that were not represented previously, such as the cytoarchitecture of brain areas. The Ferretome uses a semantic parcellation of brain regions as well as a logical brain map transformation algorithm (objective relational transformation, ORT). The ORT algorithm was also adopted for the transformation of architecture data. The database is being developed in MySQL and has been populated with literature reports on tract-tracing observations in the ferret brain using a custom-designed web interface that allows efficient and validated simultaneous input and proofreading by multiple curators. The database is equipped with a non-specialist web interface. This interface can be extended to produce connectivity matrices in several formats, including a graphical representation superimposed on established ferret brain maps. An important feature of the Ferretome database is the possibility to trace back entries in connectivity matrices to the original studies archived in the system. 
Currently, the Ferretome contains 50 reports on connections, comprising 20 injection reports with more than 150 labeled source and target areas (the majority reflecting the connectivity of subcortical nuclei), and 15 descriptions of regional brain architecture. We hope that the Ferretome database will become a useful resource for neuroinformatics and neural modeling, and will support studies of the ferret brain as well as facilitate advances in comparative studies of mesoscopic brain connectivity.
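The step from per-study injection reports to a connectivity matrix, as described above, can be sketched roughly as follows. The area names and strength values are hypothetical, not Ferretome data, and real systems like CoCoMac/Ferretome additionally transform observations between brain maps (ORT) before aggregation.

```python
# Each report: (source area, [(target area, labeling strength), ...]).
# Area names and strengths are hypothetical, not Ferretome data.
reports = [
    ("A1",  [("PPc", 2), ("S1", 1)]),
    ("PPc", [("A1", 3)]),
    ("A1",  [("S1", 2)]),   # a second injection into A1
]

def build_matrix(reports):
    """Aggregate per-report observations into a source x target matrix,
    keeping the maximum reported strength for repeated observations."""
    areas = sorted({src for src, _ in reports} |
                   {tgt for _, tgts in reports for tgt, _ in tgts})
    idx = {a: i for i, a in enumerate(areas)}
    m = [[0] * len(areas) for _ in areas]
    for src, tgts in reports:
        for tgt, strength in tgts:
            m[idx[src]][idx[tgt]] = max(m[idx[src]][idx[tgt]], strength)
    return areas, m

areas, m = build_matrix(reports)
```

Keeping the matrix derivable from the raw reports, rather than storing it directly, is what lets each matrix entry be traced back to the original archived studies.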

PMID: 27242503 [PubMed]

Categories: Literature Watch

From frames to OWL2: Converting the Foundational Model of Anatomy.

Sun, 2016-05-29 07:35

From frames to OWL2: Converting the Foundational Model of Anatomy.

Artif Intell Med. 2016 May;69:12-21

Authors: Detwiler LT, Mejino JL, Brinkley JF

Abstract
OBJECTIVE: The Foundational Model of Anatomy (FMA) [Rosse C, Mejino JLV. A reference ontology for bioinformatics: the Foundational Model of Anatomy. J. Biomed. Inform. 2003;36:478-500] is an ontology that represents canonical anatomy at levels ranging from the entire body to biological macromolecules, and has rapidly become the primary reference ontology for human anatomy, and a template for model organisms. Prior to this work, the FMA was developed in a knowledge modeling language known as Protégé Frames. Frames is an intuitive representational language, but is no longer the industry standard. Recognizing the need for an official version of the FMA in the more modern semantic web language OWL2 (hereafter referred to as OWL), the objective of this work was to create a generalizable Frames-to-OWL conversion tool, to use the tool to convert the FMA to OWL, to "clean up" the converted FMA so that it classifies under an EL reasoner, and then to do all further development in OWL.
METHODS: The conversion tool is a Java application that uses the Protégé knowledge representation API for interacting with the initial Frames ontology, and uses the OWL-API for producing new statements (axioms, etc.) in OWL. The converter is relation-centric, and the conversion is configurable, on a property-by-property basis, via user-specifiable XML configuration files. The best conversion for each property was determined in conjunction with the FMA knowledge author. The converter is potentially generalizable, which we partially demonstrate by using it to convert our Ontology of Craniofacial Development and Malformation (OCDM) as well as the FMA. Post-conversion cleanup involved using the Explain feature of Protégé to trace classification errors under the ELK reasoner, fixing the errors, and then re-running the reasoner.
RESULTS: We are currently doing all our development in the converted and cleaned-up version of the FMA. The FMA (updated every 3 months) is available via our FMA web page http://si.washington.edu/projects/fma, which also provides access to mailing lists, an issue tracker, a SPARQL endpoint (updated every week), and an online browser. The converted OCDM is available at http://www.si.washington.edu/projects/ocdm. The conversion code is open source and available at http://purl.org/sig/software/frames2owl. Prior to the post-conversion cleanup, 73% of the more than 100,000 classes were unsatisfiable. After correction of six types of errors, no classes remained unsatisfiable.
CONCLUSION: Because our FMA conversion captures all or most of the information in the Frames version, is the only complete OWL version that classifies under an EL reasoner, and is maintained by the FMA authors themselves, we propose that this version should be the only official release version of the FMA in OWL, supplanting all other versions. Although several issues remain to be resolved post-conversion, release of a single, standardized version of the FMA in OWL will greatly facilitate its use in informatics research and in the development of a global knowledge base within the semantic web. Because of the fundamental nature of anatomy in both understanding and organizing biomedical information, and because of the importance of the FMA in particular in representing human anatomy, the FMA in OWL should greatly accelerate the development of an anatomically based structural information framework for organizing and linking a large amount of biomedical information.
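The relation-centric, property-by-property conversion described in the methods can be illustrated with a small sketch. The configuration keys, property names, and output style below are invented for illustration (the real tool is a Java application driven by XML configuration files); the sketch only shows the idea of routing each Frames slot to a configured OWL rendering.

```python
# Per-property conversion config: how each Frames slot should be rendered
# in OWL. Keys and property names are illustrative, not the real tool's XML.
CONFIG = {
    "part_of":        {"owl": "someValuesFrom"},
    "preferred_name": {"owl": "annotation"},
}

def convert_slot(cls, prop, value):
    """Render one Frames slot as an OWL functional-syntax axiom string,
    according to the per-property CONFIG."""
    mode = CONFIG[prop]["owl"]
    if mode == "someValuesFrom":
        return (f"SubClassOf(:{cls} "
                f"ObjectSomeValuesFrom(:{prop} :{value}))")
    if mode == "annotation":
        return f'AnnotationAssertion(:{prop} :{cls} "{value}")'
    raise ValueError(f"unconfigured property: {prop}")

axiom = convert_slot("Heart", "part_of", "Cardiovascular_system")
```

Making the rendering choice per property, rather than hard-coding one translation, is what allows the same converter to be reused on a different frames ontology such as the OCDM.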

PMID: 27235801 [PubMed - as supplied by publisher]

Categories: Literature Watch

Web Ontologies to Categorialy Structure Reality: Representations of Human Emotional, Cognitive, and Motivational Processes.

Sat, 2016-05-21 08:32

Web Ontologies to Categorialy Structure Reality: Representations of Human Emotional, Cognitive, and Motivational Processes.

Front Psychol. 2016;7:551

Authors: López-Gil JM, Gil R, García R

Abstract
This work presents a Web ontology for modeling and representing the emotional, cognitive, and motivational states of online learners interacting with university systems for distance or blended education. The ontology is understood as a way to provide the mechanisms required to model reality and associate it with emotional responses, without committing to a particular way of organizing those emotional responses. Knowledge representation for the contributed ontology uses the Web Ontology Language (OWL), a semantic web language designed to represent rich and complex knowledge about things, groups of things, and the relations between things. OWL is a computational, logic-based language, so computer programs can exploit knowledge expressed in it; OWL also facilitates sharing and reusing knowledge via the global infrastructure of the Web. The proposed ontology has been tested in the field of Massive Open Online Courses (MOOCs) to check whether it can represent the emotions and motivation of students in this context of use.

PMID: 27199796 [PubMed]

Categories: Literature Watch

BelSmile: a biomedical semantic role labeling approach for extracting biological expression language from text.

Sat, 2016-05-14 09:57

BelSmile: a biomedical semantic role labeling approach for extracting biological expression language from text.

Database (Oxford). 2016;2016

Authors: Lai PT, Lo YY, Huang MS, Hsiao YC, Tsai RT

Abstract
Biological expression language (BEL) is one of the most popular languages for representing causal and correlative relationships among biological events. Automatically extracting and representing biomedical events using BEL can help biologists quickly survey and understand the relevant literature. Recently, many researchers have shown interest in biomedical event extraction. However, the task is still a challenge for current systems because of the complexity of integrating different information extraction tasks, such as named entity recognition (NER), named entity normalization (NEN) and relation extraction, into a single system. In this study, we introduce our BelSmile system, which uses a semantic-role-labeling (SRL)-based approach to extract the NEs and events for BEL statements. BelSmile combines our previous NER, NEN and SRL systems. We evaluate BelSmile using the BioCreative V BEL task dataset. Our system achieved an F-score of 27.8%, ∼7% higher than the top BioCreative V system. The three main contributions of this study are (i) an effective pipeline approach to extracting BEL statements, (ii) a syntax-based labeler for extracting subject-verb-object tuples, and (iii) a publicly available web-based version of BelSmile at iisrserv.csie.ncu.edu.tw/belsmile.
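The final step of such a pipeline — turning a normalized subject-verb-object tuple into a BEL statement — can be sketched as follows. The verb-to-relation mapping, the namespace, and the use of protein abundance (`p(...)`) terms here are simplifying assumptions for illustration, not BelSmile's actual rules.

```python
# Hypothetical mapping from SRL verbs to BEL relations; the verbs listed
# here are illustrative, not BelSmile's actual rule set.
VERB_TO_REL = {
    "increases": "increases",
    "activates": "increases",
    "inhibits":  "decreases",
}

def svo_to_bel(subject, verb, obj, namespace="HGNC"):
    """Render a normalized subject-verb-object tuple as a BEL statement,
    assuming both entities are protein abundances in one namespace."""
    rel = VERB_TO_REL.get(verb.lower())
    if rel is None:
        return None  # verb not covered by the mapping
    return f"p({namespace}:{subject}) {rel} p({namespace}:{obj})"

stmt = svo_to_bel("AKT1", "inhibits", "FOXO3")
```

In the real system, the subject and object would first pass through NER and NEN so that the names inserted into the statement are normalized database identifiers rather than raw text spans.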

PMID: 27173520 [PubMed - as supplied by publisher]

Categories: Literature Watch

DisGeNET-RDF: Harnessing the Innovative Power of the Semantic Web to Explore the Genetic Basis of Diseases.

Sat, 2016-05-07 11:04

DisGeNET-RDF: Harnessing the Innovative Power of the Semantic Web to Explore the Genetic Basis of Diseases.

Bioinformatics. 2016 Apr 22;

Authors: Queralt-Rosinach N, Piñero J, Bravo À, Sanz F, Furlong LI

Abstract
MOTIVATION: DisGeNET-RDF makes knowledge on the genetic basis of human diseases available in the Semantic Web (SW). Gene-disease associations (GDAs) and their provenance metadata are published as human-readable and machine-processable web resources. The information on GDAs included in DisGeNET-RDF is interlinked with other biomedical databases to support the development of bioinformatics approaches for translational research through evidence-based exploitation of a rich and fully interconnected Linked Open Data (LOD) cloud.
AVAILABILITY: http://rdf.disgenet.org/
CONTACT: support@disgenet.org
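A typical way to consume such a resource is via its SPARQL endpoint. The sketch below only builds a query string (no network call is made); the SIO predicate IRI and the use of `dcterms:title` for the gene symbol are assumptions for illustration and should be checked against the endpoint's documented schema before use.

```python
from textwrap import dedent

def gda_query(gene_symbol, limit=10):
    """Build a SPARQL query for gene-disease associations of one gene.
    Predicate choices (sio:SIO_000628 'refers to', dcterms:title) are
    assumptions for illustration, not a verified DisGeNET-RDF schema."""
    return dedent(f"""\
        PREFIX sio: <http://semanticscience.org/resource/>
        PREFIX dcterms: <http://purl.org/dc/terms/>
        SELECT ?gda ?disease WHERE {{
          ?gda sio:SIO_000628 ?gene, ?disease .
          ?gene dcterms:title "{gene_symbol}" .
        }} LIMIT {limit}""")

q = gda_query("BRCA1")
```

The resulting string would be posted to the public endpoint with any SPARQL client; building it separately keeps the query inspectable and testable offline.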

PMID: 27153650 [PubMed - as supplied by publisher]

Categories: Literature Watch
