Semantic Web
Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes.
Database (Oxford). 2016;2016:
Authors: Putman TE, Burgstaller-Muehlbacher S, Waagmeester A, Wu C, Su AI, Good BM
Abstract
The last 20 years of advances in sequencing technologies have led to the sequencing of thousands of microbial genomes, creating mountains of genetic data. While the efficiency of generating the data improves almost daily, establishing meaningful relationships between taxonomic and genetic entities on this scale requires a structured and integrative approach. Currently, knowledge is distributed across a fragmented landscape of resources, ranging from government-funded institutions such as the National Center for Biotechnology Information (NCBI) and UniProt, to topic-focused databases like the ODB3 database of prokaryotic operons, to the supplemental tables of primary publications. A major drawback of large-scale, expert-curated databases is the expense of maintaining and extending them over time. Only major institutions with stable long-term funding can take this on, and even their scope is limited relative to the magnitude of microbial data being generated daily. Wikidata is an openly editable, semantic web compatible framework for knowledge representation. It is a project of the Wikimedia Foundation and offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of information about microbial genomics. We are developing a microbial-specific data model, based on Wikidata's semantic web compatibility, which represents bacterial species, strains, and the genes and gene products that define them. Currently, we have loaded 43,694 gene and 37,966 protein items for 21 species of bacteria, including the human pathogenic bacterium Chlamydia trachomatis. Using this pathogen as an example, we explore complex interactions between the pathogen, its host, associated genes, other microbes, disease and drugs using the Wikidata SPARQL endpoint. In our next phase of development, we will add another 99 bacterial genomes and their genes and gene products, totaling ∼900,000 additional entities. 
This aggregation of knowledge will be a platform for community-driven collaboration, allowing microbial genetic data to be networked through knowledge sharing between data experts and domain experts.
PMID: 27022157 [PubMed - indexed for MEDLINE]
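The abstract above mentions querying the Wikidata SPARQL endpoint for pathogen, gene and host relationships. The sketch below only builds a query URL and does not reproduce the authors' queries. It assumes the standard Wikidata identifiers P31 (instance of), Q7187 (gene), P703 (found in taxon) and P351 (Entrez Gene ID), and it leaves the taxon QID to the caller, since that QID must be looked up on Wikidata.

```python
import re
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def genes_in_taxon_query(taxon_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query listing gene items found in a given taxon.

    Assumes P31 (instance of), Q7187 (gene), P703 (found in taxon) and
    P351 (Entrez Gene ID); the taxon QID is supplied by the caller.
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", taxon_qid):
        raise ValueError(f"not a Wikidata item ID: {taxon_qid!r}")
    return f"""SELECT ?gene ?geneLabel ?entrez WHERE {{
  ?gene wdt:P31 wd:Q7187 ;
        wdt:P703 wd:{taxon_qid} .
  OPTIONAL {{ ?gene wdt:P351 ?entrez . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}"""


def wdqs_url(query: str) -> str:
    """Return a GET URL asking the endpoint for SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

The prefixes `wd:`, `wdt:`, `wikibase:` and `bd:` are predeclared by the Wikidata Query Service, so the query does not declare them. To run the query, pass the URL to `urllib.request.urlopen` and parse the response with `json.load`.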
Wikidata as a semantic framework for the Gene Wiki initiative.
Database (Oxford). 2016;2016:
Authors: Burgstaller-Muehlbacher S, Waagmeester A, Mitraka E, Turner J, Putman T, Leong J, Naik C, Pavlidis P, Schriml L, Good BM, Su AI
Abstract
Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swiss-Prot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows the imported data to be used in all 280 language editions of Wikipedia. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/.
PMID: 26989148 [PubMed - indexed for MEDLINE]
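The abstract above notes that the data can be exported as JSON, which is how an infobox-style consumer would read gene identifiers. The sketch below shows one way to pull plain string values out of a parsed Wikidata entity document, such as one fetched from `Special:EntityData/<QID>.json`. It is not the Gene Wiki code. The sample document is a hypothetical, heavily trimmed stand-in, not real Wikidata content.

```python
import json


def string_claim_values(entity_doc: dict, qid: str, pid: str) -> list:
    """Collect plain string values of property `pid` on item `qid`.

    `entity_doc` is a parsed Wikidata entity JSON document. Statements whose
    main snak is "somevalue"/"novalue" carry no datavalue and are skipped.
    """
    claims = entity_doc.get("entities", {}).get(qid, {}).get("claims", {})
    values = []
    for statement in claims.get(pid, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue
        datavalue = snak.get("datavalue", {})
        if datavalue.get("type") == "string":
            values.append(datavalue["value"])
    return values


# Hypothetical, trimmed entity document (illustrative only, not real data).
SAMPLE = json.loads("""
{"entities": {"Q1": {"claims": {"P351": [
  {"mainsnak": {"snaktype": "value", "property": "P351",
                "datavalue": {"value": "12345", "type": "string"}}},
  {"mainsnak": {"snaktype": "novalue", "property": "P351"}}
]}}}}
""")
```

For example, `string_claim_values(SAMPLE, "Q1", "P351")` keeps only the statement that carries an actual value.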
Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts.
BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):350
Authors: Roy S, Curry BC, Madahian B, Homayouni R
Abstract
BACKGROUND: The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotation of miRNAs.
RESULTS: For approximately 900 human miRNAs indexed in miRBase, text documents were created by concatenating titles and abstracts of MEDLINE citations which refer to the miRNAs. The documents were parsed and a weighted term-by-miRNA frequency matrix was created, which was subsequently factorized via singular value decomposition to extract pair-wise cosine values between the term (keyword) and miRNA vectors in reduced rank semantic space. LSI enables derivation of both explicit and implicit associations between entities based on word usage patterns. Using miR2Disease as a gold standard, we found that LSI identified keyword-to-miRNA relationships with high accuracy. In addition, we demonstrate that pair-wise associations between miRNAs can be used to group them into categories which are functionally aligned. Finally, term ranking by querying the LSI space with a group of miRNAs enabled annotation of the clusters with functionally related terms.
CONCLUSIONS: LSI modeling of MEDLINE abstracts provides a robust and automated method for miRNA related knowledge discovery. The latest collection of miRNA abstracts and LSI model can be accessed through the web tool miRNA Literature Network (miRLiN) at http://bioinfo.memphis.edu/mirlin .
PMID: 27766940 [PubMed - in process]
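The LSI pipeline described above works in three steps: build a weighted term-by-miRNA matrix, factorize it, and compare term and miRNA vectors by cosine in the reduced space. The minimal pure-Python sketch below illustrates those steps under stated assumptions and is not the miRLiN implementation. The weighting scheme (log(1 + tf) × idf) is assumed because the abstract does not name one. The truncated SVD is approximated by power iteration on AᵀA. Term vectors are taken as U_k S_k and document vectors as V_k S_k, which is one common LSI convention.

```python
import math
from collections import Counter


def weighted_matrix(docs):
    """Term-by-document matrix with log(1 + tf) * idf weights (assumed scheme)."""
    names = sorted(docs)
    counts = {d: Counter(docs[d].lower().split()) for d in names}
    terms = sorted({t for c in counts.values() for t in c})
    df = {t: sum(1 for d in names if counts[d][t]) for t in terms}
    A = [[math.log1p(counts[d][t]) * math.log(len(names) / df[t]) for d in names]
         for t in terms]
    return terms, names, A


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def top_eigenpairs(M, k, iters=300):
    """Largest k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(M)
    M = [row[:] for row in M]
    pairs = []
    for _ in range(k):
        v = _normalize([1.0] * n)
        for _ in range(iters):
            v = _normalize(_matvec(M, v))
        lam = sum(a * b for a, b in zip(v, _matvec(M, v)))
        if lam <= 1e-12:
            break  # remaining spectrum is numerically zero
        pairs.append((lam, v))
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsi(docs, k):
    """Rank-k LSI: term vectors = U_k S_k, document vectors = V_k S_k."""
    terms, names, A = weighted_matrix(docs)
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(len(names))]
           for i in range(len(names))]
    pairs = top_eigenpairs(AtA, k)  # eigenvalues of A^T A are sigma^2
    doc_vecs = {d: [v[j] * math.sqrt(lam) for lam, v in pairs]
                for j, d in enumerate(names)}
    # (A v_i)[t] equals sigma_i * u_i[t], i.e. row t of U_k S_k
    term_vecs = {t: [sum(a * x for a, x in zip(A[i], v)) for _, v in pairs]
                 for i, t in enumerate(terms)}
    return term_vecs, doc_vecs


def cosine(u, v):
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0
```

On a corpus the size of the miRNA abstract collection, a sparse truncated SVD routine such as `scipy.sparse.linalg.svds` would replace the naive power iteration used here.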
Data Integration and Mining for Synthetic Biology Design.
ACS Synth Biol. 2016 Oct 21;5(10):1086-1097
Authors: Mısırlı G, Hallinan J, Pocock M, Lord P, McLaughlin JA, Sauro H, Wipat A
Abstract
One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, a wealth of existing biological information in the literature and in numerous biological databases can be used to identify and characterize biological parts and their design constraints. This information is, however, spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.
PMID: 27110921 [PubMed - in process]
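The SyBiOnt abstract describes using ontology reasoning to mine biological parts from a knowledge base. The sketch below illustrates only the simplest form of that idea. It uses RDFS-style subclass entailment: a part typed with a subclass is also returned when its superclass is queried. It is not SyBiOnt's OWL reasoning, and every class and part name (`ex:InduciblePromoter`, `ex:part1`, ...) is hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def ancestor_map(triples):
    """Map each class to all of its (transitive) superclasses."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)
    closure = {}
    for cls in list(parents):
        seen, stack = set(), [cls]
        while stack:
            for parent in parents[stack.pop()]:
                if parent not in seen:  # also guards against cycles
                    seen.add(parent)
                    stack.append(parent)
        closure[cls] = seen
    return closure


def instances_of(cls, triples):
    """Individuals typed with `cls` directly or with any subclass of it."""
    ancestors = ancestor_map(triples)
    return {s for s, p, o in triples
            if p == RDF_TYPE and (o == cls or cls in ancestors.get(o, set()))}


# Hypothetical mini knowledge base of parts.
KB = [
    ("ex:InduciblePromoter", SUBCLASS_OF, "ex:Promoter"),
    ("ex:Promoter", SUBCLASS_OF, "ex:DNAPart"),
    ("ex:CDS", SUBCLASS_OF, "ex:DNAPart"),
    ("ex:part1", RDF_TYPE, "ex:InduciblePromoter"),
    ("ex:part2", RDF_TYPE, "ex:CDS"),
]
```

Querying `instances_of("ex:Promoter", KB)` returns `ex:part1`, even though that part is only typed as an inducible promoter. A full OWL reasoner would also handle property restrictions and equivalence axioms, which this sketch ignores.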
PopHR: a knowledge-based platform to support integration, analysis, and visualization of population health data.
Ann N Y Acad Sci. 2016 Oct 17;:
Authors: Shaban-Nejad A, Lavigne M, Okhmatovskaia A, Buckeridge DL
Abstract
Population health decision makers must consider complex relationships between multiple concepts measured with differential accuracy from heterogeneous data sources. Population health information systems are currently limited in their ability to integrate data and present a coherent portrait of population health. Consequently, these systems can provide only basic support for decision makers. The Population Health Record (PopHR) is a semantic web application that automates the integration and extraction of massive amounts of heterogeneous data from multiple distributed sources (e.g., administrative data, clinical records, and survey responses) to support the measurement and monitoring of population health and health system performance for a defined population. The design of the PopHR draws on the theories of the determinants of health and evidence-based public health to harmonize and explicitly link information about a population with evidence about the epidemiology and control of chronic diseases. Organizing information in this manner and linking it explicitly to evidence is expected to improve decision making related to the planning, implementation, and evaluation of population health and health system interventions. In this paper, we describe the PopHR platform and discuss the architecture, design, key modules, and its implementation and use.
PMID: 27750378 [PubMed - as supplied by publisher]
Semantic Indexing of Medical Learning Objects: Medical Students' Usage of a Semantic Network.
JMIR Med Educ. 2015 Nov 11;1(2):e16
Authors: Tix N, Gießler P, Ohnesorge-Radtke U, Spreckelsen C
Abstract
BACKGROUND: The Semantically Annotated Media (SAM) project aims to provide a flexible platform for searching, browsing, and indexing medical learning objects (MLOs) based on a semantic network derived from established classification systems. Primarily, SAM supports the Aachen emedia skills lab, but SAM is ready for indexing distributed content, and the Simple Knowledge Organization System (SKOS) standard provides a means for easily upgrading or even exchanging SAM's semantic network. There is a lack of research addressing the usability of MLO indexes or search portals like SAM and user behavior with such platforms.
OBJECTIVE: The purpose of this study was to assess the usability of SAM by investigating characteristic user behavior of medical students accessing MLOs via SAM.
METHODS: In this study, we chose a mixed-methods approach. Lean usability testing was combined with usability inspection by having the participants complete four typical usage scenarios before filling out a questionnaire. The questionnaire was based on the IsoMetrics usability inventory. Direct user interaction with SAM (mouse clicks and pages accessed) was logged.
RESULTS: The study analyzed the typical usage patterns and habits of students using a semantic network for accessing MLOs. Four scenarios capturing characteristics of typical tasks to be solved using SAM yielded high ratings on usability items and showed good consistency of indexing across different users. Long-tail phenomena emerged, as is typical for collaborative Web 2.0 platforms: some users assigned suitable but rarely used keywords to MLOs.
CONCLUSIONS: It is possible to develop a Web-based tool with high usability and acceptance for indexing and retrieval of MLOs. SAM can be applied to indexing multicentered repositories of MLOs collaboratively.
PMID: 27731860 [PubMed - in process]
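The SAM study reports good consistency of indexing across users but does not state the measure it used. The sketch below shows one common way such agreement could be quantified: the mean pairwise Jaccard index of the keyword sets that different users assigned to the same MLO. This measure is an assumption for illustration, not the study's method.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of intersection over size of union; two empty sets count as full agreement."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def mean_pairwise_agreement(keyword_sets):
    """Average Jaccard index over all pairs of users who indexed the same MLO."""
    pairs = list(combinations(keyword_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
```

A rarely used but suitable keyword chosen by only one user, the long-tail effect noted above, lowers this score even when the keyword itself is correct.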
Inferring unknown biological functions by integration of GO annotations and gene expression data.
IEEE/ACM Trans Comput Biol Bioinform. 2016 Oct 07;:
Authors: Leale G, Baya A, Milone D, Granitto P, Stegmayer G
Abstract
Characterizing genes with semantic information is an important step in describing gene products. Although the complete genomes of many organisms have already been sequenced, the biological functions of many of their genes remain unknown. Since experimentally studying the functions of those genes one by one would be infeasible, new computational methods for inferring gene function are needed. We present here a novel computational approach for inferring biological function for a set of genes with previously unknown function, given a set of genes with well-known information. This approach is based on the premise that genes with similar behaviour should be grouped together, which is known as the guilt-by-association principle. Thus, clustering techniques can be used to obtain groups in which unknown genes are co-clustered with genes that have well-known semantic information (GO annotations). These well-known genes can therefore provide meaningful knowledge for inferring the unknown semantic information. We provide a method to explore the potential function of new genes according to those currently annotated. The results obtained indicate that the proposed approach could be a useful and effective tool for biologists to guide the inference of biological functions for recently discovered genes. Our work sets an important landmark in the field of identifying unknown gene functions through clustering, using an external source of biological input. A simple web interface to this proposal can be found at http://fich.unl.edu.ar/sinc/webdemo/gamma-am/.
PMID: 27723603 [PubMed - as supplied by publisher]
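The guilt-by-association principle described above can be sketched in two steps. First, cluster expression profiles. Second, give each unannotated gene the GO terms that are common among the annotated genes in its cluster. The sketch below is not the authors' method. It uses plain k-means, seeded with the first k genes by name, and a hypothetical `min_support` threshold. Gene names, profiles and GO identifiers in the usage example are illustrative placeholders.

```python
from collections import Counter


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iters=50):
    """Plain k-means on expression profiles; seeds are the first k genes by name."""
    names = sorted(profiles)
    centroids = [list(profiles[n]) for n in names[:k]]
    labels = {}
    for _ in range(iters):
        new = {n: min(range(k), key=lambda c: _sqdist(profiles[n], centroids[c]))
               for n in names}
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for c in range(k):
            members = [profiles[n] for n in names if labels[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def infer_go_terms(labels, annotations, min_support=0.5):
    """Guilt by association: give each unannotated gene the GO terms carried by
    at least `min_support` of the annotated genes in its cluster."""
    predictions = {}
    for gene, cluster in labels.items():
        if gene in annotations:
            continue
        known = [g for g, c in labels.items() if c == cluster and g in annotations]
        counts = Counter(t for g in known for t in annotations[g])
        predictions[gene] = sorted(t for t, n in counts.items()
                                   if n / len(known) >= min_support)
    return predictions
```

Usage: `infer_go_terms(kmeans(profiles, k=2), annotations)`, where `profiles` maps gene names to expression vectors and `annotations` maps known genes to sets of GO identifiers.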
Cross-Species Analysis of Gene Expression and Function in Prefrontal Cortex, Hippocampus and Striatum.
PLoS One. 2016;11(10):e0164295
Authors: Chen W, Xia X, Song N, Wang Y, Zhu H, Deng W, Kong Q, Pan X, Qin C
Abstract
BACKGROUND: Mouse has been extensively used as a tool for investigating the onset and development of human neurological disorders. As a first step to construct a transgenic mouse model of human brain lesions, it is of fundamental importance to clarify the similarity and divergence of genetic background between non-diseased human and mouse brain tissues.
METHODS: Based on large-scale integrated microarray data, we systematically compared the transcriptomes of three anatomically distinct brain regions, the prefrontal cortex (PFC), hippocampus (HIP) and striatum (STR), across human and mouse. The widely used DAVID web server was used to decipher the biological functions of the highly expressed genes, which were identified using a previously reported approach. Venn analysis was used to depict the overlapping ratios of the notably enriched biological process (BP) terms (one-tailed Fisher's exact test and Benjamini correction; adjusted p < 0.01) between two brain tissues. GOSemSim, an R package, was used to perform GO semantic similarity analysis. Next, we adjusted signal intensities of orthologous genes by the total signals in all samples within species, and used one minus Pearson's correlation coefficient to assess the expression distance. Hierarchical clustering and principal component analysis (PCA) were used for expression pattern analysis. Lineage-specific expressed orthologous genes were identified by comparing the most extreme sub-datasets across species and further verified using reverse transcription PCR (RT-PCR) and quantitative real-time PCR (qRT-PCR).
RESULTS: We found that the number of significantly enriched BP terms of the highly expressed genes in human brain regions is larger than that in the corresponding mouse brain regions. The main BP terms in human brain tissues associated with protein-membrane targeting and selenium metabolism are species-specific. The overlapping ratios of all the significantly enriched BP terms between any two brain tissues across species are lower than those within species, but the pairwise semantic similarities are very high between any two brain tissues from either human or mouse. Hierarchical clustering analysis shows that the biological functions of the highly expressed genes in brain tissues are more consistent within species than between species, whereas the expression patterns of orthologous genes are evidently conserved between equivalent human and mouse brain tissues. In addition, we identified four species-specific orthologous genes (COX5B, WIF1, SLC4A10 and PLA2G7), which have been widely studied and confirmed to be closely linked with neurophysiological and pathological functions.
CONCLUSION: Our study highlights the similarities and divergences in gene function and expression between human and mouse corresponding brain regions, including PFC, HIP and STR.
PMID: 27716781 [PubMed - in process]
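The expression distance used in the study above is one minus Pearson's correlation coefficient, computed over orthologous genes. The sketch below computes that distance and a pairwise distance matrix for samples whose values are listed in the same ortholog order. The within-species total-signal adjustment is omitted because the abstract does not specify its exact form. Sample names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("constant profile: correlation undefined")
    return num / den


def expression_distance(x, y):
    """1 - Pearson r: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    return 1.0 - pearson(x, y)


def distance_matrix(samples):
    """Pairwise 1 - r; `samples` maps name -> values in a shared ortholog order."""
    names = sorted(samples)
    return {(a, b): expression_distance(samples[a], samples[b])
            for a in names for b in names}
```

Because Pearson's r ignores a positive rescaling of a whole profile, a sample whose values are exactly double another's has a distance of 0. A matrix like this is the usual input to hierarchical clustering.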
Web Video Event Recognition by Semantic Analysis from Ubiquitous Documents.
IEEE Trans Image Process. 2016 Sep 27;:
Authors: Yu L, Yang Y, Huang Z, Wang P, Song J, Shen H
Abstract
In recent years, event recognition from videos has attracted increasing interest in the multimedia area. While most existing research has focused on exploring visual cues to handle relatively small-granular events, it is difficult to analyse video content directly without any prior knowledge. Therefore, synthesizing visual and semantic analysis is a natural approach to video event understanding. In this paper, we study the problem of web video event recognition, where web videos often describe large-granular events and carry limited textual information. Key challenges include how to accurately represent event semantics from incomplete textual information and how to effectively explore the correlation between visual and textual cues for video event understanding. We propose a novel framework to perform complex event recognition from web videos. To compensate for the insufficient expressive power of visual cues, we construct an event knowledge base by deeply mining semantic information from ubiquitous web documents. This event knowledge base is capable of describing each event with comprehensive semantics. By utilizing this base, the textual cues for a video can be significantly enriched. Furthermore, we introduce a two-view adaptive regression model which explores the intrinsic correlation between the visual and textual cues of the videos to learn reliable classifiers. Extensive experiments on two real-world video datasets show the effectiveness of our proposed framework and demonstrate that the event knowledge base indeed helps improve the performance of web video event recognition.
PMID: 27705859 [PubMed - as supplied by publisher]
A Real-Time Web of Things Framework with Customizable Openness Considering Legacy Devices.
Sensors (Basel). 2016;16(10)
Authors: Zhao S, Yu L, Cheng B
Abstract
With the development of the Internet of Things (IoT), resources and applications based on it have emerged on a large scale. However, most efforts are "silo" solutions where devices and applications are tightly coupled. Infrastructures are needed to connect sensors to the Internet, open up and break the current application silos and move to a horizontal application mode. Based on the concept of Web of Things (WoT), many infrastructures have been proposed to integrate the physical world with the Web. However, issues such as the lack of real-time guarantees, the lack of fine-grained control over data, and the absence of explicit solutions for integrating heterogeneous legacy devices hinder their widespread and practical use. To address these issues, this paper proposes a WoT resource framework that provides the infrastructure for customizable openness and sharing of users' data and resources while ensuring the real-time behavior of their own applications. The proposed framework is validated by actual systems and experimental evaluations.
PMID: 27690038 [PubMed - as supplied by publisher]
Standardized data collection to build prediction models in oncology: a prototype for rectal cancer.
Future Oncol. 2016 Jan;12(1):119-36
Authors: Meldolesi E, van Soest J, Damiani A, Dekker A, Alitto AR, Campitelli M, Dinapoli N, Gatta R, Gambacorta MA, Lanzotti V, Lambin P, Valentini V
Abstract
Advances in diagnostic and treatment technology are responsible for a remarkable transformation in the concept of internal medicine, with the establishment of a new idea of personalized medicine. Inter- and intra-patient tumor heterogeneity and the complexity of clinical outcomes and treatment toxicity justify the effort to develop predictive models from decision support systems. However, the number of evaluated variables, coming from multiple disciplines (oncology, computer science, bioinformatics, statistics, genomics, and imaging, among others), can be very large, making traditional statistical analysis difficult to exploit. Automated data-mining processes and machine learning approaches can help organize this massive amount of data and unravel important interactions. The purpose of this paper is to describe a strategy for collecting and analyzing data properly for decision support and to introduce the concept of an 'umbrella protocol' within the framework of 'rapid learning healthcare'.
PMID: 26674745 [PubMed - indexed for MEDLINE]
A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.
Bioinformatics. 2016 Feb 1;32(3):477-9
Authors: Mazandu GK, Chimusa ER, Mbiyavanga M, Mulder NJ
Abstract
SUMMARY: Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs.
AVAILABILITY AND IMPLEMENTATION: A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Linux using Python under free software (GNU General Public Licence).
CONTACT: gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PMID: 26476781 [PubMed - indexed for MEDLINE]
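A-DaGO-Fun integrates information content (IC) based GO semantic similarity measures. The sketch below shows Resnik's measure, one of the best-known measures of this family. IC(t) = -ln p(t), where p(t) is the fraction of annotations made to t or any of its descendants. The similarity of two terms is the IC of their most informative common ancestor. This is not A-DaGO-Fun's code. The tiny DAG and annotation counts are hypothetical, and the sketch assumes a single-rooted DAG in which every term appears as a key of `parents`.

```python
import math


def ancestors(term, parents):
    """The term itself plus every ancestor reachable through `parents` (is_a edges)."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(parents, direct_counts):
    """IC(t) = -ln p(t); p(t) counts annotations to t or any descendant of t."""
    totals = {t: 0 for t in parents}
    for term, n in direct_counts.items():
        for a in ancestors(term, parents):
            totals[a] += n
    root_total = max(totals.values())  # the single root accumulates everything
    return {t: -math.log(c / root_total) for t, c in totals.items() if c > 0}


def resnik(t1, t2, parents, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic[t] for t in common if t in ic), default=0.0)


# Hypothetical ontology fragment and direct annotation counts.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
COUNTS = {"A1": 2, "A2": 1, "B": 1}
```

With these counts, A1 and A2 share the ancestor A (p = 3/4), so their similarity is -ln 0.75. A1 and B share only the root, whose IC is 0.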
The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation.
J Biomed Semantics. 2016;7(1):57
Authors: Buttigieg PL, Pafilis E, Lewis SE, Schildhauer MP, Walls RL, Mungall CJ
Abstract
BACKGROUND: The Environment Ontology (ENVO; http://www.environmentontology.org/ ), first described in 2013, is a resource and research target for the semantically controlled description of environmental entities. The ontology's initial aim was the representation of the biomes, environmental features, and environmental materials pertinent to genomic and microbiome-related investigations. However, the need for environmental semantics is common to a multitude of fields, and ENVO's use has steadily grown since its initial description. We have thus expanded, enhanced, and generalised the ontology to support its increasingly diverse applications.
METHODS: We have updated our development suite to promote expressivity, consistency, and speed: we now develop ENVO in the Web Ontology Language (OWL) and employ templating methods to accelerate class creation. We have also taken steps to better align ENVO with the Open Biological and Biomedical Ontologies (OBO) Foundry principles and interoperate with existing OBO ontologies. Further, we applied text-mining approaches to extract habitat information from the Encyclopedia of Life and automatically create experimental habitat classes within ENVO.
RESULTS: Relative to its state in 2013, ENVO's content, scope, and implementation have been enhanced and much of its existing content revised for improved semantic representation. ENVO now offers representations of habitats, environmental processes, anthropogenic environments, and entities relevant to environmental health initiatives and the global Sustainable Development Agenda for 2030. Several branches of ENVO have been used to incubate and seed new ontologies in previously unrepresented domains such as food and agronomy. The current release version of the ontology, in OWL format, is available at http://purl.obolibrary.org/obo/envo.owl .
CONCLUSIONS: ENVO has been shaped into an ontology which bridges multiple domains including biomedicine, natural and anthropogenic ecology, 'omics, and socioeconomic development. Through continued interactions with our users and partners, particularly those performing data archiving and synthesis, we anticipate that ENVO's growth will accelerate in 2017. As always, we invite further contributions and collaboration to advance the semantic representation of the environment, ranging from geographic features and environmental materials, across habitats and ecosystems, to everyday objects in household settings.
PMID: 27664130 [PubMed - as supplied by publisher]
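The ENVO abstract mentions templating methods to accelerate class creation in OWL. The sketch below shows the general idea. Each row of a table (id, label, parent_id) is expanded into OWL 2 functional-syntax axioms: a class declaration, a subclass axiom and an `rdfs:label` annotation. It does not reproduce ENVO's actual tooling. The namespace and identifiers are hypothetical placeholders, not ENVO IRIs.

```python
import csv
import io

BASE = "http://example.org/env#"  # hypothetical namespace, not ENVO's


def expand_template(table_csv: str) -> str:
    """Turn rows of (id, label, parent_id) into OWL functional-syntax axioms."""
    axioms = []
    for row in csv.DictReader(io.StringIO(table_csv)):
        cls = f"<{BASE}{row['id']}>"
        parent = f"<{BASE}{row['parent_id']}>"
        # escape backslashes and quotes inside the quoted literal
        label = row["label"].replace("\\", "\\\\").replace('"', '\\"')
        axioms.append(f"Declaration(Class({cls}))")
        axioms.append(f"SubClassOf({cls} {parent})")
        axioms.append(f'AnnotationAssertion(rdfs:label {cls} "{label}")')
    return "\n".join(axioms)
```

The output is a fragment. A complete ontology document would also need `Prefix(...)` declarations and an enclosing `Ontology(...)` block.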
Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis.
J Med Internet Res. 2016;18(9):e251
Authors: Agarwal V, Zhang L, Zhu J, Fang S, Cheng T, Hong C, Shah NH
Abstract
BACKGROUND: By recent estimates, the steady rise in health care costs has deprived more than 45 million Americans of health care services and has encouraged health care providers to better understand the key drivers of health care utilization from a population health management perspective. Prior studies suggest the feasibility of mining population-level patterns of health care resource utilization from observational analysis of Internet search logs; however, the utility of the endeavor to the various stakeholders in a health ecosystem remains unclear.
OBJECTIVE: The aim was to carry out a closed-loop evaluation of the utility of health care use predictions using the conversion rates of advertisements that were displayed to the predicted future utilizers as a surrogate. The statistical models to predict the probability of user's future visit to a medical facility were built using effective predictors of health care resource utilization, extracted from a deidentified dataset of geotagged mobile Internet search logs representing searches made by users of the Baidu search engine between March 2015 and May 2015.
METHODS: We inferred presence within the geofence of a medical facility from location and duration information from users' search logs and putatively assigned medical facility visit labels to qualifying search logs. We constructed a matrix of general, semantic, and location-based features from search logs of users that had 42 or more search days preceding a medical facility visit as well as from search logs of users that had no medical visits, and trained statistical learners for predicting future medical visits. We then carried out a closed-loop evaluation of the utility of health care use predictions using the show conversion rates of advertisements displayed to the predicted future utilizers. In the context of behaviorally targeted advertising, wherein health care providers are interested in minimizing their cost per conversion, the association between show conversion rate and predicted utilization score served as a surrogate measure of the model's utility.
RESULTS: We obtained the highest area under the curve (0.796) in medical visit prediction with our random forests model and daywise features. Ablating feature categories one at a time showed that the model performance worsened the most when location features were dropped. An online evaluation in which advertisements were served to users who had a high predicted probability of a future medical visit showed a 3.96% increase in the show conversion rate.
CONCLUSIONS: Results from our experiments done in a research setting suggest that it is possible to accurately predict future patient visits from geotagged mobile search logs. Results from the offline and online experiments on the utility of health utilization predictions suggest that such prediction can have utility for health care providers.
PMID: 27655225 [PubMed - as supplied by publisher]
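The study above infers a facility visit from location and duration in geotagged logs. The sketch below shows one plausible labeling rule. A visit is recorded when consecutive log points stay within a radius of the facility, measured by haversine distance on a spherical Earth, for at least a minimum dwell time. The 200 m radius and 30-minute dwell are hypothetical thresholds, not values from the paper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def visited_facility(logs, facility, radius_m=200.0, min_dwell_s=1800):
    """True if consecutive logs stay inside the geofence for >= min_dwell_s.

    logs: (unix_seconds, lat, lon) tuples; facility: (lat, lon).
    radius_m and min_dwell_s are illustrative thresholds, not the study's.
    """
    run_start = None
    for ts, lat, lon in sorted(logs):
        if haversine_m(lat, lon, *facility) <= radius_m:
            if run_start is None:
                run_start = ts  # entered the geofence
            if ts - run_start >= min_dwell_s:
                return True
        else:
            run_start = None  # left the geofence; reset the dwell timer
    return False
```

Labels produced this way would then serve as the prediction targets for the feature matrix described in the methods.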
Semantic processing of EHR data for clinical research.
J Biomed Inform. 2015 Dec;58:247-59
Authors: Sun H, Depraetere K, De Roo J, Mels G, De Vloed B, Twagirumukiza M, Colaert D
Abstract
There is a growing need to semantically process and integrate clinical data from different sources for clinical research. This paper presents an approach to integrate EHRs from heterogeneous resources and generate integrated data in different data formats or semantics to support various clinical research applications. The proposed approach builds semantic data virtualization layers on top of data sources, which generate data in the requested semantics or formats on demand. This approach avoids upfront dumping to and synchronizing of the data with various representations. Data from different EHR systems are first mapped to RDF data with source semantics, and then converted to representations with harmonized domain semantics where domain ontologies and terminologies are used to improve reusability. It is also possible to further convert data to application semantics and store the converted results in clinical research databases, e.g. i2b2, OMOP, to support different clinical research settings. Semantic conversions between different representations are explicitly expressed using N3 rules and executed by an N3 Reasoner (EYE), which can also generate proofs of the conversion processes. The solution presented in this paper has been applied to real-world applications that process large scale EHR data.
PMID: 26515501 [PubMed - indexed for MEDLINE]
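The approach above expresses semantic conversions as rules. Source-semantics RDF is rewritten into harmonized domain semantics and then into application semantics. The sketch below illustrates that layering with a toy naive forward-chaining engine over triples. Variables start with `?`, and rules are applied until no new triples appear. It is not N3 and not a substitute for the EYE reasoner, and it generates no proofs. All prefixes and identifiers are hypothetical, and head variables must also appear in the rule body.

```python
def _match(pattern, triple, binding):
    """Unify one triple pattern (variables start with '?') with a triple."""
    b = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if b.setdefault(p, t) != t:
                return None  # variable already bound to something else
        elif p != t:
            return None
    return b


def _bindings(body, facts, binding=None):
    """All variable bindings satisfying every pattern in `body` (a conjunction)."""
    if not body:
        yield binding or {}
        return
    for fact in facts:
        b = _match(body[0], fact, binding or {})
        if b is not None:
            yield from _bindings(body[1:], facts, b)


def forward_chain(facts, rules):
    """Apply (body, head) rules until no new triples appear (naive fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for b in _bindings(body, facts):
                for pattern in head:
                    triple = tuple(b.get(term, term) for term in pattern)
                    if triple not in facts:
                        new.add(triple)
        if not new:
            return facts
        facts |= new


# Hypothetical source -> domain -> application conversion rules.
RULES = [
    ([("?p", "src:diagCode", "?c"), ("?c", "map:sameAs", "?d")],
     [("?p", "dom:hasCondition", "?d")]),
    ([("?p", "dom:hasCondition", "dom:Type2Diabetes")],
     [("?p", "app:inCohort", "app:DiabetesCohort")]),
]
```

The second rule fires only after the first has produced its domain-level triple. This mirrors the layering from source semantics to domain semantics to application semantics described in the abstract.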
Integrating HL7 RIM and ontology for unified knowledge and data representation in clinical decision support systems.
Comput Methods Programs Biomed. 2016 Jan;123:94-108
Authors: Zhang YF, Tian Y, Zhou TS, Araki K, Li JS
Abstract
BACKGROUND AND OBJECTIVES: The broad adoption of clinical decision support systems within clinical practice has been hampered mainly by the difficulty in expressing domain knowledge and patient data in a unified formalism. This paper presents a semantic-based approach to the unified representation of healthcare domain knowledge and patient data for practical clinical decision making applications.
METHODS: A four-phase knowledge engineering cycle is implemented to develop a semantic healthcare knowledge base based on an HL7 reference information model, including an ontology to model domain knowledge and patient data and an expression repository to encode clinical decision making rules and queries. A semantic clinical decision support system is designed to provide patient-specific healthcare recommendations based on the knowledge base and patient data.
RESULTS: The proposed solution is evaluated in the case study of type 2 diabetes mellitus inpatient management. The knowledge base is successfully instantiated with relevant domain knowledge and testing patient data. Ontology-level evaluation confirms model validity. Application-level evaluation of diagnostic accuracy reaches a sensitivity of 97.5%, a specificity of 100%, and a precision of 98%; an acceptance rate of 97.3% is given by domain experts for the recommended care plan orders.
CONCLUSIONS: The proposed solution has been successfully validated in the case study as providing clinical decision support at a high accuracy and acceptance rate. The evaluation results demonstrate the technical feasibility and application prospect of our approach.
PMID: 26474836 [PubMed - indexed for MEDLINE]
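The expression repository of decision rules described in the METHODS line can be illustrated with a minimal rule-evaluation sketch. It assumes patient data arrive as a flat dictionary of named measurements; the rule names, field names, and thresholds are hypothetical placeholders chosen for illustration, not the rules encoded in the paper's knowledge base and not clinical guidance.

```python
# Hypothetical sketch of rule-based, patient-specific recommendations.
# Rules, field names, and thresholds are placeholders, not the paper's rules.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DecisionRule:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # evaluated against patient data
    recommendation: str


RULES: List[DecisionRule] = [
    DecisionRule(
        "elevated_fasting_glucose",
        lambda p: p.get("fasting_glucose_mg_dl", 0.0) >= 126,
        "flag: elevated fasting glucose",
    ),
    DecisionRule(
        "hba1c_above_target",
        lambda p: p.get("hba1c_percent", 0.0) > 7.0,
        "flag: HbA1c above target",
    ),
]


def recommend(patient: Dict[str, float], rules: List[DecisionRule] = RULES) -> List[str]:
    """Return the recommendation of every rule whose condition holds, in rule order."""
    return [r.recommendation for r in rules if r.condition(patient)]


print(recommend({"fasting_glucose_mg_dl": 140.0, "hba1c_percent": 6.5}))
```

Storing rules as data (name, condition, output) rather than hard-coding them in control flow is what lets a repository of rules be edited and queried separately from the patient data.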
The role of ontologies in biological and biomedical research: a functional perspective.
Brief Bioinform. 2015 Nov;16(6):1069-80
Authors: Hoehndorf R, Schofield PN, Gkoutos GV
Abstract
Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, great potential lies in combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.
PMID: 25863278 [PubMed - indexed for MEDLINE]
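The four features listed above (identifiers, vocabulary, metadata, machine-readable axioms) can be made concrete with a small sketch in which `is_a` axioms let a program retrieve data annotated to a class or any of its subclasses. This is a minimal illustration under assumptions: the `EX:` identifiers, labels, definitions, and annotated items are hypothetical, and only subclass axioms are modeled.

```python
# Hypothetical sketch: identifiers, labels (vocabulary), definitions (metadata),
# and is_a axioms (parents) used to query annotations through the hierarchy.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class OntologyClass:
    id: str                     # standard identifier
    label: str                  # vocabulary term
    definition: str             # metadata describing intended meaning
    parents: List[str] = field(default_factory=list)  # is_a axioms


# Hypothetical three-class fragment.
ONTO: Dict[str, OntologyClass] = {c.id: c for c in [
    OntologyClass("EX:1", "disease", "A hypothetical root disease class."),
    OntologyClass("EX:2", "infectious disease", "A disease caused by an infectious agent.", ["EX:1"]),
    OntologyClass("EX:3", "bacterial infectious disease", "An infectious disease caused by a bacterium.", ["EX:2"]),
]}


def ancestors(onto: Dict[str, OntologyClass], class_id: str) -> Set[str]:
    """All superclasses reachable through is_a axioms (transitive closure)."""
    seen: Set[str] = set()
    stack = list(onto[class_id].parents)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(onto[parent].parents)
    return seen


def annotated_with(annotations: Dict[str, str], onto: Dict[str, OntologyClass], class_id: str) -> Set[str]:
    """Items annotated to class_id directly or to any of its subclasses."""
    return {item for item, cid in annotations.items()
            if cid == class_id or class_id in ancestors(onto, cid)}


annotations = {"item_a": "EX:3", "item_b": "EX:1"}
print(annotated_with(annotations, ONTO, "EX:2"))
```

Without the axioms, a query for "infectious disease" would miss `item_a`; with them, the hierarchy supplies part of the meaning computationally, which is the kind of integrative use the abstract describes.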
Large-scale Cross-modality Search via Collective Matrix Factorization Hashing.
IEEE Trans Image Process. 2016 Sep 8;
Authors: Ding G, Guo Y, Zhou J, Gao Y
Abstract
By transforming data into binary representations, i.e., hashing, we can perform high-speed search with low storage cost, and thus hashing has attracted increasing research interest in recent years. Recently, how to generate hash codes for multimodal data (e.g., images with textual tags, documents with photos, etc.) for large-scale cross-modality search (e.g., searching a database for images semantically related to a document query) has become an important research issue because of the fast growth of multimodal data on the Web. To address this issue, a novel framework for multimodal hashing is proposed, termed Collective Matrix Factorization Hashing (CMFH). The key idea of CMFH is to learn unified hash codes for the different modalities of one multimodal instance in a shared latent semantic space in which different modalities can be effectively connected, thereby supporting accurate cross-modality search. Based on the general framework, we extend it to the unsupervised scenario, where it tries to preserve the Euclidean structure, and to the supervised scenario, where it fully exploits the label information of the data. The corresponding theoretical analysis and optimization algorithms are given. We conducted comprehensive experiments on three benchmark datasets for cross-modality search. The experimental results demonstrate that CMFH can significantly outperform several state-of-the-art cross-modality hashing methods, which validates the effectiveness of the proposed CMFH.
PMID: 27623584 [PubMed - as supplied by publisher]
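The storage and speed argument for hashing can be illustrated with the search step alone: once every item, whatever its modality, has a binary code in a shared space, retrieval is a ranking by Hamming distance. This sketch assumes the shared latent vectors are already given; it does not implement CMFH's collective matrix factorization or its learned projections, and the `to_code` sign thresholding, item names, and vectors are illustrative assumptions.

```python
# Hypothetical sketch of Hamming-distance retrieval over binary codes.
# Assumes latent vectors in a shared space are given; CMFH's factorization
# step that would learn them is NOT implemented here.
from typing import Dict, List, Sequence


def to_code(latent: Sequence[float]) -> int:
    """Binarize a latent vector by sign: bit i is 1 when component i is positive."""
    code = 0
    for i, x in enumerate(latent):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def search(query_code: int, db_codes: Dict[str, int], k: int) -> List[str]:
    """Return the k database keys closest to the query in Hamming distance (ties by key)."""
    ranked = sorted(db_codes.items(), key=lambda kv: (hamming(query_code, kv[1]), kv[0]))
    return [key for key, _ in ranked[:k]]


# Hypothetical image codes and a text query mapped into the same latent space.
db = {
    "img_a": to_code([0.4, -0.2, 1.0]),
    "img_b": to_code([-1.0, -1.0, -1.0]),
    "img_c": to_code([0.1, 0.3, 0.2]),
}
text_query = to_code([0.9, -0.5, 0.3])
print(search(text_query, db, k=2))
```

Each item costs one small integer, and comparing two items is an XOR plus a bit count, which is why binary codes scale to large databases.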
FoodWiki: a Mobile App Examines Side Effects of Food Additives Via Semantic Web.
J Med Syst. 2016 Feb;40(2):41
Authors: Çelik Ertuğrul D
Abstract
In this article, a research project on a mobile safe food consumption system (FoodWiki) is discussed that applies its own inference rules to its own knowledge base. Currently, the developed rules examine side effects that cause certain health risks, initially heart disease, diabetes, allergy, and asthma. Food producers add thousands of compounds to processed food, with numerous effects on the food: to add color, stabilize, texturize, preserve, sweeten, thicken, add flavor, soften, emulsify, and so forth. According to the World Health Organization, these commonly used ingredients or compounds in manufactured foods may have side effects that cause several health risks such as heart disease, hypertension, high cholesterol, asthma, diabetes, allergies, and Alzheimer's disease. Safety in food consumption, especially for patients in these risk groups, has become crucial, given that such health problems are ranked among the top ten health risks around the world. Personal e-health knowledge base systems are needed to help patients take control of their safe food consumption. Systems with an advanced semantic knowledge base can recommend appropriate foods to individuals before consumption. The proposed FoodWiki system uses a concept-based search mechanism that operates on thousands of food compounds to provide more relevant information.
PMID: 26590979 [PubMed - indexed for MEDLINE]
Protein aggregation, structural disorder and RNA-binding ability: a new approach for physico-chemical and gene ontology classification of multiple datasets.
BMC Genomics. 2015;16:1071
Authors: Klus P, Ponti RD, Livi CM, Tartaglia GG
Abstract
BACKGROUND: Comparison between multiple protein datasets requires the choice of an appropriate reference system and a number of variables to describe their differences. Here we introduce an innovative approach to discriminate multiple protein datasets (multiCM) and to measure enrichments in gene ontology terms (cleverGO) using semantic similarities.
RESULTS: We illustrate the power of our approach by investigating the links between RNA-binding ability and other protein features, such as structural disorder and aggregation, in S. cerevisiae, C. elegans, M. musculus and H. sapiens. Our results are in striking agreement with available experimental evidence and reveal features that are key to understanding the mechanisms regulating cellular homeostasis.
CONCLUSIONS: In an intuitive way, multiCM and cleverGO provide accurate classifications of physico-chemical features and annotations of biological processes, molecular functions and cellular components, which is extremely useful for the discovery and characterization of new trends in protein datasets. multiCM and cleverGO can be freely accessed on the Web at http://www.tartaglialab.com/cs_multi/submission and http://www.tartaglialab.com/GO_analyser/universal . Each of the pages contains links to the corresponding documentation and tutorial.
PMID: 26673865 [PubMed - indexed for MEDLINE]
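Measuring enrichment of a gene ontology term in a protein set is conventionally done with a one-sided hypergeometric test, sketched below as a baseline. This is an assumption-labeled illustration of that standard test, not cleverGO's method: the abstract states that cleverGO uses semantic similarities between terms, which this sketch does not reproduce. The function name and example counts are hypothetical.

```python
# Standard one-sided hypergeometric enrichment test (a baseline, NOT cleverGO).
from math import comb


def enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: proteins in the background set
    K: background proteins annotated with the GO term
    n: proteins in the dataset of interest
    k: dataset proteins annotated with the term
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 10 background proteins, 4 annotated, 3 selected, 2 annotated.
print(enrichment_p(10, 4, 3, 2))
```

Small p-values indicate that the term appears in the dataset more often than random sampling from the background would predict; a similarity-aware method can additionally pool evidence across related terms.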