Semantic Web
A Semantic Framework for Logical Cross-Validation, Evaluation and Impact Analyses of Population Health Interventions.
Stud Health Technol Inform. 2017;235:481-485
Authors: Shaban-Nejad A, Okhmatovskaia A, Shin EK, Davis RL, Franklin BE, Buckeridge DL
Abstract
Most chronic diseases are a result of a complex web of causative and correlated factors. As a result, effective public health or clinical interventions that intend to generate a sustainable change in these diseases most often use a combination of strategies or programs. To optimize comparative effectiveness evaluations and select the most efficient intervention(s), stakeholders (i.e. public health institutions, policy-makers and advocacy groups, practitioners, insurers, clinicians, and researchers) need access to reliable assessment methods. Building on the theory of Evidence-Based Public Health (EBPH), we introduce a knowledge-based framework for evaluating the consistency and effectiveness of public health programs, interventions, and policies. We use a semantic inference model that assists decision-makers in finding inconsistencies, identifying selection and information biases, and identifying confounding and hidden dependencies in different public health programs and interventions. The use of formal ontologies for automatic evaluation and assessment of public health programs improves program transparency to stakeholders and decision makers, which in turn increases buy-in and acceptance of methods, connects multiple evaluation activities, and strengthens cost analysis.
PMID: 28423839 [PubMed - in process]
Combining Archetypes, Ontologies and Formalization Enables Automated Computation of Quality Indicators.
Stud Health Technol Inform. 2017;235:416-420
Authors: Legaz-García MDC, Dentler K, Fernández-Breis JT, Cornet R
Abstract
ArchMS is a framework that represents clinical information and knowledge using ontologies in OWL, which facilitates semantic interoperability and thereby the exploitation and secondary use of clinical data. However, it does not yet support the automated assessment of quality of care. CLIF is a stepwise method to formalize quality indicators. The method has been implemented in the CLIF tool which supports its users in generating computable queries based on a patient data model which can be based on archetypes. To enable the automated computation of quality indicators using ontologies and archetypes, we tested whether ArchMS and the CLIF tool can be integrated. We successfully automated the process of generating SPARQL queries from quality indicators that have been formalized with CLIF and integrated them into ArchMS. Hence, ontologies and archetypes can be combined for the execution of formalized quality indicators.
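The abstract's core step — turning a formalized quality indicator into a computable SPARQL query — can be sketched roughly as below. This is a minimal illustration of the general idea, not the CLIF tool's actual output; the class and property names are invented, and a real patient data model based on archetypes would be far richer.

```python
# Hypothetical sketch: rendering formalized indicator criteria as a
# counting SPARQL query over a patient data model. Property and class
# names (e.g. :hasDiagnosis) are illustrative, not ArchMS/CLIF terms.

def criterion_to_pattern(var, prop, value):
    """Render one formalized criterion as a SPARQL triple pattern."""
    return f"?{var} {prop} {value} ."

def build_indicator_query(criteria):
    """Compose a counting query from a list of (property, value) criteria."""
    patterns = "\n  ".join(
        criterion_to_pattern("patient", prop, value) for prop, value in criteria
    )
    return (
        "SELECT (COUNT(DISTINCT ?patient) AS ?n) WHERE {\n"
        f"  {patterns}\n"
        "}"
    )

# Example indicator: diabetic patients with a documented HbA1c measurement.
query = build_indicator_query([
    (":hasDiagnosis", ":DiabetesMellitus"),
    (":hasObservation", ":HbA1cMeasurement"),
])
print(query)
```

Executing such a query against an RDF representation of patient data yields the indicator's numerator or denominator count directly.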
PMID: 28423826 [PubMed - in process]
Linked Data Applications Through Ontology Based Data Access in Clinical Research.
Stud Health Technol Inform. 2017;235:131-135
Authors: Kock-Schoppenhauer AK, Kamann C, Ulrich H, Duhm-Harbeck P, Ingenerf J
Abstract
Clinical care and research data are widely dispersed in isolated systems based on heterogeneous data models. Biomedicine predominantly makes use of connected datasets based on the Semantic Web paradigm. Initiatives like Bio2RDF created Resource Description Framework (RDF) versions of Omics resources, enabling sophisticated Linked Data applications. In contrast, electronic healthcare record (EHR) data are generated and processed in diverse clinical subsystems within hospital information systems (HIS). Usually, each of them utilizes a relational database system with a different proprietary schema. Semantic integration and access to the data are hardly possible. This paper describes ways of using Ontology Based Data Access (OBDA) for bridging the semantic gap between existing raw data and user-oriented views supported by ontology-based queries. Based on mappings between entities of data schemas and ontologies, data can be made available as materialized or virtualized RDF triples ready for querying and processing. Our experiments based on CentraXX for biobank and study management demonstrate the advantages of abstracting away from low-level details and of semantic mediation. Furthermore, it becomes clear that using a professional platform for Linked Data applications is recommended due to the inherent complexity, the inconvenience of confronting end users with SPARQL, and scalability and performance issues.
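The mapping idea at the heart of OBDA can be illustrated with a toy sketch: a declarative mapping from relational columns to ontology properties lets rows be exposed as (virtual or materialized) RDF triples. Table, column, and ontology names below are invented for illustration; a real OBDA system such as one built on R2RML mappings would handle joins, datatypes, and URI templates.

```python
# Minimal OBDA-style sketch: relational rows + a column-to-property
# mapping yield RDF-like triples. All identifiers are hypothetical.

rows = [  # rows from a hypothetical HIS table "lab_result"
    {"id": 1, "patient": "p42", "loinc": "718-7", "value": 13.2},
    {"id": 2, "patient": "p43", "loinc": "718-7", "value": 11.9},
]

# The mapping is the core OBDA artifact: column -> ontology property.
mapping = {"patient": "obo:subject", "loinc": "loinc:code", "value": "sio:value"}

def to_triples(row):
    """Expose one relational row as subject/predicate/object triples."""
    subject = f"ex:lab_result/{row['id']}"
    return [(subject, prop, row[col]) for col, prop in mapping.items()]

triples = [t for row in rows for t in to_triples(row)]
print(triples[0])
```

In a virtualized setting the triples are never stored; the mapping is instead used to rewrite SPARQL queries into SQL at query time.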
PMID: 28423769 [PubMed - in process]
Improving data workflow systems with cloud services and use of open data for bioinformatics research.
Brief Bioinform. 2017 Apr 16;:
Authors: Karim MR, Michel A, Zappa A, Baranov P, Sahay R, Rebholz-Schuhmann D
Abstract
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community.
PMID: 28419324 [PubMed - as supplied by publisher]
Semantics derived automatically from language corpora contain human-like biases.
Science. 2017 Apr 14;356(6334):183-186
Authors: Caliskan A, Bryson JJ, Narayanan A
Abstract
Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.
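The paper's measure, the Word-Embedding Association Test (WEAT), compares the mean cosine similarity of target words to two attribute sets. A tiny sketch with invented 2-D vectors (real embeddings have hundreds of dimensions and are trained on web-scale corpora):

```python
import math

# Toy WEAT sketch: association of a target word with attribute sets A, B
# is the difference of mean cosine similarities; the test statistic sums
# associations over the two target sets. Vectors are invented 2-D toys.

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def association(w, A, B):
    return (sum(cos(w, a) for a in A) / len(A)
            - sum(cos(w, b) for b in B) / len(B))

def weat_statistic(X, Y, A, B):
    return (sum(association(x, A, B) for x in X)
            - sum(association(y, A, B) for y in Y))

# Invented embeddings: flowers lie near "pleasant", insects near "unpleasant".
pleasant, unpleasant = [(1.0, 0.1)], [(0.1, 1.0)]
flowers, insects = [(0.9, 0.2), (1.0, 0.0)], [(0.2, 0.9), (0.0, 1.0)]
s = weat_statistic(flowers, insects, pleasant, unpleasant)
print(round(s, 2))  # positive: flowers associate with "pleasant"
```

A positive statistic reproduces, in miniature, the flower/insect pleasantness bias the authors recover from web text.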
PMID: 28408601 [PubMed - in process]
Integrating Statistical Machine Learning in a Semantic Sensor Web for Proactive Monitoring and Control.
Sensors (Basel). 2017 Apr 09;17(4):
Authors: Adeleke JA, Moodley D, Rens G, Adewumi AO
Abstract
Proactive monitoring and control of our natural and built environments is important in various application scenarios. Semantic Sensor Web technologies have been well researched and used for environmental monitoring applications to expose sensor data for analysis in order to provide responsive actions in situations of interest. While these applications provide a quick response to situations of interest in order to minimize their unwanted effects, research is still needed on techniques that can anticipate the future to support proactive control, such that unwanted situations can be averted altogether. This study integrates a statistical machine learning based predictive model in a Semantic Sensor Web using stream reasoning. The approach is evaluated in an indoor air quality monitoring case study. A sliding window approach that employs the Multilayer Perceptron model to predict short-term PM2.5 pollution situations is integrated into the proactive monitoring and control framework. Results show that the proposed approach can effectively predict short-term PM2.5 pollution situations: precision of up to 0.86 and sensitivity of up to 0.85 are achieved over half-hour prediction horizons, making it possible for the system to warn occupants or even to autonomously avert the predicted pollution situations within the context of the Semantic Sensor Web.
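The sliding-window setup can be sketched as follows: past PM2.5 readings become the feature vector for predicting whether a pollution threshold will be exceeded at a short horizon. The threshold value and the naive mean-based predictor below are invented stand-ins for the paper's trained Multilayer Perceptron.

```python
# Hedged sketch of sliding-window prediction of PM2.5 "pollution situations".
# THRESHOLD and the mean-based predictor are illustrative assumptions only.

def windows(series, size, horizon):
    """Yield (features, future_value) pairs from a time series."""
    for i in range(len(series) - size - horizon + 1):
        yield series[i:i + size], series[i + size + horizon - 1]

THRESHOLD = 35.0  # illustrative PM2.5 cut-off (ug/m3), not from the paper

def predict_polluted(window):
    """Naive stand-in for the MLP: flag if the recent mean exceeds the cut-off."""
    return sum(window) / len(window) > THRESHOLD

readings = [20, 22, 30, 41, 44, 46, 39, 28, 25, 24]
for feats, future in windows(readings, size=3, horizon=2):
    print(feats, predict_polluted(feats), future > THRESHOLD)
```

In the paper's framework, the predicted label would feed a stream-reasoning layer that decides on warnings or autonomous control actions.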
PMID: 28397776 [PubMed - in process]
The anatomy of phenotype ontologies: principles, properties and applications.
Brief Bioinform. 2017 Apr 06;:
Authors: Gkoutos GV, Schofield PN, Hoehndorf R
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
PMID: 28387809 [PubMed - as supplied by publisher]
Age and Semantic Inhibition Measured by the Hayling Task: A Meta-Analysis.
Arch Clin Neuropsychol. 2017 Mar 01;32(2):198-214
Authors: Cervera-Crespo T, González-Alvarez J
Abstract
Objective: Cognitive aging is commonly associated with a decrease in executive functioning (EF). A specific component of EF, semantic inhibition, is addressed in the present study, which presents a meta-analytic review of the literature that has evaluated the performance on the Hayling Sentence Completion test in young and older groups of individuals in order to assess the magnitude of the age effect.
Method: A systematic search involving the Web of Science, PsycINFO, PsycARTICLES, and MEDLINE databases and Google Scholar was performed. A total of 11 studies were included in this meta-analysis, encompassing a total of 887 participants (440 young and 447 older adults). The effect sizes for group differences on four measures of the Hayling test (response latencies and error scores on the Automatic and Inhibition sections of the test) were calculated using the Comprehensive Meta-Analysis software package.
Results: The results revealed large age effects for response latencies in both the Automatic (Hedges' g = 0.81) and Inhibition conditions (Hedges' g = 0.98), though these two effect sizes did not differ from each other. In contrast, analysis of errors revealed a significant difference between the small effect seen in the Automatic condition (Hedges' g = 0.13) and the moderate effect seen in the Inhibition condition (Hedges' g = 0.55).
Conclusions: These results may be important for a better understanding of the inhibitory functioning in elderly individuals, although they should be interpreted with caution because of the limited number of studies in the literature to date.
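The effect sizes reported above are Hedges' g, i.e. a standardized mean difference with a small-sample bias correction. A minimal sketch of its computation from group summary statistics; the input numbers are invented, not taken from the meta-analysis.

```python
import math

# Hedges' g from means, SDs, and sample sizes of two groups.
# Values below are hypothetical illustration data.

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference with Hedges' small-sample correction."""
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
                          / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd                 # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # bias-correction factor J
    return d * correction

# Hypothetical latencies (s): older adults slower than young adults.
g = hedges_g(m1=2.9, s1=1.0, n1=40, m2=2.0, s2=0.9, n2=40)
print(round(g, 2))
```

With these toy numbers the result lands near the "large effect" range reported for the Inhibition-condition latencies.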
PMID: 28365747 [PubMed - in process]
Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects.
Database (Oxford). 2017 Jan 01;2017(1):
Authors: Güntsch A, Hyam R, Hagedorn G, Chagnoux S, Röpert D, Casino A, Droege G, Glöckler F, Gödderz K, Groom Q, Hoffmann J, Holleman A, Kempa M, Koivula H, Marhold K, Nicolson N, Smith VS, Triebel D
Abstract
With biodiversity research activities being increasingly shifted to the web, the need for a system of persistent and stable identifiers for physical collection objects becomes increasingly pressing. The Consortium of European Taxonomic Facilities agreed on a common system of HTTP-URI-based stable identifiers, which is now being rolled out to its member organizations. The system follows Linked Open Data principles and implements redirection mechanisms to human-readable and machine-readable representations of specimens, facilitating seamless integration into the growing semantic web. The implementation of stable identifiers across collection organizations is supported with open-source provider software scripts, best-practice documentation, and recommendations for RDF metadata elements, facilitating harmonized access to collection information in web portals.
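The redirection mechanism can be sketched as simple content negotiation: a single stable HTTP URI per specimen dereferences, via a 303 redirect, to either a human-readable page or an RDF document depending on the client's Accept header. All URIs and paths below are illustrative, not actual CETAF identifiers.

```python
# Hedged sketch of stable-identifier dereferencing with content negotiation.
# The specimen URI and target representations are invented examples.

REPRESENTATIONS = {
    "text/html": "https://collection.example.org/specimen/B100042.html",
    "application/rdf+xml": "https://collection.example.org/specimen/B100042.rdf",
}

def dereference(stable_uri, accept="text/html"):
    """Return (HTTP status, redirect location) for an identifier request."""
    target = REPRESENTATIONS.get(accept)
    if target is None:
        return 406, None  # 406 Not Acceptable: no matching representation
    return 303, target    # 303 See Other, common Linked Data practice

status, location = dereference(
    "https://collection.example.org/specimen/B100042",
    accept="application/rdf+xml",
)
print(status, location)
```

Because the stable URI itself never changes, citations in publications and databases remain valid even if the backing representations move.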
Database URL: http://cetaf.org/cetaf-stable-identifiers
PMID: 28365724 [PubMed - in process]
Neuroimaging, Genetics, and Clinical Data Sharing in Python Using the CubicWeb Framework.
Front Neuroinform. 2017;11:18
Authors: Grigis A, Goyard D, Cherbonnier R, Gareau T, Papadopoulos Orfanos D, Chauvat N, Di Mascio A, Schumann G, Spooren W, Murphy D, Frouin V
Abstract
In neurosciences or psychiatry, the emergence of large multi-center population imaging studies raises numerous technological challenges. From distributed data collection across different institutions and countries to the final data publication service, one must handle the massive, heterogeneous, and complex data from genetics, imaging, demographics, or clinical scores. These data must be both efficiently collected and easy to download. We present a Python solution, based on the CubicWeb open-source semantic framework, aimed at building population imaging study repositories. In addition, we focus on the tools developed around this framework to overcome the challenges associated with data sharing and collaborative requirements. We describe a set of three highly adaptive web services that transform the CubicWeb framework into a (1) multi-center upload platform, (2) collaborative quality assessment platform, and (3) publication platform endowed with massive-download capabilities. Two major European projects, IMAGEN and EU-AIMS, are currently supported by the described framework. We also present a Python package that enables end users to remotely query neuroimaging, genetics, and clinical data from scripts.
PMID: 28360851 [PubMed - in process]
Erratum to: InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology.
BMC Genomics. 2017 Mar 28;18(1):262
Authors: Peng J, Li H, Liu Y, Juan L, Jiang Q, Wang Y, Chen J
PMID: 28351382 [PubMed - in process]
Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework.
Sci Rep. 2017 Mar 23;7(1):381
Authors: Weichenberger CX, Palermo A, Pramstaller PP, Domingues FS
Abstract
Protein functional similarity based on gene ontology (GO) annotations serves as a powerful tool when comparing proteins on a functional level in applications such as protein-protein interaction prediction, gene prioritization, and disease gene discovery. Functional similarity (FS) is usually quantified by combining the GO hierarchy with an annotation corpus that links genes and gene products to GO terms. One large group of algorithms involves calculation of GO term semantic similarity (SS) between all the terms annotating the two proteins, followed by a second step, described as "mixing strategy", which involves combining the SS values to yield the final FS value. Due to the variability of protein annotation caused e.g. by annotation bias, this value cannot be reliably compared on an absolute scale. We therefore introduce a similarity z-score that takes into account the FS background distribution of each protein. For a selection of popular SS measures and mixing strategies we demonstrate moderate accuracy improvement when using z-scores in a benchmark that aims to separate orthologous cases from random gene pairs and discuss in this context the impact of annotation corpus choice. The approach has been implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
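The similarity z-score introduced above standardizes a raw functional-similarity (FS) value against a background distribution of FS values for the same protein paired with random partners. A minimal sketch; all numbers are invented for illustration, and a real analysis would use thousands of background pairs per protein.

```python
from statistics import mean, stdev

# Sketch of the protein-specific FS z-score: how many standard deviations
# a candidate pair's FS lies above that protein's random background.

def fs_zscore(fs_value, background):
    """Standardize an FS value against its protein-specific background."""
    return (fs_value - mean(background)) / stdev(background)

# Hypothetical background FS values of protein P against random proteins,
# and one candidate ortholog pair scoring well above that background.
background = [0.10, 0.15, 0.12, 0.20, 0.18, 0.14, 0.16, 0.11]
z = fs_zscore(0.62, background)
print(round(z, 1))
```

Because the background is computed per protein, the z-score partially cancels annotation-depth bias, which is why raw FS values are not comparable on an absolute scale but z-scores are.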
PMID: 28336965 [PubMed - in process]
A Novel Semantic Representation for Eligibility Criteria in Clinical Trials.
J Biomed Inform. 2017 Mar 21;:
Authors: Chondrogiannis E, Andronikou V, Tagaris A, Karanastasis E, Varvarigou T, Tsuji M
Abstract
Eligibility Criteria (EC) comprise an important part of a clinical study, being determinant of its cost, duration and overall success. Their formal, computer-processable description can significantly improve clinical trial design and conduction by enabling their intelligent processing, replicability and linkability with other data. For EC representation purposes, related standards were investigated, along with published literature. Moreover, a considerable number of clinicaltrials.gov studies was analyzed in collaboration with clinical experts for the determination and classification of parameters of clinical research importance. The outcome of this process was the EC Representation: a CDISC-compliant schema for organizing criteria along with a patient-centric model for their formal expression, properly linked with international classifications and codifications. Its evaluation against 200 randomly selected EC indicated that it can adequately serve its purpose, while it can also be combined with existing tools and components developed for both EC specification and, especially, their application to Electronic Health Records.
PMID: 28336477 [PubMed - as supplied by publisher]
On the Prediction of Flickr Image Popularity by Analyzing Heterogeneous Social Sensory Data.
Sensors (Basel). 2017 Mar 19;17(3):
Authors: Aloufi S, Zhu S, El Saddik A
Abstract
The increase in the popularity of social media has bridged the gap between the physical and virtual worlds. The content generated by people, or social sensors, on social media provides information about users and their living surroundings, which allows us to access a user's preferences, opinions, and interactions. This provides an opportunity for us to understand human behavior and enhance the services provided for both the real and virtual worlds. In this paper, we focus on the popularity prediction of social images on Flickr, a popular social photo-sharing site, and promote research on utilizing social sensory data in the context of assisting people to improve their life on the Web. Social data differ from data collected from physical sensors in that they exhibit special characteristics that pose new challenges. In addition to their huge quantity, social data are noisy, unstructured, and heterogeneous. Moreover, they involve human semantics and contextual data that require analysis and interpretation based on human behavior. Accordingly, we address the problem of popularity prediction for an image by exploiting three main factors that are important for making an image popular. In particular, we investigate the impact of the image's visual content, where the semantic and sentiment information extracted from the image shows an impact on its popularity, as well as the textual information associated with the image, which has a fundamental role in boosting the visibility of the image in keyword search results. Additionally, we explore social context, such as an image owner's popularity and how it positively influences the image popularity. With a comprehensive study on the effect of the three aspects, we further propose to jointly consider the heterogeneous social sensory data. Experimental results obtained from real-world data demonstrate that the three factors utilized complement each other in obtaining promising results in the prediction of image popularity on social photo-sharing sites.
PMID: 28335498 [PubMed - in process]
BioFed: federated query processing over life sciences linked open data.
J Biomed Semantics. 2017 Mar 15;8(1):13
Authors: Hasnain A, Mehmood Q, Sana E Zainab S, Saleem M, Warren C, Zehra D, Decker S, Rebholz-Schuhmann D
Abstract
BACKGROUND: Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain.
METHODS: The efficient cataloguing approach of the federated query processing system 'BioFed', the triple-pattern-wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Finally, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, SIDER).
RESULTS: BioFed provides a single point of access to a large number of SPARQL endpoints offering life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the endpoints' availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection.
CONCLUSION: Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could even be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.
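Triple-pattern-wise source selection, the core technique named in the methods, can be sketched as a catalogue lookup: for each triple pattern of a federated query, only endpoints known to serve the pattern's predicate are contacted. The endpoints, predicates, and the predicate-only index below are simplifying assumptions; BioFed's catalogue also covers types and other metadata.

```python
# Hedged sketch of triple-pattern-wise source selection over a catalogue
# of SPARQL endpoints. All endpoint URLs and predicates are invented.

CATALOGUE = {
    "drugbank:target": {"http://drugbank.example/sparql"},
    "sider:sideEffect": {"http://sider.example/sparql"},
    "rdfs:label": {"http://drugbank.example/sparql",
                   "http://sider.example/sparql"},
}

def select_sources(query_patterns):
    """Map each (s, p, o) triple pattern to candidate endpoints via p."""
    return {
        pattern: CATALOGUE.get(pattern[1], set())
        for pattern in query_patterns
    }

plan = select_sources([
    ("?drug", "drugbank:target", "?protein"),
    ("?drug", "sider:sideEffect", "?effect"),
])
for pattern, sources in plan.items():
    print(pattern[1], "->", sorted(sources))
```

Pruning irrelevant endpoints per pattern is what keeps federated execution times competitive as the number of registered endpoints grows.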
PMID: 28298238 [PubMed - in process]
A Semantic-Based Model for Triage Patients in Emergency Departments.
J Med Syst. 2017 Apr;41(4):65
Authors: Wunsch G, da Costa CA, Righi RR
Abstract
Triage is a process performed in an emergency department that aims to sort patients according to their need for care. When performed speedily and correctly, this process can potentially increase the chances of survival for a patient with serious complications. This study aims to develop a computer model, called UbiTriagem, which supports the process of triage using the concepts of web semantics and ubiquitous computing focused on healthcare. For evaluating the proposal, we performed an analysis of scenario-driven triage based on previously determined ratings. In addition, we conducted a usability evaluation of the developed prototype in an emergency department with two user groups: nurses and patients. The main scientific contribution is the automatic triage assessment based on the gathering of patient data on mobile devices, performed automatically through the use of a reasoning technique in an ontology. The results for all evaluations were very positive. The automatic triage assessment was correct in 93.3% of the cases and, after adjustments in the model, in 100% of the cases. Regarding user satisfaction, we obtained rates of 98.7% and 96% when considering perception of utility and ease of use, respectively.
PMID: 28283999 [PubMed - in process]
Large-scale adverse effects related to treatment evidence standardization (LAERTES): an open scalable system for linking pharmacovigilance evidence sources with clinical data.
J Biomed Semantics. 2017 Mar 07;8(1):11
Authors: Knowledge Base workgroup of the Observational Health Data Sciences and Informatics (OHDSI) collaborative
Abstract
BACKGROUND: Integrating multiple sources of pharmacovigilance evidence has the potential to advance the science of safety signal detection and evaluation. In this regard, there is a need for more research on how to integrate multiple disparate evidence sources while making the evidence computable from a knowledge representation perspective (i.e., semantic enrichment). Existing frameworks suggest promising outcomes for such integration but employ a rather limited number of sources. In particular, none have been specifically designed to support both regulatory and clinical use cases, nor have any been designed to add new resources and use cases through an open architecture. This paper discusses the architecture and functionality of a system called Large-scale Adverse Effects Related to Treatment Evidence Standardization (LAERTES) that aims to address these shortcomings.
RESULTS: LAERTES provides a standardized, open, and scalable architecture for linking evidence sources relevant to the association of drugs with health outcomes of interest (HOIs). Standard terminologies are used to represent different entities. For example, drugs and HOIs are represented in RxNorm and the Systematized Nomenclature of Medicine -- Clinical Terms, respectively. At the time of this writing, six evidence sources have been loaded into the LAERTES evidence base and are accessible through a prototype evidence-exploration user interface and a set of Web application programming interface services. This system operates within a larger software stack provided by the Observational Health Data Sciences and Informatics clinical research framework, including the relational Common Data Model for observational patient data created by the Observational Medical Outcomes Partnership. Elements of the Linked Data paradigm facilitate the systematic and scalable integration of relevant evidence sources.
CONCLUSIONS: The prototype LAERTES system provides useful functionality while creating opportunities for further research. Future work will involve improving the method for normalizing drug and HOI concepts across the integrated sources, aggregating evidence at different levels of a hierarchy of HOI concepts, and developing a more advanced user interface for drug-HOI investigations.
PMID: 28270198 [PubMed - in process]
Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description.
AMIA Annu Symp Proc. 2016;2016:1070-1079
Authors: Sahoo SS, Valdez J, Rueschman M
Abstract
Scientific reproducibility is key to scientific progress as it allows the research community to build on validated results, protect patients from potentially harmful trial drugs derived from incorrect results, and reduce wastage of valuable resources. The National Institutes of Health (NIH) recently published a systematic guideline titled "Rigor and Reproducibility" for supporting reproducible research studies, which has also been accepted by several scientific journals. These journals will require published articles to conform to these new guidelines. Provenance metadata describes the history or origin of data, and it has long been used in computer science to capture metadata information for ensuring data quality and supporting scientific reproducibility. In this paper, we describe the development of the Provenance for Clinical and healthcare Research (ProvCaRe) framework together with a provenance ontology to support scientific reproducibility by formally modeling a core set of data elements representing details of a research study. We extend the PROV Ontology (PROV-O), which has been recommended as the provenance representation model by the World Wide Web Consortium (W3C), to represent both (a) data provenance and (b) process provenance. We use 124 study variables from 6 clinical research studies from the National Sleep Research Resource (NSRR) to evaluate the coverage of the provenance ontology. NSRR is the largest repository of NIH-funded sleep datasets with 50,000 studies from 36,000 participants. The provenance ontology reuses ontology concepts from existing biomedical ontologies, for example the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), to model the provenance information of research studies. The ProvCaRe framework is being developed as part of the Big Data to Knowledge (BD2K) data provenance project.
PMID: 28269904 [PubMed - in process]
GT2RDF: Semantic Representation of Genetic Testing Data.
AMIA Annu Symp Proc. 2016;2016:1060-1069
Authors: Paul Rupa A, Singh S, Zhu Q
Abstract
Accelerated by the Human Genome Project, genetic testing has become an increasingly integral component in diagnosis, treatment, management, and prevention of numerous diseases and conditions. More than 480 laboratories perform genetic tests for more than 4,600 rare and common medical conditions. These tests can effectively help health professionals to determine or predict the genetic conditions of their patients. However, physicians have not actively incorporated such innovative genetic technology into their clinical practices, according to two nationwide surveys commissioned by UnitedHealth Group. To address the insufficient use of this large number of genetic tests, we generated a single Resource Description Framework (RDF) resource, called GT2RDF (Genetic Testing data to RDF), by integrating information about disease, gene, phenotype, genetic test, and drug from multiple sources including the Genetic Testing Registry (GTR), Online Mendelian Inheritance in Man (OMIM), MedGen, the Human Phenotype Ontology (HPO), ClinVar, and the National Drug File Reference Terminology (NDF-RT). Meanwhile, we manually annotated and extracted information from 200 randomly selected GeneReviews chapters and integrated it into GT2RDF. We performed two case studies to demonstrate the usability of GT2RDF. GT2RDF will serve as a data foundation to support the design of a genetic testing recommendation system, called iGenetics, which will ultimately accelerate the pace of precision medicine by actively and effectively incorporating innovative genetic technology in clinical settings. Abbreviations: GT2RDF: Genetic Testing data to RDF; SWT: Semantic web technology; OWL: Ontology Web Language; RDF: Resource Description Framework; SPARQL: SPARQL Protocol and RDF Query Language; GTR: Genetic Testing Registry; OMIM: Online Mendelian Inheritance in Man; HPO: Human Phenotype Ontology; NDF-RT: National Drug File Reference Terminology; UMLS: Unified Medical Language System.
PMID: 28269903 [PubMed - in process]
A platform for exploration into chaining of web services for clinical data transformation and reasoning.
AMIA Annu Symp Proc. 2016;2016:854-863
Authors: Maldonado JA, Marcos M, Fernández-Breis JT, Parcero E, Boscá D, Legaz-García MD, Martínez-Salvador B, Robles M
Abstract
The heterogeneity of clinical data is a key problem in the sharing and reuse of Electronic Health Record (EHR) data. We approach this problem through the combined use of EHR standards and semantic web technologies, concretely by means of clinical data transformation applications that convert EHR data in proprietary format, first into clinical information models based on archetypes, and then into RDF/OWL extracts which can be used for automated reasoning. In this paper we describe a proof-of-concept platform to facilitate the (re)configuration of such clinical data transformation applications. The platform is built upon a number of web services dealing with transformations at different levels (such as normalization or abstraction), and relies on a collection of reusable mappings designed to solve specific transformation steps in a particular clinical domain. The platform has been used in the development of two different data transformation applications in the area of colorectal cancer.
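The chaining idea can be sketched as composing transformation steps at different levels (normalization, abstraction) into a single pipeline. The step functions below are invented placeholders, not the platform's actual web services, and a real deployment would invoke them over HTTP rather than in-process.

```python
# Hedged sketch of chaining clinical data transformation steps.
# The archetype name, fields, and the 140 mmHg rule are illustrative only.

def normalize(record):
    """Proprietary EHR extract -> archetype-shaped structure (toy version)."""
    return {"archetype": "blood_pressure",
            "systolic": record["sys"], "diastolic": record["dia"]}

def abstract(instance):
    """Archetype instance -> clinically abstracted value (toy rule)."""
    instance["category"] = ("hypertensive" if instance["systolic"] >= 140
                            else "normal")
    return instance

def chain(*steps):
    """Compose transformation services into one callable pipeline."""
    def pipeline(data):
        for step in steps:
            data = step(data)
        return data
    return pipeline

transform = chain(normalize, abstract)
print(transform({"sys": 150, "dia": 95})["category"])  # hypertensive
```

Reconfiguring an application then amounts to chaining a different sequence of services and mappings, which is the reusability the platform aims for.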
PMID: 28269882 [PubMed - in process]