Semantic Web
Automated Characterization of Mobile Health Apps' Features by Extracting Information From the Web: An Exploratory Study.
Am J Audiol. 2018 Nov 19;27(3S):482-492
Authors: Paglialonga A, Schiavo M, Caiani EG
Abstract
Purpose: The aim of this study was to test the viability of a novel method for automated characterization of mobile health apps.
Method: In this exploratory study, we developed the basic modules of an automated method, based on text analytics, able to characterize the apps' medical specialties by extracting information from the web. We analyzed apps in the Medical and Health & Fitness categories on the U.S. iTunes store.
Results: We automatically crawled 42,007 Medical and 79,557 Health & Fitness apps' webpages. After removing duplicates and non-English apps, the database included 80,490 apps. We tested the accuracy of the automated method on a subset of 400 apps. We observed 91% accuracy for the identification of apps related to health or medicine, 95% accuracy for sensory systems apps, and an average of 82% accuracy for classification into medical specialties.
Conclusions: These preliminary results suggested the viability of automated characterization of apps based on text analytics and highlighted directions for improvement in terms of classification rules and vocabularies, analysis of semantic types, and extraction of key features (promoters, services, and users). The availability of automated tools for app characterization is important as it may support health care professionals in informed, aware selection of health apps to recommend to their patients.
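The vocabulary-based classification the authors describe can be illustrated with a toy sketch. The vocabularies, app descriptions, and accuracy measure below are invented for illustration only; the paper's actual classification rules and vocabularies are far richer.

```python
# Hypothetical specialty vocabularies for keyword-based app classification.
SPECIALTY_VOCAB = {
    "audiology": {"hearing", "audiogram", "tinnitus"},
    "cardiology": {"heart", "ecg", "blood pressure"},
    "dermatology": {"skin", "mole", "rash"},
}

def classify(description):
    """Return every specialty whose vocabulary overlaps the description."""
    text = description.lower()
    return sorted(s for s, terms in SPECIALTY_VOCAB.items()
                  if any(t in text for t in terms))

def accuracy(predictions, gold):
    """Fraction of apps whose predicted specialty set matches the gold set."""
    hits = sum(p == g for p, g in zip(predictions, gold))
    return hits / len(gold)
```

Against a manually labeled subset (as in the paper's 400-app evaluation), `accuracy` would compare predicted and gold specialty sets per app.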
PMID: 30452752 [PubMed - in process]
Neo4j graph database realizes efficient storage performance of oilfield ontology.
PLoS One. 2018;13(11):e0207595
Authors: Gong F, Ma Y, Gong W, Li X, Li C, Yuan X
Abstract
The integration of oilfield multidisciplinary ontology is increasingly important for the growth of the Semantic Web. However, current methods encounter performance bottlenecks in both data storage and information retrieval when processing large amounts of data. To overcome these challenges, we propose a domain-ontology process based on the Neo4j graph database. In this paper, we focus on data storage and information retrieval for oilfield ontology. We have designed mapping rules from ontology files to the Neo4j database, which can greatly reduce the required storage space. A two-tier index architecture, including object and triad indexing, is used to keep loading times low and to match different patterns for accurate retrieval, and we propose a retrieval method based on this architecture. Based on our evaluation, the proposed approach can save 13.04% of the storage space and improve retrieval efficiency by more than 30 times compared with relational-database methods.
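One intuition behind the storage saving can be sketched in a few lines of pure Python: each distinct resource becomes a single node, so resources repeated across triples are stored once. The URIs and the Cypher noted in the comment are illustrative; the paper's actual mapping rules and index design are specific to their system.

```python
def triples_to_graph(triples):
    """Map RDF-style triples to a property graph: one node per distinct
    resource, one relationship per triple."""
    nodes, edges = {}, []
    for s, p, o in triples:
        for uri in (s, o):
            nodes.setdefault(uri, {"uri": uri})
        edges.append((s, p, o))
    return nodes, edges

triples = [
    ("well:W1", "locatedIn", "field:F1"),
    ("well:W1", "produces", "fluid:Oil"),
    ("well:W2", "locatedIn", "field:F1"),   # field:F1 is reused, stored once
]
nodes, edges = triples_to_graph(triples)
# In Cypher, each edge could be loaded with, e.g.:
#   MERGE (a:Resource {uri: $s})
#   MERGE (b:Resource {uri: $o})
#   MERGE (a)-[:REL {type: $p}]->(b)
```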
PMID: 30444913 [PubMed - in process]
Association of Monoclonal Gammopathy with Progression to ESKD among US Veterans.
Clin J Am Soc Nephrol. 2018 Nov 15;:
Authors: Burwick N, Adams SV, Todd-Stenberg JA, Burrows NR, Pavkov ME, O'Hare AM
Abstract
BACKGROUND AND OBJECTIVES: Whether patients with monoclonal protein are at a higher risk for progression of kidney disease is not known. The goal of this study was to measure the association of monoclonal protein with progression to ESKD.
DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: This was a retrospective cohort study of 2,156,317 patients who underwent serum creatinine testing between October 1, 2000 and September 30, 2001 at a Department of Veterans Affairs medical center, among whom 21,898 had paraprotein testing within 1 year before or after cohort entry. Progression to ESKD was measured using linked data from the US Renal Data System.
RESULTS: Overall, 1,741,707 cohort members had an eGFR≥60 ml/min per 1.73 m2, 283,988 had an eGFR of 45-59 ml/min per 1.73 m2, 103,123 had an eGFR of 30-44 ml/min per 1.73 m2, and 27,499 had an eGFR of 15-29 ml/min per 1.73 m2. The crude incidence of ESKD ranged from 0.7 to 80 per 1000 person-years from the highest to the lowest eGFR category. Patients with low versus preserved eGFR were more likely to be tested for monoclonal protein but no more likely to have a positive test result. In adjusted analyses, a positive versus negative test result was associated with a higher risk of ESKD among patients with an eGFR≥60 ml/min per 1.73 m2 (hazard ratio, 1.67; 95% confidence interval, 1.22 to 2.29) and those with an eGFR of 15-29 ml/min per 1.73 m2 (hazard ratio, 1.38; 95% confidence interval, 1.07 to 1.77), but not among those with an eGFR of 30-59 ml/min per 1.73 m2. Progression to ESKD was attributed to a monoclonal process in 21 out of 76 versus seven out of 174 patients with monoclonal protein and preserved versus severely reduced eGFR at cohort entry.
CONCLUSIONS: The detection of monoclonal protein provides little information on ESKD risk for most patients with a low eGFR. Further study is required to better understand factors contributing to a positive association of monoclonal protein with ESKD risk in patients with preserved and severely reduced levels of eGFR.
PMID: 30442867 [PubMed - as supplied by publisher]
A Genetic Circuit Compiler: Generating Combinatorial Genetic Circuits with Web Semantics and Inference.
ACS Synth Biol. 2018 Nov 08;:
Authors: Waites W, Misirli G, Cavaliere M, Danos V, Wipat A
Abstract
A central strategy of synthetic biology is to understand the basic processes of living creatures through engineering organisms using the same building blocks. Biological machines described in terms of parts can be studied by computer simulation in any of several languages or robotically assembled in vitro. In this paper we present a language, the Genetic Circuit Description Language (GCDL) and a compiler, the Genetic Circuit Compiler (GCC). This language describes genetic circuits at a level of granularity appropriate both for automated assembly in the laboratory and deriving simulation code. The GCDL follows Semantic Web practice and the compiler makes novel use of the logical inference facilities that are therefore available. We present the GCDL and compiler structure as a study of a tool for generating κ-language simulations from semantic descriptions of genetic circuits.
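Combinatorial generation of circuits from interchangeable parts, one aspect of what such a compiler automates, reduces to a Cartesian product over part slots. A minimal sketch with invented part names (GCDL itself works over Semantic Web descriptions of parts, not plain strings):

```python
from itertools import product

# Interchangeable parts per functional slot (names are hypothetical).
slots = {
    "promoter":   ["pTet", "pLac"],
    "rbs":        ["B0034"],
    "cds":        ["GFP", "RFP"],
    "terminator": ["T1"],
}

def circuits(slots):
    """Enumerate every assembly of one part per slot."""
    names, options = zip(*slots.items())
    return [dict(zip(names, combo)) for combo in product(*options)]
```

Each resulting dictionary describes one candidate circuit, which could then be handed to assembly or simulation back ends.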
PMID: 30408409 [PubMed - as supplied by publisher]
SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes.
BMC Bioinformatics. 2018 Nov 06;19(1):405
Authors: Tchechmedjiev A, Abdaoui A, Emonet V, Zevio S, Jonquet C
Abstract
BACKGROUND: Despite the wide adoption of English in science, a significant amount of biomedical data is produced in other languages, such as French. Yet most natural language processing and semantic tools, as well as domain terminologies and ontologies, are only available in English and cannot be readily applied to other languages due to fundamental linguistic differences. However, semantic resources are required to design semantic indexes and transform biomedical (text) data into knowledge for better information mining and retrieval.
RESULTS: We present the SIFR Annotator ( http://bioportal.lirmm.fr/annotator ), a publicly accessible ontology-based annotation web service for processing French biomedical text data. The service, developed during the Semantic Indexing of French Biomedical Data Resources (2013-2019) project, is included in the SIFR BioPortal, an open platform hosting French biomedical ontologies and terminologies based on the technology developed by the US National Center for Biomedical Ontology. The portal facilitates the use and fostering of ontologies by offering a set of services (search, mappings, metadata, versioning, visualization, recommendation), including for annotation purposes. We introduce the adaptations and improvements made in applying the technology to French, as well as a number of language-independent additional features (implemented by means of a proxy architecture), in particular annotation scoring and clinical context detection. We evaluate the performance of the SIFR Annotator on different biomedical data, using the available French corpora Quaero (titles from French MEDLINE abstracts and EMEA drug labels) and CépiDC (ICD-10 coding of death certificates), and discuss our results with respect to the CLEF eHealth information extraction tasks.
CONCLUSIONS: We show that the web service performs comparably to other knowledge-based annotation approaches in recognizing entities in biomedical text and reaches state-of-the-art levels in clinical context detection (negation, experiencer, temporality). Additionally, the SIFR Annotator is the first openly web-accessible tool to annotate and contextualize French biomedical text with ontology concepts, leveraging a dictionary currently made of 28 terminologies and ontologies and 333 K concepts. The code is openly available, and we also provide a Docker packaging for easy local deployment to process sensitive (e.g., clinical) data in-house ( https://github.com/sifrproject ).
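The core of a dictionary-based annotator of this kind is label lookup with character offsets, preferring the longest match. A minimal sketch (labels and concept identifiers below are hypothetical, and the real service does much more, e.g. scoring and clinical context detection):

```python
import re

DICTIONARY = {  # label -> concept identifier (made-up IDs)
    "insuffisance cardiaque": "MSHFRE:D006333",
    "cardiaque": "MSHFRE:D006321",
}

def annotate(text):
    """Return (start, end, concept_id) spans, longest labels first so that
    shorter labels nested inside an existing match are skipped."""
    found = []
    lowered = text.lower()
    for label in sorted(DICTIONARY, key=len, reverse=True):
        for m in re.finditer(re.escape(label), lowered):
            overlaps = any(s < m.end() and m.start() < e for s, e, _ in found)
            if not overlaps:
                found.append((m.start(), m.end(), DICTIONARY[label]))
    return sorted(found)
```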
PMID: 30400805 [PubMed - in process]
Analysis of Errors in Dictated Clinical Documents Assisted by Speech Recognition Software and Professional Transcriptionists.
JAMA Netw Open. 2018 Jul;1(3):
Authors: Zhou L, Blackley SV, Kowalski L, Doan R, Acker WW, Landman AB, Kontrient E, Mack D, Meteer M, Bates DW, Goss FR
Abstract
IMPORTANCE: Accurate clinical documentation is critical to health care quality and safety. Dictation services supported by speech recognition (SR) technology and professional medical transcriptionists are widely used by US clinicians. However, the quality of SR-assisted documentation has not been thoroughly studied.
OBJECTIVE: To identify and analyze errors at each stage of the SR-assisted dictation process.
DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study collected a stratified random sample of 217 notes (83 office notes, 75 discharge summaries, and 59 operative notes) dictated by 144 physicians between January 1 and December 31, 2016, at 2 health care organizations using Dragon Medical 360 | eScription (Nuance). Errors were annotated in the SR engine-generated document (SR), the medical transcriptionist-edited document (MT), and the physician's signed note (SN). Each document was compared with a criterion standard created from the original audio recordings and medical record review.
MAIN OUTCOMES AND MEASURES: Error rate; mean errors per document; error frequency by general type (eg, deletion), semantic type (eg, medication), and clinical significance; and variations by physician characteristics, note type, and institution.
RESULTS: Among the 217 notes, there were 144 unique dictating physicians: 44 female (30.6%) and 10 unknown sex (6.9%). Mean (SD) physician age was 52 (12.5) years (median [range] age, 54 [28-80] years). Among 121 physicians for whom specialty information was available (84.0%), 35 specialties were represented, including 45 surgeons (37.2%), 30 internists (24.8%), and 46 others (38.0%). The error rate in SR notes was 7.4% (ie, 7.4 errors per 100 words). It decreased to 0.4% after transcriptionist review and 0.3% in SNs. Overall, 96.3% of SR notes, 58.1% of MT notes, and 42.4% of SNs contained errors. Deletions were most common (34.7%), then insertions (27.0%). Among errors at the SR, MT, and SN stages, 15.8%, 26.9%, and 25.9%, respectively, involved clinical information, and 5.7%, 8.9%, and 6.4%, respectively, were clinically significant. Discharge summaries had higher mean SR error rates than other types (8.9% vs 6.6%; difference, 2.3%; 95% CI, 1.0%-3.6%; P < .001). Surgeons' SR notes had lower mean error rates than other physicians' (6.0% vs 8.1%; difference, 2.2%; 95% CI, 0.8%-3.5%; P = .002). One institution had a higher mean SR error rate (7.6% vs 6.6%; difference, 1.0%; 95% CI, -0.2% to 2.8%; P = .10) but lower mean MT and SN error rates (0.3% vs 0.7%; difference, -0.3%; 95% CI, -0.63% to -0.04%; P = .03 and 0.2% vs 0.6%; difference, -0.4%; 95% CI, -0.7% to -0.2%; P = .003).
CONCLUSIONS AND RELEVANCE: Seven in 100 words in SR-generated documents contain errors; many errors involve clinical information. That most errors are corrected before notes are signed demonstrates the importance of manual review, quality assurance, and auditing.
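An errors-per-100-words rate against a criterion standard can be sketched as a word-level edit distance, counting insertions, deletions, and substitutions alike. This is an illustrative simplification: the study annotated errors manually rather than by automatic alignment.

```python
def word_errors(hyp, ref):
    """Word-level Levenshtein distance between a document and its reference."""
    h, r = hyp.split(), ref.split()
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(r) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1   # substitution
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[len(h)][len(r)]

def error_rate(hyp, ref):
    """Errors per 100 reference words."""
    return 100.0 * word_errors(hyp, ref) / len(ref.split())
```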
PMID: 30370424 [PubMed]
The Semantic Student: Using Knowledge Modeling Activities to Enhance Enquiry-Based Group Learning in Engineering Education.
Stud Health Technol Inform. 2018;256:431-443
Authors: Stacey P
Abstract
This paper argues that training engineering students in basic knowledge modeling techniques, using linked data principles and semantic Web tools within an enquiry-based group learning environment, enables them to enhance both their domain knowledge and their metacognitive skills. Knowledge modeling skills are in keeping with the principles of Universal Design for Instruction. Learners are empowered in the regulation of their cognition as they become more aware of their own development. This semantic student approach was trialed with a group of third-year computer engineering students taking a module on computer architecture. An enquiry-based group learning activity was developed to help learners meet selected module learning outcomes. Learners were required to use semantic feature analysis and linked data principles to create a visual model of their knowledge structure. Results show that overall student attainment increased when knowledge modeling activities were included as part of the learning process. A recommendation for practice, incorporating knowledge modeling as a learning strategy within an overall engineering curriculum framework, is described. This can be achieved using semantic Web technologies such as semantic wikis and linked data tools.
PMID: 30371401 [PubMed - in process]
An online tool for measuring and visualizing phenotype similarities using HPO.
BMC Genomics. 2018 Aug 13;19(Suppl 6):571
Authors: Peng J, Xue H, Hui W, Lu J, Chen B, Jiang Q, Shang X, Wang Y
Abstract
BACKGROUND: The Human Phenotype Ontology (HPO) is one of the most popular bioinformatics resources. Recently, HPO-based phenotype semantic similarity has been effectively applied to model patient phenotype data. However, existing tools are adapted from Gene Ontology (GO)-based term similarity measures, and their designs are not optimized for the unique features of HPO. In addition, existing tools only accept HPO terms as input and only provide plain-text outputs.
RESULTS: We present PhenoSimWeb, a web application that allows researchers to measure HPO-based phenotype semantic similarities using four approaches borrowed from GO-based similarity measurements. In addition, we provide an approach that takes the unique properties of HPO into account. PhenoSimWeb also accepts text describing phenotypes as input, since clinical phenotype data are usually recorded as free text, and it provides a graphical visualization interface for the resulting phenotype network.
CONCLUSIONS: PhenoSimWeb is an easy-to-use and functional online application. Researchers can use it to calculate phenotype similarity conveniently, predict phenotype associated genes or diseases, and visualize the network of phenotype interactions. PhenoSimWeb is available at http://120.77.47.2:8080.
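One of the GO-borrowed measures, Resnik's, scores two terms by the information content (IC) of their most informative common ancestor. A toy sketch on an HPO-like DAG (term IDs and IC values below are invented):

```python
PARENTS = {                      # child -> parents in a toy HPO-like DAG
    "HP:B": {"HP:A"}, "HP:C": {"HP:A"},
    "HP:D": {"HP:B"}, "HP:E": {"HP:B", "HP:C"},
}
IC = {"HP:A": 0.0, "HP:B": 1.0, "HP:C": 1.2, "HP:D": 2.5, "HP:E": 2.8}

def ancestors(term):
    """The term itself plus all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for p in PARENTS.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def resnik(t1, t2):
    """IC of the most informative common ancestor (0.0 if none)."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC[t] for t in common) if common else 0.0
```

In practice the IC values are estimated from annotation frequencies; the fixed table above just keeps the sketch self-contained.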
PMID: 30367579 [PubMed - in process]
An Ontology-Driven Approach for Integrating Intelligence to Manage Human and Ecological Health Risks in the Geospatial Sensor Web.
Sensors (Basel). 2018 Oct 25;18(11):
Authors: Meng X, Wang F, Xie Y, Song G, Ma S, Hu S, Bai J, Yang Y
Abstract
Due to the rapid installation of massive numbers of fixed and mobile sensors, monitoring machines are intentionally or unintentionally involved in the production of large amounts of geospatial data. Environmental sensors and related software applications are rapidly altering human lifestyles and even impacting ecological and human health. However, there are few geospatial sensor web (GSW) applications targeted at specific ecological public health questions. In this paper, we propose an ontology-driven approach for integrating intelligence to manage human and ecological health risks in the GSW. We design a Human and Ecological health Risks Ontology (HERO) based on a semantic sensor network ontology template. We also illustrate a web-based prototype, the Human and Ecological Health Risk Management System (HaEHMS), which helps health experts and decision makers estimate human and ecological health risks. We demonstrate this intelligent system through a case study of automatic prediction of air quality and related health risk.
PMID: 30366399 [PubMed - in process]
Automated ontology generation framework powered by linked biomedical ontologies for disease-drug domain.
Comput Methods Programs Biomed. 2018 Oct;165:117-128
Authors: Alobaidi M, Malik KM, Hussain M
Abstract
OBJECTIVE AND BACKGROUND: The exponential growth of unstructured data in the biomedical literature and Electronic Health Records (EHRs) requires powerful novel technologies and architectures to unlock the information hidden in that data. The success of smart healthcare applications such as clinical decision support systems, disease diagnosis systems, and healthcare management systems depends on knowledge that machines can interpret and use to infer new knowledge. In this regard, ontological data models are expected to play a vital role in organizing, integrating, and making informative inferences from the knowledge implicit in unstructured data, and in representing the resultant knowledge in a form that machines can understand. However, constructing such models is challenging because it demands intensive labor, domain experts, and ontology engineers, which limits the scale or scope of ontological data models. We present a framework that mitigates the time needed to build ontologies and achieves machine interoperability.
METHODS: Empowered by linked biomedical ontologies, our proposed Automated Ontology Generation Framework consists of five major modules: (a) text processing using a compute-on-demand approach; (b) medical semantic annotation using N-gram, ontology linking, and classification algorithms; (c) relation extraction using graph methods and syntactic patterns; (d) semantic enrichment using RDF mining; and (e) a domain inference engine to build the formal ontology.
RESULTS: Quantitative evaluations show 84.78% recall, 53.35% precision, and 67.70% F-measure for disease-drug concept identification; 85.51% recall, 69.61% precision, and 76.74% F-measure for taxonomic relation extraction; and 77.20% recall, 40.10% precision, and 52.78% F-measure for biomedical non-taxonomic relation extraction.
CONCLUSION: We present an automated ontology generation framework that is empowered by Linked Biomedical Ontologies. This framework integrates various natural language processing, semantic enrichment, syntactic pattern, and graph algorithm based techniques. Moreover, it shows that using Linked Biomedical Ontologies enables a promising solution to the problem of automating the process of disease-drug ontology generation.
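The reported F-measures are the harmonic mean of the corresponding precision and recall, which is easy to check directly:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# e.g. the taxonomic relation extraction figures:
f_tax = f_measure(0.6961, 0.8551)   # close to the reported 76.74%
```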
PMID: 30337066 [PubMed - in process]
The Future of Computational Chemogenomics.
Methods Mol Biol. 2018;1825:425-450
Authors: Jacoby E, Brown JB
Abstract
Following the elucidation of the human genome, chemogenomics emerged at the beginning of the twenty-first century as an interdisciplinary research field that aims to accelerate target and drug discovery by making the best use of genomic data and the data linkable to it. What started as a systematization approach within protein target families now encompasses all types of chemical compounds and gene products. A key objective of chemogenomics is the establishment, extension, analysis, and prediction of a comprehensive SAR matrix whose application will enable further systematization in drug discovery. Herein we outline future perspectives of chemogenomics, including the extension to new molecular modalities, the potential extension beyond pharma to the agro and nutrition sectors, and the importance for environmental protection. The focus is on computational sciences with potential applications in compound library design, virtual screening, hit assessment, analysis of phenotypic screens, lead finding and optimization, and systems biology-based prediction of toxicology and translational research.
PMID: 30334216 [PubMed - in process]
A Rule-Based Reasoner for Underwater Robots Using OWL and SWRL.
Sensors (Basel). 2018 Oct 16;18(10):
Authors: Zhai Z, Martínez Ortega JF, Lucas Martínez N, Castillejo P
Abstract
Web Ontology Language (OWL) is designed to represent varied knowledge about things and the relationships between things. It is widely used to express complex models and address information heterogeneity in specific domains, such as underwater environments and robots. With the help of OWL, heterogeneous underwater robots are able to cooperate with each other by exchanging information with the same meaning, and robot operators can organize coordination more easily. However, OWL has expressivity limitations in representing general rules, especially statements of the form "If … Then … Else …". Fortunately, the Semantic Web Rule Language (SWRL) has strong rule representation capabilities. In this paper, we propose a rule-based reasoner providing inference and query services based on OWL and SWRL. SWRL rules are inserted directly into the ontologies through several steps of model transformation instead of using a specific editor. In the verification experiments, the SWRL rules were successfully and efficiently inserted into the OWL-based ontologies, and the queries returned completely correct results. This rule-based reasoner is a promising approach to increasing the inference capability of ontology-based models, and it contributes significantly to semantic query answering.
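The kind of "If … Then …" rule that SWRL adds can be emulated by forward chaining over triples: apply each rule, add any new facts, and repeat until nothing changes. A minimal sketch with a hypothetical underwater-domain rule (not the paper's actual ontology or rules):

```python
def infer(facts, rules):
    """Apply rules until no new triples are produced (fixed point)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for new in list(rule(facts)):   # materialize before mutating
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

def current_rule(facts):
    # locatedIn(r, z) AND hasCurrent(z, c)  ->  exposedTo(r, c)
    for s1, p1, o1 in facts:
        if p1 != "locatedIn":
            continue
        for s2, p2, o2 in facts:
            if p2 == "hasCurrent" and s2 == o1:
                yield (s1, "exposedTo", o2)

facts = {("robot1", "locatedIn", "zoneA"), ("zoneA", "hasCurrent", "strong")}
inferred = infer(facts, [current_rule])
```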
PMID: 30332798 [PubMed]
Thalia: Semantic search engine for biomedical abstracts.
Bioinformatics. 2018 Oct 17;:
Authors: Soto AJ, Przybyla P, Ananiadou S
Abstract
Summary: While the publication rate of the biomedical literature has been growing steadily during the last decades, the accessibility of pertinent research publications for biologists and medical practitioners remains a challenge. This paper describes Thalia, a semantic search engine that can recognize eight different types of concepts occurring in biomedical abstracts. Thalia is available via a web-based interface or a RESTful API. A key aspect of our search engine is that it is updated from PubMed on a daily basis. We describe the main building blocks of our tool as well as an evaluation of Thalia's retrieval capabilities in the context of a precision medicine dataset.
Availability: Thalia is available at http://nactem.ac.uk/Thalia_BI/.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMID: 30329013 [PubMed - as supplied by publisher]
Web-Based Information Infrastructure Increases the Interrater Reliability of Medical Coders: Quasi-Experimental Study.
J Med Internet Res. 2018 Oct 15;20(10):e274
Authors: Varghese J, Sandmann S, Dugas M
Abstract
BACKGROUND: Medical coding is essential for standardized communication and integration of clinical data. The Unified Medical Language System by the National Library of Medicine is the largest clinical terminology system for medical coders and Natural Language Processing tools. However, the abundance of ambiguous codes leads to low rates of uniform coding among different coders.
OBJECTIVE: The objective of our study was to measure uniform coding among different medical experts in terms of interrater reliability and analyze the effect on interrater reliability using an expert- and Web-based code suggestion system.
METHODS: We conducted a quasi-experimental study in which 6 medical experts coded 602 medical items from structured quality assurance forms or free-text eligibility criteria of 20 different clinical trials. The medical item content was selected on the basis of mortality-leading diseases according to World Health Organization data. The intervention comprised using a semiautomatic code suggestion tool that is linked to a European information infrastructure providing a large medical text corpus of >300,000 medical form items with expert-assigned semantic codes. Krippendorff alpha (Kalpha) with bootstrap analysis was used for the interrater reliability analysis, and coding times were measured before and after the intervention.
RESULTS: The intervention improved interrater reliability in structured quality assurance form items (from Kalpha=0.50, 95% CI 0.43-0.57 to Kalpha=0.62, 95% CI 0.55-0.69) and free-text eligibility criteria (from Kalpha=0.19, 95% CI 0.14-0.24 to Kalpha=0.43, 95% CI 0.37-0.50) while preserving or slightly reducing the mean coding time per item for all 6 coders. Regardless of the intervention, precoordinated and structured items were associated with significantly higher interrater reliability, and the proportion of items that were precoordinated significantly increased after the intervention (eligibility criteria: OR 4.92, 95% CI 2.78-8.72; quality assurance: OR 1.96, 95% CI 1.19-3.25).
CONCLUSIONS: The Web-based code suggestion mechanism improved interrater reliability toward moderate or even substantial intercoder agreement. Precoordination and the use of structured versus free-text data elements are key drivers of higher interrater reliability.
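Krippendorff alpha for nominal data, the reliability measure used here, can be computed from a coincidence matrix: observed disagreement over expected disagreement, with each item's rater pairs weighted by 1/(m-1). A compact illustrative implementation (not the study's code, which also used bootstrap confidence intervals):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """units: one list of category labels per coded item (one label per
    rater who coded it); items with fewer than two ratings are dropped."""
    units = [u for u in units if len(u) >= 2]
    o = Counter()                              # coincidence matrix
    for u in units:
        for a, b in permutations(u, 2):        # every ordered rater pair
            o[(a, b)] += 1 / (len(u) - 1)
    n_c = Counter()                            # category marginals
    for (a, _), w in o.items():
        n_c[a] += w
    n = sum(n_c.values())
    d_o = sum(w for (a, b), w in o.items() if a != b) / n
    d_e = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b) / (n * (n - 1))
    return 1 - d_o / d_e
```

Perfect agreement yields 1.0, and systematic disagreement yields negative values, matching the interpretation of the Kalpha ranges reported above.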
PMID: 30322834 [PubMed - in process]
Predictors of long-term care among nonagenarians: the Vitality 90 + Study with linked data of the care registers.
Aging Clin Exp Res. 2018 Aug;30(8):913-919
Authors: Kauppi M, Raitanen J, Stenholm S, Aaltonen M, Enroth L, Jylhä M
Abstract
BACKGROUND: The need for long-term care services increases with age. However, little is known about the predictors of long-term care (LTC) entry among the oldest old.
AIMS: The aim of this study was to assess the predictors of LTC entry in a sample of men and women aged 90 years and older.
METHODS: This study was based on the Vitality 90 + Study, a population-based study of nonagenarians in the city of Tampere, Finland. Baseline information about health, functioning, and living conditions was collected by mailed questionnaires. Information about LTC was drawn from care registers during a follow-up period extending up to 11 years. Cox regression models were used for the analyses, taking into account the competing risk of mortality.
RESULTS: During the mean follow-up of 2.3 years, 844 subjects (43%) entered LTC for the first time. Female gender (HR 1.39, 95% CI 1.14-1.69), having at least two chronic conditions (HR 1.24, 95% CI 1.07-1.44), living alone (HR 1.37, 95% CI 1.15-1.63), and receiving help sometimes (HR 1.23, 95% CI 1.02-1.49) or daily (HR 1.68, 95% CI 1.38-2.04) were independent predictors of LTC entry.
CONCLUSION: The risk of entering LTC was increased among women, subjects with at least two chronic conditions, those living alone, and those receiving higher levels of help. Since the number of nonagenarians, and thereby the need for care, will increase, it is essential to understand the predictors of LTC entry in order to offer appropriate care for the oldest old in the future.
PMID: 29222731 [PubMed - indexed for MEDLINE]
GOGO: An improved algorithm to measure the semantic similarity between gene ontology terms.
Sci Rep. 2018 Oct 10;8(1):15107
Authors: Zhao C, Wang Z
Abstract
Measuring the semantic similarity between Gene Ontology (GO) terms is an essential step in functional bioinformatics research. We implemented a software tool named GOGO for calculating the semantic similarity between GO terms. GOGO has the advantages of both information-content-based and hybrid methods, such as Resnik's and Wang's methods. Moreover, GOGO is relatively fast and does not need to calculate information content (IC) from a large gene annotation corpus, but it still retains the advantage of using IC. This is achieved by considering the number of child nodes in the GO directed acyclic graph when calculating the semantic contribution that an ancestor node gives to its descendant nodes. GOGO can calculate functional similarities between genes and then cluster genes based on their functional similarities. Evaluations performed on multiple pathways retrieved from the Saccharomyces Genome Database (SGD) show that GOGO can accurately and robustly cluster genes based on functional similarities. We release GOGO as a web server and also as a stand-alone tool, which allows convenient execution for a small number of GO terms or integration into bioinformatics pipelines for large-scale calculations. GOGO can be freely accessed or downloaded from http://dna.cs.miami.edu/GOGO/ .
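Wang-style similarity, one of the hybrid methods GOGO builds on, propagates an edge-weighted score from each term to its ancestors and compares the two terms' ancestor score sets. A toy sketch with a fixed edge weight of 0.8 (GOGO's refinement instead derives the weight from each node's number of children; the DAG below is invented):

```python
PARENTS = {"GO:B": {"GO:A"}, "GO:C": {"GO:A"}, "GO:D": {"GO:B", "GO:C"}}
W = 0.8  # fixed "is_a" edge weight for this sketch

def s_values(term):
    """Semantic contribution of the term (1.0) and of each ancestor,
    decaying by W along each edge; the maximum over paths is kept."""
    s = {term: 1.0}
    frontier = [term]
    while frontier:
        t = frontier.pop()
        for p in PARENTS.get(t, ()):
            v = W * s[t]
            if v > s.get(p, 0.0):
                s[p] = v
                frontier.append(p)
    return s

def wang_sim(a, b):
    """Shared contribution over total contribution of both terms."""
    sa, sb = s_values(a), s_values(b)
    common = sa.keys() & sb.keys()
    return sum(sa[t] + sb[t] for t in common) / (sum(sa.values()) + sum(sb.values()))
```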
PMID: 30305653 [PubMed - in process]
A survey of ontology learning techniques and applications.
Database (Oxford). 2018 Jan 01;2018:
Authors: Asim MN, Wasim M, Khan MUG, Mahmood W, Abbasi HM
Abstract
Ontologies have gained considerable popularity and recognition in the semantic web because of their extensive use in Internet-based applications. Ontologies are often considered a fine source of semantics and interoperability in all artificially intelligent systems. The exponential increase in unstructured data on the web has made the automated acquisition of ontologies from unstructured text one of the most prominent research areas. Several methodologies exploiting techniques from various fields (machine learning, text mining, knowledge representation and reasoning, information retrieval, and natural language processing) have been proposed to bring some level of automation to the process of ontology acquisition from unstructured text. This paper describes the process of ontology learning, classifies ontology learning techniques into three classes (linguistic, statistical, and logical), and discusses many algorithms under each category. It also explores ontology evaluation techniques by highlighting their pros and cons, describes the scope and use of ontology learning in several industries, and finally discusses the challenges of ontology learning along with corresponding future directions.
PMID: 30295720 [PubMed - in process]
Using Semantic Web Technologies to Enable Cancer Genomics Discovery at Petabyte Scale.
Cancer Inform. 2018;17:1176935118774787
Authors: Cejovic J, Radenkovic J, Mladenovic V, Stanojevic A, Miletic M, Radanovic S, Bajcic D, Djordjevic D, Jelic F, Nesic M, Lau J, Grady P, Groves-Kirkby N, Kural D, Davis-Dusenbery B
Abstract
Increased efforts in cancer genomics research and bioinformatics are producing tremendous amounts of data. These data are diverse in origin, format, and content. As the amount of available sequencing data increases, technologies that make the data discoverable and usable are critically needed. In response, we have developed a Semantic Web-based Data Browser, a tool that allows users to visually build and execute ontology-driven queries. This approach simplifies access to available data and improves the process of using them in analyses on the Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org). The Data Browser makes large data sets easily explorable and simplifies the retrieval of specific data of interest. Although initially implemented on top of The Cancer Genome Atlas (TCGA) data set, the Data Browser's architecture allows for seamless integration of other data sets. By deploying it on the CGC, we have enabled remote researchers to access data and perform collaborative investigations.
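An ontology-driven query of the kind the Data Browser builds can be sketched as conjunctive constraints over metadata triples. The triples, property names, and query shape below are toy assumptions; the real CGC tool composes SPARQL-like queries over TCGA metadata, not this in-memory store.

```python
# Toy triple store of dataset metadata plus a conjunctive query over it,
# in the spirit of visually built ontology-driven queries. All property
# names and file identifiers are illustrative.

triples = [
    ("file1", "hasDisease", "Lung Adenocarcinoma"),
    ("file1", "hasExperimentalStrategy", "WXS"),
    ("file2", "hasDisease", "Lung Adenocarcinoma"),
    ("file2", "hasExperimentalStrategy", "RNA-Seq"),
    ("file3", "hasDisease", "Breast Invasive Carcinoma"),
    ("file3", "hasExperimentalStrategy", "WXS"),
]

def query(store, constraints):
    """Return subjects matching every (property, value) constraint."""
    matches = None
    for prop, value in constraints:
        subjects = {s for s, p, o in store if p == prop and o == value}
        # Intersect: a file must satisfy all constraints at once.
        matches = subjects if matches is None else matches & subjects
    return sorted(matches or [])

# "Find whole-exome files for lung adenocarcinoma cases."
print(query(triples, [("hasDisease", "Lung Adenocarcinoma"),
                      ("hasExperimentalStrategy", "WXS")]))
```

Each visual filter in such a browser corresponds to one (property, value) constraint; the ontology supplies the vocabulary of valid properties and values.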
PMID: 30283230 [PubMed]
Cognitive Approaches for Medicine in Cloud Computing.
J Med Syst. 2018 Mar 03;42(4):70
Authors: Ogiela U, Takizawa M, Ogiela L
Abstract
This paper presents the application potential of the cognitive approach to data interpretation, with special reference to medical areas. We propose the use of a semantic (meaning-based) approach to data description and analysis for data analysis tasks in Cloud Computing. The methods of cognitive data management in Cloud Computing aim to support the processes of protecting data against unauthorized takeover, and they serve to enhance data management processes. The outcome of the proposed tasks is the definition of algorithms for executing semantic data interpretation processes in secure Cloud Computing.
HIGHLIGHTS: • We propose cognitive methods for data description. • We propose techniques for securing data in Cloud Computing. • The application of cognitive approaches to medicine is described.
PMID: 29502320 [PubMed - indexed for MEDLINE]
Supporting biomedical ontology evolution by identifying outdated concepts and the required type of change.
J Biomed Inform. 2018 Sep 08;:
Authors: Cardoso SD, Pruski C, Silveira MD
Abstract
The consistent evolution of ontologies is a major challenge for systems using semantically enriched data, for example, for annotating, indexing, or reasoning. The biomedical domain is a typical example where ontologies, expressed with different formalisms, have been used for a long time and whose dynamic nature requires the regular revision of underlying systems. However, the automatic identification of outdated concepts and the proposal of revision actions to update them are still open research questions. Solutions to these problems are of great interest to organizations that manage huge and dynamic ontologies. In this paper, we present an approach for i) identifying the concepts of an ontology that require revision and ii) suggesting the type of revision. Our analysis is based on three aspects: structural information encoded in the ontology, relational information gained from external sources of knowledge (i.e., PubMed and UMLS), and temporal information derived from the history of the ontology. Our approach aims to evaluate different methods and parameters used by supervised learning classifiers to identify both the set of concepts that need revision and the type of revision. We applied our approach to four well-known biomedical ontologies/terminologies (ICD-9-CM, MeSH, NCIt and SNOMED CT) and compared our results to similar approaches. Our model shows accuracy ranging from 68% (for SNOMED CT) to 91% (for MeSH), and an average of 71% when considering all datasets together.
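The classification setup described above can be sketched as feature vectors, one per concept, combining structural, relational, and temporal signals, fed to a supervised classifier. The three features, the labels, and the 1-nearest-neighbour classifier below are illustrative assumptions; the paper evaluates several learning methods and richer features, not this toy.

```python
# Toy "does this concept need revision?" classifier. Each concept is a
# feature vector (structural, relational, temporal); labels and features
# are invented for illustration.
import math

# (children_count, pubmed_hits, versions_since_last_change) -> needs_revision
training = [
    ((0, 2, 6), True),    # isolated, rarely cited, long unchanged
    ((1, 5, 5), True),
    ((8, 120, 1), False), # well-connected, heavily cited, recently revised
    ((6, 90, 0), False),
]

def predict(features, data=training):
    """1-NN: return the label of the closest training concept."""
    _, label = min(((math.dist(features, f), lab) for f, lab in data),
                   key=lambda t: t[0])
    return label

print(predict((0, 3, 7)))   # resembles the outdated concepts -> True
print(predict((7, 100, 1))) # resembles the up-to-date concepts -> False
```

In the paper's setting, a second classifier of the same shape would then predict the *type* of revision (e.g., deprecate, merge, refine) for the concepts flagged here.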
PMID: 30205172 [PubMed - as supplied by publisher]