Semantic Web

RDFizing the biosynthetic pathway of E.coli O-antigen to enable semantic sharing of microbiology data

Tue, 2021-11-23 06:00

BMC Microbiol. 2021 Nov 22;21(1):325. doi: 10.1186/s12866-021-02384-y.

ABSTRACT

BACKGROUND: The abundance of accumulated glycomics data has led to the development of many useful databases that help explain the functions of glycans and their impact on cellular activity. At the same time, efforts to share data between glycomics databases and other biological databases have contributed to the creation of new knowledgebases. However, differences in how data are described have impeded data sharing for knowledge integration. To address this, various groups have adopted Semantic Web techniques, including the Resource Description Framework (RDF) and ontology development, to standardize the format for data exchange. These semantic data have contributed to the expansion of knowledgebases and hold the promise of providing data that can be processed intelligently. On the other hand, bench biologists, who are experts in experimental findings, are both end users and data producers. It is therefore essential to lower the technical barrier bench biologists face in making their experimental data compatible with standard formats for data sharing.

RESULTS: Although many essential concepts and practical techniques for data integration exist, there has been no method that lets researchers easily apply Semantic Web techniques to their own experimental data. We applied our procedure to unformatted information on E. coli O-antigen structures collected from the web and show how this information can be expressed as formatted data that comply with Semantic Web standards. In particular, we described the E. coli O-antigen biosynthesis pathway using the BioPAX ontology, which was developed to support data exchange between pathway databases.

CONCLUSIONS: The method we implemented to describe O-antigen biosynthesis semantically should help biologists understand how glycan information, including relevant pathway reaction data, can be easily shared. We hope this method can help lower the technical barrier to formulating experimental findings as formal representations and can lead bench scientists to participate readily in building new knowledgebases that are integrated with existing ones. Such integration over the Semantic Web will allow future work in artificial intelligence and machine learning to let computers infer new relationships and hypotheses in the life sciences.
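
The abstract describes the pathway with the BioPAX ontology but does not show the resulting RDF. As a minimal sketch, assuming BioPAX Level 3 class and property names (BiochemicalReaction, SmallMolecule, Protein, Catalysis, left, right, controller, controlled, displayName) and a hypothetical http://example.org/oantigen/ namespace, the standard-library Python below emits N-Triples for one catalysed reaction. The WecA-catalysed initiation step is an illustrative choice, not data taken from the paper.

```python
# Sketch: one O-antigen pathway step as BioPAX Level 3 triples (N-Triples).
# The example.org namespace and the chosen reaction are illustrative assumptions.

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/oantigen/"  # hypothetical namespace


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Render a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def reaction_triples(rxn_id, name, left, right, enzyme):
    """Return N-Triples lines for a BiochemicalReaction and the Catalysis controlling it."""
    rxn = iri(BASE + rxn_id)
    cat = iri(BASE + rxn_id + "_catalysis")
    enz = iri(BASE + enzyme)
    lines = [
        f"{rxn} {iri(RDF_TYPE)} {iri(BP + 'BiochemicalReaction')} .",
        f"{rxn} {iri(BP + 'displayName')} {literal(name)} .",
    ]
    # Substrates go on the left side of the reaction, products on the right.
    for prop, molecules in (("left", left), ("right", right)):
        for molecule in molecules:
            mol = iri(BASE + molecule)
            lines.append(f"{mol} {iri(RDF_TYPE)} {iri(BP + 'SmallMolecule')} .")
            lines.append(f"{rxn} {iri(BP + prop)} {mol} .")
    # The enzyme is linked to the reaction through a separate Catalysis node.
    lines += [
        f"{enz} {iri(RDF_TYPE)} {iri(BP + 'Protein')} .",
        f"{cat} {iri(RDF_TYPE)} {iri(BP + 'Catalysis')} .",
        f"{cat} {iri(BP + 'controller')} {enz} .",
        f"{cat} {iri(BP + 'controlled')} {rxn} .",
    ]
    return lines


if __name__ == "__main__":
    # Illustrative initiation step: WecA transfers GlcNAc-1-P from UDP-GlcNAc to Und-P.
    triples = reaction_triples(
        "rxn1",
        "GlcNAc-1-P transfer to undecaprenyl phosphate",
        left=["UDP-GlcNAc", "Und-P"],
        right=["Und-PP-GlcNAc", "UMP"],
        enzyme="WecA",
    )
    print("\n".join(triples))
```

A fuller conversion would add cross-references to glycan and protein databases and would typically rely on a dedicated RDF library for serialization; the hand-written N-Triples here only keep the sketch dependency-free.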

PMID:34809564 | DOI:10.1186/s12866-021-02384-y

Categories: Literature Watch

Knowledge Engineering Framework for IoT Robotics Applied to Smart Healthcare and Emotional Well-Being

Mon, 2021-11-22 06:00

Int J Soc Robot. 2021 Nov 16:1-28. doi: 10.1007/s12369-021-00821-6. Online ahead of print.

ABSTRACT

Social companion robots are attracting growing attention as a way to help elderly people stay independent at home and reduce their social isolation. When developing solutions, one remaining challenge is to design applications that elderly people can actually use. For this purpose, the ACCRA (Agile Co-Creation of Robots for Ageing) project designs co-creation methodologies involving multiple stakeholders and a multidisciplinary research team (e.g., elderly people, medical professionals, and computer scientists such as roboticists or IoT engineers). This paper addresses the following research question: How can Internet of Robotic Things (IoRT) technology and co-creation methodologies help to design emotion-based robotic applications? The work is supported by the ACCRA project, which develops advanced social robots, co-created by stakeholders such as ageing people and physicians, to support active and healthy ageing. We demonstrate this with three robots, Buddy, ASTRO, and RoboHon, used for daily life, mobility, and conversation. The three robots understand and convey emotions in real time using Internet of Things and Artificial Intelligence technologies (e.g., knowledge-based reasoning).

PMID:34804257 | PMC:PMC8594653 | DOI:10.1007/s12369-021-00821-6

FAIR data for prehistoric mining archaeology

Mon, 2021-11-22 06:00

Int J Digit Libr. 2021;22(3):267-277. doi: 10.1007/s00799-020-00282-8. Epub 2020 Jan 23.

ABSTRACT

This paper presents an approach to creating FAIR data for prehistoric mining archaeology, based on the CIDOC CRM ontology and Semantic Web standards. The Research Centre HiMAT (History of mining activities in the Tyrol and adjacent areas, University of Innsbruck) investigates mining history from prehistoric to modern times with an interdisciplinary approach. One of the projects carried out at the research centre is the multinational DACH project "Prehistoric copper production in the eastern and central Alps". For a specific geographical region of the project, the transformation of the data into open, re-usable data is investigated in a separate Open Research Data pilot project. The methodological approach applies the FAIR principles to make data Findable, Accessible, Interoperable and Re-usable. Every archaeological investigation in Austria must be documented according to the requirements of the Austrian Federal Monuments Office. This documentation is deposited in ZENODO, the CERN-based, EU-supported research data repository. For each deposited file, metadata are created by applying the conceptual metadata schema CIDOC CRM, an ISO standard for Cultural Heritage Information that was adopted by ARIADNE, the European Union Research Infrastructure for archaeological resources. Concepts specific to mining archaeology research are organized with the DARIAH Back Bone Thesaurus, a model for sustainable, interoperable thesaurus maintenance developed in the European Union Digital Research Infrastructure for the Arts and Humanities. Metadata are created by extracting information from the documentation and transforming it into a knowledge graph using Semantic Web standards. To facilitate usage, graph data are exported to hierarchical and tabular formats representing sites and objects with their geographic locations, temporal and typological assignments, and links to the research activities and documents. Metadata are deposited together with the documentation in the repository.
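
The abstract states that graph data are exported to tabular formats representing sites and objects. A minimal sketch of that flattening step, assuming the graph is available as (subject, property, value) triples: each subject becomes one CSV row and each selected property a column. The property names and identifiers are hypothetical placeholders rather than CIDOC CRM terms, and a real export would also resolve IRIs to human-readable labels.

```python
import csv
import io
from collections import defaultdict


def triples_to_csv(triples, columns):
    """Pivot (subject, property, value) triples into one CSV row per subject.

    Properties not listed in `columns` are ignored; repeated values
    (for example several linked documents) are joined with "|".
    """
    rows = defaultdict(dict)
    for subject, prop, value in triples:
        if prop not in columns:
            continue
        previous = rows[subject].get(prop)
        rows[subject][prop] = value if previous is None else f"{previous}|{value}"

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", *columns])
    for subject in sorted(rows):
        writer.writerow([subject, *(rows[subject].get(col, "") for col in columns)])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical graph fragment; real data would use CIDOC CRM properties.
    graph = [
        ("site:2", "label", "Site B"),
        ("site:1", "label", "Site A"),
        ("site:1", "period", "Bronze Age"),
        ("site:1", "document", "doc:17"),
        ("site:1", "document", "doc:18"),
    ]
    print(triples_to_csv(graph, ["label", "period", "document"]), end="")
```

Pivoting to one row per subject keeps the table easy to open in spreadsheet tools, at the cost of packing multi-valued properties into a single delimited cell; a hierarchical export (for example JSON) avoids that trade-off.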

PMID:34803481 | PMC:PMC8591667 | DOI:10.1007/s00799-020-00282-8

Internet-based language production research with overt articulation: Proof of concept, challenges, and practical advice

Sat, 2021-11-20 06:00

Behav Res Methods. 2021 Nov 19. doi: 10.3758/s13428-021-01686-3. Online ahead of print.

ABSTRACT

Language production experiments with overt articulation have so far rarely been conducted online, mostly because of technical difficulties in measuring voice onset latencies. In particular, the poor audiovisual synchrony in web experiments (Bridges et al. 2020) makes it difficult to time-lock stimuli and participants' spoken responses. We tested the viability of conducting language production experiments with overt articulation online using the picture-word interference paradigm, a classic task in language production research. In three pre-registered experiments (N = 48 each), participants named object pictures while ignoring visually superimposed distractor words. We implemented a custom voice recording option in two different web experiment builders and recorded naming responses as audio files. From these stimulus-locked audio files, we extracted voice onset latencies offline. In a control task, participants classified the last letter of a picture name as a vowel or consonant via button press, a task that shows comparable semantic interference effects. We expected slower responses when picture and distractor word were semantically related than when they were unrelated, regardless of task. This semantic interference effect is robust but relatively small, so detecting it should depend critically on precise timing. We replicated the effect online for both button-press and overt naming responses, providing a proof of concept that naming latency, a key dependent variable in language production research, can be measured reliably in online experiments. We discuss challenges for online language production research and suggest how to overcome them. The scripts for the online implementation are available.
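
The abstract says voice onset latencies were extracted offline from the stimulus-locked audio files but does not say how. A simple baseline, assumed here rather than taken from the paper, is an amplitude threshold: scan the recording in short windows and report the first window whose root-mean-square (RMS) amplitude exceeds a threshold. The threshold, window length, sample rate and synthetic signal are illustrative only.

```python
import math


def voice_onset_ms(samples, sample_rate, threshold, window_ms=10.0):
    """Return the onset time (ms from recording start) of the first
    non-overlapping window whose RMS amplitude exceeds `threshold`,
    or None if no window does.

    Because the recording is stimulus-locked, this time can be read as a
    naming latency, up to the window resolution.
    """
    window = max(1, int(sample_rate * window_ms / 1000))
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        if rms > threshold:
            return 1000.0 * start / sample_rate
    return None


if __name__ == "__main__":
    rate = 16_000  # samples per second (assumed)
    # Synthetic recording: 650 ms of silence followed by a 200 Hz "voice" tone.
    silence = [0.0] * (650 * rate // 1000)
    voice = [0.3 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 2)]
    print(voice_onset_ms(silence + voice, rate, threshold=0.05))
```

Non-overlapping windows limit temporal resolution to the window length (10 ms by default). Real recordings also contain background noise, breathing and clicks, so a threshold like this usually needs calibration against the noise floor and manual checking.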

PMID:34799842 | DOI:10.3758/s13428-021-01686-3

A Methodology for an Auto-Generated and Auto-Maintained HL7 FHIR OWL Ontology for Health Data Management

Fri, 2021-11-19 06:00

Stud Health Technol Inform. 2021 Nov 18;287:99-103. doi: 10.3233/SHTI210824.

ABSTRACT

Manually maintaining the semantic model that underpins data management and addresses interoperability challenges in telemedicine and integrated care is not a trivial task. We present a methodology that leverages the serializations provided with the Health Level Seven International (HL7) Fast Healthcare Interoperability Resources (FHIR) specification to generate a fully functional OWL ontology, along with the semantic provisions needed to maintain functionality as the standard changes. The developed software performs a complete conversion of the HL7 FHIR Resources, including their properties, semantics, and restrictions. It covers all FHIR data types (primitive and complex) and all defined resource types. It can build an ontology from scratch or update an existing one, providing the semantics needed to preserve information described with previous versions of the standard. The resulting Web Ontology Language (OWL-DL) ontology, based on the latest version of HL7 FHIR, is publicly available for reuse and extension.
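
The abstract does not spell out the conversion rules. A minimal sketch, assuming a simplified mapping in which each resource becomes an OWL class, primitive-typed elements become data properties, complex-typed elements become object properties, and a finite `max` becomes a max-cardinality restriction, written as OWL 2 functional-syntax strings. The input mimics the JSON shape of a FHIR StructureDefinition (`type`, `baseDefinition`, `snapshot.element` with `path`, `max`, `type[].code`); the `fhir:` prefix and the short primitive-type list are assumptions, choice types such as `value[x]` are ignored, and this is not the authors' software.

```python
# Assumed subset of FHIR primitive types; the full list is longer.
PRIMITIVE_TYPES = {"boolean", "integer", "decimal", "string", "uri", "code", "date", "dateTime"}


def structure_definition_to_owl(sd):
    """Translate a (simplified) FHIR StructureDefinition into OWL 2
    functional-syntax axioms, returned as a list of strings."""
    cls = f"fhir:{sd['type']}"
    parent = sd["baseDefinition"].rsplit("/", 1)[-1]
    axioms = [f"Declaration(Class({cls}))", f"SubClassOf({cls} fhir:{parent})"]

    for element in sd["snapshot"]["element"]:
        if "type" not in element:  # the root element describes the resource itself
            continue
        prop = f"fhir:{element['path']}"
        type_code = element["type"][0]["code"]  # simplification: choice types ignored
        if type_code in PRIMITIVE_TYPES:
            axioms += [f"Declaration(DataProperty({prop}))",
                       f"DataPropertyDomain({prop} {cls})"]
            max_card = "DataMaxCardinality"
        else:
            axioms += [f"Declaration(ObjectProperty({prop}))",
                       f"ObjectPropertyDomain({prop} {cls})",
                       f"ObjectPropertyRange({prop} fhir:{type_code})"]
            max_card = "ObjectMaxCardinality"
        if element["max"] != "*":
            axioms.append(f"SubClassOf({cls} {max_card}({element['max']} {prop}))")
    return axioms


if __name__ == "__main__":
    patient = {
        "type": "Patient",
        "baseDefinition": "http://hl7.org/fhir/StructureDefinition/DomainResource",
        "snapshot": {"element": [
            {"path": "Patient", "max": "*"},
            {"path": "Patient.birthDate", "max": "1", "type": [{"code": "date"}]},
            {"path": "Patient.name", "max": "*", "type": [{"code": "HumanName"}]},
        ]},
    }
    print("\n".join(structure_definition_to_owl(patient)))
```

Because the axioms are generated from the specification's own machine-readable definitions, rerunning the generator on a new FHIR release is what keeps the ontology in step with the standard; detecting and bridging renamed or removed elements between versions is the harder part that this sketch leaves out.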

PMID:34795090 | DOI:10.3233/SHTI210824

Epione application: An integrated web-toolkit of clinical genomics and personalized medicine in systemic lupus erythematosus

Thu, 2021-11-18 06:00

Int J Mol Med. 2022 Jan;49(1):8. doi: 10.3892/ijmm.2021.5063. Epub 2021 Nov 18.

ABSTRACT

Genome-wide association studies (GWAS) have identified autoimmune disease-associated loci, a number of which are involved in numerous disease-associated pathways. However, much of the underlying genetic and pathophysiological mechanisms remain to be elucidated. Systemic lupus erythematosus (SLE) is a chronic, highly heterogeneous autoimmune disease, characterized by differences in autoantibody profile and serum cytokines and by multi-system involvement. This study presents the Epione application, an integrated bioinformatics web-toolkit designed to help medical experts and researchers diagnose SLE more accurately. The application aims to identify the most credible gene variants and single nucleotide polymorphisms (SNPs) associated with SLE susceptibility, using the patient's genomic data to aid the medical expert in SLE diagnosis. The application draws on knowledge from >70,000 SLE-related publications, analyzed with data mining and semantic techniques to extract the SLE-related genes and the corresponding SNPs. Probable genes associated with the patient's genomic profile are visualized in several graphs, including chromosome ideograms, statistical bar charts and regulatory networks derived from data mining of related publications, to obtain a representative number of the most credible candidate genes and biological pathways associated with SLE. Furthermore, an evaluation study was performed on a patient diagnosed with SLE and is presented herein. Epione has also been extended to family-related candidate patients to evaluate its predictive power. All gene variants previously considered to be associated with SLE were accurately identified in the patient's output profile, and comparing the results revealed novel findings.

The Epione application may facilitate early-stage diagnosis by comparing the patient's genomic profile against the list of the most probable candidate gene variants related to SLE. Its diagnosis-oriented output presents the user with a structured set of results on variant association, genomic position, and links to specific bibliography and gene network associations. The overall aim of the present study was to provide a reliable tool for studying SLE effectively. This novel and accessible SLE webserver tool is available at http://geneticslab.aua.gr/epione/.

PMID:34791504 | DOI:10.3892/ijmm.2021.5063

PheneBank: a literature-based database of phenotypes

Wed, 2021-11-17 06:00

Bioinformatics. 2021 Nov 12:btab740. doi: 10.1093/bioinformatics/btab740. Online ahead of print.

ABSTRACT

MOTIVATION: Significant effort has been spent by curators to create coding systems for phenotypes such as the Human Phenotype Ontology (HPO), as well as disease-phenotype annotations. We aim to support the discovery of literature-based phenotypes and integrate them into the knowledge discovery process.

RESULTS: PheneBank is a Web-portal for retrieving human phenotype-disease associations text-mined from the whole of Medline. Our approach exploits state-of-the-art machine learning for concept identification, utilising an expert-annotated rare disease corpus from the PMC Text Mining subset. Entity recognition was evaluated on a gold-standard corpus of rare disease sentences, and associations were evaluated against data from the Monarch Initiative.

AVAILABILITY: The PheneBank Web-portal is freely available at http://www.phenebank.org. Annotated Medline data are available from Zenodo at DOI: 10.5281/zenodo.1408800. Semantic annotation software is freely available for non-commercial use at GitHub: https://github.com/pilehvar/phenebank.

SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online.
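
The abstract does not state which metrics were used for the entity evaluation. Precision, recall and F1 under exact span-and-concept matching are a common choice for this kind of gold-standard comparison, so the sketch below assumes them. The annotation tuples and concept identifiers are hypothetical placeholders.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 between two sets of annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Annotations as (sentence id, start offset, end offset, concept id); all hypothetical.
    gold = {(1, 0, 12, "PHENO_A"), (1, 20, 31, "PHENO_B"), (2, 5, 14, "PHENO_C")}
    predicted = {(1, 0, 12, "PHENO_A"), (2, 5, 14, "PHENO_C"), (2, 30, 40, "PHENO_D")}
    print(precision_recall_f1(predicted, gold))
```

Exact matching is the strictest setting; evaluations of phenotype mentions often also report a relaxed score that accepts overlapping spans, which would require replacing the set intersection with an overlap test.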

PMID:34788791 | DOI:10.1093/bioinformatics/btab740

The ontology of fast food facts: conceptualization of nutritional fast food data for consumers and semantic web applications

Wed, 2021-11-10 06:00

BMC Med Inform Decis Mak. 2021 Nov 9;21(Suppl 7):275. doi: 10.1186/s12911-021-01636-1.

ABSTRACT

BACKGROUND: Because fast food is abundant and readily available to consumers, it may have health consequences through high calorie intake, a major contributor to life-threatening diseases. Providing nutritional information has some impact on consumers' decisions to self-regulate and adopt healthier diets, and government regulations have therefore mandated the publication of nutritional content, including for fast food, to assist consumers. However, fast food nutritional information is fragmented, and we see a benefit in collating nutritional data to synthesize knowledge for individuals.

METHODS: We developed the ontology of fast food facts as an opportunity to standardize knowledge of fast food and link nutritional data that could be analyzed and aggregated for the information needs of consumers and experts. The ontology is based on metadata from 21 fast food establishment nutritional resources and authored in OWL2 using Protégé.

RESULTS: Three evaluators reviewed the logical structure of the ontology through natural language translations of its axioms. Although there was majority agreement (76.1% pairwise agreement) on the veracity of the ontology, we identified 103 of the 430 statements as erroneous. We revised the ontology and publicly released its initial version. The ontology has 413 classes, 21 object properties, 13 data properties, and 494 logical axioms.

CONCLUSION: With the initial release of the ontology of fast food facts, we discuss future directions for the continued evolution of this knowledge base and the challenges we plan to address, such as managing and publishing large volumes of semantically linked fast food nutritional data.
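
The abstract reports 76.1% pairwise agreement among three evaluators without giving the formula. One common definition, assumed here, averages simple percent agreement over the three evaluator pairs; unlike Cohen's or Fleiss' kappa, it does not correct for agreement expected by chance. The judgements below are hypothetical. For scale, the 103 erroneous statements are about 24% of the 430 reviewed (103 / 430 ≈ 0.240).

```python
from itertools import combinations


def mean_pairwise_agreement(ratings):
    """Average, over all pairs of evaluators, of the fraction of statements
    on which the two evaluators gave the same judgement.

    `ratings` holds one equal-length list of judgements per evaluator.
    """
    n_items = len(ratings[0])
    pair_scores = [
        sum(x == y for x, y in zip(a, b)) / n_items
        for a, b in combinations(ratings, 2)
    ]
    return sum(pair_scores) / len(pair_scores)


if __name__ == "__main__":
    # Hypothetical judgements (True = statement judged correct) for 4 statements.
    evaluator_1 = [True, True, False, True]
    evaluator_2 = [True, False, False, True]
    evaluator_3 = [True, True, True, True]
    print(mean_pairwise_agreement([evaluator_1, evaluator_2, evaluator_3]))
```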

PMID:34753474 | DOI:10.1186/s12911-021-01636-1

A Virtual Community for Disability Advocacy: Development of a Searchable Artificial Intelligence-Supported Platform

Fri, 2021-11-05 06:00

JMIR Form Res. 2021 Nov 5;5(11):e33335. doi: 10.2196/33335.

ABSTRACT

BACKGROUND: The lack of availability of disability data has been identified as a major challenge hindering continuous disability equity monitoring. It is important to develop a platform that enables searching for disability data to expose systemic discrimination and social exclusion, which increase vulnerability to inequitable social conditions.

OBJECTIVE: Our project aims to create an accessible, multilingual pilot disability website that structures and integrates data about people with disabilities and provides data for national and international disability advocacy communities. The platform will provide a document upload function with hybrid (automated and manual) paragraph tagging, and its query function will offer intelligent natural language search in the supported languages.

METHODS: We have designed and implemented a virtual community platform using Wikibase, Semantic Web, machine learning, and web programming tools to enable disability communities to upload and search for disability documents. The platform data model is based on an ontology we have designed following the United Nations Convention on the Rights of Persons with Disabilities (CRPD). The virtual community facilitates the uploading and sharing of validated information, and supports disability rights advocacy by enabling dissemination of knowledge.

RESULTS: Using health informatics and artificial intelligence techniques (namely Semantic Web, machine learning, and natural language processing techniques), we developed a pilot virtual community that supports disability rights advocacy by facilitating the uploading, sharing, and accessing of disability data. The system consists of a website built on top of a Wikibase (a Semantic Web-based datastore). The virtual community supports four types of users: information producers, information consumers, validators, and administrators. It enables users to upload documents, semiautomatically tag their paragraphs with meaningful keywords, and validate the process before the data are uploaded to the disability Wikibase. Once the data are uploaded, public users (information consumers) can perform a semantic search using an intelligent, multilingual search engine (QAnswer). Further enhancements of the platform are planned.

CONCLUSIONS: The platform ontology is flexible and can accommodate advocacy reports and disability policy and legislation from specific jurisdictions, which can be accessed in relation to the CRPD articles. The platform ontology can be expanded to fit international contexts. The virtual community supports information upload and search. Semiautomatic tagging and intelligent multilingual semantic search using natural language are enabled by artificial intelligence techniques, namely Semantic Web, machine learning, and natural language processing.

PMID:34738910 | DOI:10.2196/33335

ContSOnto: A Formal Ontology for Continuity of Care

Thu, 2021-11-04 06:00

Stud Health Technol Inform. 2021 Oct 27;285:82-87. doi: 10.3233/SHTI210577.

ABSTRACT

The global pandemic over the past two years has reset societal agendas by exposing both strengths and weaknesses across all sectors. In global health delivery in particular, the ability of health care facilities to scale to requirements and meet service demands has revealed the need for some national services and organisations to modernise their organisational processes and infrastructures. Core to modernisation is infrastructure for sharing information, specifically structured, standardised approaches to both operational procedures and terminology services. Data-sharing problems (i.e., lack of interoperability) are a major obstacle when patients move between healthcare facilities or travel across borders and need emergency treatment. Experts in healthcare service delivery suggest that the best way to manage individual care is at home, using remote patient monitoring, which ultimately reduces the cost burden for both the citizen and the service provider. Core to this practice will be advancing the digitalisation of health care, underpinned by safe integration of, and access to, relevant and timely information. To tackle the data interoperability issue and provide a quality-driven, continuous flow of information between different health care information systems, semantic terminology must be preserved intact. In this paper we propose and present ContSOnto, a formal ontology for continuity of care based on ISO 13940:2015 (ContSys) and the W3C Semantic Web standard OWL (Web Ontology Language). ContSOnto offers several benefits, including semantic interoperability, data harmonization and data linking. It can be used as a base model for integrating data from different healthcare information models to generate knowledge graphs that support shared care and decision making.

PMID:34734855 | DOI:10.3233/SHTI210577

Associations between online food outlet access and online food delivery service use amongst adults in the UK: a cross-sectional analysis of linked data

Mon, 2021-11-01 06:00

BMC Public Health. 2021 Oct 31;21(1):1968. doi: 10.1186/s12889-021-11953-9.

ABSTRACT

BACKGROUND: Online food delivery services facilitate 'online' access to food outlets that typically sell energy-dense, nutrient-poor food. Greater online food outlet access might be related to use of this purchasing format and to living with excess bodyweight; however, this is not known. We aimed to investigate the association between aspects of online food outlet access and online food delivery service use, differences in this association according to customer sociodemographic characteristics, and the association between the number of food outlets accessible online and bodyweight.

METHODS: In 2019, we used an automated data collection method to collect data on all food outlets in the UK registered with the leading online food delivery service Just Eat (n = 33,204). We linked this with contemporaneous data on food purchasing, bodyweight, and sociodemographic information collected through the International Food Policy Study (analytic sample n = 3067). We used adjusted binomial logistic, linear, and multinomial logistic regression models to examine associations.

RESULTS: Adults in the UK had online access to a median of 85 food outlets (IQR: 34-181) and 85 unique types of cuisine (IQR: 64-108), and 15.1% reported online food delivery service use in the previous week. Those with the greatest number of accessible food outlets (quarter four, 182-879) had 71% greater odds of online food delivery service use (OR: 1.71; 95% CI: 1.09, 2.68) compared to those with the least (quarter one, 0-34). This pattern was evident amongst adults with a university degree (OR: 2.11; 95% CI: 1.15, 3.85), adults aged between 18 and 29 years (OR: 3.27; 95% CI: 1.59, 6.72), those living with children (OR: 1.94; 95% CI: 1.01, 3.75), and females at each level of increased exposure. We found no association between the number of unique types of cuisine accessible online and online food delivery service use, or between the number of food outlets accessible online and bodyweight.

CONCLUSIONS: The number of food outlets accessible online is positively associated with online food delivery service use. Adults with the highest education, younger adults, those living with children, and females were particularly susceptible to the effect of the greatest online food outlet access. Further research is required to investigate the possible health implications of online food delivery service use.
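
The reported odds ratios come from adjusted logistic regression models, which this sketch does not reproduce. To show only what an odds ratio and its 95% CI express, the function below computes a crude (unadjusted) OR from a hypothetical 2x2 table with a Woolf log-scale interval, plus the conversion behind statements such as "71% greater odds" (OR 1.71). A CI that excludes 1, as each interval reported in the RESULTS does, indicates an association at the 5% level.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a, b: exposed group with / without the outcome
    c, d: reference group with / without the outcome
    All four cells must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio (1.71 -> about 71)."""
    return (odds_ratio - 1) * 100


if __name__ == "__main__":
    # Hypothetical counts: users / non-users of delivery services in the
    # highest and lowest access quarters (not the study's data).
    print(odds_ratio_ci(30, 170, 18, 182))
    print(round(percent_change_in_odds(1.71), 1))
```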

PMID:34719382 | PMC:PMC8557109 | DOI:10.1186/s12889-021-11953-9

Complex Portal 2022: new curation frontiers

Sun, 2021-10-31 06:00

Nucleic Acids Res. 2021 Oct 29:gkab991. doi: 10.1093/nar/gkab991. Online ahead of print.

ABSTRACT

The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (e.g. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and by providing data accessible to semantic web tools through Wikidata, which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the 'Support' link.

PMID:34718729 | DOI:10.1093/nar/gkab991

A Digital Personal Health Library for Enabling Precision Health Promotion to Prevent Human Papilloma Virus-Associated Cancers

Fri, 2021-10-29 06:00

Front Digit Health. 2021 Jul 21;3:683161. doi: 10.3389/fdgth.2021.683161. eCollection 2021.

ABSTRACT

Human papillomavirus (HPV) causes the most prevalent sexually transmitted infection (STI) in the United States. Sexually active young adults are susceptible to HPV, accounting for approximately 50% of new STIs. Oncogenic HPV subtypes 16 and 18 are associated with squamous intraepithelial lesions and cancers and are mostly preventable through prophylactic HPV vaccination. Accordingly, this study's objectives are to (1) summarize barriers related to social determinants of health (SDoH) and their implications for low HPV vaccination rates among young adults (18-26 years), (2) propose a digital health solution that uses the Personal Health Library (PHL) to collect, integrate, and manage personalized sexual and health information, and (3) describe the features of the PHL-based app. By applying novel artificial intelligence techniques, specifically knowledge representation, semantic web, and natural language processing, the proposed PHL-based application will compile clinical, biomedical, and SDoH data from multi-dimensional sources. This application will therefore provide digital health interventions customized to individuals' specific needs and capacities. The PHL-based application could promote the management and use of personalized digital health information to facilitate precision health promotion, thereby informing health decision-making regarding HPV vaccinations, routine HPV/STI testing, cancer screenings, vaccine safety/efficacy/side effects, and safe sexual practices. In addition to detecting vaccine hesitancy, disparities and perceived barriers, this application could address participants' specific needs and challenges in navigating health literacy, technical skills, peer influence, education, language, and cultural and spiritual beliefs. Precision health promotion focused on improving knowledge acquisition and information-seeking behaviors, promoting safe sexual practices, increasing HPV vaccinations, and facilitating cancer screenings could be effective in preventing HPV-associated cancers.

PMID:34713154 | PMC:PMC8521976 | DOI:10.3389/fdgth.2021.683161

DUKweb, diachronic word representations from the UK Web Archive corpus

Sat, 2021-10-16 06:00

Sci Data. 2021 Oct 15;8(1):269. doi: 10.1038/s41597-021-01047-x.

ABSTRACT

Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, given the significant computational resources needed to generate them, very few resources make diachronic word embeddings available to the scientific community. In this paper we present DUKweb, a set of large-scale resources designed for the diachronic analysis of contemporary English. DUKweb was created from the JISC UK Web Domain Dataset (1996-2013), a very large archive which collects resources from the Internet Archive that were hosted on domains ending in '.uk'. DUKweb consists of a series of word co-occurrence matrices and two types of word embeddings for each year in the JISC UK Web Domain Dataset. We show the reuse potential of DUKweb and its quality standards via a case study on word meaning change detection.
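
The abstract mentions a case study on word meaning change detection without describing the method. One common way to compare a word across two separately trained embedding spaces, assumed here, measures how much its nearest-neighbour set changes between periods. The two-dimensional vectors below are toy values, not DUKweb data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, space, k):
    """The k words closest to `word` within a single year's embedding space."""
    target = space[word]
    others = sorted((w for w in space if w != word),
                    key=lambda w: cosine(target, space[w]), reverse=True)
    return set(others[:k])


def change_score(word, space_then, space_now, k=2):
    """1 - Jaccard overlap of the word's neighbour sets in two periods.

    0 means identical neighbourhoods; 1 means no shared neighbours.
    """
    then = nearest_neighbours(word, space_then, k)
    now = nearest_neighbours(word, space_now, k)
    return 1 - len(then & now) / len(then | now)


if __name__ == "__main__":
    # Toy 2-d vectors for two hypothetical years.
    year_a = {"cell": [1.0, 0.0], "tissue": [0.9, 0.1], "organ": [0.8, 0.2],
              "phone": [0.0, 1.0], "call": [0.1, 0.9]}
    year_b = dict(year_a, cell=[0.05, 1.0])
    print(change_score("cell", year_a, year_b))
```

Because neighbour sets are computed within each year's space, this score does not require aligning the two vector spaces; the alternative of aligning the spaces first (for example with an orthogonal Procrustes rotation) and then comparing a word's two vectors directly is also widely used.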

PMID:34654827 | DOI:10.1038/s41597-021-01047-x

NCATS Inxight Drugs: a comprehensive and curated portal for translational research

Thu, 2021-10-14 06:00

Nucleic Acids Res. 2021 Oct 14:gkab918. doi: 10.1093/nar/gkab918. Online ahead of print.

ABSTRACT

The United States has a complex regulatory scheme for marketing drugs. Understanding drug regulatory status is a daunting task that requires integrating data from many sources from the United States Food and Drug Administration (FDA), US government publications, and other processes related to drug development. At NCATS, we created Inxight Drugs (https://drugs.ncats.io), a web resource that attempts to address this challenge in a systematic manner. NCATS Inxight Drugs incorporates and unifies a wealth of data, including those supplied by the FDA and from independent public sources. The database offers a substantial amount of manually curated literature data unavailable from other sources. Currently, the database contains 125 036 product ingredients, including 2566 US approved drugs, 6242 marketed drugs, and 9684 investigational drugs. All substances are rigorously defined according to the ISO 11238 standard to comply with existing regulatory standards for unique drug substance identification. A special emphasis was placed on capturing manually curated and referenced data on treatment modalities and semantic relationships between substances. A supplementary resource 'Novel FDA Drug Approvals' features regulatory details of newly approved FDA drugs. The database is regularly updated using the NCATS Stitcher data integration tool, which automates data aggregation and supports full data access through a RESTful API.

PMID:34648031 | DOI:10.1093/nar/gkab918

Enhancing reasoning through reduction of vagueness using fuzzy OWL-2 for representation of breast cancer ontologies

Wed, 2021-10-13 06:00

Neural Comput Appl. 2021 Oct 8:1-26. doi: 10.1007/s00521-021-06517-2. Online ahead of print.

ABSTRACT

The need to address vagueness across the many domains in which ontologies are applied is gaining research attention. Vagueness in knowledge represented with description logic impairs automated reasoning and inference. Reducing this vagueness in the formalization of medical knowledge is increasingly important, given how prone this domain is to vague concepts and terms. This vagueness may be addressed at the level of the ontology modeling language, such as the Web Ontology Language (OWL). Although several attempts have been made to tackle this problem for the prognosis of other diseases such as diabetes and cardiovascular diseases, a similar effort is missing for breast cancer. Minimizing vagueness in breast cancer ontology is necessary to enhance automated reasoning and handle knowledge representation problems. This study proposes a framework for reducing vagueness in breast cancer ontology. The approach took a crisp breast cancer ontology and applied fuzzy ontology elements based on the Fuzzy OWL2 model to formulate a breast cancer fuzzy ontology. This was achieved by extending the elements of OWL2 (a more expressive version of OWL) with annotation properties to fuzzify the crisp ontology. The results showed a significant reduction of vagueness in the domain, yielding 0.38 for vagueness spread and 1.0 for vagueness explicitness. In addition, ontology metrics such as completeness, consistency, correctness and accuracy were evaluated, and the ontology performed strongly on them. The implication of this result is that reducing vagueness in breast cancer ontology provides increased computational reasoning support to applications that use the ontology.
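
Fuzzy OWL 2 represents graded concepts with fuzzy datatypes defined by membership functions such as trapezoidal, triangular, left-shoulder and right-shoulder functions. The sketch below evaluates two of these in Python to show how a crisp value becomes a degree of membership. The concept name and the 2 cm and 5 cm thresholds are hypothetical and not taken from the paper, and the annotation-property syntax that Fuzzy OWL 2 uses to attach such functions to an ontology is not shown.

```python
def trapezoidal(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set with a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


def right_shoulder(x, a, b):
    """Membership rising linearly from 0 at a to 1 at b, then staying at 1."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


if __name__ == "__main__":
    # Hypothetical concept "LargeTumour" over tumour diameter in cm; the
    # thresholds 2 and 5 are illustrative only, not clinical or from the paper.
    for size in (1.5, 3.5, 6.0):
        print(size, right_shoulder(size, 2.0, 5.0))
```

The point of the fuzzy datatype is that a 3.5 cm value is neither simply "large" nor "not large" but large to degree 0.5, which a fuzzy reasoner can then propagate instead of forcing an arbitrary crisp cut-off.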

PMID:34642549 | PMC:PMC8500271 | DOI:10.1007/s00521-021-06517-2

Categories: Literature Watch

OntoRepliCov: an Ontology-Based Approach for Modeling the SARS-CoV-2 Replication Process

Mon, 2021-10-11 06:00

Procedia Comput Sci. 2021;192:487-496. doi: 10.1016/j.procs.2021.08.050. Epub 2021 Oct 1.

ABSTRACT

Understanding the replication machinery of viruses helps in proposing and testing effective antiviral strategies. Exhaustive knowledge about protein structure, function and interactions is one of the preconditions for modeling this machinery successfully. In this context, modeling methods based on a formal representation with high semantic expressiveness are relevant for extracting proteins, and their nucleotide or amino acid sequences, as elements of the replication process. Our approach therefore relies on semantic technologies to model the SARS-CoV-2 replication machinery, which makes it possible to infer new knowledge about each step of virus replication. More specifically, we developed an ontology-based approach, enriched with a reasoning process, covering the complete replication machinery of SARS-CoV-2. In this paper we present a partial overview of our ontology, OntoRepliCov, describing one step of this process, namely continuous translation or protein synthesis, through classes, properties, axioms, and SWRL (Semantic Web Rule Language) rules.
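SWRL rules have the form "body -> head": whenever all body atoms hold for some binding of the variables, the head atom is inferred. The sketch below imitates that inference with naive forward chaining in plain Python, to show how one inferred fact can enable another. It is not a SWRL engine (in practice rules are evaluated by a SWRL-capable OWL reasoner), and every class, property and individual name is a hypothetical illustration, not taken from OntoRepliCov.

```python
"""Forward chaining over SWRL-style rules, as a plain-Python sketch.

Facts are tuples: ("Class", individual) for class assertions and
("property", subject, object) for property assertions. All class,
property and individual names are hypothetical.
"""


def rule_viral_protein(facts: set) -> set:
    # Protein(?p) ^ translatedFrom(?p, ?r) ^ ViralRNA(?r) -> ViralProtein(?p)
    return {
        ("ViralProtein", f[1])
        for f in facts
        if len(f) == 3 and f[0] == "translatedFrom"
        and ("Protein", f[1]) in facts
        and ("ViralRNA", f[2]) in facts
    }


def rule_cleavage_product(facts: set) -> set:
    # ViralProtein(?p) ^ cleavedInto(?p, ?n) -> ViralProtein(?n)
    return {
        ("ViralProtein", f[2])
        for f in facts
        if len(f) == 3 and f[0] == "cleavedInto"
        and ("ViralProtein", f[1]) in facts
    }


def forward_chain(facts, rules) -> set:
    """Apply every rule repeatedly until no new fact is derived (a fixpoint)."""
    facts = set(facts)
    while True:
        derived = set().union(*(rule(facts) for rule in rules)) - facts
        if not derived:
            return facts
        facts |= derived


if __name__ == "__main__":
    asserted = {
        ("ViralRNA", "genomicRNA"),
        ("Protein", "pp1ab"),
        ("translatedFrom", "pp1ab", "genomicRNA"),
        ("cleavedInto", "pp1ab", "nsp12"),
    }
    rules = [rule_viral_protein, rule_cleavage_product]
    print(sorted(forward_chain(asserted, rules) - asserted))
```

The first pass infers ViralProtein(pp1ab); only then can the second rule fire and infer ViralProtein(nsp12), so the script prints both inferred facts. Because SWRL is monotonic, iterating to a fixpoint like this is sufficient.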

PMID:34630741 | PMC:PMC8486259 | DOI:10.1016/j.procs.2021.08.050

Categories: Literature Watch

COVID-19 knowledge graph from semantic integration of biomedical literature and databases

Wed, 2021-10-06 06:00

Bioinformatics. 2021 Oct 6:btab694. doi: 10.1093/bioinformatics/btab694. Online ahead of print.

ABSTRACT

SUMMARY: The global response to the COVID-19 pandemic has led to a rapid increase in scientific literature on this deadly disease. Extracting knowledge from biomedical literature and integrating it with relevant information from curated biological databases is essential to gain insight into COVID-19 etiology, diagnosis, and treatment. We used the Semantic Web technology RDF (Resource Description Framework) to integrate COVID-19 knowledge mined from literature by iTextMine, PubTator, and SemRep with relevant biological databases, and formalized the knowledge in a standardized and computable COVID-19 Knowledge Graph (KG). We published the COVID-19 KG via a SPARQL endpoint to support federated queries on the Semantic Web and developed a knowledge portal with browsing and searching interfaces. We also developed a RESTful API to support programmatic access and provided RDF dumps for download.

AVAILABILITY AND IMPLEMENTATION: The COVID-19 Knowledge Graph is publicly available under CC-BY 4.0 license at https://research.bioinformatics.udel.edu/covid19kg/.
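A SPARQL endpoint answers queries made of triple patterns: each pattern matches RDF triples, and patterns that share a variable are joined on it. The plain-Python sketch below evaluates such a basic graph pattern over an in-memory set of triples to show the mechanics. The "ex:" terms, predicates and data are hypothetical and do not reflect the COVID-19 KG's actual vocabulary; a real client would send the equivalent SPARQL (shown in a comment) to an endpoint or use an RDF library.

```python
"""Evaluating a SPARQL-style basic graph pattern over RDF-like triples.

All "ex:" terms and the example data are hypothetical.
"""


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # a variable
            if extended.get(term, value) != value:
                return None                           # bound to something else
            extended[term] = value
        elif term != value:                           # a non-matching constant
            return None
    return extended


def select(graph, patterns):
    """Return every variable binding that satisfies all patterns (a join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


graph = {
    ("ex:PMID_A", "ex:mentions", "ex:ACE2"),
    ("ex:PMID_A", "ex:mentions", "ex:Spike"),
    ("ex:PMID_B", "ex:mentions", "ex:TMPRSS2"),
    ("ex:ACE2", "ex:bindsTo", "ex:Spike"),
}

# Equivalent SPARQL:
#   SELECT ?paper ?protein WHERE {
#     ?paper   ex:mentions ?protein .
#     ?protein ex:bindsTo  ex:Spike .
#   }
if __name__ == "__main__":
    print(select(graph, [
        ("?paper", "ex:mentions", "?protein"),
        ("?protein", "ex:bindsTo", "ex:Spike"),
    ]))
```

The first pattern yields three candidate bindings; the shared variable ?protein then filters them down to the single paper that mentions a protein binding the spike term. This joining of literature-derived and database-derived triples is the kind of integration an RDF knowledge graph makes queryable.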

PMID:34613368 | DOI:10.1093/bioinformatics/btab694

Categories: Literature Watch

Brain-Inspired Search Engine Assistant Based on Knowledge Graph

Tue, 2021-10-05 06:00

IEEE Trans Neural Netw Learn Syst. 2021 Oct 5;PP. doi: 10.1109/TNNLS.2021.3113026. Online ahead of print.

ABSTRACT

Search engines can quickly return a list of hyperlinks in response to query keywords. However, when a query is complex, developers need to refine their keywords repeatedly and open many web pages to find and summarize answers. Many question-answering (Q&A) systems attempt to assist search engines by providing simple, accurate, and understandable answers. Without the original semantic context, however, these answers lack explainability, which makes them difficult for users to trust and adopt. In this article, we propose DeveloperBot, a brain-inspired search engine assistant based on a knowledge graph, which aligns with the human cognitive process and can answer complex queries with explanations. Specifically, DeveloperBot first constructs a multilayer query graph by splitting a complex multiconstraint query into several ordered constraints. It then models constraint reasoning as a subgraph search inspired by the spreading activation model from cognitive science. Finally, novel features of the subgraph are extracted for decision-making, and the corresponding reasoning subgraph and answer confidence are derived as explanations. The decision-making results demonstrate that DeveloperBot estimates answers and answer confidences with high accuracy. We implemented a prototype and conducted a user study to evaluate whether and how the direct answers and explanations provided by DeveloperBot help meet developers' information needs.
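In the spreading activation model, activation starts at seed concepts and flows along weighted edges, weakening with each hop, so that nodes reachable from several seeds accumulate the most energy. The sketch below implements that classic formulation in plain Python. It is not DeveloperBot's subgraph search, which the abstract describes only at a high level; the node names, edge weights, decay factor and firing threshold are all hypothetical.

```python
"""Spreading activation over a weighted graph (classic formulation).

Node names, weights and parameters are hypothetical illustrations.
"""


def spread(graph, seeds, decay=0.8, threshold=0.1, max_steps=3):
    """Propagate activation from seed nodes along weighted edges.

    graph: node -> {neighbour: weight in [0, 1]}
    seeds: node -> initial activation
    A node fires only if the energy it received in the previous step
    reaches `threshold`; each hop multiplies energy by weight * decay.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        pulses = {}
        for node, energy in frontier.items():
            if energy < threshold:
                continue                              # too weak to fire
            for neighbour, weight in graph.get(node, {}).items():
                pulses[neighbour] = pulses.get(neighbour, 0.0) + energy * weight * decay
        if not pulses:
            break                                     # nothing left to spread
        for node, energy in pulses.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = pulses
    return activation


graph = {
    "python": {"json-module": 0.9, "requests": 0.6},
    "json": {"json-module": 0.9, "jackson": 0.7},
    "java": {"jackson": 0.9},
}
seeds = {"python": 1.0, "json": 1.0}  # hypothetical query: "parse JSON in Python"

if __name__ == "__main__":
    scores = spread(graph, seeds)
    ranked = sorted((n for n in scores if n not in seeds), key=scores.get, reverse=True)
    print([(n, round(scores[n], 2)) for n in ranked])
```

Here json-module, reached from both seeds, ends up ranked above jackson and requests, while java is never activated because no seed points to it. The activated subgraph itself can serve as an explanation of why a candidate answer scored highly.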

PMID:34609944 | DOI:10.1109/TNNLS.2021.3113026

Categories: Literature Watch

SMAT: An attention-based deep learning solution to the automation of schema matching

Tue, 2021-10-05 06:00

Adv Databases Inf Syst. 2021 Aug;12843:260-274. doi: 10.1007/978-3-030-82472-3_19. Epub 2021 Aug 16.

ABSTRACT

Schema matching aims to identify correspondences among the attributes of database schemas. It is frequently considered the most challenging and decisive stage in many contemporary web semantics and database systems. Low-quality algorithmic matchers fail to provide improvement, while manual annotation consumes extensive human effort. Further complications arise from data privacy in domains such as healthcare, where only schema-level matching should be used to prevent data leakage. To address this problem, we propose SMAT, a new deep learning model based on state-of-the-art natural language processing techniques that obtains semantic mappings between source and target schemas using only attribute names and descriptions. SMAT avoids directly encoding domain knowledge about the source and target systems, which makes it easier to deploy across different sites. We also introduce a new benchmark dataset, OMAP, based on real-world schema-level mappings from the healthcare domain. Our extensive evaluation on various benchmark datasets demonstrates the potential of SMAT to help automate schema-level matching tasks.
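To make the task concrete, the sketch below is a simple lexical baseline that, like SMAT, looks only at schema-level metadata (attribute names and descriptions) and never at record values. It is explicitly not SMAT's attention-based model: it swaps in string similarity from Python's standard difflib module, the kind of algorithmic matcher learned models aim to outperform. The equal weighting of name and description, the 0.5 threshold and the example schemas are all assumptions.

```python
"""A lexical schema-matching baseline using names and descriptions only.

Not SMAT: a string-similarity matcher for illustration. The weights,
threshold and example schemas are hypothetical.
"""
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case, turn underscores into spaces, collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of matching characters between normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_schemas(source: dict, target: dict, threshold: float = 0.5) -> dict:
    """Map each source attribute to its best-scoring target attribute.

    source/target: attribute name -> free-text description.
    Score = equal-weight average of name and description similarity;
    pairs scoring at or below `threshold` are left unmatched.
    """
    matches = {}
    for s_name, s_desc in source.items():
        best, best_score = None, threshold
        for t_name, t_desc in target.items():
            score = 0.5 * similarity(s_name, t_name) + 0.5 * similarity(s_desc, t_desc)
            if score > best_score:
                best, best_score = t_name, score
        if best is not None:
            matches[s_name] = (best, round(best_score, 2))
    return matches


if __name__ == "__main__":
    source = {
        "pat_dob": "date of birth of the patient",
        "adm_dt": "date the patient was admitted",
    }
    target = {
        "birth_date": "patient date of birth",
        "admission_date": "date of hospital admission",
        "person_id": "unique identifier of the person",
    }
    print(match_schemas(source, target))
```

Character overlap of this kind misses synonyms that share few letters and can be misled by common phrases such as "date of", which is one motivation for semantic, learned matchers.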

PMID:34608464 | PMC:PMC8487677 | DOI:10.1007/978-3-030-82472-3_19

Categories: Literature Watch
