Semantic Web
Effect on life expectancy of temporal sequence in a multimorbidity cluster of psychosis, diabetes, and congestive heart failure among 1·7 million individuals in Wales with 20-year follow-up: a retrospective cohort study using linked data
Lancet Public Health. 2023 Jul;8(7):e535-e545. doi: 10.1016/S2468-2667(23)00098-1.
ABSTRACT
BACKGROUND: To inform targeted public health strategies, it is crucial to understand how coexisting diseases develop over time and their associated impacts on patient outcomes and health-care resources. This study aimed to examine how psychosis, diabetes, and congestive heart failure, in a cluster of physical-mental health multimorbidity, develop and coexist over time, and to assess the associated effects of different temporal sequences of these diseases on life expectancy in Wales.
METHODS: In this retrospective cohort study, we used population-scale, individual-level, anonymised, linked, demographic, administrative, and electronic health record data from the Wales Multimorbidity e-Cohort. We included data on all individuals aged 25 years and older who were living in Wales on Jan 1, 2000 (the start of follow-up), with follow-up continuing until Dec 31, 2019, first break in Welsh residency, or death. Multistate models were applied to these data to model trajectories of disease in multimorbidity and their associated effect on all-cause mortality, accounting for competing risks. Life expectancy was calculated as the restricted mean survival time (bound by the maximum follow-up of 20 years) for each of the transitions from the health states to death. Cox regression models were used to estimate baseline hazards for transitions between health states, adjusted for sex, age, and area-level deprivation (Welsh Index of Multiple Deprivation [WIMD] quintile).
FINDINGS: Our analyses included data for 1 675 585 individuals (811 393 [48·4%] men and 864 192 [51·6%] women) with a median age of 51·0 years (IQR 37·0-65·0) at cohort entry. The order of disease acquisition in cases of multimorbidity had an important and complex association with patient life expectancy. Individuals who developed diabetes, psychosis, and congestive heart failure, in that order (DPC), had reduced life expectancy compared with people who developed the same three conditions in a different order: for a 50-year-old man in the third quintile of the WIMD (on which we based our main analyses to allow comparability), DPC was associated with a loss in life expectancy of 13·23 years (SD 0·80) compared with the general otherwise healthy or otherwise diseased population. Congestive heart failure as a single condition was associated with a mean loss in life expectancy of 12·38 years (0·00), and with a loss of 12·95 years (0·06) when preceded by psychosis and 13·45 years (0·13) when followed by psychosis. Findings were robust in people of older ages, in more deprived populations, and in women, except that the trajectory of psychosis, congestive heart failure, and diabetes was associated with higher mortality in women than in men. Within 5 years of an initial diagnosis of diabetes, the risk of developing psychosis or congestive heart failure, or both, was increased.
INTERPRETATION: The order in which individuals develop psychosis, diabetes, and congestive heart failure as combinations of conditions can substantially affect life expectancy. Multistate models offer a flexible framework to assess temporal sequences of diseases and allow identification of periods of increased risk of developing subsequent conditions and death.
FUNDING: Health Data Research UK.
PMID:37393092 | DOI:10.1016/S2468-2667(23)00098-1
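The study's life-expectancy estimates are restricted mean survival times: the area under a survival curve truncated at the 20-year follow-up limit. A minimal sketch of that calculation (illustrative only, with invented follow-up data and a simple Kaplan-Meier estimator rather than the study's multistate Cox models):

```python
# Illustrative sketch, not the study's code: restricted mean survival time
# (RMST) as the area under a Kaplan-Meier curve truncated at tau years.

def kaplan_meier(times, events):
    """Return (time, survival) step points for right-censored data.
    times: follow-up in years; events: 1 = death, 0 = censored."""
    at_risk = len(times)
    surv, points = 1.0, []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if deaths:
            surv *= 1 - deaths / at_risk
            points.append((t, surv))
        at_risk -= sum(1 for ti in times if ti == t)  # drop ties from risk set
    return points

def rmst(km_points, tau):
    """Integrate the survival step function from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in km_points:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

# Invented follow-up data for eight individuals over 20 years.
times  = [2, 5, 5, 8, 12, 15, 20, 20]
events = [1, 1, 0, 1, 0, 1, 0, 0]
km = kaplan_meier(times, events)
restricted_le = rmst(km, 20)
```

A loss in life expectancy for a disease trajectory would then be the difference between the RMST of the comparison population and the RMST of that trajectory.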
FAIR-Checker: supporting digital resource findability and reuse with Knowledge Graphs and Semantic Web standards
J Biomed Semantics. 2023 Jul 1;14(1):7. doi: 10.1186/s13326-023-00289-5.
ABSTRACT
The current rise of Open Science and Reproducibility in the Life Sciences requires the creation of rich, machine-actionable metadata in order to better share and reuse biological digital resources such as datasets, bioinformatics tools, training materials, etc. For this purpose, FAIR principles have been defined for both data and metadata and adopted by large communities, leading to the definition of specific metrics. However, automatic FAIRness assessment is still difficult because computational evaluations frequently require technical expertise and can be time-consuming. As a first step to address these issues, we propose FAIR-Checker, a web-based tool to assess the FAIRness of metadata presented by digital resources. FAIR-Checker offers two main facets: a "Check" module providing a thorough metadata evaluation and recommendations, and an "Inspect" module which assists users in improving metadata quality and therefore the FAIRness of their resource. FAIR-Checker leverages Semantic Web standards and technologies such as SPARQL queries and SHACL constraints to automatically assess FAIR metrics. Users are notified of missing, necessary, or recommended metadata for various resource categories. We evaluate FAIR-Checker in the context of improving the FAIRification of individual resources, through better metadata, as well as analyzing the FAIRness of more than 25,000 bioinformatics software descriptions.
PMID:37393296 | DOI:10.1186/s13326-023-00289-5
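The "Check" idea, testing whether a resource's metadata exposes the properties a metric requires, can be sketched in miniature. The sketch below is a deliberate simplification with an invented property profile: FAIR-Checker itself evaluates metrics with SPARQL queries and SHACL shapes over RDF, not with a hard-coded key check.

```python
# Simplified sketch of a metadata FAIRness check over a JSON-LD record.
import json

# Hypothetical minimal profile: properties a dataset description should expose.
REQUIRED = {"name", "description", "identifier", "license"}
RECOMMENDED = {"keywords", "creator", "url"}

def check_metadata(jsonld_text):
    """Return (missing_required, missing_recommended) for one JSON-LD record."""
    record = json.loads(jsonld_text)
    present = set(record)
    return sorted(REQUIRED - present), sorted(RECOMMENDED - present)

doc = json.dumps({
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example dataset",
    "identifier": "doi:10.0000/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
})
missing_req, missing_rec = check_metadata(doc)
# missing_req == ['description']
```

The report produced by such a check is what drives the "notified of missing, necessary, or recommended metadata" behaviour described above.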
Investigating the potential of the semantic web for education: Exploring Wikidata as a learning platform
Educ Inf Technol (Dordr). 2023 Mar 13:1-50. doi: 10.1007/s10639-023-11664-1. Online ahead of print.
ABSTRACT
Wikidata is a free, multilingual, open knowledge base that stores structured, linked data. It has grown rapidly and, as of December 2022, contains over 100 million items and millions of statements, making it the largest semantic knowledge base in existence. By changing the interaction between people and knowledge, Wikidata offers various learning opportunities, leading to new applications in science, technology, and culture. These learning opportunities stem in part from the ability to query this data and ask questions that were difficult to answer in the past. They also stem from the ability to visualize query results, for example on a timeline or a map, which, in turn, helps users make sense of the data and draw additional insights from it. Research on the semantic web as a learning platform, and on Wikidata in the context of education, is almost non-existent, and we are just beginning to understand how to utilize it for educational purposes. This research investigates the Semantic Web as a learning platform, focusing on Wikidata as a prime example. To that end, a multiple-case-study methodology was adopted, demonstrating Wikidata uses by early adopters. Seven semi-structured, in-depth interviews were conducted, from which 10 distinct projects were extracted. A thematic analysis approach revealed eight main uses, as well as benefits of and challenges to engaging with the platform. The results shed light on Wikidata's potential to support lifelong learning, enabling opportunities for improved data literacy and worldwide social impact.
PMID:37361737 | PMC:PMC10009355 | DOI:10.1007/s10639-023-11664-1
PPIntegrator: semantic integrative system for protein-protein interaction and application for host-pathogen datasets
Bioinform Adv. 2023 Jun 1;3(1):vbad067. doi: 10.1093/bioadv/vbad067. eCollection 2023.
ABSTRACT
SUMMARY: Semantic web standards have proved important over the last 20 years in promoting data formalization and interlinking between existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged in recent years in the biological area, such as the broadly used Gene Ontology, which contains metadata to annotate gene function and subcellular location. Another important subject in the biological area is protein-protein interactions (PPIs), which have applications such as protein function inference. Current PPI databases export data in heterogeneous formats, which complicates their integration and analysis. Several ontologies covering concepts of the PPI domain are available to promote interoperability across datasets, but efforts to establish guidelines for automatic semantic data integration and analysis of PPIs in these datasets remain limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict, and validate new potential host-pathogen datasets by transitivity analysis. PPIntegrator contains a data preparation module to organize data from three reference databases, and a triplification and data fusion module to describe provenance information and results. This work provides an overview of the PPIntegrator system applied to integrate and compare host-pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrate some critical queries for analyzing this kind of data and highlight the importance and usage of the semantic data generated by our system.
AVAILABILITY AND IMPLEMENTATION: https://github.com/YasCoMa/ppintegrator, https://github.com/YasCoMa/ppi_validation_process and https://github.com/YasCoMa/predprin.
PMID:37359724 | PMC:PMC10290227 | DOI:10.1093/bioadv/vbad067
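The transitivity analysis mentioned above can be illustrated with a toy sketch (the protein names are invented, and the real pipeline adds semantic description, provenance, and validation steps): if A interacts with B, and B interacts with C, the unobserved pair (A, C) becomes a candidate interaction for downstream validation.

```python
# Toy transitivity-based PPI candidate generation.

def transitive_candidates(edges):
    """edges: set of undirected interaction pairs. Returns novel (a, c) pairs
    linked through a shared partner but not observed directly."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    observed = {frozenset(e) for e in edges}
    candidates = set()
    for partners in neighbors.values():
        for a in partners:
            for c in partners:
                if a < c and frozenset((a, c)) not in observed:
                    candidates.add((a, c))
    return candidates

# Invented host-pathogen interaction edges.
ppi = {("hostA", "pathX"), ("pathX", "hostB"), ("hostB", "pathY")}
new_pairs = sorted(transitive_candidates(ppi))
```

In a real pipeline each candidate pair would then be scored and validated rather than accepted outright.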
3D model retrieval based on interactive attention CNN and multiple features
PeerJ Comput Sci. 2023 Feb 10;9:e1227. doi: 10.7717/peerj-cs.1227. eCollection 2023.
ABSTRACT
3D (three-dimensional) models are widely applied in daily life, in areas such as mechanical manufacturing, games, biochemistry, art, and virtual reality. With the exponential growth of 3D models on the web and in model libraries, there is an increasing need to retrieve a desired model accurately from a freehand sketch, and researchers are applying machine learning to 3D model retrieval. In this article, we combine a semantic feature, shape distribution features, and a gist feature to retrieve 3D models using an interactive attention convolutional neural network (CNN), with the aim of improving retrieval accuracy. Firstly, 2D (two-dimensional) views are extracted from the 3D model at six different angles and converted into line drawings. Secondly, an interactive attention module is embedded into the CNN to extract semantic features, adding data interaction between two CNN layers; the interactive attention CNN extracts effective features from the 2D views, while the gist algorithm and a 2D shape distribution (SD) algorithm extract global features. Thirdly, Euclidean distance is used to calculate the similarity of the semantic, gist, and shape distribution features between the sketch and each 2D view, and the weighted sum of the three similarities is used to compute the overall sketch-view similarity for retrieving the 3D model. This addresses the low retrieval accuracy caused by poor extraction of semantic features. Nearest neighbor (NN), first tier (FT), second tier (ST), F-measure (E(F)), and discounted cumulated gain (DCG) are used to evaluate retrieval performance. Experiments conducted on ModelNet40 show that the proposed method outperforms comparable methods and is feasible for 3D model retrieval.
PMID:37346676 | PMC:PMC10280475 | DOI:10.7717/peerj-cs.1227
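The retrieval step described above, per-feature similarities derived from Euclidean distance and combined by a weighted sum, can be sketched as follows. The feature vectors and weights are invented for illustration; the paper derives its features from the interactive attention CNN, gist, and shape distribution algorithms.

```python
# Sketch of weighted multi-feature similarity between a sketch and a 2D view.
import math

def euclidean_similarity(u, v):
    """Map Euclidean distance to a (0, 1] similarity score."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + dist)

def combined_similarity(feats_a, feats_b, weights):
    """Weighted sum of per-feature similarities; weights sum to 1."""
    return sum(w * euclidean_similarity(feats_a[k], feats_b[k])
               for k, w in weights.items())

weights = {"semantic": 0.5, "gist": 0.3, "sd": 0.2}  # hypothetical weights
sketch     = {"semantic": [0.2, 0.9], "gist": [0.5, 0.1], "sd": [0.3, 0.3]}
view_match = {"semantic": [0.2, 0.9], "gist": [0.5, 0.1], "sd": [0.3, 0.3]}
view_other = {"semantic": [0.9, 0.2], "gist": [0.4, 0.6], "sd": [0.1, 0.8]}

score_match = combined_similarity(sketch, view_match, weights)
score_other = combined_similarity(sketch, view_other, weights)
```

Ranking candidate models by this combined score is what turns three separate feature comparisons into a single retrieval ordering.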
AI-SPedia: a novel ontology to evaluate the impact of research in the field of artificial intelligence
PeerJ Comput Sci. 2022 Sep 22;8:e1099. doi: 10.7717/peerj-cs.1099. eCollection 2022.
ABSTRACT
BACKGROUND: Sharing knowledge such as resources, research results, and scholarly documents, is of key importance to improving collaboration between researchers worldwide. Research results from the field of artificial intelligence (AI) are vital to share because of the extensive applicability of AI to several other fields of research. This has led to a significant increase in the number of AI publications over the past decade. The metadata of AI publications, including bibliometrics and altmetrics indicators, can be accessed by searching familiar bibliographical databases such as Web of Science (WoS), which enables the impact of research to be evaluated and rising researchers and trending topics in the field of AI to be identified.
PROBLEM DESCRIPTION: In general, bibliographical databases have two limitations in terms of the type and form of metadata we aim to improve. First, most bibliographical databases, such as WoS, are more concerned with bibliometric indicators and do not offer a wide range of altmetric indicators to complement traditional bibliometric indicators. Second, the traditional format in which data is downloaded from bibliographical databases limits users to keyword-based searches without considering the semantics of the data.
PROPOSED SOLUTION: To overcome these limitations, we developed a repository, named AI-SPedia. The repository contains semantic knowledge of scientific publications concerned with AI and considers both the bibliometric and altmetric indicators. Moreover, it uses semantic web technology to produce and store data to enable semantic-based searches. Furthermore, we devised related competency questions to be answered by posing smart queries against the AI-SPedia datasets.
RESULTS: The results revealed that AI-SPedia can evaluate the impact of AI research by exploiting knowledge that is not explicitly mentioned but extracted using the power of semantics. Moreover, a simple analysis was performed based on the answered questions to help make research policy decisions in the AI domain. The end product, AI-SPedia, is considered the first attempt to evaluate the impacts of AI scientific publications using both bibliometric and altmetric indicators and the power of semantic web technology.
PMID:37346315 | PMC:PMC10280256 | DOI:10.7717/peerj-cs.1099
A comparison of approaches to accessing existing biological and chemical relational databases via SPARQL
J Cheminform. 2023 Jun 20;15(1):61. doi: 10.1186/s13321-023-00729-5.
ABSTRACT
Current biological and chemical research is increasingly dependent on the reusability of previously acquired data, which typically come from various sources. Consequently, there is a growing need for database systems and databases stored in them to be interoperable with each other. One of the possible solutions to address this issue is to use systems based on Semantic Web technologies, namely on the Resource Description Framework (RDF) to express data and on the SPARQL query language to retrieve the data. Many existing biological and chemical databases are stored in the form of a relational database (RDB). Converting a relational database into the RDF form and storing it in a native RDF database system may not be desirable in many cases. It may be necessary to preserve the original database form, and having two versions of the same data may not be convenient. A solution may be to use a system mapping the relational database to the RDF form. Such a system keeps data in their original relational form and translates incoming SPARQL queries to equivalent SQL queries, which are evaluated by a relational-database system. This review compares different RDB-to-RDF mapping systems with a primary focus on those that can be used free of charge. In addition, it compares different approaches to expressing RDB-to-RDF mappings. The review shows that these systems represent a viable method providing sufficient performance. Their real-life performance is demonstrated on data and queries coming from the neXtProt project.
PMID:37340506 | DOI:10.1186/s13321-023-00729-5
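The direct-mapping idea underlying RDB-to-RDF systems can be sketched in a few lines. This is a deliberately naive illustration: production mappers driven by R2RML or similar mapping languages are far more configurable, and the systems compared in the review translate SPARQL queries to SQL on the fly rather than materialising triples as done here.

```python
# Naive direct mapping: each row becomes a subject IRI, each column a triple.

def rows_to_triples(table, pk, rows, base="http://example.org/"):
    """Map relational rows (dicts) to (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"<{base}{table}/{row[pk]}>"
        for column, value in row.items():
            predicate = f"<{base}{table}#{column}>"
            triples.append((subject, predicate, f'"{value}"'))
    return triples

# Invented chemistry-flavoured table.
compounds = [{"id": 1, "name": "aspirin"}, {"id": 2, "name": "caffeine"}]
triples = rows_to_triples("compound", "id", compounds)
```

A query-rewriting system achieves the same logical view without ever storing these triples, which is why it avoids the duplicate-data problem described above.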
Excess Hospital Burden Among Young People in Contact With Homelessness Services in South Australia: A Prospective Linked Data Study
J Adolesc Health. 2023 Sep;73(3):519-526. doi: 10.1016/j.jadohealth.2023.04.018. Epub 2023 Jun 16.
ABSTRACT
PURPOSE: Youth homelessness remains an ongoing public health issue worldwide. We aimed to describe the burden of emergency department (ED) presentations and hospitalizations among a South Australian population of young people in contact with specialist homelessness services (SHS).
METHODS: This whole-of-population study used de-identified, linked administrative data from the Better Evidence Better Outcomes Linked Data (BEBOLD) platform on all individuals born between 1996 and 1998 (N = 57,509). The Homelessness2Home data collection was used to identify 2,269 young people in contact with SHS at ages 16-17 years. We followed these 57,509 individuals to age 18-19 years and compared ED presentations and hospital separations related to mental health, self-harm, drug and alcohol, injury, oral health, respiratory conditions, diabetes, pregnancy, and potentially preventable hospitalizations between those in contact and not in contact with SHS.
RESULTS: Four percent of young people had contact with SHS at ages 16-17 years. Young people who had contact with SHS were two and three times more likely, respectively, to have presented to an ED and to have been hospitalized, compared with those who did not contact SHS, accounting for 13% of all ED presentations and 16% of all hospitalizations in this age group. Causes of this excess burden included mental health, self-harm, drug and alcohol, diabetes, and pregnancy. On average, young people in contact with SHS experienced an increased length of stay in ED (+0.6 hours) and hospital (+0.7 days) per presentation, and were more likely not to wait for treatment in ED and to self-discharge from hospital.
DISCUSSION: The 4% of young people who contacted SHS at ages 16-17 years accounted for 13% and 16% of all ED presentations and hospitalizations respectively at age 18-19 years. Prioritizing access to stable housing and primary health-care services for adolescents in contact with SHS in Australia could improve health outcomes and reduce health-care costs.
PMID:37330707 | DOI:10.1016/j.jadohealth.2023.04.018
Mortality and cause of death during inpatient psychiatric care in New South Wales, Australia: A retrospective linked data study
J Psychiatr Res. 2023 Aug;164:51-58. doi: 10.1016/j.jpsychires.2023.05.043. Epub 2023 May 23.
ABSTRACT
BACKGROUND: Premature mortality in people with mental illness is well-documented, yet deaths during inpatient psychiatric care have received little research attention. This study investigates mortality rates and causes of death during inpatient psychiatric care in New South Wales (NSW), Australia. Risk factors for inpatient death were also explored.
METHODS: A retrospective cohort study using linked administrative datasets with complete capture of psychiatric admissions in NSW from 2002 to 2012 (n = 421,580) was conducted. Univariate and multivariate random-effects logistic regression analyses were used to explore risk factors for inpatient death.
RESULTS: The mortality rate during inpatient psychiatric care was 1.12 deaths per 1000 episodes of care and appeared to decline over the study period. Suicide accounted for 17% of inpatient deaths, while physical health causes accounted for 75% of all deaths. Thirty percent of these deaths were considered potentially avoidable. In the multivariate model, male sex, unknown address, and several physical health diagnoses were associated with increased odds of inpatient death.
CONCLUSIONS: The mortality rate and number of avoidable deaths during inpatient psychiatric care were substantial and warrant further systemic investigation. This was driven by a dual burden of physical health conditions and suicide. Strategies to improve access to physical health care on psychiatric inpatient wards and prevent inpatient suicide are necessary. A coordinated approach to monitoring psychiatric inpatient deaths in Australia is not currently available and much needed.
PMID:37315354 | DOI:10.1016/j.jpsychires.2023.05.043
Hospital-service use in the last year of life by patients aged ⩾60 years who died of heart failure or cardiomyopathy: A retrospective linked data study
Palliat Med. 2023 Sep;37(8):1232-1240. doi: 10.1177/02692163231180912. Epub 2023 Jun 12.
ABSTRACT
BACKGROUND: Understanding patterns of health care use in the last year of life is critical in health services planning.
AIM: To describe hospital-based service and palliative care use in hospital in the year preceding death for patients who died of heart failure or cardiomyopathy in Queensland from 2008 to 2018 and had at least one hospitalisation in the year preceding death.
DESIGN: A retrospective data linkage study was conducted using administrative health data relating to hospitalisations, emergency department visits and deaths.
PARTICIPANTS AND SETTING: Participants included were those aged ⩾60 years, had a hospitalisation in their last year of life and died of heart failure or cardiomyopathy in Queensland, Australia.
RESULTS: Of the 4697 participants, there were 25,583 hospital admissions. Nearly three-quarters (n = 3420, 73%) of participants were aged ⩾80 years and over half died in hospital (n = 2886, 61%). The median number of hospital admissions in the last year of life was 3 (interquartile range [IQR] 2-5). The care type was recorded as 'acute' for 89% (n = 22,729) of hospital admissions, and few (n = 853, 3%) hospital admissions had a care type recorded as 'palliative.' Of the 4697 participants, 3458 had emergency department visit(s), presenting 10,330 times collectively.
CONCLUSION: In this study, patients who died of heart failure or cardiomyopathy were predominantly aged ⩾80 years and over half died in hospital. These patients experienced repeat acute hospitalisations in the year preceding death. Improving timely access to palliative care services in the outpatient or community setting is needed for patients with heart failure.
PMID:37306096 | DOI:10.1177/02692163231180912
A Systematic Review of Location Data for Depression Prediction
Int J Environ Res Public Health. 2023 May 29;20(11):5984. doi: 10.3390/ijerph20115984.
ABSTRACT
Depression contributes to a wide range of maladjustment problems. With the development of technology, objective measurement of behavioral and functional indicators of depression has become possible through the passive sensing capabilities of digital devices. Focusing on location data, we systematically reviewed the relationship between depression and location data. We searched the Scopus, PubMed, and Web of Science databases by combining terms related to passive sensing and location data with depression. Thirty-one studies were included in this review. Location data demonstrated promising predictive power for depression. Among studies examining the relationship between individual location-data variables and depression, homestay, entropy, and normalized entropy showed the most consistent and significant correlations. Variables of distance, irregularity, and location also showed significant associations in some studies. However, semantic location showed inconsistent results, suggesting that the process of geographical movement is more related to mood changes than is the semantic location itself. Future research should converge on standardized methods for measuring location data across studies.
PMID:37297588 | DOI:10.3390/ijerph20115984
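Two of the location-data variables named in the review, homestay and (normalized) entropy, are straightforward to compute from time spent per place. A hedged sketch with an invented one-day record (the reviewed studies derive these variables from GPS clustering over weeks of sensing):

```python
# Sketch of homestay and location-entropy variables from time-per-place data.
import math

def location_entropy(time_per_place):
    """Shannon entropy (nats) of the distribution of time across places."""
    total = sum(time_per_place.values())
    probs = [t / total for t in time_per_place.values() if t > 0]
    return -sum(p * math.log(p) for p in probs)

def normalized_entropy(time_per_place):
    """Entropy scaled by log(number of places), giving a value in [0, 1]."""
    n = len(time_per_place)
    return location_entropy(time_per_place) / math.log(n) if n > 1 else 0.0

hours = {"home": 18, "work": 5, "gym": 1}  # invented one-day record
homestay = hours["home"] / sum(hours.values())
```

Lower entropy (time concentrated in few places) and higher homestay are the patterns the reviewed studies most consistently associated with depression.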
ARCH: Large-scale Knowledge Graph via Aggregated Narrative Codified Health Records Analysis
medRxiv. 2023 May 21:2023.05.14.23289955. doi: 10.1101/2023.05.14.23289955. Preprint.
ABSTRACT
OBJECTIVE: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes, covering hundreds of thousands of clinical concepts available for research and clinical care. The complex, massive, heterogeneous, and noisy nature of EHR data imposes significant challenges for feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features.
METHODS: The ARCH algorithm first derives embedding vectors from a co-occurrence matrix of all EHR concepts and then generates cosine similarities along with associated p-values to measure the strength of relatedness between clinical features with statistical certainty quantification. In the final step, ARCH performs a sparse embedding regression to remove indirect linkage between entity pairs. We validated the clinical utility of the ARCH knowledge graph, generated from 12.5 million patients in the Veterans Affairs (VA) healthcare system, through downstream tasks including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, as well as sub-typing Alzheimer's disease patients.
RESULTS: ARCH produces high-quality clinical embeddings and KG for over 60,000 EHR concepts, as visualized in the R-shiny powered web API (https://celehs.hms.harvard.edu/ARCH/). The ARCH embeddings attained an average area under the ROC curve (AUC) of 0.926 and 0.861 for detecting pairs of similar EHR concepts when the concepts are mapped to codified data and to NLP data; and 0.810 (codified) and 0.843 (NLP) for detecting related pairs. Based on the p-values computed by ARCH, the sensitivity of detecting similar and related entity pairs are 0.906 and 0.888 under false discovery rate (FDR) control of 5%. For detecting drug side effects, the cosine similarity based on the ARCH semantic representations achieved an AUC of 0.723, while the AUC improved to 0.826 after few-shot training via minimizing the loss function on the training data set. Incorporating NLP data substantially improved the ability to detect side effects in the EHR. For example, based on unsupervised ARCH embeddings, the power of detecting drug-side-effect pairs when using codified data only was 0.15, much lower than the power of 0.51 when using both codified and NLP concepts. Compared to existing large-scale representation learning methods including PubmedBERT, BioBERT and SAPBERT, ARCH attains the most robust performance and substantially higher accuracy in detecting these relationships. Incorporating ARCH-selected features in weakly supervised phenotyping algorithms can improve the robustness of algorithm performance, especially for diseases that benefit from NLP features as supporting evidence. For example, the phenotyping algorithm for depression attained an AUC of 0.927 when using ARCH-selected features but only 0.857 when using codified features selected via the KESER network[1]. In addition, embeddings and knowledge graphs generated from the ARCH network were able to cluster AD patients into two subgroups, where the fast-progression subgroup had a much higher mortality rate.
CONCLUSIONS: The proposed ARCH algorithm generates large-scale high-quality semantic representations and knowledge graph for both codified and NLP EHR features, useful for a wide range of predictive modeling tasks.
PMID:37293026 | PMC:PMC10246054 | DOI:10.1101/2023.05.14.23289955
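The relatedness measure at the core of ARCH, cosine similarity between concept embeddings, can be shown in miniature. The three-dimensional vectors and concept names below are invented; ARCH derives real embeddings from a co-occurrence matrix and additionally attaches p-values and prunes indirect links via sparse embedding regression.

```python
# Minimal cosine-similarity sketch over invented EHR concept embeddings.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

emb = {
    "type2_diabetes": [0.9, 0.1, 0.2],
    "hyperglycemia":  [0.8, 0.2, 0.3],
    "ankle_fracture": [0.1, 0.9, 0.1],
}
related   = cosine(emb["type2_diabetes"], emb["hyperglycemia"])
unrelated = cosine(emb["type2_diabetes"], emb["ankle_fracture"])
```

In the knowledge graph, edges are kept only where such similarity is both high and statistically significant under FDR control.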
HighAltitudeOmicsDB, an integrated resource for high-altitude associated genes and proteins, networks and semantic-similarities
Sci Rep. 2023 Jun 8;13(1):9307. doi: 10.1038/s41598-023-35792-3.
ABSTRACT
Millions of people worldwide visit, live, or work in the hypoxic environment encountered at high altitudes, and it is important to understand the biomolecular responses to this stress in order to design mitigation strategies for high-altitude illnesses. Despite studies spanning more than 100 years, the complex mechanisms controlling acclimatization to hypoxia remain largely unknown. To identify potential diagnostic, therapeutic, and predictive markers for high-altitude (HA) stress, it is important to comprehensively compare and analyse these studies. Towards this goal, HighAltitudeOmicsDB is a unique resource that provides a comprehensive, curated, user-friendly, and detailed compilation of genes/proteins experimentally validated to be associated with various HA conditions, together with their protein-protein interactions (PPIs) and gene ontology (GO) semantic similarities. For each entry, HighAltitudeOmicsDB additionally stores the level of regulation (up/down-regulation), fold change, study control group, duration and altitude of exposure, tissue of expression, source organism, level of hypoxia, method of experimental validation, place/country of study, ethnicity, and geographical location. The database also collates information on disease and drug associations, tissue-specific expression levels, and GO and KEGG pathway associations. The web resource offers interactive PPI networks and GO semantic similarity matrices among the interactors, features that help to offer mechanistic insights into disease pathology. HighAltitudeOmicsDB is thus a unique platform for researchers in this area to explore, fetch, compare, and analyse HA-associated genes/proteins, their PPI networks, and GO semantic similarities. The database is available at http://www.altitudeomicsdb.in.
PMID:37291174 | DOI:10.1038/s41598-023-35792-3
A web framework for information aggregation and management of multilingual hate speech
Heliyon. 2023 May 9;9(5):e16084. doi: 10.1016/j.heliyon.2023.e16084. eCollection 2023 May.
ABSTRACT
Social media platforms have led to the creation of a vast amount of information produced by users and published publicly, facilitating participation in the public sphere, but also giving the opportunity for certain users to publish hateful content. This content mainly involves offensive/discriminative speech towards social groups or individuals (based on racial, religious, gender or other characteristics) and could possibly lead into subsequent hate actions/crimes due to persistent escalation. Content management and moderation in big data volumes can no longer be supported manually. In the current research, a web framework is presented and evaluated for the collection, analysis, and aggregation of multilingual textual content from various online sources. The framework is designed to address the needs of human users, journalists, academics, and the public to collect and analyze content from social media and the web in Spanish, Italian, Greek, and English, without prior training or a background in Computer Science. The backend functionality provides content collection and monitoring, semantic analysis including hate speech detection and sentiment analysis using machine learning models and rule-based algorithms, storing, querying, and retrieving such content along with the relevant metadata in a database. This functionality is assessed through a graphic user interface that is accessed using a web browser. An evaluation procedure was held through online questionnaires, including journalists and students, proving the feasibility of the use of the proposed framework by non-experts for the defined use-case scenarios.
PMID:37215824 | PMC:PMC10196859 | DOI:10.1016/j.heliyon.2023.e16084
Slowdowns in scalar implicature processing: Isolating the intention-reading costs in the Bott & Noveck task
Cognition. 2023 May 19;238:105480. doi: 10.1016/j.cognition.2023.105480. Online ahead of print.
ABSTRACT
An underinformative sentence, such as Some cats are mammals, is trivially true with a semantic (some and perhaps all) reading of the quantifier and false with a pragmatic (some but not all) one, with the latter reliably resulting in longer response times than the former in a truth evaluation task (Bott & Noveck, 2004). Most analyses attribute these prolonged reaction times, or costs, to the steps associated with the derivation of the scalar implicature. In the present work we investigate, across three experiments, whether such slowdowns can be attributed (at least partly) to the participant's need to adjust to the speaker's informative intention. In Experiment 1, we designed a web-based version of Bott & Noveck's (2004) laboratory task that most reliably reproduces its classic results. In Experiment 2 we found that, over the course of an experimental session, participants' pragmatic responses to underinformative sentences are initially reliably long and ultimately comparable to response times of logical interpretations of the same sentences. Such results cannot readily be explained by assuming that implicature derivation is a consistent source of processing effort. In Experiment 3, we further tested our account by examining how response times change as a function of the number of people said to produce the critical utterances. When participants are introduced (via a photo and description) to a single 'speaker', the results are similar to those found in Experiment 2. However, when they are introduced to two 'speakers', with the second 'speaker' appearing midway (after five encounters with underinformative items), we found a significant uptick in pragmatic response latencies to the underinformative item right after participants meet their second speaker (i.e. at their sixth encounter with an underinformative item).
Overall, we interpret these results as suggesting that at least part of the cost typically attributed to the derivation of a scalar implicature is actually a consequence of how participants think about the informative intentions of the person producing the underinformative sentences.
PMID:37210877 | DOI:10.1016/j.cognition.2023.105480
Disaster management ontology- an ontological approach to disaster management automation
Sci Rep. 2023 May 19;13(1):8091. doi: 10.1038/s41598-023-34874-6.
ABSTRACT
The geographical location of a region, together with large-scale environmental changes caused by a variety of factors, exposes it to a wide range of disasters. Floods, droughts, earthquakes, cyclones, landslides, tornadoes, and cloudbursts are all common natural disasters that destroy property and kill people. On average, 0.1% of total deaths globally in the past decade have been due to natural disasters. The National Disaster Management Authority (NDMA), a branch of the Ministry of Home Affairs, plays an important role in disaster management in India by taking responsibility for risk mitigation, response, and recovery from all natural and man-made disasters. This article presents an ontology-based disaster management framework built on the NDMA's responsibility matrix, named the Disaster Management Ontology (DMO). It supports task distribution among the relevant authorities at the various stages of a disaster, as well as a knowledge-driven decision support system for financial assistance to victims. In the proposed DMO, the ontology serves both to integrate knowledge and as a working platform for reasoners, and the Decision Support System (DSS) ruleset is written in the Semantic Web Rule Language (SWRL), which is grounded in first-order logic (FOL). In addition, OntoGraph, a class view of the taxonomy, is used to make the taxonomy more interactive for users.
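The SWRL-style, rule-driven task distribution described above can be sketched in miniature as follows. This is a minimal illustration: the class, property, and authority names (e.g. NDMA_ResponseCell) are hypothetical, not drawn from the actual DMO, and a plain forward-chaining loop stands in for an OWL reasoner.

```python
# Knowledge base as (subject, predicate, object) triples, RDF-style.
kb = {
    ("Flood2023", "rdf:type", "Flood"),
    ("Flood2023", "inPhase", "Response"),
}

# SWRL-like rules: if every antecedent pattern matches (under some variable
# binding), assert the consequent. Variables are strings starting with "?".
rules = [
    # NaturalDisaster(?d) ^ inPhase(?d, Response) -> assignedTo(?d, NDMA_ResponseCell)
    ([("?d", "rdf:type", "NaturalDisaster"), ("?d", "inPhase", "Response")],
     ("?d", "assignedTo", "NDMA_ResponseCell")),
    # Flood(?d) -> NaturalDisaster(?d)  (subclass inference, hand-written here)
    ([("?d", "rdf:type", "Flood")], ("?d", "rdf:type", "NaturalDisaster")),
]

def match(pattern, triple, binding):
    """Try to unify one triple pattern with a concrete triple."""
    b = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if b.get(p, t) != t:
                return None
            b[p] = t
        elif p != t:
            return None
    return b

def forward_chain(kb, rules):
    """Apply the rules to a fixed point and return the enriched triple set."""
    kb = set(kb)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            bindings = [{}]
            for pat in antecedents:
                bindings = [b2 for b in bindings for t in kb
                            if (b2 := match(pat, t, b)) is not None]
            for b in bindings:
                new = tuple(b.get(x, x) for x in consequent)
                if new not in kb:
                    kb.add(new)
                    changed = True
    return kb

inferred = forward_chain(kb, rules)
print(("Flood2023", "assignedTo", "NDMA_ResponseCell") in inferred)  # True
```

In a production setting, a Protégé-authored ontology with an attached SWRL ruleset and a reasoner such as Pellet would play the role of this hand-rolled loop.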
PMID:37208434 | DOI:10.1038/s41598-023-34874-6
The PrescIT Knowledge Graph: Supporting ePrescription to Prevent Adverse Drug Reactions
Stud Health Technol Inform. 2023 May 18;302:551-555. doi: 10.3233/SHTI230203.
ABSTRACT
Adverse Drug Reactions (ADRs) are an important public health issue, as they can impose significant health and monetary burdens. This paper presents the engineering and a use case of a Knowledge Graph supporting the prevention of ADRs as part of a Clinical Decision Support System (CDSS) developed in the context of the PrescIT project. The presented PrescIT Knowledge Graph is built upon Semantic Web technologies, namely the Resource Description Framework (RDF), and integrates widely used, relevant data sources and ontologies, i.e., DrugBank, SemMedDB, the OpenPVSignal Knowledge Graph, and DINTO, resulting in a lightweight and self-contained data source for evidence-based ADR identification.
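How a knowledge graph of this kind could back an ePrescription check can be sketched as follows. The triples and the adr_alerts helper are hypothetical illustrations, not the actual PrescIT graph or API.

```python
# Hypothetical drug-drug interaction and contraindication facts, stored as
# (subject, predicate, object, rationale) tuples in the style of an RDF graph.
kg = [
    ("warfarin", "interactsWith", "aspirin", "increased bleeding risk"),
    ("metformin", "contraindicatedIn", "renal_failure", "lactic acidosis risk"),
]

def adr_alerts(prescribed_drug, current_drugs, conditions):
    """Return ADR warnings for a new prescription given the patient context.
    (Interactions are checked in one direction only for brevity; real
    interaction relations are symmetric.)"""
    alerts = []
    for s, p, o, reason in kg:
        if p == "interactsWith" and s == prescribed_drug and o in current_drugs:
            alerts.append(f"{s} + {o}: {reason}")
        if p == "contraindicatedIn" and s == prescribed_drug and o in conditions:
            alerts.append(f"{s} in {o}: {reason}")
    return alerts

print(adr_alerts("warfarin", {"aspirin"}, set()))
# ['warfarin + aspirin: increased bleeding risk']
```

A CDSS would run such a check at prescription time and surface the rationale to the clinician alongside the alert.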
PMID:37203746 | DOI:10.3233/SHTI230203
An Annotation Workbench for Semantic Annotation of Data Collection Instruments
Stud Health Technol Inform. 2023 May 18;302:108-112. doi: 10.3233/SHTI230074.
ABSTRACT
Semantic interoperability, i.e., the ability to automatically interpret shared information in a meaningful way, is one of the most important requirements for analysing data from different sources. In the area of clinical and epidemiological studies, the target domain of the National Research Data Infrastructure for Personal Health Data (NFDI4Health), interoperability of data collection instruments such as case report forms (CRFs), data dictionaries, and questionnaires is critical. Retrospective integration of semantic codes into study metadata at the item level is important, as ongoing or completed studies contain valuable information that should be preserved. We present a first version of a Metadata Annotation Workbench to support annotators in dealing with a variety of complex terminologies and ontologies. User-driven development with users from the fields of nutritional epidemiology and chronic diseases ensured that the service fulfills the basic requirements for semantic metadata annotation software for these NFDI4Health use cases. The web application can be accessed using a web browser, and the source code of the software is available under an open-source MIT license.
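Item-level semantic annotation of the kind the workbench supports can be sketched as a minimal data model. The item name and the terminology code shown are illustrative examples, not the workbench's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    system: str   # terminology or ontology, e.g. "SNOMED CT"
    code: str     # concept identifier within that terminology
    label: str    # human-readable concept label

@dataclass
class Item:
    name: str                          # CRF / data-dictionary item name
    annotations: list = field(default_factory=list)

# Retrospective annotation of an existing study's metadata at the item level:
# a variable from a completed study is linked to a terminology concept.
item = Item("systolic_bp")
item.annotations.append(
    Annotation("SNOMED CT", "271649006", "Systolic blood pressure"))
```

Storing annotations per item, rather than per instrument, is what makes variables from different studies comparable during later cross-study analysis.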
PMID:37203619 | DOI:10.3233/SHTI230074
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
IEEE Trans Pattern Anal Mach Intell. 2023 May 18;PP. doi: 10.1109/TPAMI.2023.3277122. Online ahead of print.
ABSTRACT
Non-autoregressive (NAR) generation, first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both the machine learning and natural language processing communities. While NAR generation can significantly accelerate inference for machine translation, the speedup comes at the cost of reduced translation accuracy compared with its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been proposed to bridge the accuracy gap between NAR and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize the efforts of NAT into several groups, including data manipulation, modeling methods, training criteria, decoding algorithms, and benefits from pre-trained models. Furthermore, we briefly review other applications of NAR models beyond machine translation, such as grammatical error correction, text summarization, text style transfer, dialogue, semantic parsing, and automatic speech recognition. We also discuss potential directions for future exploration, including reducing the dependency on knowledge distillation (KD), reasonable training objectives, pre-training for NAR, and wider applications. We hope this survey helps researchers capture the latest progress in NAR generation, inspires the design of advanced NAR models and algorithms, and enables industry practitioners to choose appropriate solutions for their applications. The web page of this survey is at https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications.
PMID:37200120 | DOI:10.1109/TPAMI.2023.3277122
Integrating collective know-how for multicriteria decision support in agrifood chains-application to cheesemaking
Front Artif Intell. 2023 Apr 28;6:1145007. doi: 10.3389/frai.2023.1145007. eCollection 2023.
ABSTRACT
Agrifood chain processes rely on a wealth of knowledge, know-how, and experience forged over time. This collective expertise must be shared to improve food quality. Here we test the hypothesis that it is possible to design and implement a comprehensive methodology for creating a knowledge base that integrates collective expertise, while also using it to recommend the technical actions required to improve food quality. To test this hypothesis, we first list the functional specifications defined in collaboration with several partners (technical centers, vocational training schools, producers) over the course of several projects carried out in recent years. Secondly, we propose an innovative core ontology that uses the standard languages of the Semantic Web to represent knowledge in the form of decision trees. These decision trees depict potential causal relationships between situations of interest and provide recommendations for managing them through technological actions, together with a collective assessment of the efficiency of those actions. We show how mind map files created with mind-mapping tools are automatically translated into an RDF knowledge base using the core ontological model. Thirdly, a model to aggregate individual assessments provided by technicians and associated with technical action recommendations is proposed and evaluated. Finally, a multicriteria decision-support system (MCDSS) using the knowledge base is presented. It consists of an explanatory view allowing navigation in a decision tree and an action view for multicriteria filtering and identification of possible side effects. The different types of MCDSS-delivered answers to a query expressed in the action view are explained. The MCDSS graphical user interface is presented through a real use case. Experimental assessments have been performed and confirm that the tested hypothesis is relevant.
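The mind-map-to-RDF translation and the aggregation of individual assessments can be sketched as follows. The cheesemaking content, predicate names, and the simple mean used for aggregation are hypothetical stand-ins for the paper's actual ontology and aggregation model.

```python
from statistics import mean

# A mind map as nested dicts: situation of interest -> possible cause
# -> recommended technological action.
mind_map = {"bitter taste": {"over-acidification": "shorten ripening time"}}

def to_triples(mm):
    """Flatten the mind map into (subject, predicate, object) triples,
    mirroring the automatic translation into an RDF knowledge base."""
    triples = []
    for situation, causes in mm.items():
        for cause, action in causes.items():
            triples.append((situation, "hasPossibleCause", cause))
            triples.append((cause, "recommends", action))
    return triples

# Individual efficiency assessments (0-1 scale) from several technicians,
# aggregated here with a plain mean as a stand-in for the proposed model.
assessments = {"shorten ripening time": [0.8, 0.6, 0.9]}
collective = {action: mean(scores) for action, scores in assessments.items()}

print(to_triples(mind_map))
print(collective)
```

The explanatory view of the MCDSS would then navigate the situation-to-cause-to-action triples, while the action view would filter actions by their aggregated collective score.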
PMID:37187891 | PMC:PMC10175634 | DOI:10.3389/frai.2023.1145007