Semantic Web
A knowledge graph-based data harmonization framework for secondary data reuse
Comput Methods Programs Biomed. 2023 Nov 10;243:107918. doi: 10.1016/j.cmpb.2023.107918. Online ahead of print.
ABSTRACT
BACKGROUND AND OBJECTIVE: The adoption of new technologies in clinical care systems has made a great amount of valuable data available. However, these data are usually heterogeneous and must be harmonized before they can be integrated and analysed. We propose a semantic-driven harmonization framework that (1) enables the meaningful sharing and integration of healthcare data across institutions and (2) facilitates the analysis and exploitation of the shared data.
METHODS: The framework includes an ontology-based common data model (i.e. SCDM), a data transformation pipeline and a semantic query system. Heterogeneous datasets, mapped to different terminologies, are integrated by using an ontology-based infrastructure rooted in a top-level ontology. A graph database is generated from these mappings, and a web-based semantic query system facilitates data exploration.
RESULTS: Several datasets from different European institutions have been integrated by using the framework in the context of the European H2020 Precise4Q project. Through the query system, data scientists were able to explore data and use it for building machine learning models.
CONCLUSIONS: The flexible data representation using RDF, together with the formal semantic underpinning provided by the SCDM, has enabled the semantic integration, query and advanced exploitation of heterogeneous data in the context of the Precise4Q project.
PMID:37981455 | DOI:10.1016/j.cmpb.2023.107918
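The mapping step described in this framework's METHODS can be pictured with a minimal sketch, assuming two hypothetical sources that code sex and diagnoses with their own local codes. Each source gets its own mapping table onto shared concept identifiers (the `ex:` names are invented for illustration, not SCDM terms), records become subject-predicate-object triples, and a single pattern then matches patients from both sources. The real framework maps to standard terminologies and loads RDF into a graph database; none of that is reproduced here.

```python
# Hypothetical local-code -> shared-concept mappings, one table per source.
MAPPINGS = {
    "site_a": {"SEX_M": "ex:Male", "SEX_F": "ex:Female", "HTN": "ex:Hypertension"},
    "site_b": {"1": "ex:Male", "2": "ex:Female", "I10": "ex:Hypertension"},
}

def harmonize(source, records):
    """Turn source-specific records into (subject, predicate, object) triples."""
    table = MAPPINGS[source]
    triples = []
    for rec in records:
        subject = f"ex:{source}/patient/{rec['id']}"
        triples.append((subject, "ex:hasSex", table[rec["sex"]]))
        for code in rec["diagnoses"]:
            # Codes without a mapping are skipped; a real pipeline would report them.
            if code in table:
                triples.append((subject, "ex:hasDiagnosis", table[code]))
    return triples

graph = harmonize("site_a", [{"id": "17", "sex": "SEX_F", "diagnoses": ["HTN"]}])
graph += harmonize("site_b", [{"id": "42", "sex": "1", "diagnoses": ["I10", "E11"]}])

# After harmonization, one pattern spans both sources.
hypertensive = sorted(s for s, p, o in graph
                      if p == "ex:hasDiagnosis" and o == "ex:Hypertension")
print(hypertensive)
```

Keeping one mapping table per source means a new institution only needs its own table; the query side stays unchanged.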
Factors Influencing the Answerability and Popularity of a Health-Related Post in the Question-and-Answer Community: Infodemiology Study of Metafilter
J Med Internet Res. 2023 Nov 17;25:e48858. doi: 10.2196/48858.
ABSTRACT
BACKGROUND: The web-based health question-and-answer (Q&A) community has become a primary and convenient way for people to access health information and knowledge directly.
OBJECTIVE: The objective of our study is to investigate how content-related, context-related, and user-related variables influence the answerability and popularity of health-related posts based on a user-dynamic, social network, and topic-dynamic semantic network, respectively.
METHODS: Full-scale data on health consultations were acquired from the Metafilter Q&A community. These variables were designed in terms of context, content, and contributors. Negative binomial regression models were used to examine the influence of these variables on the favorite and comment counts of a health-related post.
RESULTS: A total of 18,099 post records were collected from a well-known Q&A community. The main findings were as follows. Content-related variables had a strong impact on both the answerability and popularity of posts. Notably, sentiment values were positively related to favorite counts and negatively associated with comment counts. User-related variables significantly affected the answerability and popularity of posts. Specifically, participation intensity was positively related to comment count and negatively associated with favorite count. Sociability breadth had a significant impact only on comment count. Context-related variables had a more substantial influence on the popularity of posts than on their answerability. Topic diversity was negatively correlated with comment count and positively correlated with favorite count, whereas topic intensity had a significant effect only on favorite count.
CONCLUSIONS: The results not only reveal the factors influencing the answerability and popularity of health-related posts, which can help users obtain high-quality answers more efficiently, but also provide a theoretical basis for platform operators to enhance user engagement within health Q&A communities.
PMID:37976090 | DOI:10.2196/48858
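Negative binomial regression, as used in this study's METHODS, models a count (such as favorite or comment count) with mean mu = exp(x . beta) and, in the common NB2 form assumed here, variance mu + alpha * mu^2, which accommodates overdispersed counts. A minimal sketch of the NB2 log-likelihood that a fitting routine would maximize is shown below; the design matrix, counts, coefficients and alpha are hypothetical, and the study's actual covariates are not reproduced.

```python
import math

def nb2_logpmf(y, mu, alpha):
    """Log-probability of count y under NB2 with mean mu > 0 and dispersion alpha > 0.

    The variance is mu + alpha * mu**2, so alpha > 0 allows overdispersion
    relative to a Poisson model.
    """
    r = 1.0 / alpha  # "size" parameter of the negative binomial
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def nb2_loglik(beta, X, y, alpha):
    """Total NB2 log-likelihood with a log link: mu_i = exp(x_i . beta)."""
    total = 0.0
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
        total += nb2_logpmf(yi, mu, alpha)
    return total

# Hypothetical design: intercept plus one standardized predictor (e.g. a sentiment score).
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1, 2, 5, 9]
ll = nb2_loglik([0.8, 0.6], X, y, alpha=0.3)
print(ll)
```

In practice a library optimizer would search over beta and alpha to maximize this quantity; the sketch only evaluates it for fixed values.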
Digital Personal Health Coaching Platform for Promoting Human Papillomavirus Infection Vaccinations and Cancer Prevention: Knowledge Graph-Based Recommendation System
JMIR Form Res. 2023 Nov 15;7:e50210. doi: 10.2196/50210.
ABSTRACT
BACKGROUND: Health promotion can empower populations to gain more control over their well-being by using digital interventions that focus on preventing the root causes of diseases. Digital platforms for personalized health coaching can improve health literacy and information-seeking behavior, leading to better health outcomes. Personal health records have been designed to enhance patients' self-management of a disease or condition. Existing personal health records have been mostly designed and deployed as a supplementary service that acts as views into electronic health records.
OBJECTIVE: We aim to overcome some of the limitations of electronic health records. This study aims to design and develop a personal health library (PHL) that generates personalized recommendations for human papillomavirus (HPV) vaccine promotion and cancer prevention.
METHODS: We have designed a proof-of-concept prototype of the Digital Personal Health Librarian, which leverages machine learning; natural language processing; and several innovative technological infrastructures, including the Semantic Web, social linked data, web application programming interfaces, and hypermedia-based discovery, to generate a personal health knowledge graph.
RESULTS: We have designed and implemented a proof-of-concept prototype to demonstrate how the PHL can be used to store an individual's health data, for example, as a personal health knowledge graph. This is integrated with web-scale knowledge to support HPV vaccine promotion and prevent HPV-associated cancers among adolescents and their caregivers. We also demonstrated how the Digital Personal Health Librarian uses the PHL to provide evidence-based insights and knowledge-driven explanations that are personalized and inform health decision-making.
CONCLUSIONS: Digital platforms such as the PHL can be instrumental in improving precision health promotion and education strategies that address population-specific needs (ie, health literacy, digital competency, and language barriers) and empower individuals by facilitating knowledge acquisition to make healthy choices.
PMID:37966885 | DOI:10.2196/50210
Factors Associated With Transition From Community to Permanent Residential Aged Care Following Stroke: A Linked Registry Data Study
Stroke. 2023 Dec;54(12):3117-3127. doi: 10.1161/STROKEAHA.123.043972. Epub 2023 Nov 13.
ABSTRACT
BACKGROUND: Understanding factors that influence the transition to permanent residential aged care following a stroke or transient ischemic attack may inform strategies to support people to live at home longer. We aimed to identify the demographic, clinical, and system factors that may influence the transition from living in the community to permanent residential care in the 6 to 18 months following stroke/transient ischemic attack.
METHODS: Linked data cohort analysis of adults from Queensland and Victoria aged ≥65 years and registered in the Australian Stroke Clinical Registry (2012-2016) with a clinical diagnosis of stroke/transient ischemic attack and living in the community in the first 6 months post-hospital discharge. Participant data were linked with primary care, pharmaceutical, aged care, death, and hospital data. Multivariable survival analysis was performed to determine demographic, clinical, and system factors associated with the transition to permanent residential care in the 6 to 18 months following stroke, with death modeled as a competing risk.
RESULTS: Of 11 176 included registrants (median age, 77.2 years; 44% female), 520 (5%) transitioned to permanent residential care between 6 and 18 months. Factors most associated with transition included a history of urinary tract infections (subhazard ratio [SHR], 1.41 [95% CI, 1.16-1.71]), dementia (SHR, 1.66 [95% CI, 1.14-2.42]), increasing age (65-74 versus 85+ years; SHR, 1.75 [95% CI, 1.31-2.34]), living in regional Australia (SHR, 1.31 [95% CI, 1.08-1.60]), and aged care service approvals: respite (SHR, 4.54 [95% CI, 3.51-5.85]) and high-level home support (SHR, 1.80 [95% CI, 1.30-2.48]). Protective factors included being dispensed antihypertensive medications (SHR, 0.68 [95% CI, 0.53-0.87]), seeing a cardiologist (SHR, 0.72 [95% CI, 0.57-0.91]) following stroke, and less severe stroke (SHR, 0.71 [95% CI, 0.58-0.88]).
CONCLUSIONS: Our findings provide an improved understanding of factors that influence the transition from community to permanent residential care following stroke and can inform future strategies designed to delay this transition.
PMID:37955141 | DOI:10.1161/STROKEAHA.123.043972
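Subhazard ratios such as those in this study come from a competing-risks model (typically Fine-Gray), whose target quantity is the cumulative incidence function (CIF) of one event when another event, here death, can pre-empt it. Fitting a Fine-Gray regression is beyond a short example, but the nonparametric Aalen-Johansen CIF estimate fits in a few lines. The 0/1/2 event coding and the toy follow-up times below are assumptions for illustration, not registry data.

```python
def cumulative_incidence(times, events, cause, horizon):
    """Aalen-Johansen estimate of P(T <= horizon and event type == cause).

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    This 0/1/2 coding is an assumption for the sketch, not the registry's.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0  # probability of being free of any event just before time t
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_cause = d_any = n_t = 0
        # Process all observations tied at time t together.
        while i < len(data) and data[i][0] == t:
            event = data[i][1]
            d_any += int(event != 0)
            d_cause += int(event == cause)
            n_t += 1
            i += 1
        cif += surv * d_cause / at_risk
        surv *= 1 - d_any / at_risk
        at_risk -= n_t
    return cif

times = [2, 3, 3, 5, 8, 10]   # toy follow-up times (months)
events = [1, 2, 0, 1, 0, 2]   # 1 = residential care, 2 = death, 0 = censored
care_by_12 = cumulative_incidence(times, events, cause=1, horizon=12)
death_by_12 = cumulative_incidence(times, events, cause=2, horizon=12)
```

Treating deaths as censored and reporting one minus a Kaplan-Meier estimate would overstate the probability of entering care, which is why competing-risk methods are used.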
Normalization of drug and therapeutic concepts with Thera-Py
JAMIA Open. 2023 Nov 8;6(4):ooad093. doi: 10.1093/jamiaopen/ooad093. eCollection 2023 Dec.
ABSTRACT
OBJECTIVE: The diversity of nomenclature and naming strategies makes therapeutic terminology difficult to manage and harmonize. As the number and complexity of available therapeutic ontologies continues to increase, the need for harmonized cross-resource mappings is becoming increasingly apparent. This study creates harmonized concept mappings that enable the linking together of like-concepts despite source-dependent differences in data structure or semantic representation.
MATERIALS AND METHODS: For this study, we created Thera-Py, a Python package and web API that constructs searchable concepts for drugs and therapeutic terminologies using 9 public resources and thesauri. By using a directed graph approach, Thera-Py captures commonly used aliases, trade names, annotations, and associations for any given therapeutic and combines them under a single concept record.
RESULTS: We highlight the creation of 16 069 unique merged therapeutic concepts from 9 distinct sources using Thera-Py and observe that the overlap of therapeutic concepts present in 2 or more knowledge bases increases from 9.8% to 41.8% after harmonization with Thera-Py.
CONCLUSION: We observe that Thera-Py tends to normalize therapeutic concepts to their underlying active ingredients (excluding nondrug therapeutics, eg, radiation therapy, biologics), and unifies all available descriptors regardless of ontological origin.
PMID:37954974 | PMC:PMC10637840 | DOI:10.1093/jamiaopen/ooad093
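Thera-Py is described as using a directed graph to combine aliases and cross-references under a single concept record. As a simplified stand-in, the sketch below merges records into undirected connected components with a union-find: any two records linked by a cross-reference end up in the same merged concept. The record identifiers are hypothetical placeholders, and the sketch does not attempt Thera-Py's normalization to active ingredients.

```python
from collections import defaultdict

def merge_concepts(records):
    """Group records that are linked, directly or transitively, by cross-references.

    records: dict mapping a record id to the list of ids it cross-references.
    Returns the merged concepts as sorted lists of record ids.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for rec_id, xrefs in records.items():
        find(rec_id)  # register records that have no cross-references
        for xref in xrefs:
            union(rec_id, xref)

    groups = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)
    return sorted(sorted(g) for g in groups.values())

records = {  # hypothetical ids from three sources
    "srcA:001": ["srcB:002"],
    "srcB:002": ["srcC:003"],
    "srcC:003": [],
    "srcA:004": [],
}
merged = merge_concepts(records)
print(merged)
```

One trade-off of undirected merging is that a single erroneous cross-reference joins two otherwise unrelated concepts.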
Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology
Med Image Anal. 2023 Nov 7;91:103021. doi: 10.1016/j.media.2023.103021. Online ahead of print.
ABSTRACT
The escalating demand for artificial intelligence (AI) systems that can monitor and supervise human errors and abnormalities in healthcare presents unique challenges. Recent advances in vision-language models reveal the challenges of monitoring AI by understanding both visual and textual concepts and their semantic correspondences. However, there has been limited success in the application of vision-language models in the medical domain. Current vision-language models and learning strategies for photographic images and captions call for a web-scale data corpus of image and text pairs which is not often feasible in the medical domain. To address this, we present a model named medical cross-attention vision-language model (Medical X-VL), whose key components are tailored to the medical domain. The model is based on the following components: self-supervised unimodal models in medical domain and a fusion encoder to bridge them, momentum distillation, sentencewise contrastive learning for medical reports, and sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for monitoring AI, ranging from the zero-shot classification to zero-shot error correction. Our model outperformed current state-of-the-art models on two medical image datasets, suggesting a novel clinical application of our monitoring AI model to alleviate human errors. Our method demonstrates a more specialized capacity for fine-grained understanding, which presents a distinct advantage particularly applicable to the medical domain.
PMID:37952385 | DOI:10.1016/j.media.2023.103021
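The abstract names sentencewise contrastive learning but does not give the loss. A common contrastive objective is InfoNCE over cosine similarities, assumed here: each image is pulled toward its paired text and pushed away from the other texts in the batch. The pure-Python sketch below uses made-up two-dimensional embeddings and temperature, and omits Medical X-VL's momentum distillation and hard negative mining.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(image_embs, text_embs, temperature=0.1):
    """Image-to-text InfoNCE: text i is the positive for image i, all other texts are negatives."""
    loss = 0.0
    for i, img in enumerate(image_embs):
        logits = [cosine(img, txt) / temperature for txt in text_embs]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]  # negative log-softmax of the positive pair
    return loss / len(image_embs)

images = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # texts matched to the right images
swapped = [[0.1, 0.9], [0.9, 0.1]]  # texts matched to the wrong images
print(info_nce(images, aligned), info_nce(images, swapped))
```

The loss is small when each pair is more similar than the in-batch negatives and large when the pairing is wrong, which is the signal contrastive training exploits.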
MMLKG: Knowledge Graph for Mathematical Definitions, Statements and Proofs
Sci Data. 2023 Nov 10;10(1):791. doi: 10.1038/s41597-023-02681-3.
ABSTRACT
Knowledge graphs (KGs) are increasingly important in many areas. However, there is a lack of genuinely interoperable datasets representing mathematics that allow for information exchange between datasets in the Web ecosystem. In this paper, we address this matter based on the Mizar Mathematical Library (MML), a collection of articles written in the Mizar language. MML includes definitions and theorems with proofs to which authors can easily refer from newly written Mizar articles. However, extracting information directly from Mizar scripts by external projects is not straightforward. Therefore, we propose a new data storage and retrieval approach based on the Knowledge Organization System (KOS) model and the KG concept that provides a way to organize and access knowledge. We present Mizar Mathematical Library Knowledge Graph (MMLKG), a thesaurus for describing mathematical objects. MMLKG supports semantic interoperability and allows linking data from different sources, e.g., Wikidata. Moreover, it satisfies the FAIR data principles. The data is publicly available via a Cypher endpoint.
PMID:37949866 | DOI:10.1038/s41597-023-02681-3
Ontology-driven analysis of marine metagenomics: what more can we learn from our data?
Gigascience. 2022 Dec 28;12:giad088. doi: 10.1093/gigascience/giad088.
ABSTRACT
BACKGROUND: The proliferation of metagenomic sequencing technologies has enabled novel insights into the functional genomic potentials and taxonomic structure of microbial communities. However, cyberinfrastructure efforts to manage and enable the reproducible analysis of sequence data have not kept pace. Thus, there is increasing recognition of the need to make metagenomic data discoverable within machine-searchable frameworks compliant with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship. Although a variety of metagenomic web services exist, none currently leverage the hierarchically structured terminology encoded within common life science ontologies to programmatically discover data.
RESULTS: Here, we integrate large-scale marine metagenomic datasets with community-driven life science ontologies into a novel FAIR web service. This approach enables the retrieval of data discovered by intersecting the knowledge represented within ontologies against the functional genomic potential and taxonomic structure computed from marine sequencing data. Our findings highlight various microbial functional and taxonomic patterns relevant to the ecology of prokaryotes in various aquatic environments.
CONCLUSIONS: In this work, we present and evaluate a novel Semantic Web architecture that can be used to ask novel biological questions of existing marine metagenomic datasets. Finally, the FAIR ontology searchable data products provided by our API can be leveraged by future research efforts.
PMID:37941395 | DOI:10.1093/gigascience/giad088
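Using the hierarchically structured terminology of an ontology to discover data usually means that a query for a term also returns samples annotated with any of its descendant terms. A minimal sketch of this subsumption-based retrieval follows, assuming a hypothetical toy is_a hierarchy and sample annotations; the labels are illustrative only and are not identifiers or data from the paper's service.

```python
from collections import defaultdict

# Toy is_a edges, child -> parent (hypothetical labels, not real ontology IDs).
IS_A = {
    "coastal sea water": "sea water",
    "sea water": "saline water",
    "hypersaline water": "saline water",
    "saline water": "water",
    "fresh water": "water",
}

# Hypothetical sample annotations.
SAMPLES = {
    "S1": "coastal sea water",
    "S2": "fresh water",
    "S3": "hypersaline water",
    "S4": "sea water",
}

def descendants(term):
    """All terms subsumed by `term`, including the term itself."""
    children = defaultdict(list)
    for child, parent in IS_A.items():
        children[parent].append(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def samples_for(term):
    """Samples annotated with `term` or any of its descendants."""
    terms = descendants(term)
    return sorted(s for s, t in SAMPLES.items() if t in terms)

print(samples_for("saline water"))
```

An exact string match on "saline water" would return no samples here; walking the hierarchy returns the three saline samples.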
Modulating Complex Sentence Processing in Aphasia Through Attention and Semantic Networks
J Speech Lang Hear Res. 2023 Dec 11;66(12):5011-5035. doi: 10.1044/2023_JSLHR-23-00298. Epub 2023 Nov 7.
ABSTRACT
PURPOSE: Lexical processing impairments such as delayed and reduced activation of lexical-semantic information have been linked to syntactic processing disruptions and sentence comprehension deficits in individuals with aphasia (IWAs). Lexical-level deficits can also preclude successful lexical encoding during sentence processing and amplify the processing costs of similarity-based interference during syntactic retrieval. We investigate whether two manipulations to engage attention and pre-activate semantic features of a target (to-be-retrieved) noun will (a) boost lexical activation during initial lexical encoding and (b) facilitate syntactic dependency linking through improved resolution of interference in IWAs and neurologically unimpaired age-matched controls (AMCs).
METHOD: Eye-tracking-while-listening with a visual world paradigm was used to investigate whether semantic and attentional manipulations modulated initial lexical processing and downstream syntactic retrieval of the direct-object noun in object-relative sentences.
RESULTS: In the attention and semantic manipulations, the AMC group showed no changes in initial lexical access levels; however, gaze patterns revealed clear facilitations in dependency linking and interference resolution. In the IWA group, the attentional cue increased and maintained activation of N1 (the target noun) with modest facilitations in dependency linking. In the semantic condition, IWA results showed a greater degree of facilitation during dependency linking.
CONCLUSIONS: The results suggest that attention and semantic activation are parameters that may be manipulated to strengthen encoding of lexical representations to facilitate retrieval (i.e., dependency linking) and mitigate similarity-based interference. In IWAs, these manipulations may help to reduce lexical processing deficits that can preclude successful encoding.
PMID:37934886 | DOI:10.1044/2023_JSLHR-23-00298
Dimensions of equality in uptake of COVID-19 vaccination in Wales, UK: A multivariable linked data population analysis
Vaccine. 2023 Nov 30;41(49):7333-7341. doi: 10.1016/j.vaccine.2023.10.066. Epub 2023 Nov 4.
ABSTRACT
Vaccination has proven effective at preventing severe outcomes of COVID-19 infection, and uptake in the population has been high in Wales. However, there is a risk that high-level vaccination coverage statistics may mask hidden inequalities in under-served populations, many of whom may be at increased risk of severe outcomes of COVID-19 infection. The study population included 1,436,229 individuals aged 18 years and over who were alive and resident in Wales as of 31 July 2022; individuals who were immunosuppressed or living in care homes were excluded. We compared people who had received one or more vaccinations with those with no vaccination using linked data from nine datasets within the Secure Anonymised Information Linkage (SAIL) databank. Multivariable analysis was undertaken to determine the impact of a range of sociodemographic characteristics on vaccination uptake, including ethnicity, country of birth, severe mental illness, homelessness and substance use. We found that overall uptake of the first dose of COVID-19 vaccination was high in Wales (92.1%), and highest among those aged 80 years and over and among females. Being aged under 40 years, household composition (aOR 0.38, 95% CI 0.35-0.41 for a household of 10 or more people compared with a two-adult household) and being born outside the UK (aOR 0.44, 95% CI 0.43-0.46) had the strongest negative associations with vaccination uptake, followed by a history of substance misuse (aOR 0.45, 95% CI 0.44-0.46). Despite high-level population coverage in Wales, significant inequalities remain across several under-served groups. Factors associated with vaccination uptake should not be considered in isolation, to avoid drawing incorrect conclusions. Ensuring equitable access to vaccination is essential to protecting under-served groups from COVID-19, and further work is needed to address these gaps in coverage, with a focus on tailored vaccination pathways and advocacy, using trusted partners and communities.
PMID:37932133 | DOI:10.1016/j.vaccine.2023.10.066
CAENet: Contrast adaptively enhanced network for medical image segmentation based on a differentiable pooling function
Comput Biol Med. 2023 Dec;167:107578. doi: 10.1016/j.compbiomed.2023.107578. Epub 2023 Oct 17.
ABSTRACT
Pixel differences between classes with low contrast in medical image semantic segmentation tasks often lead to confusion in category classification, posing a typical challenge for recognition of small targets. To address this challenge, we propose a Contrastive Adaptive Augmented Semantic Segmentation Network with a differentiable pooling function. Firstly, an Adaptive Contrast Augmentation module is constructed to automatically extract local high-frequency information, thereby enhancing image details and accentuating the differences between classes. Subsequently, the Frequency-Efficient Channel Attention mechanism is designed to select useful features in the encoding phase, where multifrequency information is employed to extract channel features. One-dimensional convolutional cross-channel interactions are adopted to reduce model complexity. Finally, a differentiable approximation of max pooling is introduced in order to replace standard max pooling, strengthening the connectivity between neurons and reducing information loss caused by downsampling. We evaluated the effectiveness of our proposed method through several ablation experiments and comparison experiments under homogeneous conditions. The experimental results demonstrate that our method competes favorably with other state-of-the-art networks on five medical image datasets, including four public medical image datasets and one clinical image dataset. It can be effectively applied to medical image segmentation.
PMID:37918260 | DOI:10.1016/j.compbiomed.2023.107578
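The abstract does not state which differentiable approximation of max pooling CAENet uses. One standard choice, assumed here, is the scaled log-sum-exp (1/beta) * log(sum(exp(beta * x))), which lies between max(x) and max(x) + log(n)/beta and, unlike a hard max, passes gradient to every element of the window. The sketch applies it over non-overlapping 2x2 windows; beta, the window size and the feature map are illustrative.

```python
import math

def smooth_max(values, beta):
    """Log-sum-exp approximation of max(values); it approaches max as beta grows."""
    m = max(values)  # shift by the max so exp() cannot overflow
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

def smooth_max_pool2x2(image, beta=10.0):
    """Downsample a 2D list with even height and width using smooth max over 2x2 windows."""
    rows, cols = len(image), len(image[0])
    return [
        [smooth_max([image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1]], beta)
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]

feature_map = [
    [0.1, 0.9, 0.2, 0.2],
    [0.3, 0.4, 0.2, 0.2],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 0.8],
]
pooled = smooth_max_pool2x2(feature_map, beta=10.0)
print(pooled)
```

Smaller beta spreads the gradient more evenly across a window; larger beta behaves more like standard max pooling.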
Shape Expressions (ShEx) Schemas for the FHIR R5 Specification
J Biomed Inform. 2023 Oct 31:104534. doi: 10.1016/j.jbi.2023.104534. Online ahead of print.
ABSTRACT
This work continues along a visionary path of using Semantic Web standards such as RDF and ShEx to make healthcare data easier to integrate for research and leading-edge patient care. The work extends the ability to use ShEx schemas to validate FHIR RDF data, thereby enhancing the semantic web ecosystem for working with FHIR and non-FHIR data using the same ShEx validation framework. It updates FHIR's ShEx schemas to fix outstanding issues and reflect changes in the definition of FHIR RDF. In addition, it experiments with expressing FHIRPath constraints (which are not captured in the XML or JSON schemas) in ShEx schemas. These extended ShEx schemas were incorporated into the FHIR R5 specification and used to successfully validate FHIR R5 examples that are included with the FHIR specification, revealing several errors in the examples.
PMID:37918622 | DOI:10.1016/j.jbi.2023.104534
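Part of what a ShEx shape checks is which predicates a node may carry and how many times each may occur, using {min,max} cardinalities. The toy Python checker below mimics only that cardinality aspect on a list of triples, to show the kind of error a shape reports. It is not a ShEx implementation, and the shape and predicate names are simplified placeholders styled on FHIR RDF, not the R5 schemas.

```python
from collections import Counter

# Hypothetical shape: predicate -> (min, max) occurrences; None means unbounded.
PATIENT_SHAPE = {
    "rdf:type": (1, 1),
    "fhir:gender": (0, 1),
    "fhir:birthDate": (0, 1),
    "fhir:name": (0, None),
}

def validate(node, triples, shape, closed=True):
    """Return human-readable cardinality violations for `node` against `shape`."""
    counts = Counter(p for s, p, o in triples if s == node)
    errors = []
    for pred, (lo, hi) in shape.items():
        n = counts.get(pred, 0)
        if n < lo or (hi is not None and n > hi):
            errors.append(f"{pred}: found {n}, expected {{{lo},{'*' if hi is None else hi}}}")
    if closed:  # like a CLOSED shape: reject predicates the shape does not mention
        errors += [f"{p}: not allowed" for p in sorted(counts) if p not in shape]
    return errors

triples = [
    ("ex:p1", "rdf:type", "fhir:Patient"),
    ("ex:p1", "fhir:gender", "female"),
    ("ex:p1", "fhir:gender", "male"),
    ("ex:p1", "fhir:nickname", "Jo"),
]
print(validate("ex:p1", triples, PATIENT_SHAPE))
```

Real ShEx shapes also constrain value types and reference other shapes, which is what lets one framework validate FHIR and non-FHIR RDF alike.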
Data quality and patient characteristics in European ANCA-associated vasculitis registries: data retrieval by federated querying
Ann Rheum Dis. 2023 Oct 31:ard-2023-224571. doi: 10.1136/ard-2023-224571. Online ahead of print.
ABSTRACT
OBJECTIVES: This study aims to describe the data structure and harmonisation process, explore data quality and define characteristics, treatment, and outcomes of patients across six federated antineutrophil cytoplasmic antibody-associated vasculitis (AAV) registries.
METHODS: Through creation of the vasculitis-specific FAIRVASC (Findable, Accessible, Interoperable, Reusable, VASCulitis) ontology, we harmonised the registries and enabled semantic interoperability. We assessed data quality across the domains of uniqueness, consistency, completeness and correctness. Aggregated data were retrieved using the semantic query language SPARQL (SPARQL Protocol and RDF Query Language), and outcome rates were assessed through random-effects meta-analysis.
RESULTS: A total of 5282 cases of AAV were identified. Uniqueness and data-type consistency were 100% across all assessed variables. Completeness and correctness ranged from 49% to 100% and from 60% to 100%, respectively. There were 2754 (52.1%) cases classified as granulomatosis with polyangiitis (GPA), 1580 (29.9%) as microscopic polyangiitis and 937 (17.7%) as eosinophilic GPA. The pattern of organ involvement included: lung in 3281 (65.1%), ear-nose-throat in 2860 (56.7%) and kidney in 2534 (50.2%). Intravenous cyclophosphamide was used as remission induction therapy in 982 (50.7%), rituximab in 505 (17.7%) and pulsed intravenous glucocorticoid use was highly variable (11%-91%). Overall mortality and incidence rates of end-stage kidney disease were 28.8 (95% CI 19.7 to 42.2) and 24.8 (95% CI 19.7 to 31.1) per 1000 patient-years, respectively.
CONCLUSIONS: In the largest reported AAV cohort study, we federated patient registries using semantic web technologies and highlighted concerns about data quality. The comparison of patient characteristics, treatment and outcomes was hampered by heterogeneous recruitment settings.
PMID:37907255 | DOI:10.1136/ard-2023-224571
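This study pooled outcome rates per 1000 patient-years across registries by random-effects meta-analysis without naming the estimator in the abstract. The sketch assumes the common DerSimonian-Laird method applied to log incidence rates, with the approximate variance of each log rate taken as 1/events. The per-registry event counts and person-years are invented for illustration.

```python
import math

def pooled_rate_dl(events, person_years, per=1000):
    """DerSimonian-Laird random-effects pooling of incidence rates on the log scale.

    Returns (pooled rate, lower 95% CI, upper 95% CI) per `per` person-years.
    Assumes at least two registries, each with at least one event.
    """
    y = [math.log(e / py) for e, py in zip(events, person_years)]  # log rates
    v = [1.0 / e for e in events]                                   # approximate variances
    w = [1.0 / vi for vi in v]                                      # fixed-effect weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))       # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)                         # between-registry variance
    w_re = [1.0 / (vi + tau2) for vi in v]                          # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return (math.exp(y_re) * per,
            math.exp(y_re - 1.96 * se) * per,
            math.exp(y_re + 1.96 * se) * per)

# Hypothetical deaths and person-years for three registries.
rate, lower, upper = pooled_rate_dl(events=[40, 120, 15], person_years=[1500, 3900, 700])
print(f"{rate:.1f} per 1000 patient-years (95% CI {lower:.1f} to {upper:.1f})")
```

When registries disagree more than sampling error explains, tau2 grows and the confidence interval widens accordingly.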
AIMS: An Automatic Semantic Machine Learning Microservice Framework to Support Biomedical and Bioengineering Research
Bioengineering (Basel). 2023 Sep 27;10(10):1134. doi: 10.3390/bioengineering10101134.
ABSTRACT
The fusion of machine learning and biomedical research offers novel ways to understand, diagnose, and treat various health conditions. However, the complexity of biomedical data, coupled with the intricate process of developing and deploying machine learning solutions, often poses significant challenges to researchers in these fields. The main contribution of this research is the Automatic Semantic Machine Learning Microservice (AIMS) framework. AIMS addresses these challenges by automating various stages of the machine learning pipeline, with particular emphasis on an ontology of machine learning services tailored to the biomedical domain. This ontology covers task representation, service modeling, knowledge acquisition, knowledge reasoning, and the establishment of a self-supervised learning policy. The framework is designed to prioritize model interpretability, integrate domain knowledge easily, and handle biomedical data efficiently. A distinctive feature of AIMS is that it leverages self-supervised knowledge learning through reinforcement learning techniques, paired with an ontology-based policy recording schema. This enables it to autonomously generate, fine-tune, and continually adapt machine learning models, especially when faced with new tasks and data. Our work makes two main contributions: it demonstrates that machine learning processes in the biomedical domain can be automated while integrating a rich domain knowledge base, and it provides a way for machines to acquire a self-learning ability so that they handle new tasks effectively. We demonstrate AIMS in three case studies of biomedical tasks. These examples show how the framework can simplify research routines, improve the quality of scientific exploration, and set the stage for notable advances.
PMID:37892864 | DOI:10.3390/bioengineering10101134
Models and Approaches for Comprehension of Dysarthric Speech Using Natural Language Processing: Systematic Review
JMIR Rehabil Assist Technol. 2023 Oct 27;10:e44489. doi: 10.2196/44489.
ABSTRACT
BACKGROUND: Speech intelligibility and speech comprehension for dysarthric speech have attracted much attention recently. Dysarthria is characterized by irregularities in the speed, strength, pitch, breath control, range, steadiness, and accuracy of muscle movements required for articulatory aspects of speech production.
OBJECTIVE: This study examined the contributions made by other studies involved in dysarthric speech comprehension. We focused on the modes of meaning extraction used in generalizing speaker-listener underpinnings in light of semantic ontology extraction as a desired technique, applied method types, speech representations used, and the data sources used.
METHODS: This study involved a systematic literature review using 7 electronic databases: Cochrane Database of Systematic Reviews, Web of Science Core Collection, Scopus, PubMed, ACM, IEEE Xplore, and Google Scholar. The main eligibility criterion was the extraction of meaning from dysarthric speech using natural language processing or understanding approaches to improve on dysarthric speech comprehension. In total, out of 834 search results, 30 studies that matched the eligibility requirements were acquired following screening by 2 independent reviewers, with a lack of consensus being resolved through joint discussion or consultation with a third party. In order to evaluate the studies' methodological quality, the risk of bias assessment was based on the Cochrane risk-of-bias tool version 2 (RoB2), with 23 of the studies (77%) registering low risk of bias and 7 studies (23%) raising some concern over the risk of bias. The overall quality assessment of the studies was done using TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis).
RESULTS: Following a review of 30 primary studies, this study revealed that the reviewed studies focused on natural language understanding or clinical approaches, with an increase in proposed solutions from 2020 onwards. Most studies relied on speaker-dependent speech features, while others used speech patterns, semantic knowledge, or hybrid approaches. The prevalent use of vector representation aligned with natural language understanding models, while Mel-frequency cepstral coefficient representation and no representation approaches were applied in neural networks. Hybrid representation studies aimed to reconstruct dysarthric speech or improve comprehension. Comprehensive databases, like TORGO and UA-Speech, were commonly used in combination with other curated databases, while primary data was preferred for specific or unique research objectives.
CONCLUSIONS: We found significant gaps in dysarthric speech comprehension, characterized by the lack of important listener or speech-independent features in the speech representations, modes of extraction, and data sources used. Further research is therefore proposed on formulating models that accommodate listener and speech-independent features through semantic ontologies, so that these key features can be included in meaning extraction from dysarthric speech.
PMID:37889538 | DOI:10.2196/44489
Chemical Species Ontology for Data Integration and Knowledge Discovery
J Chem Inf Model. 2023 Oct 26. doi: 10.1021/acs.jcim.3c00820. Online ahead of print.
ABSTRACT
Web ontologies are important tools in modern scientific research because they provide a standardized way to represent and manage web-scale amounts of complex data. In chemistry, a semantic database for chemical species is indispensable for its ability to interrelate and infer relationships, enabling a more precise analysis and prediction of chemical behavior. This paper presents OntoSpecies, a web ontology designed to represent chemical species and their properties. The ontology serves as a core component of The World Avatar knowledge graph chemistry domain and includes a wide range of identifiers, chemical and physical properties, chemical classifications and applications, and spectral information associated with each species. The ontology includes provenance and attribution metadata, ensuring the reliability and traceability of data. Most of the information about chemical species is sourced from PubChem and ChEBI data on the respective compound web pages using a software agent, making OntoSpecies a comprehensive semantic database of chemical species able to solve novel types of problems in the field. Access to this reliable source of chemical data is provided through a SPARQL endpoint. The paper presents example use cases to demonstrate the contribution of OntoSpecies in solving complex tasks that require integrated semantically searchable chemical data. The approach presented in this paper represents a significant advancement in the field of chemical data management, offering a powerful tool for representing, navigating, and analyzing chemical information to support scientific research.
PMID:37883649 | DOI:10.1021/acs.jcim.3c00820
The SIB Swiss Institute of Bioinformatics Semantic Web of data
Nucleic Acids Res. 2023 Oct 25:gkad902. doi: 10.1093/nar/gkad902. Online ahead of print.
ABSTRACT
The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss/) is a federation of bioinformatics research and service groups. The international life science community in academia and industry has been accessing the freely available databases provided by SIB since its inception in 1998. In this paper we present the 11 databases that currently offer semantically enriched data in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable), as well as the Swiss Personalized Health Network initiative (SPHN), which also employs this enrichment. The semantic enrichment facilitates the manipulation of large data sets from public databases as well as private data sets. Examples are provided to illustrate that the data from the SIB databases can be queried not only individually, using precise criteria, but also across multiple databases, including a variety of non-SIB databases. Data manipulation, be it exploration, extraction, annotation, combination, or publication, is possible using the SPARQL query language. Providing documentation, tutorials and sample queries makes it easier to navigate this web of semantic data. Through this paper, the reader will discover how the existing SIB knowledge graphs can be leveraged to tackle the complex biological or clinical questions that are being addressed today.
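Querying across several databases with SPARQL is typically done with the `SERVICE` keyword from SPARQL 1.1 Federated Query, which sends a sub-pattern to a remote endpoint. The sketch below is illustrative only and does not reproduce the paper's example queries: the endpoint URLs and predicates in `FEDERATED_QUERY` are hypothetical placeholders. It also shows, locally on toy bindings, the join step a query engine performs on the results: solution mappings from each source are merged whenever they agree on their shared variables.

```python
# Hypothetical endpoints and predicates, used only to illustrate SERVICE syntax.
FEDERATED_QUERY = """
SELECT ?protein ?name ?reaction WHERE {
  SERVICE <https://example.org/database-a/sparql> {
    ?protein <https://example.org/vocab/name> ?name .
  }
  SERVICE <https://example.org/database-b/sparql> {
    ?protein <https://example.org/vocab/catalyzes> ?reaction .
  }
}
"""

# A solution mapping: variable name -> bound value (IRIs and literals kept as strings).
Binding = dict[str, str]


def compatible(a: Binding, b: Binding) -> bool:
    """Two mappings are compatible if they agree on every variable they share."""
    return all(a[var] == b[var] for var in a.keys() & b.keys())


def join(left: list[Binding], right: list[Binding]) -> list[Binding]:
    """SPARQL algebra Join: merge every compatible pair of mappings."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]
```

For example, joining `[{"protein": "P1", "name": "A"}]` with `[{"protein": "P1", "reaction": "R1"}]` yields `[{"protein": "P1", "name": "A", "reaction": "R1"}]`; mappings with no shared variables combine as a cross product, which is why a shared variable such as `?protein` is what links the two databases.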
PMID:37878411 | DOI:10.1093/nar/gkad902
American literature news narration based on computer web technology
PLoS One. 2023 Oct 16;18(10):e0292446. doi: 10.1371/journal.pone.0292446. eCollection 2023.
ABSTRACT
Driven by internet technology, online media have become the main channel of news dissemination, but redundant information such as navigation bars and advertisements hinders people's access to news content. This research aims to enable users to obtain clean news content from web pages cluttered with redundant information. First, based on the narrative characteristics of literary news, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is employed to extract the news content from the analyzed web pages. The algorithm uses keyword matching, text analysis, and semantic processing to determine the boundaries and key information of the news content. Second, a news text classification algorithm is selected from support vector machine, K-nearest neighbor, and AdaBoost through comparative experiments, and a news extraction system based on keyword features and an extended Document Object Model (DOM) tree is constructed. DOM technology analyzes the web page structure and extracts its key elements and information. Finally, the narrative characteristics of 15 American literary news reports are obtained by studying their narrative sequence and structure. The results reveal that the most frequently used narrative sequences in American literary news are sequential (chronological) narration and flashback. The narrative duration is dominated by the victory rate and outline, supplemented by scenes and pauses. In addition, 53.3% of the narrative structures used in literary news are time-connected. This narrative structure can give reporters a clear conceptual structure when writing, help readers quickly grasp the context of the event and the life course of the protagonists in the report, and increase the report's readability. This research on the narrative characteristics of American literary news can provide media practitioners with a reference for news narrative techniques and strategies.
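The first extraction step weights terms with TF-IDF and matches keywords against the page text. The abstract does not give the exact weighting, thresholds, or DOM extension used, so the following is a minimal sketch under stated assumptions: it uses the textbook form tf = count / block length and idf = log(N / df), treats each non-empty DOM text node (outside `<script>` and `<style>`) as one candidate block, and ranks blocks by the summed TF-IDF weight of the supplied keywords. The names `TextBlockCollector`, `tf_idf`, and `rank_blocks` are hypothetical, not the authors' code.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextBlockCollector(HTMLParser):
    """Collect non-empty text nodes as candidate blocks, skipping <script> and <style>.

    Treating each text node as a block is a simplification of a full DOM-tree segmentation.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.blocks.append(text)


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic runs only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Textbook TF-IDF: tf = count / document length, idf = log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append(
            {term: (count / length) * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return weights


def rank_blocks(html: str, keywords: list[str]) -> list[tuple[float, str]]:
    """Score each text block by the summed TF-IDF weight of the keywords, highest first."""
    collector = TextBlockCollector()
    collector.feed(html)
    collector.close()
    docs = [tokenize(block) for block in collector.blocks]
    weights = tf_idf(docs)
    scored = [
        (sum(w.get(k.lower(), 0.0) for k in keywords), block)
        for w, block in zip(weights, collector.blocks)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

On a page whose navigation bar and advertisement share no keywords with the story, those blocks score 0 and the article paragraph ranks first. In the system described above, a trained classifier (SVM, KNN, or AdaBoost) would take over from this simple keyword ranking.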
PMID:37844094 | DOI:10.1371/journal.pone.0292446
A multimodal discourse study of selected COVID-19 online public health campaign texts in Nigeria
Discourse Soc. 2023 Jan;34(1):96-119. doi: 10.1177/09579265221145098.
ABSTRACT
This paper discusses web-based public health discursive practices during the Coronavirus (COVID-19) pandemic in Nigeria. It utilises a multimodal discourse approach to explore how a combination of textual and visual resources was deployed to communicate informative and educative public health safety campaigns during the period. Essentially, this study discusses multimodal resources as a rhetorical technique for creating a public discursive engagement space designed to educate the public and mitigate the effect of the pandemic. The dataset was collected during and after the lockdown in 2020 (March-September) through media monitoring and manual downloading of relevant online COVID-19 posts, messages and public health advisories, largely from WhatsApp platforms and the portals of some Nigerian national newspapers. Using insights from relevant approaches in discourse analysis (e.g. Multimodal Discourse and Critical Discourse Analysis), we adopted a qualitative content analysis approach to analyse how online posts, as multimodal resources, amplified the role of social media affordances in producing and promoting public safety messages that helped to control the spread and mitigate the effects of the pandemic. The study also shows that discursive and multimodal resources were deliberately deployed to increase the effectiveness of the technology-driven public health campaign. To a large extent, multimodal resources were found to complement the lexico-semantic properties of online communication, where social media messages are created, crafted and reconstructed within a uniquely Nigerian public discourse context. The study further illustrates the increasing importance of web-based platforms as discursive sites for enacting and negotiating meanings during event-driven social activities and public engagement in the Global South.
PMID:37829509 | PMC:PMC9827133 | DOI:10.1177/09579265221145098
Words for the hearts: a corpus study of metaphors in online depression communities
Front Psychol. 2023 Aug 30;14:1227123. doi: 10.3389/fpsyg.2023.1227123. eCollection 2023.
ABSTRACT
PURPOSE/SIGNIFICANCE: Humans understand, think, and express themselves through metaphors. The current paper emphasizes the importance of identifying the metaphorical language used in online health communities (OHC) to understand how users frame and make sense of their experiences, which can boost the effectiveness of counseling and interventions for this population.
METHODS/PROCESS: We used a web crawler to obtain a corpus from an online depression community. We introduced a three-stage procedure for metaphor identification in a Chinese corpus: (1) use MIPVU to identify metaphorical expressions (ME) bottom-up and formulate preliminary working hypotheses; (2) collect more ME top-down in the corpus by performing semantic domain analysis on the identified ME; and (3) analyze ME and categorize conceptual metaphors using a reference list. In this way, we gained a greater understanding of how depression sufferers conceptualize their experience metaphorically in a language under-represented in the literature (Chinese) and in a new genre (online health communities).
RESULTS/CONCLUSION: The main conceptual metaphors for depression are classified into PERSONAL LIFE, INTERPERSONAL RELATIONSHIP, TIME, and CYBERCULTURE metaphors. Identifying depression metaphors in the Chinese corpus pinpoints the sociocultural environment that people with depression experience: lack of offline support, social stigmatization, and the substitutability of online support for offline support. We confirm a number of depression metaphors found in other languages, providing a theoretical basis for researching, identifying, and treating depression in multilingual settings. Our study also identifies new metaphors whose source-target connections are based on embodied, sociocultural, and idiosyncratic levels. Across these three levels, we analyze the theoretical and practical implications of metaphor research and find ways to meaningfully emphasize its inherent cross-disciplinarity.
PMID:37829080 | PMC:PMC10566633 | DOI:10.3389/fpsyg.2023.1227123