Semantic Web
Editorial: Knowledge graph technologies: the next Frontier of the food, agriculture, and water domains
Front Artif Intell. 2023 Nov 28;6:1319844. doi: 10.3389/frai.2023.1319844. eCollection 2023.
NO ABSTRACT
PMID:38089711 | PMC:PMC10715242 | DOI:10.3389/frai.2023.1319844
forager: a Python package and web interface for modeling mental search
Behav Res Methods. 2023 Dec 12. doi: 10.3758/s13428-023-02296-x. Online ahead of print.
ABSTRACT
Analyzing data from the verbal fluency task (e.g., "name all the animals you can in a minute") is of interest to both memory researchers and clinicians due to its broader implications for memory search and retrieval. Recent work has proposed several computational models to examine nuanced differences in search behavior, which can provide insights into the mechanisms underlying memory search. A prominent account of memory search within the fluency task was proposed by Hills et al. (2012), where mental search is modeled after how animals forage for food in physical space. Despite the broad potential utility of these models to scientists and clinicians, there is currently no open-source program to apply and compare existing foraging models or clustering algorithms without extensive, often redundant programming. To remove this barrier to studying search patterns in the fluency task, we created forager, a Python package ( https://github.com/thelexiconlab/forager ) and web interface ( https://forager.research.bowdoin.edu/ ). forager provides multiple automated methods to designate clusters and switches within a fluency list, implements a novel set of computational models that can examine the influence of multiple lexical sources (semantic, phonological, and frequency) on memory search using semantic embeddings, and also enables researchers to evaluate relative model performance at the individual and group level. The package and web interface cater to users with various levels of programming experience. In this work, we introduce forager's basic functionality and use cases that demonstrate its utility with pre-existing behavioral and clinical data sets of the semantic fluency task.
PMID:38087144 | DOI:10.3758/s13428-023-02296-x
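The idea of designating clusters and switches within a fluency list can be illustrated with a minimal sketch of one common heuristic, a "similarity drop" rule: a word starts a new cluster when its similarity to the preceding word is a local minimum. This is an assumption chosen for illustration, not a reproduction of forager's API or of any specific method it implements; the function names and the two-dimensional toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_drop_switches(words, embeddings):
    """Return a 0/1 list; 1 marks a word that starts a new cluster.

    sims[j] is the similarity between words[j] and words[j + 1]. A switch is
    placed at words[j + 1] when sims[j] is lower than both of its neighbours.
    The first two words and the last word can never be marked by this rule.
    """
    sims = [cosine(embeddings[a], embeddings[b]) for a, b in zip(words, words[1:])]
    switches = [0] * len(words)
    for j in range(1, len(sims) - 1):
        if sims[j] < sims[j - 1] and sims[j] < sims[j + 1]:
            switches[j + 1] = 1
    return switches


# Hypothetical toy embeddings: land animals point one way, sea animals another.
toy_embeddings = {
    "cat": [1.0, 0.1],
    "dog": [1.0, 0.2],
    "wolf": [0.9, 0.3],
    "shark": [0.1, 1.0],
    "whale": [0.2, 1.0],
    "dolphin": [0.3, 0.9],
}
fluency_list = ["cat", "dog", "wolf", "shark", "whale", "dolphin"]
switches = similarity_drop_switches(fluency_list, toy_embeddings)
```

In this toy list the only switch falls on "shark", where the speaker moves from land animals to sea animals.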
POOE: predicting oomycete effectors based on a pre-trained large protein language model
mSystems. 2024 Jan 23;9(1):e0100423. doi: 10.1128/msystems.01004-23. Epub 2023 Dec 11.
ABSTRACT
Oomycetes are fungus-like eukaryotic microorganisms which can cause catastrophic diseases in many plants. Successful infection of oomycetes depends highly on their effector proteins that are secreted into plant cells to subvert plant immunity. Thus, systematic identification of effectors from the oomycete proteomes remains an initial but crucial step in understanding plant-pathogen relationships. However, the number of experimentally identified oomycete effectors is still limited. Currently, only a few bioinformatics predictors exist to detect potential effectors, and their prediction performance needs to be improved. Here, we used the sequence embeddings from a pre-trained large protein language model (ProtTrans) as input and developed a support vector machine-based method called POOE for predicting oomycete effectors. POOE could achieve a highly accurate performance with an area under the precision-recall curve of 0.804 (area under the receiver operating characteristic curve = 0.893, accuracy = 0.874, precision = 0.777, recall = 0.684, and specificity = 0.936) in the fivefold cross-validation, considerably outperforming various combinations of popular machine learning algorithms and other commonly used sequence encoding schemes. A similar prediction performance was also observed in the independent test. Compared with the existing oomycete effector prediction methods, POOE provided very competitive and promising performance, suggesting that ProtTrans effectively captures rich protein semantic information and dramatically improves the prediction task. We anticipate that POOE can accelerate the identification of oomycete effectors and provide new hints to systematically understand the functional roles of effectors in plant-pathogen interactions. The web server of POOE is freely accessible at http://zzdlab.com/pooe/index.php. 
The corresponding source codes and data sets are also available at https://github.com/zzdlabzm/POOE.
IMPORTANCE: In this work, we use the sequence representations from a pre-trained large protein language model (ProtTrans) as input and develop a support vector machine-based method called POOE for predicting oomycete effectors. POOE could achieve a highly accurate performance in the independent test set, considerably outperforming existing oomycete effector prediction methods. We expect that this new bioinformatics tool will accelerate the identification of oomycete effectors and further guide the experimental efforts to interrogate the functional roles of effectors in plant-pathogen interaction.
PMID:38078741 | PMC:PMC10804963 | DOI:10.1128/msystems.01004-23
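The threshold-based metrics reported in the abstract above (accuracy, precision, recall, and specificity) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from the POOE study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives, with "positive" meaning "predicted effector".
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),      # share of predicted positives that are real
        "recall": tp / (tp + fn),         # share of real positives that are found
        "specificity": tn / (tn + fp),    # share of real negatives that are rejected
    }


# Hypothetical counts for illustration only.
example = classification_metrics(tp=8, fp=2, tn=18, fn=2)
```

Because effector data sets are typically imbalanced, a high specificity can coexist with a noticeably lower recall, which is one reason the area under the precision-recall curve is reported alongside accuracy.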
Costs of breast cancer recurrence after initial treatment for HR+, HER2-, high-risk early breast cancer: estimates from SEER-Medicare linked data
J Med Econ. 2024 Jan-Dec;27(1):84-96. doi: 10.1080/13696998.2023.2291266. Epub 2023 Dec 19.
ABSTRACT
OBJECTIVE: To assess the costs of treated recurrence and survival in elderly patients with early breast cancer (EBC) at high risk of recurrence using Surveillance Epidemiology and End Results (SEER) registry-Medicare linked claims data.
METHODS: This retrospective study included patients aged ≥65 years with hormone receptor-positive (HR+), human epidermal growth factor receptor 2 negative (HER2-), node-positive EBC at high risk of recurrence. Treated recurrences were defined based on treatment events/procedure codes from claims. Primary outcomes were monthly total extra costs and cumulative extra costs of treated recurrence relative to patients with non/untreated recurrence. Costs were calculated using a Kaplan-Meier sampling average estimator method and inflated to 2021 US$. Secondary outcomes included analysis by recurrence type and overall survival (OS) after recurrence. Subgroup analysis evaluated costs in patients with Medicare Part D coverage.
RESULTS: Among 3,081 eligible patients [mean (SD) age at diagnosis was 74.5 (7.1) years], the majority were females (97.4%) and white (87.8%). Treated recurrence was observed in 964 patients (31.3%). The monthly extra cost of treated recurrence was highest at the beginning of the first treated recurrence episode, with 6-year cumulative cost of $117,926. Six-year cumulative extra costs were higher for patients with distant recurrences ($168,656) than for patients with locoregional recurrences ($96,465). Median OS was 4.34 years for all treated recurrences, 1.92 years for distant recurrence, and 6.78 years for locoregional recurrence. Similar cumulative extra cost trends were observed in the subgroup with Part D coverage as in the overall population.
LIMITATIONS: This study utilizes claims data to identify treated recurrence. Due to age constraints of the dataset, results may not extrapolate to a younger population where EBC is commonly diagnosed.
CONCLUSION: EBC recurrence in this elderly population has substantial costs, particularly in patients with distant recurrences. Therapies that delay or prevent recurrence may reduce long-term costs significantly.
PMID:38059275 | DOI:10.1080/13696998.2023.2291266
Dropout rate and associated factors of community-based health insurance beneficiaries in Ethiopia: a systematic review and meta-analysis
BMC Public Health. 2023 Dec 5;23(1):2425. doi: 10.1186/s12889-023-17351-7.
ABSTRACT
BACKGROUND: Ethiopia aims to achieve universal healthcare using health insurance. To do so, it has been implementing community-based health insurance since 2011. However, the retention of members by the scheme has not yet been evaluated nationally. The systematic review and meta-analysis aimed to evaluate the dropout rate and associated factors among the scheme's beneficiaries in Ethiopia.
METHODS: On December 19, 2022, searches were conducted in Scopus, Hinari, PubMed, Semantic Scholar, and Google Scholar. Searches were also conducted on the general web and electronic repositories, including the Ethiopian Health Insurance Service, the International Institute for Primary Health Care-Ethiopia, and various higher education institutions. The Joanna Briggs Institute's tools and the "preferred reporting items for systematic reviews and meta-analyses 2020 statement" were used to evaluate bias and frame the review, respectively. Data were analyzed using Stata 17 and RevMan 5. To assess heterogeneity, we conducted subgroup analysis and used a random-effects model to calculate odds ratios with a p value less than 0.05 and a 95% CI.
RESULTS: In total, 14 articles were included in the qualitative synthesis, of which 12 were selected for the quantitative analysis. The pooled estimate revealed that the dropout rate of beneficiaries from the scheme was 34.0% (95% CI: 23-44%), corresponding to a renewal rate of 66.0%, and was found to be influenced by socio-demographic, health status, length of enrolment, knowledge, attitude, the scheme, and health service-related variables. The southern and Oromia regions reported the lowest and highest dropout rates, with 27.0% (95% CI: 24-29%) and 48.0% (95% CI: 18-78%), respectively. The dropout rates increased from 12.3% in 2012-2015 to 34.4% in 2020-2021.
CONCLUSION: More than one-third of the scheme's beneficiaries were found to have dropped out, and the dropout rate has increased over time, indicating that community-based strategies and interventions on the supply, insurer, and demand sides are needed to reduce this high dropout rate.
PMID:38053053 | DOI:10.1186/s12889-023-17351-7
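A pooled estimate under a random-effects model can be sketched with the DerSimonian-Laird estimator, which is one widely used approach. The abstract above does not state which estimator Stata or RevMan applied, so this choice is an assumption, and the study proportions and sample sizes below are hypothetical.

```python
def proportion_variance(p, n):
    """Approximate sampling variance of a proportion p observed in n subjects."""
    return p * (1 - p) / n


def dersimonian_laird(estimates, variances):
    """Pool study estimates with the DerSimonian-Laird random-effects method.

    Returns (pooled estimate, standard error, between-study variance tau^2).
    """
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    # Cochran's Q measures how far studies scatter around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's own variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se = (1 / sum(re_weights)) ** 0.5
    return pooled, se, tau2


# Hypothetical dropout proportions and sample sizes for three studies.
props = [0.25, 0.35, 0.45]
sizes = [400, 300, 500]
variances = [proportion_variance(p, n) for p, n in zip(props, sizes)]
pooled, se, tau2 = dersimonian_laird(props, variances)
ci_95 = (pooled - 1.96 * se, pooled + 1.96 * se)
renewal = 1 - pooled  # renewal is the complement of dropout
```

When the studies disagree more than sampling error alone would explain, tau^2 becomes positive, the weights even out across studies, and the confidence interval widens.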
Losing the chain of thought: A meta-analysis of functional neuroimaging studies using verbal tasks in schizophrenia
J Psychiatr Res. 2023 Nov 30;169:238-246. doi: 10.1016/j.jpsychires.2023.11.013. Online ahead of print.
ABSTRACT
BACKGROUND: Disorganization symptoms are a main feature of schizophrenia, which include illogical and incoherent thinking, circumstantiality, tangentiality and loose associations. As these symptoms entail language deficits, several functional neuroimaging studies have been performed in schizophrenia using verbal tasks, producing somewhat heterogeneous results. Hence, we performed a meta-analysis seeking to identify the most reliable neural alterations observed in schizophrenia patients during such tasks.
METHODS: Web of Science, PubMed, and EMBASE were searched for functional neuroimaging studies during verbal tasks (e.g. verbal fluency and semantic processing) in schizophrenia. Out of 795 screened articles, 33 were eligible for this meta-analysis. A coordinate-based meta-analysis was performed with the activation likelihood estimation (ALE) approach, using the cluster-level family-wise error (FWE) correction set at p < 0.05.
RESULTS: In schizophrenia, hyperactivations were observed in the left inferior frontal gyrus (IFG) and middle frontal gyrus (MFG) and hypoactivations were observed in the right IFG, the precentral gyrus and the left caudate nucleus. Another analysis pooling hyper- and hypoactivations revealed altered activations, firstly, in the left IFG and MFG, secondly, in the left precentral gyrus, IFG and insula, and, thirdly, in the left angular gyrus and precuneus. In the light of these results, not only classic language-related regions are abnormally activated during verbal tasks in schizophrenia, but also brain regions involved in executive functions, autobiographical memory and, unexpectedly, in motor functions. Further functional neuroimaging studies are needed to investigate the role of the striatum in linguistic sequencing in schizophrenia.
PMID:38048673 | DOI:10.1016/j.jpsychires.2023.11.013
Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review
Artif Intell Med. 2023 Dec;146:102701. doi: 10.1016/j.artmed.2023.102701. Epub 2023 Nov 1.
ABSTRACT
OBJECTIVE: Natural language processing (NLP) combined with machine learning (ML) techniques are increasingly used to process unstructured/free-text patient-reported outcome (PRO) data available in electronic health records (EHRs). This systematic review summarizes the literature reporting NLP/ML systems/toolkits for analyzing PROs in clinical narratives of EHRs and discusses the future directions for the application of this modality in clinical care.
METHODS: We searched PubMed, Scopus, and Web of Science for studies written in English between 1/1/2000 and 12/31/2020. Seventy-nine studies meeting the eligibility criteria were included. We abstracted and summarized information related to the study purpose, patient population, type/source/amount of unstructured PRO data, linguistic features, and NLP systems/toolkits for processing unstructured PROs in EHRs.
RESULTS: Most of the studies used NLP/ML techniques to extract PROs from clinical narratives (n = 74) and mapped the extracted PROs into specific PRO domains for phenotyping or clustering purposes (n = 26). Some studies used NLP/ML to process PROs for predicting disease progression or onset of adverse events (n = 22) or developing/validating NLP/ML pipelines for analyzing unstructured PROs (n = 19). Studies used different linguistic features, including lexical, syntactic, semantic, and contextual features, to process unstructured PROs. Among the 25 NLP systems/toolkits we identified, 15 used rule-based NLP, 6 used hybrid NLP, and 4 used non-neural ML algorithms embedded in NLP.
CONCLUSIONS: This study supports the potential utility of different NLP/ML techniques in processing unstructured PROs available in EHRs for clinical care. Though using annotation rules for NLP/ML to analyze unstructured PROs is dominant, deploying novel neural ML-based methods is warranted.
PMID:38042599 | DOI:10.1016/j.artmed.2023.102701
Comparing the beliefs regarding biological or psychological causalities toward stereotyped perception of people who stutter
Front Psychol. 2023 Nov 16;14:1279169. doi: 10.3389/fpsyg.2023.1279169. eCollection 2023.
ABSTRACT
PURPOSE: Developmental stuttering is a fluency disorder that may be caused by neurological, genetic, or familial factors. However, a general perception that stuttering is caused by psychological problems could lead to negative attitudes toward stuttering, causing prejudice or discrimination against people who stutter (PWS). Thus, our study aimed to investigate whether certain beliefs in etiology of stuttering are related to the negative perception of stuttering.
METHODS: A web-based survey of 413 native Japanese adults, aged 20-69, who did not suffer from stuttering, schizophrenia, or depression, was conducted in August 2021. The participants were recruited through the Web monitor panel. Participants were divided into three uniform groups based on their response to a 27-item questionnaire about their implicit belief regarding the etiology of stuttering: belief in the biological model (stuttering-biological group), belief in the psychological model (stuttering-psychological group), and the control group (those who responded to perception of healthy adult males). Participants were also asked to respond to 25 items of semantic differential scales about perception of stuttering or healthy adult males. Responses were summarized into several factors by factor analysis, and factor scores were compared among the three groups. The stuttering-biological group had the fewest participants, comprising 80 individuals. Overall, a total of 240 participants, 80 from each group, were included in the analysis.
RESULTS: Some pairs of stereotypes included in semantic differential scales revealed differences between the groups; PWS, irrespective of the participants of the biological or psychological group, were considered as having negative stereotyping properties such as being "tense," "anxious," or "afraid." Additionally, three concepts from the factor analysis of these 25 items were analyzed using an analysis of variance, and significant differences were found; the mean factor score of the "danger" stereotype was lower in the stuttering-biological group compared to the stuttering-psychological group.
CONCLUSION: Although the simplification of the biological model is not recommended, anti-stigma campaigns to educate people that stuttering is caused by multidimensional factors, not just psychological ones, could change the general public's negative perceptions of stuttering.
PMID:38034304 | PMC:PMC10687552 | DOI:10.3389/fpsyg.2023.1279169
Measuring trust: a text analysis approach to compare, contrast, and select trust questionnaires
Front Psychol. 2023 Nov 15;14:1192020. doi: 10.3389/fpsyg.2023.1192020. eCollection 2023.
ABSTRACT
INTRODUCTION: Trust has emerged as a prevalent construct to describe relationships between people and between people and technology in myriad domains. Across disciplines, researchers have relied on many different questionnaires to measure trust. The degree to which these questionnaires differ has not been systematically explored. In this paper, we use a word-embedding text analysis technique to identify the differences and common themes across the most used trust questionnaires and provide guidelines for questionnaire selection.
METHODS: A review was conducted to identify the existing trust questionnaires. In total, we included 46 trust questionnaires from three main domains (i.e., Automation, Humans, and E-commerce) with a total of 626 items measuring different trust layers (i.e., Dispositional, Learned, and Situational). Next, we encoded the words within each questionnaire using GloVe word embeddings and computed the embedding for each questionnaire item, and for each questionnaire. We reduced the dimensionality of the resulting dataset using UMAP to visualize these embeddings in scatterplots and implemented the visualization in a web app for interactive exploration of the questionnaires (https://areen.shinyapps.io/Trust_explorer/).
RESULTS: At the word level, the semantic space serves to produce a lexicon of trust-related words. At the item and questionnaire level, the analysis provided recommendation on questionnaire selection based on the dispersion of questionnaires' items and at the domain and layer composition of each questionnaire. Along with the web app, the results help explore the semantic space of trust questionnaires and guide the questionnaire selection process.
DISCUSSION: The results provide a novel means to compare and select trust questionnaires and to glean insights about trust from spoken dialog or written comments.
PMID:38034296 | PMC:PMC10684734 | DOI:10.3389/fpsyg.2023.1192020
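The step from word embeddings to item and questionnaire embeddings can be sketched as follows. The abstract above does not say how word vectors were combined, so simple averaging is an assumption here, and the two-dimensional vectors are hypothetical stand-ins for real GloVe vectors.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def item_embedding(item_text, word_vectors):
    """Average the vectors of the words in one questionnaire item.

    Words missing from the vocabulary are skipped (an assumption).
    """
    tokens = item_text.lower().split()
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError(f"no known words in item: {item_text!r}")
    return mean_vector(known)


def questionnaire_embedding(items, word_vectors):
    """Average the item embeddings of one questionnaire."""
    return mean_vector([item_embedding(item, word_vectors) for item in items])


# Hypothetical toy vocabulary; real GloVe vectors have 50 to 300 dimensions.
toy_vectors = {"trust": [1.0, 0.0], "system": [0.0, 1.0], "reliable": [1.0, 1.0]}
items = ["I trust the system", "The system is reliable"]
item_vectors = [item_embedding(item, toy_vectors) for item in items]
questionnaire_vector = questionnaire_embedding(items, toy_vectors)
```

Vectors built this way could then be passed to a dimensionality reduction method such as UMAP for plotting, as the abstract describes.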
Semantic and Correlation Disentangled Graph Convolutions for Multilabel Image Recognition
IEEE Trans Neural Netw Learn Syst. 2023 Nov 30;PP. doi: 10.1109/TNNLS.2023.3333542. Online ahead of print.
ABSTRACT
Multilabel image recognition (MLR) aims to annotate an image with comprehensive labels and suffers from object occlusion or small object sizes within images. Although the existing works attempt to capture and exploit label correlations to tackle these issues, they predominantly rely on global statistical label correlations as prior knowledge for guiding label prediction, neglecting the unique label correlations present within each image. To overcome this limitation, we propose a semantic and correlation disentangled graph convolution (SCD-GC) method, which builds the image-specific graph and employs graph propagation to reason the labels effectively. Specifically, we introduce a semantic disentangling module to extract categorywise semantic features as graph nodes and develop a correlation disentangling module to extract image-specific label correlations as graph edges. Performing graph convolutions on this image-specific graph allows for better mining of difficult labels with weak visual representations. Visualization experiments reveal that our approach successfully disentangles the dominant label correlations existing within the input image. Through extensive experimentation, we demonstrate that our method achieves superior results on the challenging Microsoft COCO (MS-COCO), PASCAL visual object classes (PASCAL-VOC), NUS web image dataset (NUS-WIDE), and Visual Genome 500 (VG-500) datasets. Code is available at GitHub: https://github.com/caigitrepo/SCDGC.
PMID:38032778 | DOI:10.1109/TNNLS.2023.3333542
BERTs of a feather: Studying inter- and intra-group communication via information theory and language models
Behav Res Methods. 2023 Nov 29. doi: 10.3758/s13428-023-02267-2. Online ahead of print.
ABSTRACT
When communicating, individuals alter their language to fulfill a myriad of social functions. In particular, linguistic convergence and divergence are fundamental in establishing and maintaining group identity. Quantitatively characterizing linguistic convergence is important when testing hypotheses surrounding language, including interpersonal and group communication. We provide a quantitative interpretation of linguistic convergence grounded in information theory. We then construct a computational model, built on top of a neural network model of language, that can be deployed to measure and test hypotheses about linguistic convergence in "big data." We demonstrate the utility of our convergence measurement in two case studies: (1) showing that our measurement is indeed sensitive to linguistic convergence across turns in dyadic conversation, and (2) showing that our convergence measurement is sensitive to social factors that mediate convergence in Internet-based communities (specifically, r/MensRights and r/MensLib). Our measurement also captures differences in which social factors influence web-based communities. We conclude by discussing methodological and theoretical implications of this semantic convergence analysis.
PMID:38030924 | DOI:10.3758/s13428-023-02267-2
Transcranial direct current stimulation in semantic variant of primary progressive aphasia: a state-of-the-art review
Front Hum Neurosci. 2023 Nov 8;17:1219737. doi: 10.3389/fnhum.2023.1219737. eCollection 2023.
ABSTRACT
The semantic variant of primary progressive aphasia (svPPA), known also as "semantic dementia (SD)," is a neurodegenerative disorder that pertains to the frontotemporal lobar degeneration clinical syndromes. There is currently no approved pharmacological therapy for all frontotemporal dementia variants. Transcranial direct current stimulation (tDCS) is a promising non-invasive brain stimulation technique capable of modulating cortical excitability through a sub-threshold shift in neuronal resting potential. This technique has previously been applied as adjunct treatment in Alzheimer's disease, while data for frontotemporal dementia are controversial. In this scoped review, we summarize and critically appraise the currently available evidence regarding the use of tDCS for improving performance in naming and/or matching tasks in patients with svPPA. Clinical trials addressing this topic were identified through MEDLINE (accessed by PubMed) and Web of Science, as of November 2022, week 3. Clinical trials have been unable to show a significant benefit of tDCS in enhancing semantic performance in svPPA patients. The heterogeneity of the studies available in the literature might be a possible explanation. Nevertheless, the results of these studies are promising and may offer valuable insights into methodological differences and overlaps, raising interest among researchers in identifying new non-pharmacological strategies for treating svPPA patients. Further studies are therefore warranted to investigate the potential therapeutic role of tDCS in svPPA.
PMID:38021245 | PMC:PMC10663282 | DOI:10.3389/fnhum.2023.1219737
Effects of short-term second language learning on the development of individual semantic networks in written and spoken language
Neurosci Lett. 2024 Jan 1;818:137558. doi: 10.1016/j.neulet.2023.137558. Epub 2023 Nov 23.
ABSTRACT
Previous studies have primarily focused on the relationship between native language (L1) and second language (L2) in the brain, specifically in one language modality, such as written or spoken language. However, there is limited research on how L2 proficiency impacts both modalities. This study aimed to investigate the functional networks involved in reading and speech comprehension for both L1 and L2, and observe changes in these networks as L2 proficiency improves. The dataset used in this study was obtained from a previous study conducted by Gurunandan et al., which involved Spanish-English bilingual participants undergoing a three-month English training program. Participants underwent fMRI scanning and performed a semantic animacy judgment task in both spoken and written language before and after training. Through analysis, distinct neural networks associated with spoken and written language were found between individuals' L1 and L2, both before and after training. Moreover, as L2 proficiency improved, the spoken and written networks for L2 remained distinct from those of the L1. These findings suggest that short-term L2 learning experiences can modify neural networks, but may not be enough to achieve native-like proficiency, supporting the accommodation hypothesis. These results have important implications for language learning and education, indicating that additional short-term training and exposure alone may not bridge the gap between L1 and L2 processing networks.
PMID:38007086 | DOI:10.1016/j.neulet.2023.137558
Cox regression with linked data
Stat Med. 2024 Jan 30;43(2):296-314. doi: 10.1002/sim.9960. Epub 2023 Nov 20.
ABSTRACT
Record linkage is increasingly used, especially in medical studies, to combine data from different databases that refer to the same entities. The linked data can bring analysts novel and valuable knowledge that is impossible to obtain from a single database. However, linkage errors are usually unavoidable, regardless of record linkage methods, and ignoring these errors may lead to biased estimates. While different methods have been developed to deal with linkage errors in the generalized linear model, comparatively little attention has been paid to the Cox regression model, although it is one of the most important statistical models in clinical and epidemiological research. In this work, we propose an adjusted estimating equation for secondary Cox regression analysis, where linked data have been prepared by a third-party operator, and no information on matching variables is available to the analyst. Through a Monte Carlo simulation study, the proposed method is shown to lead to substantial bias reductions in the estimation of the parameters of the Cox model caused by false links. An asymptotically unbiased variance estimator for the adjusted estimators of Cox regression coefficients is also proposed. Finally, the proposed method is applied to a linked database from the Brest stroke registry in France.
PMID:37985942 | DOI:10.1002/sim.9960
ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition
IEEE Trans Neural Netw Learn Syst. 2023 Nov 21;PP. doi: 10.1109/TNNLS.2023.3331841. Online ahead of print.
ABSTRACT
The canonical approach to video action recognition treats the problem as a classic 1-of-N majority vote task for a neural network. Such models are trained to predict a fixed set of predefined categories, limiting their transferability to new datasets with unseen concepts. In this article, we provide a new perspective on action recognition by attaching importance to the semantic information of label texts rather than simply mapping them into numbers. Specifically, we model this task as a video-text matching problem within a multimodal learning framework, which strengthens the video representation with more semantic language supervision and enables our model to do zero-shot action recognition without any further labeled data or additional parameters. Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, adapt and fine-tune." This paradigm first learns powerful representations from pre-training on a large amount of web image-text or video-text data. Then, it makes the action recognition task act more like pre-training problems via adaptation engineering. Finally, it is fine-tuned end-to-end on target datasets to obtain strong performance. We give an instantiation of the new paradigm, ActionCLIP, which not only has superior and flexible zero-shot/few-shot transfer ability but also reaches a top performance on general action recognition task, achieving 83.8% top-1 accuracy on Kinetics-400 with a ViT-B/16 as the backbone. Code is available at https://github.com/sallymmx/ActionCLIP.git.
PMID:37988204 | DOI:10.1109/TNNLS.2023.3331841
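Framing action recognition as video-text matching can be sketched in a few lines: each label is written as a text prompt, both the video and the prompts are embedded in a shared space, and the prediction is the prompt most similar to the video. The sketch below is not ActionCLIP's code; the three-dimensional embeddings, the prompt wording, and the temperature value are hypothetical, and the encoders that would produce real embeddings are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(video_embedding, label_embeddings, temperature=0.07):
    """Score a video against every label prompt; no label-specific weights needed.

    The temperature is an assumed value that sharpens the distribution.
    """
    labels = list(label_embeddings)
    scores = [cosine(video_embedding, label_embeddings[l]) / temperature for l in labels]
    return dict(zip(labels, softmax(scores)))


# Hypothetical embeddings standing in for text-encoder and video-encoder outputs.
label_embeddings = {
    "a video of a person swimming": [0.9, 0.1, 0.0],
    "a video of a person cycling": [0.1, 0.9, 0.1],
    "a video of a person cooking": [0.0, 0.2, 0.9],
}
video_embedding = [0.8, 0.2, 0.1]
probabilities = zero_shot_classify(video_embedding, label_embeddings)
best_label = max(probabilities, key=probabilities.get)
```

Adding an unseen action only requires adding a new prompt and its embedding to `label_embeddings`, which is what makes the zero-shot setting possible in this formulation.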
A knowledge graph-based data harmonization framework for secondary data reuse
Comput Methods Programs Biomed. 2023 Nov 10;243:107918. doi: 10.1016/j.cmpb.2023.107918. Online ahead of print.
ABSTRACT
BACKGROUND AND OBJECTIVE: The adoption of new technologies in clinical care systems has propitiated the availability of a great amount of valuable data. However, this data is usually heterogeneous, requiring its harmonization to be integrated and analysed. We propose a semantic-driven harmonization framework that (1) enables the meaningful sharing and integration of healthcare data across institutions and (2) facilitates the analysis and exploitation of the shared data.
METHODS: The framework includes an ontology-based common data model (i.e. SCDM), a data transformation pipeline and a semantic query system. Heterogeneous datasets, mapped to different terminologies, are integrated by using an ontology-based infrastructure rooted in a top-level ontology. A graph database is generated by using these mappings, and a web-based semantic query system facilitates data exploration.
RESULTS: Several datasets from different European institutions have been integrated by using the framework in the context of the European H2020 Precise4Q project. Through the query system, data scientists were able to explore data and use it for building machine learning models.
CONCLUSIONS: The flexible data representation using RDF, together with the formal semantic underpinning provided by the SCDM, has enabled the semantic integration, query and advanced exploitation of heterogeneous data in the context of the Precise4Q project.
PMID:37981455 | DOI:10.1016/j.cmpb.2023.107918
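The RDF representation mentioned above stores data as subject-predicate-object triples, and a semantic query amounts to matching patterns against them. The following sketch uses a plain Python list in place of a real graph database and query language; the `ex:` identifiers are hypothetical and do not come from the SCDM or the Precise4Q datasets.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Hypothetical records from two source datasets, expressed in one shared vocabulary.
triples = [
    ("ex:patient1", "rdf:type", "ex:Patient"),
    ("ex:patient1", "ex:hasDiagnosis", "ex:Stroke"),
    ("ex:patient2", "rdf:type", "ex:Patient"),
    ("ex:patient2", "ex:hasDiagnosis", "ex:Diabetes"),
]

# Pattern query: which subjects have a stroke diagnosis?
stroke_patients = [
    s for s, _, _ in match(triples, predicate="ex:hasDiagnosis", obj="ex:Stroke")
]
```

Once heterogeneous sources are mapped to the same predicates and classes, a single pattern like this one can retrieve matching records regardless of which institution they came from.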
Factors Influencing the Answerability and Popularity of a Health-Related Post in the Question-and-Answer Community: Infodemiology Study of Metafilter
J Med Internet Res. 2023 Nov 17;25:e48858. doi: 10.2196/48858.
ABSTRACT
BACKGROUND: The web-based health question-and-answer (Q&A) community has become the primary and handy way for people to access health information and knowledge directly.
OBJECTIVE: The objective of our study is to investigate how content-related, context-related, and user-related variables influence the answerability and popularity of health-related posts based on a user-dynamic, social network, and topic-dynamic semantic network, respectively.
METHODS: Full-scale data on health consultations were acquired from the Metafilter Q&A community. These variables were designed in terms of context, content, and contributors. Negative binomial regression models were used to examine the influence of these variables on the favorite and comment counts of a health-related post.
RESULTS: A total of 18,099 post records were collected from a well-known Q&A community. The findings of this study include the following. Content-related variables have a strong impact on both the answerability and popularity of posts. Notably, sentiment values were positively related to favorite counts and negatively associated with comment counts. User-related variables significantly affected the answerability and popularity of posts. Specifically, participation intensity was positively related to comment count and negatively associated with favorite count. Sociability breadth only had a significant impact on comment count. Context-related variables have a more substantial influence on the popularity of posts than on their answerability. The topic diversity variable exhibits an inverse correlation with the comment count while manifesting a positive correlation with the favorite count. Nevertheless, topic intensity has a significant effect only on favorite count.
CONCLUSIONS: The research results not only reveal the factors influencing the answerability and popularity of health-related posts, which can help users obtain high-quality answers more efficiently, but also provide a theoretical basis for platform operators to enhance user engagement within health Q&A communities.
PMID:37976090 | DOI:10.2196/48858
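The METHODS above name negative binomial regression for the favorite and comment counts. As an illustration only, the following minimal sketch shows the NB2 likelihood that such a model typically maximizes, assuming a log link and a single dispersion parameter. The function names and parameterization are assumptions made for this example; the abstract does not describe the study's actual software or settings.

```python
import math


def nb2_log_pmf(y, mu, alpha):
    """Log-probability of count y under NB2 with mean mu and dispersion alpha > 0.

    Under this parameterization the variance is mu + alpha * mu**2,
    so alpha captures overdispersion relative to a Poisson model.
    """
    r = 1.0 / alpha  # "size" parameter of the negative binomial
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def expected_count(intercept, coefs, features):
    """Log-link mean: mu = exp(b0 + sum_j b_j * x_j)."""
    return math.exp(intercept + sum(b * x for b, x in zip(coefs, features)))


def log_likelihood(intercept, coefs, alpha, rows, counts):
    """Total NB2 log-likelihood of observed counts given per-post feature rows."""
    return sum(
        nb2_log_pmf(y, expected_count(intercept, coefs, x), alpha)
        for x, y in zip(rows, counts)
    )
```

A fitting routine would search for the intercept, coefficients, and alpha that maximize `log_likelihood`; in this setup a positive coefficient means the expected count rises with that variable.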
Digital Personal Health Coaching Platform for Promoting Human Papillomavirus Infection Vaccinations and Cancer Prevention: Knowledge Graph-Based Recommendation System
JMIR Form Res. 2023 Nov 15;7:e50210. doi: 10.2196/50210.
ABSTRACT
BACKGROUND: Health promotion can empower populations to gain more control over their well-being by using digital interventions that focus on preventing the root causes of diseases. Digital platforms for personalized health coaching can improve health literacy and information-seeking behavior, leading to better health outcomes. Personal health records have been designed to enhance patients' self-management of a disease or condition. Existing personal health records have been mostly designed and deployed as a supplementary service that acts as views into electronic health records.
OBJECTIVE: We aim to overcome some of the limitations of electronic health records. This study aims to design and develop a personal health library (PHL) that generates personalized recommendations for human papillomavirus (HPV) vaccine promotion and cancer prevention.
METHODS: We have designed a proof-of-concept prototype of the Digital Personal Health Librarian, which leverages machine learning; natural language processing; and several innovative technological infrastructures, including the Semantic Web, social linked data, web application programming interfaces, and hypermedia-based discovery, to generate a personal health knowledge graph.
RESULTS: We have designed and implemented a proof-of-concept prototype to showcase and demonstrate how the PHL can be used to store an individual's health data, for example, a personal health knowledge graph. This is integrated with web-scale knowledge to support HPV vaccine promotion and prevent HPV-associated cancers among adolescents and their caregivers. We also demonstrated how the Digital Personal Health Librarian uses the PHL to provide evidence-based insights and knowledge-driven explanations that are personalized and inform health decision-making.
CONCLUSIONS: Digital platforms such as the PHL can be instrumental in improving precision health promotion and education strategies that address population-specific needs (ie, health literacy, digital competency, and language barriers) and empower individuals by facilitating knowledge acquisition to make healthy choices.
PMID:37966885 | DOI:10.2196/50210
Factors Associated With Transition From Community to Permanent Residential Aged Care Following Stroke: A Linked Registry Data Study
Stroke. 2023 Dec;54(12):3117-3127. doi: 10.1161/STROKEAHA.123.043972. Epub 2023 Nov 13.
ABSTRACT
BACKGROUND: Understanding factors that influence the transition to permanent residential aged care following a stroke or transient ischemic attack may inform strategies to support people to live at home longer. We aimed to identify the demographic, clinical, and system factors that may influence the transition from living in the community to permanent residential care in the 6 to 18 months following stroke/transient ischemic attack.
METHODS: Linked data cohort analysis of adults from Queensland and Victoria aged ≥65 years and registered in the Australian Stroke Clinical Registry (2012-2016) with a clinical diagnosis of stroke/transient ischemic attack and living in the community in the first 6 months post-hospital discharge. Participant data were linked with primary care, pharmaceutical, aged care, death, and hospital data. Multivariable survival analysis was performed to determine demographic, clinical, and system factors associated with the transition to permanent residential care in the 6 to 18 months following stroke, with death modeled as a competing risk.
RESULTS: Of 11 176 included registrants (median age, 77.2 years; 44% female), 520 (5%) transitioned to permanent residential care between 6 and 18 months. Factors most associated with transition included the history of urinary tract infections (subhazard ratio [SHR], 1.41 [95% CI, 1.16-1.71]), dementia (SHR, 1.66 [95% CI, 1.14-2.42]), increasing age (65-74 versus 85+ years; SHR, 1.75 [95% CI, 1.31-2.34]), living in regional Australia (SHR, 1.31 [95% CI, 1.08-1.60]), and aged care service approvals: respite (SHR, 4.54 [95% CI, 3.51-5.85]) and high-level home support (SHR, 1.80 [95% CI, 1.30-2.48]). Protective factors included being dispensed antihypertensive medications (SHR, 0.68 [95% CI, 0.53-0.87]), seeing a cardiologist (SHR, 0.72 [95% CI, 0.57-0.91]) following stroke, and less severe stroke (SHR, 0.71 [95% CI, 0.58-0.88]).
CONCLUSIONS: Our findings provide an improved understanding of factors that influence the transition from community to permanent residential care following stroke and can inform future strategies designed to delay this transition.
PMID:37955141 | DOI:10.1161/STROKEAHA.123.043972
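Subhazard ratios such as those in the RESULTS above are usually estimated on the log scale, so a reported 95% CI tends to be symmetric around the log of the point estimate. Under that assumption (it is not stated in the abstract), the point estimate should sit near the geometric mean of the CI bounds. The sketch below is an illustrative consistency check, not the authors' analysis; the helper names and the tolerance are hypothetical.

```python
import math


def ratio_from_ci(lower, upper):
    """Geometric mean of the CI bounds (midpoint on the log scale)."""
    return math.sqrt(lower * upper)


def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of the log ratio implied by a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def is_consistent(estimate, lower, upper, tol=0.02):
    """True if the estimate lies within tol (log scale) of the CI's log-midpoint.

    The tolerance is an assumed allowance for two-decimal rounding.
    """
    return abs(math.log(estimate) - math.log(ratio_from_ci(lower, upper))) < tol


# Three values quoted in the RESULTS: (label, SHR, CI lower, CI upper).
reported = [
    ("dementia", 1.66, 1.14, 2.42),
    ("respite approval", 4.54, 3.51, 5.85),
    ("antihypertensive medications", 0.68, 0.53, 0.87),
]

for label, shr, lo, hi in reported:
    print(label, round(ratio_from_ci(lo, hi), 2), is_consistent(shr, lo, hi))
```

A ratio above 1 with a CI excluding 1 indicates higher risk of transition; a ratio below 1 with a CI excluding 1 indicates a protective association.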
Normalization of drug and therapeutic concepts with Thera-Py
JAMIA Open. 2023 Nov 8;6(4):ooad093. doi: 10.1093/jamiaopen/ooad093. eCollection 2023 Dec.
ABSTRACT
OBJECTIVE: The diversity of nomenclature and naming strategies makes therapeutic terminology difficult to manage and harmonize. As the number and complexity of available therapeutic ontologies continue to increase, the need for harmonized cross-resource mappings is becoming increasingly apparent. This study creates harmonized concept mappings that enable the linking together of like-concepts despite source-dependent differences in data structure or semantic representation.
MATERIALS AND METHODS: For this study, we created Thera-Py, a Python package and web API that constructs searchable concepts for drugs and therapeutic terminologies using 9 public resources and thesauri. By using a directed graph approach, Thera-Py captures commonly used aliases, trade names, annotations, and associations for any given therapeutic and combines them under a single concept record.
RESULTS: We highlight the creation of 16 069 unique merged therapeutic concepts from 9 distinct sources using Thera-Py and observe an increase in overlap of therapeutic concepts in 2 or more knowledge bases after harmonization using Thera-Py (from 9.8% to 41.8%).
CONCLUSION: We observe that Thera-Py tends to normalize therapeutic concepts to their underlying active ingredients (excluding nondrug therapeutics, eg, radiation therapy, biologics), and unifies all available descriptors regardless of ontological origin.
PMID:37954974 | PMC:PMC10637840 | DOI:10.1093/jamiaopen/ooad093
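The directed graph approach described in the abstract above can be pictured with a small sketch: source records are nodes, cross-references are edges, and every group of linked records is collapsed into one merged concept that carries all labels and aliases. This is not Thera-Py's API or its actual algorithm; the record layout, the source prefixes (`srcA`, `srcB`, `srcC`), and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def merge_concepts(records):
    """Group records linked by cross-references into merged concept records."""
    # Treat each directed cross-reference as linking both ends, so that
    # grouping corresponds to weakly connected components of the graph.
    neighbors = defaultdict(set)
    for rec in records:
        neighbors[rec["id"]]  # make sure isolated records are still nodes
        for target in rec["xrefs"]:
            neighbors[rec["id"]].add(target)
            neighbors[target].add(rec["id"])

    by_id = {rec["id"]: rec for rec in records}
    seen, merged = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # Pool every label and alias found in the component.
        labels = set()
        for node in component:
            rec = by_id.get(node)
            if rec is not None:
                labels.add(rec["label"].lower())
                labels.update(alias.lower() for alias in rec["aliases"])
        merged.append({"members": sorted(component), "labels": sorted(labels)})
    return merged


# Hypothetical records from three made-up sources.
records = [
    {"id": "srcA:1", "label": "Aspirin", "aliases": [], "xrefs": ["srcB:7"]},
    {"id": "srcB:7", "label": "acetylsalicylic acid", "aliases": ["ASA"], "xrefs": []},
    {"id": "srcC:3", "label": "Ibuprofen", "aliases": [], "xrefs": []},
]

concepts = merge_concepts(records)
for concept in concepts:
    print(concept["members"], concept["labels"])
```

A lookup table from each lowercase label to its merged concept would then let a query for any alias resolve to the same concept record, which is the kind of cross-resource overlap the RESULTS report.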