Semantic Web
Racial and Ethnic Disparities in Health-Related Quality of Life for Patients with Colorectal Cancer: Analysis of the SEER-MHOS Linked Data Set
Int J Radiat Oncol Biol Phys. 2023 Oct 1;117(2S):e296. doi: 10.1016/j.ijrobp.2023.06.2305.
ABSTRACT
PURPOSE/OBJECTIVE(S): We hypothesized that racial and ethnic disparities exist in health-related quality of life (HRQOL) among older adults with colorectal cancer, both before and after diagnosis.
MATERIALS/METHODS: The Surveillance, Epidemiology, and End Results and Medicare Health Outcomes Survey (SEER-MHOS) linked data set was used to identify patients 65 years old and above who were diagnosed with colorectal cancer between 1996 and 2015. Self-reported race/ethnicity, the predictor of interest, was categorized as White (W), Asian/Pacific Islander (API), Black/African American (B), or Hispanic (H). HRQOL data from the 36-Item Short Form Survey and Veterans RAND 12-Item Health Survey were extracted within 24 months pre- and post-diagnosis. HRQOL was measured using the Physical Component Summary (PCS), Mental Component Summary (MCS), and Total Component Summary (TCS, a summation of PCS and MCS), which were the response variables. Associations were assessed via univariable (UVA) and multivariable (MVA) linear regression analysis, adjusting for age, sex, region, marital status, education, income, number of comorbidities, limitations in activities of daily living, stage, and histology. Pairwise comparisons were performed between all racial and ethnic groups.
RESULTS: We identified a total of 1,204 evaluable patients, with 815 in the pre-diagnosis cohort and 562 in the post-diagnosis cohort, including 173 patients in both. With unadjusted p-values, pre-diagnosis UVA revealed a higher mean PCS in API patients compared to W, B, and H patients (p<0.001, p<0.001, p = 0.02) as well as in W compared to H patients (p = 0.002); a higher mean MCS in W and API patients compared to B (p<0.001, p = 0.002) and H patients (p<0.001, p = 0.002); and a higher mean TCS in API compared to W, B, and H patients (p = 0.027, p<0.001, p<0.001) as well as in W compared to B and H patients (p<0.001, p = 0.012). Pre-diagnosis MVA revealed a higher mean PCS in API compared to B patients (p = 0.028) and a higher mean MCS in W and B compared to H patients (p = 0.022, p = 0.021). Post-diagnosis UVA showed a higher mean MCS in W compared to B and H patients (p<0.001 for both) as well as in API compared to H patients (p = 0.002), and a higher mean TCS in W and API patients compared to B (p<0.001, p = 0.045) and H patients (p<0.001, p = 0.007). Post-diagnosis MVA showed a higher mean MCS in API compared to H patients (p = 0.035). Compared to pre-diagnosis, post-diagnosis mean TCS was numerically lower for all groups.
CONCLUSION: Among older adults with colorectal cancer, there appear to be racial and ethnic disparities in HRQOL. Before the cancer diagnosis, API patients had better physical HRQOL than B patients, while W and B patients had better mental HRQOL than H patients. After diagnosis, API patients had better mental HRQOL than H patients. For all groups, the cancer diagnosis seemed to have a negative impact on overall HRQOL.
PMID:37785087 | DOI:10.1016/j.ijrobp.2023.06.2305
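The bookkeeping in this abstract can be checked with a short sketch. It assumes only what the abstract states: TCS is the sum of PCS and MCS, the four groups are compared pairwise, and the pre- and post-diagnosis cohorts overlap. The function names are illustrative and are not taken from the study.

```python
from itertools import combinations

# Group abbreviations as used in the abstract.
GROUPS = ["W", "API", "B", "H"]


def total_component_summary(pcs: float, mcs: float) -> float:
    """TCS as described in the abstract: the sum of PCS and MCS."""
    return pcs + mcs


def evaluable_patients(pre: int, post: int, both: int) -> int:
    """Unique patients across two overlapping cohorts (inclusion-exclusion)."""
    return pre + post - both


# Every unordered pair of groups is compared once.
pairs = list(combinations(GROUPS, 2))
print(len(pairs))                         # 6
print(evaluable_patients(815, 562, 173))  # 1204
```

Six pairwise comparisons per outcome and cohort is also why the abstract flags its p-values as unadjusted.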
Predicting Mammogram Screening Follow Through with Electronic Health Record and Geographically Linked Data
Cancer Res Commun. 2023 Oct 19;3(10):2126-2132. doi: 10.1158/2767-9764.CRC-23-0263.
ABSTRACT
Cancer is the second leading cause of death in the United States, and breast cancer is the fourth leading cause of cancer-related death, with 42,275 women dying of breast cancer in the United States in 2020. Screening is a key strategy for reducing mortality from breast cancer and is recommended by various national guidelines. This study applies machine learning classification methods to the task of predicting which patients will fail to complete a mammogram screening after having one ordered, as well as understanding the underlying features that influence predictions. The results show that a small group of patients can be identified that are very unlikely to complete mammogram screening, enabling care managers to focus resources.
SIGNIFICANCE: The motivation behind this study is to create an automated system that can identify a small group of individuals who are at elevated risk of not completing an ordered mammogram screening. This will enable interventions that boost screening to be focused on the patients least likely to complete it.
PMID:37782226 | PMC:PMC10586236 | DOI:10.1158/2767-9764.CRC-23-0263
A systematic review and knowledge mapping on ICT-based remote and automatic COVID-19 patient monitoring and care
BMC Health Serv Res. 2023 Sep 30;23(1):1047. doi: 10.1186/s12913-023-10047-z.
ABSTRACT
BACKGROUND: e-Health has played a crucial role during the COVID-19 pandemic in primary health care. e-Health is the cost-effective and secure use of Information and Communication Technologies (ICTs) to support health and health-related fields. Various stakeholders worldwide use ICTs, including individuals, non-profit organizations, health practitioners, and governments. As a result of the COVID-19 pandemic, ICT has improved the quality of healthcare, the exchange of information, training of healthcare professionals and patients, and facilitated the relationship between patients and healthcare providers. This study systematically reviews the literature on ICT-based automatic and remote monitoring methods, as well as different ICT techniques used in the care of COVID-19-infected patients.
OBJECTIVE: The purpose of this systematic literature review is to identify the e-Health methods, associated ICTs, method implementation strategies, information collection techniques, advantages, and disadvantages of remote and automatic patient monitoring and care in the COVID-19 pandemic.
METHODS: The search included primary studies that were published between January 2020 and June 2022 in scientific and electronic databases, such as EBSCOhost, Scopus, ACM, Nature, SpringerLink, IEEE Xplore, MEDLINE, Google Scholar, JMIR, Web of Science, Science Direct, and PubMed. In this review, the findings from the included publications are presented and elaborated according to the identified research questions. Evidence-based systematic reviews and meta-analyses were conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Additionally, we improved the review process using the Rayyan tool and the Scale for the Assessment of Narrative Review Articles (SANRA). Among the eligibility criteria were methodological rigor, conceptual clarity, and useful implementation of ICTs in e-Health for remote and automatic monitoring of COVID-19 patients.
RESULTS: Our initial search identified 664 potential studies; 102 were assessed for eligibility in the pre-final stage, and 65 articles were included in the final review after applying the inclusion and exclusion criteria. The review identified the following e-Health methods: Telemedicine, Mobile Health (mHealth), and Telehealth. The associated ICTs are Wearable Body Sensors, Artificial Intelligence (AI) algorithms, Internet-of-Things or Internet-of-Medical-Things (IoT or IoMT), Biometric Monitoring Technologies (BioMeTs), and Bluetooth-enabled (BLE) home health monitoring devices. Spatial or positional data and personal health and wellness data, including vital signs, symptoms, biomedical images and signals, and lifestyle data, are examples of information that is managed by ICTs. Different AI and IoT methods have opened new possibilities for automatic and remote patient monitoring, with associated advantages and weaknesses. Our findings were represented in a structured manner using a semantic knowledge graph (e.g., ontology model).
CONCLUSIONS: Various e-Health methods, related remote monitoring technologies, different approaches, information categories, the adoption of ICT tools for an automatic remote patient monitoring (RPM), advantages and limitations of RMTs in the COVID-19 case are discussed in this review. The use of e-Health during the COVID-19 pandemic illustrates the constraints and possibilities of using ICTs. ICTs are not merely an external tool to achieve definite remote and automatic health monitoring goals; instead, they are embedded in contexts. Therefore, the importance of the mutual design process between ICT and society during the global health crisis has been observed from a social informatics perspective. A global health crisis can be observed as an information crisis (e.g., insufficient information, unreliable information, and inaccessible information); however, this review shows the influence of ICTs on COVID-19 patients' health monitoring and related information collection techniques.
PMID:37777722 | DOI:10.1186/s12913-023-10047-z
A Global Feature-Rich Network Dataset of Cities and Dashboard for Comprehensive Urban Analyses
Sci Data. 2023 Sep 30;10(1):667. doi: 10.1038/s41597-023-02578-1.
ABSTRACT
Urban network analytics has become an essential tool for understanding and modeling the intricate complexity of cities. We introduce the Urbanity data repository to nurture this growing research field, offering a comprehensive, open spatial network resource spanning 50 major cities in 29 countries worldwide. Our workflow enhances OpenStreetMap networks with 40+ high-resolution indicators from open global sources such as street view imagery, building morphology, urban population, and points of interest, catering to a diverse range of applications across multiple fields. We extract streetscape semantic features from more than four million street view images using computer vision. The dataset's strength lies in its thorough processing and validation at every stage, ensuring data quality and consistency through automated and manual checks. Accompanying the dataset is an interactive, web-based dashboard we developed, which makes the data accessible even to non-technical stakeholders. Urbanity aids various GeoAI and city comparative analyses, underscoring the growing importance of urban network analytics research.
PMID:37777566 | DOI:10.1038/s41597-023-02578-1
Interactive Healthcare Robot using Attention-based Question-Answer Retrieval and Medical Entity Extraction Models
IEEE J Biomed Health Inform. 2023 Sep 29;PP. doi: 10.1109/JBHI.2023.3320939. Online ahead of print.
ABSTRACT
In healthcare facilities, answering questions from patients and their companions about health problems is regarded as an essential task. With the current shortage of medical personnel and an increase in the patient-to-clinician ratio, medical staff can devote less time to answering each patient's questions. However, studies have shown that correct healthcare information can improve patients' knowledge, attitudes, and behaviors. Therefore, delivering correct healthcare knowledge through a question-answering system is crucial. In this paper, we develop an interactive healthcare question-answering system that uses attention-based models to answer healthcare-related questions. Attention-based transformer models are used to encode the semantic meaning of the user query and, separately, to extract the medical entities it contains. These two features are integrated through our fusion module and matched against a pre-collected healthcare knowledge set, so that the system returns the best-matching response to the user in real time. To improve interactivity, we further introduce a recommendation module and an online web search module to provide potential questions and out-of-scope answers. Experimental results for question-answer retrieval show that the proposed method can retrieve the correct answer from FAQ pairs in the healthcare domain. We therefore believe that this application can benefit patients and their companions.
PMID:37773912 | DOI:10.1109/JBHI.2023.3320939
Analysis and implementation of the DynDiff tool when comparing versions of ontology
J Biomed Semantics. 2023 Sep 28;14(1):15. doi: 10.1186/s13326-023-00295-7.
ABSTRACT
BACKGROUND: Ontologies play a key role in the management of medical knowledge because they have the properties to support a wide range of knowledge-intensive tasks. The dynamic nature of knowledge requires frequent changes to the ontologies to keep them up-to-date. The challenge is to understand and manage these changes and their impact on dependent systems, given the growing volume of data annotated with ontologies and the limited documentation describing the changes.
METHODS: We present a method to detect and characterize the changes occurring between different versions of an ontology, together with an ontology of changes entitled DynDiffOnto, designed according to Semantic Web best practices and FAIR principles. We further describe the implementation of the method and the evaluation of the tool with different ontologies from the biomedical domain (i.e. ICD9-CM, MeSH, NCIt, SNOMEDCT, GO, IOBC and CIDO), showing its performance in terms of execution time and capacity to classify ontological changes, compared with other state-of-the-art approaches.
RESULTS: The experiments show a top-level performance of DynDiff for large ontologies and a good performance for smaller ones, with respect to execution time and capability to identify complex changes. In this paper, we further highlight the impact of ontology matchers on the diff computation and the possibility to parameterize the matcher in DynDiff, which makes it possible to benefit from state-of-the-art matchers.
CONCLUSION: DynDiff is an efficient tool to compute differences between ontology versions and classify these differences according to DynDiffOnto concepts. This work also contributes to a better understanding of ontological changes through DynDiffOnto, which was designed to express the semantics of the changes between versions of an ontology and can be used to document the evolution of an ontology.
PMID:37770956 | DOI:10.1186/s13326-023-00295-7
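To make the idea of a version diff concrete, the sketch below compares two ontology versions represented as sets of triples and separates plain additions and removals from value changes on the same subject and predicate. This is a simplified illustration, not DynDiff's algorithm: the `Triple` type, the `diff_versions` function, and the `ex:` identifiers are all hypothetical, and no ontology matcher is involved.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def diff_versions(old: set, new: set) -> dict:
    """Classify differences between two sets of triples (hypothetical helper)."""
    removed = old - new
    added = new - old

    # Index removed triples by (subject, predicate) so that a removal and an
    # addition on the same key can be reported together as one "changed" pair.
    removed_by_key = {}
    for t in removed:
        removed_by_key.setdefault((t.subject, t.predicate), []).append(t)

    changed, pure_added = [], []
    for t in sorted(added):
        candidates = removed_by_key.get((t.subject, t.predicate))
        if candidates:
            changed.append((candidates.pop(), t))
        else:
            pure_added.append(t)

    pure_removed = sorted(t for ts in removed_by_key.values() for t in ts)
    return {"added": pure_added, "removed": pure_removed, "changed": changed}


old_version = {
    Triple("ex:C1", "rdfs:label", "Tumour"),
    Triple("ex:C2", "rdfs:subClassOf", "ex:C1"),
    Triple("ex:C3", "rdfs:label", "Obsolete term"),
}
new_version = {
    Triple("ex:C1", "rdfs:label", "Tumor"),
    Triple("ex:C2", "rdfs:subClassOf", "ex:C1"),
    Triple("ex:C4", "rdfs:label", "New term"),
}

result = diff_versions(old_version, new_version)
print(result["changed"])
```

In this toy input the label of `ex:C1` is reported as one change rather than as an unrelated removal plus addition, which is the kind of higher-level classification the abstract refers to as identifying complex changes.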
Generative Adversarial Network (GAN)-Based Autonomous Penetration Testing for Web Applications
Sensors (Basel). 2023 Sep 21;23(18):8014. doi: 10.3390/s23188014.
ABSTRACT
The web application market has shown rapid growth in recent years. The expansion of Wireless Sensor Networks (WSNs) and the Internet of Things (IoT) has created new web-based communication and sensing frameworks. Current security research utilizes source code analysis and manual exploitation of web applications to identify security vulnerabilities, such as Cross-Site Scripting (XSS) and SQL Injection, in these emerging fields. The attack samples generated as part of web application penetration testing on sensor networks can be easily blocked using Web Application Firewalls (WAFs). In this research work, we propose an autonomous penetration testing framework that utilizes Generative Adversarial Networks (GANs). We overcome the limitations of vanilla GANs by using conditional sequence generation. This technique helps in identifying key features for XSS attacks. We trained a generative model based on attack labels and attack features. The attack features were identified using semantic tokenization, and the attack payloads were generated using conditional sequence GAN. The generated attack samples can be used to target web applications protected by WAFs in an automated manner. This model scales well on a large-scale web application platform, and it saves the significant effort invested in manual penetration testing.
PMID:37766067 | DOI:10.3390/s23188014
Multiple Antiplatelet Therapy in Ischemic Stroke Already on Antiplatelet Agents Based on the Linked Big Data for Stroke
J Korean Med Sci. 2023 Sep 25;38(38):e294. doi: 10.3346/jkms.2023.38.e294.
ABSTRACT
BACKGROUND: Optimal antiplatelet strategy for patients with ischemic stroke who were already on single antiplatelet therapy (SAPT) remains to be elucidated. This study aimed to evaluate the effect of different antiplatelet regimens on vascular and safety outcomes at 1 year after non-cardioembolic stroke in patients previously on SAPT.
METHODS: We identified 9,284 patients with acute non-cardioembolic ischemic stroke that occurred on SAPT using linked data. Patients were categorized into three groups according to antiplatelet strategy at discharge: 1) SAPT; 2) dual antiplatelet therapy (DAPT); and 3) triple antiplatelet therapy (TAPT). One-year outcomes included recurrent ischemic stroke, composite outcomes (recurrent ischemic stroke, myocardial infarction, intracerebral hemorrhage, and death), and major bleeding.
RESULTS: Of 9,284 patients, 5,565 (59.9%) maintained SAPT, 3,638 (39.2%) were treated with DAPT, and 81 (0.9%) were treated with TAPT. Multiple antiplatelet therapy did not reduce the risk of 1-year recurrent stroke (DAPT, hazard ratio [HR], 1.08, 95% confidence interval [CI], 0.92-1.27, P = 0.339; TAPT, HR, 0.71, 95% CI, 0.27-1.91, P = 0.500) or of the 1-year composite outcome (DAPT, HR, 1.09, 95% CI, 0.68-1.97, P = 0.592; TAPT, HR, 1.46, 95% CI, 0.68-1.97, P = 0.592). However, the TAPT group showed an increased risk of major bleeding complications (DAPT, HR, 1.23, 95% CI, 0.89-1.71, P = 0.208; TAPT, HR, 4.65, 95% CI, 2.01-10.74, P < 0.001).
CONCLUSION: Additional use of antiplatelet agents in patients with non-cardioembolic ischemic stroke who were already on SAPT did not reduce the 1-year incidence of vascular outcomes, although it increased the risk of bleeding complications.
PMID:37750368 | PMC:PMC10519784 | DOI:10.3346/jkms.2023.38.e294
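The group shares and the reading of the bleeding result can be reproduced with a few lines. The sketch below uses only figures quoted in the abstract; the helper names are illustrative, and the rule applied (a 95% CI for a hazard ratio that excludes 1 indicates a difference at the 5% level) is the standard interpretation, not something the abstract spells out.

```python
def percent(part: int, total: int) -> float:
    """Share of the total, in percent, rounded to one decimal place."""
    return round(100 * part / total, 1)


def ci_excludes_null(ci_low: float, ci_high: float, null: float = 1.0) -> bool:
    """True when the confidence interval lies entirely above or below the null value."""
    return ci_low > null or ci_high < null


TOTAL = 9284
group_sizes = {"SAPT": 5565, "DAPT": 3638, "TAPT": 81}
shares = {name: percent(n, TOTAL) for name, n in group_sizes.items()}

# 95% CIs for major bleeding, as reported in the abstract.
major_bleeding_ci = {"DAPT": (0.89, 1.71), "TAPT": (2.01, 10.74)}
flags = {name: ci_excludes_null(lo, hi) for name, (lo, hi) in major_bleeding_ci.items()}

print(shares)  # {'SAPT': 59.9, 'DAPT': 39.2, 'TAPT': 0.9}
print(flags)   # {'DAPT': False, 'TAPT': True}
```

Only the TAPT interval for major bleeding lies entirely above 1, which matches the reported P < 0.001.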
Extrapolation of affective norms using transformer-based neural networks and its application to experimental stimuli selection
Behav Res Methods. 2023 Sep 25. doi: 10.3758/s13428-023-02212-3. Online ahead of print.
ABSTRACT
Data on the emotionality of words is important for the selection of experimental stimuli and sentiment analysis on large bodies of text. While norms for valence and arousal have been thoroughly collected in English, most languages do not have access to such large datasets. Moreover, theoretical developments lead to new dimensions being proposed, the norms for which are only partially available. In this paper, we propose a transformer-based neural network architecture for semantic and emotional norms extrapolation that predicts a whole ensemble of norms at once while achieving state-of-the-art correlations with human judgments on each. We improve on previous approaches with regard to the correlations with human judgments by Δr = 0.1 on average. We discuss in detail the limitations of norm extrapolation as a whole, with a special focus on the introduced model. Further, we propose a practical application of our model: a method of stimuli selection that performs unsupervised control by picking words that match in their semantic content. As the proposed model can easily be applied to different languages, we provide norm extrapolations for English, Polish, Dutch, German, French, and Spanish. To aid researchers, we also provide access to the extrapolation networks through an accessible web application.
PMID:37749424 | DOI:10.3758/s13428-023-02212-3
Fourteen quick tips for crowdsourcing geographically linked data for public health advocacy
PLoS Comput Biol. 2023 Sep 21;19(9):e1011285. doi: 10.1371/journal.pcbi.1011285. eCollection 2023 Sep.
ABSTRACT
This article presents 14 quick tips to build a team to crowdsource data for public health advocacy. It includes tips around team building and logistics, infrastructure setup, media and industry outreach, and project wrap-up and archival for posterity.
PMID:37733682 | PMC:PMC10513213 | DOI:10.1371/journal.pcbi.1011285
Can growing patients with end-stage TMJ pathology be successfully treated with alloplastic temporomandibular joint reconstruction? - A systematic review
Oral Maxillofac Surg. 2023 Sep 21. doi: 10.1007/s10006-023-01180-4. Online ahead of print.
ABSTRACT
INTRODUCTION: The use of alloplastic total temporomandibular joint reconstruction (TMJR) in growing patients is controversial, mainly due to immature elements of the craniomaxillofacial skeleton. The aim of this systematic review was to evaluate the use of alloplastic TMJR in growing patients, focusing on the patient's clinical presentation, surgical and medical history and efficacy of alloplastic TMJR implantation.
MATERIALS AND METHODS: The literature search strategy was based on the Population, Intervention, Comparator, Outcomes and Study type (PICOS) framework. We searched Pubmed, Google Scholar, Dimension, Web of Science, X-mol, Semantic Scholar and Embase to January 2023, without any restriction on the type of publication reporting alloplastic TMJR in growing patients (age ≤ 18 years for boys and age ≤ 15 years for girls).
RESULTS: A total of 15 studies (case reports: 9, case series: 2, cohort studies: 4) met the inclusion criteria, documenting 73 patients of growing age from 7 countries. Thirty-eight (~52%) cases were female. The mean ± SD (range) age and follow-up of patients in all studies was 13.1 ± 3.2 (0-17) years and 34.3 ± 21.5 (7-96) months, respectively. A total of 22 (30%) patients were implanted with bilateral alloplastic TMJR. Over half of the studies (n = 10) were published in the last 3 years. All patients underwent multiple surgeries prior to implantation of alloplastic TMJR. In extreme cases, patients underwent a total of 17 surgeries. Different types of studies reporting inconsistent variables restricted our ability to perform quality assessment measures for evidence building.
CONCLUSIONS: Clinical experience with alloplastic TMJR in growing patients is limited to cases showing poor prognosis with other types of reconstruction. Nevertheless, studies show promising results for the use of alloplastic TMJR in growing patients, highlighting the need for well-controlled prospective studies with long-term follow-up.
PMID:37733214 | DOI:10.1007/s10006-023-01180-4
Using Social Media to Help Understand Patient-Reported Health Outcomes of Post-COVID-19 Condition: Natural Language Processing Approach
J Med Internet Res. 2023 Sep 19;25:e45767. doi: 10.2196/45767.
ABSTRACT
BACKGROUND: While scientific knowledge of post-COVID-19 condition (PCC) is growing, there remains significant uncertainty in the definition of the disease, its expected clinical course, and its impact on daily functioning. Social media platforms can generate valuable insights into patient-reported health outcomes as the content is produced at high resolution by patients and caregivers, representing experiences that may be unavailable to most clinicians.
OBJECTIVE: In this study, we aimed to determine the validity and effectiveness of advanced natural language processing approaches built to derive insight into PCC-related patient-reported health outcomes from social media platforms Twitter and Reddit. We extracted PCC-related terms, including symptoms and conditions, and measured their occurrence frequency. We compared the outputs with human annotations and clinical outcomes and tracked symptom and condition term occurrences over time and locations to explore the pipeline's potential as a surveillance tool.
METHODS: We used bidirectional encoder representations from transformers (BERT) models to extract and normalize PCC symptom and condition terms from English posts on Twitter and Reddit. We compared 2 named entity recognition models and implemented a 2-step normalization task to map extracted terms to unique concepts in standardized terminology. The normalization steps were done using a semantic search approach with BERT biencoders. We evaluated the effectiveness of BERT models in extracting the terms using a human-annotated corpus and a proximity-based score. We also compared the validity and reliability of the extracted and normalized terms to a web-based survey with more than 3000 participants from several countries.
RESULTS: UmlsBERT-Clinical had the highest accuracy in predicting entities closest to those extracted by human annotators. Based on our findings, the top 3 most commonly occurring groups of PCC symptom and condition terms were systemic (such as fatigue), neuropsychiatric (such as anxiety and brain fog), and respiratory (such as shortness of breath). In addition, we also found novel symptom and condition terms that had not been categorized in previous studies, such as infection and pain. Regarding the co-occurring symptoms, the pair of fatigue and headaches was among the most co-occurring term pairs across both platforms. Based on the temporal analysis, the neuropsychiatric terms were the most prevalent, followed by the systemic category, on both social media platforms. Our spatial analysis concluded that 42% (10,938/26,247) of the analyzed terms included location information, with the majority coming from the United States, United Kingdom, and Canada.
CONCLUSIONS: The outcome of our social media-derived pipeline is comparable with the results of peer-reviewed articles relevant to PCC symptoms. Overall, this study provides unique insights into patient-reported health outcomes of PCC and valuable information about the patient's journey that can help health care providers anticipate future needs.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1101/2022.12.14.22283419.
PMID:37725432 | DOI:10.2196/45767
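The normalization step described in the methods, mapping an extracted term to the closest concept by semantic search, can be sketched as a nearest-neighbour lookup under cosine similarity. The vectors below are hand-made toy values standing in for biencoder embeddings, and the concept names and the `normalize_term` function are hypothetical; the study's actual models and terminology are not reproduced here.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy concept embeddings (assumption: a real pipeline would obtain these
# from a BERT biencoder rather than hard-coding them).
CONCEPT_VECTORS = {
    "Fatigue": [0.9, 0.1, 0.0],
    "Anxiety": [0.1, 0.9, 0.1],
    "Dyspnea": [0.0, 0.2, 0.9],
}


def normalize_term(term_vector: list, concepts: dict = CONCEPT_VECTORS) -> str:
    """Return the concept whose embedding is most similar to the term's embedding."""
    return max(concepts, key=lambda name: cosine(term_vector, concepts[name]))


# A toy embedding for a colloquial mention such as "so tired all the time".
print(normalize_term([0.8, 0.2, 0.1]))    # Fatigue
# A toy embedding for a mention such as "can't catch my breath".
print(normalize_term([0.1, 0.1, 0.95]))   # Dyspnea
```

The same lookup can be run in two passes, first against a coarse set of concepts and then within the chosen group, which is one plausible reading of the 2-step normalization the abstract mentions.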
A web-based care assistant for caregivers of the elderly: Development and pilot study
Digit Health. 2023 Sep 11;9:20552076231200976. doi: 10.1177/20552076231200976. eCollection 2023 Jan-Dec.
ABSTRACT
BACKGROUND: The aging population in Korea has driven a surge in demand for elderly care services, leading to significant growth in elderly welfare facilities, particularly Adult Daycare Centers (ADCs). However, despite advancements in care facilities, caregivers continue to face challenges in providing suitable elderly care due to difficulties arising from gaps in the latest information on the elderly and their coping abilities.
OBJECTIVE: The objective of this study is to develop and evaluate the effectiveness of the elderly care assistant (ECA) system, which facilitates the sharing of information and knowledge necessary for elderly care among caregivers.
METHODS: The ECA system was designed to support knowledge sharing through a knowledge management system based on an ontological knowledge model, with a web-based user interface for improved accessibility. A field trial was conducted at an ADC in Seoul from August 17 to September 21, with eight caregivers participating. A mixed-methods approach, involving both surveys and interviews, was employed to gauge the ECA system's effectiveness.
RESULTS: The study found that the use of the ECA was beneficial in promoting knowledge sharing among caregivers. Additionally, caregivers noted the potential benefits of using the ECA in conjunction with family caregivers, who can offer additional information and perspectives on elderly care.
CONCLUSIONS: This study presents preliminary evidence of the potential benefits of a care knowledge sharing system among various caregivers in elderly care. Although the elderly care assistant effectively promotes knowledge sharing, more research is needed to fully understand its impact on elderly care outcomes.
PMID:37706021 | PMC:PMC10496464 | DOI:10.1177/20552076231200976
Modelling digital health data: The ExaMode ontology for computational pathology
J Pathol Inform. 2023 Aug 22;14:100332. doi: 10.1016/j.jpi.2023.100332. eCollection 2023.
ABSTRACT
Computational pathology can significantly benefit from ontologies to standardize the employed nomenclature and help with knowledge extraction processes for high-quality annotated image datasets. The end goal is to reach a shared model for digital pathology to overcome data variability and integration problems. Indeed, data annotation in such a specific domain is still an unsolved challenge and datasets cannot be steadily reused in diverse contexts due to heterogeneity issues of the adopted labels, multilingualism, and different clinical practices.
MATERIAL AND METHODS: This paper presents the ExaMode ontology, modeling the histopathology process by considering 3 key cancer diseases (colon, cervical, and lung tumors) and celiac disease. The ExaMode ontology has been designed bottom-up in an iterative fashion with continuous feedback and validation from pathologists and clinicians. The ontology is organized into 5 semantic areas that define an ontological template to model any disease of interest in histopathology.
RESULTS: The ExaMode ontology is currently being used as a common semantic layer in: (i) an entity linking tool for the automatic annotation of medical records; (ii) a web-based collaborative annotation tool for histopathology text reports; and (iii) a software platform for building holistic solutions integrating multimodal histopathology data.
DISCUSSION: The ExaMode ontology is a key means to store data in a graph database according to the RDF data model. The creation of an RDF dataset can help develop more accurate algorithms for image analysis, especially in the field of digital pathology. This approach allows for seamless data integration and a unified query access point, from which we can extract relevant clinical insights about the considered diseases using SPARQL queries.
PMID:37705689 | PMC:PMC10495665 | DOI:10.1016/j.jpi.2023.100332
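The idea of a unified query access point over RDF data can be illustrated with a very small stand-in for a SPARQL basic graph pattern. The sketch below matches one triple pattern against an in-memory set of triples in plain Python; it is not a SPARQL engine, and every identifier (`ex:report1`, `ex:hasDiagnosis`, and so on) is hypothetical rather than taken from the ExaMode vocabulary.

```python
# Toy RDF-style data: (subject, predicate, object) triples with made-up names.
TRIPLES = {
    ("ex:report1", "rdf:type", "ex:ClinicalCaseReport"),
    ("ex:report1", "ex:hasDiagnosis", "ex:ColonAdenocarcinoma"),
    ("ex:report2", "rdf:type", "ex:ClinicalCaseReport"),
    ("ex:report2", "ex:hasDiagnosis", "ex:CeliacDisease"),
}


def match(pattern: tuple, triples: set = TRIPLES) -> list:
    """Return variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in sorted(triples):
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # A variable used twice in the pattern must bind to the same value.
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?report WHERE { ?report ex:hasDiagnosis ex:CeliacDisease }
print(match(("?report", "ex:hasDiagnosis", "ex:CeliacDisease")))
# [{'?report': 'ex:report2'}]
```

Because every record is expressed with the same predicates, one pattern retrieves matching reports regardless of which source system produced them, which is the integration benefit the discussion describes.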
Numeracy and literacy attainment of children exposed to maternal incarceration and other adversities: A linked data study
J Sch Psychol. 2023 Oct;100:101241. doi: 10.1016/j.jsp.2023.101241. Epub 2023 Aug 18.
ABSTRACT
Parental incarceration has been associated with educational disadvantages for children, such as lower educational attainment, increased grade retention, and truancy and suspensions. However, children exposed to parental incarceration often experience other adversities that are also associated with educational disadvantage; the contribution of these co-occurring adversities has not been considered in previous research. This study aimed to investigate the educational outcomes of children exposed to (a) maternal incarceration alone and (b) maternal incarceration plus other adversities (i.e., maternal mental illness and/or child protective services [CPS] contact). We used linked administrative data for a sample of children whose mothers were incarcerated during the children's childhood (i.e., from the time of mother's pregnancy through the child's 18th birthday; n = 3828) and a comparison group of children whose mothers had not been incarcerated (n = 9570). Multivariate multinomial logistic regressions examined the association between exposure to the three adversities (i.e., maternal incarceration, maternal mental illness, and child CPS contact) and above or below average reading and numeracy attainment in Grades 3, 5, 7 and 9. At all grade levels, children exposed to maternal incarceration alone and those exposed to maternal incarceration plus other adversities had increased odds of below average numeracy and reading attainment and decreased odds of above average numeracy and reading attainment compared to children without any of the recorded exposures. Children exposed to maternal incarceration and CPS contact and those exposed to all three adversities had increased odds of below average reading and numeracy attainment compared to children exposed to maternal incarceration alone. The findings highlight the complex needs of children of incarcerated mothers that must be considered when designing and delivering educational support programs. 
These children would benefit from the implementation of multi-tiered, trauma-informed educational and clinical services.
PMID:37689438 | DOI:10.1016/j.jsp.2023.101241
Lessons learned from using linked administrative data to evaluate the Family Nurse Partnership in England and Scotland
Int J Popul Data Sci. 2023 May 11;8(1):2113. doi: 10.23889/ijpds.v8i1.2113. eCollection 2023.
ABSTRACT
INTRODUCTION: "Big data" - including linked administrative data - can be exploited to evaluate interventions for maternal and child health, providing time- and cost-effective alternatives to randomised controlled trials. However, using these data to evaluate population-level interventions can be challenging.
OBJECTIVES: We aimed to inform future evaluations of complex interventions by describing sources of bias, lessons learned, and suggestions for improvements, based on two observational studies using linked administrative data from health, education and social care sectors to evaluate the Family Nurse Partnership (FNP) in England and Scotland.
METHODS: We first considered how different sources of potential bias within the administrative data could affect results of the evaluations. We explored how each study design addressed these sources of bias using maternal confounders captured in the data. We then determined what additional information could be captured at each step of the complex intervention to enable analysts to minimise bias and maximise comparability between intervention and usual care groups, so that any observed differences can be attributed to the intervention.
RESULTS: Lessons learned include the need for i) detailed data on intervention activity (dates/geography) and usual care; ii) improved information on data linkage quality to accurately characterise control groups; iii) more efficient provision of linked data to ensure timeliness of results; and iv) better measurement of confounding characteristics affecting who is eligible, approached and enrolled.
CONCLUSIONS: Linked administrative data are a valuable resource for evaluations of the FNP national programme and other complex population-level interventions. However, information on local programme delivery and usual care is required to account for biases that characterise those who receive the intervention, and to inform understanding of mechanisms of effect. National, ongoing, robust evaluations of complex public health interventions would be more achievable if programme implementation were integrated with improved national and local data collection and robust quasi-experimental designs.
PMID:37670953 | PMC:PMC10476150 | DOI:10.23889/ijpds.v8i1.2113
Development of an integrated and inferenceable RDF database of glycan, pathogen and disease resources
Sci Data. 2023 Sep 6;10(1):582. doi: 10.1038/s41597-023-02442-2.
ABSTRACT
Glycans are known to play extremely important roles in infections by viruses and pathogens. In fact, the SARS-CoV-2 virus has been shown to have evolved due to a single change in glycosylation. However, data resources on glycans, pathogens and diseases are not well organized. To accurately obtain such information from these various resources, we have constructed a foundation for discovering glycan and virus interaction data using Semantic Web technologies to be able to semantically integrate such heterogeneous data. Here, we created an ontology to encapsulate the semantics of virus-glycan interactions, and used Resource Description Framework (RDF) to represent the data we obtained from non-RDF related databases and data associated with literature. These databases include PubChem, SugarBind, and PSICQUIC, which made it possible to refer to other RDF resources such as UniProt and GlyTouCan. We made these data publicly available as open data and provided a service that allows anyone to freely perform searches using SPARQL. In addition, the RDF resources created in this study are available at the GlyCosmos Portal.
PMID:37673902 | DOI:10.1038/s41597-023-02442-2
PO2/TransformON, an ontology for data integration on food, feed, bioproducts and biowaste engineering
NPJ Sci Food. 2023 Sep 4;7(1):47. doi: 10.1038/s41538-023-00221-2.
ABSTRACT
We are witnessing an acceleration of the global drive to converge consumption and production patterns towards a more circular and sustainable approach to the food system. To address the challenge of reconnecting agriculture, environment, food and health, collections of large datasets must be exploited. However, building high-capacity data-sharing networks means unlocking the information silos that are caused by a multiplicity of local data dictionaries. To solve the data harmonization problem, we proposed an ontology on food, feed, bioproducts, and biowastes engineering for data integration in a circular bioeconomy and nexus-oriented approach. This ontology is based on a core model representing a generic process, the Process and Observation Ontology (PO2), which has been specialized to provide the vocabulary necessary to describe any biomass transformation process and to characterize the food, bioproducts, and wastes derived from these processes. Much of this vocabulary comes from transforming authoritative references such as the European food classification system (FoodEx2), the European Waste Catalogue, and other international nomenclatures into a semantic, World Wide Web Consortium (W3C) format that provides system interoperability and software-driven intelligence. We showed the relevance of this new domain ontology PO2/TransformON through several concrete use cases in the fields of process engineering, bio-based composite making, food ecodesign, and relations with consumers' perceptions and preferences. Further work will aim to align with other ontologies to create an ontology network for bridging the gap between upstream and downstream processes in the food system.
PMID:37666867 | DOI:10.1038/s41538-023-00221-2
Automatic transparency evaluation for open knowledge extraction systems
J Biomed Semantics. 2023 Aug 31;14(1):12. doi: 10.1186/s13326-023-00293-9.
ABSTRACT
BACKGROUND: This paper proposes Cyrus, a new transparency evaluation framework, for Open Knowledge Extraction (OKE) systems. Cyrus is based on the state-of-the-art transparency models and linked data quality assessment dimensions. It brings together a comprehensive view of transparency dimensions for OKE systems. The Cyrus framework is used to evaluate the transparency of three linked datasets, which are built from the same corpus by three state-of-the-art OKE systems. The evaluation is automatically performed using a combination of three state-of-the-art FAIRness (Findability, Accessibility, Interoperability, Reusability) assessment tools and a linked data quality evaluation framework, called Luzzu. This evaluation includes six Cyrus data transparency dimensions for which existing assessment tools could be identified. OKE systems extract structured knowledge from unstructured or semi-structured text in the form of linked data. These systems are fundamental components of advanced knowledge services. However, due to the lack of a transparency framework for OKE, most OKE systems are not transparent. This means that their processes and outcomes are not understandable and interpretable. A comprehensive framework sheds light on different aspects of transparency, allows comparison between the transparency of different systems by supporting the development of transparency scores, gives insight into the transparency weaknesses of the system, and ways to improve them. Automatic transparency evaluation helps with scalability and facilitates transparency assessment. The transparency problem has been identified as critical by the European Union Trustworthy Artificial Intelligence (AI) guidelines. In this paper, Cyrus provides the first comprehensive view of transparency dimensions for OKE systems by merging the perspectives of the FAccT (Fairness, Accountability, and Transparency), FAIR, and linked data quality research communities.
RESULTS: In Cyrus, data transparency includes ten dimensions, which are grouped into two categories. In this paper, six of these dimensions (provenance, interpretability, understandability, licensing, availability, and interlinking) have been evaluated automatically for three state-of-the-art OKE systems using state-of-the-art metrics and tools. Covid-on-the-Web is identified as having the highest mean transparency.
CONCLUSIONS: This is the first research to study the transparency of OKE systems that provides a comprehensive set of transparency dimensions spanning ethics, trustworthy AI, and data quality approaches to transparency. It also demonstrates how to perform automated transparency evaluation that combines existing FAIRness and linked data quality assessment tools for the first time. We show that state-of-the-art OKE systems vary in the transparency of the linked data generated and that these differences can be automatically quantified leading to potential applications in trustworthy AI, compliance, data protection, data governance, and future OKE system design and testing.
PMID:37653549 | DOI:10.1186/s13326-023-00293-9
Using linked data to identify pathways of reporting overdose events in British Columbia, 2015-2017
Int J Popul Data Sci. 2022 Oct 26;7(1):1708. doi: 10.23889/ijpds.v7i1.1708. eCollection 2022.
ABSTRACT
INTRODUCTION: Overdose events related to illicit opioids and other substances are a public health crisis in Canada. The BC Provincial Overdose Cohort is a collection of linked datasets identifying drug-related toxicity events, including death, ambulance, emergency room, hospital, and physician records. The datasets were brought together to understand factors associated with drug-related overdose and can also provide information on pathways of care among people who experience an overdose.
OBJECTIVES: To describe pathways of recorded healthcare use for overdose events in British Columbia, Canada and discrepancies between data sources.
METHODS: Using the BC Provincial Overdose Cohort spanning 2015 to 2017, we examined pathways of recorded health care use for overdose through the framework of an injury reporting pyramid. We also explored differences in event capture between linked datasets.
RESULTS: In the cohort, a total of 34,113 fatal and non-fatal overdose events were identified. A total of 3,056 people died of overdose. Nearly 80% of these deaths occurred among those with no contact with the healthcare system. The majority of events with healthcare records included contact with emergency health services (EHS; 72%), while 39% were seen in the emergency department (ED) and only 7% were hospitalized. Pathways of care from EHS to ED and hospitalization were generally observed. However, not all ED visits had an associated EHS record and some hospitalizations following an ED visit were for other health issues.
CONCLUSIONS: These findings emphasize the importance of accessing timely healthcare for people experiencing overdose. These findings can be applied to understanding pathways of care for people who experience overdose events and estimating the total burden of healthcare-attended overdose events.
HIGHLIGHTS: In British Columbia, Canada: Multiple sources of linked administrative health data were leveraged to understand recorded healthcare use among people with fatal and non-fatal overdose events. The majority of fatal overdose events occurred with no contact with the healthcare system and only appear in mortality data. Many non-fatal overdose events were captured in data from emergency health services, emergency departments, and hospital records. Accessing timely healthcare services is critical for people experiencing overdose.
PMID:37650030 | PMC:PMC10464869 | DOI:10.23889/ijpds.v7i1.1708