Semantic Web

Using Social Media to Help Understand Patient-Reported Health Outcomes of Post-COVID-19 Condition: Natural Language Processing Approach

Tue, 2023-09-19 06:00

J Med Internet Res. 2023 Sep 19;25:e45767. doi: 10.2196/45767.

ABSTRACT

BACKGROUND: While scientific knowledge of post-COVID-19 condition (PCC) is growing, there remains significant uncertainty in the definition of the disease, its expected clinical course, and its impact on daily functioning. Social media platforms can generate valuable insights into patient-reported health outcomes as the content is produced at high resolution by patients and caregivers, representing experiences that may be unavailable to most clinicians.

OBJECTIVE: In this study, we aimed to determine the validity and effectiveness of advanced natural language processing approaches built to derive insight into PCC-related patient-reported health outcomes from social media platforms Twitter and Reddit. We extracted PCC-related terms, including symptoms and conditions, and measured their occurrence frequency. We compared the outputs with human annotations and clinical outcomes and tracked symptom and condition term occurrences over time and locations to explore the pipeline's potential as a surveillance tool.

METHODS: We used bidirectional encoder representations from transformers (BERT) models to extract and normalize PCC symptom and condition terms from English posts on Twitter and Reddit. We compared 2 named entity recognition models and implemented a 2-step normalization task to map extracted terms to unique concepts in standardized terminology. The normalization steps were done using a semantic search approach with BERT biencoders. We evaluated the effectiveness of BERT models in extracting the terms using a human-annotated corpus and a proximity-based score. We also compared the validity and reliability of the extracted and normalized terms to a web-based survey with more than 3000 participants from several countries.
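
The semantic search step described above can be illustrated with a minimal sketch. It assumes that mentions and terminology concepts have already been embedded by some biencoder; the 3-dimensional vectors, the concept identifiers, and the function names below are hypothetical stand-ins, not the study's actual models or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def normalize_term(term_vec, concept_vecs):
    """Return (concept_id, score) for the concept most similar to term_vec."""
    scored = ((cid, cosine(term_vec, vec)) for cid, vec in concept_vecs.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical embeddings; a real biencoder would produce vectors with
# hundreds of dimensions for both the mentions and the concepts.
CONCEPTS = {
    "CONCEPT:fatigue": [0.9, 0.1, 0.0],
    "CONCEPT:dyspnea": [0.0, 0.2, 0.9],
    "CONCEPT:anxiety": [0.1, 0.9, 0.1],
}
MENTIONS = {
    "so tired all day": [0.8, 0.2, 0.1],
    "cannot catch my breath": [0.1, 0.1, 0.95],
}

for text, vec in MENTIONS.items():
    concept, score = normalize_term(vec, CONCEPTS)
    print(f"{text!r} -> {concept} (cosine {score:.2f})")
```

Ranking by cosine similarity means a colloquial mention can map to a standardized concept even when the two share no words, which is the property a semantic search approach relies on.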

RESULTS: UmlsBERT-Clinical had the highest accuracy in predicting entities closest to those extracted by human annotators. Based on our findings, the top 3 most commonly occurring groups of PCC symptom and condition terms were systemic (such as fatigue), neuropsychiatric (such as anxiety and brain fog), and respiratory (such as shortness of breath). We also found novel symptom and condition terms that had not been categorized in previous studies, such as infection and pain. Regarding co-occurring symptoms, the pair of fatigue and headaches was among the most frequently co-occurring term pairs across both platforms. Based on the temporal analysis, the neuropsychiatric terms were the most prevalent, followed by the systemic category, on both social media platforms. Our spatial analysis found that 42% (10,938/26,247) of the analyzed terms included location information, with the majority coming from the United States, United Kingdom, and Canada.
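
As a rough illustration of how co-occurring term pairs can be counted, the sketch below tallies unordered pairs of normalized terms per post. The example posts and the function name are hypothetical; only the location counts (10,938 of 26,247) come from the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts):
    """Count how many posts mention each unordered pair of terms."""
    counts = Counter()
    for terms in posts:
        # Deduplicate and sort so each pair is counted once per post
        # and always appears in the same order.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


# Hypothetical posts, each reduced to its normalized symptom terms.
posts = [
    ["fatigue", "headache", "brain fog"],
    ["headache", "fatigue"],
    ["anxiety", "fatigue"],
]

pair_counts = cooccurrence_counts(posts)
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)

# Share of analyzed terms with location information, as reported above.
location_share = 10938 / 26247
print(f"{location_share:.1%}")
```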

CONCLUSIONS: The outcome of our social media-derived pipeline is comparable with the results of peer-reviewed articles relevant to PCC symptoms. Overall, this study provides unique insights into patient-reported health outcomes of PCC and valuable information about the patient's journey that can help health care providers anticipate future needs.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1101/2022.12.14.22283419.

PMID:37725432 | DOI:10.2196/45767

Categories: Literature Watch

A web-based care assistant for caregivers of the elderly: Development and pilot study

Thu, 2023-09-14 06:00

Digit Health. 2023 Sep 11;9:20552076231200976. doi: 10.1177/20552076231200976. eCollection 2023 Jan-Dec.

ABSTRACT

BACKGROUND: The aging population in Korea has driven a surge in demand for elderly care services, leading to significant growth in elderly welfare facilities, particularly Adult Daycare Centers (ADCs). However, despite advancements in care facilities, caregivers continue to face challenges in providing suitable elderly care due to difficulties arising from gaps in the latest information on the elderly and their coping abilities.

OBJECTIVE: The objective of this study is to develop and evaluate the effectiveness of the elderly care assistant (ECA) system, which facilitates the sharing of information and knowledge necessary for elderly care among caregivers.

METHODS: The ECA system was designed to support knowledge sharing through a knowledge management system based on an ontological knowledge model, with a web-based user interface for improved accessibility. A field trial was conducted at an ADC in Seoul from August 17 to September 21, with eight caregivers participating. A mixed-methods approach, involving both surveys and interviews, was employed to gauge the ECA system's effectiveness.

RESULTS: The study found that the use of the ECA was beneficial in promoting knowledge sharing among caregivers. Additionally, caregivers noted the potential benefits of using the ECA in conjunction with family caregivers, who can offer additional information and perspectives on elderly care.

CONCLUSIONS: This study presents preliminary evidence of the potential benefits of a care knowledge sharing system among various caregivers in elderly care. Although the elderly care assistant effectively promotes knowledge sharing, more research is needed to fully understand its impact on elderly care outcomes.

PMID:37706021 | PMC:PMC10496464 | DOI:10.1177/20552076231200976

Categories: Literature Watch

Modelling digital health data: The ExaMode ontology for computational pathology

Thu, 2023-09-14 06:00

J Pathol Inform. 2023 Aug 22;14:100332. doi: 10.1016/j.jpi.2023.100332. eCollection 2023.

ABSTRACT

Computational pathology can significantly benefit from ontologies to standardize the employed nomenclature and help with knowledge extraction processes for high-quality annotated image datasets. The end goal is to reach a shared model for digital pathology to overcome data variability and integration problems. Indeed, data annotation in such a specific domain is still an unsolved challenge and datasets cannot be steadily reused in diverse contexts due to heterogeneity issues of the adopted labels, multilingualism, and different clinical practices.

MATERIAL AND METHODS: This paper presents the ExaMode ontology, modeling the histopathology process by considering 3 key cancer diseases (colon, cervical, and lung tumors) and celiac disease. The ExaMode ontology has been designed bottom-up in an iterative fashion with continuous feedback and validation from pathologists and clinicians. The ontology is organized into 5 semantic areas that define an ontological template to model any disease of interest in histopathology.

RESULTS: The ExaMode ontology is currently being used as a common semantic layer in: (i) an entity linking tool for the automatic annotation of medical records; (ii) a web-based collaborative annotation tool for histopathology text reports; and (iii) a software platform for building holistic solutions integrating multimodal histopathology data.

DISCUSSION: The ExaMode ontology is a key means to store data in a graph database according to the RDF data model. The creation of an RDF dataset can help develop more accurate algorithms for image analysis, especially in the field of digital pathology. This approach allows for seamless data integration and a unified query access point, from which we can extract relevant clinical insights about the considered diseases using SPARQL queries.
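
To make the idea of querying RDF-style data concrete, here is a minimal pure-Python sketch of matching triple patterns against a small set of triples. It is not a SPARQL engine, and the `ex:` identifiers are hypothetical placeholders rather than terms from the ExaMode ontology; the comment shows roughly what the equivalent SPARQL query would look like.

```python
def match(triples, pattern):
    """Yield variable bindings for one triple pattern ('?x' marks a variable)."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated inside one pattern must bind consistently.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several triple patterns on their shared variables."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Substitute variables that earlier patterns already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions


# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("ex:report1", "rdf:type", "ex:ClinicalCaseReport"),
    ("ex:report1", "ex:hasOutcome", "ex:ColonAdenocarcinoma"),
    ("ex:report2", "rdf:type", "ex:ClinicalCaseReport"),
    ("ex:report2", "ex:hasOutcome", "ex:CeliacDisease"),
]

# Roughly: SELECT ?r WHERE { ?r rdf:type ex:ClinicalCaseReport ;
#                               ex:hasOutcome ex:CeliacDisease . }
results = query(TRIPLES, [
    ("?r", "rdf:type", "ex:ClinicalCaseReport"),
    ("?r", "ex:hasOutcome", "ex:CeliacDisease"),
])
print(results)
```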

PMID:37705689 | PMC:PMC10495665 | DOI:10.1016/j.jpi.2023.100332

Categories: Literature Watch

Numeracy and literacy attainment of children exposed to maternal incarceration and other adversities: A linked data study

Sat, 2023-09-09 06:00

J Sch Psychol. 2023 Oct;100:101241. doi: 10.1016/j.jsp.2023.101241. Epub 2023 Aug 18.

ABSTRACT

Parental incarceration has been associated with educational disadvantages for children, such as lower educational attainment, increased grade retention, and truancy and suspensions. However, children exposed to parental incarceration often experience other adversities that are also associated with educational disadvantage; the contribution of these co-occurring adversities has not been considered in previous research. This study aimed to investigate the educational outcomes of children exposed to (a) maternal incarceration alone and (b) maternal incarceration plus other adversities (i.e., maternal mental illness and/or child protective services [CPS] contact). We used linked administrative data for a sample of children whose mothers were incarcerated during the children's childhood (i.e., from the time of mother's pregnancy through the child's 18th birthday; n = 3828) and a comparison group of children whose mothers had not been incarcerated (n = 9570). Multivariate multinomial logistic regressions examined the association between exposure to the three adversities (i.e., maternal incarceration, maternal mental illness, and child CPS contact) and above or below average reading and numeracy attainment in Grades 3, 5, 7 and 9. At all grade levels, children exposed to maternal incarceration alone and those exposed to maternal incarceration plus other adversities had increased odds of below average numeracy and reading attainment and decreased odds of above average numeracy and reading attainment compared to children without any of the recorded exposures. Children exposed to maternal incarceration and CPS contact and those exposed to all three adversities had increased odds of below average reading and numeracy attainment compared to children exposed to maternal incarceration alone. The findings highlight the complex needs of children of incarcerated mothers that must be considered when designing and delivering educational support programs. These children would benefit from the implementation of multi-tiered, trauma-informed educational and clinical services.

PMID:37689438 | DOI:10.1016/j.jsp.2023.101241

Categories: Literature Watch

Lessons learned from using linked administrative data to evaluate the Family Nurse Partnership in England and Scotland

Wed, 2023-09-06 06:00

Int J Popul Data Sci. 2023 May 11;8(1):2113. doi: 10.23889/ijpds.v8i1.2113. eCollection 2023.

ABSTRACT

INTRODUCTION: "Big data" - including linked administrative data - can be exploited to evaluate interventions for maternal and child health, providing time- and cost-effective alternatives to randomised controlled trials. However, using these data to evaluate population-level interventions can be challenging.

OBJECTIVES: We aimed to inform future evaluations of complex interventions by describing sources of bias, lessons learned, and suggestions for improvements, based on two observational studies using linked administrative data from health, education and social care sectors to evaluate the Family Nurse Partnership (FNP) in England and Scotland.

METHODS: We first considered how different sources of potential bias within the administrative data could affect results of the evaluations. We explored how each study design addressed these sources of bias using maternal confounders captured in the data. We then determined what additional information could be captured at each step of the complex intervention to enable analysts to minimise bias and maximise comparability between intervention and usual care groups, so that any observed differences can be attributed to the intervention.

RESULTS: Lessons learned include the need for i) detailed data on intervention activity (dates/geography) and usual care; ii) improved information on data linkage quality to accurately characterise control groups; iii) more efficient provision of linked data to ensure timeliness of results; iv) better measurement of confounding characteristics affecting who is eligible, approached and enrolled.

CONCLUSIONS: Linked administrative data are a valuable resource for evaluations of the FNP national programme and other complex population-level interventions. However, information on local programme delivery and usual care is required to account for biases that characterise those who receive the intervention, and to inform understanding of mechanisms of effect. National, ongoing, robust evaluations of complex public health interventions would be more achievable if programme implementation was integrated with improved national and local data collection, and robust quasi-experimental designs.

PMID:37670953 | PMC:PMC10476150 | DOI:10.23889/ijpds.v8i1.2113

Categories: Literature Watch

Development of an integrated and inferenceable RDF database of glycan, pathogen and disease resources

Wed, 2023-09-06 06:00

Sci Data. 2023 Sep 6;10(1):582. doi: 10.1038/s41597-023-02442-2.

ABSTRACT

Glycans are known to play extremely important roles in infections by viruses and pathogens. In fact, the SARS-CoV-2 virus has been shown to have evolved due to a single change in glycosylation. However, data resources on glycans, pathogens and diseases are not well organized. To accurately obtain such information from these various resources, we have constructed a foundation for discovering glycan and virus interaction data using Semantic Web technologies to be able to semantically integrate such heterogeneous data. Here, we created an ontology to encapsulate the semantics of virus-glycan interactions, and used Resource Description Framework (RDF) to represent the data we obtained from non-RDF related databases and data associated with literature. These databases include PubChem, SugarBind, and PSICQUIC, which made it possible to refer to other RDF resources such as UniProt and GlyTouCan. We made these data publicly available as open data and provided a service that allows anyone to freely perform searches using SPARQL. In addition, the RDF resources created in this study are available at the GlyCosmos Portal.

PMID:37673902 | DOI:10.1038/s41597-023-02442-2

Categories: Literature Watch

PO2/TransformON, an ontology for data integration on food, feed, bioproducts and biowaste engineering

Mon, 2023-09-04 06:00

NPJ Sci Food. 2023 Sep 4;7(1):47. doi: 10.1038/s41538-023-00221-2.

ABSTRACT

We are witnessing an acceleration of the global drive to converge consumption and production patterns towards a more circular and sustainable approach to the food system. To address the challenge of reconnecting agriculture, environment, food and health, collections of large datasets must be exploited. However, building high-capacity data-sharing networks means unlocking the information silos that are caused by a multiplicity of local data dictionaries. To solve the data harmonization problem, we proposed an ontology on food, feed, bioproducts, and biowastes engineering for data integration in a circular bioeconomy and nexus-oriented approach. This ontology is based on a core model representing a generic process, the Process and Observation Ontology (PO2), which has been specialized to provide the vocabulary necessary to describe any biomass transformation process and to characterize the food, bioproducts, and wastes derived from these processes. Much of this vocabulary comes from transforming authoritative references such as the European food classification system (FoodEx2), the European Waste Catalogue, and other international nomenclatures into a semantic, World Wide Web Consortium (W3C) format that provides system interoperability and software-driven intelligence. We showed the relevance of this new domain ontology PO2/TransformON through several concrete use cases in the fields of process engineering, bio-based composite making, food ecodesign, and relations with consumers' perceptions and preferences. Future work will aim to align with other ontologies to create an ontology network for bridging the gap between upstream and downstream processes in the food system.

PMID:37666867 | DOI:10.1038/s41538-023-00221-2

Categories: Literature Watch

Automatic transparency evaluation for open knowledge extraction systems

Thu, 2023-08-31 06:00

J Biomed Semantics. 2023 Aug 31;14(1):12. doi: 10.1186/s13326-023-00293-9.

ABSTRACT

BACKGROUND: This paper proposes Cyrus, a new transparency evaluation framework, for Open Knowledge Extraction (OKE) systems. Cyrus is based on the state-of-the-art transparency models and linked data quality assessment dimensions. It brings together a comprehensive view of transparency dimensions for OKE systems. The Cyrus framework is used to evaluate the transparency of three linked datasets, which are built from the same corpus by three state-of-the-art OKE systems. The evaluation is automatically performed using a combination of three state-of-the-art FAIRness (Findability, Accessibility, Interoperability, Reusability) assessment tools and a linked data quality evaluation framework, called Luzzu. This evaluation includes six Cyrus data transparency dimensions for which existing assessment tools could be identified. OKE systems extract structured knowledge from unstructured or semi-structured text in the form of linked data. These systems are fundamental components of advanced knowledge services. However, due to the lack of a transparency framework for OKE, most OKE systems are not transparent. This means that their processes and outcomes are not understandable and interpretable. A comprehensive framework sheds light on different aspects of transparency, allows comparison between the transparency of different systems by supporting the development of transparency scores, gives insight into the transparency weaknesses of the system, and ways to improve them. Automatic transparency evaluation helps with scalability and facilitates transparency assessment. The transparency problem has been identified as critical by the European Union Trustworthy Artificial Intelligence (AI) guidelines. In this paper, Cyrus provides the first comprehensive view of transparency dimensions for OKE systems by merging the perspectives of the FAccT (Fairness, Accountability, and Transparency), FAIR, and linked data quality research communities.

RESULTS: In Cyrus, data transparency includes ten dimensions, which are grouped in two categories. In this paper, six of these dimensions (provenance, interpretability, understandability, licensing, availability, and interlinking) have been evaluated automatically for three state-of-the-art OKE systems, using state-of-the-art metrics and tools. Covid-on-the-Web is identified as having the highest mean transparency.

CONCLUSIONS: This is the first research to study the transparency of OKE systems that provides a comprehensive set of transparency dimensions spanning ethics, trustworthy AI, and data quality approaches to transparency. It also demonstrates how to perform automated transparency evaluation that combines existing FAIRness and linked data quality assessment tools for the first time. We show that state-of-the-art OKE systems vary in the transparency of the linked data generated and that these differences can be automatically quantified leading to potential applications in trustworthy AI, compliance, data protection, data governance, and future OKE system design and testing.

PMID:37653549 | DOI:10.1186/s13326-023-00293-9

Categories: Literature Watch

Using linked data to identify pathways of reporting overdose events in British Columbia, 2015-2017

Thu, 2023-08-31 06:00

Int J Popul Data Sci. 2022 Oct 26;7(1):1708. doi: 10.23889/ijpds.v7i1.1708. eCollection 2022.

ABSTRACT

INTRODUCTION: Overdose events related to illicit opioids and other substances are a public health crisis in Canada. The BC Provincial Overdose Cohort is a collection of linked datasets identifying drug-related toxicity events, including death, ambulance, emergency room, hospital, and physician records. The datasets were brought together to understand factors associated with drug-related overdose and can also provide information on pathways of care among people who experience an overdose.

OBJECTIVES: To describe pathways of recorded healthcare use for overdose events in British Columbia, Canada and discrepancies between data sources.

METHODS: Using the BC Provincial Overdose Cohort spanning 2015 to 2017, we examined pathways of recorded health care use for overdose through the framework of an injury reporting pyramid. We also explored differences in event capture between linked datasets.

RESULTS: In the cohort, a total of 34,113 fatal and non-fatal overdose events were identified. A total of 3,056 people died of overdose. Nearly 80% of these deaths occurred among those with no contact with the healthcare system. The majority of events with healthcare records included contact with EHS services (72%), while 39% were seen in the ED and only 7% were hospitalized. Pathways of care from EHS services to ED and hospitalization were generally observed. However, not all ED visits had an associated EHS record and some hospitalizations following an ED visit were for other health issues.

CONCLUSIONS: These findings emphasize the importance of accessing timely healthcare for people experiencing overdose. These findings can be applied to understanding pathways of care for people who experience overdose events and estimating the total burden of healthcare-attended overdose events.

HIGHLIGHTS: In British Columbia, Canada: Multiple sources of linked administrative health data were leveraged to understand recorded healthcare use among people with fatal and non-fatal overdose events. The majority of fatal overdose events occurred with no contact with the healthcare system and only appear in mortality data. Many non-fatal overdose events were captured in data from emergency health services, emergency departments, and hospital records. Accessing timely healthcare services is critical for people experiencing overdose.

PMID:37650030 | PMC:PMC10464869 | DOI:10.23889/ijpds.v7i1.1708

Categories: Literature Watch

Using data linkage to monitor COVID-19 vaccination: development of a vaccination linked data repository

Thu, 2023-08-31 06:00

Int J Popul Data Sci. 2022 Dec 15;5(4):1730. doi: 10.23889/ijpds.v5i4.1730. eCollection 2020.

ABSTRACT

The COVID-19 Vaccination Linked Data Repository (CVLDR) was established in 2021 to assist with the implementation and management of the COVID-19 vaccination program in the State of Western Australia (WA). The CVLDR contains a number of datasets including the Australian Immunisation Register, hospital admissions, emergency department attendances, notifiable infectious disease, and laboratory data. Datasets in the CVLDR are linked using a probabilistic method at the WA Department of Health. Quality assurance mechanisms have been established to identify and mitigate potential errors in the linkage. The datasets vary in data quality and completeness; however, most are of a high standard, underpinned by legislation. The linking of the datasets within the CVLDR has allowed for increased public health utility in the immunisation program, including in the areas of vaccine safety, effectiveness, and coverage.
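
The abstract does not say which probabilistic method is used. As one common possibility, the sketch below shows Fellegi-Sunter style match weights, where each field contributes a log-likelihood ratio depending on whether the two records agree. The field names, the m and u probabilities, and the sample records are all hypothetical.

```python
import math

# Hypothetical per-field probabilities (assumptions, not from the source):
#   m = P(field agrees | records refer to the same person)
#   u = P(field agrees | records refer to different people)
FIELDS = {
    "surname": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "postcode": (0.80, 0.05),
}


def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


vaccination = {"surname": "NGUYEN", "birth_date": "1980-02-14", "postcode": "6000"}
admission = {"surname": "NGUYEN", "birth_date": "1980-02-14", "postcode": "6050"}
stranger = {"surname": "SMITH", "birth_date": "1975-07-01", "postcode": "6160"}

print(round(match_weight(vaccination, admission), 2))
print(round(match_weight(vaccination, stranger), 2))
```

In practice, pairs scoring above an upper threshold would be accepted as links, pairs below a lower threshold rejected, and those in between sent for review; choosing those thresholds is part of the quality assurance the abstract mentions.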

PMID:37649990 | PMC:PMC10464866 | DOI:10.23889/ijpds.v5i4.1730

Categories: Literature Watch

PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata

Wed, 2023-08-30 06:00

bioRxiv. 2023 Aug 18:2023.08.15.551388. doi: 10.1101/2023.08.15.551388. Preprint.

ABSTRACT

BACKGROUND: As biological data increases, we need additional infrastructure to share it and promote interoperability. While major effort has been put into sharing data, relatively less emphasis is placed on sharing metadata. Yet, sharing metadata is also important, and in some ways has a wider scope than sharing data itself.

RESULTS: Here, we present PEPhub, an approach to improve sharing and interoperability of biological metadata. PEPhub provides an API, natural language search, and user-friendly web-based sharing and editing of sample metadata tables. We used PEPhub to process more than 100,000 published biological research projects and index them with fast semantic natural language search. PEPhub thus provides a fast and user-friendly way to find existing biological research data, or to share new data.

AVAILABILITY: https://pephub.databio.org.

PMID:37645717 | PMC:PMC10462087 | DOI:10.1101/2023.08.15.551388

Categories: Literature Watch

A geospatial source selector for federated GeoSPARQL querying

Wed, 2023-08-30 06:00

Open Res Eur. 2022 Oct 6;2:48. doi: 10.12688/openreseurope.14605.2. eCollection 2022.

ABSTRACT

BACKGROUND: Geospatial linked data brings into the scope of the Semantic Web and its technologies a wealth of datasets that combine semantically rich descriptions of resources with their geo-location. There are, however, various Semantic Web technologies where technical work is needed in order to achieve the full integration of geospatial data, and federated query processing is one of these technologies.

METHODS: In this paper, we explore the idea of annotating data sources with a bounding polygon that summarizes the spatial extent of the resources in each data source, and of using such a summary as an (additional) source selection criterion in order to reduce the set of sources that will be tested as potentially holding relevant data. We present our source selection method, and we discuss its correctness and implementation.

RESULTS: We evaluate the proposed source selection using three different types of summaries with different degrees of accuracy, against not using geospatial summaries. We use datasets and queries from a practical use case that combines crop-type data with water availability data for food security. The experimental results suggest that more complex summaries lead to slower source selection times, but also to more precise exclusion of unneeded sources. Moreover, we observe that the source selection runtime is (partially or fully) recovered by shorter planning and execution runtimes. As a result, the federated sources are not burdened by pointless querying from the federation engine.

CONCLUSIONS: The evaluation draws on data and queries from the agroenvironmental domain and shows that our source selection method substantially improves the effectiveness of federated GeoSPARQL query processing.
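
The source selection idea can be sketched with the simplest possible summary, an axis-aligned bounding box, in place of the bounding polygons the paper describes. The source names and coordinates are hypothetical; the point is only that a source whose summary does not intersect the query area can be skipped without being contacted.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box used as a coarse spatial summary."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap only if they overlap on both axes.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


def select_sources(summaries, query_area):
    """Keep only the sources whose summary may hold data in query_area."""
    return [name for name, box in summaries.items() if box.intersects(query_area)]


# Hypothetical federated sources and their spatial summaries.
SOURCES = {
    "crop_types_north": BBox(0.0, 50.0, 10.0, 60.0),
    "crop_types_south": BBox(0.0, 30.0, 10.0, 40.0),
    "water_availability": BBox(-5.0, 25.0, 15.0, 65.0),
}

query_area = BBox(2.0, 52.0, 4.0, 54.0)
selected = select_sources(SOURCES, query_area)
print(selected)
```

A tighter summary would exclude more sources at the cost of a more expensive intersection test, which mirrors the trade-off between summary complexity and selection time reported in the results.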

PMID:37645331 | PMC:PMC10446020 | DOI:10.12688/openreseurope.14605.2

Categories: Literature Watch

Overcoming challenges in rare disease registry integration using the semantic web - a clinical research perspective

Tue, 2023-08-29 06:00

Orphanet J Rare Dis. 2023 Aug 29;18(1):253. doi: 10.1186/s13023-023-02841-z.

ABSTRACT

The growing number of disease-specific patient registries for rare diseases has highlighted the need for registry interoperability and data linkage, leading to large-scale rare disease data integration projects using Semantic Web based solutions. These technologies may be difficult to grasp for rare disease experts, leading to limited involvement by domain expertise in the data integration process. Here, we propose a data integration framework starting from the perspective of the clinical researcher, allowing for purposeful rare disease registry integration driven by clinical research questions.

PMID:37644439 | DOI:10.1186/s13023-023-02841-z

Categories: Literature Watch

Initiatives, Concepts, and Implementation Practices of the Findable, Accessible, Interoperable, and Reusable Data Principles in Health Data Stewardship: Scoping Review

Mon, 2023-08-28 06:00

J Med Internet Res. 2023 Aug 28;25:e45013. doi: 10.2196/45013.

ABSTRACT

BACKGROUND: Thorough data stewardship is a key enabler of comprehensive health research. Processes such as data collection, storage, access, sharing, and analytics require researchers to follow elaborate data management strategies properly and consistently. Studies have shown that findable, accessible, interoperable, and reusable (FAIR) data leads to improved data sharing in different scientific domains.

OBJECTIVE: This scoping review identifies and discusses concepts, approaches, implementation experiences, and lessons learned in FAIR initiatives in health research data.

METHODS: The Arksey and O'Malley stage-based methodological framework for scoping reviews was applied. PubMed, Web of Science, and Google Scholar were searched to access relevant publications. Articles written in English, published between 2014 and 2020, and addressing FAIR concepts or practices in the health domain were included. The 3 data sources were deduplicated using a reference management software. In total, 2 independent authors reviewed the eligibility of each article based on defined inclusion and exclusion criteria. A charting tool was used to extract information from the full-text papers. The results were reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines.

RESULTS: A total of 2.18% (34/1561) of the screened articles were included in the final review. The authors reported FAIRification approaches, which include interpolation, inclusion of comprehensive data dictionaries, repository design, semantic interoperability, ontologies, data quality, linked data, and requirement gathering for FAIRification tools. Challenges and mitigation strategies associated with FAIRification, such as high setup costs, data politics, technical and administrative issues, privacy concerns, and difficulties encountered in sharing health data despite its sensitive nature were also reported. We found various workflows, tools, and infrastructures designed by different groups worldwide to facilitate the FAIRification of health research data. We also uncovered a wide range of problems and questions that researchers are trying to address by using the different workflows, tools, and infrastructures. Although the concept of FAIR data stewardship in the health research domain is relatively new, almost all continents have been reached by at least one network trying to achieve health data FAIRness. Documented outcomes of FAIRification efforts include peer-reviewed publications, improved data sharing, facilitated data reuse, return on investment, and new treatments. Successful FAIRification of data has informed the management and prognosis of various diseases such as cancer, cardiovascular diseases, and neurological diseases. Efforts to FAIRify data on a wider variety of diseases have been ongoing since the COVID-19 pandemic.

CONCLUSIONS: This work summarises projects, tools, and workflows for the FAIRification of health research data. The comprehensive review shows that implementing the FAIR concept in health data stewardship carries the promise of improved research data management and transparency in the era of big data and open research publishing.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/22505.

PMID:37639292 | DOI:10.2196/45013

Categories: Literature Watch

Infrastructure tools to support an effective Radiation Oncology Learning Health System

Fri, 2023-08-25 06:00

J Appl Clin Med Phys. 2023 Aug 25:e14127. doi: 10.1002/acm2.14127. Online ahead of print.

ABSTRACT

PURPOSE: The Radiation Oncology Learning Health System (RO-LHS) is a promising approach to improve the quality of care by integrating clinical, dosimetry, treatment delivery, and research data in real time. This paper describes a novel set of tools to support the development of an RO-LHS and the current challenges they can address.

METHODS: We present a knowledge graph-based approach to map radiotherapy data from clinical databases to an ontology-based data repository using FAIR concepts. This strategy ensures that the data are easily discoverable, accessible, and can be used by other clinical decision support systems. It allows for visualization, presentation, and data analyses of valuable information to identify trends and patterns in patient outcomes. We designed a search engine that uses ontology-based keyword searching and synonym-based term matching, leverages the hierarchical nature of ontologies to retrieve patient records based on parent and child classes, and connects to the BioPortal database to retrieve relevant clinical attributes. To identify similar patients, a method involving text corpus creation and vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) is employed, using cosine similarity and distance metrics.
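
A hierarchy-aware search of the kind described above might look like the following sketch, in which a query for a parent class also returns patients annotated with any of its child classes. The class names, the patient records, and the function names are hypothetical and do not come from the paper or from any real ontology.

```python
# Hypothetical subclass relations: child class -> parent class.
SUBCLASS_OF = {
    "LungAdenocarcinoma": "LungCancer",
    "SmallCellLungCancer": "LungCancer",
    "LungCancer": "Cancer",
    "ProstateCancer": "Cancer",
}

# Hypothetical patient records: patient id -> diagnosis class.
PATIENTS = {
    "p1": "LungAdenocarcinoma",
    "p2": "ProstateCancer",
    "p3": "SmallCellLungCancer",
}


def descendants(cls):
    """Return cls together with all of its direct and indirect subclasses."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def search(cls):
    """Return ids of patients whose diagnosis is cls or one of its subclasses."""
    wanted = descendants(cls)
    return sorted(pid for pid, diagnosis in PATIENTS.items() if diagnosis in wanted)


print(search("LungCancer"))
print(search("Cancer"))
```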

RESULTS: The data pipeline and tool were tested with 1660 patient clinical and dosimetry records, which yielded 504 180 RDF (Resource Description Framework) tuples, and data relationships were visualized using graph-based representations. Patient similarity analysis using embedding models showed that the Word2Vec model had the highest mean cosine similarity, while the GloVe model exhibited more compact embeddings with lower Euclidean and Manhattan distances.
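
For readers unfamiliar with the three measures named above, the following is a minimal sketch of how each is computed for a pair of embedding vectors. The vectors and function names are illustrative assumptions; they are not taken from the study, which does not describe its implementation here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical two-dimensional "patient embeddings" (assumption), for demonstration only.
patient_a = [1.0, 0.0]
patient_b = [0.0, 1.0]

cos_ab = cosine_similarity(patient_a, patient_b)
l2_ab = euclidean_distance(patient_a, patient_b)
l1_ab = manhattan_distance(patient_a, patient_b)
```

Cosine similarity depends only on direction, whereas the Euclidean and Manhattan distances also reflect vector magnitude, so a model can score highest on one measure without scoring best on the others.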

CONCLUSIONS: The framework and tools described support the development of an RO-LHS. By integrating diverse data sources and facilitating data discovery and analysis, they contribute to continuous learning and improvement in patient care. The tools enhance the quality of care by enabling the identification of cohorts, clinical decision support, and the development of clinical studies and machine learning programs in radiation oncology.

PMID:37624227 | DOI:10.1002/acm2.14127

Categories: Literature Watch

Age differences in the neural processing of semantics, within and beyond the core semantic network

Mon, 2023-08-21 06:00

Neurobiol Aging. 2023 Nov;131:88-105. doi: 10.1016/j.neurobiolaging.2023.07.022. Epub 2023 Jul 25.

ABSTRACT

Aging is associated with functional activation changes in domain-specific regions and large-scale brain networks. This preregistered functional magnetic resonance imaging (fMRI) study investigated these effects within the domain of semantic cognition. Participants completed 1 nonsemantic and 2 semantic tasks. We found no age differences in semantic activation in core semantic regions. However, the right inferior frontal gyrus showed difficulty-related increases in both age groups. This suggests that age-related upregulation of this area may be a compensatory response to increased processing demands. At a network level, older people showed more engagement in the default mode network and less in the executive multiple-demand network, aligning with older people's greater knowledge reserves and executive declines. In contrast, activation was age-invariant in semantic control regions. Finally, older adults showed reduced demand-related modulation of multiple-demand network activation in the nonsemantic task but not the semantic tasks. These findings provide a new perspective on the neural basis of semantic cognition in aging, suggesting that preserved function in specialized semantic networks may help to maintain semantic cognition in later life.

PMID:37603932 | DOI:10.1016/j.neurobiolaging.2023.07.022

Categories: Literature Watch

A Deep Learning Model for the Normalization of Institution Names by Multisource Literature Feature Fusion: Algorithm Development Study

Fri, 2023-08-18 06:00

JMIR Form Res. 2023 Aug 18;7:e47434. doi: 10.2196/47434.

ABSTRACT

BACKGROUND: The normalization of institution names is of great importance for literature retrieval, statistics of academic achievements, and evaluation of the competitiveness of research institutions. Differences in authors' writing habits and spelling mistakes lead to various names of institutions, which affects the analysis of publication data. With the development of deep learning models and the increasing maturity of natural language processing methods, training a deep learning-based institution name normalization model can increase the accuracy of institution name normalization at the semantic level.

OBJECTIVE: This study aimed to train a deep learning-based model for institution name normalization based on the feature fusion of affiliation data from multisource literature, which would realize the normalization of institution name variants with the help of authority files and achieve a high specification accuracy after several rounds of training and optimization.

METHODS: In this study, an institution name normalization-oriented model was trained based on bidirectional encoder representations from transformers (BERT) and other deep learning models, including the institution classification model, institutional hierarchical relation extraction model, and institution matching and merging model. The model was then trained to automatically learn institutional features by pretraining and fine-tuning, and institution names were extracted from the affiliation data of 3 databases to complete the normalization process: Dimensions, Web of Science, and Scopus.

RESULTS: It was found that the trained model could achieve at least 3 functions. First, the model could identify the institution name that is consistent with the authority files and associate the name with the files through the unique institution ID. Second, it could identify the nonstandard institution name variants, such as singular forms, plural changes, and abbreviations, and update the authority files. Third, it could identify the unregistered institutions and add them to the authority files, so that when the institution appeared again, the model could identify and regard it as a registered institution. Moreover, the test results showed that the accuracy of the normalization model reached 93.79%, indicating the promising performance of the model for the normalization of institution names.

CONCLUSIONS: The deep learning-based institution name normalization model trained in this study exhibited high accuracy. Therefore, it could be widely applied in the evaluation of the competitiveness of research institutions, analysis of research fields of institutions, and construction of interinstitutional cooperation networks, among others, showing high application value.

PMID:37594844 | DOI:10.2196/47434

Categories: Literature Watch

Changes in comprehensiveness of services delivered by Canadian family physicians: Analysis of population-based linked data in 4 provinces

Tue, 2023-08-15 06:00

Can Fam Physician. 2023 Aug;69(8):550-556. doi: 10.46747/cfp.6908550.

ABSTRACT

OBJECTIVE: To describe changes in the comprehensiveness of services delivered by family physicians across service settings and service areas in 4 Canadian provinces, to identify which settings and areas have changed the most, and to compare the magnitude of changes by physician characteristics.

DESIGN: Descriptive analysis of province-wide, population-based billing data linked to population and physician registries.

SETTING: British Columbia, Manitoba, Ontario, and Nova Scotia.

PARTICIPANTS: Family physicians registered to practise in the 1999-2000 and 2017-2018 fiscal years.

MAIN OUTCOME MEASURES: Comprehensiveness was measured across 7 service settings (home care, long-term care, emergency departments, hospitals, obstetric care, surgical assistance, anesthesiology) and in 7 service areas consistent with office-based practice (prenatal and postnatal care, Papanicolaou testing, mental health, substance use, cancer care, minor surgery, palliative home visits). The proportions of physicians with activity in each setting and area are reported, and the average number of service settings and areas is described by physician characteristics (years in practice, sex, urban or rural practice setting, and location of medical degree training).

RESULTS: Declines in comprehensiveness were observed across all provinces studied. Declines were greater for comprehensiveness of settings than for areas consistent with office-based practice. Changes were observed across all physician characteristics. On average across provinces, declines in the number of service settings and service areas were highest among physicians in practice 20 years or longer, male physicians, and physicians practising in urban areas.

CONCLUSION: Declining comprehensiveness was observed across all physician characteristics, pointing to changes in the practice and policy contexts in which all family physicians work.

PMID:37582603 | DOI:10.46747/cfp.6908550

Categories: Literature Watch

Comprehensive Ontology of Fibroproliferative Diseases: Protocol for a Semantic Technology Study

Fri, 2023-08-11 06:00

JMIR Res Protoc. 2023 Aug 11;12:e48645. doi: 10.2196/48645.

ABSTRACT

BACKGROUND: Fibroproliferative or fibrotic diseases (FDs), which represent a significant proportion of age-related pathologies and account for over 40% of mortality in developed nations, are often underrepresented in focused research. Typically, these conditions are studied individually, such as chronic obstructive pulmonary disease or idiopathic pulmonary fibrosis (IPF), rather than as a collective entity, thereby limiting the holistic understanding and development of effective treatments. To address this, we propose creating and publicizing a comprehensive fibroproliferative disease ontology (FDO) to unify the understanding of FDs.

OBJECTIVE: This paper aims to delineate the study protocol for the creation of the FDO, foster transparency and high quality standards during its development, and subsequently promote its use once it becomes publicly available.

METHODS: We aim to establish an ontology encapsulating the broad spectrum of FDs, constructed in the Web Ontology Language format using the Protégé ontology editor, adhering to ontology development life cycle principles. The modeling process will leverage Protégé in accordance with a methodologically defined process, involving targeted scoping reviews of MEDLINE and PubMed information, expert knowledge, and an ontology development process. A hybrid top-down and bottom-up strategy will guide the identification of core concepts and relations, conducted by a team of domain experts based on systematic iterations of scientific literature reviews.

RESULTS: The result will be an exhaustive FDO accommodating a wide variety of crucial biomedical concepts, augmented with synonyms, definitions, and references. The FDO aims to encapsulate diverse perspectives on the FD domain, including those of clinicians, health informaticians, medical researchers, and public health experts.

CONCLUSIONS: The FDO is expected to stimulate broader and more in-depth FD research by enabling reasoning, inference, and the identification of relationships between concepts for application in multiple contexts, such as developing specialized software, fostering research communities, and enhancing domain comprehension. A common vocabulary and understanding of relationships among medical professionals could potentially expedite scientific progress and the discovery of innovative solutions. The publicly available FDO will form the foundation for future research, technological advancements, and public health initiatives.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/48645.

PMID:37566458 | DOI:10.2196/48645

Categories: Literature Watch

Web content topic modeling using LDA and HTML tags

Mon, 2023-08-07 06:00

PeerJ Comput Sci. 2023 Jul 11;9:e1459. doi: 10.7717/peerj-cs.1459. eCollection 2023.

ABSTRACT

An immense volume of digital documents exists online and offline with content that can offer useful information and insights. Utilizing topic modeling enhances the analysis and understanding of digital documents. Topic modeling discovers latent semantic structures or topics within a set of digital textual documents. The Internet of Things, Blockchain, recommender systems, and search engine optimization applications use topic modeling to handle data mining tasks, such as classification and clustering. The usefulness of topic models depends on the quality of the resulting term patterns and topics. Topic coherence is the standard metric to measure the quality of topic models. Previous studies built topic models that generally target conventional documents; these models are insufficient and underperform when applied to web content data because of differences in the structure of conventional and HTML documents. Neglecting the unique structure of web content leads to missing otherwise coherent topics and, therefore, low topic quality. This study aims to propose an innovative topic model to learn coherent topics in web content data. We present the HTML Topic Model (HTM), a web content topic model that takes into consideration the HTML tags to understand the structure of web pages. We conducted two series of experiments to demonstrate the limitations of the existing topic models and examine the topic coherence of the HTM against the widely used Latent Dirichlet Allocation (LDA) model and its variants, namely the Correlated Topic Model, the Dirichlet Multinomial Regression, the Hierarchical Dirichlet Process, the Hierarchical Latent Dirichlet Allocation, the pseudo-document based Topic Model, and the Supervised Latent Dirichlet Allocation models. The first experiment demonstrates the limitations of the existing topic models when applied to web content data and, therefore, the essential need for a web content topic model. When applied to web data, overall performance was on average five times lower and, in some cases, up to approximately 20 times lower than when applied to conventional data. The second experiment then evaluates the effectiveness of the HTM model in discovering topics and term patterns of web content data. The HTM model achieved an overall 35% improvement in topic coherence compared to the LDA.

PMID:37547394 | PMC:PMC10403181 | DOI:10.7717/peerj-cs.1459

Categories: Literature Watch
