Semantic Web
Reflections on modern methods: linkage error bias.
Int J Epidemiol. 2019 12 01;48(6):2050-2060
Authors: Doidge JC, Harron KL
Abstract
Linked data are increasingly being used for epidemiological research, to enhance primary research, and in planning, monitoring and evaluating public policy and services. Linkage error (missed links between records that relate to the same person or false links between unrelated records) can manifest in many ways: as missing data, measurement error and misclassification, unrepresentative sampling, or as a special combination of these that is specific to analysis of linked data: the merging and splitting of people that can occur when two hospital admission records are counted as one person admitted twice if linked and two people admitted once if not. Through these mechanisms, linkage error can ultimately lead to information bias and selection bias; so identifying relevant mechanisms is key in quantitative bias analysis. In this article we introduce five key concepts and a study classification system for identifying which mechanisms are relevant to any given analysis. We provide examples and discuss options for estimating parameters for bias analysis. This conceptual framework provides the 'links' between linkage error, information bias and selection bias, and lays the groundwork for quantitative bias analysis for linkage error.
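As a toy illustration of the merge/split mechanism described above (entirely invented records, not data from the article), the snippet below shows how the same two admission records yield different person-level counts depending on whether the link between them is made or missed.

```python
# Toy example: linkage error changes person-level statistics.
from collections import Counter

# Two admission records that truly belong to the same person.
admissions = [
    {"record_id": 1, "true_person": "A"},
    {"record_id": 2, "true_person": "A"},
]

def summarise(records, person_key):
    """Number of persons and admissions per person under a given identifier."""
    counts = Counter(r[person_key] for r in records)
    return {"persons": len(counts), "admissions_per_person": sorted(counts.values())}

# Correct linkage: both records carry the same linked identifier.
linked_ok = [dict(r, linked_person="A") for r in admissions]
# Missed link: each record is treated as a separate person.
missed = [dict(r, linked_person=f"A{r['record_id']}") for r in admissions]

print(summarise(linked_ok, "linked_person"))  # one person admitted twice
print(summarise(missed, "linked_person"))     # two people admitted once each
```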
PMID: 31633184 [PubMed - indexed for MEDLINE]
A scientometric analysis of birth cohorts in South Asia: Way forward for Pakistan.
PLoS One. 2020;15(7):e0235385
Authors: Waqas A, Zafar S, Lawlor DA, Wright J, Hafeez A, Ahmad I, Sikander S, Rahman A
Abstract
The present study aims to: a) systematically map birth cohort studies from the South Asian region; b) examine the major research foci and landmark contributions from these cohorts using reproducible scientometric techniques; and c) offer recommendations on establishing new birth cohorts in Pakistan, building upon the strengths, weaknesses and gaps of previous cohorts. Bibliographic records for a total of 260 articles, published through December 2018, were retrieved from the Web of Science (core database). All data were analysed using Microsoft Excel (2013), the Web of Science platform and CiteSpace. A series of network analyses were then run for each time period using the link reduction method and pathfinder network scaling. The co-cited articles were clustered into homogeneous research clusters. The clusters were named using the Latent Semantic Indexing (LSI) method, which utilized author keywords as the source of names for these clusters. The scientometric analyses of original research output from these birth cohorts also paint a pessimistic landscape in Pakistan, where Pakistani sites for birth cohorts contributed only 31 publications; a majority of these utilized the MAL-ED birth cohort data. A majority of original studies were published from birth cohorts in India (156), Bangladesh (63), and Nepal (15). Of these contributions, 31 studies reported data from multiple countries. The three major birth cohorts include the prospective, multi-country MAL-ED birth cohort, the Pakistan Early Childhood Development Scale Up Trial, and a retrospective maternal and infant nutrition intervention cohort. In addition to these, a few small-scale birth cohorts reported findings pertaining to neonatal sepsis, intrauterine growth retardation and its effects on linear growth of children, and environmental enteropathy.
PMID: 32645067 [PubMed - as supplied by publisher]
Enabling ad-hoc reuse of private data repositories through schema extraction.
J Biomed Semantics. 2020 Jul 08;11(1):6
Authors: Gleim LC, Karim MR, Zimmermann L, Kohlbacher O, Stenzhorn H, Decker S, Beyan O
Abstract
BACKGROUND: Sharing sensitive data across organizational boundaries is often significantly limited by legal and ethical restrictions. Regulations such as the EU General Data Protection Regulation (GDPR) impose strict requirements concerning the protection of personal and privacy-sensitive data. Therefore, new approaches, such as the Personal Health Train initiative, are emerging to utilize data right in their original repositories, circumventing the need to transfer data.
RESULTS: Circumventing limitations of previous systems, this paper proposes a configurable and automated schema extraction and publishing approach, which enables ad-hoc SPARQL query formulation against RDF triple stores without requiring direct access to the private data. The approach is compatible with existing Semantic Web-based technologies and allows for the subsequent execution of such queries in a safe setting under the data provider's control. Evaluation with four distinct datasets shows that a configurable amount of concise and task-relevant schema, closely describing the structure of the underlying data, was derived, enabling the schema introspection-assisted authoring of SPARQL queries.
CONCLUSIONS: Automatically extracting and publishing data schema can enable the introspection-assisted creation of data selection and integration queries. In conjunction with the presented system architecture, this approach can enable reuse of data from private repositories and in settings where agreeing upon a shared schema and encoding a priori is infeasible. As such, it could provide an important step towards reuse of data from previously inaccessible sources and thus towards the proliferation of data-driven methods in the biomedical domain.
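As an illustration of the general idea (a minimal sketch, not the authors' implementation), the following code derives a simple schema summary, namely the classes and the predicates observed on their instances, from an RDF graph using rdflib; the input file name is a placeholder.

```python
# Sketch: summarize which predicates are used with which classes, the kind of
# schema information that supports introspection-assisted SPARQL authoring
# without exposing the underlying instance data.
from rdflib import Graph, RDF

g = Graph()
g.parse("private_data.ttl", format="turtle")  # hypothetical local file

schema = {}  # class URI -> set of predicate URIs observed on its instances
for subject, _, cls in g.triples((None, RDF.type, None)):
    predicates = schema.setdefault(cls, set())
    for _, p, _ in g.triples((subject, None, None)):
        if p != RDF.type:
            predicates.add(p)

for cls, predicates in schema.items():
    print(cls)
    for p in sorted(predicates):
        print("   ", p)
```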
PMID: 32641124 [PubMed - in process]
CLIN-IK-LINKS: A platform for the design and execution of clinical data transformation and reasoning workflows.
Comput Methods Programs Biomed. 2020 Jun 25;197:105616
Authors: Maldonado JA, Marcos M, Fernández-Breis JT, Giménez-Solano VM, Legaz-García MDC, Martínez-Salvador B
Abstract
BACKGROUND AND OBJECTIVE: Effective sharing and reuse of Electronic Health Records (EHR) requires technological solutions which deal with different representations and different models of data. This includes information models, domain models and, ideally, inference models, which enable clinical decision support based on a knowledge base and facts. Our goal is to develop a framework to support EHR interoperability based on transformation and reasoning services intended for clinical data and knowledge.
METHODS: Our framework is based on workflows whose primary components are reusable mappings. Key features are an integrated representation, storage, and exploitation of different types of mappings for clinical data transformation purposes, as well as the support for the discovery of new workflows. The current framework supports mappings which take advantage of the best features of EHR standards and ontologies. Our proposal is based on our previous results and experience working with both technological infrastructures.
RESULTS: We have implemented CLIN-IK-LINKS, a web-based platform that enables users to create, modify and delete mappings as well as to define and execute workflows. The platform has been applied in two use cases: semantic publishing of clinical laboratory test results; and implementation of two colorectal cancer screening protocols. Real data have been used in both use cases.
CONCLUSIONS: The CLIN-IK-LINKS platform allows the composition and execution of clinical data transformation workflows to convert EHR data into EHR and/or semantic web standards. Having proved its usefulness to implement clinical data transformation applications of interest, CLIN-IK-LINKS can be regarded as a valuable contribution to improve the semantic interoperability of EHR systems.
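For readers unfamiliar with this kind of clinical data transformation, here is a hand-written, purely illustrative mapping from a flat laboratory result record to RDF triples with rdflib; the URIs and field names are invented and do not reflect the CLIN-IK-LINKS mapping formalism.

```python
# Sketch of a reusable mapping: flat EHR lab record -> RDF triples.
from rdflib import Graph, Literal, Namespace, RDF, XSD

EX = Namespace("http://example.org/lab/")  # hypothetical vocabulary

record = {"patient_id": "p123", "test": "serum_glucose", "value": 5.4, "unit": "mmol/L"}

g = Graph()
obs = EX[f"obs-{record['patient_id']}-{record['test']}"]
g.add((obs, RDF.type, EX.LaboratoryResult))
g.add((obs, EX.patient, EX[record["patient_id"]]))
g.add((obs, EX.testName, Literal(record["test"])))
g.add((obs, EX.hasValue, Literal(record["value"], datatype=XSD.decimal)))
g.add((obs, EX.hasUnit, Literal(record["unit"])))

print(g.serialize(format="turtle"))
```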
PMID: 32629294 [PubMed - as supplied by publisher]
Data Integration in the Brazilian Public Health System for Tuberculosis: Use of the Semantic Web to Establish Interoperability.
JMIR Med Inform. 2020 Jul 06;8(7):e17176
Authors: Pellison FC, Rijo RPCL, Lima VC, Crepaldi NY, Bernardi FA, Galliez RM, Kritski A, Abhishek K, Alves D
Abstract
BACKGROUND: Interoperability of health information systems is a challenge due to the heterogeneity of existing systems at both the technological and semantic levels of their data. This lack of interoperability disrupts intra-unit and inter-unit medical operations and creates challenges for conducting studies on existing data. The goal is to exchange data while providing the same meaning for data from different sources.
OBJECTIVE: To find ways to solve this challenge, this research paper proposes an interoperability solution for the tuberculosis treatment and follow-up scenario in Brazil using Semantic Web technology supported by an ontology.
METHODS: The entities of the ontology were allocated under the definitions of Basic Formal Ontology. Brazilian tuberculosis applications were tagged with entities from the resulting ontology.
RESULTS: An interoperability layer was developed to retrieve data with the same meaning and in a structured way enabling semantic and functional interoperability.
CONCLUSIONS: Health professionals could use the data gathered from several data sources to enhance the effectiveness of their actions and decisions, as shown in a practical use case to integrate tuberculosis data in the State of São Paulo.
PMID: 32628611 [PubMed - as supplied by publisher]
A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL.
F1000Res. 2019;8:1822
Authors: Sima AC, Dessimoz C, Stockinger K, Zahn-Zabal M, Mendes de Farias T
Abstract
The increasing use of Semantic Web technologies in the life sciences, in particular the use of the Resource Description Framework (RDF) and the RDF query language SPARQL, opens the path for novel integrative analyses, combining information from multiple sources. However, analyzing evolutionary data in RDF is not trivial, due to the steep learning curve required to understand both the data models adopted by different RDF data sources, as well as the SPARQL query language. In this article, we provide a hands-on introduction to querying evolutionary data across multiple sources that publish orthology information in RDF, namely: The Orthologous MAtrix (OMA), the European Bioinformatics Institute (EBI) RDF platform, the Database of Orthologous Groups (OrthoDB) and the Microbial Genome Database (MBGD). We present four protocols in increasing order of complexity. In these protocols, we demonstrate through SPARQL queries how to retrieve pairwise orthologs, homologous groups, and hierarchical orthologous groups. Finally, we show how orthology information in different sources can be compared, through the use of federated SPARQL queries.
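A minimal example of issuing such a query from Python with SPARQLWrapper is sketched below; the endpoint URL and the ORTH-ontology terms are assumptions made for illustration, so readers should follow the protocols in the article for the exact, current queries.

```python
# Sketch: retrieve a few pairs of genes from the same orthologous cluster.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://sparql.omabrowser.org/sparql")  # assumed OMA endpoint
endpoint.setQuery("""
PREFIX orth: <http://purl.org/net/orth#>
SELECT ?gene1 ?gene2
WHERE {
  ?cluster a orth:OrthologsCluster ;
           orth:hasHomologousMember ?gene1 , ?gene2 .
  FILTER (?gene1 != ?gene2)
}
LIMIT 10
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["gene1"]["value"], row["gene2"]["value"])
```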
PMID: 32612807 [PubMed - in process]
Semantic Web of Things (SWoT) for Global Infectious Disease Control and Prevention.
Stud Health Technol Inform. 2020 Jun 26;272:425-428
Authors: Shaban-Nejad A, Brenas JH, Al Manir MS, Zinszer K, Baker CJO
Abstract
This paper reports on the early-stage development of an analytics framework to support the semantic integration of dynamic surveillance data across multiple scales to inform decision making for malaria eradication. We propose using the Semantic Web of Things (SWoT), a combination of Internet of Things (IoT) and semantic web technologies, to support the evolution and integration of dynamic malaria data sources and improve interoperability between different datasets generated through relevant IoT assets (e.g. computers, sensors, persons, and other smart objects and devices).
PMID: 32604693 [PubMed - in process]
Towards an Architecture for the Interoperability of Hospital Information Systems in Burkina Faso.
Stud Health Technol Inform. 2020 Jun 26;272:159-162
Authors: Tapsoba LS, Traore Y, Malo S
Abstract
The successful introduction of ICTs into medical practice is a key factor in improving the performance of any health system for both patients and healthcare professionals. In Burkina Faso, many hospital information systems (HIS) have been developed and are already widely used in large health centers with proven efficiency. To improve the quality of patient care, these hospital information systems should exchange information. Interoperability is one of the privileged ways to improve the integration of different systems, because nowadays a HIS is no longer just a single monolithic software system run on a single machine. This paper presents a semantic interoperability architecture based on a mediation approach. The mediator implements local domain ontologies for each HIS, a knowledge base, a referential ontology used as a semantic repository, and web services.
PMID: 32604625 [PubMed - in process]
Structural variant analysis for linked-read sequencing data with gemtools.
Bioinformatics. 2019 11 01;35(21):4397-4399
Authors: Greer SU, Ji HP
Abstract
SUMMARY: Linked-read sequencing generates synthetic long reads which are useful for the detection and analysis of structural variants (SVs). The software associated with 10x Genomics linked-read sequencing, Long Ranger, generates the essential output files (BAM, VCF, SV BEDPE) necessary for downstream analyses. However, performing downstream analyses requires the user to customize their own tools to handle the unique features of linked-read sequencing data. Here, we describe gemtools, a collection of tools for the downstream and in-depth analysis of SVs from linked-read data. Gemtools uses the barcoded aligned reads and the Megabase-scale phase blocks to determine haplotypes of SV breakpoints and delineate complex breakpoint configurations at the resolution of single DNA molecules. The gemtools package gives the user the flexibility to perform basic functions on their linked-read sequencing output in order to address a wider range of questions.
AVAILABILITY AND IMPLEMENTATION: The gemtools package is freely available for download at: https://github.com/sgreer77/gemtools.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PMID: 30938757 [PubMed - indexed for MEDLINE]
Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study.
JMIR Med Inform. 2020 Jun 23;8(6):e17650
Authors: Li G, Li B, Huang L, Hou S
Abstract
BACKGROUND: According to a World Health Organization report in 2017, almost one in every 20 people in China had depression. However, the diagnosis of depression is usually difficult in terms of clinical detection owing to slow observation, high cost, and patient resistance. Meanwhile, with the rapid emergence of social networking sites, people tend to share their daily life and disclose inner feelings online frequently, making it possible to effectively identify mental conditions using the rich text information. There have been many achievements with English web-based corpora, but research in China on extracting language features from web-based depression signals is still at a relatively early stage.
OBJECTIVE: The purpose of this study was to propose an effective approach for constructing a depression-domain lexicon. This lexicon will contain language features that could help identify social media users who potentially have depression. Our study also compared the performance of detection with and without our lexicon.
METHODS: We autoconstructed a depression-domain lexicon using Word2Vec, a semantic relationship graph, and the label propagation algorithm. These two methods combined performed well in a specific corpus during construction. The lexicon was obtained based on 111,052 Weibo microblogs from 1868 users who were depressed or nondepressed. During depression detection, we considered six features, and we used five classification methods to test the detection performance.
RESULTS: The experiment results showed that in terms of the F1 value, our autoconstruction method performed 1% to 6% better than baseline approaches and was more effective and steadier. When applied to detection models like logistic regression and support vector machine, our lexicon helped the models outperform by 2% to 9% and was able to improve the final accuracy of potential depression detection.
CONCLUSIONS: Our depression-domain lexicon was proven to be a meaningful input for classification algorithms, providing linguistic insights on the depressive status of test subjects. We believe that this lexicon will enhance early depression detection in people on social media. Future work will need to be carried out on a larger corpus and with more complex methods.
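The following is a rough sketch (not the authors' pipeline) of the two main ingredients described in the METHODS above: Word2Vec embeddings trained on tokenized microblog posts, and a simple propagation step over a word-similarity graph that grows a seed lexicon; the corpus, seed words and threshold are placeholders.

```python
# Sketch: grow a seed lexicon by propagating over Word2Vec neighbours.
from gensim.models import Word2Vec

corpus = [["feel", "hopeless", "tonight"], ["great", "day", "with", "friends"]]  # toy tokenized posts
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=20)

seed_lexicon = {"hopeless"}           # hypothetical depression-related seed terms
lexicon = set(seed_lexicon)
for _ in range(2):                    # a few propagation rounds
    new_terms = set()
    for term in lexicon:
        if term in model.wv:
            for neighbour, similarity in model.wv.most_similar(term, topn=5):
                if similarity > 0.5:  # similarity threshold (placeholder)
                    new_terms.add(neighbour)
    lexicon |= new_terms

print(sorted(lexicon))
```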
PMID: 32574151 [PubMed - as supplied by publisher]
Movie Review Summarization Using Supervised Learning and Graph-Based Ranking Algorithm.
Comput Intell Neurosci. 2020;2020:7526580
Authors: Khan A, Gul MA, Zareei M, Biswal RR, Zeb A, Naeem M, Saeed Y, Salim N
Abstract
With the growing amount of information on the web, online movie reviews are becoming a significant information resource for Internet users. However, online users post thousands of movie reviews on a daily basis, and it is hard to summarize them manually. Movie review mining and summarization is one of the challenging tasks in natural language processing. Therefore, an automatic approach is desirable to summarize the lengthy movie reviews, allowing users to quickly recognize the positive and negative aspects of a movie. This study employs a feature extraction technique called bag of words (BoW) to extract features from movie reviews and represent the reviews as a vector space model or feature vector. The next phase uses the Naïve Bayes machine learning algorithm to classify the movie reviews (represented as feature vectors) into positive and negative. Next, an undirected weighted graph is constructed from the pairwise semantic similarities between classified review sentences, in such a way that the graph nodes represent review sentences while the edges of the graph indicate semantic similarity weights. The weighted graph-based ranking algorithm (WGRA) is applied to compute a rank score for each review sentence in the graph. Finally, the top-ranked sentences (graph nodes) are chosen based on the highest rank scores to produce the extractive summary. Experimental results reveal that the proposed approach is superior to other state-of-the-art approaches.
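A compact sketch of this pipeline, using standard libraries rather than the authors' exact WGRA implementation, is given below: BoW features, Naïve Bayes polarity classification, a weighted sentence-similarity graph, and a PageRank-style score for selecting summary sentences; the training data and sentences are toy examples.

```python
# Sketch: BoW + Naive Bayes classification + graph-based sentence ranking.
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.naive_bayes import MultinomialNB

train_reviews = ["great acting and plot", "boring and too long"]  # toy training data
train_labels = [1, 0]                                             # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
classifier = MultinomialNB().fit(vectorizer.fit_transform(train_reviews), train_labels)

sentences = ["The acting was great.", "The plot was boring and too long.", "Great acting saves a thin plot."]
X = vectorizer.transform(sentences)
polarity = classifier.predict(X)              # classify each review sentence

similarity = cosine_similarity(X)             # pairwise similarity (BoW proxy for semantics)
graph = nx.from_numpy_array(similarity)       # nodes = sentences, edge weights = similarity
scores = nx.pagerank(graph, weight="weight")  # graph-based ranking

top = sorted(scores, key=scores.get, reverse=True)[:2]
print(polarity, [sentences[i] for i in top])
```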
PMID: 32565772 [PubMed - in process]
I know what you're probably going to say: Listener adaptation to variable use of uncertainty expressions.
Cognition. 2020 Jun 10;203:104285
Authors: Schuster S, Degen J
Abstract
Pragmatic theories of utterance interpretation share the assumption that listeners reason about alternative utterances that a speaker could have produced, but didn't. For such reasoning to be successful, listeners must have precise expectations about a speaker's production choices. This is at odds with the considerable variability across speakers that exists at all levels of linguistic representation. This tension can be reconciled by listeners adapting to the statistics of individual speakers. While linguistic adaptation is increasingly widely attested, semantic/pragmatic adaptation is underexplored. Moreover, what kind of representations listeners update during semantic/pragmatic adaptation - estimates of the speaker's lexicon, or estimates of the speaker's utterance preferences - remains poorly understood. In this work, we investigate semantic/pragmatic adaptation in the domain of uncertainty expressions like might and probably. In a series of web-based experiments, we find 1) that listeners vary in their expectations about a generic speaker's use of uncertainty expressions; 2) that listeners rapidly update their expectations about the use of uncertainty expressions after brief exposure to a speaker with a specific usage of uncertainty expressions; and 3) that listeners' interpretations of uncertainty expressions change after being exposed to a specific speaker. We present a novel computational model of semantic/pragmatic adaptation based on Bayesian belief updating and show, through a series of model comparisons, that semantic/pragmatic adaptation is best captured by listeners updating their beliefs both about the speaker's lexicon and their utterance preferences. This work has implications for both semantic theories of uncertainty expressions and psycholinguistic theories of adaptation: it highlights the need for dynamic semantic representations and suggests that listeners integrate their general linguistic knowledge with speaker-specific experiences to arrive at more precise interpretations.
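A schematic, numerically worked example of Bayesian belief updating about a speaker's usage, far simpler than the authors' model and with invented probabilities, is given below.

```python
# Sketch: a listener updates beliefs about which kind of speaker they face
# after hearing a few uncertainty expressions. All numbers are invented.
likelihood = {
    # P(utterance | speaker type) for an event of moderate probability.
    "low_threshold_speaker": {"probably": 0.8, "might": 0.2},
    "high_threshold_speaker": {"probably": 0.3, "might": 0.7},
}
posterior = {"low_threshold_speaker": 0.5, "high_threshold_speaker": 0.5}  # uniform prior

observed = ["probably", "probably", "might", "probably"]  # brief exposure to one speaker

for utterance in observed:
    unnormalized = {h: posterior[h] * likelihood[h][utterance] for h in posterior}
    total = sum(unnormalized.values())
    posterior = {h: value / total for h, value in unnormalized.items()}

print(posterior)  # belief shifts toward the speaker type that matches the observations
```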
PMID: 32535344 [PubMed - as supplied by publisher]
Ontological framework for standardizing and digitizing clinical pathways in healthcare information systems.
Comput Methods Programs Biomed. 2020 Jun 01;196:105559
Authors: Alahmar A, Crupi ME, Benlamri R
Abstract
BACKGROUND AND OBJECTIVE: Most healthcare institutions are reorganizing their healthcare delivery systems based on Clinical Pathways (CPs). CPs are novel medical management plans to standardize medical activities, reduce cost, optimize resource usage, and improve the quality of service. However, most CPs are still paper-based and not fully integrated with Health Information Systems (HIS). More research on CP computerization is therefore needed to fully realize the practical potential of CPs. A major contribution of this research is the vision that CP systems deserve to be placed at the centre of HIS, because within CPs lies the very heart of medical planning, treatment and impressions, including healthcare quality and cost factors.
METHODS: An important contribution to the realization of this vision is to fully standardize and digitize CPs so that they become machine-readable and smoothly linkable across various HIS. To achieve this goal, this research proposes a framework for (i) CP knowledge representation and sharing using ontologies, (ii) CP standardization based on SNOMED CT and HL7, and (iii) CP digitization based on a novel coding system to encode CP data. To show the feasibility of the proposed framework we developed a prototype clinical pathway management system (CPMS) based on CPs currently in use at hospitals.
RESULTS: The results show that CPs can be fully standardized and digitized using SNOMED CT terms and codes, and the CPMS can work as an independent system, performing novel CP-related functions, including useful data analytics. CPs can be compared easily for auditing and quality management. Furthermore, the CPMS was smoothly linked to a hospital EMR and CP data were captured in EMR without any loss.
CONCLUSION: The proposed framework is promising and contributes toward solving major challenges related to CP standardization, digitization, and inclusion in today's modern computerized hospitals.
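As a purely illustrative sketch of what a machine-readable pathway activity might look like in the spirit of this framework (the identifiers and the placeholder concept code are not the authors' coding system):

```python
# Sketch: a clinical pathway activity paired with a standard terminology code.
from dataclasses import dataclass

@dataclass
class PathwayActivity:
    pathway_id: str
    step: int
    description: str
    snomed_ct_code: str   # placeholder for a SNOMED CT concept identifier
    expected_day: int     # day of the pathway on which the activity occurs

activity = PathwayActivity(
    pathway_id="CRC-SCREEN-01",
    step=2,
    description="Faecal occult blood test",
    snomed_ct_code="<concept-id>",   # to be filled with the real concept code
    expected_day=0,
)
print(activity)
```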
PMID: 32531654 [PubMed - as supplied by publisher]
The molecular entities in linked data dataset.
Data Brief. 2020 Aug;31:105757
Authors: Tomaszuk D, Szeremeta Ł
Abstract
The Molecular Entities in Linked Data (MEiLD) dataset comprises data on distinct atoms, molecules, ions, ion pairs, radicals, radical ions, and others that are identifiable as separately distinguishable chemical entities. The dataset is provided in JSON-LD format and was generated by SDFEater, a tool that parses atoms, bonds, and other molecule data. MEiLD contains 349,960 'small' chemical entities. Our dataset is based on SDF files and is enriched with additional ontologies and line notation data. As a basis, the Molecular Entities in Linked Data dataset uses the Resource Description Framework (RDF) data model. Saving the data in such a model preserves the semantic relations between them, such as hierarchical and associative relations. To describe chemical molecules, vocabularies such as the Chemical Vocabulary for Molecular Entities (CVME) and the Simple Knowledge Organization System (SKOS) are used. The dataset can be beneficial, among others, for people concerned with research and development tools for cheminformatics and bioinformatics. In this paper, we describe various methods of access to our dataset. In addition to the MEiLD dataset, we publish the Shapes Constraint Language (SHACL) schema of our dataset and the CVME ontology. The data is available in Mendeley Data.
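A small sketch of consuming a JSON-LD record with rdflib is shown below; the file name is hypothetical, and the dataset itself is distributed via Mendeley Data as stated above.

```python
# Sketch: load one JSON-LD chemical-entity record and list the vocabulary it uses.
# Requires rdflib 6+ (built-in JSON-LD support).
from rdflib import Graph

g = Graph()
g.parse("meild_entity.jsonld", format="json-ld")   # hypothetical locally downloaded record

# Listing the predicates gives a feel for the vocabularies used (e.g. CVME, SKOS).
for predicate in sorted(set(g.predicates())):
    print(predicate)
print(f"{len(g)} triples loaded")
```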
PMID: 32529012 [PubMed]
FAIR-compliant clinical, radiomics and DICOM metadata of RIDER, Interobserver, Lung1 and Head-Neck1 TCIA collections.
Med Phys. 2020 Jun 10
Authors: Kalendralis P, Shi Z, Traverso A, Choudhury A, Sloep M, Zhovannik I, Starmans MPA, Grittner D, Feltens P, Monshouwer R, Klein S, Fijten R, Aerts H, Dekker A, van Soest J, Wee L
Abstract
PURPOSE: One of the most frequently cited radiomics investigations showed that features automatically extracted from routine clinical images could be used in prognostic modelling. These images have been made publicly accessible via The Cancer Imaging Archive (TCIA). There have been numerous requests for additional explanatory metadata on the following datasets - RIDER, Interobserver, Lung1 and Head-Neck1. To support repeatability, reproducibility, generalizability and transparency in radiomics research, we publish the subjects' clinical data, extracted radiomics features and Digital Imaging and Communications in Medicine (DICOM) headers of these four datasets with descriptive metadata, in order to be more compliant with findable, accessible, interoperable and re-usable (FAIR) data management principles.
ACQUISITION AND VALIDATION METHODS: Overall survival time intervals were updated using a national citizens registry after internal ethics board approval. Spatial offsets of the Primary Gross Tumor Volume (GTV) regions of interest (ROIs) associated with the Lung1 CT series were improved on the TCIA. GTV radiomics features were extracted using the open-source ontology-guided radiomics workflow (O-RAW). We reshaped the output of O-RAW to map features and extraction settings to the latest version of the Radiomics Ontology, so as to be consistent with the Image Biomarker Standardization Initiative (IBSI). DICOM metadata were extracted using a research version of Semantic DICOM (SOHARD GmbH, Fuerth, Germany). Subjects' clinical data were described with metadata using the Radiation Oncology Ontology. All of the above were published in the Resource Description Framework (RDF), i.e. as triples. Example SPARQL queries are shared with the reader to use on the online triples archive, and are intended to illustrate how to exploit this data submission.
DATA FORMAT: The accumulated RDF data is publicly accessible through a SPARQL endpoint where the triples are archived. The endpoint is remotely queried through a graph database web application at http://sparql.cancerdata.org. SPARQL queries are intrinsically federated, such that we can efficiently cross-reference clinical, DICOM and radiomics data within a single query, while being agnostic to the original data format and coding system. The federated queries work in the same way even if the RDF data were partitioned across multiple servers and dispersed physical locations.
POTENTIAL APPLICATIONS: The public availability of these data resources is intended to support radiomics features replication, repeatability and reproducibility studies by the academic community. The example SPARQL queries may be freely used and modified by readers depending on their research question. Data interoperability and reusability are supported by referencing existing public ontologies. The RDF data is readily findable and accessible through the aforementioned link. Scripts used to create the RDF are made available at a code repository linked to this submission: https://gitlab.com/UM-CDS/FAIR-compliant_clinical_radiomics_and_DICOM_metadata.
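A minimal example of querying the public endpoint from Python with SPARQLWrapper follows; the query is deliberately generic (it only counts instances per class), and the exact endpoint path may differ from the web application URL given above.

```python
# Sketch: count instances per class on the public triples archive.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://sparql.cancerdata.org")  # endpoint as given in the abstract
sparql.setQuery("""
SELECT ?cls (COUNT(?s) AS ?n)
WHERE { ?s a ?cls }
GROUP BY ?cls
ORDER BY DESC(?n)
LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["cls"]["value"], row["n"]["value"])
```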
PMID: 32521049 [PubMed - as supplied by publisher]
Spelling Errors and Shouting Capitalization Lead to Additive Penalties to Trustworthiness of Online Health Information: Randomized Experiment With Laypersons.
J Med Internet Res. 2020 Jun 10;22(6):e15171
Authors: Witchel HJ, Thompson GA, Jones CI, Westling CEI, Romero J, Nicotra A, Maag B, Critchley HD
Abstract
BACKGROUND: The written format and literacy competence of screen-based texts can interfere with the perceived trustworthiness of health information in online forums, independent of the semantic content. Unlike in professional content, the format in unmoderated forums can regularly hint at incivility, perceived as deliberate rudeness or casual disregard toward the reader, for example, through spelling errors and unnecessary emphatic capitalization of whole words (online shouting).
OBJECTIVE: This study aimed to quantify the comparative effects of spelling errors and inappropriate capitalization on ratings of trustworthiness independently of lay insight and to determine whether these changes act synergistically or additively on the ratings.
METHODS: In web-based experiments, 301 UK-recruited participants rated 36 randomized short stimulus excerpts (in the format of information from an unmoderated health forum about multiple sclerosis) for trustworthiness using a semantic differential slider. A total of 9 control excerpts were compared with matching error-containing excerpts. Each matching error-containing excerpt included 5 instances of misspelling, or 5 instances of inappropriate capitalization (shouting), or a combination of 5 misspelling plus 5 inappropriate capitalization errors. Data were analyzed in a linear mixed effects model.
RESULTS: The mean trustworthiness ratings of the control excerpts ranged from 32.59 to 62.31 (rating scale 0-100). Compared with the control excerpts, excerpts containing only misspellings were rated as being 8.86 points less trustworthy, those containing inappropriate capitalization were rated as 6.41 points less trustworthy, and those containing the combination of misspelling and capitalization were rated as 14.33 points less trustworthy (P<.001 for all). Misspelling and inappropriate capitalization show an additive effect.
CONCLUSIONS: Distinct indicators of incivility independently and additively penalize the perceived trustworthiness of online text independently of lay insight, eliciting a medium effect size.
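To make the analysis concrete, the sketch below fits a linear mixed effects model of the same general structure (participant as a grouping factor, error types as fixed effects) with statsmodels on simulated data; the numbers are invented and only the model form mirrors the study design.

```python
# Sketch: linear mixed effects model of trustworthiness ratings on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for participant in range(30):
    participant_offset = rng.normal(0, 5)   # random intercept per participant
    for _ in range(12):                     # excerpts rated by each participant
        misspelling = rng.integers(0, 2)
        capitalization = rng.integers(0, 2)
        trust = 55 + participant_offset - 9 * misspelling - 6 * capitalization + rng.normal(0, 8)
        rows.append({"participant": participant, "misspelling": misspelling,
                     "capitalization": capitalization, "trust": trust})
df = pd.DataFrame(rows)

model = smf.mixedlm("trust ~ misspelling + capitalization", df, groups=df["participant"])
print(model.fit().summary())
```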
PMID: 32519676 [PubMed - in process]
Connected Traffic Data Ontology (CTDO) for Intelligent Urban Traffic Systems Focused on Connected (Semi) Autonomous Vehicles.
Sensors (Basel). 2020 May 23;20(10):
Authors: Viktorović M, Yang D, Vries B
Abstract
For autonomous vehicles (AV), the ability to share information about their surroundings is crucial. With Level 4 and 5 autonomy in sight, solving the challenge of organizing and efficiently storing the data coming from these connected platforms becomes paramount. Research done up to now has mostly focused on the communication and network layers of V2X (Vehicle-to-Everything) data sharing. However, there is a gap when it comes to the data layer. Limited attention has been paid to ontology development in the automotive domain. More specifically, a way to integrate sensor data and geospatial data efficiently is missing. Therefore, we propose a new Connected Traffic Data Ontology (CTDO), built on the foundations of the Sensor, Observation, Sample, and Actuator (SOSA) ontology, to provide a more suitable ontology for large volumes of time-sensitive data coming from multi-sensory platforms, like connected vehicles, as a first step in closing the existing research gap. Additionally, as this research aims to further extend CTDO in the future, a possible way to map CTDO to ontologies that represent road infrastructure is presented. Finally, the new CTDO ontology was benchmarked against SOSA, and better memory performance and query execution speeds were confirmed.
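As a small illustration of the SOSA core that CTDO builds on, the sketch below serializes one vehicle sensor observation with rdflib; the vehicle, sensor and property URIs are invented, and CTDO-specific extensions are not shown.

```python
# Sketch: one connected-vehicle observation expressed with SOSA core terms.
from rdflib import Graph, Literal, Namespace, RDF, XSD

SOSA = Namespace("http://www.w3.org/ns/sosa/")
EX = Namespace("http://example.org/ctdo-demo/")   # hypothetical instance namespace

g = Graph()
obs = EX["obs-0001"]
g.add((obs, RDF.type, SOSA.Observation))
g.add((obs, SOSA.madeBySensor, EX["vehicle-42-lidar-front"]))
g.add((obs, SOSA.observedProperty, EX["distanceToLeadVehicle"]))
g.add((obs, SOSA.hasSimpleResult, Literal(17.3, datatype=XSD.double)))
g.add((obs, SOSA.resultTime, Literal("2020-05-23T10:15:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```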
PMID: 32456152 [PubMed - in process]
Pushing the Scalability of RDF Engines on IoT Edge Devices.
Sensors (Basel). 2020 May 14;20(10):
Authors: Le-Tuan A, Hayes C, Hauswirth M, Le-Phuoc D
Abstract
Semantic interoperability for the Internet of Things (IoT) is enabled by standards and technologies from the Semantic Web. As recent research suggests a move towards decentralised IoT architectures, we have investigated the scalability and robustness of RDF (Resource Description Framework) engines that can be embedded throughout the architecture, in particular at edge nodes. RDF processing at the edge facilitates the deployment of semantic integration gateways closer to low-level devices. Our focus is on how to enable scalable and robust RDF engines that can operate on lightweight devices. In this paper, we first carried out an empirical study of the scalability and behaviour of solutions for RDF data management on standard computing hardware that have been ported to run on lightweight devices at the network edge. The findings of our study show that these RDF store solutions have several shortcomings on commodity ARM (Advanced RISC Machine) boards that are representative of IoT edge node hardware. Consequently, this inspired us to introduce a lightweight RDF engine, comprising an RDF storage and a SPARQL processor for lightweight edge devices, called RDF4Led. RDF4Led follows the RISC-style (Reduced Instruction Set Computer) design philosophy. The design constitutes a flash-aware storage structure, an indexing scheme, an alternative buffer management technique and a low-memory-footprint join algorithm that demonstrates improved scalability and robustness over competing solutions. With a significantly smaller memory footprint, we show that RDF4Led can handle 2 to 5 times more data than popular RDF engines such as Jena TDB (Tuple Database) and RDF4J, while consuming the same amount of memory. In particular, RDF4Led requires 10%-30% of the memory of its competitors to operate on datasets of up to 50 million triples. On memory-constrained ARM boards, it can perform faster updates and can scale better than Jena TDB and Virtuoso. Furthermore, we demonstrate considerably faster query operations than Jena TDB and RDF4J.
PMID: 32422961 [PubMed]
GWAS Central: a comprehensive resource for the discovery and comparison of genotype and phenotype data from genome-wide association studies.
Nucleic Acids Res. 2020 01 08;48(D1):D933-D940
Authors: Beck T, Shorter T, Brookes AJ
Abstract
The GWAS Central resource provides a toolkit for integrative access and visualization of a uniquely extensive collection of genome-wide association study data, while ensuring safe open access to prevent research participant identification. GWAS Central is the world's most comprehensive openly accessible repository of summary-level GWAS association information, providing over 70 million P-values for over 3800 studies investigating over 1400 unique phenotypes. The database content comprises direct submissions received from GWAS authors and consortia, in addition to actively gathered data sets from various public sources. GWAS data are discoverable from the perspective of genetic markers, genes, genome regions or phenotypes, via graphical visualizations and detailed downloadable data reports. Tested genetic markers and relevant genomic features can be visually interrogated across up to sixteen multiple association data sets in a single view using the integrated genome browser. The semantic standardization of phenotype descriptions with Medical Subject Headings and the Human Phenotype Ontology allows the precise identification of genetic variants associated with diseases, phenotypes and traits of interest. Harmonization of the phenotype descriptions used across several GWAS-related resources has extended the phenotype search capabilities to enable cross-database study discovery using a range of ontologies. GWAS Central is updated regularly and available at https://www.gwascentral.org.
PMID: 31612961 [PubMed - indexed for MEDLINE]
Zostavax vaccine effectiveness among US elderly using real-world evidence: Addressing unmeasured confounders by using multiple imputation after linking beneficiary surveys with Medicare claims.
Pharmacoepidemiol Drug Saf. 2019 07;28(7):993-1001
Authors: Izurieta HS, Wu X, Lu Y, Chillarige Y, Wernecke M, Lindaas A, Pratt D, MaCurdy TE, Chu S, Kelman J, Forshee R
Abstract
PURPOSE: Medicare claims can provide real-world evidence (RWE) to support the Food and Drug Administration's ability to conduct postapproval studies to validate products' safety and effectiveness. However, Medicare claims do not contain comprehensive information on some important sources of bias. Thus, we piloted an approach using the Medicare Current Beneficiary Survey (MCBS), a nationally representative survey of the Medicare population, to (a) assess cohort balance with respect to unmeasured confounders in a herpes zoster vaccine (HZV) effectiveness claims-based study and (b) augment Medicare claims with MCBS data to include unmeasured covariates.
METHODS: We reanalyzed data from our published HZV effectiveness Medicare analysis, using linkages to MCBS to obtain information on impaired mobility, education, and health-seeking behavior. We assessed survey variable balance between the matched cohorts and selected imbalanced variables for model adjustment, applying multiple imputation by chained equations (MICE) to impute these potential unmeasured confounders.
RESULTS: The original HZV effectiveness study cohorts appeared well balanced with respect to variables we selected from the MCBS. Our imputed results showed slight shifts in HZV effectiveness point estimates with wider confidence intervals, but indicated no statistically significant differences from the original study estimates.
CONCLUSIONS: Our innovative use of linked survey data to assess cohort balance and our imputation approach to augment Medicare claims with MCBS data to include unmeasured covariates provide potential solutions for addressing bias related to unmeasured confounding in large database studies, thus adding new tools for RWE studies.
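A generic sketch of multiple imputation by chained equations (MICE) with statsmodels is given below, on simulated data standing in for survey covariates that are missing for most claims records; this is not the authors' analysis code.

```python
# Sketch: MICE on a covariate observed only for the survey-linked subset.
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age": rng.normal(75, 6, n),
    "mobility_impaired": rng.integers(0, 2, n).astype(float),
    "education_years": rng.normal(12, 3, n),
})
# Simulate the linkage situation: the survey covariate is observed for only ~20% of records.
df.loc[rng.random(n) > 0.2, "mobility_impaired"] = np.nan

imp = mice.MICEData(df)          # default imputation uses predictive mean matching
imputed_datasets = []
for _ in range(5):               # five completed datasets, as in standard MICE practice
    imp.update_all(n_iter=5)     # chained-equation sweeps
    imputed_datasets.append(imp.data.copy())

print(imputed_datasets[0]["mobility_impaired"].mean())
```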
PMID: 31168897 [PubMed - indexed for MEDLINE]