Semantic Web
Expressing and Executing Informed Consent Permissions Using SWRL: The All of Us Use Case
AMIA Annu Symp Proc. 2022 Feb 21;2021:197-206. eCollection 2021.
ABSTRACT
The informed consent process is a complicated procedure involving permissions as well as a variety of entities and actions. In this paper, we discuss the use of the Semantic Web Rule Language (SWRL) to further extend the Informed Consent Ontology (ICO), allowing semantic machine-based reasoning to manage and generate important permission-based information that stakeholders can later view. We present four use cases of permissions from the All of Us informed consent document and translate these permissions into SWRL expressions to extend and operationalize ICO. Our efforts show how SWRL can infer some of the implicit information based on the defined rules, and demonstrate the utility of ICO through the use of SWRL extensions. Future work will include developing formal and generalized rules, expressing permissions from the entire document, and working towards integrating ICO into software systems to enhance the semantic representation of informed consent for biomedical research.
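As a concrete illustration of the kind of rule involved, the sketch below encodes one hypothetical All of Us-style permission as a SWRL rule using the owlready2 Python library. The class and property names are illustrative stand-ins, not the actual ICO terms or the rules published in the paper.

```python
# A minimal sketch with illustrative names (not ICO's):
# "a participant who signs the consent document grants specimen-use permission".
from owlready2 import get_ontology, Thing, ObjectProperty, Imp

onto = get_ontology("http://example.org/ico-demo.owl")  # hypothetical IRI

with onto:
    class InformedConsentDocument(Thing): pass
    class Participant(Thing): pass
    class SpecimenUsePermission(Thing): pass

    class signs(ObjectProperty):
        domain = [Participant]
        range = [InformedConsentDocument]

    class grants(ObjectProperty):
        domain = [Participant]
        range = [SpecimenUsePermission]

    rule = Imp()
    rule.set_as_rule(
        "Participant(?p), InformedConsentDocument(?d), signs(?p, ?d), "
        "SpecimenUsePermission(?perm) -> grants(?p, ?perm)"
    )
# Running a reasoner (e.g., sync_reasoner_pellet) would then infer
# grants(...) facts for every signing participant.
```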
PMID:35309008 | PMC:PMC8861693
Identifying informative tweets during a pandemic via a topic-aware neural language model
World Wide Web. 2022 Mar 16:1-16. doi: 10.1007/s11280-022-01034-1. Online ahead of print.
ABSTRACT
Epidemics affect the lives of many people around the world and can lead to devastating consequences. Recently, many tweets about the COVID-19 pandemic have been shared publicly on social media platforms. Analyzing these tweets helps emergency response organizations prioritize their tasks and make better decisions. However, most of these tweets are non-informative, which makes it challenging to build an automated system for detecting useful information on social media. Furthermore, existing methods ignore unlabeled data and topic background knowledge, both of which can provide additional semantic information. In this paper, we propose a novel Topic-Aware BERT (TABERT) model to address these challenges. TABERT first leverages a topic model to extract the latent topics of tweets. Secondly, a flexible framework is used to combine topic information with the output of BERT. Finally, we adopt adversarial training to achieve semi-supervised learning, so that a large amount of unlabeled data can be used to improve the model's inner representations. Experimental results on a dataset of COVID-19 English tweets show that our model outperforms classic and state-of-the-art baselines.
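The fusion step can be pictured with a short PyTorch sketch: the tweet's BERT [CLS] vector is concatenated with its latent-topic distribution before classification. This is a minimal sketch of the general idea only; the layer sizes, topic count, and fusion layer are illustrative, not the published TABERT architecture.

```python
import torch
import torch.nn as nn

class TopicAwareClassifier(nn.Module):
    """Toy fusion of BERT output and topic proportions (dimensions illustrative)."""
    def __init__(self, bert_dim=768, n_topics=50, n_classes=2):
        super().__init__()
        self.fuse = nn.Linear(bert_dim + n_topics, 256)
        self.out = nn.Linear(256, n_classes)

    def forward(self, cls_vec, topic_dist):
        # cls_vec: (batch, bert_dim) BERT [CLS] embedding of each tweet
        # topic_dist: (batch, n_topics) topic proportions from a topic model
        h = torch.relu(self.fuse(torch.cat([cls_vec, topic_dist], dim=-1)))
        return self.out(h)

logits = TopicAwareClassifier()(torch.randn(4, 768), torch.rand(4, 50))
```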
PMID:35308294 | PMC:PMC8924578 | DOI:10.1007/s11280-022-01034-1
The AOP-DB RDF: Applying FAIR Principles to the Semantic Integration of AOP Data Using the Resource Description Framework
Front Toxicol. 2022 Feb 14;4:803983. doi: 10.3389/ftox.2022.803983. eCollection 2022.
ABSTRACT
Computational toxicology is central to the current transformation occurring in toxicology and chemical risk assessment. There is a need for more efficient use of existing data to characterize human toxicological responses to environmental chemicals in the US and Europe. The Adverse Outcome Pathway (AOP) framework helps to organize existing mechanistic information and contributes to what is currently being described as New Approach Methodologies (NAMs). AOP knowledge and data are currently submitted directly by users and stored in the AOP-Wiki (https://aopwiki.org/). Automatic and systematic parsing of AOP-Wiki data is challenging, so we have created the EPA Adverse Outcome Pathway Database (AOP-DB). The AOP-DB, developed by the US EPA to assist in the biological and mechanistic characterization of AOP data, provides a broad, systems-level overview of the biological context of AOPs. Here we describe the recent semantic mapping efforts for the AOP-DB and how this process facilitates the integration of AOP-DB data with other toxicologically relevant datasets through a use case example.
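The semantic mapping amounts to expressing AOP-DB records as RDF triples; the rdflib sketch below shows the general pattern with a made-up namespace and predicates (the actual AOP-DB RDF schema differs).

```python
from rdflib import Graph, Literal, Namespace, RDF

AOPDB = Namespace("http://example.org/aopdb/")  # placeholder namespace
g = Graph()

# Hypothetical triples linking a key event to a gene, AOP-Wiki style.
g.add((AOPDB["KeyEvent_201"], RDF.type, AOPDB.KeyEvent))
g.add((AOPDB["KeyEvent_201"], AOPDB.associatedGene, AOPDB["Gene_TP53"]))
g.add((AOPDB["Gene_TP53"], AOPDB.geneSymbol, Literal("TP53")))

print(g.serialize(format="turtle"))
```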
PMID:35295213 | PMC:PMC8915825 | DOI:10.3389/ftox.2022.803983
A systematic review of social participation in ecosystem services studies in Latin America from a transdisciplinary perspective, 1996-2020
Sci Total Environ. 2022 Mar 12:154523. doi: 10.1016/j.scitotenv.2022.154523. Online ahead of print.
ABSTRACT
In this article, we propose that ecosystem services (ES) should be studied by integrating social participation and the narratives of social actors. We analyzed the ES literature (1996-2020) in Latin America (LA), basing our review on the concept that the study of this topic should be transdisciplinary and post-normal (i.e., involve extended peer communities). We prepared the review using the Scopus® and Web of Science™ (WoS) databases. We found 1069 articles related to social participation in ES studies in 20 LA countries, identifying 310 articles for further analysis using screening and eligibility protocols. We also used a random sample (n = 50) of the 310 articles for a detailed analysis of social participation and extended peer communities. Results showed that annual output increased from seven articles in 2010 to 39 per year during 2015-2019. English is the primary language used (91% of the articles), with only one journal accepting publications in Spanish. The most common collaboration pattern has been one LA author with one or more non-LA authors (41% of the articles). The semantic network analysis showed 35 thematic clusters, the most common corresponding to ES protection and provision issues. Direct social participation was included in 62% of the articles, mainly through interviews; however, consultative processes have dominated the authors' participatory perspective, without transformative involvement. We discuss article language and low inter-country collaboration, both of which influence the lack of social participation required for the transdisciplinary analysis of ES.
PMID:35292319 | DOI:10.1016/j.scitotenv.2022.154523
Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data
J Biomed Semantics. 2022 Mar 15;13(1):9. doi: 10.1186/s13326-022-00264-6.
ABSTRACT
BACKGROUND: The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements - including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles.
RESULTS: Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the Semanticscience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, the Human Phenotype Ontology, and the National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will deploy over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding or expertise in Linked Data or FAIR.
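The SIO pattern in play is roughly "entity has-attribute attribute; attribute has-value value"; a toy rdflib rendering of a single CDE follows. The SIO identifiers and the patient vocabulary below are illustrative; consult the published EJP RD templates for the real patterns.

```python
from rdflib import Graph, Literal, Namespace, RDF

SIO = Namespace("http://semanticscience.org/resource/")
EX = Namespace("http://example.org/registry/")  # placeholder

g = Graph()
# "Date of birth" CDE: patient -> has attribute -> dob -> has value.
g.add((EX.patient1, RDF.type, EX.Patient))               # illustrative class
g.add((EX.patient1, SIO.SIO_000008, EX.dob1))            # SIO 'has attribute'
g.add((EX.dob1, RDF.type, EX.DateOfBirth))               # illustrative class
g.add((EX.dob1, SIO.SIO_000300, Literal("2001-04-05")))  # SIO 'has value'
```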
CONCLUSIONS: Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.
PMID:35292119 | DOI:10.1186/s13326-022-00264-6
Automated post scoring: evaluating posts with topics and quoted posts in online forum
World Wide Web. 2022 Mar 10:1-25. doi: 10.1007/s11280-022-01005-6. Online ahead of print.
ABSTRACT
Online forum post evaluation is an effective way for instructors to assess students' knowledge understanding and writing mechanics. Manually evaluating massive numbers of posts is time-consuming, so automatically grading online posts could significantly alleviate instructors' burden. Similar text assessment tasks, such as Automated Text Scoring, evaluate the writing quality of independent texts or the relevance between a text and a prompt, while Automatic Short Answer Grading measures the semantic match of short answers against given problems and correct answers. Different from these existing tasks, we propose a novel task, Automated Post Scoring (APS), which grades each student's posts in every discussion thread, given the thread topics and quoted posts. APS automatically evaluates not only the writing quality of posts but also their relevance to topics. To measure relevance, we model the semantic consistency between posts and topics; supporting arguments are also extracted from quoted posts to enhance post evaluation. Specifically, we propose a mixture model comprising a hierarchical text model to measure writing quality, a semantic matching model to capture topic relevance, and a semantic representation model to integrate quoted posts. We also construct a new dataset, the Online Discussion Dataset, containing 2,542 online posts from 694 students of a social science course. The proposed models are evaluated on this dataset with correlation- and residual-based evaluation metrics. Compared with measuring posts alone, experimental results demonstrate that incorporating topics and quoted posts improves the performance of APS by a large margin, more than 9 percent on QWK (quadratic weighted kappa).
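QWK, the agreement metric reported above, can be computed directly with scikit-learn; a small sketch with made-up grades:

```python
from sklearn.metrics import cohen_kappa_score

human = [3, 1, 4, 2, 4, 0]   # instructor-assigned post grades
model = [3, 2, 4, 2, 3, 0]   # grades predicted by the scoring model
qwk = cohen_kappa_score(human, model, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```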
PMID:35287331 | PMC:PMC8907391 | DOI:10.1007/s11280-022-01005-6
Big data, computational social science, and other recent innovations in social network analysis
Can Rev Sociol. 2022 Mar 14. doi: 10.1111/cars.12377. Online ahead of print.
ABSTRACT
While sociologists have studied social networks for about one hundred years, recent developments in data, technology, and methods of analysis provide opportunities for social network analysis (SNA) to play a prominent role in the new research world of big data and computational social science (CSS). In our review, we focus on four broad topics: (1) Collecting Social Network Data from the Web, (2) Non-traditional and Bipartite/Multi-mode Networks, including Discourse and Semantic Networks, and Social-Ecological Networks, (3) Recent Developments in Statistical Inference for Networks, and (4) Ethics in Computational Network Research.
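As a small illustration of the bipartite ("two-mode") networks mentioned in topic (2), the networkx sketch below projects a person-venue network onto person-person ties weighted by shared venues; the data are invented.

```python
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
B.add_nodes_from(["ann", "bob", "cat"], bipartite=0)   # people
B.add_nodes_from(["forum1", "forum2"], bipartite=1)    # venues
B.add_edges_from([("ann", "forum1"), ("bob", "forum1"),
                  ("bob", "forum2"), ("cat", "forum2")])

# One-mode projection: people tied by the venues they share.
P = bipartite.weighted_projected_graph(B, ["ann", "bob", "cat"])
print(list(P.edges(data=True)))
```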
PMID:35286014 | DOI:10.1111/cars.12377
Semantic Table-of-Contents for Efficient Web Screen Reading
Proc Symp Appl Comput. 2021 Mar;2021:1941-1949. doi: 10.1145/3412841.3442066. Epub 2021 Apr 22.
ABSTRACT
Navigating back and forth between segments of a webpage is well known to be an arduous endeavor for blind screen-reader users, due to the serial nature of content navigation coupled with web developers' inconsistent usage of accessibility-enhancing features such as WAI-ARIA landmarks and skip-navigation links. Without these supporting features, navigating modern webpages, which typically contain thousands of HTML elements in their DOMs, is both tedious and cumbersome for blind screen-reader users. Existing approaches to improving non-visual navigation efficiency typically propose 'one-size-fits-all' solutions that do not accommodate the personal needs and preferences of screen-reader users. To fill this void, we present sTag, a browser extension embodying a semi-automatic method that enables users to easily create their own table of contents (TOC) for any webpage by simply 'tagging' their preferred 'semantically meaningful' segments (e.g., search results, filter options, forms, menus) while navigating the webpage. All subsequent accesses to these segments can then be made via the generated TOC, which is instantly accessible via a special shortcut or a repurposed mouse/touchpad action. Because tags in sTag are attached to abstract semantic segments instead of actual DOM nodes in the webpage, sTag can automatically generate equivalent TOCs for other similar webpages without requiring users to duplicate their tagging efforts from scratch. An evaluation with 15 blind screen-reader users revealed that sTag significantly reduced content-navigation time and effort compared to a state-of-the-art solution.
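The key design choice, binding tags to semantic signatures rather than DOM nodes, can be illustrated with a toy Python heuristic. sTag itself is a browser extension; this sketch only shows why signature-based matching transfers across similar pages, and the signature used here is invented.

```python
from bs4 import BeautifulSoup

def find_results_segment(html: str):
    """Locate a 'search results'-like segment by structural cues, not DOM path."""
    soup = BeautifulSoup(html, "html.parser")
    for node in soup.find_all(["ul", "ol"]):
        # Illustrative signature: a list region containing several links.
        if len(node.find_all("a")) >= 3:
            return node
    return None
```

Because the same signature resolves on any page that contains an equivalent segment, a TOC entry created on one page can be regenerated on similar pages without re-tagging.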
PMID:35265951 | PMC:PMC8903019 | DOI:10.1145/3412841.3442066
A semantic web technology index
Sci Rep. 2022 Mar 7;12(1):3672. doi: 10.1038/s41598-022-07615-4.
ABSTRACT
Semantic web (SW) technology has been widely applied to many domains, such as medicine, health care, finance, and geology. At present, researchers mainly rely on their experience and preferences to develop and evaluate SW technology. Although the general architecture of SW technology (e.g., Tim Berners-Lee's Semantic Web Layer Cake) was proposed many years ago and is well known, a concrete guideline for standardizing the development of SW technology is still lacking. In this paper, we propose an SW technology index to standardize development, ensuring that work in SW technology is well designed, and to quantitatively evaluate its quality. This index consists of 10 criteria that quantify the quality as a score of [Formula: see text]. We address each criterion in detail from three aspects: (1) what is the criterion? (2) why do we consider it? and (3) how do current studies meet it? Finally, we validate the index by providing examples of how to apply it to validation cases. We conclude that the index is a useful standard for guiding and evaluating work in SW technology.
PMID:35256665 | DOI:10.1038/s41598-022-07615-4
From Platform to Knowledge Graph: Evolution of Laboratory Automation
JACS Au. 2022 Jan 10;2(2):292-309. doi: 10.1021/jacsau.1c00438. eCollection 2022 Feb 28.
ABSTRACT
High-fidelity computer-aided experimentation is becoming more accessible with the development of computing power and artificial intelligence tools. The advancement of experimental hardware also empowers researchers to reach a level of accuracy that was not possible in the past. Marching toward the next generation of self-driving laboratories, the orchestration of both resources lies at the focal point of autonomous discovery in chemical science. To achieve such a goal, algorithmically accessible data representations and standardized communication protocols are indispensable. In this perspective, we recategorize the recently introduced approach based on Materials Acceleration Platforms into five functional components and discuss recent case studies that focus on the data representation and exchange scheme between different components. Emerging technologies for interoperable data representation and multi-agent systems are also discussed with their recent applications in chemical automation. We hypothesize that knowledge graph technology, orchestrating semantic web technologies and multi-agent systems, will be the driving force to bring data to knowledge, evolving our way of automating the laboratory.
PMID:35252980 | PMC:PMC8889618 | DOI:10.1021/jacsau.1c00438
Priorities for research during the Coronavirus SARS-CoV-2 (COVID-19) pandemic and beyond: a survey of nurses, midwives and health visitors in the United Kingdom
J Res Nurs. 2021 Aug;26(5):442-454. doi: 10.1177/17449871211018737. Epub 2021 Aug 5.
ABSTRACT
BACKGROUND: The Coronavirus SARS-CoV-2 (COVID-19) pandemic has placed a significant burden on global healthcare systems. Nurses, midwives and health visitors remain critical to the rapid responses and innovative solutions required. Their views on research priorities, however, remain largely muted, and greater clarity is needed to inform research that benefits patients and families across the life course.
AIMS: To identify priorities for research in relation to the COVID-19 pandemic and 'beyond', as recommended by nurses, midwives and health visitors across the four countries of the United Kingdom (UK).
METHODS: A cross-sectional, web-based survey was conducted (5th May-4th June 2020). In addition to completing demographic information, respondents identified up to three research areas important to their clinical care/practice in the context of COVID-19 and beyond. Data were imported for analysis into NVivo 12 (QSR International). Descriptive analysis was used to summarise the demographic variables. Free-text responses were analysed using a semantic, inductive thematic analysis approach.
RESULTS: In total, 1,296 responses were received from a self-selected sample, predominantly female registered nurses of white British ethnicity located in England and working for acute care providers, yielding 3,444 research priority recommendations. Four higher-order themes emerged: (1) New and unknown frontiers; (2) Care and treatment solutions; (3) Healthcare leadership and inclusive workforce; and (4) Emotional and mental health impact.
CONCLUSIONS: At a time of significant global uncertainty, the collective voice of nursing, midwifery and health visiting has never been more important in informing clinical research. Whilst generalisability is limited by the homogeneity of the sample, this is the first survey to elicit research priorities in relation to the COVID-19 pandemic and beyond from nurses, midwives and health visitors in the UK. Novel findings, developed through a rigorous analytical approach, illuminate areas that require both urgent and long-term attention, and provide a platform to direct priority refinement, future research and the basis for evidence translation.
PMID:35251274 | PMC:PMC8894638 | DOI:10.1177/17449871211018737
Deep Learning Accurately Quantifies Plasma Cell Percentages on CD138-Stained Bone Marrow Samples
J Pathol Inform. 2022 Feb 5;13:100011. doi: 10.1016/j.jpi.2022.100011. eCollection 2022.
ABSTRACT
The diagnosis of plasma cell neoplasms requires an accurate, and ideally precise, plasma cell percentage. This percentage is often determined by visual estimation of CD138-stained bone marrow biopsies and clot sections. While not necessarily inaccurate, estimates are by definition imprecise. For this study, we hypothesized that deep learning can be used to improve precision. We trained a semantic segmentation-based convolutional neural network (CNN) using annotations of CD138+ and CD138- cells provided by one pathologist on small image patches of bone marrow, and validated the CNN on an independent test set of image patches using annotations from two pathologists and a non-deep-learning commercial software package. On validation, the intraclass correlation coefficients for plasma cell percentages between the CNN and pathologist #1, between the commercial software and pathologist #1, and between pathologists #1 and #2 were 0.975, 0.892, and 0.994, respectively. Overall, CNN labels were almost as accurate as pathologist labels at a cell-by-cell level. Once satisfied with performance, we scaled up the CNN to evaluate whole slide images (WSIs) and deployed the system as a workflow-friendly web application that measures plasma cell percentages from snapshots taken with microscope cameras.
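Once the network labels each cell, the percentage itself is a simple ratio; a sketch of that final step, assuming an illustrative class encoding for the predicted mask:

```python
import numpy as np

def plasma_cell_percentage(mask: np.ndarray) -> float:
    """mask: predicted labels; 1 = CD138-, 2 = CD138+ (encoding assumed)."""
    pos = np.count_nonzero(mask == 2)
    neg = np.count_nonzero(mask == 1)
    total = pos + neg
    return 100.0 * pos / total if total else 0.0
```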
PMID:35242448 | PMC:PMC8873946 | DOI:10.1016/j.jpi.2022.100011
Semantic Integration of Multi-Modal Data and Derived Neuroimaging Results Using the Platform for Imaging in Precision Medicine (PRISM) in the Arkansas Imaging Enterprise System (ARIES)
Front Artif Intell. 2022 Feb 10;4:649970. doi: 10.3389/frai.2021.649970. eCollection 2021.
ABSTRACT
Neuroimaging is among the most active research domains for the creation and management of open-access data repositories. Notably lacking from most data repositories are integrated capabilities for semantic representation. The Arkansas Imaging Enterprise System (ARIES) is a research data management system which features integrated capabilities to support semantic representations of multi-modal data from disparate sources (imaging, behavioral, or cognitive assessments), across common image-processing stages (preprocessing steps, segmentation schemes, analytic pipelines), as well as derived results (publishable findings). These unique capabilities ensure greater reproducibility of scientific findings across large-scale research projects. The current investigation was conducted with three collaborating teams who are using ARIES in a project focusing on neurodegeneration. Datasets included magnetic resonance imaging (MRI) data as well as non-imaging data obtained from a variety of assessments designed to measure neurocognitive functions (performance scores on neuropsychological tests). We integrate and manage these data with semantic representations based on axiomatically rich biomedical ontologies. These instantiate a knowledge graph that combines the data from the study cohorts into a shared semantic representation that explicitly accounts for relations among the entities that the data are about. This knowledge graph is stored in a triple-store database that supports reasoning over and querying these integrated data. Semantic integration of the non-imaging data using background information encoded in biomedical domain ontologies has served as a key feature-engineering step, allowing us to combine disparate data and apply analyses to explore associations, for instance, between hippocampal volumes and measures of cognitive functions derived from various assessment instruments.
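The payoff of a triple-store-backed representation is that such associations can be pulled with a single SPARQL query; a hypothetical example follows. The endpoint URL and vocabulary are invented, not ARIES's actual schema.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/aries/sparql")  # hypothetical endpoint
sparql.setQuery("""
    PREFIX ex: <http://example.org/aries#>
    SELECT ?subject ?hippoVol ?memScore WHERE {
        ?subject ex:hasHippocampalVolume ?hippoVol ;
                 ex:hasMemoryScore ?memScore .
    }
""")
sparql.setReturnFormat(JSON)
rows = sparql.query().convert()["results"]["bindings"]
```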
PMID:35224477 | PMC:PMC8866818 | DOI:10.3389/frai.2021.649970
A Topic Recognition Method of News Text Based on Word Embedding Enhancement
Comput Intell Neurosci. 2022 Feb 16;2022:4582480. doi: 10.1155/2022/4582480. eCollection 2022.
ABSTRACT
Topic recognition technology is commonly applied to identify different categories of news topics from the vast amount of web information, and it has broad application prospects in online public opinion monitoring, news recommendation, and related fields. However, effectively utilizing key feature information in text, such as syntax and semantics, to improve topic recognition accuracy is challenging. Some researchers have proposed combining topic models with word embedding models, and their results show that this approach can enrich text representation and benefit downstream natural language processing tasks. However, for topic recognition of news texts, there is currently no standard way of combining a topic model and a word embedding model. Moreover, some existing approaches are complex and do not consider the fusion of topic distributions of different granularity with word embedding information. Therefore, this paper proposes a novel text representation method based on word embedding enhancement, and further forms a full-process topic recognition framework for news text. In contrast to traditional topic recognition methods, this framework uses the probabilistic topic model LDA and the word embedding models Word2vec and GloVe to fully extract and integrate the topic distribution, semantic knowledge, and syntactic relationships of the text, and then uses popular classifiers to automatically recognize the topic categories of news based on the resulting text representation vectors. As a result, the proposed framework can exploit the relationship between documents and topics as well as context information, which improves expressive ability and reduces dimensionality. Experimental results on the two benchmark datasets, 20NewsGroup and BBC News, verify the effectiveness and superiority of the proposed method for the news topic recognition problem.
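A minimal gensim sketch of the fusion idea, concatenating an LDA topic distribution with averaged word vectors per document. GloVe is omitted and all sizes are illustrative; the paper's exact fusion scheme may differ.

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

docs = [["stocks", "market", "rally"], ["team", "wins", "cup"]]  # toy corpus
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, num_topics=5, id2word=dictionary)
w2v = Word2Vec(docs, vector_size=50, min_count=1)

def doc_vector(doc):
    topics = np.zeros(5)
    for t, p in lda.get_document_topics(dictionary.doc2bow(doc),
                                        minimum_probability=0.0):
        topics[t] = p
    emb = np.mean([w2v.wv[w] for w in doc if w in w2v.wv], axis=0)
    return np.concatenate([topics, emb])  # feed this to any classifier
```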
PMID:35222628 | PMC:PMC8865979 | DOI:10.1155/2022/4582480
The Citation Cloud of a biomedical article: a free, public, web-based tool enabling citation analysis
J Med Libr Assoc. 2022 Jan 1;110(1):103-108. doi: 10.5195/jmla.2022.1117.
ABSTRACT
BACKGROUND: An article's citations are useful for finding related articles that may not be readily found by keyword searches or textual similarity. Citation analysis is also important for analyzing scientific innovation and the structure of the biomedical literature. We wanted to facilitate citation analysis for the broad community by providing a user-friendly interface for accessing and analyzing citation data for biomedical articles.
CASE PRESENTATION: We seeded the Citation Cloud dataset with over 465 million open access citations culled from six different sources: PubMed Central, Microsoft Academic Graph, ArnetMiner, Semantic Scholar, Open Citations, and the NIH iCite dataset. We implemented a free, public extension to PubMed that allows any user to visualize and analyze the entire citation cloud around any paper of interest A: the set of articles cited by A, those which cite A, those which are co-cited with A, and those which are bibliographically coupled to A.
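Co-citation and bibliographic coupling, the two relations least supported by other tools, reduce to set operations over reference lists; a toy sketch:

```python
refs = {            # paper -> set of papers it cites (toy data)
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z", "W"},
    "C": {"A", "B"},
}

def bibliographic_coupling(p, q):
    """Number of references p and q share."""
    return len(refs[p] & refs[q])

def co_cited(p, q):
    """True if some third paper cites both p and q."""
    return any({p, q} <= cited for cited in refs.values())

print(bibliographic_coupling("A", "B"))  # 2 (shared refs Y and Z)
print(co_cited("A", "B"))                # True (C cites both)
```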
CONCLUSIONS: Citation Cloud greatly enables the study of citations by the scientific community, including relatively advanced analyses (co-citations and bibliographic coupling) that cannot be undertaken using other available tools. The tool can be accessed by running any PubMed query on the Anne O'Tate value-added search interface and clicking on the Citations button next to any retrieved article.
PMID:35210969 | PMC:PMC8830385 | DOI:10.5195/jmla.2022.1117
TSA-SCC: Text Semantic-Aware Screen Content Coding with Ultra Low Bitrate
IEEE Trans Image Process. 2022 Feb 23;PP. doi: 10.1109/TIP.2022.3152003. Online ahead of print.
ABSTRACT
Due to the rapid growth of web conferences, remote screen sharing, and online games, screen content has become an important type of internet media, and over 90% of online media interactions are screen-based. Meanwhile, textual information, the main component of screen content, occupies on average over 40% of the image area across commonly used screen content datasets. However, textual information is difficult to compress with traditional coding schemes such as HEVC, which assume strong spatial and temporal correlations within the image/video. State-of-the-art screen content coding (SCC) standards such as HEVC-SCC still adopt a block-based coding framework and do not consider text semantics for compression, inevitably blurring text at lower bitrates. In this paper, we propose a general text semantic-aware screen content coding scheme (TSA-SCC) for ultra-low-bitrate settings. The method detects the abrupt picture in a screen content video (or image), recognizes its textual information (including word, position, font type, font size, and font color) using neural networks, and encodes the text with text coding tools. The other pictures, as well as the background image obtained by removing text from the abrupt picture via inpainting, are encoded with HEVC-SCC. Compared with HEVC-SCC, the proposed TSA-SCC reduces bitrate by up to 3× at similar compression quality. Moreover, TSA-SCC achieves much better visual quality with less bitrate consumption when encoding screen content video/images at ultra-low bitrates.
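Per text element, the encoder then needs to transmit only a compact record rather than pixel blocks. A sketch of such a record, with the field set mirroring the attributes listed above (the actual TSA-SCC bitstream syntax is not specified in the abstract):

```python
from dataclasses import dataclass

@dataclass
class TextElement:
    word: str
    x: int                        # position within the frame
    y: int
    font_type: str
    font_size: int
    color: tuple[int, int, int]   # (R, G, B) font color
```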
PMID:35196232 | DOI:10.1109/TIP.2022.3152003
A new phylogenetic data standard for computable clade definitions: the Phyloreference Exchange Format (Phyx)
PeerJ. 2022 Feb 15;10:e12618. doi: 10.7717/peerj.12618. eCollection 2022.
ABSTRACT
To be computationally reproducible and efficient, integration of disparate data depends on shared entities whose matching meaning (semantics) can be computationally assessed. For biodiversity data one of the most prevalent shared entities for linking data records is the associated taxon concept. Unlike Linnaean taxon names, the traditional way in which taxon concepts are provided, phylogenetic definitions are native to phylogenetic trees and offer well-defined semantics that can be transformed into formal, computationally evaluable logic expressions. These attributes make them highly suitable for phylogeny-driven comparative biology by allowing computationally verifiable and reproducible integration of taxon-linked data against Tree of Life-scale phylogenies. To achieve this, the first step is transforming phylogenetic definitions from the natural language text in which they are published to a structured interoperable data format that maintains strong ties to semantics and lends itself well to sharing, reuse, and long-term archival. To this end, we developed the Phyloreference Exchange Format (Phyx), a JSON-LD-based text format encompassing rich metadata for all elements of a phylogenetic definition, and we created a supporting software library, phyx.js, to streamline computational management of such files. Together they form a foundation layer for digitizing and computing with phylogenetic definitions of clades.
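A much-simplified, hypothetical Phyx-like record, built as a Python dict and serialized as JSON-LD. The field names here are illustrative; see the Phyx specification and phyx.js for the real schema.

```python
import json

phyloref = {
    "@context": "http://example.org/phyx-context.jsonld",  # placeholder
    "label": "ExampleCladeName",
    "cladeDefinition": "the least inclusive clade containing A and B",
    "internalSpecifiers": [
        {"@type": "TaxonConcept", "name": "Taxon A"},
        {"@type": "TaxonConcept", "name": "Taxon B"},
    ],
}
print(json.dumps(phyloref, indent=2))
```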
PMID:35186448 | PMC:PMC8855714 | DOI:10.7717/peerj.12618
Design and Development of a Medical Image Databank for Assisting Studies in Radiomics
J Digit Imaging. 2022 Feb 15. doi: 10.1007/s10278-021-00576-6. Online ahead of print.
ABSTRACT
The CompreHensive Digital ArchiVe of Cancer Imaging - Radiation Oncology (CHAVI-RO) is a multi-tier, web-based medical image databank. It supports archiving de-identified radiological and clinical datasets in a relational database. A semantic relational database model is designed to accommodate the imaging and treatment data of cancer patients. It aims to provide key datasets for investigating and modeling the use of radiological imaging data in response to radiation, a research domain that addresses the modeling and analysis of complete treatment data for oncology patients. A DICOM viewer is integrated for reviewing the uploaded de-identified DICOM datasets. In a prototype system, we carried out a pilot study with cancer data from four disease sites, namely breast, head and neck, brain, and lung. The representative dataset was used to estimate per-patient data size. A role-based access control module is integrated with the image databank to restrict user access. We also performed different types of load tests to analyze and quantify the performance of the CHAVI databank.
PMID:35166968 | DOI:10.1007/s10278-021-00576-6
MonaGO: a novel gene ontology enrichment analysis visualisation system
BMC Bioinformatics. 2022 Feb 14;23(1):69. doi: 10.1186/s12859-022-04594-1.
ABSTRACT
BACKGROUND: Gene ontology (GO) enrichment analysis is frequently undertaken during the exploration of various -omics data sets. Despite the wide array of tools available to biologists for performing this analysis, meaningful visualisation of the overrepresented GO terms in a manner that is easy to interpret is still lacking.
RESULTS: Monash Gene Ontology (MonaGO) is a novel web-based visualisation system that provides an intuitive, interactive and responsive interface for performing GO enrichment analysis and visualising the results. MonaGO supports gene lists as well as GO terms as inputs. Visualisation results can be exported as high-resolution images or restored in new sessions, allowing reproducibility of the analysis. An extensive comparison between MonaGO and 11 state-of-the-art GO enrichment visualisation tools based on 9 features revealed that MonaGO is a unique platform that simultaneously allows interactive visualisation within one single output page, directly accessible through a web browser with customisable display options.
CONCLUSION: MonaGO combines dynamic clustering, interactive visualisation, and customisation options to help biologists obtain meaningful representations of overrepresented GO terms, producing simplified outputs in an unbiased manner. MonaGO will facilitate the interpretation of GO analyses and assist biologists in representing their results.
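The enrichment statistic that such tools visualise is typically a hypergeometric (one-sided Fisher) test; a minimal scipy sketch with invented counts:

```python
from scipy.stats import hypergeom

N = 20000   # genes in the background
K = 150     # background genes annotated with the GO term
n = 300     # genes in the study list
k = 12      # study genes annotated with the term
p = hypergeom.sf(k - 1, N, K, n)   # P(X >= k)
print(f"enrichment p-value = {p:.2e}")
```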
PMID:35164667 | DOI:10.1186/s12859-022-04594-1
Improving SDG Classification Precision Using Combinatorial Fusion
Sensors (Basel). 2022 Jan 29;22(3):1067. doi: 10.3390/s22031067.
ABSTRACT
Combinatorial fusion algorithm (CFA) is a machine learning and artificial intelligence (ML/AI) framework for combining multiple scoring systems using the rank-score characteristic (RSC) function and cognitive diversity (CD). When measuring the relevance of a publication or document with respect to the 17 Sustainable Development Goals (SDGs) of the United Nations, a classification scheme is used. However, this classification process is a challenging task due to the overlapping goals and contextual differences of those diverse SDGs. In this paper, we use CFA to combine a topic model classifier (Model A) and a semantic link classifier (Model B) to improve the precision of the classification process. We characterize and analyze each of the individual models using the RSC function and CD between Models A and B. We evaluate the classification results from combining the models using a score combination and a rank combination, when compared to the results obtained from human experts. In summary, we demonstrate that the combination of Models A and B can improve classification precision only if these individual models perform well and are diverse.
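The two combination rules are easy to state concretely; a toy numpy sketch over scores from Models A and B (the RSC function is simply the map from a rank to the score at that rank):

```python
import numpy as np

scores_a = np.array([0.9, 0.4, 0.7, 0.2])   # Model A scores for 4 documents
scores_b = np.array([0.6, 0.8, 0.5, 0.1])   # Model B scores

def ranks(s):
    """Rank 1 = highest score; the map rank -> score is the RSC function."""
    return np.argsort(np.argsort(-s)) + 1

score_comb = (scores_a + scores_b) / 2               # score combination
rank_comb = (ranks(scores_a) + ranks(scores_b)) / 2  # rank combination (lower = better)
```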
PMID:35161807 | DOI:10.3390/s22031067