Semantic Web
From Platform to Knowledge Graph: Evolution of Laboratory Automation
JACS Au. 2022 Jan 10;2(2):292-309. doi: 10.1021/jacsau.1c00438. eCollection 2022 Feb 28.
ABSTRACT
High-fidelity computer-aided experimentation is becoming more accessible with the development of computing power and artificial intelligence tools. The advancement of experimental hardware also empowers researchers to reach a level of accuracy that was not possible in the past. Marching toward the next generation of self-driving laboratories, the orchestration of both resources lies at the focal point of autonomous discovery in chemical science. To achieve such a goal, algorithmically accessible data representations and standardized communication protocols are indispensable. In this perspective, we recategorize the recently introduced approach based on Materials Acceleration Platforms into five functional components and discuss recent case studies that focus on the data representation and exchange scheme between different components. Emerging technologies for interoperable data representation and multi-agent systems are also discussed with their recent applications in chemical automation. We hypothesize that knowledge graph technology, orchestrating semantic web technologies and multi-agent systems, will be the driving force to bring data to knowledge, evolving our way of automating the laboratory.
PMID:35252980 | PMC:PMC8889618 | DOI:10.1021/jacsau.1c00438
Priorities for research during the Coronavirus SARS-CoV-2 (COVID-19) pandemic and beyond: a survey of nurses, midwives and health visitors in the United Kingdom
J Res Nurs. 2021 Aug;26(5):442-454. doi: 10.1177/17449871211018737. Epub 2021 Aug 5.
ABSTRACT
BACKGROUND: The Coronavirus SARS-CoV-2 (COVID-19) pandemic has placed a significant burden on global healthcare systems. Nurses, midwives and health visitors remain critical to the rapid responses and innovative solutions required. Their views on priorities for research, however, are mainly muted, necessitating greater clarity to inform research that benefits patients and families across the life course.
AIMS: To identify priorities for research in relation to the COVID-19 pandemic and 'beyond', as recommended by nurses, midwives and health visitors across the four countries of the United Kingdom (UK).
METHODS: A cross-sectional, web-based survey design was conducted (5th May-4th June 2020). In addition to the completion of demographic information, respondents identified up to three research areas important to their clinical care/practice in the context of COVID-19 and beyond. Data were imported for analysis into NVivo 12 (QSR International). Descriptive analysis was used to summarise the demographic variables. Free text responses were analysed using a semantic, inductive thematic analysis approach.
RESULTS: In total, 1,296 responses were received from a self-selected sample predominantly of female registered nurses of white British ethnicity, located in England and working for acute care providers, providing 3,444 research priority recommendations. Four higher-order themes emerged: (1) New and unknown frontiers; (2) Care and treatment solutions; (3) Healthcare leadership and inclusive workforce; and (4) Emotional and mental health impact.
CONCLUSIONS: At a time of significant global uncertainty, the collective voice of nursing, midwifery and health visiting is never more important to inform clinical research. Whilst generalisability is limited by the homogeneity of the sample, this is the first survey to elicit the priorities for research in relation to the COVID-19 pandemic and beyond from nurses, midwives and health visitors in the UK. Novel findings developed through a rigorous analytical approach illuminate areas that require both urgent and long-term attention and provide a platform to direct priority refinement, future research and the basis for evidence translation.
PMID:35251274 | PMC:PMC8894638 | DOI:10.1177/17449871211018737
Deep Learning Accurately Quantifies Plasma Cell Percentages on CD138-Stained Bone Marrow Samples
J Pathol Inform. 2022 Feb 5;13:100011. doi: 10.1016/j.jpi.2022.100011. eCollection 2022.
ABSTRACT
The diagnosis of plasma cell neoplasms requires accurate, and ideally precise, percentages. This plasma cell percentage is often determined by visual estimation of CD138-stained bone marrow biopsies and clot sections. While not necessarily inaccurate, estimates are by definition imprecise. For this study, we hypothesized that deep learning can be used to improve precision. We trained a semantic segmentation-based convolutional neural network (CNN) using annotations of CD138+ and CD138- cells provided by one pathologist on small image patches of bone marrow and validated the CNN on an independent test set of image patches using annotations from two pathologists and a non-deep learning commercial software. On validation, we found that the intraclass correlation coefficients for plasma cell percentages between the CNN and pathologist #1, a non-deep learning commercial software and pathologist #1, and pathologists #1 and #2 were 0.975, 0.892, and 0.994, respectively. The overall results show that CNN labels were almost as accurate as pathologist labels at a cell-by-cell level. Once satisfied with performance, we scaled-up the CNN to evaluate whole slide images (WSIs), and deployed the system as a workflow friendly web application to measure plasma cell percentages using snapshots taken from microscope cameras.
PMID:35242448 | PMC:PMC8873946 | DOI:10.1016/j.jpi.2022.100011
Semantic Integration of Multi-Modal Data and Derived Neuroimaging Results Using the Platform for Imaging in Precision Medicine (PRISM) in the Arkansas Imaging Enterprise System (ARIES)
Front Artif Intell. 2022 Feb 10;4:649970. doi: 10.3389/frai.2021.649970. eCollection 2021.
ABSTRACT
Neuroimaging is among the most active research domains for the creation and management of open-access data repositories. Notably lacking from most data repositories are integrated capabilities for semantic representation. The Arkansas Imaging Enterprise System (ARIES) is a research data management system which features integrated capabilities to support semantic representations of multi-modal data from disparate sources (imaging, behavioral, or cognitive assessments), across common image-processing stages (preprocessing steps, segmentation schemes, analytic pipelines), as well as derived results (publishable findings). These unique capabilities ensure greater reproducibility of scientific findings across large-scale research projects. The current investigation was conducted with three collaborating teams who are using ARIES in a project focusing on neurodegeneration. Datasets included magnetic resonance imaging (MRI) data as well as non-imaging data obtained from a variety of assessments designed to measure neurocognitive functions (performance scores on neuropsychological tests). We integrate and manage these data with semantic representations based on axiomatically rich biomedical ontologies. These instantiate a knowledge graph that combines the data from the study cohorts into a shared semantic representation that explicitly accounts for relations among the entities that the data are about. This knowledge graph is stored in a triple-store database that supports reasoning over and querying these integrated data. Semantic integration of the non-imaging data using background information encoded in biomedical domain ontologies has served as a key feature-engineering step, allowing us to combine disparate data and apply analyses to explore associations, for instance, between hippocampal volumes and measures of cognitive functions derived from various assessment instruments.
PMID:35224477 | PMC:PMC8866818 | DOI:10.3389/frai.2021.649970
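The idea of a knowledge graph held as subject-predicate-object triples and queried to join imaging and non-imaging data, as described in the ARIES abstract above, can be illustrated with a minimal sketch. It assumes an in-memory list in place of a real triple store, and every identifier and predicate name (`subject_01`, `has_measurement`, `has_value_mm3`, and so on) is hypothetical rather than taken from ARIES or its ontologies.

```python
# Toy triple list standing in for a triple store.
# All identifiers and predicates are hypothetical illustrations.
TRIPLES = [
    ("subject_01", "has_measurement", "hippocampal_volume_01"),
    ("hippocampal_volume_01", "has_value_mm3", 3450.0),
    ("subject_01", "has_assessment", "memory_score_01"),
    ("memory_score_01", "has_value", 27),
    ("subject_02", "has_measurement", "hippocampal_volume_02"),
    ("hippocampal_volume_02", "has_value_mm3", 2980.0),
    ("subject_02", "has_assessment", "memory_score_02"),
    ("memory_score_02", "has_value", 19),
]


def match(pattern, triples):
    """Return triples matching an (s, p, o) pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


def volume_and_score(triples):
    """Join each subject's hippocampal volume with its assessment score."""
    rows = {}
    for subj, _, meas in match((None, "has_measurement", None), triples):
        for _, _, vol in match((meas, "has_value_mm3", None), triples):
            rows.setdefault(subj, {})["volume_mm3"] = vol
    for subj, _, assess in match((None, "has_assessment", None), triples):
        for _, _, score in match((assess, "has_value", None), triples):
            rows.setdefault(subj, {})["score"] = score
    return rows


if __name__ == "__main__":
    for subject, row in sorted(volume_and_score(TRIPLES).items()):
        print(subject, row)
```

A production system would express the same join as a query against the triple store and could add ontology-based reasoning, which this sketch does not attempt.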
A Topic Recognition Method of News Text Based on Word Embedding Enhancement
Comput Intell Neurosci. 2022 Feb 16;2022:4582480. doi: 10.1155/2022/4582480. eCollection 2022.
ABSTRACT
Topic recognition technology has been commonly applied to identify different categories of news topics from the vast amount of web information, which has a wide application prospect in the field of online public opinion monitoring, news recommendation, and so on. However, it is very challenging to effectively utilize key feature information such as syntax and semantics in the text to improve topic recognition accuracy. Some researchers proposed to combine the topic model with the word embedding model, whose results had shown that this approach could enrich text representation and benefit natural language processing downstream tasks. However, for the topic recognition problem of news texts, there is currently no standard way of combining topic model and word embedding model. Besides, some existing similar approaches were more complex and did not consider the fusion between topic distribution of different granularity and word embedding information. Therefore, this paper proposes a novel text representation method based on word embedding enhancement and further forms a full-process topic recognition framework for news text. In contrast to traditional topic recognition methods, this framework is designed to use the probabilistic topic model LDA, the word embedding models Word2vec and Glove to fully extract and integrate the topic distribution, semantic knowledge, and syntactic relationship of the text, and then use popular classifiers to automatically recognize the topic categories of news based on the obtained text representation vectors. As a result, the proposed framework can take advantage of the relationship between document and topic and the context information, which improves the expressive ability and reduces the dimensionality. Based on the two benchmark datasets of 20NewsGroup and BBC News, the experimental results verify the effectiveness and superiority of the proposed method based on word embedding enhancement for the news topic recognition problem.
PMID:35222628 | PMC:PMC8865979 | DOI:10.1155/2022/4582480
The Citation Cloud of a biomedical article: a free, public, web-based tool enabling citation analysis
J Med Libr Assoc. 2022 Jan 1;110(1):103-108. doi: 10.5195/jmla.2022.1117.
ABSTRACT
BACKGROUND: An article's citations are useful for finding related articles that may not be readily found by keyword searches or textual similarity. Citation analysis is also important for analyzing scientific innovation and the structure of the biomedical literature. We wanted to facilitate citation analysis for the broad community by providing a user-friendly interface for accessing and analyzing citation data for biomedical articles.
CASE PRESENTATION: We seeded the Citation Cloud dataset with over 465 million open access citations culled from six different sources: PubMed Central, Microsoft Academic Graph, ArnetMiner, Semantic Scholar, Open Citations, and the NIH iCite dataset. We implemented a free, public extension to PubMed that allows any user to visualize and analyze the entire citation cloud around any paper of interest A: the set of articles cited by A, those which cite A, those which are co-cited with A, and those which are bibliographically coupled to A.
CONCLUSIONS: Citation Cloud greatly enables the study of citations by the scientific community, including relatively advanced analyses (co-citations and bibliographic coupling) that cannot be undertaken using other available tools. The tool can be accessed by running any PubMed query on the Anne O'Tate value-added search interface and clicking on the Citations button next to any retrieved article.
PMID:35210969 | PMC:PMC8830385 | DOI:10.5195/jmla.2022.1117
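The four relations that make up the citation cloud of an article A in the Citation Cloud abstract above (cited by A, citing A, co-cited with A, bibliographically coupled to A) can be sketched from a plain list of citation pairs. This is only an illustration of the definitions, not the tool's implementation; the function name `citation_cloud` and the paper identifiers are hypothetical.

```python
from collections import defaultdict


def citation_cloud(edges, a):
    """Compute the citation cloud of paper `a`.

    edges: iterable of (citing, cited) pairs.
    """
    cites = defaultdict(set)     # paper -> papers it cites
    cited_by = defaultdict(set)  # paper -> papers that cite it
    for citing, cited in edges:
        cites[citing].add(cited)
        cited_by[cited].add(citing)

    references = set(cites[a])   # articles cited by A
    citers = set(cited_by[a])    # articles that cite A

    # Co-cited with A: papers that appear in the same reference list as A.
    co_cited = set()
    for citer in citers:
        co_cited |= cites[citer]
    co_cited.discard(a)

    # Bibliographically coupled to A: papers sharing at least one reference with A.
    coupled = set()
    for ref in references:
        coupled |= cited_by[ref]
    coupled.discard(a)

    return {
        "cited_by_A": references,
        "citing_A": citers,
        "co_cited_with_A": co_cited,
        "coupled_to_A": coupled,
    }


if __name__ == "__main__":
    # Hypothetical toy citation data.
    edges = [("A", "R1"), ("A", "R2"), ("C1", "A"), ("C1", "X"), ("B", "R1")]
    print(citation_cloud(edges, "A"))
```

In the toy data, X is co-cited with A because C1 cites both, and B is coupled to A because both cite R1.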
TSA-SCC: Text Semantic-Aware Screen Content Coding with Ultra Low Bitrate
IEEE Trans Image Process. 2022 Feb 23;PP. doi: 10.1109/TIP.2022.3152003. Online ahead of print.
ABSTRACT
Due to the rapid growth of web conferences, remote screen sharing, and online games, screen content has become an important type of internet media information and over 90% of online media interactions are screen based. Meanwhile, as the main component of screen content, textual information takes up, on average, over 40% of the whole image in various commonly used screen content datasets. However, it is difficult to compress the textual information by using traditional coding schemes such as HEVC, which assume strong spatial and temporal correlations within the image/video. State-of-the-art screen content coding (SCC) standards such as HEVC-SCC still adopt a block-based coding framework and do not consider the text semantics for compression, thus inevitably blurring texts at a lower bitrate. In this paper, we propose a general text semantic-aware screen content coding scheme (TSA-SCC) for ultra low bitrate settings. This method detects the abrupt picture in a screen content video (or image), recognizes textual information (including word, position, font type, font size and font color) in the abrupt picture based on neural networks, and encodes texts with text coding tools. The other pictures, as well as the background image obtained after removing texts from the abrupt picture via inpainting, are encoded with HEVC-SCC. Compared with HEVC-SCC, the proposed method TSA-SCC reduces bitrate by up to 3× at a similar compression quality. Moreover, TSA-SCC achieves much better visual quality with less bitrate consumption when encoding the screen content video/image at ultra low bitrates.
PMID:35196232 | DOI:10.1109/TIP.2022.3152003
A new phylogenetic data standard for computable clade definitions: the Phyloreference Exchange Format (Phyx)
PeerJ. 2022 Feb 15;10:e12618. doi: 10.7717/peerj.12618. eCollection 2022.
ABSTRACT
To be computationally reproducible and efficient, integration of disparate data depends on shared entities whose matching meaning (semantics) can be computationally assessed. For biodiversity data one of the most prevalent shared entities for linking data records is the associated taxon concept. Unlike Linnaean taxon names, the traditional way in which taxon concepts are provided, phylogenetic definitions are native to phylogenetic trees and offer well-defined semantics that can be transformed into formal, computationally evaluable logic expressions. These attributes make them highly suitable for phylogeny-driven comparative biology by allowing computationally verifiable and reproducible integration of taxon-linked data against Tree of Life-scale phylogenies. To achieve this, the first step is transforming phylogenetic definitions from the natural language text in which they are published to a structured interoperable data format that maintains strong ties to semantics and lends itself well to sharing, reuse, and long-term archival. To this end, we developed the Phyloreference Exchange Format (Phyx), a JSON-LD-based text format encompassing rich metadata for all elements of a phylogenetic definition, and we created a supporting software library, phyx.js, to streamline computational management of such files. Together they form a foundation layer for digitizing and computing with phylogenetic definitions of clades.
PMID:35186448 | PMC:PMC8855714 | DOI:10.7717/peerj.12618
Design and Development of a Medical Image Databank for Assisting Studies in Radiomics
J Digit Imaging. 2022 Feb 15. doi: 10.1007/s10278-021-00576-6. Online ahead of print.
ABSTRACT
CompreHensive Digital ArchiVe of Cancer Imaging - Radiation Oncology (CHAVI-RO) is a multi-tier web-based medical image databank. It supports archiving de-identified radiological and clinical datasets in a relational database. A semantic relational database model is designed to accommodate imaging and treatment data of cancer patients. It aims to provide key datasets to investigate and model the use of radiological imaging data in response to radiation. This research domain addresses the modeling and analysis of complete treatment data of oncology patients. A DICOM viewer is integrated for reviewing the uploaded de-identified DICOM dataset. In a prototype system, we carried out a pilot study with cancer data from four disease sites, namely breast, head and neck, brain, and lung cancers. The representative dataset is used to estimate the data size per patient. A role-based access control module is integrated with the image databank to restrict user access. We also perform different types of load tests to analyze and quantify the performance of the CHAVI databank.
PMID:35166968 | DOI:10.1007/s10278-021-00576-6
MonaGO: a novel gene ontology enrichment analysis visualisation system
BMC Bioinformatics. 2022 Feb 14;23(1):69. doi: 10.1186/s12859-022-04594-1.
ABSTRACT
BACKGROUND: Gene ontology (GO) enrichment analysis is frequently undertaken during exploration of various -omics data sets. Despite the wide array of tools available to biologists to perform this analysis, meaningful visualisation of the overrepresented GO in a manner which is easy to interpret is still lacking.
RESULTS: Monash Gene Ontology (MonaGO) is a novel web-based visualisation system that provides an intuitive, interactive and responsive interface for performing GO enrichment analysis and visualising the results. MonaGO supports gene lists as well as GO terms as inputs. Visualisation results can be exported as high-resolution images or restored in new sessions, allowing reproducibility of the analysis. An extensive comparison between MonaGO and 11 state-of-the-art GO enrichment visualisation tools based on 9 features revealed that MonaGO is a unique platform that simultaneously allows interactive visualisation within one single output page, directly accessible through a web browser with customisable display options.
CONCLUSION: MonaGO combines dynamic clustering and interactive visualisation as well as customisation options to assist biologists in obtaining meaningful representation of overrepresented GO terms, producing simplified outputs in an unbiased manner. MonaGO will facilitate the interpretation of GO analysis and will assist biologists in presenting the results.
PMID:35164667 | DOI:10.1186/s12859-022-04594-1
Improving SDG Classification Precision Using Combinatorial Fusion
Sensors (Basel). 2022 Jan 29;22(3):1067. doi: 10.3390/s22031067.
ABSTRACT
Combinatorial fusion algorithm (CFA) is a machine learning and artificial intelligence (ML/AI) framework for combining multiple scoring systems using the rank-score characteristic (RSC) function and cognitive diversity (CD). When measuring the relevance of a publication or document with respect to the 17 Sustainable Development Goals (SDGs) of the United Nations, a classification scheme is used. However, this classification process is a challenging task due to the overlapping goals and contextual differences of those diverse SDGs. In this paper, we use CFA to combine a topic model classifier (Model A) and a semantic link classifier (Model B) to improve the precision of the classification process. We characterize and analyze each of the individual models using the RSC function and CD between Models A and B. We evaluate the classification results from combining the models using a score combination and a rank combination, when compared to the results obtained from human experts. In summary, we demonstrate that the combination of Models A and B can improve classification precision only if these individual models perform well and are diverse.
PMID:35161807 | DOI:10.3390/s22031067
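The score combination, rank combination, rank-score characteristic (RSC) function and cognitive diversity (CD) named in the combinatorial fusion abstract above can be sketched for two scoring systems. The abstract gives no formulas, so the min-max normalization, the averaging rules and the root-mean-square distance between RSC functions used here are assumptions chosen for illustration, and the document scores are made up.

```python
import math


def normalize(scores):
    """Min-max normalize a {document: score} mapping to [0, 1] (an assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}


def ranks(scores):
    """Rank documents by descending score; rank 1 is best, ties broken by name."""
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return {d: i + 1 for i, d in enumerate(ordered)}


def rsc(scores):
    """Rank-score characteristic: the score found at each rank, best rank first."""
    return sorted(scores.values(), reverse=True)


def cognitive_diversity(a, b):
    """Assumed formulation: root-mean-square distance between two RSC functions."""
    fa, fb = rsc(a), rsc(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa))


def score_combination(a, b):
    """Average the two normalized scores per document (higher is better)."""
    return {d: (a[d] + b[d]) / 2 for d in a}


def rank_combination(a, b):
    """Average the two ranks per document (lower is better)."""
    ra, rb = ranks(a), ranks(b)
    return {d: (ra[d] + rb[d]) / 2 for d in a}


if __name__ == "__main__":
    # Hypothetical relevance scores from a topic model (A) and a semantic link model (B).
    model_a = normalize({"d1": 4.0, "d2": 2.0, "d3": 0.0})
    model_b = normalize({"d1": 1.0, "d2": 3.0, "d3": 2.0})
    print("score combination:", score_combination(model_a, model_b))
    print("rank combination:", rank_combination(model_a, model_b))
    print("cognitive diversity:", cognitive_diversity(model_a, model_b))
```

Both models must score the same set of documents for these functions to apply; handling partial overlap is left out of the sketch.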
AI-Assisted Design Concept Exploration Through Character Space Construction
Front Psychol. 2022 Jan 27;12:819237. doi: 10.3389/fpsyg.2021.819237. eCollection 2021.
ABSTRACT
We propose an AI-assisted design concept exploration tool, the "Character Space Construction" ("CSC"). Concept designers explore and articulate the target product aesthetics and semantics in language, which is expressed using "Design Concept Phrases" ("DCPs"), that is, compound adjective phrases, and contrasting terms that convey what are not their target design concepts. Designers often utilize this dichotomy technique to communicate the nature of their aesthetic and semantic design concepts with stakeholders, especially in an early design development phase. The CSC assists this designers' cognitive activity by constructing a "Character Space" ("CS"), which is a semantic quadrant system, in a structured manner. A CS created by designers with the assistance of the CSC enables them to discern and explain their design concepts in contrast with opposing terms. These terms in a CS are retrieved and combined in the CSC by using a knowledge graph. The CSC presents terms and phrases as lists of candidates to users from which users will choose in order to define the target design concept, which is then visualized in a CS. The participants in our experiment, who were in the "arts and design" profession, were given two conditions under which to create DCPs and explain them. One group created and explained the DCPs with the assistance of the proposed CSC, and the other did the same task without this assistance, given the freedom to use any publicly available web search tools instead. The result showed that the group assisted by the CSC indicated their tasks were supported significantly better, especially in exploration, as measured by the Creativity Support Index (CSI).
PMID:35153935 | PMC:PMC8828642 | DOI:10.3389/fpsyg.2021.819237
Bibliometric network analysis on rapid-onset opioids for breakthrough cancer pain treatment
J Pain Symptom Manage. 2022 Feb 10:S0885-3924(22)00063-X. doi: 10.1016/j.jpainsymman.2022.01.023. Online ahead of print.
ABSTRACT
BACKGROUND AND OBJECTIVES: Proper breakthrough cancer pain (BTcP) management is of pivotal importance. Although rapid-acting, oral and nasal transmucosal, fentanyl formulations (rapid-onset opioids, ROOs) are licensed for BTcP treatment, not all guidelines recommend their use. Presumably, some research gaps need to be bridged to produce solid evidence. We present a bibliometric network analysis on ROOs for BTcP treatment.
METHODS: Documents were retrieved from the Web of Science (WOS) online database. The string was "rapid onset opioids" or "transmucosal fentanyl" and "breakthrough cancer pain". Year of publication, journal metrics (impact factor and quartile), title, document type, topic, and clinical setting (in-patients, outpatients, and palliative care) were extracted. The software tool VOSviewer (version 1.6.17) was used for semantic network analysis, bibliographic coupling, journal analysis, and research networks.
RESULTS: 502 articles were found in WOS. A declining trend in published articles from 2014 to 2021 was observed. Approximately 50% of documents regard top quartile (Q1) journals. Most articles focused on ROOs efficacy, but abuse and misuse issues are poorly addressed. With respect to article type, we calculated 132 clinical investigations. The semantic network analysis found interconnections between the terms "breakthrough cancer pain", "opioids", and "cancers". The top co-cited article was published in 2000 and addressed pain assessment. The largest number of partnerships regarded the United States, Italy, and England.
CONCLUSION: In this research area, most articles are published in top-ranked journals. Nevertheless, paramount topics should be better addressed, and the implementation of research networks is needed.
PMID:35151801 | DOI:10.1016/j.jpainsymman.2022.01.023
Machine and cognitive intelligence for human health: systematic review
Brain Inform. 2022 Feb 12;9(1):5. doi: 10.1186/s40708-022-00153-9.
ABSTRACT
Brain informatics is a novel interdisciplinary area that focuses on scientifically studying the mechanisms of human brain information processing by integrating experimental cognitive neuroscience with advanced Web intelligence-centered information technologies. Web intelligence, which aims to understand the computational, cognitive, physical, and social foundations of the future Web, has attracted increasing attention to facilitate the study of brain informatics to promote human health. A large number of articles created in the recent few years are proof of the investment in Web intelligence-assisted human health. This study systematically reviews academic studies regarding article trends, top journals, subjects, countries/regions, and institutions, study design, artificial intelligence technologies, clinical tasks, and performance evaluation. Results indicate that literature is especially welcomed in subjects such as medical informatics and health care sciences and service. There are several promising topics, for example, random forests, support vector machines, and conventional neural networks for disease detection and diagnosis, semantic Web, ontology mining, and topic modeling for clinical or biomedical text mining, artificial neural networks and logistic regression for prediction, and convolutional neural networks and support vector machines for monitoring and classification. Additionally, future research should focus on algorithm innovations, additional information use, functionality improvement, model and system generalization, scalability, evaluation, and automation, data acquirement and quality improvement, and allowing interaction. The findings of this study help better understand what and how Web intelligence can be applied to promote healthcare procedures and clinical outcomes. This provides important insights into the effective use of Web intelligence to support informatics-enabled brain studies.
PMID:35150379 | DOI:10.1186/s40708-022-00153-9
Cross-modal distribution alignment embedding network for generalized zero-shot learning
Neural Netw. 2022 Apr;148:176-182. doi: 10.1016/j.neunet.2022.01.007. Epub 2022 Jan 29.
ABSTRACT
Many approaches in generalized zero-shot learning (GZSL) rely on cross-modal mapping between the image feature space and the class embedding space, which achieves knowledge transfer from seen to unseen classes. However, because these two spaces are entirely different and their manifolds are inconsistent, existing methods suffer from highly overlapping semantic descriptions of different classes, and in GZSL tasks unseen classes are easily misclassified as seen classes. To handle these problems, we adopt a novel semantic embedding network which helps to encode more discriminative information from initial semantic attributes to semantic embeddings in visual space. Meanwhile, a distribution alignment constraint is adopted to help keep the distribution of the learned semantic embeddings consistent with the distribution of real image features. Moreover, an auxiliary classifier is adopted to strengthen the quality of the learned semantic embeddings. Finally, a relation network is used to classify the unseen images by computing the relation scores between the semantic embeddings and image features, which is much more flexible than fixed distance metric functions. Experimental results demonstrate that our proposed method is superior to other state-of-the-art methods.
PMID:35144151 | DOI:10.1016/j.neunet.2022.01.007
The winter, the summer and the summer dream of artificial intelligence in law: Presidential address to the 18th International Conference on Artificial Intelligence and Law
Artif Intell Law (Dordr). 2022 Feb 3:1-15. doi: 10.1007/s10506-022-09309-8. Online ahead of print.
ABSTRACT
This paper reflects my address as IAAIL president at ICAIL 2021. It is aimed to give my vision of the status of the AI and Law discipline, and possible future perspectives. In this respect, I go through different seasons of AI research (of AI and Law in particular): from the Winter of AI, namely a period of mistrust in AI (throughout the eighties until early nineties), to the Summer of AI, namely the current period of great interest in the discipline with lots of expectations. One of the results of the first decades of AI research is that "intelligence requires knowledge". Since its inception the Web proved to be an extraordinary vehicle for knowledge creation and sharing, therefore it's not a surprise if the evolution of AI has followed the evolution of the Web. I argue that a bottom-up approach, in terms of machine/deep learning and NLP to extract knowledge from raw data, combined with a top-down approach, in terms of legal knowledge representation and models for legal reasoning and argumentation, may represent a promotion for the development of the Semantic Web, as well as of AI systems. Finally, I provide my insight in the potential of AI development, which takes into account technological opportunities and theoretical limits.
PMID:35132296 | PMC:PMC8811736 | DOI:10.1007/s10506-022-09309-8
Evaluating semantic similarity methods for comparison of text-derived phenotype profiles
BMC Med Inform Decis Mak. 2022 Feb 5;22(1):33. doi: 10.1186/s12911-022-01770-4.
ABSTRACT
BACKGROUND: Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance 'patient-like me' analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area.
METHODS: We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the medical information mart for intensive care (MIMIC-III).
RESULTS: 300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations when measured by area under receiver operating characteristic curve and Top Ten Accuracy used term-specificity and annotation-frequency measures.
CONCLUSION: We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.
PMID:35123470 | DOI:10.1186/s12911-022-01770-4
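To make the notion of an annotation-frequency information content measure from the abstract above concrete, the sketch below computes one well-known configuration: Resnik similarity between ontology terms, aggregated over two phenotype profiles by a best-match average. The abstract does not say which of its 300 configurations correspond to these choices, so the measure, the aggregation, the toy ontology and the toy corpus are all assumptions made for illustration.

```python
import math

# Hypothetical toy ontology: child -> list of parents.
PARENTS = {
    "root": [],
    "abnormal_heart": ["root"],
    "arrhythmia": ["abnormal_heart"],
    "cardiomyopathy": ["abnormal_heart"],
    "abnormal_lung": ["root"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(profiles):
    """IC(t) = -log(fraction of profiles annotated with t or one of its descendants)."""
    counts = {t: 0 for t in PARENTS}
    for profile in profiles:
        covered = set()
        for term in profile:
            covered |= ancestors(term)
        for t in covered:
            counts[t] += 1
    n = len(profiles)
    return {t: -math.log(c / n) for t, c in counts.items() if c}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic.get(t, 0.0) for t in common)


def profile_similarity(p1, p2, ic):
    """Best-match average of pairwise term similarities, symmetrized."""
    def one_way(a, b):
        return sum(max(resnik(x, y, ic) for y in b) for x in a) / len(a)
    return (one_way(p1, p2) + one_way(p2, p1)) / 2


if __name__ == "__main__":
    # Hypothetical corpus of patient phenotype profiles.
    corpus = [{"arrhythmia"}, {"cardiomyopathy"}, {"abnormal_lung"},
              {"arrhythmia", "abnormal_lung"}]
    ic = information_content(corpus)
    print(profile_similarity({"arrhythmia"}, {"cardiomyopathy"}, ic))
    print(profile_similarity({"arrhythmia"}, {"abnormal_lung"}, ic))
```

Two heart-related profiles score higher than a heart profile compared with a lung profile, because their shared ancestor is rarer in the corpus than the root term.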
MCRWR: a new method to measure the similarity of documents based on semantic network
BMC Bioinformatics. 2022 Feb 1;23(1):56. doi: 10.1186/s12859-022-04578-1.
ABSTRACT
BACKGROUND: Besides Boolean retrieval with medical subject headings (MeSH), PubMed provides users with an alternative way called "Related Articles" to access and collect relevant documents based on semantic similarity. To explore the functionality more efficiently and more accurately, we proposed an improved algorithm by measuring the semantic similarity of PubMed citations based on the MeSH-concept network model.
RESULTS: Three article similarity networks are obtained using MeSH-concept random walk with restart (MCRWR), MeSH random walk with restart (MRWR) and PubMed related article (PMRA), respectively. The areas under the receiver operating characteristic (ROC) curve of MCRWR, MRWR and PMRA are 0.93, 0.90, and 0.67, respectively. Precisions of MCRWR and MRWR under various similarity thresholds are higher than that of PMRA. The mean value of P5 of MCRWR is 0.742, which is much higher than those of MRWR (0.692) and PMRA (0.223). In the article semantic similarity network of "Genes & Function of organ & Disease" based on the MCRWR algorithm, four topics are identified according to gold standards.
CONCLUSION: The MeSH-concept random walk with restart algorithm performs better in constructing an article semantic similarity network, which can reveal implicit semantic associations between documents. The efficiency and accuracy of retrieving semantically related documents have been substantially improved.
PMID:35105306 | DOI:10.1186/s12859-022-04578-1
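The random walk with restart underlying MCRWR and MRWR in the abstract above can be sketched in its generic form: at each step the walker follows a weighted edge with probability 1 - r or jumps back to the seed node with probability r. This is not the authors' MeSH-concept network construction; the graph, node names, the restart probability of 0.3 and the handling of nodes without outgoing edges are assumptions for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * e_seed until convergence.

    adj: {node: {neighbor: weight}}; every neighbor must also be a key of adj.
    Returns the steady-state visiting probability of each node.
    """
    nodes = sorted(adj)
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Assumption: a node with no outgoing edges sends its mass back to the seed.
                nxt[seed] += (1 - restart) * p[u]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / total
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Hypothetical chain of three terms: a - b - c.
    graph = {
        "a": {"b": 1.0},
        "b": {"a": 1.0, "c": 1.0},
        "c": {"b": 1.0},
    }
    for node, prob in random_walk_with_restart(graph, "a").items():
        print(node, round(prob, 4))
```

The resulting probabilities act as proximity scores relative to the seed, so nodes closer to the seed in the network receive higher values.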
CNNLSTMac4CPred: A Hybrid Model for N4-Acetylcytidine Prediction
Interdiscip Sci. 2022 Feb 1. doi: 10.1007/s12539-021-00500-0. Online ahead of print.
ABSTRACT
N4-Acetylcytidine (ac4C) is a highly conserved and widespread post-transcriptional RNA modification, playing versatile roles in cellular processes. Due to the limitations of techniques and knowledge, large-scale identification of ac4C is still a challenging task. RNA sequences are like sentences containing semantics in natural language. Inspired by the semantics of language, we proposed a hybrid model for ac4C prediction. The model used long short-term memory and a convolutional neural network to extract the semantic features hidden in the sequences. The semantic features and two traditional features (k-nucleotide frequencies and pseudo tri-tuple nucleotide composition) were combined to represent ac4C or non-ac4C sequences. eXtreme Gradient Boosting was used as the learning algorithm. Five-fold cross-validation over the training set consisting of 1160 ac4C and 10,855 non-ac4C sequences obtained an area under the receiver operating characteristic curve (AUROC) of 0.9004, and the independent test over 469 ac4C and 4343 non-ac4C sequences reached an AUROC of 0.8825. The model obtained a sensitivity of 0.6474 in the five-fold cross-validation and 0.6290 in the independent test, outperforming two state-of-the-art methods. The performance of the semantic features alone was better than that of k-nucleotide frequencies and pseudo tri-tuple nucleotide composition, implying that ac4C sequences carry semantics. The proposed hybrid model was implemented in a user-friendly web server which is freely available to scientific communities: http://47.113.117.61/ac4c/ . The presented model and tool are beneficial for identifying ac4C on a large scale.
PMID:35106702 | DOI:10.1007/s12539-021-00500-0
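Of the features named in the ac4C abstract above, k-nucleotide frequencies are simple enough to sketch: slide a window of length k along the sequence and report how often each possible k-mer occurs. The function name, the RNA alphabet ordering, the conversion of T to U and the skipping of windows with ambiguous bases are assumptions for illustration and may differ from the authors' preprocessing.

```python
from itertools import product


def k_nucleotide_frequencies(seq, k=2, alphabet="ACGU"):
    """Return the relative frequency of every k-mer over `alphabet`.

    The output has len(alphabet) ** k entries, ordered lexicographically by
    the order of `alphabet`. Windows containing other symbols are skipped.
    """
    seq = seq.upper().replace("T", "U")  # assumption: accept DNA-style input
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(max(len(seq) - k + 1, 0)):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    features = k_nucleotide_frequencies("ACGUACGU", k=2)
    print(len(features), features)
```

Vectors computed for several values of k can be concatenated with other feature groups before being passed to a classifier.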
Explainable depression detection with multi-aspect features using a hybrid deep learning model on social media
World Wide Web. 2022 Jan 28:1-24. doi: 10.1007/s11280-021-00992-2. Online ahead of print.
ABSTRACT
The ability to explain why a model produced its results is an important problem, especially in the medical domain. Model explainability is important for building trust by providing insight into the model prediction. However, most existing machine learning methods provide no explainability, which is worrying. For instance, in the task of automatic depression prediction, most machine learning models lead to predictions that are obscure to humans. In this work, we propose an explainable Multi-Aspect Depression Detection with Hierarchical Attention Network (MDHAN) for automatic detection of depressed users on social media and for explaining the model prediction. We have considered user posts augmented with additional features from Twitter. Specifically, we encode user posts using two levels of attention mechanisms applied at the tweet level and word level, calculate the importance of each tweet and word, and capture semantic sequence features from the user timelines (posts). Our hierarchical attention model is developed in such a way that it can capture patterns that lead to explainable results. Our experiments show that MDHAN outperforms several popular and robust baseline methods, demonstrating the effectiveness of combining deep learning with multi-aspect features. We also show that our model helps improve predictive performance when detecting depression in users who are posting messages publicly on social media. MDHAN achieves excellent performance and ensures adequate evidence to explain the prediction.
PMID:35106059 | PMC:PMC8795347 | DOI:10.1007/s11280-021-00992-2