Deep learning
Deep Learning Methods for De Novo Peptide Sequencing
Mass Spectrom Rev. 2024 Nov 29. doi: 10.1002/mas.21919. Online ahead of print.
ABSTRACT
Protein tandem mass spectrometry data are most often interpreted by matching observed mass spectra to a protein database derived from the reference genome of the sample being analyzed. In many application domains, however, a relevant protein database is unavailable or incomplete, and in such settings de novo sequencing is required. Since the introduction of the DeepNovo algorithm in 2017, the field of de novo sequencing has been dominated by deep learning methods, which use large amounts of labeled mass spectrometry data to train multi-layer neural networks to translate from observed mass spectra to corresponding peptide sequences. Here, we describe these deep learning methods, outline procedures for evaluating their performance, and discuss the challenges in the field, both in terms of methods development and evaluation protocols.
PMID:39611290 | DOI:10.1002/mas.21919
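To make the "translation" framing concrete, here is a minimal sketch of a spectrum-to-peptide encoder-decoder, assuming a generic transformer: peaks enter as (m/z, intensity) pairs and amino-acid tokens are decoded autoregressively. Dimensions, the vocabulary, and the omission of positional encodings are illustrative choices, not the architecture of DeepNovo or any specific published model.

```python
# Toy spectrum-to-peptide model (all names and sizes hypothetical).
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = ["<pad>", "<sos>", "<eos>"] + list(AMINO_ACIDS)

class SpectrumToPeptide(nn.Module):
    def __init__(self, d_model=128, nhead=4, nlayers=2):
        super().__init__()
        self.peak_proj = nn.Linear(2, d_model)            # embed (m/z, intensity)
        self.tok_emb = nn.Embedding(len(VOCAB), d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=nlayers, num_decoder_layers=nlayers,
            batch_first=True)
        self.out = nn.Linear(d_model, len(VOCAB))

    def forward(self, peaks, tokens):
        # peaks: (B, n_peaks, 2); tokens: (B, seq_len) of vocab indices.
        # Positional encodings are omitted for brevity.
        causal = self.transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.transformer(self.peak_proj(peaks), self.tok_emb(tokens),
                             tgt_mask=causal)
        return self.out(h)                                # (B, seq_len, vocab)

model = SpectrumToPeptide()
peaks = torch.rand(2, 50, 2)                   # 50 centroided peaks per spectrum
tokens = torch.randint(0, len(VOCAB), (2, 12))
print(model(peaks, tokens).shape)              # torch.Size([2, 12, 23])
```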
RNA-protein interaction prediction without high-throughput data: An overview and benchmark of in silico tools
Comput Struct Biotechnol J. 2024 Nov 8;23:4036-4046. doi: 10.1016/j.csbj.2024.11.015. eCollection 2024 Dec.
ABSTRACT
RNA-protein interactions (RPIs) are crucial for accurately operating various processes in and between organisms across kingdoms of life. Mutual detection of RPI partner molecules depends on distinct sequential, structural, or thermodynamic features, which can be determined via experimental and bioinformatic methods. Still, the underlying molecular mechanisms of many RPIs are poorly understood. It is further hypothesized that many RPIs are not even described yet. Computational RPI prediction is continuously challenged by the lack of data and by the limited detailed research on very specific examples. With the discovery of novel RPI complexes in all kingdoms of life, adaptations of existing RPI prediction methods are necessary. Continuously improving computational RPI prediction is key to advancing the understanding of RPIs in detail and to supplementing experimental RPI determination. The growing amount of data covering more species and detailed mechanisms improves the accuracy of prediction tools, which in turn support specific experimental research on RPIs. Here, we give an overview of RPI prediction tools that do not use high-throughput data as the user's input. We review the tools according to their input, usability, and output. We then apply the tools to known RPI examples across different kingdoms of life. Our comparison shows that the investigated prediction tools do not favor a certain species and equip the user with results varying in degree of information, from an overall RPI score to detailed interacting residues. Furthermore, we provide a guide tree to assist users in choosing the RPI prediction tool appropriate for their available input data and desired output.
PMID:39610906 | PMC:PMC11603007 | DOI:10.1016/j.csbj.2024.11.015
RDA-MTE: an innovative model for emotion recognition in sports behavior decision-making
Front Neurosci. 2024 Nov 14;18:1466013. doi: 10.3389/fnins.2024.1466013. eCollection 2024.
ABSTRACT
Emotional stimuli play a crucial role in sports behavior decision-making as they significantly influence individuals' responses and decisions in sports contexts. However, existing research predominantly relies on traditional psychological and behavioral methods, lacking in-depth analysis of the complex relationship between emotions and sports behavior, particularly in the integration of real-time emotion recognition and sports behavior decision-making. To address this issue, we propose a deep learning-based model, RDA-MTE, which efficiently extracts and enhances feature interaction capabilities to capture and recognize facial expressions, thereby analyzing the impact of emotional stimuli on sports behavior decision-making. This model combines a pre-trained ResNet-50, a bidirectional attention mechanism, and a multi-layer Transformer encoder to improve the accuracy and robustness of emotion recognition. Experimental results demonstrate that the RDA-MTE model achieves an accuracy of 83.54% on the FER-2013 dataset and 88.9% on the CK+ dataset, particularly excelling in recognizing positive emotions such as "Happy" and "Surprise." Additionally, the model exhibits strong stability in ablation experiments, validating its reliability and generalization capability across different emotion categories. This study not only extends research methodologies in the fields of affective computing and sports behavior decision-making but also provides a significant reference for the development of emotion recognition systems in practical applications. The findings of this research will enhance understanding of the role of emotions in sports behavior and promote advancements in related fields.
PMID:39610868 | PMC:PMC11602515 | DOI:10.3389/fnins.2024.1466013
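The abstract describes the backbone-plus-encoder composition at a high level; the following is a hedged sketch of that composition only (a pre-trained ResNet-50 feeding spatial tokens to a Transformer encoder). The bidirectional attention mechanism of RDA-MTE is not reproduced, and all layer sizes are assumptions.

```python
# Sketch: ResNet-50 features -> Transformer encoder -> 7-way expression head.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class EmotionNet(nn.Module):
    def __init__(self, num_classes=7, d_model=2048, nhead=8, nlayers=2):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
        # Drop the average pool and fc layers to keep the spatial feature map.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=nlayers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        f = self.features(x)                    # (B, 2048, 7, 7) for 224x224 input
        tokens = f.flatten(2).transpose(1, 2)   # (B, 49, 2048) spatial tokens
        enc = self.encoder(tokens).mean(dim=1)  # pooled token representation
        return self.head(enc)

logits = EmotionNet()(torch.rand(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 7])
```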
Multi-fusion strategy network-guided cancer subtypes discovering based on multi-omics data
Front Genet. 2024 Nov 14;15:1466825. doi: 10.3389/fgene.2024.1466825. eCollection 2024.
ABSTRACT
INTRODUCTION: The combination of next-generation sequencing technology and The Cancer Genome Atlas (TCGA) data provides unprecedented opportunities for the discovery of cancer subtypes. Through comprehensive, in-depth analysis of the genomic data of a large number of cancer patients, researchers can more accurately identify different cancer subtypes and reveal their molecular heterogeneity.
METHODS: In this paper, we propose the SMMSN (Self-supervised Multi-fusion Strategy Network) model for the discovery of cancer subtypes. SMMSN can not only fuse multi-level data representations of single omics data by Graph Convolutional Network (GCN) and Stacked Autoencoder Network (SAE), but also achieve the organic fusion of multi-omics data through multiple fusion strategies. To address the lack of label information in multi-omics data, SMMSN uses a dual self-supervision method to cluster cancer subtypes from the integrated data.
RESULTS: We conducted experiments on three labeled and five unlabeled multi-omics datasets to distinguish potential cancer subtypes. Kaplan-Meier survival curves and other results showed that SMMSN can obtain cancer subtypes with significant differences.
DISCUSSION: In case analyses of Glioblastoma Multiforme (GBM) and Breast Invasive Carcinoma (BIC), we conducted survival time and age distribution analysis, drug response analysis, differential expression analysis, and functional enrichment analysis on the predicted cancer subtypes. The results showed that SMMSN can discover clinically meaningful cancer subtypes.
PMID:39610828 | PMC:PMC11602503 | DOI:10.3389/fgene.2024.1466825
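A toy sketch of SMMSN's two single-omics encoders and one possible fusion strategy, under stated assumptions: a one-layer GCN over a patient-similarity graph, a stacked autoencoder, and naive averaging as the fusion step. The dual self-supervision clustering objective is omitted.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, din, dout):
        super().__init__()
        self.lin = nn.Linear(din, dout)
    def forward(self, x, adj):
        # adj: row-normalized (n, n) patient-similarity matrix
        return torch.relu(self.lin(adj @ x))

class SAE(nn.Module):
    def __init__(self, din, dhid=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(din, 256), nn.ReLU(), nn.Linear(256, dhid))
        self.dec = nn.Sequential(nn.Linear(dhid, 256), nn.ReLU(), nn.Linear(256, din))
    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

n, d = 100, 500                                # 100 patients, 500 omics features
x = torch.rand(n, d)
adj = torch.softmax(torch.rand(n, n), dim=1)   # stand-in normalized similarity graph
z_graph = GCNLayer(d, 64)(x, adj)              # graph-based representation
z_ae, recon = SAE(d)(x)                        # autoencoder representation
z_fused = (z_graph + z_ae) / 2                 # one simple fusion strategy
print(z_fused.shape)                           # torch.Size([100, 64])
```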
Exploring the efficacy of various CNN architectures in diagnosing oral cancer from squamous cell carcinoma
MethodsX. 2024 Nov 5;13:103034. doi: 10.1016/j.mex.2024.103034. eCollection 2024 Dec.
ABSTRACT
Oral cancer can result from mutations in cells located in the lips or mouth. Diagnosing oral cavity squamous cell carcinoma (OCSCC) is particularly challenging, as it often occurs at advanced stages. To address this, computer-aided diagnosis methods are increasingly being used. In this work, a deep learning-based approach utilizing models such as VGG16, ResNet50, LeNet-5, MobileNetV2, and Inception V3 is presented. NEOR and OCSCC datasets were used for feature extraction, with virtual slide images divided into tiles and classified as normal or squamous cell cancer. Performance metrics like accuracy, F1-score, AUC, precision, and recall were analyzed to determine the prerequisites for optimal CNN performance. The proposed CNN approaches were effective for classifying OCSCC and oral dysplasia, with the highest accuracy of 95.41% achieved using MobileNetV2.
KEY FINDINGS: Deep learning models, particularly MobileNetV2, achieved high classification accuracy (95.41%) for OCSCC. CNN-based methods show promise for early-stage OCSCC and oral dysplasia diagnosis. Performance parameters like precision, recall, and F1-score help optimize CNN model selection for this task.
PMID:39610794 | PMC:PMC11603122 | DOI:10.1016/j.mex.2024.103034
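Transfer learning of this kind is straightforward to sketch; the following is a minimal, hypothetical fine-tuning step for MobileNetV2 on two-class tiles (data and hyperparameters are placeholders, not the paper's setup).

```python
# Fine-tune an ImageNet-pretrained MobileNetV2 to label tiles as normal vs. OCSCC.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

model = mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT)
model.classifier[1] = nn.Linear(model.last_channel, 2)  # replace the 1000-way head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

tiles = torch.rand(8, 3, 224, 224)       # stand-in batch of virtual-slide tiles
labels = torch.randint(0, 2, (8,))       # 0 = normal, 1 = squamous cell cancer
model.train()
optimizer.zero_grad()
loss = criterion(model(tiles), labels)
loss.backward()
optimizer.step()
print(float(loss))
```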
Drone-Based Digital Phenotyping to Evaluate Relative Maturity, Stand Count, and Plant Height in Dry Beans (Phaseolus vulgaris L.)
Plant Phenomics. 2024 Nov 28;6:0278. doi: 10.34133/plantphenomics.0278. eCollection 2024.
ABSTRACT
Substantial effort has been made in manually tracking plant maturity and in measuring early-stage plant density and crop height in experimental fields. In this study, RGB drone imagery and deep learning (DL) approaches are explored to measure relative maturity (RM), stand count (SC), and plant height (PH), potentially offering higher throughput, accuracy, and cost-effectiveness than traditional methods. A time series of drone images was utilized to estimate dry bean RM employing a hybrid convolutional neural network (CNN) and long short-term memory (LSTM) model. For early-stage SC assessment, the Faster R-CNN object detection algorithm was evaluated. Flight frequencies, image resolution, and data augmentation techniques were investigated to enhance DL model performance. PH was obtained using a quantile method from digital surface model (DSM) and point cloud (PC) data sources. The CNN-LSTM model showed high accuracy in RM prediction across various conditions, outperforming traditional image preprocessing approaches. The inclusion of growing degree days (GDD) data improved the model's performance under specific environmental stresses. The Faster R-CNN model effectively identified early-stage bean plants, demonstrating superior accuracy over traditional methods and consistency across different flight altitudes. For PH estimation, moderate correlations with ground-truth data were observed across both datasets analyzed. The choice between PC and DSM source data may depend on specific environmental and flight conditions. Overall, the CNN-LSTM and Faster R-CNN models proved more effective than conventional techniques in quantifying RM and SC. The subtraction method proposed for estimating PH without accurate ground elevation data yielded results comparable to the difference-based method. Additionally, the pipeline and open-source software developed hold potential to significantly benefit the phenotyping community.
PMID:39610705 | PMC:PMC11602537 | DOI:10.34133/plantphenomics.0278
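A hedged sketch of the hybrid CNN-LSTM idea, not the authors' pipeline: a small CNN embeds each plot image per flight date, and an LSTM summarizes the sequence into a relative-maturity estimate. The GDD covariates mentioned above could be concatenated to the per-flight features.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, feat=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, feat))
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)    # relative maturity score

    def forward(self, x):                   # x: (B, T flight dates, 3, H, W)
        b, t = x.shape[:2]
        z = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # per-flight embeddings
        out, _ = self.lstm(z)
        return self.head(out[:, -1])        # predict from the last time step

rm = CNNLSTM()(torch.rand(4, 6, 3, 64, 64))  # 4 plots x 6 flight dates
print(rm.shape)  # torch.Size([4, 1])
```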
Bibliometric and visual analysis of radiomics for evaluating lymph node status in oncology
Front Med (Lausanne). 2024 Nov 14;11:1501652. doi: 10.3389/fmed.2024.1501652. eCollection 2024.
ABSTRACT
BACKGROUND: Radiomics, which involves the conversion of digital images into high-dimensional data, has been used in oncological studies since 2012. We analyzed the publications on this subject using bibliometric and visual methods to expound the hotspots and future trends regarding radiomics in evaluating lymph node status in oncology.
METHODS: Documents published between 2012 and 2023, updated to August 1, 2024, were searched using the Scopus database. VOSviewer, R Package, and Microsoft Excel were used for visualization.
RESULTS: A total of 898 original articles and reviews written in English and related to radiomics for evaluating lymph node status in oncology, published between 2015 and 2023, were retrieved. A significant increase in the number of publications was observed, with an annual growth rate of 100.77%. The publications predominantly originated from three countries, with China leading in the number of publications and citations. Fudan University was the most contributing affiliation, followed by Sun Yat-sen University and Southern Medical University, all of which are in China. Tian J. from the Chinese Academy of Sciences contributed the most among 5,885 authors. In addition, Frontiers in Oncology had the most publications and has surpassed other journals in the last 4 years. Moreover, the keyword co-occurrence analysis suggested that the interplay of "radiomics" and "lymph node metastasis," as well as "major clinical study," were the predominant topics. Furthermore, the focus shifted from the diagnosis of cancers to deep learning-based prediction of lymph node metastasis, suggesting that research combining artificial intelligence will develop in the future.
CONCLUSION: The present bibliometric and visual analysis described a nearly continuous increase in publications related to radiomics in evaluating lymph node status in oncology, revealed that radiomics can serve as an efficient tool for personalized diagnosis and treatment guidance in clinical patients, and suggested that combining radiomics with artificial intelligence should be further considered in the future.
PMID:39610679 | PMC:PMC11602298 | DOI:10.3389/fmed.2024.1501652
HeteroKGRep: Heterogeneous Knowledge Graph based Drug Repositioning
Knowl Based Syst. 2024 Dec 3;305:112638. doi: 10.1016/j.knosys.2024.112638. Epub 2024 Oct 19.
ABSTRACT
The process of developing new drugs is both time-consuming and costly, often taking over a decade and billions of dollars to obtain regulatory approval. Additionally, the complexity of patent protection for novel compounds presents challenges for pharmaceutical innovation. Drug repositioning offers an alternative strategy to uncover new therapeutic uses for existing medicines. Previous repositioning models have been limited by their reliance on homogeneous data sources, failing to leverage the rich information available in heterogeneous biomedical knowledge graphs. We propose HeteroKGRep, a novel drug repositioning model that utilizes heterogeneous graphs to address these limitations. HeteroKGRep is a multi-step framework that first generates a similarity graph from hierarchical concept relations. It then applies SMOTE over-sampling to address class imbalance before generating node sequences using a heterogeneous graph neural network. Drug and disease embeddings are extracted from the network and used for prediction. We evaluated HeteroKGRep on a graph containing biomedical concepts and relations from ontologies, pathways, and literature. It achieved state-of-the-art performance with 99% accuracy, 95% AUC-ROC, and 94% average precision in predicting repurposing opportunities. Compared to existing homogeneous approaches, HeteroKGRep leverages diverse knowledge sources to enrich representation learning. Based on heterogeneous graphs, HeteroKGRep can discover new drug-disease associations, complementing de novo drug development. This work establishes a promising new paradigm for knowledge-guided drug repositioning using multimodal biomedical data.
PMID:39610660 | PMC:PMC11600970 | DOI:10.1016/j.knosys.2024.112638
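Two of the framework's steps are easy to illustrate under stated assumptions: SMOTE over-sampling of imbalanced (drug, disease) training pairs, and scoring candidates from embeddings. Random vectors stand in for the heterogeneous-GNN embeddings, and logistic regression stands in for the prediction head.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 32
drug_emb = rng.normal(size=(200, dim))      # stand-ins for GNN node embeddings
disease_emb = rng.normal(size=(150, dim))

# Imbalanced training pairs: few known indications (1), many negatives (0).
pos = [(rng.integers(200), rng.integers(150)) for _ in range(40)]
neg = [(rng.integers(200), rng.integers(150)) for _ in range(400)]
X = np.array([np.concatenate([drug_emb[d], disease_emb[s]]) for d, s in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)  # rebalance the classes
clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
score = clf.predict_proba(np.concatenate([drug_emb[3], disease_emb[7]])[None])[0, 1]
print(f"repositioning score for (drug 3, disease 7): {score:.3f}")
```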
Deep learning based binary classification of diabetic retinopathy images using transfer learning approach
J Diabetes Metab Disord. 2024 Sep 20;23(2):2289-2314. doi: 10.1007/s40200-024-01497-1. eCollection 2024 Dec.
ABSTRACT
OBJECTIVE: Diabetic retinopathy (DR) is a common complication of diabetes and a leading cause of blindness worldwide. Detecting DR at an early stage is crucial for preventing vision loss. In this work, a deep learning-based binary classification of DR images has been proposed to classify DR images into healthy and unhealthy. Twenty pre-trained networks have been fine-tuned via transfer learning using a robust dataset of diabetic retinal images. The combined dataset has been collected from three robust databases of diabetic patients, annotated by experienced ophthalmologists as healthy or non-healthy diabetic retina images.
METHOD: This work has improved robust models by pre-processing the DR images with a denoising algorithm, normalization, and data augmentation. Three robust datasets of diabetic retinopathy images, named DRD-EyePACS, IDRiD, and APTOS-2019, have been selected for the extensive experiments, and a combined diabetic retinopathy image dataset has been generated for the exhaustive experiments. The datasets have been divided into training, testing, and validation sets, and classification accuracy, sensitivity, specificity, precision, F1-score, and ROC-AUC are used to evaluate network performance. The present work has selected 20 different pre-trained networks based on three categories: Series, DAG, and lightweight.
RESULTS: This study uses pre-processing with data augmentation and normalization to address overfitting. From the exhaustive experiments, the three best pre-trained networks have been selected based on the best classification accuracy in each category. It is concluded that the trained model ResNet101, from the DAG category, accurately identifies diabetic retinopathy from retinal images in all cases. Notably, 97.33% accuracy has been achieved using ResNet101 in the DAG network category.
CONCLUSION: Based on the experimental results, the proposed ResNet101 model helps healthcare professionals detect retinal disease early and provides practical solutions to diabetes patients. It also gives patients and experts a second opinion for the early detection of diabetic retinopathy.
PMID:39610484 | PMC:PMC11599653 | DOI:10.1007/s40200-024-01497-1
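The metric suite listed in METHOD maps directly onto scikit-learn; a small sketch with dummy predictions (not the paper's data or results) follows.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])          # 1 = unhealthy retina
y_prob = np.array([.9, .2, .8, .4, .1, .3, .7, .6])  # model scores
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)        # recall on the unhealthy class
specificity = tn / (tn + fp)
print(accuracy, sensitivity, specificity,
      precision_score(y_true, y_pred), f1_score(y_true, y_pred),
      roc_auc_score(y_true, y_prob))
```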
A Multi-task learning U-Net model for end-to-end HEp-2 cell image analysis
Artif Intell Med. 2024 Nov 20;159:103031. doi: 10.1016/j.artmed.2024.103031. Online ahead of print.
ABSTRACT
Antinuclear Antibody (ANA) testing is pivotal to help diagnose patients with a suspected autoimmune disease. The Indirect Immunofluorescence (IIF) microscopy performed with human epithelial type 2 (HEp-2) cells as the substrate is the reference method for ANA screening. It allows for the detection of antibodies binding to specific intracellular targets, resulting in various staining patterns that should be identified for diagnosis purposes. In recent years, there has been an increasing interest in devising deep learning methods for automated cell segmentation and classification of staining patterns, as well as for other tasks related to this diagnostic technique (such as intensity classification). However, little attention has been devoted to architectures aimed at simultaneously managing multiple interrelated tasks, via a shared representation. In this paper, we propose a deep neural network model that extends U-Net in a Multi-Task Learning (MTL) fashion, thus offering an end-to-end approach to tackle three fundamental tasks of the diagnostic procedure, i.e., HEp-2 cell specimen intensity classification, specimen segmentation, and pattern classification. The experiments were conducted on one of the largest publicly available datasets of HEp-2 images. The results showed that the proposed approach significantly outperformed the competing state-of-the-art methods for all the considered tasks.
PMID:39608042 | DOI:10.1016/j.artmed.2024.103031
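A minimal sketch of the MTL composition, assuming two intensity classes and seven staining patterns; the real model extends U-Net (skip connections and full depth are omitted here for brevity, and all sizes are illustrative).

```python
import torch
import torch.nn as nn

class MTLUNet(nn.Module):
    def __init__(self, n_patterns=7, n_intensity=2):
        super().__init__()
        self.enc = nn.Sequential(           # shared encoder
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.dec = nn.Sequential(           # segmentation head (upsampling path)
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.intensity = nn.Linear(64, n_intensity)
        self.pattern = nn.Linear(64, n_patterns)

    def forward(self, x):
        z = self.enc(x)                     # shared representation
        seg = self.dec(z)                   # per-pixel mask logits
        g = self.pool(z).flatten(1)
        return seg, self.intensity(g), self.pattern(g)

seg, inten, pat = MTLUNet()(torch.rand(2, 1, 128, 128))
print(seg.shape, inten.shape, pat.shape)
# A joint objective would sum a BCE segmentation loss with cross-entropy
# losses on the two classification heads.
```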
Artificial intelligence-powered image analysis: A paradigm shift in infectious disease detection
Artif Intell Med. 2024 Nov 23;159:103025. doi: 10.1016/j.artmed.2024.103025. Online ahead of print.
ABSTRACT
The global burden of infectious diseases significantly affects mortality rates, and their varying symptoms make it challenging to assess and determine the severity of infections. Different countries face unique challenges related to these diseases. This study introduces innovative Artificial Intelligence (AI)-based methodologies to enhance diagnostic accuracy through the analysis of medical imagery. It achieves this by developing a mathematical model capable of identifying potential infectious diseases from images, utilizing a Multi-Criteria Decision-Making (MCDM) framework. This approach combines Hypersoft Sets (HSS) within a fuzzy context, a first for AI-driven diagnostic processes. The decision-making process might suggest actions such as isolation, quarantine in either domestic settings or specialized facilities, or admission to a hospital for further treatment. The use of visual aids in this research not only improves understanding but also highlights the effectiveness and significance of the proposed methods. The foundational theory and the results from this novel approach demonstrate its potential for widespread application in fields like machine learning, deep learning, and pattern recognition, indicating a significant stride in the fight against infectious diseases through advanced diagnostic techniques.
PMID:39608041 | DOI:10.1016/j.artmed.2024.103025
DeepCTG 2.0: Development and validation of a deep learning model to detect neonatal acidemia from cardiotocography during labor
Comput Biol Med. 2024 Nov 27;184:109448. doi: 10.1016/j.compbiomed.2024.109448. Online ahead of print.
ABSTRACT
Cardiotocography (CTG) is the main tool available to detect neonatal acidemia during delivery. Presently, obstetricians and midwives primarily rely on visual interpretation, leading to significant intra-observer variability. In this paper, we build and evaluate a convolutional neural network to detect neonatal acidemia from CTG signals during delivery on a multicenter database of 27,662 cases from five centers, including 3,457 cases of moderate neonatal acidemia and 464 of severe neonatal acidemia (defined by a fetal pH at birth between 7.05 and 7.20, and lower than 7.05, respectively). To use all the available records, the convolutional layers are pretrained on a task that consists of predicting, from the raw CTG signals, several features known to be associated with neonatal acidemia. In a cross-center evaluation, the AUC for the detection of severe acidemia varies from 0.74 to 0.83 between centers, showing the ability of deep learning models to generalize from one dataset to another and paving the way for more accurate models trained on larger databases. The model can still be significantly improved by adding clinical variables to account for risk factors of acidemia that may not appear in the CTG signals. Further research will also be conducted to integrate the model into a tool that could assist humans in the interpretation of CTG.
PMID:39608037 | DOI:10.1016/j.compbiomed.2024.109448
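The pretraining scheme can be sketched as a trunk-and-two-heads arrangement; everything concrete below (two input channels, five pretext features, the sampling length) is an assumption, not the published configuration.

```python
import torch
import torch.nn as nn

trunk = nn.Sequential(                  # shared 1D convolutional trunk
    nn.Conv1d(2, 16, 7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(16, 32, 7, padding=3), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
    nn.Flatten())

feature_head = nn.Linear(32, 5)   # pretext task: 5 hypothetical CTG features
acidemia_head = nn.Linear(32, 1)  # downstream task: P(acidemia)

signals = torch.rand(8, 2, 4800)  # 8 records, 2 channels (FHR + tocometry)

# Stage 1: pretrain trunk + feature_head on all available records.
pre_loss = nn.functional.mse_loss(feature_head(trunk(signals)), torch.rand(8, 5))

# Stage 2: fine-tune on pH-labeled deliveries with the acidemia head.
logit = acidemia_head(trunk(signals))
ft_loss = nn.functional.binary_cross_entropy_with_logits(
    logit, torch.rand(8, 1).round())
print(float(pre_loss), float(ft_loss))
```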
AELGNet: Attention-based Enhanced Local and Global Features Network for medicinal leaf and plant classification
Comput Biol Med. 2024 Nov 27;184:109447. doi: 10.1016/j.compbiomed.2024.109447. Online ahead of print.
ABSTRACT
Pharmaceutical companies increasingly use medicinal plants because they are cheaper and have fewer side effects than conventional drugs. Accurate identification and classification of medicinal plants is critical for guaranteeing scientific, evidence-based usage of herbal treatments in traditional medicine, upholding pharmaceutical safety requirements, and contributing to biodiversity conservation efforts. However, conventional manual classification methods are time-consuming, error-prone, and necessitate specialized knowledge. As a result, many researchers are very interested in the automatic classification of therapeutic plants. Current state-of-the-art techniques rely primarily on leaf or plant imagery, restricting their application to certain scenarios. This study combines a large dataset of medicinal plants and their accompanying leaves to create a more generalizable approach for classifying medicinal plants efficiently. The first phase uses contrast-limited adaptive histogram equalization (CLAHE) to highlight important features in medicinal plant and leaf images. The proposed deep learning architecture, Attention-based Enhanced Local and Global Features Network (AELGNet), utilizes these images to extract and classify prominent features. Three MBConv modules in the AELGNet extract base features, which are subsequently divided into four non-overlapping patches for local feature extraction. Additionally, the AELGNet examines the base features for global feature extraction. We simultaneously apply residual channel-wise and spatial attention to each patch and to the global features to extract more conspicuous information pertinent to the medicinal plant or leaves. The experiments employ a dataset of Indian medicinal plants to assess the efficacy of AELGNet. AELGNet achieves 99.71% accuracy, 99.80% precision, 99.75% recall, and a 99.77% F1-score. The proposed AELGNet outperforms 14 current methods by 2%-10% in accuracy. The findings confirm AELGNet's suitability for medical and industrial settings, providing a strong tool for accurately and quickly identifying medicinal plants and leaves.
PMID:39608035 | DOI:10.1016/j.compbiomed.2024.109447
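A compact, illustrative rendering of the patch-plus-attention idea (not the published AELGNet): the base feature map is split into four non-overlapping patches, and a residual channel-wise plus spatial attention module is applied to each patch and to the global map.

```python
import torch
import torch.nn as nn

class ResidualAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())
    def forward(self, x):
        x = x + x * self.channel(x)   # residual channel-wise attention
        x = x + x * self.spatial(x)   # residual spatial attention
        return x

feat = torch.rand(2, 64, 32, 32)      # base features, e.g. from MBConv blocks
attn = ResidualAttention(64)
h, w = feat.shape[-2] // 2, feat.shape[-1] // 2
patches = [feat[..., i*h:(i+1)*h, j*w:(j+1)*w] for i in range(2) for j in range(2)]
local = [attn(p) for p in patches]    # local branch: four 16x16 patches
global_feat = attn(feat)              # global branch: full map
print(local[0].shape, global_feat.shape)
```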
RS-MOCO: A deep learning-based topology-preserving image registration method for cardiac T1 mapping
Comput Biol Med. 2024 Nov 27;184:109442. doi: 10.1016/j.compbiomed.2024.109442. Online ahead of print.
ABSTRACT
Cardiac T1 mapping can evaluate various clinical symptoms of myocardial tissue. However, there is currently a lack of effective, robust, and efficient methods for motion correction in cardiac T1 mapping. In this paper, we propose a deep learning-based, topology-preserving image registration framework for motion correction in cardiac T1 mapping. Notably, our proposed implicit consistency constraint, dubbed BLOC, preserves the image topology in registration to some extent through a bidirectional consistency constraint and a local anti-folding constraint. To address the contrast variation issue, we introduce a weighted image similarity metric for multimodal registration of cardiac T1-weighted images. Besides, a semi-supervised myocardium segmentation network and a dual-domain attention module are integrated into the framework to further improve the performance of the registration. Numerous comparative experiments, as well as ablation studies, demonstrated the effectiveness and high robustness of our method. The results also indicate that the proposed weighted image similarity metric, specifically crafted for our network, contributes substantially to the enhancement of the motion correction efficacy, while the bidirectional consistency constraint combined with the local anti-folding constraint ensures a more desirable topology-preserving registration mapping.
PMID:39608033 | DOI:10.1016/j.compbiomed.2024.109442
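A toy rendering of what a bidirectional (inverse-)consistency penalty can look like, assuming 2D displacement fields already expressed in the normalized [-1, 1] units used by grid_sample; the paper's BLOC constraint additionally includes a local anti-folding term (e.g., a Jacobian penalty), not shown here.

```python
import torch
import torch.nn.functional as F

def base_grid(n, h, w):
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    return torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)  # (N, H, W, 2)

def warp(field, disp):
    # Sample `field` (N,C,H,W) at locations displaced by `disp` (N,2,H,W).
    n, _, h, w = disp.shape
    grid = base_grid(n, h, w) + disp.permute(0, 2, 3, 1)
    return F.grid_sample(field, grid, align_corners=True)

def bidirectional_consistency(u, v):
    # Forward disp u and backward disp v should compose to zero motion:
    # u(x) + v(x + u(x)) ~ 0, enforced in both directions.
    return (u + warp(v, u)).abs().mean() + (v + warp(u, v)).abs().mean()

u = torch.zeros(1, 2, 64, 64)
v = torch.zeros(1, 2, 64, 64)
print(float(bidirectional_consistency(u, v)))  # 0.0 for perfectly inverse fields
```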
Artificial Intelligence Applications to Measure Food and Nutrient Intakes: Scoping Review
J Med Internet Res. 2024 Nov 28;26:e54557. doi: 10.2196/54557.
ABSTRACT
BACKGROUND: Accurate measurement of food and nutrient intake is crucial for nutrition research, dietary surveillance, and disease management, but traditional methods such as 24-hour dietary recalls, food diaries, and food frequency questionnaires are often prone to recall error and social desirability bias, limiting their reliability. With the advancement of artificial intelligence (AI), there is potential to overcome these limitations through automated, objective, and scalable dietary assessment techniques. However, the effectiveness and challenges of AI applications in this domain remain inadequately explored.
OBJECTIVE: This study aimed to conduct a scoping review to synthesize existing literature on the efficacy, accuracy, and challenges of using AI tools in assessing food and nutrient intakes, offering insights into their current advantages and areas of improvement.
METHODS: This review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. A comprehensive literature search was conducted in 4 databases-PubMed, Web of Science, Cochrane Library, and EBSCO-covering publications from the databases' inception to June 30, 2023. Studies were included if they used modern AI approaches to assess food and nutrient intakes in human subjects.
RESULTS: The 25 included studies, published between 2010 and 2023, involved sample sizes ranging from 10 to 38,415 participants. These studies used a variety of input data types, including food images (n=10), sound and jaw motion data from wearable devices (n=9), and text data (n=4), with 2 studies combining multiple input types. AI models applied included deep learning (eg, convolutional neural networks), machine learning (eg, support vector machines), and hybrid approaches. Applications were categorized into dietary intake assessment, food detection, nutrient estimation, and food intake prediction. Food detection accuracies ranged from 74% to 99.85%, and nutrient estimation errors varied between 10% and 15%. For instance, the RGB-D (Red, Green, Blue-Depth) fusion network achieved a mean absolute error of 15% in calorie estimation, and a sound-based classification model reached up to 94% accuracy in detecting food intake based on jaw motion and chewing patterns. In addition, AI-based systems provided real-time monitoring capabilities, improving the precision of dietary assessments and demonstrating the potential to reduce recall bias typically associated with traditional self-report methods.
CONCLUSIONS: While AI demonstrated significant advantages in improving accuracy, reducing labor, and enabling real-time monitoring, challenges remain in adapting to diverse food types, ensuring algorithmic fairness, and addressing data privacy concerns. The findings suggest that AI has transformative potential for dietary assessment at both individual and population levels, supporting precision nutrition and chronic disease management. Future research should focus on enhancing the robustness of AI models across diverse dietary contexts and integrating biological sensors for a holistic dietary assessment approach.
PMID:39608003 | DOI:10.2196/54557
Deep learning-based classifier for carcinoma of unknown primary using methylation quantitative trait loci
J Neuropathol Exp Neurol. 2024 Nov 28:nlae123. doi: 10.1093/jnen/nlae123. Online ahead of print.
ABSTRACT
Cancer of unknown primary (CUP) constitutes between 2% and 5% of human malignancies and is among the most common causes of cancer death in the United States. Brain metastases are often the first clinical presentation of CUP; despite extensive pathological and imaging studies, 20%-45% of CUP cases are never assigned a primary site. DNA methylation array profiling is a reliable method for tumor classification, but tumor-type-specific classifier development requires many reference samples. This is difficult to accomplish for CUP, as many cases are never assigned a specific diagnosis. Recent studies identified subsets of methylation quantitative trait loci (mQTLs) unique to specific organs, which could help increase classifier accuracy while requiring fewer samples. We performed a retrospective genome-wide methylation analysis of 759 carcinoma samples from formalin-fixed paraffin-embedded tissue using the Illumina EPIC array. Utilizing mQTLs specific to breast, lung, ovarian/gynecologic, colon, kidney, or testis (BLOCKT; 185k probes in total), we developed a deep learning-based methylation classifier that achieved 93.12% average accuracy and a 93.04% average F1-score across a 10-fold validation for BLOCKT organs. Our findings indicate that our organ-based DNA methylation classifier can assist pathologists in identifying the site of origin, providing oncologists with insight into a diagnosis to administer appropriate therapy, improving patient outcomes.
PMID:39607989 | DOI:10.1093/jnen/nlae123
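An illustrative sketch, not the published classifier: a small scikit-learn MLP over beta values restricted to (here, far fewer) mQTL probes, evaluated with stratified 10-fold cross-validation as described above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_probes = 759, 2000    # 2,000 stands in for the ~185k mQTL probes
X = rng.uniform(0, 1, size=(n_samples, n_probes))  # beta values lie in [0, 1]
y = rng.integers(0, 6, size=n_samples)             # 6 BLOCKT organ classes

clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=200, random_state=0))
scores = cross_val_score(
    clf, X, y, cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```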
stMMR: accurate and robust spatial domain identification from spatially resolved transcriptomics with multimodal feature representation
Gigascience. 2024 Jan 2;13:giae089. doi: 10.1093/gigascience/giae089.
ABSTRACT
BACKGROUND: Deciphering spatial domains using spatially resolved transcriptomics (SRT) is of great value for characterizing and understanding tissue architecture. However, the inherent heterogeneity and varying spatial resolutions present challenges in the joint analysis of multimodal SRT data.
RESULTS: We introduce a multimodal geometric deep learning method, named stMMR, to effectively integrate gene expression, spatial location, and histological information for accurately identifying spatial domains from SRT data. stMMR uses graph convolutional networks and a self-attention module for deep embedding of features within each modality and incorporates similarity contrastive learning for integrating features across modalities.
CONCLUSIONS: Comprehensive benchmark analysis on various types of spatial data shows superior performance of stMMR in multiple analyses, including spatial domain identification, pseudo-spatiotemporal analysis, and domain-specific gene discovery. In chicken heart development, stMMR reconstructed the spatiotemporal lineage structures, indicating an accurate developmental sequence. In breast cancer and lung cancer, stMMR clearly delineated the tumor microenvironment and identified marker genes associated with diagnosis and prognosis. Overall, stMMR is capable of effectively utilizing the multimodal information of various SRT data to explore and characterize tissue architectures of homeostasis, development, and tumor.
PMID:39607984 | DOI:10.1093/gigascience/giae089
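The similarity contrastive learning step can be sketched with an InfoNCE-style loss that pulls together the two modality embeddings of the same spot; the stand-in encoders below replace the paper's GCN and self-attention modules.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau          # (n_spots, n_spots) similarity matrix
    targets = torch.arange(z1.size(0))  # matched spots lie on the diagonal
    return F.cross_entropy(logits, targets)

n_spots, dim = 256, 64
expr_emb = torch.randn(n_spots, dim)                   # expression-branch embedding
hist_emb = expr_emb + 0.1 * torch.randn(n_spots, dim)  # correlated histology branch
print(float(info_nce(expr_emb, hist_emb)))
```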
"UDE DIATOMS in the Wild 2024": a new image dataset of freshwater diatoms for training deep learning models
Gigascience. 2024 Jan 2;13:giae087. doi: 10.1093/gigascience/giae087.
ABSTRACT
BACKGROUND: Diatoms are microalgae with finely ornamented microscopic silica shells. Their taxonomic identification by light microscopy is routinely used as part of community ecological research as well as ecological status assessment of aquatic ecosystems, and a need for digitalization of these methods has long been recognized. Alongside their high taxonomic and morphological diversity, several other factors make diatoms highly challenging for deep learning-based identification using light microscopy images. These include (i) an unusually high intraclass variability combined with small between-class differences, (ii) a rather different visual appearance of specimens depending on their orientation on the microscope slide, and (iii) the limited availability of diatom experts for accurate taxonomic annotation.
FINDINGS: We present the largest diatom image dataset thus far, aimed at facilitating the application and benchmarking of innovative deep learning methods to the diatom identification problem on realistic research data, "UDE DIATOMS in the Wild 2024." The dataset contains 83,570 images of 611 diatom taxa, 101 of which are represented by at least 100 examples and 144 by at least 50 examples each. We showcase this dataset in 2 innovative analyses that address individual aspects of the above challenges using subclustering to deal with visually heterogeneous classes, out-of-distribution sample detection, and semi-supervised learning.
CONCLUSIONS: The problem of image-based identification of diatoms is both important for environmental research and challenging from the machine learning perspective. By making available the largest image dataset to date, accompanied by innovative analyses, this contribution will facilitate addressing these points by the scientific community.
PMID:39607983 | DOI:10.1093/gigascience/giae087
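One of the showcased analyses, out-of-distribution sample detection, can be illustrated with the classic maximum-softmax-probability baseline; whether the paper uses this particular score is an assumption.

```python
import torch

def flag_ood(logits, threshold=0.5):
    # Flag specimens whose top softmax probability is low as likely OOD,
    # i.e. probably not one of the known diatom taxa.
    conf = torch.softmax(logits, dim=1).max(dim=1).values
    return conf < threshold

logits = torch.tensor([[4.0, 0.1, 0.2],   # confidently a known taxon
                       [0.8, 0.7, 0.9]])  # diffuse scores -> possibly OOD
print(flag_ood(logits))  # tensor([False,  True])
```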
An improved low-rank plus sparse unrolling network method for dynamic magnetic resonance imaging
Med Phys. 2024 Nov 28. doi: 10.1002/mp.17501. Online ahead of print.
ABSTRACT
BACKGROUND: Recent advances in deep learning have sparked new research interests in dynamic magnetic resonance imaging (MRI) reconstruction. However, existing deep learning-based approaches suffer from insufficient reconstruction efficiency and accuracy due to the lack of time correlation modeling during the reconstruction procedure.
PURPOSE: Inappropriate tensor processing steps and deep learning models may lead to not only a lack of modeling in the time dimension but also an increase in the overall size of the network. Therefore, this study aims to find suitable tensor processing methods and deep learning models to achieve better reconstruction results and a smaller network size.
METHODS: We propose a novel unrolling network method that enhances the reconstruction quality and reduces the parameter redundancy by introducing time correlation modeling into MRI reconstruction with low-rank core matrix and convolutional long short-term memory (ConvLSTM) unit.
RESULTS: We conduct extensive experiments on the AMRG Cardiac MRI dataset to evaluate our proposed approach. The results demonstrate that compared to other state-of-the-art approaches, our approach achieves higher peak signal-to-noise ratios and structural similarity indices at different acceleration factors with significantly fewer parameters.
CONCLUSIONS: The improved reconstruction performance demonstrates that our proposed time correlation modeling is simple and effective for accelerating MRI reconstruction. We hope our approach can serve as a reference for future research in dynamic MRI reconstruction.
PMID:39607945 | DOI:10.1002/mp.17501
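A toy rendering of the low-rank ingredient in such unrolling networks: one iteration applies singular-value soft-thresholding (SVT) to the space-time (Casorati) matrix of a dynamic series. The learned ConvLSTM component and the data-consistency steps are omitted.

```python
import torch

def svt(X, tau):
    # Soft-threshold the singular values of X: the proximal operator of
    # tau * ||X||_*, the workhorse of low-rank-plus-sparse reconstruction.
    U, S, Vh = torch.linalg.svd(X, full_matrices=False)
    return U @ torch.diag(torch.clamp(S - tau, min=0)) @ Vh

frames = torch.randn(30, 64, 64)         # stand-in 30-frame dynamic series
casorati = frames.reshape(30, -1).t()    # (pixels, frames) Casorati matrix
low_rank = svt(casorati, tau=60.0)       # small singular values are zeroed
print(int(torch.linalg.matrix_rank(low_rank)), "of", min(casorati.shape))
```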
A deep learning approach for automated scoring of the Rey-Osterrieth complex figure
Elife. 2024 Nov 28;13:RP96017. doi: 10.7554/eLife.96017.
ABSTRACT
Memory deficits are a hallmark of many different neurological and psychiatric conditions. The Rey-Osterrieth complex figure (ROCF) is the state-of-the-art assessment tool for neuropsychologists across the globe to assess the degree of non-verbal visual memory deterioration. To obtain a score, a trained clinician inspects a patient's ROCF drawing and quantifies deviations from the original figure. This manual procedure is time-consuming, and scores vary depending on the clinician's experience, motivation, and tiredness. Here, we leverage novel deep learning architectures to automate the rating of memory deficits. For this, we collected more than 20k hand-drawn ROCF drawings from patients with various neurological and psychiatric disorders as well as healthy participants. Unbiased ground-truth ROCF scores were obtained from crowdsourced human intelligence. This dataset was used to train and evaluate a multihead convolutional neural network. The model performs in a highly unbiased manner, yielding predictions very close to the ground truth, with errors similarly distributed around zero. The neural network outperforms both online raters and clinicians. The scoring system can reliably identify and accurately score individual figure elements in previously unseen ROCF drawings, which facilitates the explainability of the AI-scoring system. To ensure generalizability and clinical utility, the model performance was successfully replicated in a large independent prospective validation study that was pre-registered prior to data collection. Our AI-powered scoring system provides healthcare institutions worldwide with a digital tool to assess objectively, reliably, and time-efficiently the performance in the ROCF test from hand-drawn images.
PMID:39607424 | DOI:10.7554/eLife.96017
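A sketch of the multihead idea under stated assumptions: a shared CNN trunk with one small regression head per ROCF figure element, using the standard 18-element, 0-2 Osterrieth scoring so that the head outputs sum to the 0-36 total. Sizes and head design are illustrative, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ROCFScorer(nn.Module):
    def __init__(self, n_elements=18):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten())
        self.heads = nn.ModuleList([nn.Linear(64, 1) for _ in range(n_elements)])

    def forward(self, drawing):
        z = self.trunk(drawing)
        # Each head predicts one element's score, squashed into [0, 2].
        per_element = torch.cat([2 * torch.sigmoid(h(z)) for h in self.heads], dim=1)
        return per_element, per_element.sum(dim=1)  # element scores and total

per_elem, total = ROCFScorer()(torch.rand(2, 1, 256, 256))
print(per_elem.shape, total.shape)  # torch.Size([2, 18]) torch.Size([2])
```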