Deep learning

Prediction and design of transcriptional repressor domains with large-scale mutational scans and deep learning

Thu, 2024-10-10 06:00

bioRxiv [Preprint]. 2024 Sep 24:2024.09.21.614253. doi: 10.1101/2024.09.21.614253.

ABSTRACT

Regulatory proteins have evolved diverse repressor domains (RDs) to enable precise context-specific repression of transcription. However, our understanding of how sequence variation impacts the functional activity of RDs is limited. To address this gap, we generated a high-throughput mutational scanning dataset measuring the repressor activity of 115,000 variant sequences spanning more than 50 RDs in human cells. We identified thousands of clinical variants with loss or gain of repressor function, including TWIST1 HLH variants associated with Saethre-Chotzen syndrome and MECP2 domain variants associated with Rett syndrome. We also leveraged these data to annotate short linear interacting motifs (SLiMs) that are critical for repression in disordered RDs. Then, we designed a deep learning model called TENet (Transcriptional Effector Network) that integrates sequence, structure and biochemical representations of sequence variants to accurately predict repressor activity. We systematically tested generalization within and across domains with varying homology using the mutational scanning dataset. Finally, we employed TENet within a directed evolution sequence editing framework to tune the activity of both structured and disordered RDs and experimentally test thousands of designs. Our work highlights critical considerations for future dataset design and model training strategies to improve functional variant prioritization and precision design of synthetic regulatory proteins.

PMID:39386603 | PMC:PMC11463546 | DOI:10.1101/2024.09.21.614253

Categories: Literature Watch

Learning precise segmentation of neurofibrillary tangles from rapid manual point annotations

Thu, 2024-10-10 06:00

bioRxiv [Preprint]. 2024 Sep 24:2024.05.15.594372. doi: 10.1101/2024.05.15.594372.

ABSTRACT

Accumulation of abnormal tau protein into neurofibrillary tangles (NFTs) is a pathologic hallmark of Alzheimer disease (AD). Accurate detection of NFTs in tissue samples can reveal relationships with clinical, demographic, and genetic features through deep phenotyping. However, expert manual analysis is time-consuming, subject to observer variability, and cannot handle the data amounts generated by modern imaging. We present a scalable, open-source, deep-learning approach to quantify NFT burden in digital whole slide images (WSIs) of post-mortem human brain tissue. To achieve this, we developed a method to generate detailed NFT boundaries directly from single-point-per-NFT annotations. We then trained a semantic segmentation model on 45 annotated 2400µm by 1200µm regions of interest (ROIs) selected from 15 unique temporal cortex WSIs of AD cases from three institutions (University of California (UC)-Davis, UC-San Diego, and Columbia University). Segmenting NFTs at the single-pixel level, the model achieved an area under the receiver operating characteristic curve of 0.832 and an F1 of 0.527 (196-fold over random) on a held-out test set of 664 NFTs from 20 ROIs (7 WSIs). We compared this to deep object detection, which achieved comparable but coarser-grained performance while running 60% faster. The segmentation and object detection models correlated well with expert semi-quantitative scores at the whole-slide level (Spearman's ρ=0.654 (p=6.50e-5) and ρ=0.513 (p=3.18e-3), respectively). We openly release this multi-institution deep-learning pipeline to provide detailed NFT spatial distribution and morphology analysis capability at a scale otherwise infeasible by manual assessment.
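
For orientation, here is a minimal sketch of how the pixel-level AUROC/F1 and the slide-level Spearman correlation reported above can be computed with scikit-learn and SciPy. All arrays are synthetic stand-ins, not the study's data.

```python
# Hedged sketch: pixel-level AUROC/F1 and slide-level Spearman correlation.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)

# Pixel-level evaluation: ground-truth NFT mask vs. predicted probabilities.
y_true = rng.integers(0, 2, size=10_000)               # 1 = NFT pixel
y_prob = np.clip(y_true * 0.6 + rng.random(10_000) * 0.4, 0, 1)
auroc = roc_auc_score(y_true, y_prob)
f1 = f1_score(y_true, (y_prob >= 0.5).astype(int))

# Slide-level evaluation: model NFT burden vs. expert semi-quantitative score.
model_burden = rng.random(30)
expert_score = model_burden + rng.normal(0, 0.2, 30)
rho, p = spearmanr(model_burden, expert_score)

print(f"AUROC={auroc:.3f}  F1={f1:.3f}  rho={rho:.3f} (p={p:.2e})")
```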

PMID:39386601 | PMC:PMC11463656 | DOI:10.1101/2024.05.15.594372

Categories: Literature Watch

scEMB: Learning context representation of genes based on large-scale single-cell transcriptomics

Thu, 2024-10-10 06:00

bioRxiv [Preprint]. 2024 Sep 26:2024.09.24.614685. doi: 10.1101/2024.09.24.614685.

ABSTRACT

BACKGROUND: The rapid advancement of single-cell transcriptomic technologies has led to the curation of millions of cellular profiles, providing unprecedented insights into cellular heterogeneity across various tissues and developmental stages. This growing wealth of data presents an opportunity to uncover complex gene-gene relationships, yet also poses significant computational challenges.

RESULTS: We present scEMB, a transformer-based deep learning model developed to capture context-aware gene embeddings from large-scale single-cell transcriptomics data. Trained on over 30 million single-cell transcriptomes, scEMB utilizes an innovative binning strategy that integrates data across multiple platforms, effectively preserving both gene expression hierarchies and cell-type specificity. In downstream tasks such as batch integration, clustering, and cell type annotation, scEMB demonstrates superior performance compared to existing models like scGPT and Geneformer. Notably, scEMB excels at in silico correlation analysis, accurately predicting gene perturbation effects in CRISPR-edited datasets and microglia state transitions, and identifying several known Alzheimer's disease (AD) risk genes among its top-ranked genes. Additionally, scEMB offers robust fine-tuning capabilities for domain-specific applications, making it a versatile tool for tackling diverse biological problems such as therapeutic target discovery and disease modeling.

CONCLUSIONS: scEMB represents a powerful tool for extracting biologically meaningful insights from complex gene expression data. Its ability to model in silico perturbation effects and conduct correlation analyses in the embedding space highlights its potential to accelerate discoveries in precision medicine and therapeutic development.

PMID:39386549 | PMC:PMC11463607 | DOI:10.1101/2024.09.24.614685

Categories: Literature Watch

TransCDR: a deep learning model for enhancing the generalizability of drug activity prediction through transfer learning and multimodal data fusion

Wed, 2024-10-09 06:00

BMC Biol. 2024 Oct 9;22(1):227. doi: 10.1186/s12915-024-02023-8.

ABSTRACT

BACKGROUND: Accurate and robust drug response prediction is of utmost importance in precision medicine. Although many models have been developed to utilize the representations of drugs and cancer cell lines for predicting cancer drug responses (CDR), their performances can be improved by addressing issues such as insufficient data modality, suboptimal fusion algorithms, and poor generalizability for novel drugs or cell lines.

RESULTS: We introduce TransCDR, which uses transfer learning to learn drug representations and fuses multi-modality features of drugs and cell lines by a self-attention mechanism, to predict the IC50 values or sensitive states of drugs on cell lines. We are the first to systematically evaluate the generalization of the CDR prediction model to novel (i.e., never-before-seen) compound scaffolds and cell line clusters. TransCDR shows better generalizability than 8 state-of-the-art models. TransCDR outperforms its 5 variants that train drug encoders (i.e., RNN and AttentiveFP) from scratch under various scenarios. The most critical contributors among multiple drug notations and omics profiles are Extended Connectivity Fingerprint and genetic mutation. Additionally, the attention-based fusion module further enhances the predictive performance of TransCDR. TransCDR, trained on the GDSC dataset, demonstrates strong predictive performance on the external testing set CCLE. It is also utilized to predict missing CDRs on GDSC. Moreover, we investigate the biological mechanisms underlying drug response by classifying 7675 patients from TCGA into drug-sensitive or drug-resistant groups, followed by a Gene Set Enrichment Analysis.
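
The abstract names a self-attention fusion module but does not specify its architecture; the sketch below is one plausible reading, treating each drug or cell-line modality embedding as a token fused by multi-head self-attention. All dimensions and names are assumptions, not the published TransCDR design.

```python
# Minimal sketch of self-attention fusion over multimodal embeddings
# (illustrative only; not the published TransCDR architecture).
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_modalities, dim), one token per modality
        # (e.g. drug fingerprint, SMILES encoder, expression, mutation).
        fused, _ = self.attn(tokens, tokens, tokens)
        pooled = fused.mean(dim=1)             # aggregate modality tokens
        return self.head(pooled).squeeze(-1)   # predicted IC50 (regression)

model = AttentionFusion()
tokens = torch.randn(8, 4, 128)  # 8 drug-cell-line pairs, 4 modalities each
print(model(tokens).shape)       # torch.Size([8])
```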

CONCLUSIONS: TransCDR emerges as a potent tool with significant potential in drug response prediction.

PMID:39385185 | DOI:10.1186/s12915-024-02023-8

Categories: Literature Watch

Accelerated muscle mass estimation from CT images through transfer learning

Wed, 2024-10-09 06:00

BMC Med Imaging. 2024 Oct 9;24(1):271. doi: 10.1186/s12880-024-01449-4.

ABSTRACT

BACKGROUND: The cost of labeling to collect training data sets for deep learning is especially high in medical applications compared to other fields. Furthermore, due to variations in images across computed tomography (CT) devices, a deep learning-based segmentation model trained on images from one device often does not work on images from another.

METHODS: In this study, we propose an efficient learning strategy for deep learning models in medical image segmentation. We aim to overcome the difficulties of segmentation in CT images by training a VNet segmentation model that enables rapid labeling of organs in CT images, starting from a model obtained by transfer learning on a small number of manually labeled images, called SEED images. We established a process for generating SEED images and conducting transfer learning on a model. We evaluate the performance of various segmentation models such as vanilla UNet, UNETR, Swin-UNETR and VNet. Furthermore, assuming a scenario in which a model is repeatedly trained with CT images collected from multiple devices, where catastrophic forgetting often occurs, we examine whether the performance of our model degrades.
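
A minimal sketch of the fine-tuning step this strategy implies, assuming a generic encoder-decoder stands in for the actual VNet: load SEED-pretrained weights, freeze the encoder, and train only the decoder on the new device's images. All names, shapes, and the file path are illustrative.

```python
# Hedged sketch of transfer learning from SEED-image weights.
import torch
import torch.nn as nn

class VNet(nn.Module):  # placeholder encoder-decoder for illustration
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.Conv2d(16, 2, 3, padding=1))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = VNet()
# model.load_state_dict(torch.load("seed_pretrained.pt"))  # hypothetical path

for p in model.encoder.parameters():   # keep SEED-learned features fixed
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(2, 1, 64, 64)          # CT slices (synthetic)
y = torch.randint(0, 2, (2, 64, 64))   # muscle masks (synthetic)
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```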

RESULTS: We show that transfer learning can train a model that does a good job of segmenting muscles with a small number of images. In addition, we confirmed that VNet outperforms existing semi-automated segmentation tools and other deep learning networks on muscle and liver segmentation tasks. We also confirmed that VNet is the most robust model against catastrophic forgetting.

CONCLUSION: In the 2D CT image segmentation task, we confirmed that the CNN-based network shows better performance than existing semi-automatic segmentation tools or the latest transformer-based networks.

PMID:39385108 | DOI:10.1186/s12880-024-01449-4

Categories: Literature Watch

Prediction of adverse drug reactions using demographic and non-clinical drug characteristics in FAERS data

Wed, 2024-10-09 06:00

Sci Rep. 2024 Oct 9;14(1):23636. doi: 10.1038/s41598-024-74505-2.

ABSTRACT

The presence of adverse drug reactions (ADRs) is an ongoing public health concern. While traditional methods to discover ADRs are very costly and limited, it is prudent to predict ADRs through non-invasive methods such as machine learning based on existing data. Although various studies exist regarding ADR prediction using non-clinical data, a process that leverages both demographic and non-clinical data for ADR prediction is missing. In addition, the importance of individual features in ADR prediction has yet to be fully explored. This study aims to develop an ADR prediction model based on demographic and non-clinical data, where we identify the highest contributing factors. We focus our efforts on 30 common and severe ADRs reported to the Food and Drug Administration (FDA) between 2012 and 2023. We developed random forest (RF) and deep learning (DL) models that ingest demographic data (e.g., patient age and gender) and non-clinical data, which includes chemical, molecular, and biological drug characteristics. We successfully unified both demographic and non-clinical data sources within a complete dataset for ADR prediction. Model performances were assessed via the area under the receiver operating characteristic curve (AUC) and the mean average precision (MAP). We demonstrated that our parsimonious models, which include only the top 20 most important features comprising 5 demographic features and 15 non-clinical features (13 molecular and 2 biological), achieve ADR prediction performance comparable to a less practical, feature-rich model consisting of all 2,315 features. Specifically, our models achieved an AUC of 0.611 and 0.674 for the RF and DL algorithms, respectively. We hope our research provides researchers and clinicians with valuable insights and facilitates future research designs by identifying top ADR predictors (including demographic information) and practical parsimonious models.
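
The parsimonious-model comparison described above can be sketched as follows: rank features by random forest importance, refit on the top 20, and compare AUC and average precision against the full feature set. The data below are synthetic and the column counts illustrative.

```python
# Hedged sketch: full vs. top-20-feature ("parsimonious") models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=200, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
top20 = np.argsort(full.feature_importances_)[::-1][:20]   # most important

slim = RandomForestClassifier(n_estimators=200, random_state=0)
slim.fit(X_tr[:, top20], y_tr)

for name, m, Xe in [("full", full, X_te), ("top-20", slim, X_te[:, top20])]:
    p = m.predict_proba(Xe)[:, 1]
    print(name, f"AUC={roc_auc_score(y_te, p):.3f}",
          f"AP={average_precision_score(y_te, p):.3f}")
```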

PMID:39384938 | DOI:10.1038/s41598-024-74505-2

Categories: Literature Watch

Predicting SARS-CoV-2 infection among hemodialysis patients using deep neural network methods

Wed, 2024-10-09 06:00

Sci Rep. 2024 Oct 9;14(1):23588. doi: 10.1038/s41598-024-74967-4.

ABSTRACT

COVID-19 has a higher rate of morbidity and mortality among dialysis patients than the general population. Identifying infected patients early with the support of predictive models helps dialysis centers implement concerted procedures (e.g., temperature screenings, universal masking, isolation treatments) to control the spread of SARS-CoV-2 and mitigate outbreaks. We collect data from multiple sources, including demographics, clinical, treatment, laboratory, vaccination, socioeconomic status, and COVID-19 surveillance. Previous early prediction models, such as logistic regression, SVM, and XGBoost, require sophisticated feature engineering and leave room for improved prediction performance. We create deep learning models, including Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN), to predict SARS-CoV-2 infections during incubation. Our study shows that deep learning models with minimal feature engineering can identify infected patients more accurately than previously built models. Our Long Short-Term Memory (LSTM) model consistently performed well, with an AUC exceeding 0.80, peaking at 0.91 in August 2021. The CNN model also demonstrated strong results with an AUC above 0.75. Both models outperformed previous best XGBoost models by over 0.10 in AUC. Prediction accuracy declined as the pandemic evolved, dropping to approximately 0.75 between September 2021 and January 2022. Maintaining a 20% false positive rate, our LSTM and CNN models identified 66% and 64% of positive cases among patients, respectively, significantly outperforming XGBoost models at 42%. We also identify key features for dialysis patients by calculating the gradient of the output with respect to the input features. By closely monitoring these factors, dialysis patients can receive earlier diagnoses and care, leading to less severe outcomes. Our research highlights the effectiveness of deep neural networks in analyzing longitudinal data, especially in predicting COVID-19 infections during the crucial incubation period. These deep network approaches surpass traditional methods relying on aggregated variable means, significantly improving the accurate identification of SARS-CoV-2 infections.
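
The gradient-based feature attribution mentioned above (the gradient of the output with respect to the input features) can be sketched for an LSTM as follows; the architecture, shapes, and feature layout are assumptions, not the study's model.

```python
# Hedged sketch: input-gradient feature importance for a risk LSTM.
import torch
import torch.nn as nn

class RiskLSTM(nn.Module):
    def __init__(self, n_features: int = 32, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out[:, -1]))  # risk at last time step

model = RiskLSTM()
x = torch.randn(4, 14, 32, requires_grad=True)  # 4 patients, 14 days, 32 features
model(x).sum().backward()

# Mean absolute gradient per feature approximates its influence on risk.
importance = x.grad.abs().mean(dim=(0, 1))
print(importance.topk(5).indices)  # indices of 5 most influential features
```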

PMID:39384931 | DOI:10.1038/s41598-024-74967-4

Categories: Literature Watch

Deep learning pipeline for automated cell profiling from cyclic imaging

Wed, 2024-10-09 06:00

Sci Rep. 2024 Oct 9;14(1):23600. doi: 10.1038/s41598-024-74597-w.

ABSTRACT

Cyclic fluorescence microscopy enables multiple targets to be detected simultaneously. This, in turn, has deepened our understanding of tissue composition, cell-to-cell interactions, and cell signaling. Unfortunately, analysis of these datasets can be time-prohibitive due to the sheer volume of data. In this paper, we present CycloNET, a computational pipeline tailored for analyzing raw fluorescent images obtained through cyclic immunofluorescence. The automated pipeline pre-processes raw image files, quickly corrects for translation errors between imaging cycles, and leverages a pre-trained neural network to segment individual cells and generate single-cell molecular profiles. We applied CycloNET to a dataset of 22 human samples from head and neck squamous cell carcinoma patients and trained a neural network to segment immune cells. CycloNET efficiently processed a large-scale dataset (17 fields of view per cycle and 13 staining cycles per specimen) in 10 min, delivering insights at the single-cell resolution and facilitating the identification of rare immune cell clusters. We expect that this rapid pipeline will serve as a powerful tool to understand complex biological systems at the cellular level, with the potential to facilitate breakthroughs in areas such as developmental biology, disease pathology, and personalized medicine.
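
The abstract does not detail how CycloNET corrects translation errors between imaging cycles; one common approach, shown here purely as a hedged sketch, is phase cross-correlation registration followed by re-alignment. Images below are synthetic.

```python
# Hedged sketch: cycle-to-cycle translation correction via registration.
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

rng = np.random.default_rng(0)
reference = rng.random((256, 256))           # cycle 1 image (synthetic)
moving = nd_shift(reference, (5.0, -3.0))    # cycle 2, translated copy

estimated_shift, error, _ = phase_cross_correlation(reference, moving,
                                                    upsample_factor=10)
aligned = nd_shift(moving, estimated_shift)  # undo the translation
print("estimated shift:", estimated_shift)   # ~ [5., -3.]
```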

PMID:39384907 | DOI:10.1038/s41598-024-74597-w

Categories: Literature Watch

PRN: progressive reasoning network and its image completion applications

Wed, 2024-10-09 06:00

Sci Rep. 2024 Oct 9;14(1):23519. doi: 10.1038/s41598-024-72368-1.

ABSTRACT

Ancient murals embody profound historical, cultural, scientific, and artistic values, yet many are afflicted with challenges such as pigment shedding or missing parts. While deep learning-based completion techniques have yielded remarkable results in restoring natural images, their application to damaged murals has been unsatisfactory due to data shifts and limited modeling efficacy. This paper proposes a novel progressive reasoning network designed specifically for mural image completion, inspired by the mural painting process. The proposed network comprises three key modules: a luminance reasoning module, a sketch reasoning module, and a color fusion module. The first two modules are based on the double-codec framework, designed to infer the missing areas' luminance and sketch information. The final module then utilizes a paired-associate learning approach to reconstruct the color image. This network utilizes two parallel, complementary pathways to estimate the luminance and sketch maps of a damaged mural. Subsequently, these two maps are combined to synthesize a complete color image. Experimental results indicate that the proposed network excels in restoring clearer structures and more vivid colors, surpassing current state-of-the-art methods in both quantitative and qualitative assessments for repairing damaged images. Our code and results will be publicly accessible at https://github.com/albestobe/PRN.

PMID:39384878 | DOI:10.1038/s41598-024-72368-1

Categories: Literature Watch

Analyzing hope speech from psycholinguistic and emotional perspectives

Wed, 2024-10-09 06:00

Sci Rep. 2024 Oct 9;14(1):23548. doi: 10.1038/s41598-024-74630-y.

ABSTRACT

Hope is a vital coping mechanism, enabling individuals to effectively confront life's challenges. This study proposes a technique employing Natural Language Processing (NLP) tools like Linguistic Inquiry and Word Count (LIWC), NRC-emotion-lexicon, and vaderSentiment to analyze social media posts, extracting psycholinguistic, emotional, and sentiment features from a hope speech dataset. The findings reveal distinct cognitive, emotional, and communicative characteristics, as well as psycholinguistic dimensions, emotions, and sentiments, associated with the different types of hope shared on social media. Furthermore, the study investigates the potential of leveraging these data to classify different types of hope using machine learning algorithms. Notably, models such as LightGBM and CatBoost demonstrate impressive performance, surpassing traditional methods and competing effectively with deep learning techniques. We employed hyperparameter tuning to optimize the models' parameters and compared their performance using both default and tuned settings. The results highlight the enhanced efficiency achieved through hyperparameter tuning for these models.
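
As a hedged sketch of the feature-extraction-plus-classifier pipeline described above, the example below turns vaderSentiment scores into features for a LightGBM classifier. The texts and labels are toy stand-ins; the study additionally uses LIWC and NRC-emotion-lexicon features, which require their own lexicon resources.

```python
# Hedged sketch: sentiment features -> gradient-boosted hope classifier.
import numpy as np
from lightgbm import LGBMClassifier
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

texts = ["Things will get better, keep going!",
         "There is no way out of this."] * 50   # toy corpus
labels = [1, 0] * 50                            # 1 = hope speech (toy labels)

analyzer = SentimentIntensityAnalyzer()
X = np.array([[s["neg"], s["neu"], s["pos"], s["compound"]]
              for s in (analyzer.polarity_scores(t) for t in texts)])

clf = LGBMClassifier(n_estimators=50).fit(X, labels)
print(clf.predict(X[:2]))
```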

PMID:39384851 | DOI:10.1038/s41598-024-74630-y

Categories: Literature Watch

SMGformer: integrating STL and multi-head self-attention in deep learning model for multi-step runoff forecasting

Wed, 2024-10-09 06:00

Sci Rep. 2024 Oct 9;14(1):23550. doi: 10.1038/s41598-024-74329-0.

ABSTRACT

Accurate runoff forecasting is of great significance for water resource allocation, flood control, and disaster reduction. However, due to the inherent strong randomness of runoff sequences, this task faces significant challenges. To address this challenge, this study proposes a new SMGformer runoff forecast model. The model integrates Seasonal and Trend decomposition using Loess (STL), the Informer Encoder layer, a Bidirectional Gated Recurrent Unit (BiGRU), and Multi-head self-attention (MHSA). First, in response to the nonlinear and non-stationary characteristics of the runoff sequence, STL decomposition is used to extract the runoff sequence's trend, period, and residual terms, and a multi-feature set based on 'sequence-sequence' is constructed as the input of the model, providing a foundation for subsequent models to capture the evolution of runoff. The key features of the input set are then captured using the Informer Encoder layer. Next, the BiGRU layer is used to learn the temporal information of these features. To further optimize the output of the BiGRU layer, the MHSA mechanism is introduced to emphasize the impact of important information. Finally, accurate runoff forecasting is achieved by transforming the output of the MHSA layer through a fully connected layer. To verify the effectiveness of the proposed model, monthly runoff data from two hydrological stations in China are selected, and eight models are constructed to compare the performance of the proposed model. The results show that compared with the Informer model, the first-step MAE of the SMGformer model decreases by 42.2% and 36.6%, respectively; RMSE decreases by 37.9% and 43.6%, respectively; and NSE increases from 0.936 to 0.975 and from 0.487 to 0.837, respectively. In addition, the KGE values of the SMGformer model at the third step are 0.960 and 0.805, both remaining above 0.8. Therefore, the model can accurately capture key information in the monthly runoff sequence and extend the effective forecast period of the model.
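
The STL preprocessing step is straightforward to reproduce; the sketch below decomposes a synthetic monthly runoff series into the trend, seasonal (period), and residual terms that form the model's multi-feature input set.

```python
# Hedged sketch: STL decomposition of a monthly runoff series (synthetic).
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(0)
months = pd.date_range("2000-01", periods=240, freq="MS")
runoff = (50 + 0.05 * np.arange(240)                       # long-term trend
          + 20 * np.sin(2 * np.pi * np.arange(240) / 12)   # annual cycle
          + rng.normal(0, 5, 240))                         # noise
series = pd.Series(runoff, index=months)

result = STL(series, period=12).fit()
features = pd.DataFrame({"trend": result.trend,        # trend term
                         "seasonal": result.seasonal,  # period term
                         "residual": result.resid})    # residual term
print(features.head())
```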

PMID:39384833 | DOI:10.1038/s41598-024-74329-0

Categories: Literature Watch

Towards transforming malaria vector surveillance using VectorBrain: a novel convolutional neural network for mosquito species, sex, and abdomen status identifications

Wed, 2024-10-09 06:00

Sci Rep. 2024 Oct 10;14(1):23647. doi: 10.1038/s41598-024-71856-8.

ABSTRACT

Malaria is a major public health concern, causing significant morbidity and mortality globally. Monitoring the local population density and diversity of the vectors transmitting malaria is critical to implementing targeted control strategies. However, the current manual identification of mosquitoes is a time-consuming and intensive task, posing challenges in low-resource areas like sub-Saharan Africa; in addition, existing automated identification methods lack scalability, mobile deployability, and field-test validity. To address these bottlenecks, a mosquito image database with fresh wild-caught specimens using basic smartphones is introduced, and we present a novel CNN-based architecture, VectorBrain, designed for identifying the species, sex, and abdomen status of a mosquito concurrently while being efficient and lightweight in computation and size. Overall, our proposed approach achieves 94.44±2% accuracy with a macro-averaged F1 score of 94.10±2% for the species classification, 97.66±1% accuracy with a macro-averaged F1 score of 96.17±1% for the sex classification, and 82.20±3.1% accuracy with a macro-averaged F1 score of 81.17±3% for the abdominal status classification. VectorBrain running on local mobile devices, paired with a low-cost handheld imaging tool, is promising in transforming the mosquito vector surveillance programs by reducing the burden of expertise required and facilitating timely response based on accurate monitoring.
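
A multi-output CNN of the kind VectorBrain describes (a shared backbone with separate heads for species, sex, and abdomen status) can be sketched as follows; the backbone, class counts, and names are assumptions, not the published architecture.

```python
# Hedged sketch: shared-backbone CNN with three classification heads.
import torch
import torch.nn as nn

class MosquitoNet(nn.Module):
    def __init__(self, n_species: int = 6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.species_head = nn.Linear(32, n_species)
        self.sex_head = nn.Linear(32, 2)
        self.abdomen_head = nn.Linear(32, 3)  # e.g. fed / unfed / gravid

    def forward(self, x):
        z = self.backbone(x)  # one shared representation, three tasks
        return self.species_head(z), self.sex_head(z), self.abdomen_head(z)

model = MosquitoNet()
species, sex, abdomen = model(torch.randn(2, 3, 224, 224))
print(species.shape, sex.shape, abdomen.shape)
```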

PMID:39384771 | DOI:10.1038/s41598-024-71856-8

Categories: Literature Watch

AutoGater: a weakly supervised neural network model to gate cells in flow cytometric analyses

Wed, 2024-10-09 06:00

Sci Rep. 2024 Oct 9;14(1):23581. doi: 10.1038/s41598-024-66936-8.

ABSTRACT

Flow cytometry is a useful and efficient method for the rapid characterization of a cell population based on the optical and fluorescence properties of individual cells. Ideally, the cell population would consist of only healthy viable cells, as dead cells can confound the analysis. Thus, separating out healthy cells from dying and dead cells, and any potential debris, is an important first step in the analysis of flow cytometry data. While gating of debris can be conducted using measured optical properties, identifying dead and dying cells often requires utilizing fluorescent stains (e.g. Sytox, a nucleic acid stain that stains cells with compromised cell membranes) to identify cells that should be excluded from downstream analyses. These stains prolong the experimental preparation process and use a flow cytometer's fluorescence channels that could otherwise be used to measure additional fluorescent markers within the cells (e.g. reporter proteins). Here we outline a stain-free method for identifying viable cells for downstream processing by gating out cells that are dying or dead. AutoGater is a weakly supervised deep learning model that can separate healthy populations from unhealthy and dead populations using only light-scatter channels. In addition, AutoGater harmonizes different measurements of dead cells such as Sytox and CFUs.

PMID:39384769 | DOI:10.1038/s41598-024-66936-8

Categories: Literature Watch

MSDAFL: Molecular substructure-based dual attention feature learning framework for predicting drug-drug interactions

Wed, 2024-10-09 06:00

Bioinformatics. 2024 Oct 9:btae596. doi: 10.1093/bioinformatics/btae596. Online ahead of print.

ABSTRACT

MOTIVATION: Drug-drug interactions (DDIs) can cause unexpected adverse drug reactions, affecting treatment efficacy and patient safety. The need for computational methods to predict DDIs has been growing due to the necessity of identifying potential risks associated with drug combinations in advance. Although several deep learning methods have been recently proposed to predict DDIs, many overlook feature learning based on interactions between the substructures of drug pairs.

RESULTS: In this work, we introduce a molecular Substructure-based Dual Attention Feature Learning framework (MSDAFL), designed to fully utilize the information between substructures of drug pairs to enhance the performance of DDI prediction. We employ a self-attention module to obtain a set number of self-attention vectors, which are associated with various substructural patterns of the drug molecule itself, while also extracting interaction vectors representing inter-substructure interactions between drugs through an interactive attention module. Subsequently, an interaction module based on cosine similarity is used to further capture the interactive characteristics between the self-attention vectors of drug pairs. We also perform normalization after the interaction feature extraction to mitigate overfitting. Under three-fold cross-validation, the MSDAFL model achieved average precision (AP) scores of 0.9707, 0.9991, and 0.9987, and area under the receiver operating characteristic curve (AUROC) scores of 0.9874, 0.9934, and 0.9974 on three datasets, respectively. In addition, experimental results from five-fold cross-validation and a cross-dataset study also indicate that MSDAFL performs well in predicting drug-drug interactions.
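
The cosine-similarity interaction step can be sketched directly: given each drug's set of self-attention vectors, normalize them and take pairwise dot products to obtain a substructure-level similarity map. The shapes and pooling below are assumptions for illustration.

```python
# Hedged sketch: cosine-similarity interaction between two drugs'
# self-attention vector sets.
import torch
import torch.nn.functional as F

k, d = 8, 64                    # k self-attention vectors of dim d per drug
drug_a = torch.randn(k, d)
drug_b = torch.randn(k, d)

a = F.normalize(drug_a, dim=-1)
b = F.normalize(drug_b, dim=-1)
similarity = a @ b.T            # (k, k) cosine similarities between vectors

interaction_score = similarity.mean()   # simple pooled interaction feature
print(similarity.shape, float(interaction_score))
```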

AVAILABILITY AND IMPLEMENTATION: Data and source codes are available at https://github.com/27167199/MSDAFL.

CONTACT: Yancheng01@hnucm.edu.cn.

PMID:39383521 | DOI:10.1093/bioinformatics/btae596

Categories: Literature Watch

AmpClass: an Antimicrobial Peptide Predictor Based on Supervised Machine Learning

Wed, 2024-10-09 06:00

An Acad Bras Cienc. 2024 Oct 4;96(4):e20230756. doi: 10.1590/0001-3765202420230756. eCollection 2024.

ABSTRACT

In recent decades, antibiotic resistance has been considered a severe problem worldwide. Antimicrobial peptides (AMPs) are molecules that have shown potential for the development of new drugs against antibiotic-resistant bacteria. Nowadays, medicinal drug researchers use supervised learning methods to screen new peptides with antimicrobial potency to save time and resources. In this work, we consolidate a database of 15,945 AMPs and 12,535 non-AMPs as the basis for training a pool of supervised learning models to recognize peptides with antimicrobial activity. Results show that the proposed tool (AmpClass) outperforms classical state-of-the-art prediction models and achieves similar results compared with deep learning models.

PMID:39383429 | DOI:10.1590/0001-3765202420230756

Categories: Literature Watch

Deep learning to capture leaf shape in plant images: Validation by geometric morphometrics

Wed, 2024-10-09 06:00

Plant J. 2024 Oct 9. doi: 10.1111/tpj.17053. Online ahead of print.

ABSTRACT

Plant leaves play a pivotal role in automated species identification using deep learning (DL). However, achieving reproducible capture of leaf variation remains challenging due to the inherent "black box" problem of DL models. To evaluate the effectiveness of DL in capturing leaf shape, we used geometric morphometrics (GM), an emerging component of eXplainable Artificial Intelligence (XAI) toolkits. We photographed Ranunculus auricomus leaves directly in situ and after herbarization. From these corresponding leaf images, we automatically extracted DL features using a neural network and digitized leaf shapes using GM. The association between the extracted DL features and GM shapes was then evaluated using dimension reduction and covariation models. DL features facilitated the clustering of leaf images by source populations in both in situ and herbarized leaf image datasets, and certain DL features were significantly associated with biological leaf shape variation as inferred by GM. DL features also enabled leaf classification into morpho-phylogenomic groups within the intricate R. auricomus species complex. We demonstrated that simple in situ leaf imaging and DL reproducibly captured leaf shape variation at the population level, while combining this approach with GM provided key insights into the shape information extracted from images by computer vision, a necessary prerequisite for reliable automated plant phenotyping.

PMID:39383323 | DOI:10.1111/tpj.17053

Categories: Literature Watch

Evaluating Explainable Artificial Intelligence (XAI) techniques in chest radiology imaging through a human-centered Lens

Wed, 2024-10-09 06:00

PLoS One. 2024 Oct 9;19(10):e0308758. doi: 10.1371/journal.pone.0308758. eCollection 2024.

ABSTRACT

The field of radiology imaging has experienced a remarkable increase in the use of deep learning (DL) algorithms to support diagnostic and treatment decisions. This rise has led to the development of Explainable AI (XAI) systems to improve the transparency and trust of complex DL methods. However, XAI systems face challenges in gaining acceptance within the healthcare sector, mainly due to technical hurdles in utilizing these systems in practice and the lack of human-centered evaluation/validation. In this study, we focus on visual XAI systems applied to DL-enabled diagnostic systems in chest radiography. In particular, we conduct a user study to evaluate two prominent visual XAI techniques from the human perspective. To this end, we created two clinical scenarios for diagnosing pneumonia and COVID-19 using DL techniques applied to chest X-ray and CT scans. The achieved accuracy rates were 90% for pneumonia and 98% for COVID-19. Subsequently, we employed two well-known XAI methods, Grad-CAM (Gradient-weighted Class Activation Mapping) and LIME (Local Interpretable Model-agnostic Explanations), to generate visual explanations elucidating the AI decision-making process. The visual explanations were then evaluated by medical professionals in a user study in terms of clinical relevance, coherency, and user trust. In general, participants expressed a positive perception of the use of XAI systems in chest radiography. However, there was a noticeable lack of awareness regarding their value and practical aspects. Regarding preferences, Grad-CAM showed superior performance over LIME in terms of coherency and trust, although concerns were raised about its clinical usability. Our findings highlight key user-driven explainability requirements, emphasizing the importance of multi-modal explainability and the necessity of increasing awareness of XAI systems among medical practitioners. Inclusive design was also identified as a crucial need to ensure better alignment of these systems with user needs.
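
For reference, a minimal Grad-CAM sketch of the kind evaluated in this study is shown below. The study's own chest X-ray/CT models are not public, so a torchvision resnet18 and its last convolutional block serve as stand-ins.

```python
# Hedged Grad-CAM sketch: class-score gradients weight the last conv
# block's feature maps to form a heat map.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
store = {}
model.layer4.register_forward_hook(lambda m, i, o: store.update(a=o))

x = torch.randn(1, 3, 224, 224)        # stand-in chest image
logits = model(x)
score = logits[0, logits.argmax()]     # predicted-class score

# Gradient of the class score w.r.t. the last conv block's activations.
grads = torch.autograd.grad(score, store["a"])[0]
weights = grads.mean(dim=(2, 3), keepdim=True)     # global-average-pool grads
cam = F.relu((weights * store["a"]).sum(dim=1))    # weighted sum of maps
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
print(cam.shape)   # (224, 224) heat map of regions driving the prediction
```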

PMID:39383147 | DOI:10.1371/journal.pone.0308758

Categories: Literature Watch

MRI-Seed-Wizard: Combining Deep Learning Algorithms with Magnetic Resonance Imaging Enables Advanced Seed Phenotyping

Wed, 2024-10-09 06:00

J Exp Bot. 2024 Oct 9:erae408. doi: 10.1093/jxb/erae408. Online ahead of print.

ABSTRACT

Evaluation of relevant seed traits is an essential part of most plant breeding and biotechnology programs. There is a need for non-destructive, three-dimensional assessment of the morphometry, composition, and internal features of seeds. Here, we introduce a novel tool, MRI-Seed-Wizard, which integrates deep learning algorithms with non-invasive magnetic resonance imaging (MRI) for use in the new domain of plant MRI. The tool enabled in vivo quantification of 23 grain traits, including volumetric parameters of inner seed structure. Several of these features cannot be assessed using conventional techniques, including X-ray computed tomography. MRI-Seed-Wizard was designed to automate the manual processes of identifying, labeling, and analyzing digital MRI data. We further provide advanced MRI protocols that allow the evaluation of multiple seeds simultaneously to increase throughput. The versatility of MRI-Seed-Wizard in seed phenotyping was demonstrated for wheat (Triticum aestivum) and barley (Hordeum vulgare) grains, and it is applicable to a wide range of crop seeds. Thus, artificial intelligence, combined with MRI, a highly versatile imaging modality, opens up new perspectives in seed phenotyping and crop improvement.

PMID:39383098 | DOI:10.1093/jxb/erae408

Categories: Literature Watch

Automated Quantification of HER2 Amplification Levels Using Deep Learning

Wed, 2024-10-09 06:00

IEEE J Biomed Health Inform. 2024 Oct 9;PP. doi: 10.1109/JBHI.2024.3476554. Online ahead of print.

ABSTRACT

HER2 assessment is necessary for patient selection in anti-HER2 targeted treatment. However, manual assessment of HER2 amplification is time-consuming, labor-intensive, highly subjective and error-prone. Challenges in HER2 analysis in fluorescence in situ hybridization (FISH) and dual in situ hybridization (DISH) images include unclear and blurry cell boundaries, large variations in cell shapes and signals, overlapping and clustered cells, and sparse-label issues with manual annotations only on cells with high confidence, producing subjective assessment scores according to the individual choices on cell selection. To address the above-mentioned issues, we have developed a soft-sampling cascade deep learning model and a signal detection model that quantify CEN17 and HER2 signals in cells to assist in assessing HER2 amplification status for selecting breast cancer patients for HER2-targeted therapy. In evaluation with two different kinds of clinical datasets, including a FISH dataset and a DISH dataset, the proposed method achieves high accuracy, recall and F1-score for both datasets in instance segmentation of HER2-related cells, which must contain both CEN17 and HER2 signals. Moreover, the proposed method is demonstrated to significantly outperform seven recently published state-of-the-art deep learning methods, including contour proposal network (CPN), soft label-based FCN (SL-FCN), modified fully convolutional network (M-FCN), bilayer convolutional network (BCNet), SOLOv2, Cascade R-CNN and DeepLabv3+ with three different backbones (p ≤ 0.01). Clinically, anti-HER2 therapy can also be applied to gastric cancer patients. We applied the developed model to assist in HER2 DISH amplification assessment for gastric cancer patients, and it also showed promising predictive results (accuracy 97.67±1.46%, precision 96.15±5.82%).

PMID:39383086 | DOI:10.1109/JBHI.2024.3476554

Categories: Literature Watch

MiRS-HF: A Novel Deep Learning Predictor for Cancer Classification and miRNA Expression Patterns

Wed, 2024-10-09 06:00

IEEE J Biomed Health Inform. 2024 Oct 9;PP. doi: 10.1109/JBHI.2024.3476672. Online ahead of print.

ABSTRACT

Cancer classification and biomarker identification are crucial for guiding personalized treatment. To make effective use of miRNA associations and expression data, we have developed a deep learning model for cancer classification and biomarker identification. We propose an approach for cancer classification called MiRNA Selection and Hybrid Fusion (MiRS-HF), which consists of early fusion and intermediate fusion. The early fusion involves applying a Layer Attention Graph Convolutional Network (LAGCN) to a miRNA-disease heterogeneous network, resulting in a miRNA-disease association degree score matrix. The intermediate fusion employs a Graph Convolutional Network (GCN) in the classification tasks, weighting the expression data based on the miRNA-disease association degree score. Furthermore, MiRS-HF can identify important miRNA biomarkers and their expression patterns. The proposed method demonstrates superior performance in the classification tasks of six cancers compared to other methods. Simultaneously, we incorporated the feature weighting strategy into the comparison algorithms, leading to a significant improvement in their results and highlighting the importance of this strategy.

PMID:39383085 | DOI:10.1109/JBHI.2024.3476672

Categories: Literature Watch
