Deep learning
Diagnosis of Alzheimer's disease using transfer learning with multi-modal 3D Inception-v4
Quant Imaging Med Surg. 2025 Feb 1;15(2):1455-1467. doi: 10.21037/qims-24-1577. Epub 2025 Jan 20.
ABSTRACT
BACKGROUND: Deep learning (DL) technologies are playing increasingly important roles in computer-aided diagnosis in medicine. In this study, we sought to address issues related to the diagnosis of Alzheimer's disease (AD) based on multi-modal features, and introduced a multi-modal three-dimensional (3D) Inception-v4 model that employs transfer learning for AD diagnosis based on magnetic resonance imaging (MRI) and clinical score data.
METHODS: The multi-modal 3D Inception-v4 model was first pre-trained using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Subsequently, independent validation data were used to fine-tune the model initialized with the pre-trained weight parameters. The model was quantitatively evaluated using the mean values obtained from five-fold cross-validation. Further, control experiments were conducted to verify the performance of the model in the diagnosis of patients with AD and in the study of disease progression.
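The pre-train-then-fine-tune workflow described above follows a standard transfer-learning pattern. Below is a minimal PyTorch sketch of that pattern for a two-branch multi-modal network; the tiny stand-in model, checkpoint path, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: load ADNI-pretrained weights, then fine-tune on a new cohort.
import os
import torch
import torch.nn as nn

class MultiModal3DNet(nn.Module):
    """Toy stand-in for the multi-modal 3D Inception-v4 (image + clinical scores)."""
    def __init__(self, num_classes=3, num_scores=4):
        super().__init__()
        self.image_branch = nn.Sequential(          # stand-in for the 3D CNN branch
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.score_branch = nn.Sequential(nn.Linear(num_scores, 8), nn.ReLU())
        self.head = nn.Linear(16, num_classes)      # late fusion of both modalities

    def forward(self, mri, scores):
        fused = torch.cat([self.image_branch(mri), self.score_branch(scores)], dim=1)
        return self.head(fused)

model = MultiModal3DNet()
if os.path.exists("adni_pretrained.pt"):            # hypothetical checkpoint path
    model.load_state_dict(torch.load("adni_pretrained.pt"))

# Fine-tune on the independent data with a small learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
mri = torch.randn(2, 1, 32, 32, 32)                 # toy batch of MRI volumes
scores = torch.randn(2, 4)                          # toy clinical scores
labels = torch.tensor([0, 2])
loss = loss_fn(model(mri, scores), labels)
loss.backward()
optimizer.step()
```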
RESULTS: In the AD diagnosis task, when a single image marker was used, the average accuracy (ACC) and area under the curve (AUC) were 62.21% and 71.87%, respectively. When transfer learning was not employed, the average ACC and AUC were 75.74% and 83.13%, respectively. Conversely, the combined approach proposed in this study achieved an average ACC of 87.84%, and an average AUC of 90.80% [with an average precision (PRE) of 87.21%, an average recall (REC) of 82.52%, and an average F1 of 83.58%].
CONCLUSIONS: In comparison with existing methods, the performance of the proposed method was superior in terms of diagnostic accuracy. Specifically, the method showed an enhanced ability to accurately distinguish among various stages of AD. Our findings show that multi-modal feature fusion and transfer learning can be valuable resources in the treatment of patients with AD, and in the study of disease progression.
PMID:39995734 | PMC:PMC11847174 | DOI:10.21037/qims-24-1577
CTDNN-Spoof: compact tiny deep learning architecture for detection and multi-label classification of GPS spoofing attacks in small UAVs
Sci Rep. 2025 Feb 24;15(1):6656. doi: 10.1038/s41598-025-90809-3.
ABSTRACT
GPS spoofing presents a significant threat to small Unmanned Aerial Vehicles (UAVs) by manipulating navigation systems, potentially causing safety risks, privacy violations, and mission disruptions. Effective countermeasures include secure GPS signal authentication, anti-spoofing technologies, and continuous monitoring to detect and respond to such threats. Safeguarding small UAVs from GPS spoofing is crucial for their reliable operation in applications such as surveillance, agriculture, and environmental monitoring. In this paper, we propose a compact, tiny deep learning architecture named CTDNN-Spoof for the detection and multi-label classification of GPS spoofing attacks in small UAVs. The architecture is a sequential neural network with 64 neurons in the input layer (ReLU activation), 32 neurons in the hidden layer (ReLU activation), and 4 neurons in the output layer (linear activation), optimized with the Adam optimizer. We use Mean Squared Error (MSE) loss for regression and accuracy for evaluation. The model is trained for 50 epochs with early stopping (patience of 10 epochs) to improve training efficiency and restore the best weights, and its performance is assessed on a separate validation set. Additionally, we compare CTDNN-Spoof with two other models in terms of complexity, loss, and accuracy. The proposed CTDNN-Spoof demonstrates varying accuracies across different labels, achieving the highest performance and promising time complexity among the compared models. These results highlight the model's effectiveness in mitigating GPS spoofing threats in UAVs. This approach provides a scalable, real-time solution to enhance UAV security, surpassing traditional methods in precision and adaptability.
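Because the abstract specifies the layer sizes, activations, loss, optimizer, and early-stopping setup exactly, the architecture can be reconstructed directly in Keras. In the sketch below, only the input feature dimension and the toy data are assumptions.

```python
# Keras reconstruction of the CTDNN-Spoof description: 64-ReLU / 32-ReLU /
# 4-linear, Adam, MSE loss, accuracy metric, early stopping (patience 10).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(12,)),                # GPS signal features (dim assumed)
    layers.Dense(64, activation="relu"),      # input layer, 64 neurons
    layers.Dense(32, activation="relu"),      # hidden layer, 32 neurons
    layers.Dense(4, activation="linear"),     # 4 output labels
])
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)

# Toy data standing in for the UAV GPS spoofing dataset.
X = np.random.rand(256, 12).astype("float32")
y = np.random.randint(0, 2, size=(256, 4)).astype("float32")  # multi-label targets
model.fit(X, y, epochs=50, validation_split=0.2,
          callbacks=[early_stop], verbose=0)
```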
PMID:39994281 | DOI:10.1038/s41598-025-90809-3
An enhanced denoising system for mammogram images using deep transformer model with fusion of local and global features
Sci Rep. 2025 Feb 24;15(1):6562. doi: 10.1038/s41598-025-89451-w.
ABSTRACT
Image denoising is a critical problem in low-level computer vision, where the aim is to reconstruct a clean, noise-free image from a noisy input, such as a mammogram image. In recent years, deep learning, particularly convolutional neural networks (CNNs), has shown great success in various image processing tasks, including denoising, image compression, and enhancement. While CNN-based approaches dominate, Transformer models have recently gained popularity for computer vision tasks. However, there have been fewer applications of Transformer-based models to low-level vision problems like image denoising. In this study, a novel denoising network architecture called DeepTFormer is proposed, which leverages Transformer models for the task. The DeepTFormer architecture consists of three main components: a preprocessing module, a local-global feature extraction module, and a reconstruction module. The local-global feature extraction module is the core of DeepTFormer, comprising several groups of ITransformer layers. Each group includes a series of Transformer layers, convolutional layers, and residual connections. These groups are tightly coupled with residual connections, which allow the model to capture both local and global information from the noisy images effectively. The design of these groups ensures that the model can utilize both local features for fine details and global features for larger context, leading to more accurate denoising. To validate the performance of the DeepTFormer model, extensive experiments were conducted using both synthetic and real noise data. Objective and subjective evaluations demonstrated that DeepTFormer outperforms leading denoising methods. The model achieved impressive results, surpassing state-of-the-art techniques in terms of key metrics like PSNR, FSIM, EPI, and SSIM, with values of 0.41, 0.93, 0.96, and 0.94, respectively. These results demonstrate that DeepTFormer is a highly effective solution for image denoising, combining the power of Transformer architecture with convolutional layers to enhance both local and global feature extraction.
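As a rough illustration of the described local-global coupling, the sketch below pairs a Transformer encoder layer (global context) with a small convolutional branch (local detail) inside a residual group; the actual ITransformer layer composition, dimensions, and layer counts are guesses, not the published design.

```python
# One "group" in the spirit of DeepTFormer: attention over flattened tokens
# for global features, convolutions for local features, residual coupling.
import torch
import torch.nn as nn

class ITransformerGroup(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True)  # global context
        self.conv = nn.Sequential(                            # local detail
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):                          # x: (B, C, H, W) feature map
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        g = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)
        return x + self.conv(g)                    # residual connection

x = torch.randn(1, 64, 32, 32)
print(ITransformerGroup()(x).shape)                # torch.Size([1, 64, 32, 32])
```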
PMID:39994276 | DOI:10.1038/s41598-025-89451-w
Deep structured learning with vision intelligence for oral carcinoma lesion segmentation and classification using medical imaging
Sci Rep. 2025 Feb 24;15(1):6610. doi: 10.1038/s41598-025-89971-5.
ABSTRACT
Oral carcinoma (OC) is among the most common malignant cancers globally and has become an increasingly significant public health concern in emerging and low-to-middle-income countries. Late diagnosis, high incidence, and inadequate treatment strategies remain substantial challenges. Diagnosis at an early stage is essential for effective treatment, prognosis, and survival. Despite recent growth in the understanding of molecular mechanisms, late diagnosis and limited progress toward precision medicine for OC patients remain challenges. Machine learning (ML) models have been employed to improve early detection in medicine, aiming to reduce cancer-specific mortality and disease progression. Recent advancements in this approach have significantly enhanced the extraction and diagnosis of critical information from medical images. This paper presents a Deep Structured Learning with Vision Intelligence for Oral Carcinoma Lesion Segmentation and Classification (DSLVI-OCLSC) model for medical imaging. Using medical imaging, the DSLVI-OCLSC model aims to enhance the classification and recognition of OC. To accomplish this, the model applies Wiener filtering (WF) as a pre-processing technique to eliminate noise. In addition, the ShuffleNetV2 method is used to extract higher-level deep features from an input image. A convolutional bidirectional long short-term memory network with a multi-head attention mechanism (MA-CNN-BiLSTM) is utilized for oral carcinoma recognition and identification. Moreover, UNet3+ is employed to segment abnormal regions from the classified images. Finally, the sine cosine algorithm (SCA) is utilized to tune the hyperparameters of the DL model. A wide range of simulations was implemented to assess the performance of the DSLVI-OCLSC method on an OC image dataset. The experimental analysis showed that the DSLVI-OCLSC method achieved a superior accuracy of 98.47% compared with recent approaches.
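The first stage of the pipeline, Wiener-filter denoising, can be illustrated with SciPy's off-the-shelf implementation; the window size here is an assumption.

```python
# Wiener-filter pre-processing step on a toy grayscale image.
import numpy as np
from scipy.signal import wiener

noisy = np.random.rand(128, 128) + 0.1 * np.random.randn(128, 128)
denoised = wiener(noisy, mysize=5)   # 5x5 neighborhood, assumed window size
print(denoised.shape)                # (128, 128)
```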
PMID:39994267 | DOI:10.1038/s41598-025-89971-5
Progress on intelligent metasurfaces for signal relay, transmitter, and processor
Light Sci Appl. 2025 Feb 25;14(1):93. doi: 10.1038/s41377-024-01729-2.
ABSTRACT
Pursuing higher data rate with limited spectral resources is a longstanding topic that has triggered the fast growth of modern wireless communication techniques. However, the massive deployment of active nodes to compensate for propagation loss necessitates high hardware expenditure, energy consumption, and maintenance cost, as well as complicated network interference issues. Intelligent metasurfaces, composed of a number of subwavelength passive or active meta-atoms, have recently found to be a new paradigm to actively reshape wireless communication environment in a green way, distinct from conventional works that passively adapt to the surrounding. In this review, we offer a unified perspective on how intelligent metasurfaces can facilitate wireless communication in three manners: signal relay, signal transmitter, and signal processor. We start by the basic modeling of wireless channel and the evolution of metasurfaces from passive, active to intelligent metasurfaces. Integrated with various deep learning algorithms, intelligent metasurfaces adapt to cater for the ever-changing environments without human intervention. Then, we overview specific experimental advancements using intelligent metasurfaces. We conclude by identifying key issues in the practical implementations of intelligent metasurfaces, and surveying new directions, such as gain metasurfaces and knowledge migration.
PMID:39994200 | DOI:10.1038/s41377-024-01729-2
A PET/CT-based 3D deep learning model for predicting spread through air spaces in stage I lung adenocarcinoma
Clin Transl Oncol. 2025 Feb 24. doi: 10.1007/s12094-025-03870-9. Online ahead of print.
ABSTRACT
PURPOSE: This study evaluates a three-dimensional (3D) deep learning (DL) model based on fluorine-18 fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) for predicting the preoperative status of spread through air spaces (STAS) in patients with clinical stage I lung adenocarcinoma (LUAD).
METHODS: A retrospective analysis of 162 patients with stage I LUAD was conducted, splitting data into training and test sets (4:1). Six 3D DL models were developed, and the top-performing PET and CT models (ResNet50) were fused for optimal prediction. The model's clinical utility was assessed through a two-stage reader study.
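A hedged sketch of the fusion idea: two independent 3D encoders, one per modality, whose pooled features are concatenated for a single STAS logit. The tiny encoders below stand in for the paper's 3D ResNet50 branches.

```python
# Two-branch PET/CT late fusion for binary STAS prediction (stand-in encoders).
import torch
import torch.nn as nn

def encoder():
    return nn.Sequential(
        nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1), nn.Flatten())     # -> 16-dim feature vector

class PETCTFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.pet, self.ct = encoder(), encoder()
        self.head = nn.Linear(32, 1)               # STAS-positive logit

    def forward(self, pet, ct):
        return self.head(torch.cat([self.pet(pet), self.ct(ct)], dim=1))

pet = torch.randn(2, 1, 32, 32, 32)                # toy PET volumes
ct = torch.randn(2, 1, 32, 32, 32)                 # toy CT volumes
print(torch.sigmoid(PETCTFusion()(pet, ct)).shape)  # torch.Size([2, 1])
```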
RESULTS: The fused PET/CT model achieved an area under the curve (AUC) of 0.956 (95% CI 0.9230-0.9881) in the training set and 0.889 (95% CI 0.7624-1.0000) in the test set. Compared to three physicians, the model demonstrated superior sensitivity and specificity. With artificial intelligence (AI) assistance, the physicians' diagnostic accuracy improved in their subsequent reading session.
CONCLUSION: Our DL model demonstrates potential as a resource to aid physicians in predicting STAS status and preoperative treatment planning for stage I LUAD, though prospective validation is required.
PMID:39994163 | DOI:10.1007/s12094-025-03870-9
scFTAT: a novel cell annotation method integrating FFT and transformer
BMC Bioinformatics. 2025 Feb 25;26(1):62. doi: 10.1186/s12859-025-06061-z.
ABSTRACT
BACKGROUND: Advancements in high-throughput sequencing and deep learning have boosted single-cell RNA studies. However, current methods for annotating single-cell data face challenges due to high data sparsity and tedious manual annotation on large-scale data.
RESULTS: Thus, we propose a novel annotation model integrating FFT (Fast Fourier Transform) and an enhanced Transformer, named scFTAT. It first reduces data sparsity using LDA (Linear Discriminant Analysis). Subsequently, automatic cell annotation is achieved through a proposed module integrating FFT and an enhanced Transformer. Moreover, the model is fine-tuned to improve training performance by incorporating techniques such as kernel approximation, position encoding enhancement, and attention enhancement modules. Compared to existing popular annotation tools, scFTAT maintains high accuracy and robustness on six typical datasets. Specifically, the model achieves an accuracy of 0.93 on the human kidney data, with an F1 score of 0.84, precision of 0.96, recall of 0.80, and Matthews correlation coefficient of 0.89. The best-performing compared method achieves an accuracy of 0.92, with an F1 score of 0.71, precision of 0.75, recall of 0.73, and Matthews correlation coefficient of 0.85. The compiled codes and supplements are available at: https://github.com/gladex/scFTAT
CONCLUSION: In summary, the proposed scFTAT effectively integrates FFT and an enhanced Transformer for automatic feature learning, addressing the challenges of high sparsity and tedious manual annotation in single-cell profiling data. Experiments on six typical scRNA-seq datasets from human and mouse tissues evaluate the model using five metrics: accuracy, F1 score, precision, recall, and Matthews correlation coefficient. Performance comparisons with existing methods further demonstrate the efficiency and robustness of our proposed method.
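One plausible reading of the FFT-plus-Transformer module is an FNet-style mixing step feeding a standard Transformer encoder layer, sketched below in PyTorch; the dimensions and the exact placement of the FFT are assumptions rather than the published architecture.

```python
# FFT token mixing (keep the real part of a 2D FFT over tokens and features)
# followed by a Transformer layer, with a residual connection.
import torch
import torch.nn as nn

class FFTTransformerBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)

    def forward(self, x):                        # x: (cells, features, dim)
        mixed = torch.fft.fft2(x).real           # global mixing via FFT
        return self.attn(self.norm(x + mixed))   # residual + Transformer

x = torch.randn(8, 50, 64)   # 8 cells, 50 LDA-reduced features, 64-dim embedding
print(FFTTransformerBlock()(x).shape)            # torch.Size([8, 50, 64])
```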
PMID:39994539 | DOI:10.1186/s12859-025-06061-z
Tensor-powered insights into neural dynamics
Commun Biol. 2025 Feb 24;8(1):298. doi: 10.1038/s42003-025-07711-x.
ABSTRACT
The complex spatiotemporal dynamics of neurons encompass a wealth of information relevant to perception and decision-making, making the decoding of neural activity a central focus in neuroscience research. Traditional machine learning and deep learning-based approaches to modeling neural information have achieved significant results in decoding. Nevertheless, such methodologies require the vectorization of data, a process that disrupts the intrinsic relationships inherent in high-dimensional spaces, consequently impeding their capability to effectively process information in high-order tensor domains. In this paper, we introduce a novel decoding approach, the Least Squares Support Tensor Machine (LS-STM), which is based on tensor space and represents a tensorized improvement over traditional vector learning frameworks. In extensive evaluations using human and mouse data, our results demonstrate that LS-STM exhibits superior performance in neural signal decoding tasks compared to traditional vectorization-based methods. Furthermore, LS-STM performs better when decoding neural signals with limited samples, and the tensor weights of the LS-STM decoder enable the retrospective identification of key neurons during neural encoding. This study introduces a novel tensor computing approach and perspective for decoding high-dimensional neural information.
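The core idea of a support tensor machine is to keep the weight in tensor (here, rank-1 matrix) form and fit its factors by alternating least squares rather than flattening the input. The NumPy sketch below illustrates that pattern on synthetic data; it is a simplified two-mode analogue, not the authors' LS-STM algorithm.

```python
# Rank-1 tensor classifier: W = u v^T, fitted by alternating ridge regression.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 8, 6
u_true, v_true = rng.normal(size=p), rng.normal(size=q)
X = rng.normal(size=(n, p, q))                            # matrix-shaped samples
y = np.sign(np.einsum("ipq,p,q->i", X, u_true, v_true))   # +/-1 labels

u, v, lam = rng.normal(size=p), rng.normal(size=q), 1e-2
for _ in range(20):                        # alternating least squares
    Zu = np.einsum("ipq,q->ip", X, v)      # features with v fixed
    u = np.linalg.solve(Zu.T @ Zu + lam * np.eye(p), Zu.T @ y)
    Zv = np.einsum("ipq,p->iq", X, u)      # features with u fixed
    v = np.linalg.solve(Zv.T @ Zv + lam * np.eye(q), Zv.T @ y)

pred = np.sign(np.einsum("ipq,p,q->i", X, u, v))
print("training accuracy:", (pred == y).mean())
```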
PMID:39994447 | DOI:10.1038/s42003-025-07711-x
Algorithm for pixel-level concrete pavement crack segmentation based on an improved U-Net model
Sci Rep. 2025 Feb 24;15(1):6553. doi: 10.1038/s41598-025-91352-x.
ABSTRACT
Cracks in concrete surfaces are numerous and diverse, and different cracks affect road safety to different degrees. Accurately identifying pavement cracks is crucial for assessing road conditions and formulating maintenance strategies. This study improves the original U-shaped convolutional network (U-Net) through two innovations that modify its structure, reduce the number of parameters, enhance its ability to distinguish between background and cracks, and improve its speed and accuracy in crack detection tasks. Additionally, datasets with different exposure levels and noise conditions are used to train the network, broadening its predictive ability. A custom dataset of 960 road crack images was added to the public dataset to train and evaluate the model. The test results demonstrate that the proposed U-Net-FML model achieves high accuracy and detection speed in complex environments, with MIoU, F1 score, precision, and recall values of 76.4%, 74.2%, 84.2%, and 66.4%, respectively, significantly surpassing those of the other models. Among the seven compared models, U-Net-FML has the strongest overall performance, highlighting its engineering value for precise detection and efficient analysis of cracks.
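For reference, a minimal U-Net encoder-decoder with one skip connection is sketched below in PyTorch; it shows the baseline pattern the paper modifies, not the U-Net-FML changes themselves.

```python
# Tiny U-Net: encoder, bottleneck, decoder, one skip connection, 1-channel mask.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = block(3, 16), block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)                  # 32 = 16 (skip) + 16 (upsampled)
        self.out = nn.Conv2d(16, 1, 1)            # 1-channel crack mask

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.up(e2)
        d = self.dec(torch.cat([d, e1], dim=1))   # skip connection
        return self.out(d)

print(TinyUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 1, 64, 64])
```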
PMID:39994438 | DOI:10.1038/s41598-025-91352-x
Real-world feasibility, accuracy and acceptability of automated retinal photography and AI-based cardiovascular disease risk assessment in Australian primary care settings: a pragmatic trial
NPJ Digit Med. 2025 Feb 24;8(1):122. doi: 10.1038/s41746-025-01436-1.
ABSTRACT
We aim to assess the real-world accuracy (primary outcome), feasibility and acceptability (secondary outcomes) of an automated retinal photography and artificial intelligence (AI)-based cardiovascular disease (CVD) risk assessment system (rpCVD) in Australian primary care settings. Participants aged 45-70 years who had recently undergone all or part of a CVD risk assessment were recruited from two general practice clinics in Victoria, Australia. After consenting, participants underwent retinal imaging using an automated fundus camera, and an rpCVD risk score was generated by a deep learning algorithm. This score was compared against the World Health Organisation (WHO) CVD risk score, which incorporates age, sex, and other clinical risk factors. The predictive accuracy of the rpCVD and WHO CVD risk scores for 10-year incident CVD events was evaluated using data from the UK Biobank, with the accuracy of each system assessed through the area under the receiver operating characteristic curve (AUC). Participant satisfaction was assessed through a survey, and the imaging success rate was determined by the percentage of individuals with images of sufficient quality to produce an rpCVD risk score. Of the 361 participants, 339 received an rpCVD risk score, resulting in a 93.9% imaging success rate. The rpCVD risk scores showed a moderate correlation with the WHO CVD risk scores (Pearson correlation coefficient [PCC] = 0.526, 95% CI: 0.444-0.599). Despite this, the rpCVD system, which relies solely on retinal images, demonstrated a similar level of accuracy in predicting 10-year incident CVD (AUC = 0.672, 95% CI: 0.658-0.686) compared to the WHO CVD risk score (AUC = 0.693, 95% CI: 0.680-0.707). High satisfaction rates were reported, with 92.5% of participants and 87.5% of general practitioners (GPs) expressing satisfaction with the system. The automated rpCVD system, using only retinal photographs, demonstrated predictive accuracy comparable to the WHO CVD risk score, which incorporates multiple clinical factors including age, the most heavily weighted factor for CVD prediction. This underscores the potential of the rpCVD approach as a faster, easier, and non-invasive alternative for CVD risk assessment in primary care settings, avoiding the need for more complex clinical procedures.
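The headline comparisons (score-to-score correlation and AUC against incident events) follow a standard pattern, sketched below with SciPy/scikit-learn on synthetic stand-in data.

```python
# Compare two continuous risk scores: Pearson correlation between them, and
# AUC of each against binary 10-year CVD events. All data here are synthetic.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
risk = rng.random(500)                          # latent true risk
rp_score = risk + 0.30 * rng.normal(size=500)   # retinal-photo (rpCVD-like) score
who_score = risk + 0.25 * rng.normal(size=500)  # clinical (WHO-like) score
events = rng.random(500) < risk * 0.3           # simulated incident CVD events

print("PCC between scores:", pearsonr(rp_score, who_score)[0])
print("rpCVD-like AUC:", roc_auc_score(events, rp_score))
print("WHO-like AUC:", roc_auc_score(events, who_score))
```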
PMID:39994433 | DOI:10.1038/s41746-025-01436-1
On-patient medical record and mRNA therapeutics using intradermal microneedles
Nat Mater. 2025 Feb 24. doi: 10.1038/s41563-024-02115-4. Online ahead of print.
ABSTRACT
Medical interventions often require timed series of doses, thus necessitating accurate medical record-keeping. In many global settings, these records are unreliable or unavailable at the point of care, leading to less effective treatments or disease prevention. Here we present an invisible-to-the-naked-eye on-patient medical record-keeping technology that accurately stores medical information in the patient skin as part of microneedles that are used for intradermal therapeutics. We optimize the microneedle design for both a reliable delivery of messenger RNA (mRNA) therapeutics and the near-infrared fluorescent microparticles that encode the on-patient medical record-keeping. Deep learning-based image processing enables encoding and decoding of the information with excellent temporal and spatial robustness. Long-term studies in a swine model demonstrate the safety, efficacy and reliability of this approach for the co-delivery of on-patient medical record-keeping and the mRNA vaccine encoding severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This technology could help healthcare workers make informed decisions in circumstances where reliable record-keeping is unavailable, thus contributing to global healthcare equity.
PMID:39994390 | DOI:10.1038/s41563-024-02115-4
Explainable hybrid transformer for multi-classification of lung disease using chest X-rays
Sci Rep. 2025 Feb 24;15(1):6650. doi: 10.1038/s41598-025-90607-x.
ABSTRACT
Lung disease is an infection that causes chronic inflammation of human lung cells and is one of the major causes of death around the world. Thoracic X-ray imaging is a well-known, inexpensive screening approach for lung disease detection. Deep learning networks, which identify disease features in X-ray medical images to diagnose a variety of lung diseases, are playing an increasingly important role in assisting clinical diagnosis. This paper proposes an explainable transformer with a hybrid network structure (LungMaxViT) that combines a CNN initial-stage block with squeeze-and-excitation (SE) blocks to improve feature recognition for the classification of multiple lung diseases from chest X-ray images. We contrast four classical pre-trained models (ResNet50, MobileNetV2, ViT, and MaxViT) through transfer learning on two public datasets. LungMaxViT, based on MaxViT pre-trained on the ImageNet-1K dataset, is a hybrid transformer fine-tuned on both X-ray datasets. LungMaxViT outperforms all four of the aforementioned models, achieving a classification accuracy of 96.8%, an AUC score of 98.3%, and an F1 score of 96.7% on the COVID-19 dataset, and an AUC score of 93.2% and an F1 score of 70.7% on the Chest X-ray 14 dataset. LungMaxViT is distinguished by its superior accuracy, AUC, and F1 score compared with other hybrid networks. Several enhancement techniques, such as CLAHE, flipping, and denoising, are employed to improve classification performance. The Grad-CAM visualization technique is leveraged to produce heat maps of disease detection, supporting consistency between clinical doctors and neural network models in the assessment of lung disease from chest X-rays. LungMaxViT shows robust results and generalization in detecting multiple lung lesions and COVID-19 on chest X-ray images.
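The SE block the hybrid adds to its CNN stage is a standard squeeze-and-excitation module; a reference PyTorch version follows, with the reduction ratio of 16 assumed from the original SE paper.

```python
# Squeeze-and-excitation: global-pool to per-channel weights, then recalibrate.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                 # squeeze: global average pool
        w = self.fc(w)[:, :, None, None]       # excite: per-channel weights
        return x * w                           # recalibrate the feature map

x = torch.randn(2, 64, 28, 28)
print(SEBlock(64)(x).shape)                    # torch.Size([2, 64, 28, 28])
```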
PMID:39994381 | DOI:10.1038/s41598-025-90607-x
The intelligent fault identification method based on multi-source information fusion and deep learning
Sci Rep. 2025 Feb 24;15(1):6643. doi: 10.1038/s41598-025-90823-5.
ABSTRACT
Faults represent significant geological structures. Conventional fault identification methods primarily rely on the linear features of faults, identified through the interpretation of remote sensing imagery (RSI). To more accurately enhance the morphological features of faults and achieve their rapid, precise, and intelligent identification, this paper employs a multi-source information fusion method. By analyzing and processing RSI, digital elevation model, and geological map data, the spectral, topographic, geomorphic, and structural features of faults are extracted. Through training samples and fusion algorithms, these features are integrated to enhance the morphological feature information of faults. Ultimately, intelligent fault identification is realized through deep learning-based image recognition. First, 16 influencing factors are selected from the perspectives of spectral, topographic, geomorphic, and structural features. Second, the importance of each influencing factor is estimated using four machine learning methods. Finally, fault identification is carried out on the fault identification map, fused with multi-source feature information, using a convolutional neural network model. The method is applied to the southern part of Jinzhai County, Lu'an City. The results indicate that among the machine learning methods, the classification and regression trees (CART) model achieved an accuracy of 0.993, a true positive rate of 0.988, and an F1-score of 0.994. Topographic position index (TPI), valley line (VL), surface cutting depth (SCD), and RSI all show high importance across the four machine learning models, indicating their crucial role in fault identification. For the convolutional neural network-based method, the validation accuracy (Val_Accuracy) was 0.990, the F1-score was 0.736, and the validation loss (Val_Loss) was 0.025, suggesting that this method can accurately identify faults in the study area.
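The factor-importance step can be illustrated with scikit-learn's CART implementation: fit a tree on the 16 factors and read off feature importances. The data below are synthetic stand-ins for the raster-derived factor values.

```python
# Rank 16 influencing factors by CART feature importance (synthetic data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 16))                          # 16 factors per sample
y = (X[:, 0] + 0.5 * X[:, 3] > 0.8).astype(int)    # toy fault / no-fault labels

cart = DecisionTreeClassifier(random_state=0).fit(X, y)
ranking = np.argsort(cart.feature_importances_)[::-1]
print("top factors:", ranking[:4])
print("importances:", cart.feature_importances_[ranking[:4]])
```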
PMID:39994344 | DOI:10.1038/s41598-025-90823-5
Optimizing depression detection in clinical doctor-patient interviews using a multi-instance learning framework
Sci Rep. 2025 Feb 24;15(1):6637. doi: 10.1038/s41598-025-90117-w.
ABSTRACT
In recent years, the number of people suffering from depression has gradually increased, and early detection is of great significance for public well-being. However, current methods for detecting depression are relatively limited, typically relying on the self-rating depression scale (SDS) and interviews, both of which are influenced by subjective or environmental factors. To improve the objectivity and efficiency of diagnosis, deep learning techniques have been applied to the field of automatic depression detection (ADD), providing a more accurate and objective approach. Transcribed interview data is one of the most commonly used modalities in ADD. However, previous studies have utilized only response texts or selected question-answer pairs, resulting in information redundancy and loss. This paper is the first to apply the multiple instance learning (MIL) framework to textual interview data, aiming to overcome inadequate text representation and ineffective information extraction in long texts. In the MIL framework, each instance undergoes an independent feature extraction process, ensuring that the local features of each instance are fully captured. This not only enhances the overall text representation capability but also alleviates sample imbalance in the dataset. Additionally, this paper improves upon previous aggregation strategies by introducing two hyperparameters to accommodate the uncertainties of text sentiment. An ensemble model of MT5 and RoBERTa (referred to as multi-MTRB) was constructed to extract features from each instance and output confidence scores indicating the presence of depressive information. Owing to the design of the MIL framework, the proposed method is highly interpretable and is able to identify the specific sentences that distinguish depressed patients, while LIME techniques are introduced to provide a more in-depth interpretation of negative instance sentences. This provides a promising approach for depression detection from textual interview data. We evaluated the proposed method on the DAIC-WOZ and E-DAIC datasets with excellent results: the F1 score is 0.88 on the DAIC-WOZ dataset and 0.86 on the E-DAIC dataset.
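The paper does not spell out its aggregation rule here, so the sketch below shows one plausible two-hyperparameter scheme: a bag (interview) is flagged positive when at least a minimum fraction of instance (sentence) confidence scores exceed a threshold.

```python
# MIL bag-level aggregation over per-sentence confidence scores.
import numpy as np

def aggregate(instance_scores, tau=0.7, min_frac=0.15):
    """Bag is positive if >= min_frac of instances score above tau."""
    positives = np.asarray(instance_scores) >= tau
    return positives.mean() >= min_frac

scores = [0.1, 0.85, 0.2, 0.9, 0.05, 0.3]   # toy per-sentence confidences
print(aggregate(scores))                     # True: 2 of 6 sentences exceed tau
```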
PMID:39994325 | DOI:10.1038/s41598-025-90117-w
An integrated CSPPC and BiLSTM framework for malicious URL detection
Sci Rep. 2025 Feb 24;15(1):6659. doi: 10.1038/s41598-025-91148-z.
ABSTRACT
With the rapid development of the internet, phishing attacks have become more diverse, making phishing website detection a key focus in cybersecurity. While machine learning and deep learning have led to various phishing URL detection methods, many remain incomplete, limiting accuracy. This paper proposes CSPPC-BiLSTM, a malicious URL detection model based on Bidirectional Long Short-Term Memory (BiLSTM). The model processes URL character sequences through an embedding layer and captures contextual information via BiLSTM. By integrating the Convolutional Block Attention Module (CBAM), it applies channel and spatial attention to highlight key features and transforms URL sequence features into a spatial matrix. The Spatial Pyramid Pooling (SPP) module enables multi-scale pooling. Finally, a fully connected layer fuses the features, and dropout regularization enhances robustness. Compared to CharBiLSTM, CSPPC-BiLSTM significantly improves detection accuracy. Evaluated on two datasets, Grambedding (balanced) and Mendeley AK Singh 2020 phish (imbalanced), and compared with six baselines, it demonstrates strong generalization and accuracy. Ablation experiments confirm the critical roles of CBAM and SPP in boosting performance.
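A hedged PyTorch sketch of the described pipeline, embedding to BiLSTM to attention to pyramid pooling to classifier, follows; the attention and pooling modules are simplified stand-ins for the paper's CBAM and SPP.

```python
# Character embedding -> BiLSTM -> attention weighting -> pyramid pooling -> FC.
import torch
import torch.nn as nn
import torch.nn.functional as F

class URLClassifier(nn.Module):
    def __init__(self, vocab=128, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)       # simplified attention (CBAM stand-in)
        self.fc = nn.Linear(2 * hidden * 7, 2)    # 1+2+4 pyramid bins -> benign/malicious

    def forward(self, ids):                       # ids: (B, L) character codes
        h, _ = self.bilstm(self.embed(ids))       # (B, L, 2*hidden)
        h = h * torch.sigmoid(self.att(h))        # weight informative positions
        feats = h.transpose(1, 2)                 # (B, C, L)
        pooled = [F.adaptive_max_pool1d(feats, k).flatten(1) for k in (1, 2, 4)]
        return self.fc(torch.cat(pooled, dim=1))  # multi-scale feature fusion

ids = torch.randint(0, 128, (4, 80))              # 4 URLs, 80 characters each
print(URLClassifier()(ids).shape)                 # torch.Size([4, 2])
```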
PMID:39994324 | DOI:10.1038/s41598-025-91148-z
Super-resolution mapping of anisotropic tissue structure with diffusion MRI and deep learning
Sci Rep. 2025 Feb 24;15(1):6580. doi: 10.1038/s41598-025-90972-7.
ABSTRACT
Diffusion magnetic resonance imaging (diffusion MRI) is widely employed to probe the diffusive motion of water molecules within tissue. Numerous diseases and processes affecting the central nervous system can be detected and monitored via diffusion MRI thanks to its sensitivity to microstructural alterations in tissue. This sensitivity has prompted interest in quantitative mapping of microstructural parameters, such as the fiber orientation distribution function (fODF), which is instrumental for noninvasively mapping the underlying axonal fiber tracts in white matter through a procedure known as tractography. However, such applications demand repeated acquisitions of MRI volumes with varied experimental parameters, leading to long acquisition times and/or limited spatial resolution. In this work, we present a deep-learning-based approach for increasing the spatial resolution of diffusion MRI data in the form of fODFs obtained through constrained spherical deconvolution. The proposed approach is evaluated on high-quality data from the Human Connectome Project and is shown to generate upsampled results with greater correspondence to ground-truth high-resolution data than can be achieved with ordinary spline interpolation methods. Furthermore, we employ a measure based on the earth mover's distance to assess the accuracy of the upsampled fODFs. At low signal-to-noise ratios, our super-resolution method provides more accurate estimates of the fODF than data collected with an eightfold smaller voxel volume.
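As a simplified analogue of the earth-mover's-distance evaluation, the sketch below compares two discretized orientation profiles with SciPy's 1D Wasserstein distance; real fODFs live on the sphere, so the paper's measure is necessarily more involved.

```python
# 1D earth mover's (Wasserstein) distance between two normalized profiles.
import numpy as np
from scipy.stats import wasserstein_distance

angles = np.linspace(0, np.pi, 64)                   # discretized orientations
fodf_ref = np.exp(-((angles - 1.20) ** 2) / 0.02)    # ground-truth-like peak
fodf_up = np.exp(-((angles - 1.25) ** 2) / 0.03)     # upsampled estimate
fodf_ref /= fodf_ref.sum()
fodf_up /= fodf_up.sum()

emd = wasserstein_distance(angles, angles, fodf_ref, fodf_up)
print(f"EMD between profiles: {emd:.4f}")
```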
PMID:39994322 | DOI:10.1038/s41598-025-90972-7
Explainability and uncertainty: Two sides of the same coin for enhancing the interpretability of deep learning models in healthcare
Int J Med Inform. 2025 Feb 21;197:105846. doi: 10.1016/j.ijmedinf.2025.105846. Online ahead of print.
ABSTRACT
BACKGROUND: The increasing use of Deep Learning (DL) in healthcare has highlighted the critical need for improved transparency and interpretability. While Explainable Artificial Intelligence (XAI) methods provide insights into model predictions, reliability cannot be guaranteed by simply relying on explanations.
OBJECTIVES: This position paper proposes the integration of Uncertainty Quantification (UQ) with XAI methods to improve model reliability and trustworthiness in healthcare applications.
METHODS: We examine state-of-the-art XAI and UQ techniques, discuss implementation challenges, and suggest solutions to combine UQ with XAI methods. We propose a framework for estimating both aleatoric and epistemic uncertainty in the XAI context, providing illustrative examples of their potential application.
RESULTS: Our analysis indicates that integrating UQ with XAI could significantly enhance the reliability of DL models in practice. This approach has the potential to reduce interpretation biases and over-reliance, leading to more cautious and conscious use of AI in healthcare.
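As one concrete illustration of the epistemic side, Monte Carlo dropout estimates uncertainty by keeping dropout active at inference and reading the spread of repeated predictions; the sketch below is illustrative and is not the framework proposed in the paper.

```python
# Monte Carlo dropout: stochastic forward passes yield a predictive spread.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                      nn.Dropout(0.3), nn.Linear(32, 1))
model.train()                        # keep dropout stochastic at inference

x = torch.randn(1, 10)               # one patient's feature vector (toy)
with torch.no_grad():
    preds = torch.stack([model(x) for _ in range(100)])

print(f"mean={preds.mean().item():.3f}, "
      f"epistemic std={preds.std().item():.3f}")
```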
PMID:39993336 | DOI:10.1016/j.ijmedinf.2025.105846
Deep learning to quantify the pace of brain aging in relation to neurocognitive changes
Proc Natl Acad Sci U S A. 2025 Mar 11;122(10):e2413442122. doi: 10.1073/pnas.2413442122. Epub 2025 Feb 24.
ABSTRACT
Brain age (BA), distinct from chronological age (CA), can be estimated from MRIs to evaluate neuroanatomic aging in cognitively normal (CN) individuals. BA, however, is a cross-sectional measure that summarizes cumulative neuroanatomic aging since birth. Thus, it poorly conveys recent or contemporaneous aging trends, which can be better quantified by the (temporal) pace P of brain aging. Many approaches to mapping P, however, rely on quantifying DNA methylation in whole-blood cells, which the blood-brain barrier separates from neural brain cells. We introduce a three-dimensional convolutional neural network (3D-CNN) to estimate P noninvasively from longitudinal MRI. Our longitudinal model (LM) is trained on MRIs from 2,055 CN adults, validated in 1,304 CN adults, and further applied to an independent cohort of 104 CN adults and 140 patients with Alzheimer's disease (AD). In its test set, the LM computes P with a mean absolute error (MAE) of 0.16 y (7% mean error). This significantly outperforms the most accurate cross-sectional model, whose MAE of 1.85 y corresponds to 83% error. By synergizing the LM with an interpretable CNN saliency approach, we map anatomic variations in regional brain aging rates that differ according to sex, decade of life, and neurocognitive status. LM estimates of P are significantly associated with changes in cognitive functioning across domains. This underscores the LM's ability to estimate P in a way that captures the relationship between neuroanatomic and neurocognitive aging. This research complements existing strategies for AD risk assessment that estimate individuals' rates of adverse cognitive change with age.
PMID:39993207 | DOI:10.1073/pnas.2413442122
Optimizing Bi-LSTM networks for improved lung cancer detection accuracy
PLoS One. 2025 Feb 24;20(2):e0316136. doi: 10.1371/journal.pone.0316136. eCollection 2025.
ABSTRACT
Lung cancer remains a leading cause of cancer-related deaths worldwide, with low survival rates often attributed to late-stage diagnosis. To address this critical health challenge, researchers have developed computer-aided diagnosis (CAD) systems that rely on feature extraction from medical images. However, accurately identifying the most informative image features for lung cancer detection remains a significant challenge. This study aimed to compare the effectiveness of both hand-crafted and deep learning-based approaches for lung cancer diagnosis. We employed traditional hand-crafted features, such as Gray Level Co-occurrence Matrix (GLCM) features, in conjunction with traditional machine learning algorithms. To explore the potential of deep learning, we also optimized and implemented a Bidirectional Long Short-Term Memory (Bi-LSTM) network for lung cancer detection. The results revealed that the highest performance using hand-crafted features was achieved by extracting GLCM features and utilizing Support Vector Machine (SVM) with different kernels, reaching an accuracy of 99.78% and an AUC of 0.999. However, the deep learning Bi-LSTM network surpassed the hand-crafted approaches, achieving an accuracy of 99.89% and an AUC of 1.0000. These findings suggest that the proposed methodology, combining hand-crafted features and deep learning, holds significant promise for enhancing early lung cancer detection and ultimately improving diagnosis systems.
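The hand-crafted branch can be illustrated with scikit-image's GLCM utilities feeding an SVM; the distances, angles, and property set below are assumptions, not the paper's exact configuration.

```python
# GLCM texture features from grayscale patches, classified with an RBF SVM.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(patch):
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(40, 32, 32), dtype=np.uint8)  # toy patches
X = np.array([glcm_features(p) for p in patches])
y = rng.integers(0, 2, size=40)                   # toy benign/malignant labels
clf = SVC(kernel="rbf").fit(X, y)
print("training accuracy:", clf.score(X, y))
```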
PMID:39992919 | DOI:10.1371/journal.pone.0316136
Evaluation of stroke sequelae and rehabilitation effect on brain tumor by neuroimaging technique: A comparative study
PLoS One. 2025 Feb 24;20(2):e0317193. doi: 10.1371/journal.pone.0317193. eCollection 2025.
ABSTRACT
This study addresses the limitations of traditional methods in the evaluation of stroke sequelae and the monitoring of rehabilitation effects, especially the accurate identification and tracking of brain injury areas. To overcome these challenges, we introduce an advanced deep learning-based neuroimaging approach, the SWI-BITR-UNet model. This novel machine learning (ML) model combines the Swin Transformer's local receptive field and shift mechanism with the effective feature-fusion strategy of the U-Net architecture, aiming to improve the accuracy of brain lesion segmentation in multimodal MRI scans. Through a 3D CNN encoder and decoder, together with an integrated CBAM attention module and skip connections, the model finely captures and refines features, achieving a segmentation accuracy comparable to manual segmentation by experts. The 3D CNN encoder-decoder architecture is specifically designed to enhance the processing of 3D medical imaging data, and training uses the Adam optimization algorithm. The Bra2020 dataset is used to assess the accuracy of the proposed network. By employing skip connections, the model effectively integrates high-resolution features from the encoder with up-sampled features from the decoder, increasing its sensitivity to 3D spatial characteristics. The SWI-BITR-UNet model is trained on reliable datasets and evaluated through a comprehensive array of statistical metrics, including recall (Rec), precision (Pre), F1 score, kappa coefficient (KC), mean intersection over union (mIoU), and receiver operating characteristic-area under curve (ROC-AUC). Furthermore, various machine learning models, such as Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Adaptive Boosting (AdaBoost), and K-Nearest Neighbor (KNN), were employed to analyze tumor progression in the brain, with performance characterized by the Hausdorff distance. Among these models, the SWI-BITR-UNet was the most accurate. Regarding Dice coefficient values, the segmentation maps (annotation maps of brain tumor distributions) generated by the models demonstrated their capability to autonomously delineate regions such as the tumor core (TC) and the enhancing tumor (ET). Moreover, the proposed models outperformed existing research in the field. The computational efficiency and the ability to handle long-distance dependencies make the model particularly suitable for clinical settings. The results showed that the SWI-BITR-UNet model can not only effectively identify and monitor subtle changes in the stroke injury area, but also provides a new and efficient tool for the rehabilitation process, offering a scientific basis for developing personalized rehabilitation plans.
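For reference, the Dice coefficient reported for the tumor sub-regions is the standard overlap measure on binary masks:

```python
# Dice coefficient between a predicted and a reference binary mask.
import numpy as np

def dice(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((64, 64), dtype=np.uint8); a[10:30, 10:30] = 1   # predicted mask
b = np.zeros((64, 64), dtype=np.uint8); b[12:32, 12:32] = 1   # reference mask
print(f"Dice = {dice(a, b):.3f}")
```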
PMID:39992898 | DOI:10.1371/journal.pone.0317193