Deep learning
DenseSeg: joint learning for semantic segmentation and landmark detection using dense image-to-shape representation
Int J Comput Assist Radiol Surg. 2025 Jan 23. doi: 10.1007/s11548-024-03315-8. Online ahead of print.
ABSTRACT
PURPOSE: Semantic segmentation and landmark detection are fundamental tasks of medical image processing, facilitating further analysis of anatomical objects. Although deep learning-based pixel-wise classification has set a new state of the art for segmentation, it falls short in landmark detection, a strength of shape-based approaches.
METHODS: In this work, we propose a dense image-to-shape representation that enables the joint learning of landmarks and semantic segmentation by employing a fully convolutional architecture. Our method intuitively allows the extraction of arbitrary landmarks due to its representation of anatomical correspondences. We benchmark our method against the state-of-the-art for semantic segmentation (nnUNet), a shape-based approach employing geometric deep learning and a convolutional neural network-based method for landmark detection.
RESULTS: We evaluate our method on two medical datasets: one common benchmark featuring the lungs, heart, and clavicle from thorax X-rays, and another with 17 different bones in the paediatric wrist. While our method is on par with the landmark detection baseline in the thorax setting (error in mm: 2.6 ± 0.9 vs. 2.7 ± 0.9), it substantially surpasses it in the more complex wrist setting (1.1 ± 0.6 vs. 1.9 ± 0.5).
CONCLUSION: We demonstrate that dense geometric shape representation is beneficial for challenging landmark detection tasks and outperforms the previous state of the art using heatmap regression. Moreover, it does not require explicit training on the landmarks themselves, allowing new landmarks to be added without retraining.
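As a rough illustration of how a dense image-to-shape representation yields arbitrary landmarks without landmark-specific training, the sketch below (Python/NumPy, with hypothetical array names and shapes; not the authors' code) looks up, for each canonical landmark, the foreground pixel whose predicted shape coordinate is closest:

import numpy as np

# Hypothetical inputs: the network regresses, for every foreground pixel,
# its coordinates in a canonical (atlas) shape space. Names/shapes are assumptions.
H, W, K = 256, 256, 17
uv_map = np.random.rand(H, W, 2)            # per-pixel canonical coordinates
fg_mask = np.ones((H, W), dtype=bool)       # predicted segmentation foreground
landmarks_canonical = np.random.rand(K, 2)  # landmark positions in shape space

def extract_landmarks(uv_map, fg_mask, landmarks_canonical):
    """For each canonical landmark, pick the foreground pixel whose
    predicted shape coordinate is closest -- no landmark-specific training."""
    ys, xs = np.nonzero(fg_mask)
    coords = uv_map[ys, xs]                               # (N, 2)
    d = np.linalg.norm(coords[None, :, :] - landmarks_canonical[:, None, :], axis=-1)  # (K, N)
    idx = d.argmin(axis=1)
    return np.stack([xs[idx], ys[idx]], axis=1)           # (K, 2) image coordinates

pixels = extract_landmarks(uv_map, fg_mask, landmarks_canonical)
print(pixels.shape)  # (17, 2)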
PMID:39849288 | DOI:10.1007/s11548-024-03315-8
Performance of Radiomics-based machine learning and deep learning-based methods in the prediction of tumor grade in meningioma: a systematic review and meta-analysis
Neurosurg Rev. 2025 Jan 24;48(1):78. doi: 10.1007/s10143-025-03236-3.
ABSTRACT
Currently, the World Health Organization (WHO) grade of meningiomas is determined based on biopsy results. Therefore, accurate non-invasive preoperative grading could significantly improve treatment planning and patient outcomes. Considering recent advances in machine learning (ML) and deep learning (DL), this meta-analysis aimed to evaluate the performance of these models in predicting the WHO meningioma grade using imaging data. A systematic search was performed in PubMed/MEDLINE, Embase, and the Cochrane Library for studies published up to April 1, 2024, and reporting the performance metrics of ML models in predicting the WHO meningioma grade from imaging studies. Pooled area under the receiver operating characteristic curve (AUROC), specificity, and sensitivity were estimated. Subgroup and meta-regression analyses were performed based on a number of potential influencing variables. A total of 32 studies with 15,365 patients were included. The overall pooled sensitivity, specificity, and AUROC of ML methods for prediction of tumor grade in meningioma were 85% (95% CI, 79-89%), 87% (95% CI, 81-91%), and 93% (95% CI, 90-95%), respectively. Both the type of validation and the study cohort (training or test) were significantly associated with model performance. However, no significant association was found between the sample size or the type of ML method and model performance. ML predictive models show a high overall performance in predicting the WHO meningioma grade from imaging data. Further studies on the performance of DL algorithms in larger datasets using external validation are needed.
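For readers unfamiliar with how such pooled estimates are formed, the following is a generic DerSimonian-Laird random-effects pooling of per-study sensitivities on the logit scale; it is a minimal sketch with made-up numbers, not the meta-analysis code used in this review:

import numpy as np

def pool_logit_proportions(p, n):
    """DerSimonian-Laird random-effects pooling of proportions on the logit
    scale. p: per-study sensitivity; n: per-study sample size (toy values)."""
    p = np.clip(np.asarray(p, float), 1e-6, 1 - 1e-6)
    n = np.asarray(n, float)
    y = np.log(p / (1 - p))          # logit-transformed effect per study
    v = 1.0 / (n * p * (1 - p))      # approximate within-study variance
    w = 1.0 / v
    y_fixed = (w * y).sum() / w.sum()
    q = (w * (y - y_fixed) ** 2).sum()                     # heterogeneity statistic
    tau2 = max(0.0, (q - (len(y) - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
    w_star = 1.0 / (v + tau2)                              # random-effects weights
    y_pooled = (w_star * y).sum() / w_star.sum()
    se = np.sqrt(1.0 / w_star.sum())
    expit = lambda t: 1 / (1 + np.exp(-t))
    return expit(y_pooled), (expit(y_pooled - 1.96 * se), expit(y_pooled + 1.96 * se))

sens, ci = pool_logit_proportions(p=[0.82, 0.88, 0.79, 0.91], n=[120, 300, 85, 210])
print(f"pooled sensitivity {sens:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")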
PMID:39849257 | DOI:10.1007/s10143-025-03236-3
Electrophysiological biomarkers based on CISANET characterize illness severity and suicidal ideation among patients with major depressive disorder
Med Biol Eng Comput. 2025 Jan 24. doi: 10.1007/s11517-024-03279-6. Online ahead of print.
ABSTRACT
Major depressive disorder (MDD) is a significant neurological disorder that imposes a substantial burden on society, characterized by its high recurrence rate and associated suicide risk. Clinical diagnosis, which relies on psychiatric interviews and questionnaires used as auxiliary diagnostic tools, lacks precision and objectivity. To address these challenges, this study proposes an EEG-based assessment method. It involves calculating the phase lag index (PLI) in the alpha and gamma bands to construct functional brain connectivity, with the aim of finding biomarkers to assess the severity of MDD and suicidal ideation. The convolutional inception with shuffled attention network (CISANET) was introduced for this purpose. The study included 61 patients with MDD, who were classified into mild, moderate, and severe levels based on depression scales, and the presence of suicidal ideation was evaluated. Two paradigms were designed for the study, with EEG analysis focusing on 32 selected electrodes to extract the alpha and gamma bands. In the gamma band, classification accuracy reached 77.37% in the visual paradigm and 80.12% in the auditory paradigm. The average accuracy in classifying suicidal ideation was 93.60%. The findings suggest that gamma-band activity can serve as a potential biomarker for differentiating illness severity and identifying suicidal ideation in MDD, and that the proposed objective assessment method can effectively assess the severity of MDD and identify suicidal ideation in MDD patients, providing a valuable theoretical basis for understanding the biological characteristics of MDD.
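The phase lag index named above has a compact definition, PLI = |mean(sign(sin(phase difference)))|; a minimal SciPy sketch with toy signals (not the authors' pipeline):

import numpy as np
from scipy.signal import hilbert

def phase_lag_index(x, y):
    """PLI between two band-filtered EEG channels:
    |mean(sign(sin(phase_x - phase_y)))|, where instantaneous phases
    come from the analytic signal (Hilbert transform)."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.sign(np.sin(dphi))))

t = np.arange(0, 4, 1 / 250)                                   # 4 s at 250 Hz
x = np.sin(2 * np.pi * 40 * t) + 0.5 * np.random.randn(t.size)  # gamma-band toys
y = np.sin(2 * np.pi * 40 * t + 0.8) + 0.5 * np.random.randn(t.size)
print(phase_lag_index(x, y))

# A full connectivity matrix over the 32 selected electrodes would apply this
# pairwise, yielding the adjacency fed to the classification network.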
PMID:39849234 | DOI:10.1007/s11517-024-03279-6
Deep Convolutional Neural Networks on Multiclass Classification of Three-Dimensional Brain Images for Parkinson's Disease Stage Prediction
J Imaging Inform Med. 2025 Jan 23. doi: 10.1007/s10278-025-01402-z. Online ahead of print.
ABSTRACT
Parkinson's disease (PD), a degenerative disorder of the central nervous system, is commonly diagnosed using functional medical imaging techniques such as single-photon emission computed tomography (SPECT). In this study, we utilized two SPECT data sets (n = 634 and n = 202) from different hospitals to develop a model capable of accurately predicting PD stages, a multiclass classification task. We used the entire three-dimensional (3D) brain images as input and experimented with various model architectures. Initially, we treated the 3D images as sequences of two-dimensional (2D) slices and fed them sequentially into 2D convolutional neural network (CNN) models pretrained on ImageNet, averaging the outputs to obtain the final predicted stage. We also applied 3D CNN models pretrained on Kinetics-400. Additionally, we incorporated an attention mechanism to account for the varying importance of different slices in the prediction process. To further enhance model efficacy and robustness, we simultaneously trained the two data sets using weight sharing, a technique known as cotraining. Our results demonstrated that 2D models pretrained on ImageNet outperformed 3D models pretrained on Kinetics-400, and models utilizing the attention mechanism outperformed both 2D and 3D models. The cotraining technique proved effective in improving model performance when the cotraining data sets were sufficiently large.
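A minimal PyTorch sketch of the slice-attention idea described above: each 2D slice is encoded with a 2D CNN and the slice features are pooled with learned attention weights before classification. Backbone choice, feature sizes, and the replication of grayscale SPECT slices to three channels are illustrative assumptions, not the authors' exact recipe:

import torch
import torch.nn as nn
import torchvision.models as models

class SliceAttentionNet(nn.Module):
    """Encode each slice with a 2D CNN, pool slices by learned attention."""
    def __init__(self, n_stages=4):
        super().__init__()
        backbone = models.resnet18(weights=None)  # use ResNet18_Weights.DEFAULT for ImageNet pretraining
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # -> (B*S, 512, 1, 1)
        self.attn = nn.Sequential(nn.Linear(512, 128), nn.Tanh(), nn.Linear(128, 1))
        self.head = nn.Linear(512, n_stages)

    def forward(self, x):                      # x: (B, S, 3, H, W) slice stack
        b, s = x.shape[:2]
        f = self.encoder(x.flatten(0, 1)).flatten(1).view(b, s, -1)  # (B, S, 512)
        w = torch.softmax(self.attn(f), dim=1)                       # per-slice weights
        return self.head((w * f).sum(dim=1))                         # (B, n_stages)

logits = SliceAttentionNet()(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 4])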
PMID:39849204 | DOI:10.1007/s10278-025-01402-z
Wound Segmentation with U-Net Using a Dual Attention Mechanism and Transfer Learning
J Imaging Inform Med. 2025 Jan 23. doi: 10.1007/s10278-025-01386-w. Online ahead of print.
ABSTRACT
Accurate wound segmentation is crucial for the precise diagnosis and treatment of various skin conditions through image analysis. In this paper, we introduce a novel dual attention U-Net model designed for precise wound segmentation. Our proposed architecture integrates two widely used deep learning models, VGG16 and U-Net, incorporating dual attention mechanisms to focus on relevant regions within the wound area. Initially trained on diabetic foot ulcer images, we fine-tuned the model to acute and chronic wound images and conducted a comprehensive comparison with other state-of-the-art models. The results highlight the superior performance of our proposed dual attention model, achieving a Dice coefficient and IoU of 94.1% and 89.3%, respectively, on the test set. This underscores the robustness of our method and its capacity to generalize effectively to new data.
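The paper's exact attention mechanism is not specified in the abstract; the block below is one common "dual attention" pattern (channel attention followed by spatial attention, in the spirit of CBAM) that could sit inside a U-Net encoder-decoder, offered purely as an illustration:

import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel + spatial attention over a feature map (illustrative block)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Channel attention from globally average-pooled features
        ca = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3))))[..., None, None]
        x = x * ca
        # Spatial attention from channel-wise mean and max maps
        sa = torch.sigmoid(self.spatial_conv(
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

out = DualAttention(64)(torch.randn(1, 64, 128, 128))
print(out.shape)  # torch.Size([1, 64, 128, 128])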
PMID:39849203 | DOI:10.1007/s10278-025-01386-w
Deep Learning-Based Multi-View Projection Synthesis Approach for Improving the Quality of Sparse-View CBCT in Image-Guided Radiotherapy
J Imaging Inform Med. 2025 Jan 23. doi: 10.1007/s10278-025-01390-0. Online ahead of print.
ABSTRACT
While radiation hazards induced by cone-beam computed tomography (CBCT) in image-guided radiotherapy (IGRT) can be reduced by sparse-view sampling, the image quality is inevitably degraded. We propose a deep learning-based multi-view projection synthesis (DLMPS) approach to improve the quality of sparse-view low-dose CBCT images. In the proposed DLMPS approach, linear interpolation was first applied to sparse-view projections and the projections were rearranged into sinograms; these sinograms were processed with a sinogram restoration model and then rearranged back into projections. The sinogram restoration model was modified from the 2D U-Net by incorporating dynamic convolutional layers and residual learning techniques. The DLMPS approach was trained, validated, and tested on CBCT data from 163, 30, and 30 real patients respectively. Sparse-view projection datasets with 1/4 and 1/8 of the original sampling rate were simulated, and the corresponding full-view projection datasets were restored via the DLMPS approach. Tomographic images were reconstructed using the Feldkamp-Davis-Kress algorithm. Quantitative metrics including root-mean-square error (RMSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and feature similarity (FSIM) were calculated in both the projection and image domains to evaluate the performance of the DLMPS approach. The DLMPS approach was compared with 11 state-of-the-art (SOTA) models, including CNN and Transformer architectures. For 1/4 sparse-view reconstruction task, the proposed DLMPS approach achieved averaged RMSE, PSNR, SSIM, and FSIM values of 0.0271, 45.93 dB, 0.9817, and 0.9587 in the projection domain, and 0.000885, 37.63 dB, 0.9074, and 0.9885 in the image domain, respectively. For 1/8 sparse-view reconstruction task, the DLMPS approach achieved averaged RMSE, PSNR, SSIM, and FSIM values of 0.0304, 44.85 dB, 0.9785, and 0.9524 in the projection domain, and 0.001057, 36.05 dB, 0.8786, and 0.9774 in the image domain, respectively. The DLMPS approach outperformed all the 11 SOTA models in both the projection and image domains for 1/4 and 1/8 sparse-view reconstruction tasks. The proposed DLMPS approach effectively improves the quality of sparse-view CBCT images in IGRT by accurately synthesizing missing projections, exhibiting potential in substantially reducing imaging dose to patients with minimal loss of image quality.
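The projection-to-sinogram preprocessing described above can be sketched as follows, assuming a (views, rows, cols) projection layout (an assumption, not the paper's stated convention): linear interpolation fills the missing views along the angular axis, and a transpose regroups the data into one sinogram per detector row:

import numpy as np
from scipy.interpolate import interp1d

def views_to_sinograms(sparse_proj, factor):
    """Interpolate missing views along the angular axis, then rearrange
    projections into per-detector-row sinograms.
    sparse_proj: (n_sparse_views, n_rows, n_cols)."""
    n_sparse = sparse_proj.shape[0]
    sparse_angles = np.arange(n_sparse) * factor            # indices of kept views
    full_angles = np.arange(n_sparse * factor)
    full = interp1d(sparse_angles, sparse_proj, axis=0,
                    fill_value="extrapolate")(full_angles)  # (n_full, rows, cols)
    return full.transpose(1, 0, 2)                          # (rows, n_full_views, cols)

sino = views_to_sinograms(np.random.rand(90, 8, 16), factor=4)  # 1/4 sampling toy
print(sino.shape)  # (8, 360, 16)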
PMID:39849201 | DOI:10.1007/s10278-025-01390-0
Mapping the topography of spatial gene expression with interpretable deep learning
Nat Methods. 2025 Jan 23. doi: 10.1038/s41592-024-02503-3. Online ahead of print.
ABSTRACT
Spatially resolved transcriptomics technologies provide high-throughput measurements of gene expression in a tissue slice, but the sparsity of these data complicates analysis of spatial gene expression patterns. We address this issue by deriving a topographic map of a tissue slice (analogous to a map of elevation in a landscape) using a quantity called the isodepth. Contours of constant isodepth enclose domains with distinct cell type composition, while gradients of the isodepth indicate spatial directions of maximum change in expression. We develop GASTON (gradient analysis of spatial transcriptomics organization with neural networks), an unsupervised and interpretable deep learning algorithm that simultaneously learns the isodepth, spatial gradients, and piecewise linear expression functions that model both continuous gradients and discontinuous variation in gene expression. We show that GASTON accurately identifies spatial domains and marker genes across several tissues, gradients of neuronal differentiation and firing in the brain, and gradients of metabolism and immune activity in the tumor microenvironment.
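A heavily simplified sketch of the modeling idea (assumed shapes and layer sizes; GASTON's actual objective and architecture are richer): one MLP maps spot coordinates to a scalar isodepth, and a second ReLU MLP, which is piecewise linear in its scalar input, maps isodepth to expression:

import torch
import torch.nn as nn

xy = torch.randn(500, 2)     # spatial coordinates of 500 spots (toy data)
expr = torch.randn(500, 50)  # toy expression values for 50 genes

isodepth_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
expr_net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, expr.shape[1]))

opt = torch.optim.Adam([*isodepth_net.parameters(), *expr_net.parameters()], lr=1e-3)
for _ in range(200):
    d = isodepth_net(xy)                       # learned isodepth per spot
    loss = ((expr_net(d) - expr) ** 2).mean()  # fit expression as a function of isodepth
    opt.zero_grad(); loss.backward(); opt.step()

# Contours of constant d delineate spatial domains; autograd on d with
# respect to xy gives the spatial direction of maximum expression change.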
PMID:39849132 | DOI:10.1038/s41592-024-02503-3
Swin-transformer for weak feature matching
Sci Rep. 2025 Jan 23;15(1):2961. doi: 10.1038/s41598-025-87309-9.
ABSTRACT
Feature matching in computer vision is crucial but challenging in weakly textured scenes due to the lack of pattern repetition. We introduce the SwinMatcher feature matching method, aimed at addressing the issues of low matching quantity and poor matching precision in weakly textured scenes. Given the inherently significant local characteristics of image features, we employ a local self-attention mechanism to learn from weakly textured areas, maximally preserving the features of weak textures. To address the issue of incorrect matches in scenes with repetitive patterns, we use a cross-attention and positional encoding mechanism to learn the correct matches of repetitive patterns in two scenes, achieving higher matching precision. We also introduce a matching optimization algorithm that calculates the spatial expected coordinates of local two-dimensional heat maps of correspondences to obtain the final sub-pixel level matches. Experiments indicate that, under identical training conditions, the SwinMatcher outperforms other standard methods in pose estimation, homography estimation, and visual localization. It exhibits strong robustness and superior matching in weakly textured areas, offering a new research direction for feature matching in weakly textured images.
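The matching optimization described above, taking spatial expected coordinates of a local 2D heatmap, is commonly implemented as a soft-argmax; a generic PyTorch sketch (not the SwinMatcher source):

import torch

def soft_argmax_2d(heatmap):
    """Spatial expected coordinates of a 2D correspondence heatmap,
    giving a sub-pixel match. heatmap: (H, W) unnormalized scores."""
    h, w = heatmap.shape
    prob = torch.softmax(heatmap.flatten(), dim=0).view(h, w)
    ys = torch.arange(h, dtype=prob.dtype)
    xs = torch.arange(w, dtype=prob.dtype)
    y = (prob.sum(dim=1) * ys).sum()   # expected row
    x = (prob.sum(dim=0) * xs).sum()   # expected column
    return x, y

hm = torch.zeros(5, 5)
hm[2, 3] = 4.0
hm[2, 2] = 3.0
print(soft_argmax_2d(hm))  # lands between columns 2 and 3, on row 2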
PMID:39849068 | DOI:10.1038/s41598-025-87309-9
Real-time detection and monitoring of public littering behavior using deep learning for a sustainable environment
Sci Rep. 2025 Jan 23;15(1):3000. doi: 10.1038/s41598-024-77118-x.
ABSTRACT
With the global population surpassing 8 billion, waste production has skyrocketed, leading to increased pollution that adversely affects both terrestrial and marine ecosystems. Public littering, a significant contributor to this pollution, poses severe threats to marine life due to plastic debris, which can inflict substantial ecological harm. Additionally, this pollution jeopardizes human health through contaminated food and water sources. Given the annual global plastic consumption of approximately 475 million tons and the pervasive issue of public littering, addressing this challenge has become critically urgent. The Surveillance and Waste Notification (SAWN) system presents an innovative solution to combat public littering. Leveraging surveillance cameras and advanced computer vision technology, SAWN aims to identify and reduce instances of littering. Our study explores the use of the MoViNet video classification model to detect littering activities by vehicles and pedestrians, alongside the YOLOv8 object detection model to identify the individuals responsible through facial recognition and license plate detection. Collecting appropriate data for littering detection presented significant challenges because no suitable dataset was available; consequently, project members simulated real-life littering scenarios to gather the required data. This dataset was then used to train different models, including LRCN, CNN-RNN, and MoViNets. After extensive testing, MoViNets demonstrated the most promising results. Through a series of experiments, we progressively improved the model's performance, achieving accuracy rates of 93.42% in the first experiment, 95.53% in the second, and ultimately 99.5% in the third. To detect violators' identities, we employed YOLOv8, trained on the KSA vehicle plate dataset, achieving 99.5% accuracy. For face detection, we utilized the Haar Cascade from the OpenCV library, known for its real-time performance. Our findings will be used to further enhance littering behavior detection in future developments.
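The detection side of such a pipeline can be sketched with the public ultralytics and OpenCV APIs; the weights file and image path below are placeholders, and the paper's plate detector was trained on a KSA vehicle plate dataset rather than the generic checkpoint shown here:

import cv2
from ultralytics import YOLO

plate_model = YOLO("yolov8n.pt")            # placeholder generic checkpoint
results = plate_model("frame.jpg")          # one inference call per frame
for box in results[0].boxes.xyxy:           # (x1, y1, x2, y2) per detection
    print(box.tolist())

# Real-time face detection with the Haar cascade shipped with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(len(faces), "faces found")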
PMID:39848984 | DOI:10.1038/s41598-024-77118-x
A novel domain feature disentanglement method for multi-target cross-domain mechanical fault diagnosis
ISA Trans. 2025 Jan 13:S0019-0578(25)00013-8. doi: 10.1016/j.isatra.2025.01.012. Online ahead of print.
ABSTRACT
Existing cross-domain mechanical fault diagnosis methods primarily achieve feature alignment by directly optimizing interdomain and category distances. However, this approach can be computationally expensive in multi-target scenarios or fail due to conflicting objectives, leading to decreased diagnostic performance. To avoid these issues, this paper introduces a novel method called domain feature disentanglement. The key to the proposed method lies in computing domain features and embedding domain similarity into neural networks to assist in extracting cross-domain invariant features. Specifically, the neural network architecture designed based on information theory can disentangle key features from multiple entangled latent variables. It employs the concept of contrastive learning to extract domain-relevant information from each data point and uses the Wasserstein distance to determine the similarity relationships across all domains. By informing the neural network of domain similarity relationships, it learns how to extract cross-domain invariant features through adversarial learning. Eight multi-target domain adaptation tasks were set up on two public datasets, and the proposed method achieved an average diagnostic accuracy of 96.82%, surpassing six other advanced domain adaptation methods and demonstrating its superiority.
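As an illustration of the domain-similarity computation named above, the sketch below takes pairwise Wasserstein distances between per-domain feature samples, averaged over feature dimensions; the (n, d) input layout and the per-dimension averaging are simplifying assumptions, not the authors' formulation:

import numpy as np
from scipy.stats import wasserstein_distance

def domain_similarity_matrix(domain_features):
    """Pairwise Wasserstein distances between per-domain feature samples,
    averaged over feature dimensions. domain_features: list of (n, d) arrays."""
    k = len(domain_features)
    sim = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d = np.mean([wasserstein_distance(domain_features[i][:, f],
                                              domain_features[j][:, f])
                         for f in range(domain_features[i].shape[1])])
            sim[i, j] = sim[j, i] = d
    return sim

# Three toy domains with increasing distribution shift
feats = [np.random.randn(200, 8) + shift for shift in (0.0, 0.5, 2.0)]
print(domain_similarity_matrix(feats).round(2))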
PMID:39848906 | DOI:10.1016/j.isatra.2025.01.012
Retraction Note: Early diagnosis of COVID-19-affected patients based on X-ray and computed tomography images using deep learning algorithm
Soft comput. 2024;28(Suppl 1):67. doi: 10.1007/s00500-024-09993-5. Epub 2024 Jul 22.
ABSTRACT
[This retracts the article DOI: 10.1007/s00500-020-05275-y.].
PMID:39847670 | PMC:PMC11753128 | DOI:10.1007/s00500-024-09993-5
Retraction Note: Performance evaluation of deep learning techniques for lung cancer prediction
Soft comput. 2024;28(Suppl 1):295. doi: 10.1007/s00500-024-10107-4. Epub 2024 Aug 27.
ABSTRACT
[This retracts the article DOI: 10.1007/s00500-023-08313-7.].
PMID:39847665 | PMC:PMC11753125 | DOI:10.1007/s00500-024-10107-4
Retraction Note: COVID-CheXNet: hybrid deep learning framework for identifying COVID-19 virus in chest X-rays images
Soft comput. 2024;28(Suppl 1):65. doi: 10.1007/s00500-024-09992-6. Epub 2024 Jul 22.
ABSTRACT
[This retracts the article DOI: 10.1007/s00500-020-05424-3.].
PMID:39847664 | PMC:PMC11753127 | DOI:10.1007/s00500-024-09992-6
Flexible Tail of Antimicrobial Peptide PGLa Facilitates Water Pore Formation in Membranes
J Phys Chem B. 2025 Jan 23. doi: 10.1021/acs.jpcb.4c06190. Online ahead of print.
ABSTRACT
PGLa, an antimicrobial peptide (AMP), primarily exerts its antibacterial effects by disrupting bacterial cell membrane integrity. Previous theoretical studies mainly focused on the binding mechanism of PGLa with membranes, while the mechanism of water pore formation induced by PGLa peptides, especially the role of structural flexibility in the process, remains unclear. In this study, using all-atom simulations, we investigated the entire process of membrane deformation caused by the interaction of PGLa with an anionic cell membrane composed of dimyristoylphosphatidylcholine (DMPC) and dimyristoylphosphatidylglycerol (DMPG). Using a deep learning-based key intermediate identification algorithm, we found that the C-terminal tail plays a crucial role in PGLa insertion into the membrane, and that with its assistance, a variety of water pores formed inside the membrane. Mutation of the tail residues revealed that, in addition to electrostatic and hydrophobic interactions, the flexibility of the tail residues is crucial for peptide insertion and pore formation. The full extension of these flexible residues enhances peptide-peptide and peptide-membrane interactions, guiding the transmembrane movement of PGLa and the aggregation of PGLa monomers within the membrane, ultimately leading to the formation of water-filled pores in the membrane. Overall, this study provides a deep understanding of the transmembrane mechanism of PGLa and similar AMPs, particularly elucidating for the first time the importance of C-terminal flexibility in both insertion and oligomerization processes.
PMID:39847609 | DOI:10.1021/acs.jpcb.4c06190
Evaluating Machine Learning and Deep Learning models for predicting Wind Turbine power output from environmental factors
PLoS One. 2025 Jan 23;20(1):e0317619. doi: 10.1371/journal.pone.0317619. eCollection 2025.
ABSTRACT
This study presents a comprehensive comparative analysis of Machine Learning (ML) and Deep Learning (DL) models for predicting Wind Turbine (WT) power output from environmental variables such as temperature, humidity, wind speed, and wind direction. Alongside the DL models, namely Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN), the following ML models were evaluated: Linear Regression (LR), Support Vector Regressor (SVR), Random Forest (RF), Extra Trees (ET), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Using a dataset of 40,000 observations, the models were assessed on R-squared, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). ET achieved the highest performance among the ML models, with an R-squared value of 0.7231 and an RMSE of 0.1512. Among the DL models, ANN performed best, achieving an R-squared value of 0.7248 and an RMSE of 0.1516. The results show that the DL models, especially ANN, performed slightly better than the best ML models, indicating a stronger capacity to model non-linear dependencies in multivariate data. Preprocessing techniques, including feature scaling and parameter tuning, improved model performance by enhancing data consistency and optimizing hyperparameters. Compared to previous benchmarks, the performance of both ANN and ET demonstrates significant gains in predictive accuracy for WT power output forecasting. This study's novelty lies in directly comparing a diverse range of ML and DL algorithms while highlighting the potential of advanced computational approaches for renewable energy optimization.
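A minimal sketch of the ML-side evaluation loop with scikit-learn, using synthetic data in place of the 40,000-observation dataset; hyperparameters and the toy data-generating process are illustrative assumptions:

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((4000, 4))                            # temperature, humidity, speed, direction
y = X[:, 2] ** 3 + 0.1 * rng.standard_normal(4000)   # toy: power rises with wind speed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"R2 {r2_score(y_te, pred):.3f}  MAE {mean_absolute_error(y_te, pred):.3f}  "
      f"RMSE {np.sqrt(mean_squared_error(y_te, pred)):.3f}")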
PMID:39847588 | DOI:10.1371/journal.pone.0317619
The tumour histopathology "glossary" for AI developers
PLoS Comput Biol. 2025 Jan 23;21(1):e1012708. doi: 10.1371/journal.pcbi.1012708. eCollection 2025 Jan.
ABSTRACT
The applications of artificial intelligence (AI) and deep learning (DL) are leading to significant advances in cancer research, particularly in analysing histopathology images for prognostic and treatment-predictive insights. However, effective translation of these computational methods requires computational researchers to have at least a basic understanding of histopathology. In this work, we aim to bridge that gap by introducing essential histopathology concepts to support AI developers in their research. We cover the defining features of key cell types, including epithelial, stromal, and immune cells. The concepts of malignancy, precursor lesions, and the tumour microenvironment (TME) are discussed and illustrated. To enhance understanding, we also introduce foundational histopathology techniques, such as conventional staining with hematoxylin and eosin (HE), antibody staining by immunohistochemistry, and newer multiplexed antibody staining methods. By providing this essential knowledge to the computational community, we aim to accelerate the development of AI algorithms for cancer research.
PMID:39847582 | DOI:10.1371/journal.pcbi.1012708
Correction: Secure deep learning for distributed data against malicious central server
PLoS One. 2025 Jan 23;20(1):e0318164. doi: 10.1371/journal.pone.0318164. eCollection 2025.
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pone.0272423.].
PMID:39847555 | DOI:10.1371/journal.pone.0318164
Predicting transcriptional changes induced by molecules with MiTCP
Brief Bioinform. 2024 Nov 22;26(1):bbaf006. doi: 10.1093/bib/bbaf006.
ABSTRACT
Studying the changes in cellular transcriptional profiles induced by small molecules can significantly advance our understanding of cellular state alterations and response mechanisms under chemical perturbations, which plays a crucial role in drug discovery and screening processes. Considering that experimental measurements need substantial time and cost, we developed a deep learning-based method called Molecule-induced Transcriptional Change Predictor (MiTCP) to predict changes in transcriptional profiles (CTPs) of 978 landmark genes induced by molecules. MiTCP utilizes graph neural network-based approaches to simultaneously model molecular structure representation and gene co-expression relationships, and integrates them for CTP prediction. After training on the L1000 dataset, MiTCP achieves an average Pearson correlation coefficient (PCC) of 0.482 on the test set and an average PCC of 0.801 for predicting the top 50 differentially expressed genes, outperforming other existing methods. Furthermore, we used MiTCP to predict the CTPs of three cancer drugs, palbociclib, irinotecan, and goserelin, and performed gene enrichment analysis on the top differentially expressed genes; the enriched pathways and Gene Ontology terms are highly relevant to the corresponding diseases, revealing the potential of MiTCP in drug development.
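The headline metric is straightforward to reproduce in form: one Pearson correlation per molecule between the predicted and measured 978-gene change profiles, then averaged. A generic sketch with random arrays standing in for real data:

import numpy as np

def mean_profile_pcc(pred, true):
    """Average Pearson correlation between predicted and measured
    transcriptional-change profiles, one correlation per molecule.
    pred, true: (n_molecules, 978) arrays."""
    pccs = [np.corrcoef(p, t)[0, 1] for p, t in zip(pred, true)]
    return float(np.mean(pccs))

pred = np.random.randn(10, 978)
true = pred + 0.8 * np.random.randn(10, 978)   # noisy toy "ground truth"
print(mean_profile_pcc(pred, true))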
PMID:39847444 | DOI:10.1093/bib/bbaf006
Noninvasive Anemia Detection and Hemoglobin Estimation from Retinal Images Using Deep Learning: A Scalable Solution for Resource-Limited Settings
Transl Vis Sci Technol. 2025 Jan 2;14(1):20. doi: 10.1167/tvst.14.1.20.
ABSTRACT
PURPOSE: The purpose of this study was to develop and validate a deep-learning model for noninvasive anemia detection, hemoglobin (Hb) level estimation, and identification of anemia-related retinal features using fundus images.
METHODS: The dataset included 2265 participants aged 40 years and above from a population-based study in South India. It comprised ocular and systemic clinical parameters, dilated retinal fundus images, and hematological data such as complete blood counts and Hb concentration levels. Eighty percent of the dataset was used for algorithm development and 20% for validation. A deep convolutional neural network, utilizing VGG16, ResNet50, and InceptionV3 architectures, was trained to predict anemia and estimate Hb levels. Sensitivity, specificity, and accuracy were calculated, and receiver operating characteristic (ROC) curves were generated for comparison with clinical anemia data. GradCAM saliency maps highlighted regions linked to anemia, and image processing techniques were used to quantify anemia-related features.
RESULTS: For predicting anemia, the InceptionV3 model demonstrated the best performance, achieving 98% accuracy, 99% sensitivity, 97% specificity, and an area under the curve (AUC) of 0.98 (95% confidence interval [CI] = 0.97-0.99). For estimating Hb levels, the mean absolute error for the InceptionV3 model was 0.58 g/dL (95% CI = 0.57-0.59 g/dL). The model focused on the area around the optic disc and the neighboring retinal vessels, revealing that anemic subjects exhibited significantly increased vessel tortuosity and reduced vessel density (P < 0.001), with variable effects on vessel thickness.
CONCLUSIONS: The InceptionV3 model accurately predicted anemia and Hb levels, highlighting the potential of deep learning and vessel analysis for noninvasive anemia detection.
TRANSLATIONAL RELEVANCE: The proposed method offers the possibility to quantitatively predict hematological parameters in a noninvasive manner.
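The "increased vessel tortuosity" reported in the results is typically quantified as the arc-to-chord ratio of a vessel centerline; whether the authors used exactly this definition is not stated in the abstract, so the sketch below is illustrative:

import numpy as np

def tortuosity(centerline):
    """Arc-to-chord tortuosity of a vessel centerline.
    centerline: (n, 2) array of ordered centerline points (assumed input)."""
    arc = np.sum(np.linalg.norm(np.diff(centerline, axis=0), axis=1))
    chord = np.linalg.norm(centerline[-1] - centerline[0])
    return arc / chord            # 1.0 for a straight vessel, larger if tortuous

t = np.linspace(0, np.pi, 100)
wavy = np.stack([t, 0.2 * np.sin(4 * t)], axis=1)
print(tortuosity(wavy))           # > 1 for the wavy toy vessel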
PMID:39847377 | DOI:10.1167/tvst.14.1.20
Explainable Deep Learning for Glaucomatous Visual Field Prediction: Artifact Correction Enhances Transformer Models
Transl Vis Sci Technol. 2025 Jan 2;14(1):22. doi: 10.1167/tvst.14.1.22.
ABSTRACT
PURPOSE: The purpose of this study was to develop a deep learning approach that restores artifact-laden optical coherence tomography (OCT) scans and predicts functional loss on the 24-2 Humphrey Visual Field (HVF) test.
METHODS: This cross-sectional, retrospective study used 1674 visual field (VF)-OCT pairs from 951 eyes for training and 429 pairs from 345 eyes for testing. Peripapillary retinal nerve fiber layer (RNFL) thickness map artifacts were corrected using a generative diffusion model. Three convolutional neural networks and 2 transformer-based models were trained on original and artifact-corrected datasets to estimate 54 sensitivity thresholds of the 24-2 HVF test.
RESULTS: Predictive performances were calculated using root mean square error (RMSE) and mean absolute error (MAE), with explainability evaluated through GradCAM, attention maps, and dimensionality reduction techniques. The Distillation with No Labels (DINO) Vision Transformer (ViT) trained on artifact-corrected datasets achieved the highest accuracy (RMSE = 4.44 dB, 95% confidence interval [CI] = 4.07-4.82 dB; MAE = 3.46 dB, 95% CI = 3.14-3.79 dB) and the greatest interpretability, showing improvements of 0.15 dB in global RMSE and MAE (P < 0.05) compared to the performance on original maps. Feature maps and visualization tools indicate that artifacts compromise DINO-ViT's predictive ability but that performance improves with artifact correction.
CONCLUSIONS: Combining self-supervised ViTs with generative artifact correction enhances the correlation between glaucomatous structures and functions.
TRANSLATIONAL RELEVANCE: Our approach offers a comprehensive tool for glaucoma management, facilitates the exploration of structure-function correlations in research, and underscores the importance of addressing artifacts in the clinical interpretation of OCT.
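A minimal sketch of regressing the 54 HVF sensitivity thresholds from an RNFL thickness map with a DINO-pretrained ViT backbone loaded via torch.hub; the head size, input preprocessing, and frozen backbone are assumptions rather than the paper's training recipe:

import torch
import torch.nn as nn

# Backbone download happens via torch.hub; ViT-S/16 DINO embeddings are 384-d.
backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
head = nn.Linear(384, 54)                  # 54 sensitivity thresholds of the 24-2 HVF

rnfl = torch.randn(2, 3, 224, 224)         # thickness maps replicated to 3 channels (assumption)
with torch.no_grad():
    emb = backbone(rnfl)                   # (2, 384) CLS embeddings
pred_db = head(emb)                        # predicted sensitivities in dB
print(pred_db.shape)                       # torch.Size([2, 54])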
PMID:39847375 | DOI:10.1167/tvst.14.1.22