Deep learning
Nobel Prize in Physics 2024: John J. Hopfield and Geoffrey E. Hinton. From Hopfield and Hinton to AlphaFold: the 2024 Nobel Prize honors the pioneers of deep learning
Med Sci (Paris). 2025 Mar;41(3):277-280. doi: 10.1051/medsci/2025036. Epub 2025 Mar 21.
ABSTRACT
On October 8, 2024, the Nobel Prize in Physics was awarded to John J. Hopfield, professor at Princeton University, and Geoffrey E. Hinton, professor at the University of Toronto, for their "fundamental discoveries that made possible machine learning through artificial neural networks." According to the Nobel committee, John Hopfield designed an associative memory capable of storing and reconstructing images, while Geoffrey Hinton developed a method enabling tasks such as identifying specific elements within images. This article retraces the career paths of these two researchers and highlights their pioneering contributions.
PMID:40117553 | DOI:10.1051/medsci/2025036
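Hopfield's associative memory, honored above, admits a compact illustration. Below is a minimal numpy sketch of a Hopfield network trained with the Hebbian outer-product rule; the pattern sizes and update counts are arbitrary choices for illustration, not taken from any article in this digest.

```python
import numpy as np

def hebbian_weights(patterns):
    """Hopfield weight matrix from ±1 patterns (one per row) via the
    Hebbian outer-product rule, with self-connections zeroed."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)
    return W

def recall(W, state, steps=5):
    """Synchronous sign-threshold updates until the state settles."""
    for _ in range(steps):
        nxt = np.where(W @ state >= 0, 1, -1)
        if np.array_equal(nxt, state):
            break
        state = nxt
    return state
```

Starting from a corrupted version of a stored pattern, the dynamics descend to the nearest stored attractor, which is the "reconstruction" capability the Nobel committee describes.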
Correction for Quach et al., Deep learning-driven bacterial cytological profiling to determine antimicrobial mechanisms in Mycobacterium tuberculosis
Proc Natl Acad Sci U S A. 2025 Apr;122(13):e2504475122. doi: 10.1073/pnas.2504475122. Epub 2025 Mar 21.
NO ABSTRACT
PMID:40117323 | DOI:10.1073/pnas.2504475122
Unveiling CNS cell morphology with deep learning: A gateway to anti-inflammatory compound screening
PLoS One. 2025 Mar 21;20(3):e0320204. doi: 10.1371/journal.pone.0320204. eCollection 2025.
ABSTRACT
Deciphering the complex relationships between cellular morphology and phenotypic manifestations is crucial for understanding cell behavior, particularly in the context of neuropathological states. Despite its importance, the application of advanced image analysis methodologies to central nervous system (CNS) cells, including neuronal and glial cells, has been limited. Furthermore, cutting-edge techniques in the field of cell image analysis, such as deep learning (DL), still face challenges, including the requirement for large amounts of labeled data, difficulty in detecting subtle cellular changes, and the presence of batch effects. Our study addresses these shortcomings in the context of neuroinflammation. Using our in-house data and a DL-based approach, we have effectively analyzed the morphological phenotypes of neuronal and microglial cells, both in pathological conditions and following pharmaceutical interventions. This innovative method enhances our understanding of neuroinflammation and streamlines the process for screening potential therapeutic compounds, bridging a gap in neuropathological research and pharmaceutical development.
PMID:40117300 | DOI:10.1371/journal.pone.0320204
Optimizing deep learning models for glaucoma screening with vision transformers for resource efficiency and the pie augmentation method
PLoS One. 2025 Mar 21;20(3):e0314111. doi: 10.1371/journal.pone.0314111. eCollection 2025.
ABSTRACT
Glaucoma is the leading cause of irreversible vision impairment, emphasizing the critical need for early detection. Typically, AI-based glaucoma screening relies on fundus imaging. To tackle the resource and time challenges of glaucoma screening with convolutional neural networks (CNNs), we chose the Data-efficient image Transformer (DeiT), a vision transformer known for its reduced computational demands, with preprocessing time decreased by a factor of 10. Our approach utilized the meticulously annotated GlauCUTU-DATA dataset, curated by ophthalmologists through consensus, encompassing both unanimous-agreement (3/3) and majority-agreement (2/3) data. However, DeiT's performance was initially lower than that of the CNN. Therefore, we introduced the "pie method," an augmentation method aligned with the ISNT rule, and employed polar transformation to improve cup-region visibility and align the input with the vision transformer, elevating performance. The classification results demonstrated improvements comparable to the CNN. Using the 3/3 data, excluding the superior and nasal regions, especially in glaucoma suspects, sensitivity increased by 40.18%, from 47.06% to 88.24%. The average area under the curve (AUC) ± standard deviation (SD) for glaucoma, glaucoma suspects, and no glaucoma was 92.63 ± 4.39%, 92.35 ± 4.39%, and 92.32 ± 1.45%, respectively. With the 2/3 data, excluding the superior and temporal regions, sensitivity for diagnosing glaucoma increased by 11.36%, from 47.73% to 59.09%. The average AUC ± SD for glaucoma, glaucoma suspects, and no glaucoma was 68.22 ± 4.45%, 68.23 ± 4.39%, and 73.09 ± 3.05%, respectively. For both datasets, the AUC values for glaucoma, glaucoma suspects, and no glaucoma were 84.53%, 84.54%, and 91.05%, respectively, approaching the performance of a CNN model that achieved 84.70%, 84.69%, and 93.19%.
Moreover, the incorporation of attention maps from DeiT facilitated the precise localization of clinically significant areas, such as the disc rim and notching, thereby enhancing the overall effectiveness of glaucoma screening.
PMID:40117284 | DOI:10.1371/journal.pone.0314111
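The polar transformation applied to fundus images above can be sketched generically. This is a nearest-neighbor polar resampling in numpy, not the authors' implementation; the grid sizes are arbitrary.

```python
import numpy as np

def polar_transform(image, n_radii=64, n_angles=90):
    """Resample a 2-D image onto a polar (radius x angle) grid centered
    on the image midpoint, using nearest-neighbor lookup."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    radii = np.linspace(0, max_r, n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(aa)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(aa)).astype(int), 0, w - 1)
    return image[ys, xs]
```

One convenience of the polar representation is that excluding an angular sector (as the pie method does for ISNT quadrants) reduces to dropping a contiguous block of columns.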
A Deep-Learning Empowered, Real-Time Processing Platform of fNIRS/DOT for Brain Computer Interfaces and Neurofeedback
IEEE Trans Neural Syst Rehabil Eng. 2025 Mar 21;PP. doi: 10.1109/TNSRE.2025.3553794. Online ahead of print.
ABSTRACT
Brain-Computer Interface (BCI) and Neurofeedback (NFB) approaches, which both rely on real-time monitoring of brain activity, are increasingly being applied in rehabilitation, assistive technology, neurological diseases, and behavioral disorders. Functional near-infrared spectroscopy (fNIRS) and diffuse optical tomography (DOT) are promising techniques for these applications due to their non-invasiveness, portability, low cost, and relatively high spatial resolution. However, real-time processing of fNIRS/DOT data remains a significant challenge, as it requires establishing a baseline of the measurement, simultaneously performing real-time motion artifact (MA) correction across all channels, and (in the case of DOT) addressing the time-consuming process of image reconstruction. This study proposes a real-time processing system for fNIRS/DOT that integrates baseline calibration, a denoising autoencoder (DAE)-based MA correction model with a sliding-window strategy, and a pre-calculated inverse Jacobian matrix to streamline reconstruction of 3D brain hemodynamics. The DAE model was trained on an extensive whole-head high-density DOT (HD-DOT) dataset and tested on a separate motor imagery dataset augmented with artificial MAs. The system demonstrated the capability to simultaneously process approximately 750 channels in real-time. Our results show that the DAE-based MA correction method outperformed traditional MA correction in terms of mean squared error and correlation to the known MA-free data while maintaining low latency, which is critical for effective BCI and NFB applications. The system's high-channel, real-time processing capability provides channel-wise oxygenation information and functional 3D imaging, making it well-suited for fNIRS/DOT applications in BCI and NFB, particularly in movement-intensive scenarios such as motor rehabilitation and assistive technology for mobility support.
PMID:40117159 | DOI:10.1109/TNSRE.2025.3553794
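The sliding-window strategy described for the DAE-based MA correction can be illustrated generically: split each channel's stream into overlapping windows, correct each window, and recombine the per-window estimates by averaging. The numpy sketch below shows only that scaffolding; the trained DAE itself is omitted, and the window length and hop are arbitrary.

```python
import numpy as np

def sliding_windows(signal, win, hop):
    """Split a 1-D channel signal into overlapping windows (n_windows x win)."""
    n = 1 + (len(signal) - win) // hop
    idx = hop * np.arange(n)[:, None] + np.arange(win)[None, :]
    return signal[idx]

def overlap_add(windows, hop):
    """Reconstruct a signal by averaging overlapping window estimates."""
    n, win = windows.shape
    length = hop * (n - 1) + win
    out = np.zeros(length)
    count = np.zeros(length)
    for i, w in enumerate(windows):
        out[i * hop : i * hop + win] += w
        count[i * hop : i * hop + win] += 1
    return out / count
```

In a real-time setting, a denoiser would be applied to each row of the window matrix before recombination, and only the newest window needs processing per update.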
GDRNPP: A Geometry-guided and Fully Learning-based Object Pose Estimator
IEEE Trans Pattern Anal Mach Intell. 2025 Mar 21;PP. doi: 10.1109/TPAMI.2025.3553485. Online ahead of print.
ABSTRACT
6D pose estimation of rigid objects is a long-standing and challenging task in computer vision. Recently, the emergence of deep learning reveals the potential of Convolutional Neural Networks (CNNs) to predict reliable 6D poses. Given that direct pose regression networks currently exhibit suboptimal performance, most methods still resort to traditional techniques to varying degrees. For example, top-performing methods often adopt an indirect strategy by first establishing 2D-3D or 3D-3D correspondences followed by applying the RANSAC-based PnP or Kabsch algorithms, and further employing ICP for refinement. Despite the performance enhancement, the integration of traditional techniques makes the networks time-consuming and not end-to-end trainable. Orthogonal to these approaches, this paper introduces a fully learning-based object pose estimator. In this work, we first perform an in-depth investigation of both direct and indirect methods and propose a simple yet effective Geometry-guided Direct Regression Network (GDRN) to learn the 6D pose from monocular images in an end-to-end manner. Afterwards, we introduce a geometry-guided pose refinement module, enhancing pose accuracy when extra depth data is available. Guided by the predicted coordinate map, we build an end-to-end differentiable architecture that establishes robust and accurate 3D-3D correspondences between the observed and rendered RGB-D images to refine the pose. Our enhanced pose estimation pipeline GDRNPP (GDRN Plus Plus) conquered the leaderboard of the BOP Challenge for two consecutive years, becoming the first to surpass all prior methods that relied on traditional techniques in both accuracy and speed. The code and models are available at https://github.com/shanice-l/gdrnpp_bop2022.
PMID:40117145 | DOI:10.1109/TPAMI.2025.3553485
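The Kabsch step that GDRNPP's learned refiner replaces is a classical closed-form alignment. For background, a minimal numpy version for paired 3-D point sets (not the authors' code):

```python
import numpy as np

def kabsch(P, Q):
    """Return rotation R and translation t minimizing ||(R @ p + t) - q||
    over paired 3-D point sets P, Q of shape (n, 3)."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # Cross-covariance of centered point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the optimal orthogonal matrix.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

Given exact 3D-3D correspondences the recovery is exact; in practice the correspondences are noisy, which is why pipelines wrap this step in RANSAC, and why an end-to-end differentiable alternative is attractive.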
Impact of Noisy Supervision in Foundation Model Learning
IEEE Trans Pattern Anal Mach Intell. 2025 Mar 21;PP. doi: 10.1109/TPAMI.2025.3552309. Online ahead of print.
ABSTRACT
Foundation models are usually pre-trained on large-scale datasets and then adapted to different downstream tasks through tuning. This pre-training and then fine-tuning paradigm has become a standard practice in deep learning. However, the large-scale pre-training datasets, often inaccessible or too expensive to handle, can contain label noise that may adversely affect the generalization of the model and pose unexpected risks. This paper stands out as the first work to comprehensively understand and analyze the nature of noise in pre-training datasets and then effectively mitigate its impacts on downstream tasks. Specifically, through extensive experiments of fully-supervised and image-text contrastive pre-training on synthetic noisy ImageNet-1K, YFCC15M, and CC12M datasets, we demonstrate that, while slight noise in pre-training can benefit in-domain (ID) performance, where the training and testing data share a similar distribution, it always deteriorates out-of-domain (OOD) performance, where training and testing distributions are significantly different. These observations are agnostic to scales of pre-training datasets, pre-training noise types, model architectures, pre-training objectives, downstream tuning methods, and downstream applications. We empirically ascertain that the reason behind this is that the pre-training noise shapes the feature space differently. We then propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization, which is applicable in both parameter-efficient and black-box tuning manners, considering one may not be able to access or fully fine-tune the pre-trained models. We additionally conduct extensive experiments on popular vision and language models, including APIs, which are supervised and self-supervised pre-trained on realistic noisy data for evaluation. 
Our analysis and results demonstrate the importance of this novel and fundamental research direction, which we term Noisy Model Transfer Learning.
PMID:40117144 | DOI:10.1109/TPAMI.2025.3552309
Forecasting the concentration of the components of the particulate matter in Poland using neural networks
Environ Sci Pollut Res Int. 2025 Mar 21. doi: 10.1007/s11356-025-36265-y. Online ahead of print.
ABSTRACT
Air pollution is a significant global challenge with profound impacts on human health and the environment. Elevated concentrations of various air pollutants contribute to numerous premature deaths each year. In Europe, and particularly in Poland, air quality remains a critical concern due to pollutants such as particulate matter (PM), which pose serious risks to public health and ecological systems. Effective control of PM emissions and accurate forecasting of their concentrations are essential for improving air quality and supporting public health interventions. This paper presents four advanced deep learning-based forecasting methods: extended long short-term memory network (xLSTM), Kolmogorov-Arnold network (KAN), temporal convolutional network (TCN), and variational autoencoder (VAE). Using data from eight cities in Poland, we evaluate our methods' ability to predict particulate matter concentrations through extensive experiments, utilizing statistical hypothesis testing and error metrics such as mean absolute error (MAE) and root mean square error (RMSE). Our findings demonstrate that these methods achieve high prediction accuracy, significantly outperforming several state-of-the-art algorithms. The proposed forecasting framework offers practical applications for policymakers and public health officials by enabling timely interventions to decrease pollution impacts and enhance urban air quality management.
PMID:40117111 | DOI:10.1007/s11356-025-36265-y
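The error metrics used in the evaluation above are standard; for reference, minimal numpy definitions of MAE and RMSE:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between observed and forecast values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root mean square error; penalizes large errors more than MAE."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))
```

RMSE is always at least as large as MAE on the same data, so reporting both (as here) indicates how much large outlier errors dominate the forecasts.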
Machine-learning models for Alzheimer's disease diagnosis using neuroimaging data: survey, reproducibility, and generalizability evaluation
Brain Inform. 2025 Mar 21;12(1):8. doi: 10.1186/s40708-025-00252-3.
ABSTRACT
Clinical diagnosis of Alzheimer's disease (AD) is usually made after symptoms such as short-term memory loss are exhibited, which minimizes the intervention and treatment options. The existing screening techniques cannot distinguish between stable MCI (sMCI) cases (i.e., patients who do not convert to AD for at least three years) and progressive MCI (pMCI) cases (i.e., patients who convert to AD in three years or sooner). Delayed diagnosis of AD also disproportionately affects underrepresented and socioeconomically disadvantaged populations. The significant positive impact of an early diagnosis solution for AD across diverse ethno-racial and demographic groups is well-known and recognized. While advancements in high-throughput technologies have enabled the generation of vast amounts of multimodal clinical and neuroimaging datasets related to AD, most methods utilizing these datasets for diagnostic purposes have not found their way into clinical settings. To better understand the landscape, we surveyed the major preprocessing, data management, traditional machine-learning (ML), and deep learning (DL) techniques used for diagnosing AD using neuroimaging data such as structural magnetic resonance imaging (sMRI), functional magnetic resonance imaging (fMRI), and positron emission tomography (PET). Once we had a good understanding of the methods available, we conducted a study to assess the reproducibility and generalizability of open-source ML models. Our evaluation shows that existing models show reduced generalizability when different cohorts of the data modality are used while controlling other computational factors. The paper concludes with a discussion of major challenges that plague ML models for AD diagnosis and biomarker discovery.
PMID:40117001 | DOI:10.1186/s40708-025-00252-3
PCANN Program for Structure-Based Prediction of Protein-Protein Binding Affinity: Comparison With Other Neural-Network Predictors
Proteins. 2025 Mar 21. doi: 10.1002/prot.26821. Online ahead of print.
ABSTRACT
In this communication, we introduce a new structure-based affinity predictor for protein-protein complexes. This predictor, dubbed PCANN (Protein Complex Affinity by Neural Network), uses the ESM-2 language model to encode information about protein binding interfaces and a graph attention network (GAT) to parlay this information into K_d predictions. In tests employing two previously unused literature-extracted datasets, PCANN performed better than the best of the publicly available predictors, BindPPI, with a mean absolute error (MAE) of 1.3 versus 1.4 kcal/mol. Further progress in the development of K_d predictors using deep learning models faces two problems: (i) the amount of experimental data available to train and test new predictors is limited, and (ii) the available K_d data are often not very accurate and lack internal consistency with respect to measurement conditions. These issues can potentially be addressed through an AI-leveraged literature search followed by careful human curation, and by introducing additional parameters to account for variations in experimental conditions.
PMID:40116085 | DOI:10.1002/prot.26821
ChiGNN: Interpretable Algorithm Framework of Molecular Chiral Knowledge-Embedding and Stereosensitive Property Prediction
J Chem Inf Model. 2025 Mar 21. doi: 10.1021/acs.jcim.4c02259. Online ahead of print.
ABSTRACT
Molecular chirality-related tasks have remained a notable challenge in materials machine learning (ML) due to the subtle spatial discrepancy between enantiomers. Designing appropriate steric molecular descriptions and embedding chiral knowledge are of great significance for improving the accuracy and interpretability of ML models. In this work, we propose a state-of-the-art deep learning framework, Chiral Graph Neural Network, which can effectively incorporate chiral physicochemical knowledge via Trinity Graph and stereosensitive Message Aggregation encoding. Combined with the quantile regression technique, the accuracy of the chiral chromatographic retention time prediction model outperformed the existing records. Accounting for the inherent merits of this framework, we have customized the Trinity Mask and Contribution Splitting techniques to enable a multilevel interpretation of the model's decision mechanism at atomic, functional group, and molecular hierarchy levels. This interpretation has both scientific and practical implications for the understanding of chiral chromatographic separation and the selection of chromatographic stationary phases. Moreover, the proposed chiral knowledge embedding and interpretable deep learning framework, together with the stereomolecular representation, chiral knowledge embedding method, and multilevel interpretation technique within it, also provide an extensible template and precedent for future chirality-related or stereosensitive ML tasks.
PMID:40116044 | DOI:10.1021/acs.jcim.4c02259
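The quantile regression technique mentioned in the ChiGNN abstract conventionally minimizes the pinball loss. A generic numpy sketch follows; the quantile level q is a free parameter, and this is not taken from the ChiGNN code.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: penalizes under-prediction with weight q
    and over-prediction with weight 1 - q; q = 0.5 recovers half the MAE."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))
```

Training one model per quantile (e.g., q = 0.1, 0.5, 0.9) yields a prediction interval around the retention time rather than a single point estimate.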
Performance of AlphaMissense and Other In Silico Predictors to Determine Pathogenicity of Missense Variants in Sarcomeric Genes
Circ Genom Precis Med. 2025 Mar 21:e004922. doi: 10.1161/CIRCGEN.124.004922. Online ahead of print.
NO ABSTRACT
PMID:40115995 | DOI:10.1161/CIRCGEN.124.004922
Integrative Computational Analysis of Common EXO5 Haplotypes: Impact on Protein Dynamics, Genome Stability, and Cancer Progression
J Chem Inf Model. 2025 Mar 21. doi: 10.1021/acs.jcim.5c00067. Online ahead of print.
ABSTRACT
Understanding the impact of common germline variants on protein structure, function, and disease progression is crucial in cancer research. This study presents a comprehensive analysis of the EXO5 gene, which encodes a DNA exonuclease involved in DNA repair that was previously associated with cancer susceptibility. We employed an integrated approach combining genomic and clinical data analysis, deep learning variant effect prediction, and molecular dynamics (MD) simulations to investigate the effects of common EXO5 haplotypes on protein structure, dynamics, and cancer outcomes. We characterized the haplotype structure of EXO5 across diverse human populations, identifying five common haplotypes, and studied their impact on the EXO5 protein. Extensive, all-atom MD simulations revealed significant structural and dynamic differences among the EXO5 protein variants, particularly in their catalytic region. The L151P EXO5 protein variant exhibited the most substantial conformational changes, potentially disruptive for EXO5's function and nuclear localization. Analysis of The Cancer Genome Atlas data showed that cancer patients carrying L151P EXO5 had significantly shorter progression-free survival in prostate and pancreatic cancers and exhibited increased genomic instability. This study highlights the strength of our methodology in uncovering the effects of common genetic variants on protein function and their implications for disease outcomes.
PMID:40115981 | DOI:10.1021/acs.jcim.5c00067
Artificial intelligence algorithm was used to establish and verify the prediction model of portal hypertension in hepatocellular carcinoma based on clinical parameters and imaging features
J Gastrointest Oncol. 2025 Feb 28;16(1):159-175. doi: 10.21037/jgo-2024-931. Epub 2025 Feb 26.
ABSTRACT
BACKGROUND: Portal hypertension (PHT) is an important factor leading to a poor prognosis in patients with hepatocellular carcinoma (HCC). Identifying patients with PHT for individualized treatment is of great clinical significance, and a prediction model for PHT in HCC is urgently needed in clinical practice. Combining clinical parameters and imaging features can improve prediction accuracy, and artificial intelligence algorithms can further exploit the data, optimize the prediction model, and provide strong support for early intervention and personalized treatment of PHT. This study aimed to establish a prediction model of PHT based on clinicopathological features and computed tomography features of the non-tumorous liver area in the portal venous phase.
METHODS: A total of 884 patients were enrolled in this study, and randomly divided into a training set of 707 patients (of whom 89 had PHT) and a validation set of 177 patients (of whom 23 had PHT) at a ratio of 8:2. Univariate and multivariate logistic regression analyses were performed to screen the clinical features. Radiomics and deep-learning features were extracted from the non-tumorous liver regions. Feature selection was conducted using t-tests, correlation analyses, and least absolute shrinkage and selection operator regression models. Finally, a predictive model for PHT in HCC patients was constructed by combining clinical features with the selected radiomics and deep-learning features.
RESULTS: Portal vein diameter (PVD), Child-Pugh score, and fibrosis 4 (FIB-4) score were identified as independent risk factors for PHT. The predictive model that incorporated clinical features, radiomics features from non-tumorous liver regions, and deep-learning features had an area under the curve (AUC) of 0.966 [95% confidence interval (CI): 0.954-0.979] and a sensitivity of 0.966 in the training set, and an AUC of 0.698 (95% CI: 0.565-0.831) and a sensitivity of 0.609 in the validation set.
CONCLUSIONS: The preoperative evaluation showed that increased PVD, higher Child-Pugh score, and increased FIB-4 score were independent risk factors for PHT in patients with HCC. To predict the occurrence of PHT more effectively, we constructed a comprehensive prediction model incorporating clinical parameters, radiomics features, and deep-learning features. This fusion of multimodal features enables the model to capture complex information related to PHT more comprehensively, achieving high prediction accuracy and practicability.
PMID:40115915 | PMC:PMC11921233 | DOI:10.21037/jgo-2024-931
AM-MTEEG: multi-task EEG classification based on impulsive associative memory
Front Neurosci. 2025 Mar 6;19:1557287. doi: 10.3389/fnins.2025.1557287. eCollection 2025.
ABSTRACT
Electroencephalogram-based brain-computer interfaces (BCIs) hold promise for healthcare applications but are hindered by cross-subject variability and limited data. This article proposes a multi-task (MT) classification model, AM-MTEEG, which integrates deep learning-based convolutional and impulsive networks with bidirectional associative memory (AM) for cross-subject EEG classification. AM-MTEEG treats the EEG classification of each subject as an independent task and exploits features common across subjects. The model is built with a convolutional encoder-decoder and a population of impulsive neurons to extract shared features across subjects, as well as a Hebbian-learned bidirectional associative memory matrix to classify EEG within one subject. Experimental results on two BCI competition datasets demonstrate that AM-MTEEG improves average accuracy over state-of-the-art methods and reduces performance variance across subjects. Visualization of neuronal impulses in the bidirectional associative memory network reveals a precise mapping between hidden-layer neuron activities and specific movements. Given four motor imagery categories, the reconstructed waveforms resemble real event-related potentials, highlighting the biological interpretability of the model beyond classification.
PMID:40115889 | PMC:PMC11922916 | DOI:10.3389/fnins.2025.1557287
Improvement of BCI performance with bimodal SSMVEPs: enhancing response intensity and reducing fatigue
Front Neurosci. 2025 Mar 6;19:1506104. doi: 10.3389/fnins.2025.1506104. eCollection 2025.
ABSTRACT
Steady-state visual evoked potential (SSVEP) is a widely used brain-computer interface (BCI) paradigm, valued for its multi-target capability and limited EEG electrode requirements. Conventional SSVEP methods frequently lead to visual fatigue and decreased recognition accuracy because of the flickering light stimulation. To address these issues, we developed an innovative steady-state motion visual evoked potential (SSMVEP) paradigm that integrated motion and color stimuli, designed specifically for augmented reality (AR) glasses. Our study aimed to enhance SSMVEP response intensity and reduce visual fatigue. Experiments were conducted under controlled laboratory conditions. EEG data were analyzed using the deep learning algorithm of EEGNet and fast Fourier transform (FFT) to calculate the classification accuracy and assess the response intensity. Experimental results showed that the bimodal motion-color integrated paradigm significantly outperformed both the single-motion SSMVEP and single-color SSVEP paradigms, achieving the highest accuracy of 83.81% ± 6.52% under the medium brightness (M) and area ratio of C of 0.6. Enhanced signal-to-noise ratio (SNR) and reduced visual fatigue were also observed, as confirmed by objective measures and subjective reports. The findings verified the bimodal paradigm as a novel application in SSVEP-based BCIs, enhancing both brain response intensity and user comfort.
PMID:40115888 | PMC:PMC11922886 | DOI:10.3389/fnins.2025.1506104
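The FFT-based response-intensity assessment above can be illustrated with a generic SNR estimate: the spectrum amplitude at the stimulation frequency relative to neighboring bins. The sampling rate, window length, and neighbor count below are illustrative, not the study's settings.

```python
import numpy as np

def ssvep_snr(signal, fs, stim_hz, n_neighbors=4):
    """SNR of a steady-state response: FFT amplitude at the stimulation
    frequency divided by the mean amplitude of neighboring bins."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - stim_hz)))  # bin nearest the stimulus
    lo, hi = max(k - n_neighbors, 1), min(k + n_neighbors + 1, len(spectrum))
    neighbors = np.concatenate([spectrum[lo:k], spectrum[k + 1 : hi]])
    return spectrum[k] / neighbors.mean()
```

A strong evoked response stands far above the local noise floor at the stimulation frequency while off-frequency bins yield SNR near 1.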
Fast aberration correction in 3D transcranial photoacoustic computed tomography via a learning-based image reconstruction method
Photoacoustics. 2025 Feb 20;43:100698. doi: 10.1016/j.pacs.2025.100698. eCollection 2025 Jun.
ABSTRACT
Transcranial photoacoustic computed tomography (PACT) holds significant potential as a neuroimaging modality. However, compensating for skull-induced aberrations in reconstructed images remains a challenge. Although optimization-based image reconstruction methods (OBRMs) can account for the relevant wave physics, they are computationally demanding and generally require accurate estimates of the skull's viscoelastic parameters. To circumvent these issues, a learning-based image reconstruction method was investigated for three-dimensional (3D) transcranial PACT. The method was systematically assessed in virtual imaging studies that involved stochastic 3D numerical head phantoms and applied to experimental data acquired by use of a physical head phantom that involved a human skull. The results demonstrated that the learning-based method yielded accurate images and exhibited robustness to errors in the assumed skull properties, while substantially reducing computational times compared to an OBRM. To the best of our knowledge, this is the first demonstration of a learned image reconstruction method for 3D transcranial PACT.
PMID:40115737 | PMC:PMC11923815 | DOI:10.1016/j.pacs.2025.100698
(KAUH-BCMD) dataset: advancing mammographic breast cancer classification with multi-fusion preprocessing and residual depth-wise network
Front Big Data. 2025 Mar 6;8:1529848. doi: 10.3389/fdata.2025.1529848. eCollection 2025.
ABSTRACT
The categorization of benign and malignant patterns in digital mammography is a critical step in the diagnosis of breast cancer, facilitating early detection and potentially saving many lives. Diverse breast tissue architectures often obscure and conceal breast issues. Classifying worrying regions (benign and malignant patterns) in digital mammograms is a significant challenge for radiologists. Even for specialists, the first visual indicators are nuanced and irregular, complicating identification. Therefore, radiologists want an advanced classifier to assist in identifying breast cancer and categorizing regions of concern. This study presents an enhanced technique for the classification of breast cancer using mammography images. The collection comprises real-world data from King Abdullah University Hospital (KAUH) at Jordan University of Science and Technology, consisting of 7,205 images from 5,000 patients aged 18-75. After being classified as benign or malignant, the images were preprocessed by rescaling, normalization, and augmentation. Multi-fusion approaches, such as high-boost filtering and contrast-limited adaptive histogram equalization (CLAHE), were used to improve image quality. We created a unique Residual Depth-wise Network (RDN) to enhance the precision of breast cancer detection. The proposed RDN model was compared with many prominent models, including MobileNetV2, VGG16, VGG19, ResNet50, InceptionV3, Xception, and DenseNet121. The RDN model exhibited superior performance, achieving an accuracy of 97.82%, precision of 96.55%, recall of 99.19%, specificity of 96.45%, F1 score of 97.85%, and validation accuracy of 96.20%. The findings indicate that the proposed RDN model is an excellent instrument for early diagnosis using mammography images and significantly improves breast cancer detection when integrated with multi-fusion and efficient preprocessing approaches.
PMID:40115240 | PMC:PMC11922913 | DOI:10.3389/fdata.2025.1529848
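High-boost filtering, one of the multi-fusion preprocessing steps named above, is conventionally computed as image + k·(image − blurred). A numpy sketch follows, with an illustrative 3×3 box blur and boost factor; the paper's actual kernel and k are not specified here.

```python
import numpy as np

def high_boost(image, k=1.5):
    """High-boost filtering: add k times the high-frequency detail
    (image minus a 3x3 box blur, with edge replication) back to the image."""
    h, w = image.shape
    padded = np.pad(image.astype(float), 1, mode="edge")
    blurred = sum(
        padded[i : i + h, j : j + w] for i in range(3) for j in range(3)
    ) / 9.0
    return image + k * (image - blurred)
```

Flat regions pass through unchanged while edges and fine structures are amplified, which is the intent of the step before CLAHE contrast adjustment.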
Overview of the Head and Neck Tumor Segmentation for Magnetic Resonance Guided Applications (HNTS-MRG) 2024 Challenge
Head Neck Tumor Segm MR Guid Appl (2024). 2025;15273:1-35. doi: 10.1007/978-3-031-83274-1_1. Epub 2025 Mar 3.
ABSTRACT
Magnetic resonance (MR)-guided radiation therapy (RT) is enhancing head and neck cancer (HNC) treatment through superior soft tissue contrast and longitudinal imaging capabilities. However, manual tumor segmentation remains a significant challenge, spurring interest in artificial intelligence (AI)-driven automation. To accelerate innovation in this field, we present the Head and Neck Tumor Segmentation for MR-Guided Applications (HNTS-MRG) 2024 Challenge, a satellite event of the 27th International Conference on Medical Image Computing and Computer Assisted Intervention. This challenge addresses the scarcity of large, publicly available AI-ready adaptive RT datasets in HNC and explores the potential of incorporating multi-timepoint data to enhance RT auto-segmentation performance. Participants tackled two HNC segmentation tasks: automatic delineation of primary gross tumor volume (GTVp) and gross metastatic regional lymph nodes (GTVn) on pre-RT (Task 1) and mid-RT (Task 2) T2-weighted scans. The challenge provided 150 HNC cases for training and 50 for final testing hosted on grand-challenge.org using a Docker submission framework. In total, 19 independent teams from across the world qualified by submitting both their algorithms and corresponding papers, resulting in 18 submissions for Task 1 and 15 submissions for Task 2. Evaluation using the mean aggregated Dice Similarity Coefficient showed top-performing AI methods achieved scores of 0.825 in Task 1 and 0.733 in Task 2. These results surpassed clinician interobserver variability benchmarks, marking significant strides in automated tumor segmentation for MR-guided RT applications in HNC.
PMID:40115167 | PMC:PMC11925392 | DOI:10.1007/978-3-031-83274-1_1
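The Dice Similarity Coefficient used for evaluation above, together with an aggregated variant that pools intersections and mask sizes across cases, can be written compactly. A numpy sketch assuming binary masks (the empty-mask convention and the exact aggregation used by the challenge organizers may differ):

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks;
    defined here as 1.0 when both masks are empty."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total

def aggregated_dice(pairs):
    """Aggregated DSC: pooled intersection over pooled mask sizes
    across a list of (prediction, reference) mask pairs."""
    inter = sum(np.logical_and(a, b).sum() for a, b in pairs)
    total = sum(np.count_nonzero(a) + np.count_nonzero(b) for a, b in pairs)
    return 2.0 * inter / total if total else 1.0
```

Pooling across cases keeps cases with tiny or absent tumor volumes from dominating the average, which matters for mid-RT scans where targets shrink.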
Segment Like A Doctor: Learning reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation
Med Image Anal. 2025 Mar 13;102:103539. doi: 10.1016/j.media.2025.103539. Online ahead of print.
ABSTRACT
Pancreatic cancer is a lethal invasive tumor with one of the worst prognoses. Accurate and reliable segmentation of the pancreas and pancreatic cancer on computerized tomography (CT) images is vital in clinical diagnosis and treatment. Although certain deep learning-based techniques have been tentatively applied to this task, current performance of pancreatic cancer segmentation is far from meeting clinical needs due to the tiny size, irregular shape, and extremely uncertain boundary of the cancer. Besides, most existing studies are built on black-box models that learn only the annotation distribution rather than the logical thinking and diagnostic experience of senior medical experts, which are more credible and interpretable. To alleviate these issues, we propose a novel Segment-Like-A-Doctor (SLAD) framework to learn reliable clinical thinking and experience for pancreas and pancreatic cancer segmentation on CT images. Specifically, SLAD aims to simulate the essential logical thinking and experience of doctors in the progressive diagnostic stages of pancreatic cancer: the organ, lesion, and boundary stages. Firstly, in the organ stage, an Anatomy-aware Masked AutoEncoder (AMAE) is introduced to model doctors' overall cognition of the anatomical distribution of abdominal organs on CT images by self-supervised pretraining. Secondly, in the lesion stage, a Causality-driven Graph Reasoning Module (CGRM) is designed to learn doctors' global judgment for lesion detection by exploring topological feature differences between the causal lesion and the non-causal organ. Finally, in the boundary stage, a Diffusion-based Discrepancy Calibration Module (DDCM) is developed to fit doctors' refined understanding of the uncertain boundary of pancreatic cancer by inferring the ambiguous segmentation discrepancy based on the trustworthy lesion core.
Experimental results on three independent datasets demonstrate that our approach boosts pancreatic cancer segmentation accuracy by 4%-9% compared with the state-of-the-art methods. Additionally, the tumor-vascular involvement analysis is also conducted to verify the superiority of our method in clinical applications. Our source codes will be publicly available at https://github.com/ZouLiwen-1999/SLAD.
PMID:40112510 | DOI:10.1016/j.media.2025.103539