Deep learning
A new deep learning-based fast transcoding for internet of things applications
Sci Rep. 2025 May 10;15(1):16325. doi: 10.1038/s41598-025-99533-4.
ABSTRACT
To achieve low-power video communication in the Internet of Things, this study presents a new deep learning-based fast transcoding algorithm from distributed video coding (DVC) to high efficiency video coding (HEVC). The proposed method accelerates transcoding by minimizing HEVC encoding complexity. Specifically, it models the selection of coding unit (CU) partitions and prediction unit (PU) partition modes as classification tasks. To address these tasks, a novel lightweight deep learning network was developed to act as the classifier in a top-down transcoding strategy for improved efficiency. The proposed transcoding algorithm operates efficiently at both the CU and PU levels. At the CU level, it reduces HEVC encoding complexity by accurately predicting CU partitions. At the PU level, predicting PU partition modes for non-split CUs further streamlines the encoding process. Experimental results demonstrate that the proposed CU-level transcoding reduces complexity overhead by 45.69%, with a 1.33% average Bjøntegaard delta bit-rate (BD-BR) increase. At the PU level, the transcoding achieves an even greater complexity reduction, averaging 60.97%, with a 2.16% average BD-BR increase. These results highlight the algorithm's efficiency in balancing computational cost and compression performance. The proposed method provides a promising low-power video coding scheme for resource-constrained terminals in both upstream and downstream video communication scenarios.
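The top-down CU-level strategy described above can be pictured as a quad-tree recursion in which a classifier decides, at each depth, whether a coding unit is split into four quadrants. The sketch below is illustrative only: `predict_split` is a hypothetical stand-in for the paper's lightweight network, not its actual model.

```python
def partition_cu(cu, predict_split, min_size=8):
    """Top-down quad-tree CU partitioning, as used in HEVC.

    cu is (x, y, size); predict_split is a classifier deciding whether
    to split the current CU. Returns the list of leaf CUs.
    """
    x, y, size = cu
    if size > min_size and predict_split(cu):
        half = size // 2
        leaves = []
        for dx in (0, half):
            for dy in (0, half):
                leaves += partition_cu((x + dx, y + dy, half),
                                       predict_split, min_size)
        return leaves
    return [cu]

# Toy rule standing in for the trained classifier: split CUs larger than 16.
leaves = partition_cu((0, 0, 32), lambda cu: cu[2] > 16)
# A 32x32 CU splits once into four 16x16 leaves.
```

Replacing the rate-distortion-optimized split search with such a classifier call is what removes most of the HEVC encoding complexity.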
PMID:40348899 | DOI:10.1038/s41598-025-99533-4
Performance of fully automated deep-learning-based coronary artery calcium scoring in ECG-gated calcium CT and non-gated low-dose chest CT
Eur Radiol. 2025 May 10. doi: 10.1007/s00330-025-11559-4. Online ahead of print.
ABSTRACT
OBJECTIVES: This study aimed to validate the agreement and diagnostic performance of a deep-learning-based coronary artery calcium scoring (DL-CACS) system for ECG-gated and non-gated low-dose chest CT (LDCT) across multivendor datasets.
MATERIALS AND METHODS: In this retrospective study, datasets from Seoul National University Hospital (SNUH, 652 paired ECG-gated and non-gated CT scans) and the Stanford public dataset (425 ECG-gated and 199 non-gated CT scans) were analyzed. Agreement metrics included intraclass correlation coefficient (ICC), coefficient of determination (R²), and categorical agreement (κ). Diagnostic performance was assessed using categorical accuracy and the area under the receiver operating characteristic curve (AUROC).
RESULTS: DL-CACS demonstrated excellent performance for ECG-gated CT in both datasets (SNUH: R² = 0.995, ICC = 0.997, κ = 0.97, AUROC = 0.99; Stanford: R² = 0.989, ICC = 0.990, κ = 0.97, AUROC = 0.99). For non-gated CT using manual LDCT CAC scores as a reference, performance was similarly high (R² = 0.988, ICC = 0.994, κ = 0.96, AUROC = 0.98-0.99). When using ECG-gated CT scores as the reference, performance for non-gated CT was slightly lower but remained robust (SNUH: R² = 0.948, ICC = 0.968, κ = 0.88, AUROC = 0.98-0.99; Stanford: R² = 0.949, ICC = 0.948, κ = 0.71, AUROC = 0.89-0.98).
CONCLUSION: DL-CACS provides a reliable and automated solution for CACS, potentially reducing workload while maintaining robust performance in both ECG-gated and non-gated CT settings.
KEY POINTS: Question How accurate and reliable is deep-learning-based coronary artery calcium scoring (DL-CACS) in ECG-gated CT and non-gated low-dose chest CT (LDCT) across multivendor datasets? Findings DL-CACS showed near-perfect performance for ECG-gated CT. For non-gated LDCT, performance was excellent using manual scores as the reference and lower but reliable when using ECG-gated CT scores. Clinical relevance DL-CACS provides a reliable and automated solution for CACS, potentially reducing workload and improving diagnostic workflow. It supports cardiovascular risk stratification and broader clinical adoption, especially in settings where ECG-gated CT is unavailable.
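Of the agreement metrics reported above, categorical agreement (κ) is simple to reproduce. The sketch below computes Cohen's kappa from two raters' categorical scores; the data are toy values, not the study's:

```python
def cohen_kappa(a, b, labels):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_chance = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical CAC risk categories from two "readers".
rater1 = ["low", "low", "high", "high"]
rater2 = ["low", "low", "high", "low"]
kappa = cohen_kappa(rater1, rater2, ["low", "high"])
```

Values near 1 (such as the κ = 0.97 reported for ECG-gated CT) indicate near-perfect categorical agreement beyond chance.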
PMID:40348882 | DOI:10.1007/s00330-025-11559-4
Multimodal anomaly detection in complex environments using video and audio fusion
Sci Rep. 2025 May 10;15(1):16291. doi: 10.1038/s41598-025-01146-4.
ABSTRACT
Due to complex environmental conditions and varying noise levels, traditional models are limited in their effectiveness for detecting anomalies in video sequences. Addressing the challenges of accuracy, robustness, and real-time processing in image and video processing, this study proposes a deep learning-based anomaly detection and recognition algorithm for video image data. The algorithm combines spatio-temporal feature extraction with noise suppression and aims to improve processing performance, especially in complex environments, by introducing an improved Variational Auto-Encoder (VAE) structure. The model, named Spatio-Temporal Anomaly Detection Network (STADNet), captures the spatio-temporal features of video images through a multi-scale three-dimensional (3D) convolution module and a spatio-temporal attention mechanism, improving the accuracy of anomaly detection. A multi-stream network architecture and a cross-attention fusion mechanism are also adopted to jointly consider factors such as color, texture, and motion, further improving the robustness and generalization ability of the model. The experimental results show that, compared with existing models, the new model has clear advantages in performance stability and real-time processing under different noise levels. Specifically, the proposed model achieves an AUC of 0.95 on the UCSD Ped2 dataset, about 10% higher than other models, and 0.93 on the Avenue dataset, about 12% higher. This study not only proposes an effective image and video processing scheme but also demonstrates wide practical potential, providing a new perspective and methodological basis for future research and application in related fields.
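The reconstruction-error principle behind VAE-based detectors such as STADNet can be sketched in a few lines: frames the model reconstructs poorly are flagged as anomalous. `reconstruct` below is a hypothetical stand-in for a trained decoder, not the paper's network.

```python
def anomaly_scores(frames, reconstruct):
    """Per-frame mean squared reconstruction error (flat feature vectors)."""
    scores = []
    for frame in frames:
        recon = reconstruct(frame)
        err = sum((a - b) ** 2 for a, b in zip(frame, recon)) / len(frame)
        scores.append(err)
    return scores

def flag_anomalies(scores, threshold):
    """A frame is anomalous if its reconstruction error exceeds threshold."""
    return [s > threshold for s in scores]

# Toy data: a "model" that always reproduces the typical frame, so the
# corrupted third frame stands out with a large reconstruction error.
frames = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]
normal_model = lambda f: [0.1, 0.2, 0.3]
scores = anomaly_scores(frames, normal_model)
flags = flag_anomalies(scores, threshold=0.05)
```

The threshold is typically calibrated on held-out normal footage; the AUC values quoted above are threshold-free summaries of how well such scores rank anomalous frames.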
PMID:40348836 | DOI:10.1038/s41598-025-01146-4
A deep learning framework for virtual continuous glucose monitoring and glucose prediction based on life-log data
Sci Rep. 2025 May 10;15(1):16290. doi: 10.1038/s41598-025-01367-7.
ABSTRACT
While continuous glucose monitoring (CGM) has revolutionized metabolic health management, widespread adoption remains limited by cost constraints and usage burden, often resulting in interrupted monitoring periods. We propose a deep learning framework for glucose level inference that operates independently of prior glucose measurements, utilizing comprehensive life-log data. The model employs a bidirectional Long Short-Term Memory (LSTM) network with an encoder-decoder architecture, incorporating dual attention mechanisms for temporal and feature importance. The system was trained on data from 171 healthy adults, encompassing detailed records of dietary intake, physical activity metrics, and glucose measurements. The encoder's hidden states were analyzed as latent representations of the distributions of glucose and life-log sequence patterns. Without any glucose information at the inference step, the model achieved a Root Mean Squared Error of 19.49 ± 5.42 mg/dL, a correlation coefficient of 0.43 ± 0.2, and a Mean Absolute Percentage Error of 12.34 ± 3.11% for current glucose level predictions. The distribution of latent representations from the encoder showed potential for differentiating glucose patterns. The model's ability to maintain predictive accuracy during periods of CGM unavailability has the potential to support intermittent monitoring scenarios for users.
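The three error metrics reported above (RMSE, Pearson correlation, MAPE) are straightforward to reproduce. A self-contained sketch on toy glucose values, not the study's data:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the units of y (here mg/dL)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical true vs. predicted glucose values (mg/dL).
glucose_true = [90.0, 110.0, 150.0, 130.0]
glucose_pred = [95.0, 105.0, 140.0, 135.0]
```

Reporting RMSE alongside MAPE, as the study does, captures both absolute error in mg/dL and error relative to each glucose level.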
PMID:40348812 | DOI:10.1038/s41598-025-01367-7
Estimation method of dynamic range parameters for cochlear implants based on neural response telemetry threshold
Acta Otolaryngol. 2025 May 10:1-9. doi: 10.1080/00016489.2025.2492359. Online ahead of print.
ABSTRACT
BACKGROUND: There is a lack of correlation studies between subjective behavioral test threshold and neural response telemetry (NRT) thresholds in cochlear implant (CI) patients. At present, there is no predictive model that can predict the parameters of CI adjustment objectively and reliably.
OBJECTIVES: To explore the correlation between subjective behavior test thresholds and NRT thresholds in CI patients with normal cochlear (NC) morphology and with inner ear malformation (IEM), and to explore the value of deep learning technology for predicting mapping parameters and guiding postoperative device programming.
METHODS: NRT and subjective behavior tests were conducted on 57 CI patients with NC morphology and 20 with IEM, using electrodes 1, 6, 11, 16, and 22, respectively. The correlation between NRT thresholds and T and C values was analyzed. Using deep learning techniques, a prediction model based on convolutional neural networks was established to predict the mapping parameters of CI.
RESULTS: The average NRT threshold values of both the NC and IEM groups were greater than the T values, and close to but slightly smaller than the C values. The average T values, C values, and NRT thresholds in the IEM group were slightly higher than those in the NC group. The NRT thresholds of both groups were significantly correlated with the C and T values. The constructed prediction model showed high agreement between predicted and actual values for each electrode, and the linear regression equations relating predicted and actual values were highly similar.
CONCLUSIONS: The NRT thresholds are significantly related to the subjective behavior test thresholds. The correlation between NRT thresholds and T or C values can be used to assist in CI tuning; in particular, patients with IEM require different programming strategies than NC patients. Moreover, the constructed neural network prediction model can guide the postoperative programming of patients with cochlear implants.
PMID:40347195 | DOI:10.1080/00016489.2025.2492359
ReQuant: improved base modification calling by k-mer value imputation
Nucleic Acids Res. 2025 May 10;53(9):gkaf323. doi: 10.1093/nar/gkaf323.
ABSTRACT
Nanopore sequencing allows identification of base modifications, such as methylation, directly from raw current data. Prevailing approaches, including deep learning (DL) methods, require training data covering all possible sequence contexts. These data can be prohibitively expensive or impossible to obtain for some modifications. Hence, research into DNA modifications focuses on the most prevalent modification in human DNA: 5mC in a CpG context. Improved generalization is required to reach the technology's full potential: calling any modification from raw current values. We developed ReQuant, an algorithm to impute full, k-mer based, modification models from limited k-mer context training data. ReQuant is highly accurate for calling modifications (CpG/GpC methylation and CpG glucosylation) in Lambda Phage R9 data when fitting on ≤25% of all possible 6-mers with a modification and extends to human R10 data. The success of our approach shows that DNA modifications have a consistent and therefore predictable effect on Nanopore current levels, suggesting that interpretable rule-based imputation in unseen contexts is possible. Our approach circumvents the need for modification-specific DL tools and enables modification calling when not all sequence contexts can be obtained, opening a vast field of biological base modification research.
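ReQuant's exact imputation rules are not given in the abstract. As a hypothetical illustration of the idea of interpretable, rule-based k-mer imputation, the sketch below fits a simple additive per-(position, base) model to a partial k-mer table and predicts the current shift of unseen k-mers:

```python
from collections import defaultdict

def fit_additive_model(observed):
    """Fit per-(position, base) contributions to the current shift from a
    partial k-mer table. A simplified, hypothetical stand-in for ReQuant's
    rule-based imputation -- not the published algorithm."""
    mean = sum(observed.values()) / len(observed)
    sums, counts = defaultdict(float), defaultdict(int)
    for kmer, shift in observed.items():
        for i, base in enumerate(kmer):
            sums[(i, base)] += shift - mean
            counts[(i, base)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return mean, effects

def impute(kmer, mean, effects):
    """Predict an unseen k-mer's shift by summing its positional effects."""
    return mean + sum(effects.get((i, b), 0.0) for i, b in enumerate(kmer))

# Toy 2-mer table with one context ("CC") never observed.
observed = {"AA": 2.0, "AC": 0.0, "CA": 1.0}
mean, effects = fit_additive_model(observed)
predicted_cc = impute("CC", mean, effects)
```

The point of the illustration matches the paper's claim: if modifications shift current levels in a consistent, decomposable way, a model fitted on a fraction of contexts can extrapolate to the rest.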
PMID:40347136 | DOI:10.1093/nar/gkaf323
Optimizing Deep Learning Models for Luminal and Nonluminal Breast Cancer Classification Using Multidimensional ROI in DCE-MRI-A Multicenter Study
Cancer Med. 2025 May;14(9):e70931. doi: 10.1002/cam4.70931.
ABSTRACT
OBJECTIVES: Previous deep learning studies have not explored the synergistic effects of ROI dimensions (2D/2.5D/3D), peritumoral expansion levels (0-8 mm), and segmentation scenarios (ROI only vs. ROI original). Our study aims to evaluate the performance of multidimensional deep transfer learning models in distinguishing molecular subtypes of breast cancer (luminal vs. nonluminal) using DCE-MRI. Under two segmentation scenarios, we systematically compare the effects of ROI dimensions and peritumoral expansion levels to optimize multidimensional deep learning models via transfer learning for distinguishing luminal from nonluminal breast cancers in DCE-MRI-based analysis.
MATERIALS AND METHODS: From October 2020 to October 2023, data from 426 patients with primary invasive breast cancer were retrospectively collected. Patients were divided into three cohorts: (1) training cohort, n = 108, from SYSU Hospital (Zhuhai, China); (2) validation cohort 1, n = 165, from HZ Hospital (Huizhou, China); and (3) validation cohort 2, n = 153, from LY Hospital (Linyi, China). ROIs were delineated, and expansions of 2, 4, 6, and 8 mm beyond the lesion boundary were performed. We assessed the performance of various deep transfer learning models, considering precise segmentation (ROI only and ROI original) and varying peritumoral regions, using ROC curves and decision curve analysis.
RESULTS: The 2.5D1-based deep learning model (ROI original, 4 mm expansion) demonstrated optimal performance, achieving an AUC of 0.808 (95% CI 0.715-0.901) in the training cohort, 0.766 (95% CI 0.682-0.850) in validation cohort 1, and 0.799 (95% CI 0.725-0.874) in validation cohort 2.
CONCLUSION: The study highlights that the 2.5D1-based deep learning model utilizing the three principal slices of the minimum bounding box (ROI original) with a 4 mm peritumoral region is effective in distinguishing between luminal and nonluminal breast cancer tumors, serving as a potential diagnostic tool.
PMID:40347080 | DOI:10.1002/cam4.70931
Clinical Validation of Artificial Intelligence Algorithms for the Diagnosis of Adult Obstructive Sleep Apnea and Sleep Staging From Oximetry and Photoplethysmography-SleepAI
J Sleep Res. 2025 May 10:e70093. doi: 10.1111/jsr.70093. Online ahead of print.
ABSTRACT
Home sleep apnea tests (HSATs) have emerged as alternatives to in-laboratory polysomnography (PSG), but Type IV HSATs often show limited diagnostic performance. This study clinically validates SleepAI, a novel remote digital health system that applies AI algorithms to raw oximetry data for automated sleep staging and obstructive sleep apnea (OSA) diagnosis. SleepAI algorithms were trained on over 10,000 PSG recordings. The system consists of a wearable oximeter connected via Bluetooth to a mobile app transmitting raw data to a cloud-based platform for AI-driven analysis. Clinical validation was conducted in 53 subjects with suspected OSA, who used SleepAI for three nights at home and one night in a sleep centre alongside PSG. SleepAI's apnea-hypopnea index (AHI) estimates and three-class sleep staging (Wake, REM, NREM) were compared to PSG references. For OSA severity classification (non-OSA, mild, moderate, severe), SleepAI achieved an overall accuracy of 89%, with F1-scores of 1.0, 1.0, 0.9, and 0.88, respectively. The three-stage sleep classification achieved a Cohen's kappa of 0.75. Night-to-night AHI variability showed that 37.5% of participants experienced a one-level severity change across nights at home. No significant differences in sleep metrics were found between the first and subsequent nights at home, indicating no sleep disturbance by SleepAI. These findings support the SleepAI system as a promising and scalable alternative to existing Type IV HSATs, with the potential to address key clinical gaps by improving diagnostic accuracy and accessibility.
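The four-class OSA severity task above is defined by the apnea-hypopnea index (AHI). The standard clinical computation and cut-offs can be sketched as follows (illustrative only, not SleepAI's internal code):

```python
def ahi(n_apneas, n_hypopneas, sleep_hours):
    """Apnea-hypopnea index: respiratory events per hour of sleep."""
    return (n_apneas + n_hypopneas) / sleep_hours

def osa_severity(ahi_value):
    """Standard clinical cut-offs defining the four severity classes."""
    if ahi_value < 5:
        return "non-OSA"
    if ahi_value < 15:
        return "mild"
    if ahi_value < 30:
        return "moderate"
    return "severe"

# 60 respiratory events over 6 h of sleep -> AHI 10 -> "mild"
severity = osa_severity(ahi(40, 20, 6.0))
```

Because the classes are thresholds on a continuous index, night-to-night AHI variability near a cut-off readily produces the one-level severity changes observed in 37.5% of participants.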
PMID:40346945 | DOI:10.1111/jsr.70093
Estimating canopy leaf angle from leaf to ecosystem scale: a novel deep learning approach using unmanned aerial vehicle imagery
New Phytol. 2025 May 10. doi: 10.1111/nph.70197. Online ahead of print.
ABSTRACT
Leaf angle distribution (LAD) impacts plant photosynthesis, water use efficiency, and ecosystem primary productivity, which are crucial for understanding surface energy balance and climate change responses. Traditional LAD measurement methods are time-consuming and often limited to individual sites, hindering effective data acquisition at the ecosystem scale and complicating the modeling of canopy LAD variations. We present a deep learning approach that is more affordable, efficient, automated, and less labor-intensive than traditional methods for estimating LAD. The method uses unmanned aerial vehicle images processed with structure-from-motion point cloud algorithms and the Mask Region-based convolutional neural network. Validation at the single-leaf scale using manual measurements across three plant species confirmed high accuracy of the proposed method (Pachira glabra: R2 = 0.87, RMSE = 7.61°; Ficus elastica: R2 = 0.91, RMSE = 6.72°; Schefflera macrostachya: R2 = 0.85, RMSE = 5.67°). Employing this method, we efficiently measured leaf angles for 57 032 leaves within a 30 m × 30 m plot, revealing distinct LAD among four representative tree species: Melodinus suaveolens (mean inclination angle 34.79°), Daphniphyllum calycinum (31.22°), Endospermum chinense (25.40°), and Tetracera sarmentosa (30.37°). The method can efficiently estimate LAD across scales, providing critical structural information of vegetation canopy for ecosystem modeling, including species-specific leaf strategies and their effects on light interception and photosynthesis in diverse forests.
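Once a leaf's surface normal has been recovered from the segmented point cloud, its inclination angle follows from elementary geometry: the angle between the normal and the vertical axis. A minimal sketch (assuming a z-up coordinate frame; this is not the authors' pipeline):

```python
import math

def leaf_inclination_deg(normal):
    """Leaf inclination angle in degrees: 0 deg for a horizontal leaf
    (normal pointing straight up), 90 deg for a vertical leaf.
    normal is an (nx, ny, nz) surface-normal vector, z-up frame assumed."""
    nx, ny, nz = normal
    mag = math.sqrt(nx ** 2 + ny ** 2 + nz ** 2)
    return math.degrees(math.acos(abs(nz) / mag))

flat = leaf_inclination_deg((0.0, 0.0, 1.0))   # horizontal leaf -> 0.0
tilted = leaf_inclination_deg((1.0, 0.0, 1.0)) # 45-degree leaf
```

Aggregating this per-leaf angle over all segmented leaves in a plot yields the leaf angle distribution (LAD) statistics, such as the per-species mean inclination angles reported above.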
PMID:40346911 | DOI:10.1111/nph.70197
An AI-Powered Methodology for Atomic-Scale Analysis of Heterogenized Correlated Single-Atom Catalysts
Small Methods. 2025 May 9:e2402010. doi: 10.1002/smtd.202402010. Online ahead of print.
ABSTRACT
Correlated single-atom catalysts offer transformative potential in catalysis, particularly in the field of electrocatalysis, with a focus on oxygen evolution reactions. Advanced characterization is critical to understanding their atomic-scale properties when techniques usually used in molecular science (Nuclear Magnetic Resonance (NMR), X-ray Diffraction (XRD), Infrared spectroscopy (IR), or Mass Spectrometry (MS)) cannot be applied after dispersing them on a carrier material. Here, a methodology that combines machine learning and mathematical optimization techniques to detect and quantify metal-metal interactions within heterobinuclear Au(III)-Pd(II) macrocyclic complexes on atomically resolved high-angle annular dark-field scanning transmission electron microscopy (HAADF-STEM) images is introduced. Both supervised and unsupervised machine learning methods are evaluated, with the U-net architecture demonstrating superior performance in distinguishing the two involved chemical species. Mathematical optimization models further enhance the reliability of metal pair identification by providing precise distance metrics for the pairs. This methodology allows for the study of both the dynamics and bond interaction of heterobinuclear Au(III)-Pd(II) complexes. Notably, the analysis of time series of images reveals that most metal pairs remained stable under the high-energy electron beam irradiation conditions. Likewise, the Au-Pd distance within the pairs remains unchanged, indicating a robust interaction of the two metals with the ligand even after being deposited on the amorphous carbon substrate.
PMID:40346778 | DOI:10.1002/smtd.202402010
Author Correction: Deep learning and genome-wide association meta-analyses of bone marrow adiposity in the UK Biobank
Nat Commun. 2025 May 9;16(1):4331. doi: 10.1038/s41467-025-59574-9.
NO ABSTRACT
PMID:40346038 | DOI:10.1038/s41467-025-59574-9
FaceAge, a deep learning system to estimate biological age from face photographs to improve prognostication: a model development and validation study
Lancet Digit Health. 2025 May 7:100870. doi: 10.1016/j.landig.2025.03.002. Online ahead of print.
ABSTRACT
BACKGROUND: As humans age at different rates, physical appearance can yield insights into biological age and physiological health more reliably than chronological age. In medicine, however, appearance is incorporated into medical judgements in a subjective and non-standardised way. In this study, we aimed to develop and validate FaceAge, a deep learning system to estimate biological age from easily obtainable and low-cost face photographs.
METHODS: FaceAge was trained on data from 58 851 presumed healthy individuals aged 60 years or older: 56 304 individuals from the IMDb-Wiki dataset (training) and 2547 from the UTKFace dataset (initial validation). Clinical utility was evaluated on data from 6196 patients with cancer diagnoses from two institutions in the Netherlands and the USA: the MAASTRO, Harvard Thoracic, and Harvard Palliative cohorts. FaceAge estimates in these cancer cohorts were compared with a non-cancerous reference cohort of 535 individuals. To assess the prognostic relevance of FaceAge, we performed Kaplan-Meier survival analysis and Cox modelling, adjusting for several clinical covariates. We also assessed the performance of FaceAge in patients with metastatic cancer receiving palliative treatment at the end of life by incorporating FaceAge into clinical prediction models. To evaluate whether FaceAge has the potential to be a biomarker for molecular ageing, we performed a gene-based analysis to assess its association with senescence genes.
FINDINGS: FaceAge showed significant independent prognostic performance in various cancer types and stages. Looking older was correlated with worse overall survival (after adjusting for covariates, per-decade hazard ratio [HR] 1·151, p=0·013 in a pan-cancer cohort of n=4906; 1·148, p=0·011 in a thoracic cohort of n=573; and 1·117, p=0·021 in a palliative cohort of n=717). We found that, on average, patients with cancer looked older than their chronological age (mean increase of 4·79 years with respect to the non-cancerous reference cohort, p<0·0001). We found that FaceAge can improve physicians' survival predictions in patients with incurable cancer receiving palliative treatments (from area under the curve 0·74 [95% CI 0·70-0·78] to 0·80 [0·76-0·83]; p<0·0001), highlighting the clinical use of the algorithm to support end-of-life decision making. FaceAge was also significantly associated with molecular mechanisms of senescence through gene analysis, whereas chronological age was not.
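Per-decade hazard ratios like those above scale multiplicatively to other age gaps under the proportional-hazards assumption. A small arithmetic sketch (illustrative, not the study's code):

```python
def hr_for_age_gap(hr_per_decade, years):
    """Scale a per-decade hazard ratio to an arbitrary age gap.

    Under proportional hazards with a log-linear covariate,
    HR(gap) = HR_per_decade ** (gap_in_years / 10).
    """
    return hr_per_decade ** (years / 10.0)

# With the pan-cancer HR of 1.151 per decade, looking two decades older
# corresponds to a hazard ratio of 1.151 squared (about 1.32).
hr_10 = hr_for_age_gap(1.151, 10)
hr_20 = hr_for_age_gap(1.151, 20)
```

This is why a per-decade HR that looks modest can translate into a substantial survival difference across a wide FaceAge range.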
INTERPRETATION: Our results suggest that a deep learning model can estimate biological age from face photographs and thereby enhance survival prediction in patients with cancer. Further research, including validation in larger cohorts, is needed to verify these findings in patients with cancer and to establish whether the findings extend to patients with other diseases. Subject to further testing and validation, approaches such as FaceAge could be used to translate a patient's visual appearance into objective, quantitative, and clinically valuable measures.
FUNDING: US National Institutes of Health and EU European Research Council.
PMID:40345937 | DOI:10.1016/j.landig.2025.03.002
Comparative analysis of open-source against commercial AI-based segmentation models for online adaptive MR-guided radiotherapy
Z Med Phys. 2025 May 8:S0939-3889(25)00077-7. doi: 10.1016/j.zemedi.2025.04.008. Online ahead of print.
ABSTRACT
BACKGROUND AND PURPOSE: Online adaptive magnetic resonance-guided radiotherapy (MRgRT) has emerged as a state-of-the-art treatment option for multiple tumour entities, accounting for daily anatomical and tumour volume changes, thus allowing sparing of relevant organs at risk (OARs). However, the annotation of treatment-relevant anatomical structures in context of online plan adaptation remains challenging, often relying on commercial segmentation solutions due to limited availability of clinically validated alternatives. The aim of this study was to investigate whether an open-source artificial intelligence (AI) segmentation network can compete with the annotation accuracy of a commercial solution, both trained on the identical dataset, questioning the need for commercial models in clinical practice.
MATERIALS AND METHODS: For 47 pelvic patients, T2w MR imaging data acquired on a 1.5 T MR-Linac were manually contoured, identifying prostate, seminal vesicles, rectum, anal canal, bladder, penile bulb, and bony structures. These training data were used for the generation of an in-house AI segmentation model, a nnU-Net with residual encoder architecture featuring a streamlined single image inference pipeline, and re-training of a commercial solution. For quantitative evaluation, 20 MR images were contoured by a radiation oncologist, considered as ground truth contours (GTC) and compared with the in-house/commercial AI-based contours (iAIC/cAIC) using Dice Similarity Coefficient (DSC), 95% Hausdorff distances (HD95), and surface DSC (sDSC). For qualitative evaluation, four radiation oncologists assessed the usability of OAR/target iAIC within an online adaptive workflow using a four-point Likert scale: (1) acceptable without modification, (2) requiring minor adjustments, (3) requiring major adjustments, and (4) not usable.
RESULTS: Patient-individual annotations were generated in a median [range] time of 23 [16-34] s for iAIC and 152 [121-198] s for cAIC, respectively. OARs showed a maximum median DSC of 0.97/0.97 (iAIC/cAIC) for the bladder and a minimum median DSC of 0.78/0.79 (iAIC/cAIC) for the anal canal/penile bulb. The maximum and minimum median HD95 were observed for the rectum at 17.3/20.6 mm (iAIC/cAIC) and the bladder at 5.6/6.0 mm (iAIC/cAIC), respectively. Overall, the average median DSC/HD95 values were 0.87/11.8 mm (iAIC) and 0.83/10.2 mm (cAIC) for OARs/targets, and 0.90/11.9 mm (iAIC) and 0.91/16.5 mm (cAIC) for bony structures. For a tolerance of 3 mm, the highest sDSC was determined for the bladder (iAIC: 1.00, cAIC: 0.99), and the lowest for the prostate in iAIC (0.89) and the anal canal in cAIC (0.80). Qualitatively, 84.8% of the analysed contours were considered clinically acceptable for iAIC, while 12.9% required minor adjustments and 2.3% required major adjustments or were classed as unusable. Contour-specific analysis showed that iAIC achieved the best mean score (1.00) for the anal canal and the worst (1.61) for the prostate.
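The DSC reported above measures volumetric overlap between two delineations. A minimal sketch on flattened binary masks (illustrative, not the study's evaluation pipeline):

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks, given as
    flat lists of 0/1 voxels: 2*|A intersect B| / (|A| + |B|)."""
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Two empty masks agree perfectly by convention.
    return 2.0 * intersection / total if total else 1.0

ground_truth = [1, 1, 1, 0, 0]
prediction = [1, 1, 0, 0, 0]
score = dice(ground_truth, prediction)  # 2*2 / (3+2) = 0.8
```

DSC rewards overlap but is insensitive to the location of outlier errors, which is why the study also reports HD95 (boundary distance) and sDSC (surface agreement within a tolerance).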
CONCLUSION: This study demonstrates that an open-source segmentation framework can achieve annotation accuracy comparable to commercial solutions for pelvic anatomy in online adaptive MRgRT. The adapted framework not only maintained high segmentation performance, with 84.8% of contours accepted by physicians or requiring only minor corrections (12.9%), but also enhanced the clinical workflow efficiency of online adaptive MRgRT through reduced inference times. These findings establish open-source frameworks as viable alternatives to commercial systems in supervised clinical workflows.
PMID:40345918 | DOI:10.1016/j.zemedi.2025.04.008
Enhanced Graph Attention Network by Integrating Transformer for Epileptic EEG Identification
Int J Neural Syst. 2025 May 9:2550037. doi: 10.1142/S0129065725500376. Online ahead of print.
ABSTRACT
Electroencephalography signal classification is essential for the diagnosis and monitoring of neurological disorders, with significant implications for patient treatment. Despite the progress made, existing methods face challenges such as capturing the complex dynamics of Electroencephalogram (EEG) signals and generalizing across diverse patient populations. In this study, the graph attention network and the transformer model are integrated for EEG signal classification, leveraging the enhanced capability to dynamically compute attention weights and adapt to the variable relevance of brain regions. The proposed approach is capable of modeling the intricate relationships within EEG activities by learning context-dependent attention scores. We conducted a comprehensive evaluation of the proposed approach comparing with the state-of-the-art algorithms. Experimental outcomes show that it surpasses the competing models. The superior performance is attributed to the proposed approach's dynamic attention mechanism, which better captures the nuanced patterns in EEG signals across different subjects and seizure types. In the experiments, the CHB-MIT dataset was exploited, which served as a benchmark for evaluating the performance of the proposed framework in distinguishing interictal, ictal, and normal EEG patterns. The results prove the usefulness of our work in advancing EEG signal classification. The findings suggest that the combination of graph attention and self-attention mechanisms is a promising approach for improving the accuracy and reliability of EEG-based diagnostics, potentially improving the management of neurological disorders.
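The core operation the abstract describes, dynamically computed attention weights over brain-region neighbors, can be sketched as one graph-attention aggregation step. The dot-product scoring below is a toy assumption for illustration; real GATs learn the scoring function from concatenated node and neighbor features.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gat_node_update(node, neighbors, score):
    """One graph-attention aggregation step: score each neighbor against
    the node, softmax the scores into attention weights, and return the
    attention-weighted sum of neighbor features."""
    weights = softmax([score(node, nb) for nb in neighbors])
    dim = len(node)
    return [sum(w * nb[d] for w, nb in zip(weights, neighbors))
            for d in range(dim)]

# Toy EEG-channel features; dot-product scoring favors similar neighbors.
dot = lambda a, b: sum(x * y for x, y in zip(a, b))
out = gat_node_update([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], dot)
```

Because the weights are recomputed per input, the same architecture can emphasize different electrode neighborhoods for different subjects and seizure types, which is the adaptivity the paper credits for its performance.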
PMID:40346731 | DOI:10.1142/S0129065725500376
SMFF-DTA: using a sequential multi-feature fusion method with multiple attention mechanisms to predict drug-target binding affinity
BMC Biol. 2025 May 9;23(1):120. doi: 10.1186/s12915-025-02222-x.
ABSTRACT
BACKGROUND: Drug-target binding affinity (DTA) prediction can accelerate the drug screening process, and deep learning techniques have been used in all facets of drug research. Affinity prediction based on deep learning methods has proven crucial to drug discovery, design, and reuse. Among these, the sequence-based approach using 1D sequences of drugs and targets as inputs typically results in the loss of structural information, whereas the structure-based method frequently results in increased computing costs due to the intricate structure of the molecule graph.
RESULTS: We propose a sequential multifeature fusion method (SMFF-DTA) to achieve efficient and accurate prediction. SMFF-DTA uses sequential methods to represent the structural information and physicochemical properties of drugs and targets and introduces multiple attention blocks to capture interaction features closely.
CONCLUSIONS: As demonstrated by our extensive studies, SMFF-DTA outperforms the other methods in terms of various metrics, showing its advantages and effectiveness as a drug-target binding affinity predictor.
PMID:40346536 | DOI:10.1186/s12915-025-02222-x
PCVR: a pre-trained contextualized visual representation for DNA sequence classification
BMC Bioinformatics. 2025 May 9;26(1):125. doi: 10.1186/s12859-025-06136-x.
ABSTRACT
BACKGROUND: The classification of DNA sequences is pivotal in bioinformatics and essential for genetic information analysis. Traditional alignment-based tools tend to have slow speed and low recall. Machine learning methods learn implicit patterns from data with encoding techniques such as k-mer counting and ordinal encoding, which fail to handle long sequences or sacrifice structural and sequential information. Frequency chaos game representation (FCGR) converts DNA sequences of arbitrary lengths into fixed-size images, breaking free from the constraints of sequence length while preserving more sequential information than other representations. However, existing works merely consider local information, ignoring long-range dependencies and global contextual information within FCGR images.
RESULTS: We propose PCVR, a Pre-trained Contextualized Visual Representation for DNA sequence classification. PCVR encodes FCGR with a vision transformer into contextualized features containing more global information. To meet the substantial data requirements of the training of vision transformer and learn more robust features, we pre-train the encoder with a masked autoencoder. Pre-trained PCVR exhibits impressive performance on three datasets even with only unsupervised learning. After fine-tuning, PCVR outperforms existing methods on superkingdom and phylum levels. Additionally, our ablation studies confirm the contribution of the vision transformer encoder and masked autoencoder pre-training to performance improvement.
CONCLUSIONS: PCVR significantly improves DNA sequence classification accuracy and shows strong potential for new species discovery due to its effective capture of global information and its robustness. Code for PCVR is available at https://github.com/jiaruizhou/PCVR.
PMID:40346458 | DOI:10.1186/s12859-025-06136-x
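For readers unfamiliar with FCGR, the sequence-to-image conversion can be sketched as k-mer counting over a recursively subdivided 2^k x 2^k grid. A minimal illustrative implementation under the usual CGR corner convention (a sketch, not the authors' code; corner assignment varies between implementations):

```python
import numpy as np

# Discrete FCGR: map each k-mer to one cell of a 2^k x 2^k grid and
# count occurrences. Each successive base refines the position by one
# quadrant level, so every k-mer lands in a unique cell.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def fcgr(seq, k):
    size = 2 ** k
    img = np.zeros((size, size))
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if any(b not in CORNERS for b in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        for b in kmer:
            cx, cy = CORNERS[b]
            x, y = 2 * x + cx, 2 * y + cy
        img[y, x] += 1
    return img

img = fcgr("ACGTACGT", k=3)
print(img.shape, int(img.sum()), int(img.max()))  # (8, 8) 6 2
```

The resulting fixed-size count matrix is what a vision model such as PCVR's transformer encoder consumes, regardless of the original sequence length.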
CT-based quantification of intratumoral heterogeneity for predicting distant metastasis in retroperitoneal sarcoma
Insights Imaging. 2025 May 9;16(1):99. doi: 10.1186/s13244-025-01977-9.
ABSTRACT
OBJECTIVES: Retroperitoneal sarcoma (RPS) is highly heterogeneous, leading to different risks of distant metastasis (DM) among patients with the same clinical stage. This study aims to develop a quantitative method for assessing intratumoral heterogeneity (ITH) using preoperative contrast-enhanced CT (CECT) scans and evaluate its ability to predict DM risk.
METHODS: We conducted a retrospective analysis of 274 RPS patients who underwent complete surgical resection and were monitored for ≥ 36 months at two centers. Conventional radiomics (C-radiomics), ITH radiomics, and deep-learning (DL) features were extracted from the preoperative CECT scans and used to develop single-modality models. Clinical indicators and high-throughput CECT features were then integrated to develop a combined model for predicting DM. Model performance was evaluated using the receiver operating characteristic (ROC) curve and Harrell's concordance index (C-index). Distant metastasis-free survival (DMFS) was also predicted to further assess survival benefits.
RESULTS: The ITH model demonstrated satisfactory predictive capability for DM in internal and external validation cohorts (AUC: 0.735, 0.765; C-index: 0.691, 0.729). The combined model that combined clinicoradiological variables, ITH-score, and DL-score achieved the best predictive performance in internal and external validation cohorts (AUC: 0.864, 0.801; C-index: 0.770, 0.752), successfully stratified patients into high- and low-risk groups for DM (p < 0.05).
CONCLUSIONS: The combined model demonstrated promising potential for accurately predicting the DM risk and stratifying the DMFS risk in RPS patients undergoing complete surgical resection, providing a valuable tool for guiding treatment decisions and follow-up strategies.
CRITICAL RELEVANCE STATEMENT: The intratumoral heterogeneity analysis facilitates the identification of high-risk retroperitoneal sarcoma patients prone to distant metastasis and poor prognoses, enabling the selection of candidates for more aggressive surgical and post-surgical interventions.
KEY POINTS: Preoperative identification of retroperitoneal sarcoma (RPS) with a high potential for distant metastasis (DM) is crucial for targeted interventional strategies. Quantitative assessment of intratumoral heterogeneity achieved reasonable performance for predicting DM. The integrated model combining clinicoradiological variables, ITH radiomics, and deep-learning features effectively predicted distant metastasis-free survival.
PMID:40346399 | DOI:10.1186/s13244-025-01977-9
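Harrell's C-index, reported here alongside the AUC, can be computed by counting concordant pairs among comparable patient pairs: a pair is comparable when the patient with the shorter follow-up time had an observed event, and concordant when that patient also carries the higher predicted risk. A minimal pure-Python sketch with toy data (real analyses should use a vetted survival library):

```python
def harrell_c_index(time, event, risk):
    """Fraction of comparable patient pairs in which the patient who
    failed earlier also has the higher predicted risk; ties in risk
    count as half-concordant. O(n^2) illustrative version."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# toy cohort: predicted risk perfectly inverse to survival time
print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

A C-index of 0.5 corresponds to random ranking, so the paper's values of roughly 0.75 to 0.77 indicate a usefully discriminative risk model.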
Deep learning for Parkinson's disease classification using multimodal and multi-sequences PET/MR images
EJNMMI Res. 2025 May 9;15(1):55. doi: 10.1186/s13550-025-01245-3.
ABSTRACT
BACKGROUND: We aimed to use deep learning (DL) techniques to accurately differentiate Parkinson's disease (PD) from multiple system atrophy (MSA), two disorders with similar clinical presentations. In this retrospective analysis, 206 patients who underwent PET/MR imaging at the Chinese PLA General Hospital and were clinically diagnosed with either PD or MSA were included; an additional 38 healthy volunteers served as normal controls (NC). All subjects were randomly assigned to the training and test sets at a ratio of 7:3. The model input consists of 10 two-dimensional (2D) slices in the axial, coronal, and sagittal planes from multi-modal images. A modified 18-layer Residual Network (ResNet18) was trained on different modal images to classify PD, MSA, and NC. Four-fold cross-validation was applied in the training set. Performance evaluations included accuracy, precision, recall, F1 score, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC).
RESULTS: Six single-modal models and seven multi-modal models were trained and tested. The PET models outperformed the MRI models. The 11C-methyl-N-2β-carbomethoxy-3β-(4-fluorophenyl)-tropane (11C-CFT)-apparent diffusion coefficient (ADC) model showed the best classification performance, achieving 0.97 accuracy, 0.93 precision, 0.95 recall, 0.92 F1 score, and 0.96 AUC. In the test set, the accuracy, precision, recall, and F1 score of the CFT-ADC model were 0.70, 0.73, 0.93, and 0.82, respectively.
CONCLUSIONS: The proposed DL method shows potential as a high-performance assisting tool for the accurate diagnosis of PD and MSA. A multi-modal and multi-sequence model could further enhance the ability to classify PD.
PMID:40346391 | DOI:10.1186/s13550-025-01245-3
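The reported accuracy, precision, recall, and F1 score follow their standard definitions. A minimal illustration for a single positive class (the study reports them for a three-way PD/MSA/NC task; labels and predictions below are toy values):

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class,
    computed from true-positive, false-positive, and false-negative
    counts. Illustrative only; multi-class studies typically report
    these per class or as macro averages."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return acc, precision, recall, f1

print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Note how the test-set pattern in the abstract (recall 0.93 but precision 0.73) indicates a model that rarely misses positives while producing a fair number of false alarms.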
Scoping review of deep learning research illuminates artificial intelligence chasm in otolaryngology-head and neck surgery
NPJ Digit Med. 2025 May 10;8(1):265. doi: 10.1038/s41746-025-01693-0.
ABSTRACT
Clinical validation studies are important for translating artificial intelligence (AI) technology into healthcare but appear to be rarely performed in Otolaryngology - Head & Neck Surgery (OHNS). This scoping review examined deep learning publications in OHNS between 1996 and 2023. Searches of the MEDLINE, EMBASE, and Web of Science databases identified 3236 articles, of which 444 met the inclusion criteria. Publications increased exponentially from 2012 to 2022 across 48 countries; they were most concentrated in otology and neurotology (28%), most often targeted extending health care provider capabilities (56%), and most often used image input data (55%) and convolutional neural network models (63%). Strikingly, nearly all studies (99.3%) were in silico, early-stage, proof-of-concept studies. Three (0.7%) studies conducted offline validation and zero (0%) conducted clinical validation, illuminating the "AI chasm" in OHNS. Recommendations to cross this chasm include focusing on low-complexity, low-risk tasks, adhering to reporting guidelines, and prioritizing clinical translation studies.
PMID:40346307 | DOI:10.1038/s41746-025-01693-0
Addressing significant challenges for animal detection in camera trap images: a novel deep learning-based approach
Sci Rep. 2025 May 9;15(1):16191. doi: 10.1038/s41598-025-90249-z.
ABSTRACT
Wildlife biologists increasingly use camera traps for monitoring animal populations. However, manually sifting through the collected images is expensive and time-consuming. Current deep learning studies for camera trap images do not adequately tackle real-world challenges such as the imbalance between animal and empty images, distinguishing similar species, and the impact of backgrounds on species identification, limiting the models' applicability in new locations. Here, we present a novel two-stage deep learning framework. First, we train a global deep-learning model using all animal species in the dataset. Then, an agglomerative clustering algorithm groups animals based on their appearance. Subsequently, we train a specialized deep-learning expert model for each animal group to detect similar features. This approach leverages transfer learning from the MegaDetectorV5 (YOLOv5-based) model, already pre-trained on a variety of animal species and ecosystems. Our two-stage pipeline uses the global model to redirect images to the appropriate expert model for final classification. We validated this strategy using 1.3 million images from 91 camera traps encompassing 24 mammal species, with 120,000 images held out for testing, achieving an F1 score of 96.2% when using the expert models for final classification. This method surpasses existing deep learning models, demonstrating improved precision and effectiveness in automated wildlife detection.
PMID:40346172 | DOI:10.1038/s41598-025-90249-z
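The species-grouping step can be illustrated with a toy average-linkage agglomerative clustering over per-species appearance embeddings; this sketch mirrors the idea only, with made-up embeddings, and is not the authors' implementation (which the paper does not reproduce here):

```python
import numpy as np

def agglomerative_groups(embeddings, n_groups):
    """Toy agglomerative clustering: start with one cluster per
    species, then repeatedly merge the two clusters whose centroids
    are closest until n_groups remain. Stand-in for grouping
    visually similar species before training expert models."""
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = embeddings[clusters[a]].mean(axis=0)
                cb = embeddings[clusters[b]].mean(axis=0)
                d = np.linalg.norm(ca - cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    return clusters

# toy: 5 "species" embeddings in 2-D forming two visual groups
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
groups = agglomerative_groups(emb, n_groups=2)
print(sorted(sorted(g) for g in groups))  # [[0, 1, 2], [3, 4]]
```

At inference, a dispatcher would map the global model's predicted group to the matching expert model, which then produces the final species label.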