Deep learning
SMoFFI-SegFormer: a novel approach for ovarian tumor segmentation based on an improved SegFormer architecture
Front Oncol. 2025 Jul 21;15:1555585. doi: 10.3389/fonc.2025.1555585. eCollection 2025.
ABSTRACT
Ovarian cancer remains one of the most lethal gynecological malignancies, posing significant challenges for early detection due to its asymptomatic nature in early stages. Accurate segmentation of ovarian tumors from ultrasound images is critical for improving diagnostic accuracy and patient outcomes. In this study, we introduce SMoFFI-SegFormer, an advanced deep learning model specifically designed to enhance multi-scale feature representation and address the complexities of ovarian tumor segmentation. Building upon the SegFormer architecture, SMoFFI-SegFormer incorporates a novel Self-modulate Fusion with Feature Inhibition (SMoFFI) module that promotes cross-scale information exchange and effectively handles spatial heterogeneity within tumors. Through extensive experimentation on two public datasets, OTU_2D and OTU_CEUS, our model demonstrates superior performance with high overall accuracy, mean Intersection over Union (mIoU), and class accuracy. Specifically, SMoFFI-SegFormer achieves state-of-the-art results, significantly outperforming existing models in both segmentation precision and efficiency. This work paves the way for more reliable and automated tools in the diagnosis and management of ovarian cancer.
PMID:40761240 | PMC:PMC12320497 | DOI:10.3389/fonc.2025.1555585
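The SMoFFI-SegFormer abstract reports overall accuracy, mIoU, and class accuracy but does not define them. A minimal sketch, assuming the standard per-pixel definitions and reading "class accuracy" as mean per-class recall (an assumption), computed on flattened toy label maps:

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


def overall_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def mean_class_accuracy(pred, target, num_classes):
    """Per-class recall (correct pixels / ground-truth pixels), averaged over present classes."""
    accs = []
    for c in range(num_classes):
        n_c = sum(1 for t in target if t == c)
        if n_c > 0:
            accs.append(sum(1 for p, t in zip(pred, target) if t == c and p == c) / n_c)
    return sum(accs) / len(accs)


# Toy flattened label maps (0 = background, 1 and 2 = hypothetical tumor classes)
pred = [0, 0, 1, 1, 1, 2]
target = [0, 1, 1, 1, 2, 2]
print(mean_iou(pred, target, 3), overall_accuracy(pred, target), mean_class_accuracy(pred, target, 3))
```

Classes absent from both maps are skipped so that they neither inflate nor deflate the average.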
Optimizing FCN for devices with limited resources using quantization and sparsity enhancement
Sci Rep. 2025 Aug 4;15(1):28472. doi: 10.1038/s41598-025-06848-3.
ABSTRACT
This study addresses the optimization of fully convolutional networks (FCNs) for deployment on resource-limited devices in real-time scenarios. While prior research has extensively applied quantization techniques to architectures such as VGG-16, comprehensive layer-wise quantization within the FCN-8 architecture has received little attention. To fill this gap, we propose an approach that applies full-layer quantization with an [Formula: see text] error minimization algorithm, accompanied by sensitivity analysis to optimize the fixed-point representation of network weights. Our results demonstrate that this method substantially increases sparsity, up to 40%, while preserving performance, yielding 89.3% pixel accuracy under extreme quantization conditions. The findings highlight the efficacy of full-layer quantization and retraining in simultaneously reducing network complexity and maintaining accuracy in both image classification and semantic segmentation tasks.
PMID:40759661 | DOI:10.1038/s41598-025-06848-3
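The FCN quantization abstract leaves the error norm elided ("[Formula: see text]"), so the sketch below assumes a squared (L2) error purely for illustration. It chooses how many fractional bits a signed fixed-point format should use for a set of weights and then counts exact zeros, showing how a coarse grid rounds small weights to zero and thereby raises sparsity. The function names and weights are hypothetical, and the paper's sensitivity analysis and retraining are not shown.

```python
def quantize_fixed_point(weights, total_bits, frac_bits):
    """Round to a signed fixed-point grid with step 2**-frac_bits, saturating at the range limits."""
    step = 2.0 ** -frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    qmin = -(2 ** (total_bits - 1))
    return [max(qmin, min(qmax, round(w / step))) * step for w in weights]


def best_fixed_point(weights, total_bits, frac_range=range(0, 16)):
    """Pick the fractional bit count that minimizes squared (L2, assumed) quantization error."""
    def err(f):
        q = quantize_fixed_point(weights, total_bits, f)
        return sum((w - v) ** 2 for w, v in zip(weights, q))
    f_best = min(frac_range, key=err)
    return f_best, quantize_fixed_point(weights, total_bits, f_best)


def sparsity(values):
    """Fraction of values that are exactly zero after quantization."""
    return sum(v == 0.0 for v in values) / len(values)


weights = [0.02, -0.01, 0.3, -0.45, 0.005, 0.12, -0.2, 0.6]   # hypothetical layer weights
frac_bits, q = best_fixed_point(weights, total_bits=4)
print(frac_bits, q, sparsity(q))
```

Fewer fractional bits saturate less but round more small weights to zero; more fractional bits do the opposite, so the error-minimizing choice sits in between.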
AI-Driven Integration of Deep Learning with Lung Imaging, Functional Analysis, and Blood Gas Metrics for Perioperative Hypoxemia Prediction: Progress and Perspectives
JMIR Med Inform. 2025 Aug 4. doi: 10.2196/73995. Online ahead of print.
ABSTRACT
This Perspective article explores the transformative role of artificial intelligence (AI) in predicting perioperative hypoxemia through the integration of deep learning (DL) with multimodal clinical data, including lung imaging, pulmonary function tests (PFTs), and arterial blood gas (ABG) analysis. Perioperative hypoxemia, defined as arterial oxygen partial pressure (PaO₂) <60 mmHg or oxygen saturation (SpO₂) <90%, poses significant risks of delayed recovery and organ dysfunction. Traditional diagnostic methods, such as radiological imaging and ABG analysis, often lack integrated predictive accuracy. AI frameworks, particularly convolutional neural networks (CNNs) and hybrid models like TD-CNNLSTM-LungNet, demonstrate exceptional performance in detecting pulmonary inflammation and stratifying hypoxemia risk, achieving up to 96.57% accuracy in pneumonia subtype differentiation and an AUC of 0.96 for postoperative hypoxemia prediction. Multimodal AI systems, such as DeepLung-Predict, unify CT scans, PFTs, and ABG parameters to enhance predictive precision, surpassing conventional methods by 22%. However, challenges persist, including dataset heterogeneity, model interpretability, and clinical workflow integration. Future directions emphasize multicenter validation, explainable AI (XAI) frameworks, and pragmatic trials to ensure equitable and reliable deployment. This AI-driven approach not only optimizes resource allocation but also mitigates financial burdens on healthcare systems by enabling early interventions and reducing ICU admission risks.
PMID:40759599 | DOI:10.2196/73995
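The hypoxemia definition quoted in this Perspective is a disjunction of two thresholds. A minimal sketch of turning it into a binary outcome label, as a prediction model would need; the function name is hypothetical, and because the article does not say how repeated or missing measurements are handled, a missing value is simply ignored here:

```python
from typing import Optional


def is_hypoxemic(pao2_mmhg: Optional[float] = None, spo2_pct: Optional[float] = None) -> bool:
    """Label per the quoted definition: PaO2 < 60 mmHg or SpO2 < 90%.
    A missing measurement simply does not contribute (a simplifying assumption)."""
    if pao2_mmhg is None and spo2_pct is None:
        raise ValueError("need at least one of PaO2 or SpO2")
    low_pao2 = pao2_mmhg is not None and pao2_mmhg < 60
    low_spo2 = spo2_pct is not None and spo2_pct < 90
    return low_pao2 or low_spo2


print(is_hypoxemic(pao2_mmhg=58), is_hypoxemic(pao2_mmhg=72, spo2_pct=93), is_hypoxemic(spo2_pct=88))
```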
Diagnostic systematic review and meta-analysis of machine learning in predicting biochemical recurrence of prostate cancer
Sci Rep. 2025 Aug 4;15(1):28378. doi: 10.1038/s41598-025-11445-5.
ABSTRACT
Prostate cancer (PCa) is the most prevalent malignant tumor in males, and many patients remain at risk of biochemical recurrence (BCR) following initial treatment. Accurate prediction of BCR is vital for effective clinical management and treatment planning. This study evaluates the effectiveness of machine learning (ML) models in predicting BCR among prostate cancer patients, comparing their performance to traditional prognostic methods. We systematically searched four databases (PubMed, Web of Science, Embase, and Cochrane) for studies employing ML techniques to predict prostate cancer BCR. Data extraction included model type, sample size, and the area under the curve (AUC). A meta-analysis was conducted using AUC as the primary performance metric to assess predictive accuracy and heterogeneity across models. Sixteen studies comprising a total of 17,316 prostate cancer patients were included. The pooled AUC for ML models was 0.82 (95% CI: 0.81-0.84). Deep learning and hybrid models outperformed traditional models (AUC = 0.83). Models using imaging data showed improved performance (AUC = 0.82). ML models were most effective in predicting 1-year BCR (AUC = 0.86), with performance slightly decreasing for longer time intervals. ML models outperform traditional methods in predicting BCR, especially in the short term. Incorporating multimodal data, such as imaging, enhances predictive accuracy. Future studies should optimize and validate these models through large-scale clinical trials.
PMID:40760134 | DOI:10.1038/s41598-025-11445-5
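The meta-analysis abstract reports a pooled AUC of 0.82 (95% CI: 0.81-0.84) without naming the pooling model. A common choice for heterogeneous studies is the DerSimonian-Laird random-effects estimator; the sketch below assumes that estimator, pools on the raw AUC scale (many analyses use a logit scale instead), and recovers each study's standard error from its reported 95% CI. The study values are invented for illustration.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error recovered from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def dersimonian_laird(estimates, ses):
    """Random-effects pooled estimate, 95% CI, tau^2 and I^2 (DerSimonian-Laird)."""
    w = [1.0 / s ** 2 for s in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))   # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0                   # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0                     # share of variability from heterogeneity
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2


# Invented per-study AUCs with 95% CIs: (auc, lower, upper)
studies = [(0.80, 0.76, 0.84), (0.86, 0.83, 0.89), (0.79, 0.73, 0.85), (0.84, 0.80, 0.88)]
aucs = [a for a, _, _ in studies]
ses = [se_from_ci(lo, hi) for _, lo, hi in studies]
print(dersimonian_laird(aucs, ses))
```

When tau^2 is zero the estimator reduces to the fixed-effect (inverse-variance) average; a large tau^2 spreads weight more evenly across studies.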
Real-time facial recognition via multitask learning on raspberry Pi
Sci Rep. 2025 Aug 4;15(1):28467. doi: 10.1038/s41598-025-97490-6.
ABSTRACT
This paper investigates the feasibility of multi-task learning (MTL) for facial recognition on the Raspberry Pi, a low-cost single-board computer, demonstrating its ability to perform complex deep learning tasks in real time. Using MobileNet, MobileNetV2, and InceptionV3 as base models, we trained MTL models on a custom database derived from the VGGFace2 dataset, focusing on three tasks: person identification, age estimation, and ethnicity prediction. MobileNet achieved the highest accuracy, with 99% in person identification, 99.3% in age estimation, and 99.5% in ethnicity prediction. Compared to previous studies, which primarily relied on high-end hardware for MTL in facial recognition, this work uniquely demonstrates the successful deployment of efficient MTL models on resource-constrained devices like the Raspberry Pi. This advancement significantly reduces computational load and energy consumption while maintaining high accuracy, making facial recognition systems more accessible and practical for real-world applications such as security, personalized customer experiences, and demographic analytics. This study opens new avenues for innovation in resource-efficient deep learning systems.
PMID:40760089 | DOI:10.1038/s41598-025-97490-6
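The Raspberry Pi study trains one model for three tasks. A minimal sketch of the usual hard-parameter-sharing layout (one shared feature extractor, one head per task, losses combined by a weighted sum), assuming every task, including age, is framed as classification; the single dense "backbone" stands in for MobileNet, and all weights, shapes, and task weights are hypothetical:

```python
import math


def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, z) for z in v]


def forward(x, backbone, heads):
    """Hard parameter sharing: one shared feature extractor, one linear head per task."""
    feats = relu(linear(backbone[0], backbone[1], x))
    return {task: linear(W, b, feats) for task, (W, b) in heads.items()}


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def multitask_loss(outputs, labels, task_weights):
    """Weighted sum of per-task losses; the weights are hyperparameters."""
    return sum(task_weights[t] * cross_entropy(outputs[t], labels[t]) for t in outputs)


# Hypothetical tiny model: 2-d input, 2 shared features, three task heads
backbone = ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.0])
heads = {
    "identity": ([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.0, 0.0, 0.0]),   # 3 people
    "age_group": ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),                # 2 age bins
    "ethnicity": ([[0.2, 0.1], [0.0, 0.3], [0.1, 0.0], [0.3, 0.3]], [0.0] * 4),
}
outputs = forward([1.0, 2.0], backbone, heads)
loss = multitask_loss(outputs, {"identity": 0, "age_group": 1, "ethnicity": 3},
                      {"identity": 1.0, "age_group": 0.5, "ethnicity": 0.5})
print(outputs, loss)
```

Sharing the backbone is what saves compute on a small device: features are computed once per face and reused by all three heads.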
Human fall direction recognition in the indoor and outdoor environment using multi self-attention RBnet deep architectures and tree seed optimization
Sci Rep. 2025 Aug 4;15(1):28475. doi: 10.1038/s41598-025-11031-9.
ABSTRACT
Falling poses a significant health risk to the elderly, often resulting in severe injuries if not promptly addressed. As the global population increases, the frequency of falls increases along with the associated financial burden. Hence, early detection is crucial for initiating timely medical interventions and minimizing physical, social, and economic harm. With the growing demand for safety monitoring of older adults, particularly those living alone, effective fall detection has become increasingly important for supporting independent living. In this study, we propose a novel deep learning architecture and an optimization algorithm for human fall direction recognition. Specifically, we developed four novel residual-block deep convolutional neural networks with self-attention, named the 3-RBNet, 5-RBNet, 7-RBNet, and 9-RBNet self-attention models. The models were trained on enhanced images, and deep features were extracted from the self-attention layer. The 7-RBNet and 9-RBNet self-attention models demonstrated superior accuracy and precision, so the 3-RBNet self-attention model was excluded from further analysis. To optimize feature selection and improve classification performance while reducing computational cost, we applied the tree seed algorithm to the self-attention features of the 7-RBNet and 9-RBNet models. Experiments were performed on a human fall dataset collected at Soonchunhyang University, South Korea. The proposed method achieved maximum accuracies of 93.2% and 92.5%, respectively. Compared with recent techniques, our approach improved accuracy and precision.
PMID:40760069 | DOI:10.1038/s41598-025-11031-9
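The fall-direction abstract names the tree seed algorithm (TSA) for selecting self-attention features but gives no details. The sketch below is a generic binary TSA under stated assumptions: continuous positions in [0, 1] thresholded at 0.5, a search-tendency parameter st choosing between the two classic seed-update rules, and a user-supplied fitness to maximize (in practice something like validation accuracy minus a penalty on the number of features). All names, defaults, and the toy fitness are illustrative, not taken from the paper.

```python
import random


def tsa_feature_select(fitness, n_features, n_trees=10, n_iter=50, st=0.1, seed=0):
    """Binary tree seed algorithm (sketch). Each tree is a position in [0, 1]^n_features;
    feature j is selected when position[j] > 0.5. `fitness(mask)` is maximized."""
    rng = random.Random(seed)

    def to_mask(pos):
        return [p > 0.5 for p in pos]

    trees = [[rng.random() for _ in range(n_features)] for _ in range(n_trees)]
    scores = [fitness(to_mask(t)) for t in trees]
    min_seeds, max_seeds = max(1, n_trees // 10), max(1, n_trees // 4)
    for _ in range(n_iter):
        best = trees[max(range(n_trees), key=scores.__getitem__)]
        for i in range(n_trees):
            seeds = []
            for _ in range(rng.randint(min_seeds, max_seeds)):
                r = rng.choice([k for k in range(n_trees) if k != i])   # a different random tree
                pos = []
                for j in range(n_features):
                    alpha = rng.uniform(-1.0, 1.0)
                    if rng.random() < st:   # search tendency: move relative to the best tree
                        v = trees[i][j] + alpha * (best[j] - trees[r][j])
                    else:                   # otherwise move relative to the random tree
                        v = trees[i][j] + alpha * (trees[i][j] - trees[r][j])
                    pos.append(min(1.0, max(0.0, v)))
                seeds.append(pos)
            seed_scores = [fitness(to_mask(s)) for s in seeds]
            k = max(range(len(seeds)), key=seed_scores.__getitem__)
            if seed_scores[k] > scores[i]:   # a tree is replaced only by a better seed
                trees[i], scores[i] = seeds[k], seed_scores[k]
    i_best = max(range(n_trees), key=scores.__getitem__)
    return to_mask(trees[i_best]), scores[i_best]


# Hypothetical fitness: features 0, 2 and 5 are informative; every other selected feature costs 0.5
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    gain = sum(1 for j, m in enumerate(mask) if m and j in INFORMATIVE)
    cost = sum(1 for j, m in enumerate(mask) if m and j not in INFORMATIVE)
    return gain - 0.5 * cost


mask, score = tsa_feature_select(toy_fitness, n_features=8)
print([j for j, m in enumerate(mask) if m], score)
```

Because a tree is only ever replaced by a better seed, the best score never decreases across iterations.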
Gated recurrent unit with decay has real-time capability for postoperative ileus surveillance and offers cross-hospital transferability
Commun Med (Lond). 2025 Aug 4;5(1):331. doi: 10.1038/s43856-025-01053-9.
ABSTRACT
BACKGROUND: Ileus, a postoperative complication of colorectal surgery, increases morbidity, costs, and hospital stays. Assessing the risk of ileus is crucial, especially with the trend towards early discharge. Prior studies have assessed the risk of ileus with regression models; the role of deep learning remains unexplored.
METHODS: We evaluated the Gated Recurrent Unit with Decay (GRU-D) for real-time ileus risk assessment in 7349 colorectal surgeries across three Mayo Clinic sites with two Electronic Health Record (EHR) systems. The results were compared with atemporal models on a panel of benchmark metrics.
RESULTS: Here we show that despite extreme data sparsity (e.g., 72.2% of labs, 26.9% of vitals lack measurements within 24 h post-surgery), GRU-D demonstrates improved performance by integrating new measurements and exhibits robust transferability. In brute-force transfer, AUROC decreases by no more than 5%, while multi-source instance transfer yields up to a 2.6% improvement in AUROC and an 86% narrower confidence interval. Although atemporal models perform better at certain pre-surgical time points, their performance fluctuates considerably and generally falls short of GRU-D in post-surgical hours.
CONCLUSIONS: GRU-D's dynamic risk assessment capability is crucial in scenarios where clinical follow-up is essential, warranting further research on built-in explainability for clinical integration.
PMID:40760048 | DOI:10.1038/s43856-025-01053-9
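GRU-D handles sparse, irregularly timed measurements like those described in the ileus study through learned decay. In the original GRU-D formulation, a missing input is replaced by a blend of the last observed value and the variable's empirical mean, with the blend weight gamma = exp(-max(0, w * delta + b)) shrinking as the time delta since the last observation grows. The sketch below shows only this input-imputation step for one variable with fixed, hypothetical w and b; learning those parameters and the matching hidden-state decay are omitted.

```python
import math


def decay_rate(delta, w, b):
    """GRU-D style decay: gamma = exp(-max(0, w * delta + b)); w and b are learned in practice."""
    return math.exp(-max(0.0, w * delta + b))


def grud_input_impute(times, values, empirical_mean, w, b):
    """Impute one variable the way GRU-D feeds its inputs.

    values[k] is None when the variable was not measured at times[k]. Observed values pass
    through unchanged; a missing value is a blend of the last observation and the empirical
    mean that shifts toward the mean as the time since the last observation grows.
    Before the first observation the estimate equals the mean."""
    out = []
    last_value, last_time = empirical_mean, None
    for t, x in zip(times, values):
        if x is not None:                   # mask m_t = 1
            out.append(x)
            last_value, last_time = x, t
        else:                               # mask m_t = 0
            delta = 0.0 if last_time is None else t - last_time
            g = decay_rate(delta, w, b)
            out.append(g * last_value + (1.0 - g) * empirical_mean)
    return out


# Hypothetical lab value measured at hour 0 and hour 3 after surgery, missing in between
hours = [0, 1, 2, 3]
lab = [10.0, None, None, 4.0]
print(grud_input_impute(hours, lab, empirical_mean=5.0, w=1.0, b=0.0))
```

Unlike carrying the last value forward indefinitely, this lets stale measurements fade toward a population default, which matters when most labs are unmeasured in the first post-surgical day.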
Machine learning enables legal risk assessment in internet healthcare using HIPAA data
Sci Rep. 2025 Aug 5;15(1):28477. doi: 10.1038/s41598-025-13720-x.
ABSTRACT
This study explores how artificial intelligence technologies can enhance the regulatory capacity for legal risks in internet healthcare, based on a machine learning (ML) analytical framework and data from the Health Insurance Portability and Accountability Act (HIPAA) database. The research methods include data collection and processing, construction and optimization of ML models, and the application of a risk assessment framework. First, the data are sourced from the HIPAA database, encompassing various data types, such as medical records, patient personal information, and treatment costs. Second, to address missing values and noise in the data, preprocessing methods such as denoising, normalization, and feature extraction are employed to ensure data quality and model accuracy. Finally, for model selection, this study experiments with several common algorithms, including extreme gradient boosting (XGBoost), support vector machine (SVM), random forest (RF), and deep neural network (DNN). Each algorithm has its strengths and limitations depending on the specific legal risk assessment task. RF enhances classification performance by integrating multiple decision trees, while SVM achieves efficient classification by identifying the maximum-margin hyperplane. DNN demonstrates strong capabilities in handling complex nonlinear relationships, and XGBoost further improves classification accuracy by optimizing decision tree models through gradient boosting. Model performance is evaluated using metrics such as accuracy, recall, precision, F1 score, and area under the curve (AUC). The experimental results indicate that the DNN model performs strongly in terms of F1 score, accuracy, and recall, showcasing its efficiency and stability in legal risk assessment. The principal component analysis-random forest (PCA+RF) and RF models also exhibit stable performance, making them suitable for various application scenarios.
In contrast, the SVM and K-Nearest Neighbor models perform relatively poorly; although they retain some validity in certain contexts, their overall performance is inferior to that of deep learning and ensemble learning methods. This study not only provides effective ML tools for legal risk assessment in internet healthcare but also offers theoretical support and practical guidance for future research in this field.
PMID:40760025 | DOI:10.1038/s41598-025-13720-x
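The HIPAA legal-risk study evaluates its classifiers with accuracy, precision, recall, and F1 score. For a binary risk label these reduce to counts from the confusion matrix; a minimal sketch with toy labels (AUC, which needs scores rather than hard predictions, is not covered):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = high legal risk, hypothetically)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are reasonably high, which is why it is often preferred over accuracy for imbalanced risk labels.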
Adaptive fusion of multi-cultural visual elements using deep learning in cross-cultural visual communication design
Sci Rep. 2025 Aug 4;15(1):28431. doi: 10.1038/s41598-025-13386-5.
ABSTRACT
This paper presents a novel deep learning approach for the adaptive fusion of multicultural visual elements in cross-cultural visual communication design for interface development. We address the challenge of creating culturally appropriate digital interfaces by developing a comprehensive framework that combines convolutional neural networks, attention mechanisms, and generative adversarial networks to analyze, extract, and adaptively fuse cultural features from diverse visual communication design elements. The proposed algorithm dynamically adjusts color schemes, spatial arrangements, typography, and iconography based on target cultural preferences while maintaining visual communication design coherence and functional clarity. Experimental evaluations conducted across five cultural regions demonstrate that our approach outperforms existing methods in cultural appropriateness (17.3% improvement), aesthetic coherence (12.8% enhancement), and user satisfaction (27.3% increase). Implementation in e-commerce, educational, and financial service applications showed significant improvements in user engagement, task efficiency, and conversion rates. Our research contributes to the advancement of inclusive digital experiences by providing a computational framework for cross-cultural visual communication design that respects cultural diversity while enhancing user experience across cultural boundaries.
PMID:40760013 | DOI:10.1038/s41598-025-13386-5
Multilingual sentiment analysis in restaurant reviews using aspect focused learning
Sci Rep. 2025 Aug 4;15(1):28371. doi: 10.1038/s41598-025-12464-y.
ABSTRACT
Cross-cultural sentiment analysis in restaurant reviews presents unique challenges due to linguistic and cultural differences across regions. The purpose of this study is to develop a culturally adaptive sentiment analysis model that improves sentiment detection across multilingual restaurant reviews. This paper proposes XLM-RSA, a novel multilingual model based on XLM-RoBERTa with Aspect-Focused Attention, tailored for enhanced sentiment analysis across diverse cultural contexts. We evaluated XLM-RSA on three benchmark datasets: 10,000 Restaurant Reviews, Restaurant Reviews, and European Restaurant Reviews, achieving state-of-the-art performance across all datasets. XLM-RSA attained an accuracy of 91.9% on the Restaurant Reviews dataset, surpassing traditional models such as BERT (87.8%) and RoBERTa (88.5%). In addition to sentiment classification, we introduce an aspect-based attention mechanism to capture sentiment variations specific to key aspects like food, service, and ambiance, yielding aspect-level accuracy improvements. Furthermore, XLM-RSA demonstrated strong performance in detecting cultural sentiment shifts, with an accuracy of 85.4% on the European Restaurant Reviews dataset, showcasing its robustness to diverse linguistic and cultural expressions. An ablation study highlighted the significance of the Aspect-Focused Attention, where XLM-RSA with this enhancement achieved an F1-score of 91.5%, compared to 89.1% with a simple attention mechanism. These results affirm XLM-RSA's capacity for effective cross-cultural sentiment analysis, paving the way for more accurate sentiment-driven insights in globally distributed customer feedback.
PMID:40759996 | DOI:10.1038/s41598-025-12464-y
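The XLM-RSA abstract does not specify how Aspect-Focused Attention is computed. One plausible form, assumed here for illustration, conditions attention on a learned aspect embedding (for example "food" or "service"): each contextual token vector from the encoder (XLM-RoBERTa in the paper) is scored against the aspect vector, and the softmax-weighted sum gives an aspect-specific representation for classification. Vectors and names below are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aspect_attention(token_vecs, aspect_vec):
    """Scaled dot-product attention with the aspect embedding as the query.
    Returns the attention weights and the aspect-specific pooled vector."""
    d = len(aspect_vec)
    scores = [sum(t * a for t, a in zip(tok, aspect_vec)) / math.sqrt(d) for tok in token_vecs]
    weights = softmax(scores)
    pooled = [sum(w * tok[k] for w, tok in zip(weights, token_vecs)) for k in range(d)]
    return weights, pooled


# Toy 2-d token vectors (in practice: contextual embeddings from the encoder)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
food_aspect = [0.0, 2.0]   # hypothetical learned embedding for the "food" aspect
weights, pooled = aspect_attention(tokens, food_aspect)
print(weights, pooled)
```

Using a separate query per aspect lets the same review yield different pooled vectors for food, service, and ambiance.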
Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines
Nat Methods. 2025 Aug 4. doi: 10.1038/s41592-025-02772-6. Online ahead of print.
ABSTRACT
Recent research in deep-learning-based foundation models promises to learn representations of single-cell data that enable prediction of the effects of genetic perturbations. Here we compared five foundation models and two other deep learning models against deliberately simple baselines for predicting transcriptome changes after single or double perturbations. None outperformed the baselines, which highlights the importance of critical benchmarking in directing and evaluating method development.
PMID:40759747 | DOI:10.1038/s41592-025-02772-6
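The perturbation benchmark compares foundation models against deliberately simple baselines without naming them in the abstract. Two baselines of that kind, assumed here for illustration, are predicting no change and, for a double perturbation, predicting the sum of the two single-perturbation effects on each gene. A minimal sketch with made-up gene names and log fold changes:

```python
import math


def no_change_baseline(genes):
    """Predict no expression change for every gene."""
    return {g: 0.0 for g in genes}


def additive_baseline(single_a, single_b):
    """Predict a double perturbation's effect as the sum of the two single-perturbation effects
    (e.g. log fold changes vs. control); genes missing from one profile contribute 0."""
    return {g: single_a.get(g, 0.0) + single_b.get(g, 0.0) for g in set(single_a) | set(single_b)}


def l2_error(pred, observed):
    """Euclidean distance between predicted and observed effect vectors over observed genes."""
    return math.sqrt(sum((pred.get(g, 0.0) - v) ** 2 for g, v in observed.items()))


# Made-up log fold changes for two single knockouts and their double knockout
a = {"G1": 1.0, "G2": -0.5}
b = {"G2": 0.25, "G3": 2.0}
observed_ab = {"G1": 0.9, "G2": -0.3, "G3": 1.7}
pred = additive_baseline(a, b)
print(pred, l2_error(pred, observed_ab), l2_error(no_change_baseline(observed_ab), observed_ab))
```

Baselines like these have no trainable parameters, which is what makes them a useful floor: a learned model that cannot beat them has not yet captured genetic interactions.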
Internet of things enabled deep learning monitoring system for realtime performance metrics and athlete feedback in college sports
Sci Rep. 2025 Aug 4;15(1):28405. doi: 10.1038/s41598-025-13949-6.
ABSTRACT
This study presents an Internet of Things (IoT)-enabled Deep Learning Monitoring (IoT-E-DLM) model for real-time Athletic Performance (AP) tracking and feedback in collegiate sports. The proposed work integrates advanced wearable sensor technologies with a hybrid neural network that combines a Temporal Convolutional Network (TCN), Bidirectional Long Short-Term Memory (BiLSTM), and an attention mechanism (TCN + BiLSTM + Attention). It is designed to overcome key challenges in processing heterogeneous, high-frequency sensor data and delivering low-latency, sport-specific feedback. The system uses edge computing for real-time local processing and a cloud setup for high-complexity analytics, balancing responsiveness and accuracy. It was tested with 147 student-athletes across multiple sports, including track and field, basketball, soccer, and swimming, over 12 months at Shangqiu University. The proposed model achieved a prediction accuracy of 93.45% with an average processing latency of 12.34 ms, outperforming conventional and state-of-the-art approaches. The system also demonstrated efficient resource usage (CPU: 68.34%, GPU: 72.56%), high data capture reliability (98.37%), and precise temporal synchronization. These results confirm the model's effectiveness in enabling real-time performance monitoring and feedback delivery, establishing a robust groundwork for future developments in Artificial Intelligence (AI)-driven sports analytics.
PMID:40759726 | DOI:10.1038/s41598-025-13949-6
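The IoT-E-DLM model includes a Temporal Convolutional Network. The core TCN operation is a dilated causal convolution, in which each output depends only on current and past samples, a property that suits streaming sensor data. A minimal single-channel sketch (not the paper's network; kernel and dilation values are illustrative), plus the receptive field of a stack of such layers assuming one convolution per dilation level:

```python
def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with x[t] = 0 for t < 0 (left zero padding)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:            # never reads future samples
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked layers with one causal convolution per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


signal = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. one accelerometer channel
print(dilated_causal_conv1d(signal, kernel=[1.0, 1.0], dilation=2))
print(receptive_field(3, [1, 2, 4, 8]))
```

Doubling the dilation per layer grows the receptive field exponentially with depth while the per-sample cost stays fixed, which is why TCNs are attractive for low-latency edge inference.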
Deep learning and digital twin integration for structural damage detection in ancient pagodas
Sci Rep. 2025 Aug 4;15(1):28408. doi: 10.1038/s41598-025-14029-5.
ABSTRACT
In recent years, with the rise of digital twin technology in the field of artificial intelligence and the continuous advancement of hardware imaging equipment, significant progress has been made in the detection of structural damage in buildings and sculptures. Structural damage to cultural heritage buildings poses a major threat to their integrity, making accurate detection of such damage crucial for cultural heritage preservation. However, existing deep learning-based object detection technologies face limitations in achieving full coverage of architectural sculptures and enabling multi-angle, free observation, while also exhibiting substantial detection errors. To address these challenges, this paper proposes a detection method that integrates digital modeling with an improved YOLO algorithm. By scanning architectural scenes to generate digital twin models, this method enables full-angle and multi-seasonal scene transformations. Specifically, the Nanjing Sheli pagoda is selected as the research subject, where drone-based panoramic scanning is employed to create a digitalized full-scene model. The improved YOLO algorithm is then used to evaluate detection performance under varying weather and lighting conditions. Finally, evaluation metrics are utilized to automatically analyze detection accuracy and the extent of damage. Compared to traditional on-site manual measurement methods, the proposed YOLO-based automatic detection technology in digitalized scenarios significantly reduces labor costs while improving detection accuracy and efficiency. This approach provides a highly effective and reliable technical solution for assessing the extent of damage in historical buildings.
PMID:40759712 | DOI:10.1038/s41598-025-14029-5
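The pagoda study says evaluation metrics are used to analyze detection accuracy but does not name them. Detection metrics such as precision, recall, and mAP are conventionally built on intersection over union (IoU) between predicted and ground-truth boxes; a minimal sketch, assuming an IoU threshold of 0.5 and greedy confidence-ordered matching (both assumptions, with hypothetical boxes):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def detection_precision_recall(preds, gts, iou_thr=0.5):
    """Greedy matching: predictions (score, box) in descending score order each claim the
    unmatched ground-truth box with the highest IoU, counted as a hit if IoU >= iou_thr."""
    matched = set()
    tp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(gts):
            if j not in matched:
                iou = box_iou(box, gt)
                if iou >= best_iou:
                    best_j, best_iou = j, iou
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


gts = [(0, 0, 2, 2), (5, 5, 7, 7)]                                  # hypothetical damage regions
preds = [(0.9, (0, 0, 2, 2)), (0.8, (1, 1, 3, 3)), (0.7, (10, 10, 11, 11))]
print(detection_precision_recall(preds, gts))
```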
Accurate and Rapid Ranking of Protein-Ligand Binding Affinities Using Density Matrix Fragmentation and Physics-Informed Machine Learning Dispersion Potentials
Chemphyschem. 2025 Aug 4:e2500094. doi: 10.1002/cphc.202500094. Online ahead of print.
ABSTRACT
The generalized many-body expansion for building density matrices (GMBE-DM), truncated at the one-body level and combined with a purification scheme, is applied to rank protein-ligand binding affinities across two cyclin-dependent kinase 2 (CDK2) datasets and one Janus kinase 1 (JAK1) dataset, totaling 28 ligands. This quantum fragmentation-based method achieves strong correlation with experimental binding free energies (R² = 0.84), while requiring less than 5 min per complex without extensive parallelization, making it highly efficient for rapid drug screening and lead prioritization. In addition, our physics-informed, machine learning-corrected dispersion potential (D3-ML) demonstrates even stronger ranking performance (R² = 0.87), effectively capturing binding trends through favorable cancelation of non-dispersion, solvation, and entropic contributions, emphasizing the central role of dispersion interactions in protein-ligand binding. With sub-second runtime per complex, D3-ML offers exceptional speed and accuracy, making it ideally suited for high-throughput virtual screening. By comparison, the deep learning model Sfcnn shows lower transferability across datasets (R² = 0.57), highlighting the limitations of broadly trained neural networks in chemically diverse systems. Together, these results establish GMBE-DM and D3-ML as robust and scalable tools for protein-ligand affinity ranking, with D3-ML emerging as a particularly promising candidate for large-scale applications in drug discovery.
PMID:40758915 | DOI:10.1002/cphc.202500094
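The binding-affinity abstract reports R² between predicted and experimental values; this sketch assumes R² means the squared Pearson correlation (the abstract does not define it). Because the stated goal is ranking, a rank correlation such as Spearman's rho is a natural companion metric. The ligand values are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=v.__getitem__)
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical predicted scores vs. experimental binding free energies (kcal/mol) for 5 ligands
predicted = [-30.1, -25.4, -41.0, -28.7, -35.2]
experimental = [-8.9, -7.8, -10.5, -8.1, -9.6]
print(pearson_r(predicted, experimental) ** 2, spearman_rho(predicted, experimental))
```

A method can rank ligands perfectly (rho = 1) while its absolute values are offset or scaled, which is all that matters for prioritizing compounds in screening.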
Quantifying the Predictability of Lesion Growth and Its Contribution to Quantitative Resistance Using Field Phenomics
Phytopathology. 2025 Aug 4. doi: 10.1094/PHYTO-05-25-0187-R. Online ahead of print.
ABSTRACT
Measuring individual components of pathogen reproduction is key to understanding mechanisms underlying rate-reducing quantitative resistance (QR). Simulation models predict that lesion expansion plays a key role in seasonal epidemics of foliar diseases, but measuring lesion growth with sufficient precision and scale to test these predictions under field conditions has remained impractical. We used deep learning-based image analysis to track 6889 individual lesions caused by Zymoseptoria tritici on 14 wheat cultivars across two field seasons, enabling 27,218 precise and objective measurements of lesion growth in the field. Lesion appearance traits reflecting specific interactions between particular host and pathogen genotypes were consistently associated with lesion growth, whereas overall effects of host genotype and environment were modest. Both host cultivar and cultivar-by-environment interaction effects on lesion growth were highly significant and moderately heritable (h² ≥ 0.40). After excluding a single outlier cultivar, a strong and statistically significant association between lesion growth and overall QR was found. Lesion expansion appears to be an important component of QR to Septoria tritici blotch (STB) in most, but not all, wheat cultivars, underscoring its potential as a selection target. By facilitating the dissection of individual resistance components, our approach can support more targeted, knowledge-based breeding for durable QR.
PMID:40758903 | DOI:10.1094/PHYTO-05-25-0187-R
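The lesion-growth abstract does not say how growth was computed from repeated measurements. One simple assumption is a per-lesion linear growth rate: the least-squares slope of lesion area against time across repeated images of the same lesion. The function name, units, and values below are hypothetical.

```python
def growth_rate(days, areas):
    """Least-squares slope of lesion area versus time (area units per day)."""
    n = len(days)
    if n < 2:
        raise ValueError("need at least two measurements of the same lesion")
    mean_t, mean_a = sum(days) / n, sum(areas) / n
    num = sum((t - mean_t) * (a - mean_a) for t, a in zip(days, areas))
    den = sum((t - mean_t) ** 2 for t in days)
    return num / den


# Hypothetical lesion imaged on days 0, 3 and 7 (areas in mm^2)
print(growth_rate([0, 3, 7], [2.0, 8.0, 16.0]))
```

A fitted slope uses every measurement of a lesion rather than only the first and last, which reduces the influence of a single noisy image.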
Multi-scale feature pyramid network with bidirectional attention for efficient mural image classification
PLoS One. 2025 Aug 4;20(8):e0328507. doi: 10.1371/journal.pone.0328507. eCollection 2025.
ABSTRACT
Mural image recognition plays a critical role in the digital preservation of cultural heritage; however, it faces cross-cultural and multi-period style generalization challenges, compounded by limited sample sizes and intricate details, such as losses caused by natural weathering of mural surfaces and complex artistic patterns. This paper proposes a deep learning model based on DenseNet201-FPN, incorporating a Bidirectional Convolutional Block Attention Module (Bi-CBAM), dynamic focal distillation loss, and convex regularization. First, a lightweight Feature Pyramid Network (FPN) is embedded into DenseNet201 to fuse multi-scale texture features (28 × 28 × 256, 14 × 14 × 512, 7 × 7 × 1024). Second, a bidirectional LSTM-driven attention module iteratively optimizes channel and spatial weights, enhancing detail perception for low-frequency categories. Third, a dynamic temperature distillation strategy (T = 3 → 1) balances supervision from teacher models (ResNeXt101) and ground truth, improving the F1-score of rare classes by 6.1%. Experimental results on a self-constructed mural dataset (2,000 images, 26 subcategories) demonstrate 87.9% accuracy (+3.7% over DenseNet201) and real-time inference on edge devices (63 ms/frame at 8.1 W on a Jetson TX2). This study provides a cost-effective solution for large-scale mural digitization in resource-constrained environments.
PMID:40758742 | DOI:10.1371/journal.pone.0328507
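The mural classification abstract describes distillation from a ResNeXt101 teacher with the temperature lowered from 3 to 1. A minimal sketch of standard temperature-scaled knowledge distillation, assuming a linear temperature schedule and a fixed weight alpha between the teacher term and the ground-truth cross-entropy; the focal component of the paper's loss is not reproduced, and all names and logits are hypothetical:

```python
import math


def softmax(logits, temperature=1.0):
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_temperature(epoch, n_epochs, t_start=3.0, t_end=1.0):
    """Linearly anneal the temperature from t_start to t_end (linear shape is an assumption)."""
    if n_epochs <= 1:
        return t_end
    return t_start + (t_end - t_start) * epoch / (n_epochs - 1)


def distillation_loss(student_logits, teacher_logits, label, temperature, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy(student, label)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


for epoch in range(5):
    T = distill_temperature(epoch, 5)
    print(epoch, T, distillation_loss([2.0, 0.5, -1.0], [3.0, 1.0, -2.0], label=0, temperature=T))
```

A high early temperature exposes the teacher's relative preferences among wrong classes (useful for rare classes); lowering it toward 1 sharpens the targets as training proceeds. The T^2 factor keeps the soft-target gradients on a comparable scale as T changes.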
Longitudinal image-based prediction of surgical intervention in infants with hydronephrosis using deep learning: Is a single ultrasound enough?
PLOS Digit Health. 2025 Aug 4;4(8):e0000939. doi: 10.1371/journal.pdig.0000939. eCollection 2025 Aug.
ABSTRACT
The potential of deep learning to predict renal obstruction using kidney ultrasound images has been demonstrated. However, these image-based classifiers have incorporated information from only single-visit ultrasounds. Here, we developed machine learning (ML) models incorporating ultrasounds from multiple clinic visits for hydronephrosis to generate a hydronephrosis severity index score that stratifies patients into high versus low risk of needing pyeloplasty, and compared these against models trained with single-visit data. We included patients followed for hydronephrosis at three institutions. The outcome of interest was low versus high risk of obstructive hydronephrosis requiring pyeloplasty. The model was trained on data from Toronto, ON, validated on an internal holdout set, and tested on an internal prospective set and two external institutions. We developed models trained with a single ultrasound (single-visit) and multi-visit models using average prediction, convolutional pooling, long short-term memory, and temporal shift models. We compared model performance by area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). A total of 794 patients were included (603 SickKids, 102 Stanford, and 89 CHOP) with pyeloplasty rates of 12%, 5%, and 67%, respectively. There was no significant difference between single-visit models developed using the first ultrasound and those using the latest ultrasound. When single-visit and multi-visit models were compared, none of the multi-visit models achieved an AUROC or AUPRC significantly greater than that of the single-visit models. We developed ML models for hydronephrosis that incorporate multi-visit inference across multiple institutions but did not demonstrate superiority over single-visit inference. These results suggest that single-visit models would be sufficient for aiding accurate risk stratification from single, early ultrasound images.
PMID:40758672 | DOI:10.1371/journal.pdig.0000939
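The hydronephrosis study compares models by AUROC and AUPRC. A minimal sketch of both, computing AUROC as the probability that a random positive outscores a random negative and estimating AUPRC by average precision (which here assumes no tied scores); labels and risks are toy data:

```python
def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPRC estimated as average precision: mean precision at the rank of each positive.
    Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank
    return ap / sum(y_true)


# Toy predicted pyeloplasty risks for five patients (1 = underwent pyeloplasty)
y = [1, 0, 1, 0, 0]
risk = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(y, risk), average_precision(y, risk))
```

A random classifier scores about 0.5 AUROC regardless of prevalence, but its expected AUPRC equals the positive rate. With pyeloplasty rates as different as 5% and 67% across sites, AUPRC values are therefore not directly comparable between cohorts.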
Segmentation of the Left Atrium in Cardiovascular Magnetic Resonance Images of Patients with Myocarditis
J Vis Exp. 2025 Jul 18;(221). doi: 10.3791/68664.
ABSTRACT
Cardiovascular magnetic resonance (CMR) cine sequences serve as the cornerstone imaging technique for evaluating dynamic left atrial (LA) function in myocarditis patients. By capturing three-dimensional motion characteristics throughout the cardiac cycle with high temporal resolution, this modality provides critical data for analyzing myocardial contractile coordination and wall motion abnormalities. Key technological innovations, such as dynamic modeling and strain-encoded imaging, enable quantitative assessment of early-stage LA systolic-diastolic dysfunction in myocarditis. However, the primary challenges in cine sequence segmentation involve dynamic artifacts and spatiotemporal continuity modeling of thin-walled structures. Traditional threshold-based segmentation methods demonstrate limited consistency in dynamic sequences due to their inability to capture motion patterns. Deep learning approaches utilizing three-dimensional fully convolutional network (3D-FCN) achieved superior accuracy through three strategic enhancements: (1) Spatiotemporal feature fusion: This employed 3D convolutional kernels to simultaneously extract spatial structures and temporal dimensional features, thereby reducing motion blurring effects. (2) Dynamic skip connections: Incorporated within encoder-decoder architectures, these connections strengthened deformation correlation modeling across different cardiac phases through cross-temporal feature propagation. (3) Lightweight design: By utilizing patch-wise processing and depthwise separable convolutions, computational efficiency was optimized for real-time processing of large-scale four-dimensional datasets. The 3D-FCN achieved a Dice coefficient of 0.921 for LA segmentation, representing a 12.3% improvement over conventional methods. This design reduced the LA ejection fraction prediction error from 8.7% to 3.2%. 
The segmentation results directly facilitated the calculation of quantitative metrics, including LA volume-time curves and strain rates. These metrics supported the clinical diagnosis of myocarditis-associated atrial mechanical dysfunction.
PMID:40758568 | DOI:10.3791/68664
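The LA segmentation abstract reports a Dice coefficient and an LA ejection fraction derived from volume-time curves. A minimal sketch of both, assuming binary masks flattened to 0/1 lists and the common total-emptying definition (LAVmax - LAVmin) / LAVmax for the atrial ejection fraction; the mask and volume values are toy data:

```python
def dice(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks flattened to 0/1 lists."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2 * inter / total


def la_ejection_fraction(volumes):
    """LA total emptying fraction from a volume-time curve: (Vmax - Vmin) / Vmax."""
    v_max, v_min = max(volumes), min(volumes)
    return (v_max - v_min) / v_max


pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 1, 0]
la_volumes_ml = [80.0, 70.0, 50.0, 32.0, 45.0, 60.0]   # hypothetical per-frame LA volumes
print(dice(pred_mask, true_mask), la_ejection_fraction(la_volumes_ml))
```

Because the ejection fraction depends on the extreme volumes of the curve, segmentation errors in the frames where the atrium is largest or smallest propagate directly into it.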
Protecting Feature Privacy in Person Re-identification
IEEE Trans Pattern Anal Mach Intell. 2025 Aug 4;PP. doi: 10.1109/TPAMI.2025.3590979. Online ahead of print.
ABSTRACT
Person re-identification (ReID) aims to identify the same person across non-overlapping camera views. After a decade of development, deep-network-based methods have achieved high performance on benchmarks and become mainstream. In applications, the features of gallery images extracted by deep learning-based methods are stored to speed up the query process and protect the sensitive information contained in the images. Unfortunately, turning images into features has been shown not to protect privacy properly, as these features can be reversed into the corresponding images, revealing the sensitive information they contain. Therefore, to prevent privacy leakage, recent methods learn their features against feature reversal methods, and most conventional reversal methods focus on minimizing the difference between a reconstruction and its original image. However, a single feature can have many plausible reconstructions, and conventional reversal methods inevitably generate reconstructions whose distribution differs from that of the original images; such reconstructions cannot properly assess the private information to be protected and thus hamper privacy-protected feature learning. To mitigate this problem, we use a generative adversarial network (GAN) to enforce that the reconstructions follow the same distribution as the original images. We combine this GAN-based feature reversal module with a conventional ReID feature extraction module to form a novel GAN-based feature privacy-protected person ReID model, which is expected to protect feature privacy against reversal attacks while maintaining ReID utility. We show that optimizing the ReID model to accommodate privacy protection involves a double adversarial objective and is therefore challenging.
As a remedy, we design a novel two-step training and lazy update strategy that alternately optimizes the feature extraction module and stabilizes the update process of the GAN-based feature reversal module. To evaluate how efficiently the model balances ReID utility and feature privacy protection, we introduce a novel metric called the utility-reversibility ratio (URR). Compared with existing privacy-protected feature extraction models, the proposed method achieves a better balance between privacy protection and person ReID performance. Extensive experiments validate that our model can effectively protect feature privacy at a tiny accuracy cost, and further validate its effectiveness with the emerging diffusion model.
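The abstract does not specify how the two-step, lazy update schedule is implemented. The sketch below is only one hypothetical reading: the feature extractor is updated at every iteration, while the GAN-based reversal module is updated only every `lazy_interval` iterations so the adversary changes more slowly. The function name, the `lazy_interval` parameter, and the module labels are all assumptions made for illustration.

```python
from typing import List, Tuple


def two_step_schedule(num_iters: int, lazy_interval: int) -> List[Tuple[int, str]]:
    """Return (iteration, module) pairs for a hypothetical alternating schedule.

    Step 1 updates the ReID feature extractor at every iteration.
    Step 2 updates the GAN-based reversal module only every `lazy_interval`
    iterations (the assumed "lazy" update), so the adversary changes more slowly.
    """
    if lazy_interval < 1:
        raise ValueError("lazy_interval must be >= 1")
    plan: List[Tuple[int, str]] = []
    for it in range(num_iters):
        plan.append((it, "feature_extractor"))
        if (it + 1) % lazy_interval == 0:
            plan.append((it, "reversal_module"))
    return plan


for step in two_step_schedule(num_iters=4, lazy_interval=2):
    print(step)
```

In a real training loop each entry would trigger one optimizer step on the named module; setting `lazy_interval=1` recovers plain alternating optimization.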
PMID:40758524 | DOI:10.1109/TPAMI.2025.3590979
NUPES: Non-Uniform Post-Training Quantization via Power Exponent Search
IEEE Trans Pattern Anal Mach Intell. 2025 Aug 4;PP. doi: 10.1109/TPAMI.2025.3593987. Online ahead of print.
ABSTRACT
Deep neural network (DNN) deployment has been confined to larger hardware devices due to the networks' expensive computational requirements. This challenge has recently reached another scale with the emergence of large language models (LLMs). A promising technique for reducing both their memory footprint and latency is quantization. It consists of converting floating-point representations to low-bit-width fixed-point representations, usually by assuming a uniform mapping onto a regular grid. This process, referred to in the literature as uniform quantization, may however be ill-suited, as most DNN weights and activations follow a bell-shaped distribution. This is even worse for LLMs, whose weight distributions are known to exhibit large, high-impact outlier values. In this work, we propose an improvement over the most commonly adopted way to tackle this limitation in deep learning model quantization, namely non-uniform quantization. NUPES leverages automorphisms to preserve scalar multiplications. These transformations are derived from power functions. However, jointly optimizing the exponent parameter and the weight values is a challenging and novel problem that could not be solved with previous post-training optimization techniques, which only learn to round weight values up or down in order to preserve the predictive function. We circumvent this limitation with a new paradigm: learning new quantized weights over the entire quantized space. Similarly, we enable optimization of the power exponent, i.e. of the quantization operator itself, during training by alleviating all numerical instabilities. The resulting predictive function is compatible with integer-only low-bit inference. We show that the method achieves state-of-the-art compression rates in both data-free and data-driven configurations.
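To make the uniform versus power-function idea concrete, here is a minimal sketch of a generic power-companding quantizer with a fixed exponent `a`: apply the signed power map sign(x)|x|^a, round onto a uniform integer grid, and invert with exponent 1/a. It also checks that the signed power map preserves products, f(xy) = f(x)f(y), which is the multiplicative property the abstract refers to. This is not the NUPES algorithm: learning the exponent and the quantized weights, and integer-only inference, are not shown, and all function names are hypothetical.

```python
import math


def power_transform(x: float, a: float) -> float:
    """Signed power map x -> sign(x)|x|^a; for a > 0 it preserves products."""
    return math.copysign(abs(x) ** a, x)


def uniform_quantize(x: float, scale: float, bits: int) -> int:
    """Round x / scale to the nearest integer on a symmetric signed grid, then clip."""
    qmax = 2 ** (bits - 1) - 1
    return max(-qmax, min(qmax, round(x / scale)))


def power_quantize(x: float, a: float, scale: float, bits: int) -> int:
    """Non-uniform quantization: power map first, then the uniform grid."""
    return uniform_quantize(power_transform(x, a), scale, bits)


def power_dequantize(q: int, a: float, scale: float) -> float:
    """Undo the grid scaling, then invert the power map with exponent 1/a."""
    return power_transform(q * scale, 1.0 / a)


# Compare the positive reconstruction levels of a 3-bit quantizer for a = 1 (uniform)
# and a = 0.5 (levels packed more densely near zero, where bell-shaped weights cluster).
bits, max_abs = 3, 1.0
qmax = 2 ** (bits - 1) - 1
for a in (1.0, 0.5):
    scale = power_transform(max_abs, a) / qmax
    levels = [round(power_dequantize(q, a, scale), 3) for q in range(qmax + 1)]
    print(a, levels)
# 1.0 [0.0, 0.333, 0.667, 1.0]
# 0.5 [0.0, 0.111, 0.444, 1.0]
```

With `a = 1` the map is the identity and the quantizer reduces to ordinary uniform quantization; choosing `a < 1` spends more grid levels on small-magnitude weights.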
Our empirical benchmarks highlight the ability of NUPES to circumvent the limitations of previous post-training quantization techniques on transformers and large language models in particular.
PMID:40758517 | DOI:10.1109/TPAMI.2025.3593987