Deep learning
AG-MS3D-CNN multiscale attention guided 3D convolutional neural network for robust brain tumor segmentation across MRI protocols
Sci Rep. 2025 Jul 7;15(1):24306. doi: 10.1038/s41598-025-09351-x.
ABSTRACT
Accurate segmentation of brain tumors from multimodal Magnetic Resonance Imaging (MRI) plays a critical role in diagnosis, treatment planning, and disease monitoring in neuro-oncology. Traditional methods of tumor segmentation, often manual and labour-intensive, are prone to inconsistencies and inter-observer variability. Recently, deep learning models, particularly Convolutional Neural Networks (CNNs), have shown great promise in automating this process. However, these models face challenges in terms of generalization across diverse datasets, accurate tumor boundary delineation, and uncertainty estimation. To address these challenges, we propose AG-MS3D-CNN, an attention-guided multiscale 3D convolutional neural network for brain tumor segmentation. Our model integrates local and global contextual information through multiscale feature extraction and leverages spatial attention mechanisms to enhance boundary delineation, particularly in complex tumor regions. We also introduce Monte Carlo dropout for uncertainty estimation, providing clinicians with confidence scores for each segmentation, which is crucial for informed decision-making. Furthermore, we adopt a multitask learning framework, which enables the simultaneous segmentation, classification, and volume estimation of tumors. To ensure robustness and generalizability across diverse MRI acquisition protocols and scanners, we integrate a domain adaptation module into the network. Extensive evaluations on the BraTS 2021 dataset and additional external datasets, such as OASIS, ADNI, and IXI, demonstrate the superior performance of AG-MS3D-CNN compared to existing state-of-the-art methods. Our model achieves high Dice scores and shows excellent robustness, making it a valuable tool for clinical decision support in neuro-oncology.
PMID:40624142 | DOI:10.1038/s41598-025-09351-x
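The AG-MS3D-CNN abstract above highlights Monte Carlo dropout for per-segmentation uncertainty estimation. Below is a minimal PyTorch sketch of that general idea; the toy 3D network, layer sizes, and number of stochastic forward passes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinySeg3D(nn.Module):
    """Toy 3D segmentation head with dropout, used only to illustrate MC dropout."""
    def __init__(self, in_ch=4, n_classes=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Dropout3d(p=0.2),                      # kept active during MC sampling
            nn.Conv3d(16, n_classes, 1),
        )
    def forward(self, x):
        return self.body(x)

def mc_dropout_predict(model, x, n_samples=20):
    """Repeated stochastic forward passes: return mean probabilities and voxel-wise variance."""
    model.eval()
    for m in model.modules():                         # re-enable dropout layers only
        if isinstance(m, (nn.Dropout, nn.Dropout3d)):
            m.train()
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    return probs.mean(0), probs.var(0)                # mean prediction, uncertainty map

if __name__ == "__main__":
    x = torch.randn(1, 4, 32, 32, 32)                 # one multimodal MRI patch (4 channels)
    mean_p, var_p = mc_dropout_predict(TinySeg3D(), x)
    print(mean_p.shape, var_p.mean().item())
```

The variance map is what would be surfaced to clinicians as a per-voxel confidence signal.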
An enhanced fusion of transfer learning models with optimization based clinical diagnosis of lung and colon cancer using biomedical imaging
Sci Rep. 2025 Jul 7;15(1):24247. doi: 10.1038/s41598-025-10246-0.
ABSTRACT
Lung and colon cancers (LCC) are among the leading causes of human death and disease. Early diagnosis involves various tests, namely ultrasound (US), magnetic resonance imaging (MRI), and computed tomography (CT). Beyond diagnostic imaging, histopathology is one of the most effective methods, as it delivers cell-level imaging of the tissue under inspection. However, only a limited number of patients receive a definitive diagnosis and early treatment, and there is also a risk of inter-observer error. Clinical informatics is an interdisciplinary field that integrates healthcare, information technology, and data analytics to improve patient care, clinical decision-making, and medical research. Recently, deep learning (DL) has proved effective in the medical sector, and cancer diagnosis can be automated by utilizing the capabilities of artificial intelligence (AI), enabling faster and more cost-effective analysis of larger case volumes. With extensive technical developments, DL has emerged as an effective tool in medical settings, particularly in medical imaging. This study presents an Enhanced Fusion of Transfer Learning Models and Optimization-Based Clinical Biomedical Imaging for Accurate Lung and Colon Cancer Diagnosis (FTLMO-BILCCD) model. The main objective of the FTLMO-BILCCD technique is to develop an efficient method for LCC detection using clinical biomedical imaging. Initially, the image pre-processing stage applies a median filter (MF) to remove unwanted noise from the input image data. Fusion models, namely CapsNet, EfficientNetV2, and MobileNet-V3 Large, are then employed for feature extraction. The FTLMO-BILCCD technique implements a hybrid of temporal pattern attention and a bidirectional gated recurrent unit (TPA-BiGRU) for classification. Finally, the beluga whale optimization (BWO) technique optimally tunes the hyperparameters of the TPA-BiGRU model, yielding greater classification performance. The FTLMO-BILCCD approach is evaluated on the LCC-HI dataset, where it attains a superior accuracy of 99.16% over existing models.
PMID:40624106 | DOI:10.1038/s41598-025-10246-0
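The FTLMO-BILCCD pipeline above begins with median-filter denoising of the input histopathology images. A minimal scipy sketch of that preprocessing step follows; the kernel size and the synthetic noisy tile are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import median_filter

def denoise_rgb(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Apply a median filter channel-wise to an H x W x 3 image array."""
    return np.stack(
        [median_filter(image[..., c], size=size) for c in range(image.shape[-1])], axis=-1
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)   # stand-in for a histology tile
    noisy = img.copy()
    mask = rng.random(img.shape[:2]) < 0.05                         # add salt-and-pepper noise
    noisy[mask] = rng.choice([0, 255], size=mask.sum())[:, None]
    print(denoise_rgb(noisy).shape)                                  # (64, 64, 3), impulse noise suppressed
```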
Automated cell annotation and classification on histopathology for spatial biomarker discovery
Nat Commun. 2025 Jul 7;16(1):6240. doi: 10.1038/s41467-025-61349-1.
ABSTRACT
Histopathology with hematoxylin and eosin (H&E) staining is routinely employed for clinical diagnoses. Single-cell analysis of histopathology provides a powerful tool for understanding the intricate cellular interactions underlying disease progression and therapeutic response. However, existing efforts are hampered by inefficient and error-prone human annotations. Here, we present an experimental and computational approach for automated cell annotation and classification on H&E-stained images. Instead of human annotations, we use multiplexed immunofluorescence (mIF) to define cell types based on cell lineage protein markers. By co-registering H&E images with mIF of the same tissue section at the single-cell level, we create a dataset of 1,127,252 cells with high-quality annotations on tissue microarray cores. A deep learning model combining self-supervised learning with domain adaptation is trained to classify four cell types on H&E images with an overall accuracy of 86%-89%, and the cell classification model is applicable to whole slide images. Further, we show that spatial interactions among specific immune cells in the tumor microenvironment are linked to patient survival and response to immune checkpoint inhibitors. Our work provides a scalable approach for single-cell analysis of standard histopathology and may enable discovery of novel spatial biomarkers for precision oncology.
PMID:40624052 | DOI:10.1038/s41467-025-61349-1
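The workflow above hinges on co-registering H&E and mIF images of the same section at the single-cell level so that mIF-derived labels can be transferred to H&E cells. One common ingredient of such a step is nearest-neighbour matching of cell centroids after alignment; a minimal scipy sketch is shown below. The coordinates, distance threshold, and label-transfer rule are illustrative assumptions, not the authors' registration method.

```python
import numpy as np
from scipy.spatial import cKDTree

def transfer_labels(he_centroids, mif_centroids, mif_labels, max_dist=5.0):
    """Give each H&E cell the label of its nearest mIF cell within max_dist (same units as coordinates)."""
    tree = cKDTree(mif_centroids)
    dist, idx = tree.query(he_centroids, k=1)
    return np.where(dist <= max_dist, mif_labels[idx], -1)   # -1 = unmatched / excluded

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mif_xy = rng.uniform(0, 1000, size=(500, 2))              # mIF cell centroids (pixels)
    he_xy = mif_xy + rng.normal(0, 1.0, size=mif_xy.shape)    # slightly jittered H&E centroids
    mif_lab = rng.integers(0, 4, size=500)                    # four cell types, as in the study
    print(np.mean(transfer_labels(he_xy, mif_xy, mif_lab) == mif_lab))
```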
Development and retrospective validation of an artificial intelligence system for diagnostic assessment of prostate biopsies: study protocol
BMJ Open. 2025 Jul 7;15(7):e097591. doi: 10.1136/bmjopen-2024-097591.
ABSTRACT
INTRODUCTION: Histopathological evaluation of prostate biopsies using the Gleason scoring system is critical for prostate cancer diagnosis and treatment selection. However, grading variability among pathologists can lead to inconsistent assessments, risking inappropriate treatment. Similar challenges complicate the assessment of other prognostic features like cribriform cancer morphology and perineural invasion. Many pathology departments are also facing an increasingly unsustainable workload due to rising prostate cancer incidence and a decreasing pathologist workforce coinciding with increasing requirements for more complex assessments and reporting. Digital pathology and artificial intelligence (AI) algorithms for analysing whole slide images show promise in improving the accuracy and efficiency of histopathological assessments. Studies have demonstrated AI's capability to diagnose and grade prostate cancer comparably to expert pathologists. However, external validations on diverse data sets have been limited and often show reduced performance. Historically, there have been no well-established guidelines for AI study designs and validation methods. Diagnostic assessments of AI systems often lack preregistered protocols and rigorous external cohort sampling, essential for reliable evidence of their safety and accuracy.
METHODS AND ANALYSIS: This study protocol covers the retrospective validation of an AI system for prostate biopsy assessment. The primary objective of the study is to develop a high-performing and robust AI model for diagnosis and Gleason scoring of prostate cancer in core needle biopsies, and at scale evaluate whether it can generalise to fully external data from independent patients, pathology laboratories and digitalisation platforms. The secondary objectives cover AI performance in estimating cancer extent and detecting cribriform prostate cancer and perineural invasion. This protocol outlines the steps for data collection, predefined partitioning of data cohorts for AI model training and validation, model development and predetermined statistical analyses, ensuring systematic development and comprehensive validation of the system. The protocol adheres to Transparent Reporting of a multivariable prediction model of Individual Prognosis Or Diagnosis+AI (TRIPOD+AI), Protocol Items for External Cohort Evaluation of a Deep Learning System in Cancer Diagnostics (PIECES), Checklist for AI in Medical Imaging (CLAIM) and other relevant best practices.
ETHICS AND DISSEMINATION: Data collection and usage were approved by the respective ethical review boards of each participating clinical laboratory, and centralised anonymised data handling was approved by the Swedish Ethical Review Authority. The study will be conducted in agreement with the Helsinki Declaration. The findings will be disseminated in peer-reviewed publications (open access).
PMID:40623883 | DOI:10.1136/bmjopen-2024-097591
Deep learning method for cucumber disease detection in complex environments for new agricultural productivity
BMC Plant Biol. 2025 Jul 7;25(1):888. doi: 10.1186/s12870-025-06841-y.
ABSTRACT
Cucumber disease detection under complex agricultural conditions faces significant challenges due to multi-scale variation, background clutter, and hardware limitations. This study proposes YOLO-Cucumber, an improved lightweight detection algorithm based on YOLOv11n, incorporating four key innovations: (1) Deformable Convolutional Networks (DCN) for enhanced feature extraction of irregular targets, (2) a P2 prediction layer for fine-grained detection of early-stage lesions, (3) a Target-aware Loss (TAL) function addressing class imbalance, and (4) Channel Pruning via Batch Normalization (CPBN) for model compression. Experiments on our cucumber disease dataset demonstrate that YOLO-Cucumber achieves a 6.5% improvement in mAP@50 (93.8%), while reducing model size by 3.87 MB and increasing inference speed to 218 FPS. The model effectively handles symptom variability and complex detection scenarios, outperforming mainstream detection algorithms in accuracy, speed, and compactness, making it ideal for embedded agricultural applications.
PMID:40624628 | DOI:10.1186/s12870-025-06841-y
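The YOLO-Cucumber abstract above attributes part of its compression to channel pruning via batch normalization (CPBN). A minimal PyTorch sketch of the usual idea, ranking channels by the magnitude of their BN scale (gamma) and keeping the largest, is shown below; the toy layer sizes and pruning ratio are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def select_channels_by_bn(bn: nn.BatchNorm2d, keep_ratio: float = 0.5):
    """Return indices of channels whose BN scale |gamma| is largest."""
    gamma = bn.weight.detach().abs()
    n_keep = max(1, int(keep_ratio * gamma.numel()))
    return torch.topk(gamma, n_keep).indices.sort().values

def prune_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d, keep_idx: torch.Tensor):
    """Build a smaller conv+BN pair keeping only the selected output channels."""
    new_conv = nn.Conv2d(conv.in_channels, len(keep_idx), conv.kernel_size,
                         conv.stride, conv.padding, bias=conv.bias is not None)
    new_conv.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        new_conv.bias.data = conv.bias.data[keep_idx].clone()
    new_bn = nn.BatchNorm2d(len(keep_idx))
    for name in ("weight", "bias", "running_mean", "running_var"):
        getattr(new_bn, name).data = getattr(bn, name).data[keep_idx].clone()
    return new_conv, new_bn

if __name__ == "__main__":
    conv, bn = nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32)
    keep = select_channels_by_bn(bn, keep_ratio=0.5)
    small_conv, small_bn = prune_conv_bn(conv, bn, keep)
    print(small_bn(small_conv(torch.randn(1, 3, 64, 64))).shape)   # (1, 16, 64, 64)
```

In practice the gamma ranking is computed after training with an L1 penalty on the BN scales, and downstream layers are pruned to match.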
Multilayer perceptron deep learning radiomics model based on Gd-BOPTA MRI to identify vessels encapsulating tumor clusters in hepatocellular carcinoma: a multi-center study
Cancer Imaging. 2025 Jul 7;25(1):87. doi: 10.1186/s40644-025-00895-9.
ABSTRACT
OBJECTIVES: The purpose of this study is to mainly develop a predictive model based on clinicoradiological and radiomics features from preoperative gadobenate-enhanced (Gd-BOPTA) magnetic resonance imaging (MRI) using multilayer perceptron (MLP) deep learning to predict vessels encapsulating tumor clusters (VETC) in hepatocellular carcinoma (HCC) patients.
METHODS: A total of 230 patients with histopathologically confirmed HCC who underwent preoperative Gd-BOPTA MRI before hepatectomy were retrospectively enrolled from three hospitals (144, 54, and 32 patients in the training, test, and validation sets, respectively). Univariate and multivariate logistic regression analyses were used to identify independent clinicoradiological predictors significantly associated with VETC, which constituted the clinicoradiological model. Four regions of interest (ROIs) were defined: intratumoral (Tumor), peritumoral area ≤ 2 mm (Peri2mm), intratumoral plus peritumoral area ≤ 2 mm as separate regions (Tumor + Peri2mm), and intratumoral and peritumoral ≤ 2 mm merged as a whole (TumorPeri2mm). A total of 7322 radiomics features were extracted for each of ROI(Tumor), ROI(Peri2mm), and ROI(TumorPeri2mm), and 14,644 features for ROI(Tumor + Peri2mm). Least absolute shrinkage and selection operator (LASSO) and univariate logistic regression analyses were used to select the important features. Seven machine learning classifiers were each combined with the radiomics signatures selected from the four ROIs to build candidate models; their performance was compared across the three sets, and the optimal combination was selected as the radiomics model. A radiomics score (rad-score) was then generated and combined with the significant clinicoradiological predictors in a multivariate logistic regression analysis to constitute the fusion model. The performance of the three models was compared using the area under the receiver operating characteristic curve (AUC), integrated discrimination index (IDI), and net reclassification index (NRI), and the optimal predictive model for VETC prediction was chosen.
RESULT: Arterial peritumoral enhancement and peritumoral hypointensity on the hepatobiliary phase (HBP) were independent risk factors for VETC and constituted the Radiology model, with no clinical variables retained. Arterial peritumoral enhancement was defined as enhancement outside the tumor boundary in the late arterial or early portal venous phase, in extensive contact with the tumor edge, that becomes isointense during the delayed phase (DP). The MLP deep learning algorithm combined with the radiomics features selected from ROI TumorPeri2mm was the best combination and constituted the radiomics model (MLP model). An MLP score (MLP_score) was then calculated and combined with the two radiology features to compose the fusion model (Radiology MLP model), with AUCs of 0.871, 0.894, and 0.918 in the training, test, and validation sets, respectively. Compared with the two aforementioned models, the Radiology MLP model showed a 33.4%-131.3% improvement in NRI and a 9.3%-50% improvement in IDI, with better discrimination, calibration, and clinical usefulness in all three sets, and was therefore selected as the optimal predictive model.
CONCLUSION: We developed a fusion model (Radiology MLP model) that integrates radiology and radiomics features using an MLP deep learning algorithm to predict vessels encapsulating tumor clusters (VETC) in hepatocellular carcinoma (HCC) patients; it yielded incremental value over both the radiology and MLP models.
PMID:40624579 | DOI:10.1186/s40644-025-00895-9
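The radiomics workflow above selects features with LASSO and feeds them to a multilayer-perceptron classifier, reporting AUC. A compact scikit-learn sketch of that generic pipeline on synthetic data follows; the feature counts, hyperparameters, and the use of L1-penalized logistic regression as the LASSO-style selector are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics table: many features, few informative ones.
X, y = make_classification(n_samples=230, n_features=500, n_informative=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

pipe = make_pipeline(
    StandardScaler(),
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),  # LASSO-style selection
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),     # MLP classifier
)
pipe.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1]))
```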
Multi-task genomic prediction using gated residual variable selection neural networks
BMC Bioinformatics. 2025 Jul 7;26(1):167. doi: 10.1186/s12859-025-06188-z.
ABSTRACT
BACKGROUND: The recent development of high-throughput sequencing techniques provides massive amounts of data that can be used in genome-wide prediction (GWP). Although GWP is effective on its own, the incorporation of traditional polygenic pedigree information has been shown to further improve prediction accuracy. However, most methods developed in this field require that individuals with genomic information be connected to the polygenic pedigree within a standard linear mixed model framework, which involves computationally demanding matrix inverses of the combined pedigrees. The extension of this integrated approach to more flexible machine learning methods has been slow.
METHODS: This study aims to enhance genomic prediction by implementing gated residual variable selection neural networks (GRVSNN) for multi-task genomic prediction. By integrating low-rank information from pedigree-based relationship matrices with genomic markers, we seek to improve predictive accuracy and interpretability compared to conventional regression and deep learning (DL) models. The prediction properties of the GRVSNN model are evaluated on several real-world datasets, including loblolly pine, mouse and pig.
RESULTS: The experimental results demonstrate that the GRVSNN model outperforms traditional tabular genomic prediction models, including Bayesian regression methods and LassoNet. Using genomic and pedigree information, GRVSNN achieves a lower mean squared error (MSE) and higher Pearson (r) and distance (dCor) correlations between predicted and true phenotypic values in the test data. Moreover, GRVSNN selects fewer genetic markers and pedigree loadings, which improves interpretability.
CONCLUSION: The suggested GRVSNN framework provides a novel and computationally effective approach to improve genomic prediction accuracy by integrating information from traditional pedigrees with genomic data. The model's ability to conduct multi-task predictions underscores its potential to enhance selection processes in agricultural species and predict multiple diseases in precision medicine.
PMID:40624470 | DOI:10.1186/s12859-025-06188-z
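The GRVSNN architecture builds on gated residual blocks, in which inputs pass through a nonlinearity, are gated with a GLU, and are added back through a residual skip. A minimal PyTorch sketch of one such block is shown below; the layer sizes and exact wiring are illustrative assumptions based on the gated residual network design popularized by the Temporal Fusion Transformer, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualBlock(nn.Module):
    """ELU feed-forward transform, GLU gate, residual connection, and layer norm."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_in)
        self.gate = nn.Linear(d_in, 2 * d_in)   # split into value and gate for the GLU
        self.norm = nn.LayerNorm(d_in)

    def forward(self, x):
        h = self.fc2(F.elu(self.fc1(x)))
        h = F.glu(self.gate(h), dim=-1)          # sigmoid-gated output
        return self.norm(x + h)                  # residual + normalization

if __name__ == "__main__":
    block = GatedResidualBlock(d_in=64, d_hidden=128)
    print(block(torch.randn(8, 64)).shape)       # (8, 64)
```

In a variable-selection network, blocks like this score each input (marker or pedigree loading) so that uninformative inputs receive near-zero weight.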
Gender difference in cross-sectional area and fat infiltration of thigh muscles in the elderly population on MRI: an AI-based analysis
Eur Radiol Exp. 2025 Jul 7;9(1):64. doi: 10.1186/s41747-025-00606-w.
ABSTRACT
BACKGROUND: Aging alters musculoskeletal structure and function, affecting muscle mass, composition, and strength, increasing the risk of falls and loss of independence in older adults. This study assessed cross-sectional area (CSA) and fat infiltration (FI) of six thigh muscles through a validated deep learning model. Gender differences and correlations between fat, muscle parameters, and age were also analyzed.
METHODS: We retrospectively analyzed 141 participants (67 females, 74 males) aged 52-82 years. Participants underwent magnetic resonance imaging (MRI) scans of the right thigh and dual-energy x-ray absorptiometry to determine appendicular skeletal muscle mass index (ASMMI) and body fat percentage (FAT%). A deep learning-based application was developed to automate the segmentation of six thigh muscle groups.
RESULTS: Deep learning model accuracy was evaluated using the "intersection over union" (IoU) metric, with average IoU values across muscle groups ranging from 0.84 to 0.99. Mean CSA was 10,766.9 mm² (females 8,892.6 mm², males 12,463.9 mm², p < 0.001). The mean FI value was 14.92% (females 17.42%, males 12.62%, p < 0.001). Males showed larger CSA and lower FI in all thigh muscles compared to females. Positive correlations were identified in females between the FI of posterior thigh muscle groups (biceps femoris, semimembranosus, and semitendinosus) and age (r or ρ = 0.35-0.48; p ≤ 0.004), while no significant correlations were observed between CSA, ASMMI, or FAT% and age.
CONCLUSION: Deep learning accurately quantifies muscle CSA and FI, reducing analysis time and human error. Aging affects muscle composition and distribution, and gender-specific assessments in older adults are needed.
RELEVANCE STATEMENT: Efficient deep learning-based MRI segmentation to assess the composition of six thigh muscle groups in individuals over 50 years of age revealed gender differences in thigh muscle CSA and FI. These findings have potential clinical applications in assessing muscle quality, decline, and frailty.
KEY POINTS: Deep learning model enhanced MRI segmentation, providing high assessment accuracy. Significant gender differences in cross-sectional area and fat infiltration across all thigh muscles were observed. In females, fat infiltration of the posterior thigh muscles was positively correlated with age.
PMID:40624409 | DOI:10.1186/s41747-025-00606-w
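Segmentation accuracy above is reported with the intersection-over-union (IoU) metric. A minimal numpy sketch of per-class IoU between a predicted and a reference mask follows; the toy masks and label scheme are illustrative assumptions.

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """IoU for one label: |pred AND target| / |pred OR target|."""
    p, t = pred == label, target == label
    union = np.logical_or(p, t).sum()
    return np.logical_and(p, t).sum() / union if union else float("nan")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.integers(0, 7, size=(128, 128))                     # 6 muscle groups + background
    pred = np.where(rng.random(target.shape) < 0.9, target, 0)       # ~90% agreement with reference
    print([round(iou(pred, target, k), 3) for k in range(1, 7)])     # per-muscle IoU values
```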
Radiographic Bone Texture Analysis using Deep Learning Models for Early Rheumatoid Arthritis Diagnosis
J Imaging Inform Med. 2025 Jul 7. doi: 10.1007/s10278-025-01579-3. Online ahead of print.
ABSTRACT
Rheumatoid arthritis (RA) is distinguished by the presence of modified bone microarchitecture, also known as 'texture,' in the periarticular regions. The radiographic detection of such alterations in RA can be challenging. This study aimed to train and validate a deep learning model that quantitatively derives periarticular texture features directly from radiographs and predicts the diagnosis of early RA without human reading. Two kinds of deep learning models were compared for diagnostic performance. Anteroposterior bilateral hand radiographs of 891 early RA patients (within one year of initial diagnosis) and 1237 non-RA patients were split into a training set (64%), a validation set (16%), and a test set (20%). The second, third, and fourth distal metacarpal areas were segmented for the Deep Texture Encoding Network (Deep-TEN; texture-based) and residual network-50 (ResNet-50; texture- and structure-based) models to predict the probability of RA. The area under the receiver operating characteristic curve for RA was 0.69 for the Deep-TEN model and 0.73 for the ResNet-50 model. The positive predictive values of a high texture score for classifying RA using the Deep-TEN and ResNet-50 models were 0.64 and 0.67, respectively. High mean texture scores were associated with age- and sex-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) for RA of 3.42 (2.59-4.50) and 4.30 (3.26-5.69) using the Deep-TEN and ResNet-50 models, respectively. The moderate and high RA risk groups determined by the Deep-TEN model were associated with adjusted ORs (95% CIs) of 2.48 (1.78-3.47) and 4.39 (3.11-6.20) for RA, respectively, and those for the ResNet-50 model were 2.17 (1.55-3.04) and 6.91 (4.83-9.90), respectively. Fully automated quantitative assessment of periarticular texture by deep learning models can help in the classification of early RA.
PMID:40624389 | DOI:10.1007/s10278-025-01579-3
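The abstract above reports age- and sex-adjusted odds ratios for the texture-score risk groups. A minimal statsmodels sketch of how such an adjusted OR with its confidence interval can be obtained from a logistic regression follows; the synthetic data, effect sizes, and variable names are illustrative assumptions, not the study's data or modelling details.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.normal(55, 12, n),
    "female": rng.integers(0, 2, n),
    "high_texture": rng.integers(0, 2, n),   # assumed binary texture-score indicator
})
# Simulate RA status with a known association to the texture indicator.
logit = 0.02 * (df["age"] - 55) + 0.4 * df["female"] + 1.2 * df["high_texture"] - 1.0
df["ra"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(df[["high_texture", "age", "female"]])
fit = sm.Logit(df["ra"], X).fit(disp=0)
or_ci = np.exp(fit.conf_int().loc["high_texture"])        # exponentiated CI bounds
print("adjusted OR:", round(float(np.exp(fit.params["high_texture"])), 2),
      "95% CI:", [round(v, 2) for v in or_ci])
```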
PolSAR image classification using shallow to deep feature fusion network with complex valued attention
Sci Rep. 2025 Jul 7;15(1):24315. doi: 10.1038/s41598-025-10475-3.
ABSTRACT
Polarimetric Synthetic Aperture Radar (PolSAR) images encompass valuable information that can facilitate extensive land cover interpretation and generate diverse output products. Extracting meaningful features from PolSAR data poses challenges distinct from those encountered in optical imagery. Deep Learning (DL) methods offer effective solutions for overcoming these challenges in PolSAR feature extraction. Convolutional Neural Networks (CNNs) play a crucial role in capturing PolSAR image characteristics by exploiting kernel capabilities to consider local information and the complex-valued nature of PolSAR data. In this study, a novel three-branch fusion Complex-Valued CNN, named CV-ASDF2Net, is proposed for PolSAR image classification. To validate the performance of the proposed method, classification results are compared against multiple state-of-the-art approaches using the Airborne Synthetic Aperture Radar (AIRSAR) datasets of Flevoland and San Francisco, and the ESAR Oberpfaffenhofen dataset. Moreover, quantitative and qualitative evaluation measures are conducted to assess the classification performance. The results indicate that the proposed approach achieves notable improvements in Overall Accuracy (OA), with enhancements of 1.30% and 0.80% for the AIRSAR datasets, and 0.50% for the ESAR dataset. However, the most remarkable performance of the CV-ASDF2Net model is observed on the Flevoland dataset, where the model achieves an impressive OA of 96.01% with only a 1% sampling ratio. The source code is available at: https://github.com/mqalkhatib/CV-ASDF2Net.
PMID:40624319 | DOI:10.1038/s41598-025-10475-3
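Complex-valued CNNs such as CV-ASDF2Net convolve complex PolSAR inputs with complex kernels. A common way to realize this with standard real-valued layers is to hold the real and imaginary parts in two convolutions and combine them as (a+bi)(c+di) = (ac-bd) + (ad+bc)i. The PyTorch sketch below shows only that building block; layer widths and input shapes are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution built from two real convolutions acting on real/imaginary parts."""
    def __init__(self, in_ch, out_ch, kernel_size, padding=0):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x_r, x_i):
        # (a + bi) * (c + di) = (ac - bd) + (ad + bc)i
        out_r = self.conv_r(x_r) - self.conv_i(x_i)
        out_i = self.conv_r(x_i) + self.conv_i(x_r)
        return out_r, out_i

if __name__ == "__main__":
    # A PolSAR patch with, e.g., 3 complex channels (elements of the scattering vector).
    x_r, x_i = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
    y_r, y_i = ComplexConv2d(3, 16, 3, padding=1)(x_r, x_i)
    print(y_r.shape, y_i.shape)
```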
A latent variable deep generative model for 3D anterior tooth shape
J Prosthodont. 2025 Jul 7. doi: 10.1111/jopr.14092. Online ahead of print.
ABSTRACT
PURPOSE: To introduce a 3D generative technology, PointFlow, which can generate 3D tooth shapes that integrate with conventional digital design workflows, and to evaluate its clinical applicability for tooth reconstruction.
MATERIALS AND METHODS: A dataset of 1337 3D scans of natural anterior teeth was used to train a deep generative model (DGM) called PointFlow. This model encodes complex 3D tooth geometries into compact latent codes that efficiently represent essential morphological features. PointFlow models these latent codes as a continuous distribution, enabling the generation of new, realistic tooth shapes as point clouds by sampling from this latent space. The generative quality of the outputs was quantitatively evaluated using seven 3D shape metrics by comparing both the generated and training samples to a validation set. Clinical applicability was further explored by reconstructing 60 artificially damaged samples using the trained model.
RESULTS: The PointFlow model effectively represented the diversity of anterior tooth shapes. The generated tooth shapes showed superior performance on multiple generative metrics compared to the reference dataset. In the reconstruction task, the model successfully recovered the missing regions in the damaged samples. The average Chamfer Distance for the missing regions across all damage types was 0.2738 ± 0.095 mm.
CONCLUSIONS: Deep generative models can effectively learn tooth characteristics and demonstrate potential in generating high-quality tooth shapes, suggesting their applicability for further clinical use.
PMID:40624318 | DOI:10.1111/jopr.14092
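Reconstruction quality above is summarized with the Chamfer Distance between generated and reference point clouds. A minimal numpy/scipy sketch of a symmetric Chamfer Distance is given below; the point counts and the averaging convention (mean of nearest-neighbour distances in both directions) are illustrative assumptions, since the abstract does not spell out the exact variant used.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between two point sets of shape (N, 3) and (M, 3)."""
    d_ab, _ = cKDTree(b).query(a)   # nearest neighbour in b for every point of a
    d_ba, _ = cKDTree(a).query(b)   # nearest neighbour in a for every point of b
    return d_ab.mean() + d_ba.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.uniform(size=(2048, 3))                       # stand-in for a scanned tooth surface
    generated = reference + rng.normal(0, 0.01, reference.shape)  # slightly perturbed reconstruction
    print(f"{chamfer_distance(generated, reference):.4f}")
```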
Spatio-temporal transformer and graph convolutional networks based traffic flow prediction
Sci Rep. 2025 Jul 7;15(1):24299. doi: 10.1038/s41598-025-10287-5.
ABSTRACT
Traffic flow prediction is a core component of intelligent transportation systems, providing accurate decision support for traffic management and urban planning. Traffic flow data exhibits highly complex spatiotemporal characteristics due to the intricate spatial correlations between nodes and the significant temporal dependencies across different time intervals. Despite substantial progress in this field, several challenges still remain. Firstly, most current methods rely on Graph Convolutional Networks (GCNs) to extract spatial correlations, typically using predefined adjacency matrices. However, these matrices are inadequate for dynamically capturing the complex and evolving spatial correlations within traffic networks. Secondly, traditional prediction methods predominantly focus on short-term forecasting, which is insufficient for long-term prediction needs. Additionally, many approaches fail to fully consider the local trend information in traffic flow data which reflects short-term temporal variations. To address these issues, a novel deep learning-based traffic flow prediction model, TDMGCN, is proposed. It integrates the Transformer and a multi-graph GCN to tackle the limitations of long-term prediction and the challenges of using the predefined adjacency matrices for spatial correlation extraction. Specifically, in the temporal dimension, a convolution-based multi-head self-attention module is designed. It can not only capture long-term temporal dependencies but also extract local trend information. In the spatial dimension, the model incorporates a spatial embedding module and a multi-graph convolutional module. The former is designed to learn traffic characteristics of different nodes, and the latter is used to extract spatial correlations effectively from multiple graphs. Additionally, the model integrates the periodic features of traffic flow data to further enhance prediction accuracy. Experimental results on five real-world traffic datasets demonstrate that TDMGCN outperforms the current most advanced baseline models.
PMID:40624240 | DOI:10.1038/s41598-025-10287-5
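TDMGCN extracts spatial correlations with graph convolutions over one or more adjacency matrices. A minimal PyTorch sketch of a single GCN layer with symmetric normalization, D^{-1/2}(A+I)D^{-1/2} X W, is shown below; the node count, feature sizes, and random adjacency are illustrative assumptions, not the paper's multi-graph module.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: symmetric-normalized adjacency times node features times weights."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out, bias=False)

    @staticmethod
    def normalize(adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.size(0))                  # add self-loops
        d_inv_sqrt = a_hat.sum(1).clamp(min=1e-6).pow(-0.5)
        return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

    def forward(self, x, adj):
        return torch.relu(self.normalize(adj) @ self.lin(x))

if __name__ == "__main__":
    n_nodes, d_in = 207, 16                                   # e.g., sensors in a traffic network
    adj = (torch.rand(n_nodes, n_nodes) < 0.05).float()
    adj = ((adj + adj.t()) > 0).float()                       # make the toy graph symmetric
    x = torch.randn(n_nodes, d_in)                            # per-node traffic features
    print(GCNLayer(d_in, 32)(x, adj).shape)                   # (207, 32)
```

A multi-graph variant would run this layer over several adjacency matrices (distance-based, learned, etc.) and fuse the outputs.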
Retraction Note: A combined microfluidic deep learning approach for lung cancer cell high throughput screening toward automatic cancer screening applications
Sci Rep. 2025 Jul 7;15(1):24201. doi: 10.1038/s41598-025-09817-y.
NO ABSTRACT
PMID:40624239 | DOI:10.1038/s41598-025-09817-y
Deep learning-based video analysis for automatically detecting penetration and aspiration in videofluoroscopic swallowing study
Sci Rep. 2025 Jul 7;15(1):24296. doi: 10.1038/s41598-025-10397-0.
ABSTRACT
The videofluoroscopic swallowing study (VFSS) is the gold standard for diagnosing dysphagia, but its interpretation is time-consuming and requires expertise. This study developed a deep learning model for automatically detecting penetration and aspiration in VFSS and assessed its diagnostic accuracy. Images corresponding to the highest and lowest positions of the hyoid bone, representing the moment of upper esophageal sphincter opening during the swallow and the pre-swallow and post-swallow phases, respectively, were automatically extracted from VFSS videos, resulting in a total of 18,145 images from 1,467 patients. The model was trained with a convolutional neural network architecture, incorporating techniques to address class imbalance and optimize performance. The model achieved high diagnostic accuracy at the patient level, with area under the receiver operating characteristic curve values of 0.935 (normal swallowing), 0.889 (penetration), and 0.845 (aspiration). However, despite strong performance in identifying normal swallowing, the model exhibited low sensitivity for detecting penetration and aspiration. The findings suggest that the proposed model may reduce interpretation time by minimizing the need for repeated video review to identify penetration or aspiration, enabling clinicians to focus on other clinically relevant VFSS findings. Future studies should address its limitations by analyzing full-frame VFSS data and incorporating multicenter datasets.
PMID:40624237 | DOI:10.1038/s41598-025-10397-0
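The abstract above notes techniques to address class imbalance, since aspiration frames are much rarer than normal swallows. One standard option is inverse-frequency class weighting of the loss; a minimal PyTorch sketch follows, where the per-class counts and three-class setup are illustrative assumptions, not the study's actual numbers or method.

```python
import torch
import torch.nn as nn

# Inverse-frequency class weights for a 3-class problem (normal / penetration / aspiration).
counts = torch.tensor([12000.0, 4000.0, 2145.0])       # assumed per-class image counts
weights = counts.sum() / (len(counts) * counts)        # rarer classes get larger weights
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3, requires_grad=True)         # one mini-batch of model outputs
labels = torch.randint(0, 3, (8,))
loss = criterion(logits, labels)
loss.backward()
print(weights, loss.item())
```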
Hippocampal blood oxygenation predicts choices about everyday consumer experiences: A deep-learning approach
Proc Natl Acad Sci U S A. 2025 Jul 15;122(28):e2421905122. doi: 10.1073/pnas.2421905122. Epub 2025 Jul 7.
ABSTRACT
This research investigates the neurophysiological mechanisms of experiential versus monetary choices under risk. While ventral striatum and insula activity are instrumental in predicting monetary choices, we find that hippocampal activity plays a key role in predicting experiential choices, which we theorize is due to its role in retrieving autobiographical memories. This neurophysiological differentiation clarifies observed variations in risk preferences between experiential and monetary prospects and highlights the importance of domain-specific neurophysiological processes in shaping human decision-making.
PMID:40623186 | DOI:10.1073/pnas.2421905122
Vibration-based gearbox fault diagnosis using a multi-scale convolutional neural network with depth-wise feature concatenation
PLoS One. 2025 Jul 7;20(7):e0324905. doi: 10.1371/journal.pone.0324905. eCollection 2025.
ABSTRACT
This article proposes a novel approach for vibration-based gearbox fault diagnosis using a multi-scale convolutional neural network with depth-wise feature concatenation named MixNet. In industrial environments where equipment reliability directly impacts productivity, safety, and operational efficiency, timely and accurate fault detection in gearboxes is of paramount importance. As critical components in manufacturing, energy production, transportation, and heavy machinery, gearboxes constitute major potential failure points, with malfunctions leading to costly downtime and, in severe cases, catastrophic incidents. The proposed method addresses these industrial challenges by integrating advanced signal processing techniques with deep learning architectures to enhance diagnostic accuracy and robustness. Specifically, MixNet utilizes multi-scale convolutional layers combined with depth-wise feature concatenation to extract discriminative features from spectrogram representations of vibration signals, generated via the Short-time Fourier transform (STFT). This approach offers several practical advantages for engineering applications, including non-invasive monitoring that eliminates the need for disassembly, early fault detection that facilitates condition-based maintenance strategies, automated diagnosis that minimizes reliance on domain-specific expertise, and robust performance under noisy and variable operating conditions. Experimental results on the Gearbox fault diagnosis dataset demonstrate that MixNet outperforms existing deep learning models, achieving a significantly higher accuracy of 99.32% with a relatively fast training time of only 4 minutes and 29 seconds. The combination of high accuracy and computational efficiency renders the proposed method well-suited for deployment in real-time monitoring systems across manufacturing plants, power generation facilities, and automotive applications, where it has the potential to reduce maintenance costs by up to 30% and improve equipment availability by enabling the detection of incipient faults before catastrophic failures.
PMID:40623058 | DOI:10.1371/journal.pone.0324905
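MixNet feeds STFT spectrograms of vibration signals into parallel convolutions with different kernel sizes and concatenates the resulting feature maps along the channel (depth) axis. The sketch below illustrates those two steps with scipy and PyTorch; the toy signal, STFT parameters, branch widths, and kernel sizes are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft

class MultiScaleBlock(nn.Module):
    """Parallel convolutions at several kernel sizes, concatenated depth-wise."""
    def __init__(self, in_ch, branch_ch=8, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in kernel_sizes
        )
    def forward(self, x):
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

if __name__ == "__main__":
    fs = 10_000                                              # assumed sampling rate (Hz)
    t = np.arange(0, 1.0, 1 / fs)
    signal = np.sin(2 * np.pi * 1_200 * t) + 0.3 * np.random.randn(t.size)   # toy vibration trace
    _, _, z = stft(signal, fs=fs, nperseg=256)               # complex STFT
    spec = torch.tensor(np.abs(z), dtype=torch.float32)[None, None]           # (1, 1, freq, time) magnitude
    print(MultiScaleBlock(in_ch=1)(spec).shape)              # 3 branches x 8 channels = 24 channels
```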
Air-ground collaborative multi-source orbital integrated detection system: Combining 3D imaging and intrusion recognition
PLoS One. 2025 Jul 7;20(7):e0326951. doi: 10.1371/journal.pone.0326951. eCollection 2025.
ABSTRACT
With the rapid expansion of railway networks globally, ensuring rail infrastructure safety through efficient detection methods has become critical. Traditional inspection systems face limitations in flexibility, adaptability to adverse weather, and multifunctional integration. This study proposes a ground-air collaborative multi-source detection system that integrates 3D light detection and ranging (LiDAR)-based point cloud imaging and deep learning-driven intrusion detection. The system employs a lightweight rail inspection vehicle equipped with dual LiDARs and an Astro camera, synchronized with an unmanned aerial vehicle (UAV) carrying industrial-grade LiDAR. An improved LiDAR odometry and mapping with sliding window (LOAM-SLAM) algorithm enables real-time dynamic mapping, while an optimized iterative closest point (ICP) algorithm achieves high-precision point cloud registration and colorization. For intrusion detection, a You Only Look Once version 3 (YOLOv3)-ResNet fusion model achieves a recall rate of 0.97 and a precision of 0.99. The system's innovative design and technical implementation offer significant improvements in railway track inspection efficiency and safety. This work establishes a new paradigm for adaptive railway maintenance in complex environments.
PMID:40623035 | DOI:10.1371/journal.pone.0326951
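High-precision registration above relies on an optimized iterative closest point (ICP) algorithm. A minimal numpy sketch of one vanilla ICP loop, nearest-neighbour correspondence followed by an SVD-based rigid fit, is shown below; the toy point clouds and iteration count are illustrative assumptions, not the paper's optimized variant.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch/SVD)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    u, _, vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    r = vt.T @ np.diag([1, 1, np.sign(np.linalg.det(vt.T @ u.T))]) @ u.T
    return r, mu_d - r @ mu_s

def icp(src, dst, n_iter=20):
    """Align src to dst by alternating nearest-neighbour matching and rigid fitting."""
    tree, current = cKDTree(dst), src.copy()
    for _ in range(n_iter):
        _, idx = tree.query(current)
        r, t = best_rigid_transform(current, dst[idx])
        current = current @ r.T + t
    return current

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dst = rng.uniform(size=(1000, 3))                      # reference cloud (e.g., UAV LiDAR)
    angle = np.deg2rad(10)
    rot = np.array([[np.cos(angle), -np.sin(angle), 0],
                    [np.sin(angle),  np.cos(angle), 0],
                    [0, 0, 1]])
    src = dst @ rot.T + np.array([0.05, -0.02, 0.01])      # misaligned copy (vehicle LiDAR)
    aligned = icp(src, dst)
    print(np.abs(aligned - dst).mean())                    # residual alignment error
```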
Saliency-enhanced infrared and visible image fusion via sub-window variance filter and weighted least squares optimization
PLoS One. 2025 Jul 7;20(7):e0323285. doi: 10.1371/journal.pone.0323285. eCollection 2025.
ABSTRACT
This paper proposes a novel method for infrared and visible image fusion (IVIF) to address the limitations of existing techniques in enhancing salient features and improving visual clarity. The method employs a sub-window variance filter (SVF) based decomposition technique to separate salient features and texture details into distinct band layers. A saliency map measurement scheme based on weighted least squares optimization (WLSO) is then designed to compute weight maps, enhancing the visibility of important features. Finally, pixel-level summation is used for feature map reconstruction, producing high-quality fused images. Experiments on three public datasets demonstrate that our method outperforms nine state-of-the-art fusion techniques in both qualitative and quantitative evaluations, particularly in salient target highlighting and texture detail preservation. Unlike deep learning-based approaches, our method does not require large-scale training datasets, reducing dependence on ground truth and avoiding fused image distortion. Limitations include potential challenges in handling highly complex scenes, which will be addressed in future work by exploring adaptive parameter optimization and integration with deep learning frameworks.
PMID:40622931 | DOI:10.1371/journal.pone.0323285
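The fusion method above measures saliency with a sub-window variance filter before the weighted least squares step. As a rough illustration of the variance-as-saliency idea only, the numpy/scipy sketch below computes a local-variance map with a box window and normalizes it into a crude weight map; the window size, normalization, and synthetic frame are illustrative assumptions and this is not the paper's SVF/WLSO scheme.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(img: np.ndarray, size: int = 7) -> np.ndarray:
    """Per-pixel variance inside a size x size window: E[x^2] - E[x]^2."""
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    return np.clip(mean_sq - mean * mean, 0, None)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir = rng.random((128, 128))                            # stand-in for an infrared frame
    var_map = local_variance(ir)
    saliency = var_map / (var_map.max() + 1e-12)           # crude saliency weight map in [0, 1]
    print(saliency.shape, float(saliency.max()))
```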
Video swin-CLSTM transformer: Enhancing human action recognition with optical flow and long-term dependencies
PLoS One. 2025 Jul 7;20(7):e0327717. doi: 10.1371/journal.pone.0327717. eCollection 2025.
ABSTRACT
As video data volumes soar exponentially, the significance of video content analysis, particularly Human Action Recognition (HAR), has become increasingly prominent in fields such as intelligent surveillance, sports analytics, medical rehabilitation, and virtual reality. However, current deep learning-based HAR methods encounter challenges in recognizing subtle actions within complex backgrounds, comprehending long-term semantics, and maintaining computational efficiency. To address these challenges, we introduce the Video Swin-CLSTM Transformer. Based on the Video Swin Transformer backbone, our model incorporates optical flow information at the input stage to effectively counteract background interference, employing a sparse sampling strategy. Combined with the backbone's 3D Patch Partition and Patch Merging techniques, it efficiently extracts and fuses multi-level features from both optical flow and raw RGB inputs, thereby enhancing the model's ability to capture motion characteristics in complex backgrounds. Additionally, by embedding Convolutional Long Short-Term Memory (ConvLSTM) units, the model's capacity to capture and understand long-term dependencies among key actions in videos is further enhanced. Experiments on the UCF-101 dataset demonstrate that our model achieves mean Top-1/Top-5 accuracies of 92.8% and 99.4%, which are 3.2% and 2.0% higher than those of the baseline model, while the computational cost is reduced by an average of 3.3% at peak performance compared to models without optical flow. Ablation studies further validate the effectiveness of the model's crucial components, with the integration of optical flow and the embedding of ConvLSTM modules yielding maximum improvements in mean Top-1 accuracy of 2.6% and 1.9%, respectively. Notably, employing our custom ImageNet-1K-LSTM pre-trained model yields a maximum increase of 2.7% in mean Top-1 accuracy compared to the traditional ImageNet-1K pre-trained model. These experimental results indicate that our model offers certain advantages over other Swin Transformer-based methods for video HAR tasks.
PMID:40622917 | DOI:10.1371/journal.pone.0327717
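The model above embeds ConvLSTM units to track long-term dependencies across frames. A minimal PyTorch sketch of a single ConvLSTM cell, where the LSTM gates are computed with convolutions instead of matrix multiplies, is given below; channel counts and kernel size are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """LSTM cell whose input/hidden transforms are 2D convolutions (all gates share one conv)."""
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

if __name__ == "__main__":
    cell, (B, C, H, W) = ConvLSTMCell(in_ch=96, hid_ch=64), (2, 96, 14, 14)
    h = c = torch.zeros(B, 64, H, W)
    for _ in range(8):                      # iterate over 8 frame-level feature maps
        h, c = cell(torch.randn(B, C, H, W), (h, c))
    print(h.shape)                          # (2, 64, 14, 14)
```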
FusionMVSA: Multi-View Fusion Strategy with Self-Attention for Enhancing Drug Recommendation
IEEE J Biomed Health Inform. 2025 Jul 7;PP. doi: 10.1109/JBHI.2025.3586758. Online ahead of print.
ABSTRACT
Leveraging the wealth of biomedical data available, we can derive insights into the relationships between biological entities from various angles. This underscores the complexity and significance of developing a dynamic approach for integrating data from multiple sources, a critical endeavor in drug recommendation. In this study, we introduce an innovative deep learning approach termed "Multi-View Fusion Strategy with Self-Attention" (FusionMVSA), designed to predict associations between drugs and diseases. To effectively amalgamate data from diverse sources and extract representative features, we have developed a feature extraction mechanism that capitalizes on similarities. This mechanism computes self-attention across multiple perspectives using shared group parameters, thereby highlighting common characteristics. Simultaneously, we utilize biomedical similarities among multi-source data as guiding factors for calculating similarity, enabling the capture of more nuanced features. Subsequently, we integrate these features through a feature fusion process, where known associations between drugs and diseases act as guiding terms. This strategy allows us to uncover the complementary aspects of different viewpoints. Ultimately, we predict potential drug-disease associations using a multi-layer perceptron neural network. Our methodology has undergone rigorous testing through various cross-validation experiments and case studies. We are confident that FusionMVSA will prove to be a valuable tool in drug recommendation, offering new avenues for exploration and discovery in the quest to combat diseases.
PMID:40622834 | DOI:10.1109/JBHI.2025.3586758
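FusionMVSA computes self-attention across multiple similarity views with shared parameters to emphasize features the views have in common. A minimal sketch of shared-weight scaled dot-product self-attention over a stack of view embeddings is shown below; the view count, embedding size, and single-head formulation are illustrative assumptions, not the paper's architecture.

```python
import math
import torch
import torch.nn as nn

class SharedViewSelfAttention(nn.Module):
    """One self-attention layer whose Q/K/V projections are shared across all views."""
    def __init__(self, d_model):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(d_model, d_model) for _ in range(3))

    def forward(self, views):                       # views: (batch, n_views, d_model)
        q, k, v = self.q(views), self.k(views), self.v(views)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(views.size(-1)), dim=-1)
        return attn @ v                             # each view attends to every other view

if __name__ == "__main__":
    drug_views = torch.randn(16, 4, 128)            # e.g., 4 similarity views per drug
    fused = SharedViewSelfAttention(128)(drug_views)
    print(fused.shape)                              # (16, 4, 128)
```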