Deep learning
WSDC-ViT: a novel transformer network for pneumonia image classification based on windows scalable attention and dynamic rectified linear unit convolutional modules
Sci Rep. 2025 Jul 30;15(1):27868. doi: 10.1038/s41598-025-12117-0.
ABSTRACT
Accurate differential diagnosis of pneumonia remains a challenging task, as different types of pneumonia require distinct treatment strategies. Early and precise diagnosis is crucial for minimizing the risk of misdiagnosis and for effectively guiding clinical decision-making and monitoring treatment response. This study proposes the WSDC-ViT network to enhance computer-aided pneumonia detection and alleviate the diagnostic workload for radiologists. Unlike existing models such as Swin Transformer or CoAtNet, which primarily improve attention mechanisms through hierarchical designs or convolutional embedding, WSDC-ViT introduces a novel architecture that simultaneously enhances global and local feature extraction through a scalable self-attention mechanism and convolutional refinement. Specifically, the network integrates a scalable self-attention mechanism that decouples the query, key, and value dimensions to reduce computational overhead and improve contextual learning, while an interactive window-based attention module further strengthens long-range dependency modeling. Additionally, a convolution-based module equipped with a dynamic ReLU activation function is embedded within the transformer encoder to capture fine-grained local details and adaptively enhance feature expression. Experimental results demonstrate that the proposed method achieves an average classification accuracy of 95.13% and an F1-score of 95.63% on a chest X-ray dataset, along with 99.36% accuracy and a 99.34% F1-score on a CT dataset. These results highlight the model's superior performance compared to existing automated pneumonia classification approaches, underscoring its potential clinical applicability.
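The abstract does not give implementation details for the dynamic ReLU module it embeds in the transformer encoder; as an illustration only, a minimal PyTorch sketch of a channel-shared dynamic activation (in the spirit of DY-ReLU, with hypothetical module names and hyperparameters) might look like this:

```python
import torch
import torch.nn as nn

class DynamicReLU(nn.Module):
    """Simplified dynamic ReLU: slopes/intercepts are predicted from the input
    via global average pooling, so the activation adapts to each sample."""
    def __init__(self, channels, reduction=4, k=2):
        super().__init__()
        self.k = k  # number of linear pieces (default k=2, matching the init buffers)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * k),  # k slopes + k intercepts
            nn.Sigmoid(),
        )
        # Initial coefficients corresponding to an ordinary ReLU: a1=1, a2=0
        self.register_buffer("init_a", torch.tensor([1.0, 0.0]))
        self.register_buffer("init_b", torch.tensor([0.0, 0.0]))

    def forward(self, x):                      # x: (B, C, H, W)
        theta = self.fc(x.mean(dim=(2, 3)))    # (B, 2k), values in (0, 1)
        theta = 2.0 * theta - 1.0              # rescale to (-1, 1)
        a = (self.init_a + theta[:, :self.k]).view(-1, self.k, 1, 1, 1)  # slopes
        b = (self.init_b + theta[:, self.k:]).view(-1, self.k, 1, 1, 1)  # intercepts
        # activation = max over the k linear functions a_i * x + b_i
        return (a * x.unsqueeze(1) + b).max(dim=1).values
```

Such a block would typically replace a fixed ReLU inside the convolutional refinement module, letting the activation shape follow the pooled feature statistics.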
PMID:40739309 | DOI:10.1038/s41598-025-12117-0
Ensemble of deep learning and IoT technologies for improved safety in smart indoor activity monitoring for visually impaired individuals
Sci Rep. 2025 Jul 30;15(1):27863. doi: 10.1038/s41598-025-09716-2.
ABSTRACT
Indoor activity monitoring for older adults and visually impaired individuals uses sensor technology to observe movement and interaction in the living area. Such a system can recognize deviations from regular patterns, deliver alerts, and ensure safety in case of dangers or latent risks. These solutions improve quality of life by promoting independence while providing peace of mind to loved ones and caregivers. Visual impairment challenges daily independence, and deep learning (DL)-based Human Activity Recognition (HAR) enhances safe, real-time task performance for the visually impaired. For individuals with visual impairments, it enhances independence and safety in daily tasks while supporting caregivers with timely alerts and monitoring. This paper develops an Ensemble of Deep Learning for Enhanced Safety in Smart Indoor Activity Monitoring (EDLES-SIAM) technique for visually impaired people. The EDLES-SIAM technique is primarily designed to enhance indoor activity monitoring using IoT technologies, ensuring the safety of visually impaired people. Initially, the proposed EDLES-SIAM technique performs image pre-processing using adaptive bilateral filtering (ABF) to reduce noise and enhance sensor data quality. Furthermore, the ResNet50 model is employed for feature extraction to capture complex spatial patterns in visual data. For detecting indoor activities, an ensemble DL classifier comprises three approaches: deep neural network (DNN), bidirectional long short-term memory (BiLSTM), and sparse stacked autoencoder (SSAE). A wide range of simulation analyses is implemented to ensure the enhanced performance of the EDLES-SIAM method on the fall detection dataset. The performance validation of the EDLES-SIAM method portrayed superior results of 99.25%, 98.00%, 98.53%, and 98.23% across dissimilar evaluation measures, outperforming existing techniques.
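The exact ensemble design is not specified beyond the three classifier types, so the following PyTorch sketch only illustrates the general idea of soft-voting over a DNN head, a BiLSTM head, and an autoencoder-style head fed by ResNet50 features; class counts and layer sizes are placeholders:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ActivityEnsemble(nn.Module):
    """Illustrative ensemble: ResNet50 features fed to three heads whose
    softmax probabilities are averaged (soft voting)."""
    def __init__(self, num_classes=5, feat_dim=2048):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 2048, 1, 1)
        self.dnn = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, num_classes))
        self.bilstm = nn.LSTM(feat_dim, 128, batch_first=True, bidirectional=True)
        self.bilstm_out = nn.Linear(256, num_classes)
        self.sae = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, num_classes))

    def forward(self, x):                       # x: (B, 3, H, W)
        f = self.features(x).flatten(1)         # (B, 2048)
        p1 = self.dnn(f).softmax(-1)
        h, _ = self.bilstm(f.unsqueeze(1))      # treat the feature vector as a 1-step sequence
        p2 = self.bilstm_out(h[:, -1]).softmax(-1)
        p3 = self.sae(f).softmax(-1)            # simplified stand-in for the SSAE branch
        return (p1 + p2 + p3) / 3.0             # soft-voting ensemble output
```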
PMID:40739295 | DOI:10.1038/s41598-025-09716-2
Deep learning for tooth detection and segmentation in panoramic radiographs: a systematic review and meta-analysis
BMC Oral Health. 2025 Jul 30;25(1):1280. doi: 10.1186/s12903-025-06349-9.
ABSTRACT
BACKGROUND: This systematic review and meta-analysis aimed to summarize and evaluate the available information regarding the performance of deep learning methods for tooth detection and segmentation in orthopantomographies.
MATERIAL AND METHODS: Electronic databases (Medline, Embase and Cochrane) were searched up to September 2023 for relevant observational studies and both randomized and controlled clinical trials. Two reviewers independently conducted the study selection, data extraction, and quality assessments. GRADE (Grading of Recommendations, Assessment, Development, and Evaluation) assessment was adopted for collective grading of the overall body of evidence. From the 2,207 records identified, 20 studies were included in the analysis. Meta-analysis was conducted for the comparison of mesiodens detection and segmentation (n = 6) using sensitivity and specificity as the two main diagnostic parameters. A graphical summary of the analysis was also plotted and a Hierarchical Summary Receiver Operating Characteristic curve, prediction region, summary point, and confidence region were illustrated.
RESULTS: Quantitative analysis of the included studies showed pooled sensitivity, specificity, positive LR, negative LR, and diagnostic odds ratio of 0.92 (95% confidence interval [CI], 0.84-0.96), 0.94 (95% CI, 0.89-0.97), 15.7 (95% CI, 7.6-32.2), 0.08 (95% CI, 0.04-0.18), and 186 (95% CI, 44-793), respectively. A graphical summary of the meta-analysis was plotted based on sensitivity and specificity. Hierarchical Summary Receiver Operating Characteristic curves showed a positive correlation between logit-transformed sensitivity and specificity (r = 0.886).
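As a quick consistency check, the likelihood ratios and diagnostic odds ratio follow directly from the pooled sensitivity and specificity; the snippet below reproduces values close to those reported (small differences are expected because the pooled estimates are rounded):

```python
sens, spec = 0.92, 0.94
lr_pos = sens / (1 - spec)    # positive likelihood ratio ~ 15.3 (reported: 15.7)
lr_neg = (1 - sens) / spec    # negative likelihood ratio ~ 0.085 (reported: 0.08)
dor = lr_pos / lr_neg         # diagnostic odds ratio ~ 180 (reported: 186)
print(lr_pos, lr_neg, dor)
```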
CONCLUSIONS: Based on the results of the meta-analysis and GRADE assessment, a moderate recommendation is advised to dental operators when relying on AI-based tools for tooth detection and segmentation in panoramic radiographs.
PMID:40739210 | DOI:10.1186/s12903-025-06349-9
A privacy preserving machine learning framework for medical image analysis using quantized fully connected neural networks with TFHE based inference
Sci Rep. 2025 Jul 30;15(1):27880. doi: 10.1038/s41598-025-07622-1.
ABSTRACT
Medical image analysis using deep learning algorithms has become a cornerstone of modern healthcare, enabling early detection, diagnosis, treatment planning, and disease monitoring. However, sharing sensitive raw medical data with third parties for analysis raises significant privacy concerns. This paper presents a privacy-preserving machine learning (PPML) framework using a Fully Connected Neural Network (FCNN) for secure medical image analysis on the MedMNIST dataset. The proposed PPML framework leverages torus-based fully homomorphic encryption (TFHE) to ensure data privacy during inference, maintain patient confidentiality, and ensure compliance with privacy regulations. The FCNN model is trained in a plaintext environment and made FHE-compatible using Quantization-Aware Training to optimize weights and activations. The quantized FCNN model is then validated under FHE constraints through simulation and compiled into an FHE-compatible circuit for encrypted inference on sensitive data. The proposed framework is evaluated on the MedMNIST datasets to assess its accuracy and inference time in both plaintext and encrypted environments. Experimental results reveal that the PPML framework achieves a prediction accuracy of 88.2% in the plaintext setting and 87.5% during encrypted inference, with an average inference time of 150 milliseconds per image. This shows that FCNN models paired with TFHE-based encryption achieve high prediction accuracy on MedMNIST datasets with minimal performance degradation compared to unencrypted inference.
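The abstract does not name a specific FHE library, so the sketch below only illustrates the quantization side: a toy fully connected network whose weights and activations are reduced to low-bit integers, which is the property that makes TFHE-style encrypted inference practical. All layer sizes and bit widths are placeholders for the MedMNIST setting.

```python
import numpy as np

def quantize(x, bits=4):
    """Uniform symmetric quantization to signed integers (a stand-in for QAT)."""
    qmax = 2 ** (bits - 1) - 1
    scale = (np.max(np.abs(x)) / qmax) or 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int64)
    return q, scale

rng = np.random.default_rng(0)
# Toy FCNN: 28*28 flattened input -> 64 hidden units -> 9 MedMNIST-style classes
W1 = rng.normal(size=(784, 64))
W2 = rng.normal(size=(64, 9))
x = rng.random(784)

# Integer-only forward pass: the kind of circuit TFHE evaluates under encryption
qx, sx = quantize(x)
qW1, sW1 = quantize(W1)
qW2, sW2 = quantize(W2)
h = np.maximum(qx @ qW1, 0)            # integer matmul + ReLU
qh, sh = quantize(h * (sx * sW1))      # rescale to real values, then re-quantize
logits = (qh @ qW2) * (sh * sW2)
print("predicted class:", int(logits.argmax()))
```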
PMID:40739149 | DOI:10.1038/s41598-025-07622-1
Refined prognostication of pathological complete response in breast cancer using radiomic features and optimized InceptionV3 with DCE-MRI
Sci Rep. 2025 Jul 30;15(1):27844. doi: 10.1038/s41598-025-08565-3.
ABSTRACT
BACKGROUND: Neoadjuvant therapy plays a pivotal role in breast cancer treatment, particularly for patients aiming to conserve their breast by reducing tumor size pre-surgery. The ultimate goal of this treatment is achieving a pathologic complete response (pCR), which signifies the complete eradication of cancer cells, thereby lowering the likelihood of recurrence. This study introduces a novel predictive approach to identify patients likely to achieve pCR using radiomic features extracted from MR images, enhanced by the InceptionV3 model and cutting-edge validation methodologies.
METHODS: In our study, we gathered data from 255 unique patient IDs sourced from the I-SPY 2 MRI database with the goal of classifying pCR (pathological complete response). Our research introduced two key areas of novelty. Firstly, we explored the extraction of advanced features from the DICOM series, such as area, perimeter, entropy, and the intensity of regions brighter than the average intensity of the image. These features provided deeper insights into the characteristics of the MRI data and enhanced the discriminative power of our classification model. Secondly, we applied these extracted features, along with the combined pixel array of each patient's DICOM series, to numerous deep learning models, among which the InceptionV3 (GoogLeNet) model provided the best accuracy. To optimize the model's performance, we experimented with different combinations of loss functions, optimizers, and activation functions. Lastly, our classification results were validated using accuracy, AUC, sensitivity, specificity, and F1-score. These evaluation metrics provided a robust assessment of the model's performance and ensured the reliability of our findings.
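The feature names above (area, perimeter, entropy, above-average intensity) suggest region-level descriptors; a minimal, purely illustrative sketch of computing them from one DICOM slice with pydicom and scikit-image (the mean-intensity threshold and "largest region" choice are assumptions) could be:

```python
import pydicom
from skimage.measure import label, regionprops, shannon_entropy

def slice_features(path):
    """Illustrative radiomic-style features from one DICOM slice."""
    img = pydicom.dcmread(path).pixel_array.astype(float)
    mask = img > img.mean()                         # "intensity above average" region
    props = regionprops(label(mask))
    largest = max(props, key=lambda p: p.area)      # keep the largest connected region
    return {
        "area": largest.area,
        "perimeter": largest.perimeter,
        "entropy": shannon_entropy(img),
        "mean_intensity_above_avg": img[mask].mean(),
    }
```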
RESULTS: The successful combination of advanced feature extraction, utilization of the InceptionV3 model with tailored hyperparameters, and thorough validation using cutting-edge techniques significantly enhanced the accuracy and reliability of our pCR classification study. By adopting a collaborative approach that involved both radiologists and the computer-aided system, we achieved superior predictive performance for pCR, as evidenced by an area under the curve (AUC) of 0.91 and an accuracy of 0.92.
CONCLUSION: Overall, the combination of advanced feature extraction, leveraging the InceptionV3 model with customized hyperparameters, and rigorous validation using state-of-the-art techniques contributed to the accuracy and credibility of our pCR classification study.
PMID:40739101 | DOI:10.1038/s41598-025-08565-3
Classification of Brain Tumors in MRI Images with Brain-CNXSAMNet: Integrating Hybrid ConvNeXt and Spatial Attention Module Networks
Interdiscip Sci. 2025 Jul 30. doi: 10.1007/s12539-025-00743-1. Online ahead of print.
ABSTRACT
Brain tumors (BT) can cause fatal outcomes by affecting body functions, making precise early detection via magnetic resonance imaging (MRI) examinations critical. The complex variations found in BT cells may pose challenges in identifying the tumor type and selecting the most suitable treatment strategy, potentially resulting in different assessments by doctors. As a result, in recent years, AI-powered diagnostic systems have been created to accurately and efficiently identify different types of BT using MRI images. Notably, state-of-the-art deep learning architectures, which have demonstrated efficacy in diverse domains, are now being employed effectively for classifying brain MRI images. This research presents a hybrid model that integrates a spatial attention mechanism (SAM) with ConvNeXt to classify three types of BT: meningioma, pituitary, and glioma. The hybrid model uses ConvNeXt to enhance the receptive field, capturing information from a broader spatial context, which is crucial for recognizing tumor patterns spanning multiple pixels. SAM is applied after ConvNeXt, enabling the network to selectively focus on informative regions, thereby improving the model's ability to distinguish BT types and capture complex spatial relationships. Tested on the BSF and Figshare datasets, the proposed model achieves remarkable accuracies of 99.39% and 98.86%, respectively, outperforming recent studies while requiring fewer training epochs. This hybrid model marks a major step forward in the automatic classification of BT, demonstrating superior accuracy with efficient training.
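The abstract does not detail the SAM design; a common formulation (as in CBAM) pools across channels and learns a spatial weighting map, which could be appended to ConvNeXt features roughly as follows. The module name and kernel size are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: pool over channels, convolve, and re-weight."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                              # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)              # (B, 1, H, W) average pool
        mx, _ = x.max(dim=1, keepdim=True)             # (B, 1, H, W) max pool
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                                # spatially re-weighted features

# Usage sketch: feats = convnext_backbone(images); feats = SpatialAttention()(feats)
```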
PMID:40739060 | DOI:10.1007/s12539-025-00743-1
Trabecular bone analysis: ultra-high-resolution CT goes far beyond high-resolution CT and gets closer to micro-CT (a study using Canon Medical CT devices)
Skeletal Radiol. 2025 Jul 30. doi: 10.1007/s00256-025-05001-5. Online ahead of print.
ABSTRACT
OBJECTIVE: High-resolution CT (HR-CT) cannot image trabecular bone due to insufficient spatial resolution. Ultra-high-resolution CT may be a valuable alternative. We aimed to describe the accuracy of Canon Medical HR, super-high-resolution (SHR), and ultra-high-resolution (UHR)-CT in measuring trabecular bone microarchitectural parameters using micro-CT as a reference.
MATERIAL AND METHODS: Sixteen cadaveric distal tibial epiphyses were enrolled in this pre-clinical study. Images were acquired with HR-CT (i.e., 0.5 mm slice thickness/512² matrix) and SHR-CT (i.e., 0.25 mm slice thickness and 1024² matrix) with and without deep learning reconstruction (DLR), and UHR-CT (i.e., 0.25 mm slice thickness/2048² matrix) without DLR. Trabecular bone parameters were compared.
RESULTS: Trabecular thickness was closest with UHR-CT but remained 1.37 times that of micro-CT (P < 0.001). With SHR-CT without and with DLR, it was 1.75 and 1.79 times that of micro-CT, respectively (P < 0.001), and 3.58 and 3.68 times that of micro-CT with HR-CT without and with DLR, respectively (P < 0.001). Trabecular separation was 0.7 times that of micro-CT with UHR-CT (P < 0.001), 0.93 and 0.94 times that of micro-CT with SHR-CT without and with DLR (P = 0.36 and 0.79, respectively), and 1.52 and 1.36 times that of micro-CT with HR-CT without and with DLR (P < 0.001). Bone volume/total volume was overestimated (i.e., 1.66 to 1.92 times that of micro-CT) by all techniques (P < 0.001). However, HR-CT values were superior to UHR-CT values (P = 0.03 and 0.01, without and with DLR, respectively).
CONCLUSION: UHR and SHR-CT were the closest techniques to micro-CT and surpassed HR-CT.
PMID:40738977 | DOI:10.1007/s00256-025-05001-5
Compressive strength modelling of cenosphere and copper slag-based geopolymer concrete using deep learning model
Sci Rep. 2025 Jul 30;15(1):27849. doi: 10.1038/s41598-025-13176-z.
ABSTRACT
Geopolymer concrete (GPC) is an eco-friendly alternative to conventional concrete. It exploits industrial by-products in production to reduce environmental impact and improve sustainability. This study focuses on predicting the 28-day compressive strength of cenosphere-based geopolymer concrete incorporating copper slag using Artificial Neural Networks (ANN). The use of ANN models to predict the compressive strength of cenosphere-based geopolymer concrete with copper slag offers a promising approach to sustainable construction. By accurately forecasting the compressive strength of concrete from the ingredient proportions, these models can rationalise the design process. The test results indicated that the developed model provides high accuracy (> 98.6%), capability, and flexibility in predicting the compressive strength of geopolymer concrete incorporating cenosphere and copper slag.
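Mapping mix proportions to 28-day strength is a standard regression task; a minimal scikit-learn sketch of the idea (the feature columns, network size, and data are placeholders, not the paper's actual mix design) might look like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X columns (illustrative): cenosphere, copper slag, binder, activator, water ratio
X = np.random.rand(200, 5)                 # placeholder mix proportions
y = np.random.rand(200) * 60               # placeholder 28-day strengths (MPa)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
model.fit(X_tr, y_tr)
print("R^2 on held-out mixes:", model.score(X_te, y_te))
```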
PMID:40738956 | DOI:10.1038/s41598-025-13176-z
A hybrid deep learning model for sentiment analysis of COVID-19 tweets with class balancing
Sci Rep. 2025 Jul 30;15(1):27788. doi: 10.1038/s41598-025-97778-7.
ABSTRACT
The widespread dissemination of misinformation and the diverse public sentiment observed during the COVID-19 pandemic highlight the necessity for accurate sentiment analysis of social media discourse. This study proposes a hybrid deep learning (DL) model that integrates Bidirectional Encoder Representations from Transformers (BERT) for contextual feature extraction with Long Short-Term Memory (LSTM) networks for sequential learning to classify COVID-19-related sentiments. To enhance data quality, advanced text preprocessing techniques, including Unicode normalization, contraction expansion, and emoji conversion, are applied. Additionally, to mitigate class imbalance, Random OverSampling (ROS) is employed, leading to significant improvements in model performance. Before applying ROS, the model exhibited lower accuracy and inconsistent performance across sentiment categories. After balancing the dataset, accuracy for binary classification increased to 92.10%, with corresponding precision, sensitivity, and specificity of 92.10%, 92.10%, and 91.50%, respectively. For three-class sentiment classification, accuracy improved to 89.47%, with precision, sensitivity, and specificity of 89.80%, 89.47%, and 94.10%, respectively. In five-class sentiment classification, accuracy reached 81.78%, with precision, sensitivity, and specificity of 82.19%, 81.78%, and 95.28%, respectively. These findings demonstrate the efficacy of combining deep learning-based sentiment analysis with advanced text preprocessing and class balancing techniques for accurately classifying public sentiment related to COVID-19 across multiple sentiment categories.
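The BERT-plus-LSTM hybrid described above can be outlined compactly in PyTorch; the sketch below is a generic illustration (model checkpoint, hidden sizes, and class count are assumptions), feeding BERT token embeddings into a bidirectional LSTM whose final states drive the classifier:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertLstmClassifier(nn.Module):
    """BERT token embeddings fed to a BiLSTM head for sentiment classification."""
    def __init__(self, num_classes=3, hidden=128):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        tokens = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        _, (h, _) = self.lstm(tokens)                       # h: (2, B, hidden)
        return self.head(torch.cat([h[0], h[1]], dim=-1))   # class logits

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["vaccines roll out today", "cases rising again"],
            return_tensors="pt", padding=True, truncation=True)
logits = BertLstmClassifier()(batch["input_ids"], batch["attention_mask"])
```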
PMID:40738947 | DOI:10.1038/s41598-025-97778-7
Optimizing Thyroid Nodule Management With Artificial Intelligence: Multicenter Retrospective Study on Reducing Unnecessary Fine Needle Aspirations
JMIR Med Inform. 2025 Jul 30;13:e71740. doi: 10.2196/71740.
ABSTRACT
BACKGROUND: Most artificial intelligence (AI) models for thyroid nodules are designed to screen for malignancy to guide further interventions; however, these models have not yet been fully implemented in clinical practice.
OBJECTIVE: This study aimed to evaluate AI in real clinical settings for identifying potentially benign thyroid nodules initially deemed to be at risk for malignancy by radiologists, reducing unnecessary fine needle aspiration (FNA) and optimizing management.
METHODS: We retrospectively collected a validation cohort of thyroid nodules that had undergone FNA. These nodules were initially assessed as "suspicious for malignancy" by radiologists based on ultrasound features, following standard clinical practice, which prompted further FNA procedures. Ultrasound images of these nodules were re-evaluated using a deep learning-based AI system, and its diagnostic performance was assessed in terms of correct identification of benign nodules and misclassification of malignant nodules. Performance metrics such as sensitivity, specificity, and the area under the receiver operating characteristic curve were calculated. In addition, a separate comparison cohort was retrospectively assembled to compare the AI system's ability to correctly identify benign thyroid nodules with that of radiologists.
RESULTS: The validation cohort comprised 4572 thyroid nodules (benign: n=3134, 68.5%; malignant: n=1438, 31.5%). AI correctly identified 2719 (86.8% among benign nodules) and reduced unnecessary FNAs from 68.5% (3134/4572) to 9.1% (415/4572). However, 123 malignant nodules (8.6% of malignant cases) were mistakenly identified as benign, with the majority of these being of low or intermediate suspicion. In the comparison cohort, AI successfully identified 81.4% (96/118) of benign nodules. It outperformed junior and senior radiologists, who identified only 40% and 55%, respectively. The area under the curve (AUC) for the AI model was 0.88 (95% CI 0.85-0.91), demonstrating a superior AUC compared with that of the junior radiologists (AUC=0.43, 95% CI 0.36-0.50; P=.002) and senior radiologists (AUC=0.63, 95% CI 0.55-0.70; P=.003).
CONCLUSIONS: Compared with radiologists, AI can better serve as a "goalkeeper" in reducing unnecessary FNAs by identifying benign nodules that are initially assessed as malignant by radiologists. However, active surveillance is still necessary for all these nodules since a very small number of low-aggressiveness malignant nodules may be mistakenly identified.
PMID:40737551 | DOI:10.2196/71740
"Digital Clinicians" Performing Obesity Medication Self-Injection Education: Feasibility Randomized Controlled Trial
JMIR Diabetes. 2025 Jul 30;10:e63503. doi: 10.2196/63503.
ABSTRACT
BACKGROUND: Artificial intelligence (AI) chatbots have shown competency in a range of areas, including clinical note taking, diagnosis, research, and emotional support. An obesity epidemic, alongside a growth in novel injectable pharmacological solutions, has put a strain on limited resources.
OBJECTIVE: This study aimed to investigate the use of a chatbot integrated with a digital avatar to create a "digital clinician." This was used to provide mandatory patient education for those beginning semaglutide once-weekly self-administered injections for the treatment of overweight and obesity at a national center.
METHODS: A "digital clinician" with facial and vocal recognition technology was generated with a bespoke 10- to 15-minute clinician-validated tutorial. A feasibility randomized controlled noninferiority trial compared knowledge test scores, self-efficacy, consultation satisfaction, and trust levels between those using the AI-powered clinician avatar onsite and those receiving conventional semaglutide education from nursing staff. Attitudes were recorded immediately after the intervention and again at 2 weeks after the education session.
RESULTS: A total of 43 participants were recruited, 27 to the intervention group and 16 to the control group. Patients in the "digital clinician" group were significantly more knowledgeable postconsultation (median 10, IQR 10-11 vs median 8, IQR 7-9.3; P<.001). Patients in the control group were more satisfied with their consultation (median 7, IQR 6-7 vs median 7, IQR 7-7; P<.001) and had more trust in their education provider (median 7, IQR 4.8-7 vs median 7, IQR 7-7; P<.001). There was no significant difference in reported levels of self-efficacy (P=.57). 81% (22/27) of participants in the intervention group said they would use the resource in their own time.
CONCLUSIONS: Bespoke AI chatbots integrated with digital avatars to create a "digital clinician" may perform health care education in a clinical environment. They can ensure higher levels of knowledge transfer yet are not as trusted as their human counterparts. "Digital clinicians" may have the potential to aid the redistribution of resources, alleviating pressure on bariatric services and health care systems, the extent of which remains to be determined in future studies.
PMID:40737494 | DOI:10.2196/63503
Machine learning approaches for predicting the link of the global trade network of liquefied natural gas
PLoS One. 2025 Jul 30;20(7):e0326952. doi: 10.1371/journal.pone.0326952. eCollection 2025.
ABSTRACT
With rising geopolitical tensions, predicting future trade partners has become a critical topic for the global community. Liquefied natural gas (LNG), recognized as the cleanest burning hydrocarbon, plays a significant role in the transition to a cleaner energy future. As international trade in LNG becomes increasingly volatile, it is essential to assist governments in identifying potential trade partners and analyzing the trade network. Traditionally, forecasts of future mineral and energy resource trade networks have relied on similarity indicators (e.g., CN, AA). This study employs complex network theory to illustrate the characteristics of nodes and edges, as well as the evolution of global LNG trade networks from 2001 to 2020. Utilizing node and edge data from these networks, this research applies machine learning algorithms to predict future links based on local and global similarity-based indices (e.g., CN, JA, PA). The findings indicate that random forest and decision tree algorithms, when used with local similarity-based indices, demonstrate strong predictive performance. The reliability of these algorithms is validated through the Receiver Operating Characteristic (ROC) curve. Additionally, a graph attention network model is developed to predict potential links using edge and motif data, and it likewise shows robust predictive performance. This study demonstrates that machine learning algorithms, specifically random forest and decision tree, perform best at predicting links within the global LNG trade network from local similarity information, while the graph attention network, a deep learning model, exhibits stable optimization and effective feature learning. These findings suggest that machine learning approaches hold significant promise for mineral trade network analysis.
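The similarity indices named above (common neighbors, Jaccard, preferential attachment) are readily computed with NetworkX and can serve as features for a link classifier; the toy sketch below uses a placeholder graph rather than the actual LNG trade network:

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pair_features(G, pairs):
    """CN, Jaccard, and preferential-attachment scores for candidate node pairs."""
    feats = []
    for u, v in pairs:
        cn = len(list(nx.common_neighbors(G, u, v)))
        ja = next(nx.jaccard_coefficient(G, [(u, v)]))[2]
        pa = next(nx.preferential_attachment(G, [(u, v)]))[2]
        feats.append([cn, ja, pa])
    return np.array(feats)

# Toy illustration: label existing edges as positives, non-edges as negatives
G = nx.karate_club_graph()                  # placeholder for an annual LNG trade graph
pos = list(G.edges())[:20]
neg = list(nx.non_edges(G))[:20]
X = pair_features(G, pos + neg)
y = np.array([1] * len(pos) + [0] * len(neg))
clf = RandomForestClassifier(random_state=0).fit(X, y)
```

In a real setting one would train on pairs from an earlier year's network and evaluate on links that appear later, rather than on the same snapshot as in this toy example.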
PMID:40737339 | DOI:10.1371/journal.pone.0326952
Low-cost computation for isolated sign language video recognition with multiple reservoir computing
PLoS One. 2025 Jul 30;20(7):e0322717. doi: 10.1371/journal.pone.0322717. eCollection 2025.
ABSTRACT
Sign language recognition (SLR) has the potential to bridge communication gaps and empower hearing-impaired communities. To ensure the portability and accessibility of the SLR system, its implementation on a portable, server-independent device becomes imperative. This approach facilitates usage in areas without internet connectivity, addressing the need for data privacy protection. Although deep neural network models are potent, their efficacy is hindered by computational constraints on edge devices. This study delves into reservoir computing (RC), which is renowned for its edge-friendly characteristics. Through leveraging RC, our objective is to craft a cost-effective SLR system optimized for operation on edge devices with limited resources. To enhance the recognition capabilities of RC, we introduce multiple reservoirs with distinct leak rates, extracting diverse features from input videos. Prior to feeding sign language videos into the RC, we employ preprocessing via MediaPipe. This step involves extracting the coordinates of the signer's body and hand locations, referred to as keypoints, and normalizing their spatial positions. This combined approach, which incorporates keypoint extraction via MediaPipe and normalization during preprocessing, enhances the SLR system's robustness against complex background effects and varying signer positions. Experimental results demonstrate that the integration of MediaPipe and multiple reservoirs yields competitive outcomes compared with deep recurrent neural and echo state networks and promises significantly lower training times. Our proposed MRC achieved accuracies of 60.35%, 84.65%, and 91.51% for top-1, top-5, and top-10, respectively, on the WLASL100 dataset, outperforming the deep learning-based approaches Pose-TGCN and Pose-GRU. Furthermore, because of the RC characteristics, the training time was shortened to 52.7 s, compared with 20 h for I3D, while maintaining competitive inference time.
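The core of the approach is a leaky echo-state update applied to keypoint sequences, with several reservoirs run in parallel at different leak rates; a NumPy sketch of that idea (reservoir sizes, spectral radius, and leak rates are illustrative, not the paper's settings) is shown below:

```python
import numpy as np

class LeakyReservoir:
    """Echo-state reservoir with a leak rate; states are driven by keypoint vectors."""
    def __init__(self, n_in, n_res=300, leak=0.3, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-1, 1, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        self.W = W * (rho / np.max(np.abs(np.linalg.eigvals(W))))  # fix spectral radius
        self.leak = leak

    def run(self, seq):                          # seq: (T, n_in) normalized keypoints
        x = np.zeros(self.W.shape[0])
        for u in seq:
            pre = np.tanh(self.W_in @ u + self.W @ x)
            x = (1 - self.leak) * x + self.leak * pre
        return x                                 # final state summarizes the video

# Multiple reservoirs with distinct leak rates -> concatenated features for a linear readout
reservoirs = [LeakyReservoir(n_in=150, leak=l, seed=i) for i, l in enumerate([0.1, 0.3, 0.7])]
video = np.random.rand(60, 150)                  # 60 frames x 150 keypoint coordinates (toy)
features = np.concatenate([r.run(video) for r in reservoirs])
```

Only the linear readout on top of such concatenated states needs training, which is what keeps training time in the seconds range on edge hardware.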
PMID:40737309 | DOI:10.1371/journal.pone.0322717
Divide-and-conquer routing for learning heterogeneous individualized capsules
PLoS One. 2025 Jul 30;20(7):e0329202. doi: 10.1371/journal.pone.0329202. eCollection 2025.
ABSTRACT
Capsule Networks (CapsNets) have demonstrated an enhanced ability to capture spatial relationships and preserve hierarchical feature representations compared to Convolutional Neural Networks (CNNs). However, the dynamic routing mechanism in CapsNets introduces substantial computational costs and limits scalability. In this paper, we propose a divide-and-conquer routing algorithm that groups primary capsules, enabling the model to leverage independent feature subspaces for more precise and efficient feature learning. By partitioning the primary capsules, the initialization of coupling coefficients is aligned with the hierarchical structure of the capsules, addressing the limitations of existing initialization strategies that either disrupt feature aggregation or lead to excessively small activation values. Additionally, the grouped routing mechanism simplifies the iterative process, reducing computational overhead and improving scalability. Extensive experiments on benchmark image classification datasets demonstrate that our approach consistently outperforms the original dynamic routing algorithm as well as other state-of-the-art routing strategies, resulting in improved feature learning and classification accuracy. Our code is available at: https://github.com/rqfzpy/DC-CapsNet.
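The abstract does not specify the grouping or initialization scheme, so the sketch below only shows the standard squash-and-agreement routing loop run independently inside each group of primary capsules, as a rough illustration of the divide-and-conquer idea:

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    n2 = (s ** 2).sum(dim, keepdim=True)
    return (n2 / (1 + n2)) * s / torch.sqrt(n2 + eps)

def grouped_dynamic_routing(u_hat, groups=4, iters=3):
    """u_hat: (B, n_primary, n_out, d) prediction vectors.
    Routing runs independently inside each group of primary capsules, and the
    group outputs are averaged; this is only a sketch of the grouping idea."""
    B, n_primary, n_out, d = u_hat.shape
    u_hat = u_hat.view(B, groups, n_primary // groups, n_out, d)
    b = torch.zeros(B, groups, n_primary // groups, n_out, device=u_hat.device)
    for _ in range(iters):
        c = b.softmax(dim=-1).unsqueeze(-1)          # coupling coefficients
        v = squash((c * u_hat).sum(dim=2))           # (B, groups, n_out, d)
        b = b + (u_hat * v.unsqueeze(2)).sum(-1)     # agreement update
    return v.mean(dim=1)                             # (B, n_out, d)
```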
PMID:40737290 | DOI:10.1371/journal.pone.0329202
Investigating the impact of social media images on users' sentiments towards sociopolitical events based on deep artificial intelligence
PLoS One. 2025 Jul 30;20(7):e0326936. doi: 10.1371/journal.pone.0326936. eCollection 2025.
ABSTRACT
This paper presents the findings of research aimed at investigating the influence of visual content posted on social media in shaping users' sentiments towards specific sociopolitical events. The study analyzed various sociopolitical topics by examining posts containing relevant hashtags and keywords, along with their associated images and comments. Using advanced machine learning and deep learning methods for sentiment analysis, textual data were classified to determine the expressed sentiments. Additionally, the correlation between posted visual content and user sentiments was studied. A particular emphasis was placed on understanding how these visuals impact users' attitudes toward the events. The research resulted in a comprehensive dataset comprising labeled images and their comments, offering valuable insights into the dynamics of public opinion formation through social media. This study investigates the influence of social media images on user sentiment toward sociopolitical events using deep learning-based sentiment analysis. By analyzing posts from movements such as Black Lives Matter, Women's March, Climate Change Protests, and Anti-war Demonstrations, we identified a strong correlation between visual content and public sentiment. Our results reveal that Anti-war Demonstrations exhibit the highest correlation (PLCC: 0.709, SROCC: 0.723), while Climate Change Protests display the lowest alignment (PLCC: 0.531, SROCC: 0.611). Overall, the study finds a consistent positive correlation (PLCC range: 0.615-0.709, SROCC: 0.611-0.723) across movements, indicating the significant role of visual content in shaping public opinion.
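The reported PLCC and SROCC values are simply Pearson and Spearman correlations between image-derived and comment-derived sentiment scores; assuming per-post scores from the two modalities, they can be computed as in this short SciPy snippet (the scores here are synthetic):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Toy sentiment scores per post: one from the image model, one from the comments
image_scores = np.random.rand(200)
comment_scores = 0.7 * image_scores + 0.3 * np.random.rand(200)

plcc, _ = pearsonr(image_scores, comment_scores)    # linear correlation (PLCC)
srocc, _ = spearmanr(image_scores, comment_scores)  # rank correlation (SROCC)
print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")
```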
PMID:40737276 | DOI:10.1371/journal.pone.0326936
Contour Flow Constraint: Preserving Global Shape Similarity for Deep Learning based Image Segmentation
IEEE Trans Image Process. 2025 Jul 30;PP. doi: 10.1109/TIP.2025.3592545. Online ahead of print.
ABSTRACT
For effective image segmentation, it is crucial to employ constraints informed by prior knowledge about the characteristics of the areas to be segmented to yield favorable segmentation outcomes. However, existing methods have primarily focused on priors of specific properties or shapes, lacking consideration of general global shape similarity from a contour-flow perspective. Furthermore, naturally integrating such a contour-flow prior into image segmentation models and the activation functions of deep convolutional networks through mathematical methods has remained unexplored. In this paper, we establish a concept of global shape similarity based on the premise that two shapes exhibit comparable contours. Furthermore, we mathematically derive a contour flow constraint that ensures the preservation of global shape similarity. We propose two implementations to integrate the constraint with deep neural networks. Firstly, the constraint is converted to a shape loss, which can be seamlessly incorporated into the training phase for any learning-based segmentation framework. Secondly, we add the constraint into a variational segmentation model and derive its iterative solution schemes. The scheme is then unrolled to obtain the architecture of the proposed CFSSnet. Validation experiments on diverse datasets are conducted with classic benchmark deep network segmentation models. The results indicate a substantial improvement in segmentation accuracy and shape similarity for the proposed shape loss, showcasing the general adaptability of the proposed loss term regardless of specific network architectures. CFSSnet shows robustness in segmenting noise-contaminated images and an inherent capability to preserve global shape similarity.
PMID:40737153 | DOI:10.1109/TIP.2025.3592545
Dynamic Personalized Federated Learning for Cross-spectral Palmprint Recognition
IEEE Trans Image Process. 2025 Jul 30;PP. doi: 10.1109/TIP.2025.3592508. Online ahead of print.
ABSTRACT
Palmprint recognition has recently garnered attention due to its high accuracy, strong robustness, and high security. Existing deep learning-based palmprint recognition methods usually require large amounts of data for centralized training, facing the challenge of privacy disclosure. In addition, the non-independent and identically distributed (non-IID) nature of multi-spectral palmprint images generally leads to degraded recognition performance. To tackle these problems, this paper proposes a dynamic personalized federated learning model for cross-spectral palmprint recognition, called DPFed-Palm. Specifically, for each client's local training, we present a new combination of loss functions to enforce the constraints of local models and effectively enhance their feature representation capability. Subsequently, DPFed-Palm aggregates the trained local models using a combination of Federated Averaging (FedAvg) and Personalized Federated Learning (PFL) strategies to obtain the best personalized global model for each client. For the selection of the best personalized global model, we develop a dynamic weight selection strategy that obtains the optimal weights of the local and global models by cross-spectral (cross-client) testing. Extensive experimental results on the three public PolyU multispectral, IITD, and CASIA datasets show that the proposed method outperforms existing techniques in both privacy preservation and recognition performance.
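The exact aggregation scheme is not detailed in the abstract; a minimal sketch of the two ingredients it names, FedAvg aggregation plus a per-client local/global interpolation weight selected by cross-client validation, could look like this (the alpha grid and scoring hook are assumptions):

```python
import copy
import torch

def fedavg(states, sizes):
    """Weighted average of client state_dicts (FedAvg); assumes float parameters."""
    total = sum(sizes)
    avg = copy.deepcopy(states[0])
    for k in avg:
        avg[k] = sum(s[k] * (n / total) for s, n in zip(states, sizes))
    return avg

def personalize(local_state, global_state, alphas, score_fn):
    """Pick the local/global mixing weight that maximizes cross-spectral accuracy.
    score_fn evaluates a candidate state_dict on the other clients' validation data."""
    best, best_score = None, -1.0
    for a in alphas:                              # e.g. alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
        mixed = {k: a * local_state[k] + (1 - a) * global_state[k] for k in local_state}
        score = score_fn(mixed)
        if score > best_score:
            best, best_score = mixed, score
    return best
```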
PMID:40737152 | DOI:10.1109/TIP.2025.3592508
Incremental Learning-Enabled Fault Diagnosis of Dynamic Systems: A Comprehensive Review
IEEE Trans Cybern. 2025 Jul 30;PP. doi: 10.1109/TCYB.2025.3586643. Online ahead of print.
ABSTRACT
Effective fault diagnosis is crucial for maintaining the reliability and safety of industrial systems. Incremental learning, which enables models to continuously update and adapt to new data or emerging fault classes without complete retraining, has recently gained attention as a promising solution for addressing nonstationary data streams in fault diagnosis applications. Nevertheless, most existing review articles on fault diagnosis adopt a broad perspective, primarily discussing general techniques such as deep learning and transfer learning, without providing a dedicated focus on incremental learning strategies. To the best of our knowledge, this is the first review focusing specifically on incremental learning-enabled fault diagnosis methods. In this work, state-of-the-art incremental learning-enabled fault diagnosis methods are systematically reviewed. These methods are categorized into distinct groups based on their incremental learning strategies and application contexts. In addition, major challenges associated with applying incremental learning to fault diagnosis, including concept drift and catastrophic forgetting, are discussed, along with emerging solutions proposed to address these issues. A novel taxonomy and perspective on incremental learning-enabled fault diagnosis approaches is presented, providing a timely and comprehensive reference for researchers and practitioners in this evolving field.
PMID:40737141 | DOI:10.1109/TCYB.2025.3586643
HybridKla: a hybrid deep learning framework for lactylation site prediction
Brief Bioinform. 2025 Jul 2;26(4):bbaf375. doi: 10.1093/bib/bbaf375.
ABSTRACT
Lysine lactylation (Kla), a novel lactate-derived post-translational modification, is involved in a myriad of biological processes and complex diseases. While several computational methods have been developed to identify Kla sites, these approaches still suffer from small datasets. In this work, we collected 23,984 Kla sites in 7,297 proteins from the literature to construct the benchmark dataset. Leveraging recent advances in feature encoding, we tailored a multi-feature hybrid system, which integrated eight complementary feature-encoding strategies derived from two automated encoders and a composition-based module. Combining the hybrid system with deep learning, we presented our newly designed predictor named HybridKla, achieving an area under the curve (AUC) value of 0.8460. Compared to existing tools, HybridKla achieved a >28.90% improvement in AUC (0.8460 versus 0.6563). We also conducted a proteome-wide search and provided a systematic prediction of Kla sites. The friendly online service of HybridKla is freely accessible for academic research at http://transkla.zzu.edu.cn/.
PMID:40736746 | DOI:10.1093/bib/bbaf375
An optimized multi-scale dilated attention layer for keratoconus disease classification
Int Ophthalmol. 2025 Jul 30;45(1):318. doi: 10.1007/s10792-025-03688-y.
ABSTRACT
INTRODUCTION: Keratoconus (KCN) is a progressive and non-inflammatory corneal disorder characterized by thinning and conical deformation of the cornea, resulting in visual impairment. Early and accurate detection is crucial to prevent disease progression. Conventional diagnostic methods are time-consuming and depend on expert evaluation. This study introduces an advanced deep learning (DL) model aimed at automating KCN detection using corneal topography images.
MATERIALS AND METHODS: The proposed model, Optimized MSDALNet, integrates a Multi-Scale Dilated Attention Layer (MSDAL) to capture local and global corneal features at varying spatial resolutions. Training is optimized using Arctic Puffin Optimization (APO), a metaheuristic algorithm inspired by puffin foraging behavior. The model includes Explainable AI (XAI) capabilities using Grad-CAM for visual interpretability. Experiments were conducted using a public KCN dataset with over 1,100 labeled corneal topography images categorized into Normal, Suspect, and KCN classes. Standard pre-processing, data augmentation, and performance evaluation metrics (accuracy, precision, recall, specificity, FNR, MCC, AUC) were applied.
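The MSDAL design is only summarized in the abstract; a common way to realize "multi-scale dilated attention" is to run parallel dilated convolutions at increasing rates and use their fusion to gate the input features, as in the illustrative PyTorch sketch below (module name, dilation rates, and fusion are assumptions):

```python
import torch
import torch.nn as nn

class MultiScaleDilatedAttention(nn.Module):
    """Illustrative MSDAL-style block: parallel dilated convolutions capture local
    and global corneal context; their fusion gates the input features."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(len(rates) * channels, channels, 1)

    def forward(self, x):
        multi = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        attn = torch.sigmoid(self.fuse(multi))   # attention map in [0, 1]
        return x * attn                          # re-weighted feature map
```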
RESULTS: The Optimized MSDALNet achieved superior classification performance with an accuracy of 99.5%, precision of 99.4%, and specificity of 98.4%. The proposed model outperformed existing methods such as CNN, ViT, and Swin Transformer in terms of accuracy, computational cost (1.2 GFLOPs), and inference speed (8.4 ms/image). Grad-CAM visualization confirmed the model's focus on clinically relevant corneal regions. An ablation study demonstrated the impact of each component in the proposed framework.
CONCLUSION: The Optimized MSDALNet combined with APO delivers an effective and interpretable solution for KCN detection. The model excels in feature extraction, computational efficiency, and clinical transparency. Limitations include dataset size and lack of multimodal inputs. Future work will focus on incorporating diverse datasets and additional patient data to enhance generalizability and diagnostic robustness.
PMID:40736610 | DOI:10.1007/s10792-025-03688-y