Deep learning
Diagnosis accuracy of machine learning for idiopathic pulmonary fibrosis: a systematic review and meta-analysis
Eur J Med Res. 2025 Apr 15;30(1):288. doi: 10.1186/s40001-025-02501-x.
ABSTRACT
BACKGROUND: The diagnosis of idiopathic pulmonary fibrosis (IPF) is complex, requiring multidisciplinary discussion among specialists and, when necessary, lung biopsy. Clinical diagnosis of IPF and its usual interstitial pneumonia (UIP) pattern is particularly challenging because of interobserver variability. Several studies have sought to use image-based machine learning to diagnose IPF and UIP, but the diagnostic accuracy of this approach lacks evidence-based support.
OBJECTIVE: We conducted a systematic review and meta-analysis to explore the diagnostic efficiency of image-based machine learning (ML) for IPF.
DATA SOURCES AND METHODS: We comprehensively searched the PubMed, Cochrane, Embase, and Web of Science databases up to August 24, 2024. In the meta-analysis, we carried out subgroup analyses by imaging source (computed radiography/computed tomography) and model type (deep learning/other) to evaluate diagnostic performance for IPF.
RESULTS: The meta-analysis findings indicated that in the diagnosis of IPF, the C-index, sensitivity, and specificity of ML were 0.93 (95% CI 0.89-0.97), 0.79 (95% CI 0.73-0.83), and 0.84 (95% CI 0.79-0.88), respectively. The sensitivity of radiologists/clinicians in diagnosing IPF was 0.69 (95% CI 0.56-0.79), with a specificity of 0.93 (95% CI 0.74-0.98). For UIP diagnosis, the C-index of ML was 0.91 (95% CI 0.87-0.94), with a sensitivity of 0.92 (95% CI 0.80-0.97) and a specificity of 0.92 (95% CI 0.82-0.97). In contrast, the sensitivity of radiologists/clinicians in diagnosing UIP was 0.69 (95% CI 0.50-0.84), with a specificity of 0.90 (95% CI 0.82-0.94).
CONCLUSIONS: Image-based machine learning techniques demonstrate robust data processing and recognition capabilities, providing strong support for accurate diagnosis of idiopathic pulmonary fibrosis and usual interstitial pneumonia. Future multicenter large-scale studies are warranted to develop more intelligent evaluation tools and further enhance clinical diagnostic efficiency.
TRIAL REGISTRATION: This study protocol was registered with PROSPERO (CRD42022383162).
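The pooled sensitivity and specificity figures above come from combining per-study estimates. A minimal sketch of one common approach — fixed-effect inverse-variance pooling on the logit scale (a simplification of the bivariate random-effects models typically used in diagnostic meta-analysis; study counts and proportions below are illustrative, not the review's data):

```python
import math

def pool_logit(props, ns):
    """Inverse-variance pooling of proportions on the logit scale.

    props: per-study sensitivities (or specificities)
    ns: per-study numbers of positive (or negative) cases
    Returns the pooled proportion and its 95% CI.
    """
    logits, weights = [], []
    for p, n in zip(props, ns):
        # 0.5 continuity correction keeps the variance finite at p = 0 or 1
        x = p * n + 0.5
        m = n + 1.0
        logits.append(math.log(x / (m - x)))
        var = 1.0 / x + 1.0 / (m - x)  # variance of the logit
        weights.append(1.0 / var)
    wsum = sum(weights)
    pooled = sum(w * l for w, l in zip(weights, logits)) / wsum
    se = math.sqrt(1.0 / wsum)
    inv = lambda z: 1.0 / (1.0 + math.exp(-z))  # back-transform to [0, 1]
    return inv(pooled), (inv(pooled - 1.96 * se), inv(pooled + 1.96 * se))

# illustrative data: three hypothetical studies
est, (lo, hi) = pool_logit([0.75, 0.80, 0.82], [60, 110, 90])
```

Pooling on the logit scale keeps the back-transformed confidence interval inside [0, 1], which is why it is preferred over pooling raw proportions.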
PMID:40235000 | DOI:10.1186/s40001-025-02501-x
Prediction of postoperative intensive care unit admission with artificial intelligence models in non-small cell lung carcinoma
Eur J Med Res. 2025 Apr 15;30(1):293. doi: 10.1186/s40001-025-02553-z.
ABSTRACT
BACKGROUND: There is no standard practice for intensive care admission after non-small cell lung cancer surgery. In this study, we aimed to predict the need for intensive care admission after non-small cell lung cancer surgery with deep learning models.
METHODS: The data of 953 patients who underwent surgery for non-small cell lung cancer between January 2001 and 2023 were analyzed. Clinical, laboratory, respiratory, tumor radiological, and surgical features were included as input data. The outcome was intensive care unit admission. Deep learning was performed with a fully connected neural network algorithm and the k-fold cross-validation method.
RESULTS: The training accuracy was 92.0%, the training F1 score for class 1 was 86.7%, the training F1 score for class 0 was 94.2%, and the training average F1 score was 90.5%. The test sensitivity was 67.7%, the test positive predictive value was 84.0%, and the test accuracy was 85.3%. The test F1 score for class 1 was 75.0%, for class 0 it was 89.5%, and the test average F1 score was 82.3%. The AUC of the ROC curve for the algorithm's test data was 0.83.
CONCLUSIONS: Using our method, deep learning models predicted the need for intensive care unit admission with high success and confidence. Using artificial intelligence algorithms to determine the necessity of intensive care admission will ensure that postoperative processes are managed safely through objective decision mechanisms.
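The per-class metrics reported above (accuracy, sensitivity, PPV, and an F1 score for each of the two classes plus their average) all derive from a 2×2 confusion matrix. A minimal sketch with illustrative counts (not the study's actual confusion matrix):

```python
def binary_report(tp, fp, fn, tn):
    """Accuracy, sensitivity, PPV, and per-class F1 from a 2x2 confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    sens = tp / (tp + fn)              # recall of class 1
    ppv = tp / (tp + fp)               # positive predictive value
    f1_pos = 2 * ppv * sens / (ppv + sens)
    spec = tn / (tn + fp)              # recall of class 0
    npv = tn / (tn + fn)
    f1_neg = 2 * npv * spec / (npv + spec)
    return {"accuracy": acc, "sensitivity": sens, "ppv": ppv,
            "f1_class1": f1_pos, "f1_class0": f1_neg,
            "f1_average": (f1_pos + f1_neg) / 2}

# illustrative counts only
r = binary_report(tp=30, fp=10, fn=5, tn=55)
```

Reporting F1 for both classes, as the study does, guards against the common pitfall of a single F1 score looking good purely because the majority class dominates.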
PMID:40234958 | DOI:10.1186/s40001-025-02553-z
CRISP: A causal relationships-guided deep learning framework for advanced ICU mortality prediction
BMC Med Inform Decis Mak. 2025 Apr 15;25(1):165. doi: 10.1186/s12911-025-02981-1.
ABSTRACT
BACKGROUND: Mortality prediction is critical in clinical care, particularly in intensive care units (ICUs), where early identification of high-risk patients can inform treatment decisions. While deep learning (DL) models have demonstrated significant potential in this task, most suffer from limited generalizability, which hinders their widespread clinical application. Additionally, the class imbalance in electronic health records (EHRs) complicates model training. This study aims to develop a causally-informed prediction model that incorporates underlying causal relationships to mitigate class imbalance, enabling more stable mortality predictions.
METHODS: This study introduces the CRISP model (Causal Relationship Informed Superior Prediction), which leverages native counterfactuals to augment the minority class and constructs patient representations by incorporating causal structures to enhance mortality prediction. Patient data were obtained from the public MIMIC-III and MIMIC-IV databases, as well as an additional dataset from the West China Hospital of Sichuan University (WCHSU).
RESULTS: A total of 69,190 ICU cases were included: 30,844 from MIMIC-III, 27,362 from MIMIC-IV, and 10,984 from WCHSU. The CRISP model demonstrated stable mortality prediction performance across the three datasets, achieving AUROC values of 0.9042-0.9480 and AUPRC values of 0.4771-0.7611. CRISP's data augmentation module showed predictive performance comparable to commonly used interpolation-based oversampling techniques.
CONCLUSION: CRISP achieves better generalizability across different patient groups, compared to various baseline algorithms, thereby enhancing the practical application of DL in clinical decision support.
TRIAL REGISTRATION: Trial registration information for the WCHSU data is available on the Chinese Clinical Trial Registry website ( http://www.chictr.org.cn ), with the registration number ChiCTR1900025160. The recruitment period for the data was from August 5, 2019, to August 31, 2021.
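CRISP's counterfactual augmentation is compared above against interpolation-based oversampling. A minimal SMOTE-style sketch of the latter baseline — synthesizing minority-class points on segments between a sample and one of its nearest minority neighbours (an illustration of the baseline technique, not the authors' method):

```python
import math
import random

def interpolate_oversample(minority, k=2, n_new=4, seed=0):
    """SMOTE-style augmentation: new points on segments between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation coefficient in [0, 1)
        synthetic.append(tuple(xi + lam * (ni - xi)
                               for xi, ni in zip(x, nb)))
    return synthetic

# illustrative 2-D minority samples; synthetic points stay in their convex hull
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_pts = interpolate_oversample(pts)
```

Because new points are convex combinations of real ones, this kind of augmentation cannot extrapolate beyond the observed minority region — one motivation the abstract gives for a causally informed alternative.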
PMID:40234903 | DOI:10.1186/s12911-025-02981-1
PFLO: a high-throughput pose estimation model for field maize based on YOLO architecture
Plant Methods. 2025 Apr 15;21(1):51. doi: 10.1186/s13007-025-01369-6.
ABSTRACT
Posture is a critical phenotypic trait that reflects crop growth and serves as an essential indicator for both agricultural production and scientific research. Accurate pose estimation enables real-time tracking of crop growth processes, but in field environments, challenges such as variable backgrounds, dense planting, occlusions, and morphological changes hinder precise posture analysis. To address these challenges, we propose PFLO (Pose Estimation Model of Field Maize Based on YOLO Architecture), an end-to-end model for maize pose estimation, coupled with a novel data processing method to generate bounding boxes and pose skeleton data from a "keypoint-line" annotated phenotypic database, which could mitigate the effects of uneven manual annotations and biases. PFLO also incorporates advanced architectural enhancements to optimize feature extraction and selection, enabling robust performance in complex conditions such as dense arrangements and severe occlusions. On a fivefold validation set of 1,862 images, PFLO achieved 72.2% pose estimation mean average precision (mAP50) and 91.6% object detection mean average precision (mAP50), outperforming current state-of-the-art models. The model demonstrates improved detection of occluded, edge, and small targets, accurately reconstructing skeletal poses of maize crops. PFLO provides a powerful tool for real-time phenotypic analysis, advancing automated crop monitoring in precision agriculture.
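The mAP50 figures above count a detection as correct when its intersection-over-union (IoU) with a ground-truth box reaches 0.5. A minimal IoU sketch for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

# two boxes sharing half their area still fall below the 0.5 threshold,
# because the union grows as the intersection shrinks (IoU = 1/3 here)
match = iou((0, 0, 2, 2), (1, 0, 3, 2)) >= 0.5
```

The mAP50:95 metric reported for detection tasks elsewhere simply averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05, rewarding tighter localization.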
PMID:40234900 | DOI:10.1186/s13007-025-01369-6
Predicting axial load capacity in elliptical fiber reinforced polymer concrete steel double skin columns using machine learning
Sci Rep. 2025 Apr 15;15(1):12899. doi: 10.1038/s41598-025-97258-y.
ABSTRACT
The current study investigates the application of artificial intelligence (AI) techniques, including machine learning (ML) and deep learning (DL), in predicting the ultimate load-carrying capacity and ultimate strain of both hollow and solid hybrid elliptical fiber-reinforced polymer (FRP)-concrete-steel double-skin tubular columns (DSTCs) under axial loading. Implemented AI techniques include five ML models - Gene Expression Programming (GEP), Artificial Neural Network (ANN), Random Forest (RF), Adaptive Boosting (ADB), and eXtreme Gradient Boosting (XGBoost) - and one DL model - Deep Neural Network (DNN). Due to the scarcity of experimental data on hybrid elliptical DSTCs, an accurate finite element (FE) model was developed to provide additional numerical insights. The reliability of the proposed nonlinear FE model was validated against existing experimental results. The validated model was then employed in a parametric study to generate 112 data points. The parametric study examined the impact of concrete strength, the cross-sectional size of the inner steel tube, and FRP thickness on the ultimate load-carrying capacity and ultimate strain of both hollow and solid hybrid elliptical DSTCs. The effectiveness of the AI application was assessed by comparing the models' predictions with FE results. Among the models, XGBoost and RF achieved the best performance in both training and testing with respect to the coefficient of determination (R²), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) values. The study provided insights into the contributions of individual features to predictions using the SHapley Additive exPlanations (SHAP) approach.
The results from SHAP, based on the best prediction performance of the XGBoost model, indicate that the area of the concrete core has the most significant effect on the load-carrying capacity of hybrid elliptical DSTCs, followed by the unconfined concrete strength and the total thickness of FRP multiplied by its elastic modulus. Additionally, a user interface platform was developed to streamline the practical application of the proposed AI models in predicting the axial capacity of DSTCs.
PMID:40234698 | DOI:10.1038/s41598-025-97258-y
A prediction model of pediatric bone density from plain spine radiographs using deep learning
Sci Rep. 2025 Apr 15;15(1):13039. doi: 10.1038/s41598-025-96949-w.
ABSTRACT
Osteoporosis, a bone disease characterized by decreased bone mineral density (BMD) resulting in decreased mechanical strength and an increased fracture risk, remains poorly understood in children. Herein, we developed and validated a deep learning-based model to predict pediatric BMD using plain spine radiographs. In a two-stage model, YOLOv8 was applied for vertebral body detection, and BMD values were predicted with a ResNet-18-based regression model; a low-BMD group was then classified based on Z-scores of the predicted BMD. Patients aged 10-20 years who underwent dual-energy X-ray absorptiometry and radiography within 6 months at our hospital were enrolled. Ultimately, 601 patients (mean age, 14 years 4 months [SD 2 years]; 276 males) were included. The model achieved robust performance in detecting vertebral bodies (average precision [AP] 50 = 0.97, AP [50:95] = 0.68) and predicting BMD, with significant correlation (r = 0.72), showing consistency across different vertebral segments and agreement (intraclass correlation coefficient: 0.64). Moreover, it successfully classified low-BMD groups (area under the receiver operating characteristic curve = 0.85) with high sensitivity (0.76) and specificity (0.87). This deep-learning approach shows promise for BMD prediction and classification, with potential to enhance early detection and streamline bone health management in high-risk pediatric populations.
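The low-BMD classification above hinges on Z-scores of the predicted BMD. A minimal sketch — the −2.0 cutoff follows the usual pediatric convention of flagging values at least two standard deviations below the age/sex-matched reference mean, and the reference values here are illustrative, not the study's:

```python
def bmd_z_score(bmd, ref_mean, ref_sd):
    """Standard deviations from the age/sex-matched reference mean."""
    return (bmd - ref_mean) / ref_sd

def is_low_bmd(bmd, ref_mean, ref_sd, cutoff=-2.0):
    """Flag BMD values at or below the Z-score cutoff."""
    return bmd_z_score(bmd, ref_mean, ref_sd) <= cutoff

# illustrative reference: mean 0.95 g/cm^2, SD 0.10 for the matched cohort
z = bmd_z_score(0.70, ref_mean=0.95, ref_sd=0.10)  # → -2.5
```

Z-scores rather than adult T-scores are used because pediatric BMD must be judged against a growing reference population, not peak adult bone mass.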
PMID:40234697 | DOI:10.1038/s41598-025-96949-w
A deep learning approach for blood glucose monitoring and hypoglycemia prediction in glycogen storage disease
Sci Rep. 2025 Apr 15;15(1):13032. doi: 10.1038/s41598-025-97391-8.
ABSTRACT
Glycogen storage disease (GSD) is a group of rare inherited metabolic disorders characterized by abnormal glycogen storage and breakdown. These disorders are caused by mutations in G6PC1, which is essential for proper glucose storage and metabolism. With the advent of continuous glucose monitoring systems, development of algorithms to analyze and predict glucose levels has gained considerable attention, with the aim of preemptively managing fluctuations before they become problematic. However, there is a lack of research focusing specifically on patients with GSD. Therefore, this study aimed to forecast glucose levels in patients with GSD using state-of-the-art deep-learning (DL) algorithms. This retrospective study utilized blood glucose data from patients with GSD who were either hospitalized or managed at Yonsei University Wonju Severance Christian Hospital, Korea, between August 2020 and February 2024. Three state-of-the-art DL models for time-series forecasting were employed: PatchTST, LTSF N-Linear, and TS Mixer. First, the models were used to predict the patients' glucose levels for the next hour. Second, a binary classification task was performed to assess whether hypoglycemia could be predicted alongside direct glucose levels. This is the first study to demonstrate the capability of forecasting glucose levels in patients with GSD using continuous glucose-monitoring data and DL models. Our model provides patients with GSD with a more accessible tool for managing glucose levels. This study has broader implications, potentially serving as a foundation for improving the care of patients with rare diseases using DL-based solutions.
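The next-hour forecasting task described above is typically framed as sliding-window supervised learning over the CGM series: each training pair maps a fixed history window to the following horizon. A minimal sketch of the window construction plus the standard persistence baseline (window sizes and readings are illustrative; the models named above are far more elaborate):

```python
def make_windows(series, history, horizon):
    """Split a series into (input window, target window) training pairs."""
    pairs = []
    for i in range(len(series) - history - horizon + 1):
        pairs.append((series[i:i + history],
                      series[i + history:i + history + horizon]))
    return pairs

def naive_forecast(window, horizon):
    """Persistence baseline: repeat the last observed value."""
    return [window[-1]] * horizon

glucose = [90, 92, 95, 99, 104, 110, 117, 125]  # illustrative mg/dL readings
pairs = make_windows(glucose, history=4, horizon=2)
x0, y0 = pairs[0]  # ([90, 92, 95, 99], [104, 110])
```

A learned forecaster is only useful if it beats this persistence baseline, which is surprisingly strong on slowly drifting signals like glucose.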
PMID:40234688 | DOI:10.1038/s41598-025-97391-8
Rapid diagnosis of membranous nephropathy based on kidney tissue Raman spectroscopy and deep learning
Sci Rep. 2025 Apr 15;15(1):13038. doi: 10.1038/s41598-025-97351-2.
ABSTRACT
Membranous nephropathy (MN) is one of the most common glomerular diseases. Although the diagnostic method based on serum PLA2R antibodies has gradually been applied in clinical practice, only 52-86% of PLA2R-associated MN patients show positive anti-PLA2R antibodies. Therefore, renal biopsy remains the gold standard for diagnosing MN. However, the renal biopsy procedure is highly complex and involves multiple steps, including tissue sampling, fixation, dehydration, embedding, sectioning, PAS staining, Masson trichrome staining, and silver staining. Each step requires precise technique from laboratory personnel, as any error can affect the quality of the final tissue sections and, consequently, the diagnosis. As a result, there is an urgent need to develop a method that enables rapid diagnosis after renal biopsy. Previous studies have shown that Raman spectroscopy offers promising results for diagnosing MN, exhibiting high sensitivity and specificity when applied to human serum and urine samples. In this study, we propose a rapid diagnostic method combining Raman spectroscopy of mouse kidney tissue with a CNN-BiLSTM deep learning model. The model achieved 98% accuracy, with specificity and sensitivity of 98.3%, providing a novel auxiliary tool for the pathological diagnosis of MN.
PMID:40234682 | DOI:10.1038/s41598-025-97351-2
Advanced lightweight deep learning vision framework for efficient pavement damage identification
Sci Rep. 2025 Apr 15;15(1):12966. doi: 10.1038/s41598-025-97132-x.
ABSTRACT
Pavement cracks serve as crucial indicators of road condition, directly associated with subsequent pavement deterioration. To address the demand for large-scale real-time pavement damage assessment, this study proposes a lightweight pavement damage detection model based on YOLOv5s (LPDD-YOLO). Initially, a lightweight feature extraction network, FasterNet, is adopted to reduce the number of parameters and the computational complexity. Secondly, to mitigate the reduction in accuracy resulting from the use of a lightweight network, an attention-based downsampling module and a neural network cognitive module are introduced. These modules enhance feature extraction robustness and eliminate interference from irrelevant features. In addition, considering the significant variation in aspect ratios and diverse morphologies of pavement damage, K-Means clustering and a deformable convolution module are employed. These mechanisms ensure dynamic anchor selection and extend the scope of deformation ability, respectively. According to an ablation experiment on a self-built dataset, LPDD-YOLO demonstrates notable improvements in both accuracy and efficiency compared to the original model. Specifically, mAP increases by 4.1%, and the F1 score rises by 5.3%. Moreover, LPDD-YOLO achieves a 47.3% reduction in parameters and a 54.4% decrease in GFLOPs. Notably, LPDD-YOLO delivers real-time, accurate damage detection at up to 85 FPS. The effectiveness and superiority of LPDD-YOLO are further substantiated through comparisons with other state-of-the-art algorithms.
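The K-Means anchor selection mentioned above is conventionally run on ground-truth box sizes with 1 − IoU as the distance, following the YOLO line of work, so that tall thin cracks and wide patches each get a well-fitting anchor. A minimal sketch of that convention (not the authors' exact procedure; box sizes are illustrative):

```python
import random

def wh_iou(a, b):
    """IoU of two boxes (w, h) aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Lloyd's K-Means on (w, h) pairs, assigning by highest IoU."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # 1 - IoU distance: assign to the center with the highest IoU
            j = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            clusters[j].append(b)
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)

# illustrative box sizes: thin longitudinal cracks vs. wide patch damage
boxes = [(10, 80), (12, 90), (11, 100), (60, 20), (70, 25), (80, 18)]
anchors = kmeans_anchors(boxes, k=2)
```

Using 1 − IoU instead of Euclidean distance stops large boxes from dominating the clustering, since IoU is scale-relative.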
PMID:40234635 | DOI:10.1038/s41598-025-97132-x
Maize yield estimation in Northeast China's black soil region using a deep learning model with attention mechanism and remote sensing
Sci Rep. 2025 Apr 15;15(1):12927. doi: 10.1038/s41598-025-97563-6.
ABSTRACT
Accurate prediction of maize yields is crucial for effective crop management. In this paper, we propose a novel deep learning framework (CNNAtBiGRU) for estimating maize yield, which is applied to typical black soil areas in Northeast China. This framework integrates a one-dimensional convolutional neural network (1D-CNN), bidirectional gated recurrent units (BiGRU), and an attention mechanism to effectively characterize and weight key segments of input data. In the predictions for the most recent year, the model demonstrated high accuracy (R² = 0.896, RMSE = 908.33 kg/ha) and exhibited strong robustness in both earlier years and during extreme climatic events. Unlike traditional yield estimation methods that primarily rely on remote sensing vegetation indices, phenological data, meteorological data, and soil characteristics, this study innovatively incorporates anthropogenic factors, such as Degree of Cultivation Mechanization (DCM), reflecting the rapid advancement of agricultural modernization. The relative importance analysis of input variables revealed that Enhanced Vegetation Index (EVI), Sun-Induced Chlorophyll Fluorescence (SIF), and DCM were the most influential factors in yield prediction. Furthermore, our framework enables maize yield prediction 1-2 months in advance by leveraging historical patterns of environmental and agricultural variables, providing valuable lead time for decision-making. This predictive capability does not rely on forecasting future weather conditions but rather captures yield-relevant signals embedded in early-season data.
PMID:40234562 | DOI:10.1038/s41598-025-97563-6
Lag-Net: Lag correction for cone-beam CT via a convolutional neural network
Comput Methods Programs Biomed. 2025 Apr 12;266:108753. doi: 10.1016/j.cmpb.2025.108753. Online ahead of print.
ABSTRACT
BACKGROUND AND OBJECTIVE: Due to the presence of charge traps in amorphous silicon flat-panel detectors, lag signals are generated in consecutively captured projections. These signals lead to ghosting in projection images and severe lag artifacts in cone-beam computed tomography (CBCT) reconstructions. Traditional Linear Time-Invariant (LTI) correction needs to measure lag correction factors (LCF) and may leave residual lag artifacts. This incomplete correction is partly attributed to the lack of consideration for exposure dependency.
METHODS: To measure lag signals more accurately and suppress lag artifacts, we develop a novel hardware correction method. This method requires two scans of the same object, with adjustments to the operating timing of the CT instrumentation during the second scan to measure the lag signal from the first. While this hardware correction significantly mitigates lag artifacts, it is complex to implement and imposes high demands on the CT instrumentation. To enhance the process, we introduce a deep learning method called Lag-Net to remove the lag signal, utilizing the nearly lag-free results from hardware correction as training targets for the network.
RESULTS: Qualitative and quantitative analyses of experimental results on both simulated and real datasets demonstrate that deep learning correction significantly outperforms traditional LTI correction in terms of lag artifact suppression and image quality enhancement. Furthermore, the deep learning method achieves reconstruction results comparable to those obtained from hardware correction while avoiding the operational complexities associated with the hardware correction approach.
CONCLUSION: The proposed hardware correction method, despite its operational complexity, demonstrates superior artifact suppression performance compared to the LTI algorithm, particularly under low-exposure conditions. The introduced Lag-Net, which utilizes the results of the hardware correction method as training targets, leverages the end-to-end nature of deep learning to circumvent the intricate operational drawbacks associated with hardware correction. Furthermore, the network's correction efficacy surpasses that of the LTI algorithm in low-exposure scenarios.
PMID:40233441 | DOI:10.1016/j.cmpb.2025.108753
A comparative study of explainability methods for whole slide classification of lymph node metastases using vision transformers
PLOS Digit Health. 2025 Apr 15;4(4):e0000792. doi: 10.1371/journal.pdig.0000792. eCollection 2025 Apr.
ABSTRACT
Recent advancements in deep learning have shown promise in enhancing the performance of medical image analysis. In pathology, automated whole slide imaging has transformed clinical workflows by streamlining routine tasks and diagnostic and prognostic support. However, the lack of transparency of deep learning models, often described as black boxes, poses a significant barrier to their clinical adoption. This study evaluates various explainability methods for Vision Transformers, assessing their effectiveness in explaining the rationale behind their classification predictions on histopathological images. Using a Vision Transformer trained on the publicly available CAMELYON16 dataset comprising 399 whole slide images of lymph node metastases of patients with breast cancer, we conducted a comparative analysis of a diverse range of state-of-the-art techniques for generating explanations through heatmaps, including Attention Rollout, Integrated Gradients, RISE, and ViT-Shapley. Our findings reveal that Attention Rollout and Integrated Gradients are prone to artifacts, while RISE and particularly ViT-Shapley generate more reliable and interpretable heatmaps. ViT-Shapley also demonstrated faster runtime and superior performance in insertion and deletion metrics. These results suggest that integrating ViT-Shapley-based heatmaps into pathology reports could enhance trust and scalability in clinical workflows, facilitating the adoption of explainable artificial intelligence in pathology.
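The insertion and deletion metrics cited above score a heatmap by tracking model confidence as the highest-ranked pixels are progressively revealed or removed: a good saliency ranking makes confidence collapse quickly under deletion (low area under the curve). A minimal deletion-curve sketch with a toy scoring function (not ViT-Shapley itself; the "model" here is a stand-in):

```python
def deletion_curve(pixels, saliency, confidence):
    """Mask pixels in decreasing-saliency order, recording model confidence."""
    order = sorted(range(len(pixels)), key=lambda i: -saliency[i])
    masked = list(pixels)
    curve = [confidence(masked)]
    for i in order:
        masked[i] = 0.0  # remove the next most salient pixel
        curve.append(confidence(masked))
    return curve

def auc(curve):
    """Mean of the curve; lower deletion AUC = better explanation."""
    return sum(curve) / len(curve)

# toy "model": confidence is the mean intensity of the two object pixels
pixels = [1.0, 1.0, 0.0, 0.0]
confidence = lambda img: (img[0] + img[1]) / 2.0
good = deletion_curve(pixels, saliency=[0.9, 0.8, 0.1, 0.0], confidence=confidence)
bad = deletion_curve(pixels, saliency=[0.0, 0.1, 0.8, 0.9], confidence=confidence)
```

The insertion metric is the mirror image — start from a blank image, reveal pixels in saliency order, and prefer explanations whose curve rises fastest.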
PMID:40233316 | DOI:10.1371/journal.pdig.0000792
Can Super Resolution via Deep Learning Improve Classification Accuracy in Dental
Dentomaxillofac Radiol. 2025 Apr 15:twaf029. doi: 10.1093/dmfr/twaf029. Online ahead of print.
ABSTRACT
OBJECTIVES: Deep Learning-driven Super Resolution (SR) aims to enhance the quality and resolution of images, offering potential benefits in dental imaging. Although extensive research has focused on deep learning-based dental classification tasks, the impact of applying super-resolution techniques on classification remains underexplored. This study seeks to address this gap by evaluating and comparing the performance of deep learning classification models on dental images with and without super-resolution enhancement.
METHODS: An open-source dental image dataset was utilized to investigate the impact of SR on image classification performance. SR was applied by two models with a scaling ratio of 2 and 4, while classification was performed by four deep learning models. Performances were evaluated by well-accepted metrics like SSIM, PSNR, accuracy, recall, precision, and F1-score. The effect of SR on classification performance is interpreted through two different approaches.
RESULTS: The two SR models yielded average SSIM and PSNR values of 0.904 and 36.71 for increasing resolution at the two scaling ratios. Average accuracy and F1 score for classification models trained and tested with SR-generated images were 0.859 and 0.873. In the first of the two comparison approaches, accuracy increased in at least half of the cases (8 out of 16) across different models and scaling ratios, while in the second approach, SR showed significantly higher performance in almost all cases (12 out of 16).
CONCLUSION: This study demonstrated that the classification with SR-generated images significantly improved outcomes.
ADVANCES IN KNOWLEDGE: For the first time, the classification performance of dental radiographs with improved resolution by SR has been investigated. Significant performance improvement was observed compared to the case without SR.
PMID:40233244 | DOI:10.1093/dmfr/twaf029
Viral escape-inspired framework for structure-guided dual bait protein biosensor design
PLoS Comput Biol. 2025 Apr 15;21(4):e1012964. doi: 10.1371/journal.pcbi.1012964. Online ahead of print.
ABSTRACT
A generalizable computational platform, CTRL-V (Computational TRacking of Likely Variants), is introduced to design selective binding (dual bait) biosensor proteins. The iteratively evolving receptor binding domain (RBD) of the SARS-CoV-2 spike protein serves as a model dual bait biosensor: it has evolved to distinguish and selectively bind human entry receptors while avoiding neutralizing antibodies. The spike RBD prioritizes mutations that reduce antibody binding while enhancing/retaining binding with the ACE2 receptor. CTRL-V, through iterative design cycles, was shown to pinpoint 20% of the 39 reported SARS-CoV-2 point mutations across 30 circulating, infective strains as responsible for immune escape from the commercial antibody LY-CoV1404. CTRL-V successfully identifies ~70% (five out of seven) of the single point mutations (371F, 373P, 440K, 445H, 456L) in the latest circulating KP.2 variant and offers detailed structural insights into the escape mechanism. While other data-driven viral escape variant predictor tools have shown promise in predicting potential future viral variants, they require massive amounts of data to bypass the need for the physics of explicit biochemical interactions. Consequently, they cannot be generalized to other protein design applications. The publicly available viral escape data were leveraged as in vivo anchors to streamline a computational workflow that can be generalized to dual bait biosensor design tasks, as exemplified by identifying key mutational loci in Raf kinase that enable it to selectively bind Ras and Rap1a GTP. We demonstrate three versions of CTRL-V, which use a combination of integer optimization, stochastic sampling by PyRosetta, and deep learning-based ProteinMPNN for structure-guided biosensor design.
PMID:40233103 | DOI:10.1371/journal.pcbi.1012964
ProtNote: a multimodal method for protein-function annotation
Bioinformatics. 2025 Apr 15:btaf170. doi: 10.1093/bioinformatics/btaf170. Online ahead of print.
ABSTRACT
MOTIVATION: Understanding the protein sequence-function relationship is essential for advancing protein biology and engineering. However, less than 1% of known protein sequences have human-verified functions. While deep learning methods have demonstrated promise for protein function prediction, current models are limited to predicting only those functions on which they were trained.
RESULTS: Here, we introduce ProtNote, a multimodal deep learning model that leverages free-form text to enable both supervised and zero-shot protein function prediction. ProtNote not only maintains near state-of-the-art performance for annotations in its training set, but also generalizes to unseen and novel functions in zero-shot test settings. ProtNote demonstrates superior performance in prediction of novel GO annotations and EC numbers compared to baseline models by capturing nuanced sequence-function relationships that unlock a range of biological use cases inaccessible to prior models. We envision that ProtNote will enhance protein function discovery by enabling scientists to use free text inputs without restriction to predefined labels - a necessary capability for navigating the dynamic landscape of protein biology.
AVAILABILITY AND IMPLEMENTATION: The code is available on GitHub: https://github.com/microsoft/protnote; model weights, datasets, and evaluation metrics are provided via Zenodo: https://zenodo.org/records/13897920.
SUPPLEMENTARY INFORMATION: Supplementary Information is available at Bioinformatics online.
PMID:40233101 | DOI:10.1093/bioinformatics/btaf170
Protocol for deep-learning-driven cell type label transfer in single-cell RNA sequencing data
STAR Protoc. 2025 Apr 14;6(2):103768. doi: 10.1016/j.xpro.2025.103768. Online ahead of print.
ABSTRACT
Here, we present a protocol for using SIMS (scalable, interpretable machine learning for single cell) to transfer cell type labels in single-cell RNA sequencing data. This protocol outlines data preparation, model training with labeled data or inference using pretrained models, and methods for visualizing, downloading, and interpreting predictions. We provide stepwise instructions for accessing SIMS through the application programming interface (API), GitHub Codespaces, and a web application. For complete details on the use and execution of this protocol, please refer to Gonzalez-Ferrer et al.1.
PMID:40232935 | DOI:10.1016/j.xpro.2025.103768
Heterogeneous Mutual Knowledge Distillation for Wearable Human Activity Recognition
IEEE Trans Neural Netw Learn Syst. 2025 Apr 15;PP. doi: 10.1109/TNNLS.2025.3556317. Online ahead of print.
ABSTRACT
Recently, numerous deep learning algorithms have addressed wearable human activity recognition (HAR), but they often struggle with efficient knowledge transfer to lightweight models for mobile devices. Knowledge distillation (KD) is a popular technique for model compression, transferring knowledge from a complex teacher to a compact student. Most existing KD algorithms consider homogeneous architectures, hindering performance in heterogeneous setups. This is an under-explored area in wearable HAR. To bridge this gap, we propose a heterogeneous mutual KD (HMKD) framework for wearable HAR. HMKD establishes mutual learning within the intermediate and output layers of both teacher and student models. To accommodate substantial structural differences between teacher and student, we employ a weighted ensemble feature approach to merge the features from their intermediate layers, enhancing knowledge exchange within them. Experimental results on the HAPT, WISDM, and UCI_HAR datasets show HMKD outperforms ten state-of-the-art KD algorithms in terms of classification accuracy. Notably, with ResNetLSTMaN as the teacher and MLP as the student, HMKD increases MLP's $F_{1}$ score on the HAPT dataset by 9.19%.
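The output-layer transfer above builds on the standard distillation loss: KL divergence between temperature-softened teacher and student distributions, scaled by T² as in Hinton et al.'s formulation. A minimal sketch (HMKD's mutual, feature-level scheme is considerably more involved):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    exps = [math.exp(z / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

# identical logits give zero loss; diverging logits give a positive loss
zero = kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
pos = kd_loss([0.0, 0.0, 0.0], [2.0, 0.5, -1.0])
```

The temperature exposes the teacher's "dark knowledge" — the relative probabilities of the wrong classes — which is what makes distillation richer than training on hard labels alone.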
PMID:40232930 | DOI:10.1109/TNNLS.2025.3556317
FLINT: Learning-based Flow Estimation and Temporal Interpolation for Scientific Ensemble Visualization
IEEE Trans Vis Comput Graph. 2025 Apr 15;PP. doi: 10.1109/TVCG.2025.3561091. Online ahead of print.
ABSTRACT
We present FLINT (learning-based FLow estimation and temporal INTerpolation), a novel deep learning-based approach to estimate flow fields for 2D+time and 3D+time scientific ensemble data. FLINT can flexibly handle different types of scenarios with (1) a flow field being partially available for some members (e.g., omitted due to space constraints) or (2) no flow field being available at all (e.g., because it could not be acquired during an experiment). The design of our architecture allows it to flexibly cater to both cases simply by adapting our modular loss functions, effectively treating the different scenarios as flow-supervised and flow-unsupervised problems, respectively (with respect to the presence or absence of ground-truth flow). To the best of our knowledge, FLINT is the first approach to perform flow estimation from scientific ensembles, generating a corresponding flow field for each discrete timestep, even in the absence of original flow information. Additionally, FLINT produces high-quality temporal interpolants between scalar fields. FLINT employs several neural blocks, each featuring several convolutional and deconvolutional layers. We demonstrate performance and accuracy for different usage scenarios with scientific ensembles from both simulations and experiments.
PMID:40232923 | DOI:10.1109/TVCG.2025.3561091
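The "modular loss" idea, switching between a flow-supervised and a flow-unsupervised objective depending on whether ground-truth flow exists, can be sketched generically: supervised training minimizes end-point error against the reference flow, while unsupervised training minimizes a photometric error after warping one scalar field toward the next. This is a simplified numpy sketch of the concept (nearest-neighbour warping), not FLINT's actual loss functions.

```python
import numpy as np

def epe(flow_pred, flow_gt):
    # End-point error: mean Euclidean distance between predicted and
    # ground-truth flow vectors (arrays shaped H x W x 2).
    return np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1))

def warp(field, flow):
    # Backward-warp a 2D scalar field by a flow (nearest neighbour,
    # coordinates clipped to the image bounds).
    h, w = field.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    return field[src_y, src_x]

def modular_flow_loss(field_t0, field_t1, flow_pred, flow_gt=None):
    # Flow-supervised when ground truth is available; otherwise
    # flow-unsupervised via a photometric warping residual.
    if flow_gt is not None:
        return epe(flow_pred, flow_gt)
    return np.mean((warp(field_t0, flow_pred) - field_t1) ** 2)
```

A deep-learning pipeline would use a differentiable bilinear warp instead of the nearest-neighbour lookup so the photometric term can backpropagate into the flow estimator.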
VibTac: A High-Resolution High-Bandwidth Tactile Sensing Finger for Multi-Modal Perception in Robotic Manipulation
IEEE Trans Haptics. 2025 Apr 15;PP. doi: 10.1109/TOH.2025.3561049. Online ahead of print.
ABSTRACT
Tactile sensing is pivotal for enhancing robot manipulation abilities by providing crucial feedback for localized information. However, existing sensors often lack the necessary resolution and bandwidth required for intricate tasks. To address this gap, we introduce VibTac, a novel multi-modal tactile sensing finger designed to offer high-resolution and high-bandwidth tactile sensing simultaneously. VibTac seamlessly integrates vision-based and vibration-based tactile sensing modes to achieve high-resolution and high-bandwidth tactile sensing respectively, leveraging a streamlined human-inspired design for versatility in tasks. This paper outlines the key design elements of VibTac and its fabrication methods, highlighting the significance of the Elastomer Gel Pad (EGP) in its sensing mechanism. The sensor's multi-modal performance is validated through 3D reconstruction and spectral analysis to discern tactile stimuli effectively. In experimental trials, VibTac demonstrates its efficacy by achieving over 90% accuracy in insertion tasks involving objects emitting distinct sounds, such as ethernet connectors. Leveraging vision-based tactile sensing for object localization and employing a deep learning model for "click" sound classification, VibTac showcases its robustness in real-world scenarios. A video of the sensor in operation is available at https://youtu.be/kmKIUlXGroo.
PMID:40232917 | DOI:10.1109/TOH.2025.3561049
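The spectral-analysis step the abstract mentions, deciding from a vibration signal whether a "click" occurred, can be illustrated with a simple band-energy detector: compute the magnitude spectrum and flag windows whose energy concentrates in a high-frequency band. The band limits, threshold, and sample rate below are illustrative assumptions, not VibTac's parameters, and the paper's classifier is a deep learning model rather than this fixed rule.

```python
import numpy as np

def spectral_features(signal, fs, n_bands=8):
    # Normalized energy per linearly spaced frequency band; a compact
    # feature vector a classifier could consume.
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    edges = np.linspace(0, fs / 2, n_bands + 1)
    feats = np.array([spec[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in zip(edges[:-1], edges[1:])])
    total = feats.sum()
    return feats / total if total > 0 else feats

def detect_click(signal, fs, band_hz=(2000, 4000), threshold=0.3):
    # A transient "click" concentrates power at high frequencies; flag the
    # window when the fractional power inside band_hz exceeds a threshold.
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs >= band_hz[0]) & (freqs < band_hz[1])
    frac = power[band].sum() / max(power.sum(), 1e-12)
    return bool(frac > threshold)
```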
Learning to Learn Transferable Generative Attack for Person Re-Identification
IEEE Trans Image Process. 2025 Apr 15;PP. doi: 10.1109/TIP.2025.3558434. Online ahead of print.
ABSTRACT
Deep learning-based person re-identification (re-id) models are widely employed in surveillance systems and inevitably inherit the vulnerability of deep networks to adversarial attacks. Existing attacks merely consider cross-dataset and cross-model transferability, ignoring the cross-test capability to perturb models trained in different domains. To powerfully examine the robustness of real-world re-id models, the Meta Transferable Generative Attack (MTGA) method is proposed, which adopts meta-learning optimization to promote the generative attacker producing highly transferable adversarial examples by learning comprehensively simulated transfer-based cross-model&dataset&test black-box meta attack tasks. Specifically, cross-model&dataset black-box attack tasks are first mimicked by selecting different re-id models and datasets for meta-train and meta-test attack processes. As different models may focus on different feature regions, the Perturbation Random Erasing module is further devised to prevent the attacker from learning to only corrupt model-specific features. To boost the attacker learning to possess cross-test transferability, the Normalization Mix strategy is introduced to imitate diverse feature embedding spaces by mixing multi-domain statistics of target models. Extensive experiments show the superiority of MTGA: in cross-model&dataset and cross-model&dataset&test attacks, MTGA outperforms the SOTA methods by 20.0% and 11.3% in mean mAP drop rate, respectively. The source codes are available at https://github.com/yuanbianGit/MTGA.
PMID:40232916 | DOI:10.1109/TIP.2025.3558434
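The Perturbation Random Erasing module described above, zeroing out a random region of the adversarial perturbation so the attacker cannot rely on corrupting any single model-specific feature region, has a straightforward core operation. This is a minimal numpy sketch of that idea with illustrative parameters, not the paper's implementation (see the linked repository for the actual code).

```python
import numpy as np

def perturbation_random_erasing(delta, erase_frac=0.25, rng=None):
    # Zero out a random rectangular patch of an adversarial perturbation
    # delta (H x W x C). Erasing different regions on each iteration
    # discourages the attacker from concentrating on model-specific areas.
    rng = np.random.default_rng(rng)
    h, w = delta.shape[:2]
    eh = max(1, int(h * erase_frac))
    ew = max(1, int(w * erase_frac))
    y = rng.integers(0, h - eh + 1)
    x = rng.integers(0, w - ew + 1)
    out = delta.copy()
    out[y:y + eh, x:x + ew] = 0.0
    return out
```

In a meta-learning loop this would be applied to the generated perturbation before it is added to the image in each meta-train step, so gradients only flow through the surviving regions.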