Deep learning
Differentiation of COVID-19 from other types of viral pneumonia and severity scoring on baseline chest radiographs: Comparison of deep learning with multi-reader evaluation
PLoS One. 2025 Jul 29;20(7):e0328061. doi: 10.1371/journal.pone.0328061. eCollection 2025.
ABSTRACT
Chest X-ray (CXR) imaging plays a pivotal role in the diagnosis and prognosis of viral pneumonia. However, distinguishing COVID-19 CXRs from other viral infections remains challenging due to highly similar radiographic features. Most existing deep learning (DL) models focus on differentiating COVID-19 from community-acquired pneumonia (CAP) rather than other viral pneumonias and often overlook baseline CXRs, missing the critical window for early detection and intervention. Moreover, manual severity scoring of COVID-19 CXRs by radiologists is subjective and time-intensive, highlighting the need for automated systems. This study introduces a DL system for distinguishing COVID-19 from other viral pneumonias on baseline CXRs acquired within three days of PCR testing, and for automated severity scoring of COVID-19 CXRs. The system was developed using a dataset of 2,547 patients (808 COVID-19, 936 non-COVID viral pneumonia, and 803 normal cases) and validated externally on several publicly accessible datasets. Compared to four experienced radiologists, the model achieved higher diagnostic accuracy (76.4% vs. 71.8%) and enhanced COVID-19 identification (F1-score: 74.1% vs. 61.3%), with an AUC of 93% for distinguishing between viral pneumonia and normal cases, and 89.8% for differentiating COVID-19 from other viral pneumonias. The severity-scoring module exhibited a high Pearson correlation of 93% and a low mean absolute error (MAE) of 2.35 compared to the radiologists' consensus. External validation on independent public datasets confirmed the model's generalizability. Subgroup analyses stratified by patient age, sex, and severity levels further demonstrated consistent performance, supporting the system's robustness across diverse clinical populations. These findings suggest that the proposed DL system could assist radiologists in the early diagnosis and severity assessment of COVID-19 from baseline CXRs, particularly in resource-limited settings.
PMID:40729327 | DOI:10.1371/journal.pone.0328061
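The agreement figures quoted for the severity-scoring module (Pearson correlation and MAE against the radiologists' consensus) can be illustrated with a minimal sketch. The function names and the sample scores below are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(pred, ref):
    """Average absolute difference between predicted and reference scores."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


# Hypothetical severity scores (made up for illustration only).
model_scores = [2, 5, 9, 12, 16]
consensus_scores = [3, 4, 10, 12, 18]

print("Pearson r:", round(pearson_r(model_scores, consensus_scores), 3))
print("MAE:", mean_absolute_error(model_scores, consensus_scores))
```

A high correlation with a non-trivial MAE is possible at the same time, because correlation measures how well the ranking and linear trend agree while MAE measures the absolute size of the disagreement.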
Survival Prediction in Stomach Cancer with Deep Learning: Unveiling Model Decisions with LIME and SHAP
Asian Pac J Cancer Prev. 2025 Jul 1;26(7):2669-2677. doi: 10.31557/APJCP.2025.26.7.2669.
ABSTRACT
OBJECTIVE: Stomach cancer is anticipated to remain a significant global health concern, underscoring the urgent need for sophisticated prognostic models. The aim of the study is to build an intuitive deep learning model for predicting survival probabilities in stomach cancer patients, validating it with external data and merging SHAP and LIME to improve the therapeutic relevance and reliability.
METHODS: A deep learning survival model based on a multilayer perceptron was developed using 1,350 documented stomach cancer cases from the AIIMS, Bhubaneswar Cancer Registry (2018-2022). The model was trained with the Adam optimizer (learning rate = 0.002) and dropout regularization. External validation was performed on an independent cohort of 388 patients from Hi-Tech Medical College and Hospital. Performance was assessed using accuracy, precision, sensitivity, specificity, F1-score, balanced accuracy, Matthews correlation coefficient, concordance index, and AUROC score. LIME and SHAP were utilized to improve interpretability by evaluating both local and global feature contributions.
RESULT: LIME and SHAP explained the complex interactions between important prognostic factors such as age, stage, treatment approaches, and socioeconomic level, exposing the key elements that affect survival outcomes. Across the reported metrics, the model showed good generalizability over several datasets.
CONCLUSION: This article focused on interpretable artificial intelligence models for stomach cancer prognosis with patient-specific survival projections. Artificial intelligence techniques such as LIME and SHAP improve clinician trust, thereby promoting patient-specific treatment recommendations.
PMID:40729090 | DOI:10.31557/APJCP.2025.26.7.2669
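Of the metrics listed in the stomach cancer abstract, the concordance index is the one specific to survival models. The sketch below is a simplified version of Harrell's C, written under stated assumptions: a higher risk score is taken to mean shorter expected survival, pairs with tied survival times are skipped, and the function name and sample values are hypothetical. It is not the authors' evaluation code.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when subject i has the shorter follow-up time
    and an observed event (events[i] == 1). The pair is concordant when the
    subject who failed earlier also has the higher risk score. Ties in risk
    score count as half; ties in time are ignored in this sketch.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event flags (1 = death observed,
# 0 = censored) and model risk scores, for illustration only.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
print("C-index:", concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of comparable pairs.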
RGCN-BA: relational graph convolutional network with batch awareness for single-cell RNA sequencing clustering
Brief Bioinform. 2025 Jul 2;26(4):bbaf378. doi: 10.1093/bib/bbaf378.
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) technology has opened new frontiers in biomedical research, offering insights into cellular heterogeneity. Accurate cell clustering and batch effect correction are essential in scRNA-seq data analysis, forming the foundation for downstream steps. However, most methods handle these tasks separately, limiting their applicability across diverse datasets. To address these challenges, we introduce Relational Graph Convolutional Network with Batch Awareness (RGCN-BA), a deep learning framework that integrates cell clustering and batch effect correction into a unified model. For multi-batch datasets, RGCN-BA leverages a relational graph convolutional network to process batch information as distinct edge types, followed by a batch correction layer for global alignment. For single-batch data, it functions with a single edge type. Experiments on both multi-batch and single-batch datasets demonstrate that RGCN-BA outperforms both specialized clustering methods and batch effect correction methods. This versatility in handling both tasks positions RGCN-BA as a powerful tool for enhancing scRNA-seq data analysis.
PMID:40728858 | DOI:10.1093/bib/bbaf378
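To make the idea of "batch information as distinct edge types" concrete, here is a minimal sketch of one generic relational graph convolution step in plain Python. It is not the RGCN-BA implementation: the relation names (`same_batch`, `cross_batch`), the mean normalisation per relation, and all weights are assumptions chosen for illustration.

```python
def matvec(weight, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def rgcn_layer(features, edges_by_type, weights_by_type, self_weight):
    """One relational graph convolution step with ReLU.

    features:        list of node feature vectors
    edges_by_type:   dict mapping relation name -> list of (src, dst) edges,
                     where a message flows from src to dst
    weights_by_type: dict mapping relation name -> weight matrix
    self_weight:     weight matrix applied to each node's own features
    """
    n = len(features)
    # Self-connection term.
    out = [matvec(self_weight, h) for h in features]
    for relation, edges in edges_by_type.items():
        weight = weights_by_type[relation]
        # Number of incoming neighbours per node under this relation,
        # used to average the messages.
        in_degree = [0] * n
        for _, dst in edges:
            in_degree[dst] += 1
        for src, dst in edges:
            message = matvec(weight, features[src])
            for k, value in enumerate(message):
                out[dst][k] += value / in_degree[dst]
    # ReLU non-linearity.
    return [[max(0.0, x) for x in row] for row in out]


# Hypothetical toy graph: three cells, two edge types.
identity = [[1.0, 0.0], [0.0, 1.0]]
cells = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = {"same_batch": [(1, 0)], "cross_batch": [(2, 0)]}
weights = {"same_batch": identity, "cross_batch": [[0.5, 0.0], [0.0, 0.5]]}
print(rgcn_layer(cells, edges, weights, identity))
```

With a single relation in `edges_by_type`, the same function reduces to an ordinary graph convolution, which mirrors the abstract's statement that single-batch data is handled with a single edge type.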
Diabetes and longitudinal changes in deep learning-derived measures of vertebral bone mineral density using conventional CT: the Multi-Ethnic Study of Atherosclerosis
Skeletal Radiol. 2025 Jul 29. doi: 10.1007/s00256-025-04995-2. Online ahead of print.
ABSTRACT
OBJECTIVE: To investigate the longitudinal association between diabetes and changes in vertebral bone mineral density (BMD) derived from conventional chest CT and to evaluate whether kidney function (estimated glomerular filtration rate (eGFR)) modifies this relationship.
MATERIALS AND METHODS: This longitudinal study included 1046 participants from the Multi-Ethnic Study of Atherosclerosis Lung Study with vertebral BMD measurements from chest CTs at Exam 5 (2010-2012) and Exam 6 (2016-2018). Diabetes was classified based on the American Diabetes Association criteria, and those with impaired fasting glucose (i.e., prediabetes) were excluded. Volumetric BMD was derived using a validated deep learning model to segment trabecular bone of thoracic vertebrae. Linear mixed-effects models estimated the association between diabetes and BMD changes over time. Following a significant interaction between diabetes status and eGFR, additional stratified analyses examined the impact of kidney function (i.e., diabetic nephropathy), categorized by eGFR (≥ 60 vs. < 60 mL/min/body surface area).
RESULTS: Participants with diabetes had a higher baseline vertebral BMD than those without (202 vs. 190 mg/cm3) and experienced a significant increase over a median follow-up of 6.2 years (β = 0.62 mg/cm3/year; 95% CI 0.26, 0.98). This increase was more pronounced among individuals with diabetes and reduced kidney function (β = 1.52 mg/cm3/year; 95% CI 0.66, 2.39) compared to the diabetic individuals with preserved kidney function (β = 0.48 mg/cm3/year; 95% CI 0.10, 0.85).
CONCLUSION: Individuals with diabetes exhibited an increase in vertebral BMD over time in comparison to the non-diabetes group, and this increase was more pronounced in those with diabetic nephropathy. These findings suggest that conventional BMD measurements may not fully capture the well-known fracture risk in diabetes. Further studies incorporating bone microarchitecture using advanced imaging and fracture outcomes are needed to refine skeletal health assessments in the diabetic population.
PMID:40728733 | DOI:10.1007/s00256-025-04995-2
Prediction of Intrinsically Disordered Lipid Binding Residues with DisoLipPred
Methods Mol Biol. 2025;2947:301-312. doi: 10.1007/978-1-0716-4662-5_17.
ABSTRACT
DisoLipPred is a state-of-the-art predictor of intrinsically disordered lipid-binding residues in protein sequences. This method relies on a modern deep neural network model, produces accurate results, and is available as a convenient web server. We provide a practical and detailed introduction to the DisoLipPred web server. We describe the underlying predictive process, which is fully automated and performed on the server side, and offer instructions for interacting with DisoLipPred's web interface. We also discuss how to obtain, read, and interpret results produced by this server using a case study that analyzes results generated for the vacuolar-sorting protein SNF8. The web server is freely available at http://biomine.cs.vcu.edu/servers/DisoLipPred/ .
PMID:40728621 | DOI:10.1007/978-1-0716-4662-5_17
A Benchmarking Platform for Assessing Protein Language Models on Function-Related Prediction Tasks
Methods Mol Biol. 2025;2947:241-268. doi: 10.1007/978-1-0716-4662-5_14.
ABSTRACT
Proteins play a crucial role in almost all biological processes, serving as the building blocks of life and mediating various cellular functions, from enzymatic reactions to immune responses. Accurate annotation of protein functions is essential for advancing our understanding of biological systems and developing innovative biotechnological applications and therapeutic strategies. To predict protein function, researchers primarily rely on classical homology-based methods, which use evolutionary relationships, and increasingly on machine learning (ML) approaches. Lately, protein language models (PLMs) have gained prominence; these models leverage specialized deep learning architectures to effectively capture intricate relationships between sequence, structure, and function. We recently conducted a comprehensive benchmarking study to evaluate diverse protein representations (i.e., classical approaches and PLMs) and discuss their trade-offs. The current work introduces the Protein Representation Benchmark (PROBE) tool, a benchmarking framework designed to evaluate protein representations on function-related prediction tasks. Here, we provide a detailed protocol for running the framework via the GitHub repository and accessing our newly developed user-friendly web service. PROBE encompasses four core tasks: semantic similarity inference, ontology-based function prediction, drug target family classification, and protein-protein binding affinity estimation. We demonstrate PROBE's usage through a new use case evaluating ESM2 and three recent multimodal PLMs (ESM3, ProstT5, and SaProt), highlighting their ability to integrate diverse data types, including sequence and structural information. This study underscores the potential of protein language models in advancing protein function prediction and serves as a valuable tool for both PLM developers and users.
PMID:40728618 | DOI:10.1007/978-1-0716-4662-5_14
Comprehensive Prediction of Protein Localization and Signal Peptides Using MULocDeep
Methods Mol Biol. 2025;2947:223-239. doi: 10.1007/978-1-0716-4662-5_13.
ABSTRACT
Localization of a protein within a cell encompasses various processes and signaling events, including guidance of signal peptides. Accurate prediction of subcellular and suborganellar protein localization, as well as signal peptides, is crucial for understanding protein function and provides valuable insights into cellular mechanisms. Although many computational methods can predict either general protein localization, suborganellar localization, or signal peptides, the lack of comprehensive and intuitive interpretation, insufficient coverage of localization types, and issues related to ease of use are some common limitations. In this chapter, we introduce MULocDeep, an advanced web server designed for the prediction of protein localization at both subcellular and suborganellar levels, as well as the identification of signal peptides and their corresponding cleavage sites. This web server integrates a sophisticated protein large language model, enabling highly accurate predictions and facilitating the interpretation of results. The server also includes multiple interactive interfaces that enhance the clarity and accessibility of predictions, particularly concerning motif patterns within protein sequences. Furthermore, we demonstrate the practical functionality of the MULocDeep web server, providing detailed instructions on how to utilize the server and interpret the results for both localization and signal peptide prediction. The MULocDeep web server is publicly available at https://www.mu-loc.org/ .
PMID:40728617 | DOI:10.1007/978-1-0716-4662-5_13
Annotating genomes with DeepGO protein function prediction tools
Methods Mol Biol. 2025;2947:171-189. doi: 10.1007/978-1-0716-4662-5_10.
ABSTRACT
This chapter explores the evolution of DeepGO, a suite of deep learning-based tools for protein function prediction, in the form of Gene Ontology (GO) terms, and their applications in genome annotation. We provide a comprehensive overview of the different versions of DeepGO, highlighting key advancements introduced by each method. To demonstrate the practical application of these tools, we present a case study on the annotation of a bacterial genome using the latest DeepGO model, DeepGO-SE. We showcase the efficiency and accuracy of DeepGO-SE in predicting protein functions and discuss the model's parameters. This chapter serves as a guide for researchers looking to enhance their genomic analyses using deep learning-based function prediction methods.
PMID:40728614 | DOI:10.1007/978-1-0716-4662-5_10
Integrating Gene Ontology Relationships for Protein Function Prediction Using PFresGO
Methods Mol Biol. 2025;2947:161-169. doi: 10.1007/978-1-0716-4662-5_9.
ABSTRACT
Efficient computational methods for protein functional annotation help bridge the gap between high-throughput sequence data and unknown protein functions. While many data-driven methods predict protein functions based on protein-level information, they often overlook the relationships between different functions. In this work, we introduce PFresGO, an attention-based deep learning approach that utilizes the hierarchical structure of gene ontology (GO) graphs to predict multiple protein functions in a high-throughput manner. PFresGO is available for academic use at https://github.com/BioColLab/PFresGO . We provide an overview of our predictor, discuss its ability to accurately predict protein functions, and demonstrate how to interpret the results.
PMID:40728613 | DOI:10.1007/978-1-0716-4662-5_9
Predicting Protein Functions with Function-Aware Domain Embeddings Using Domain-PFP
Methods Mol Biol. 2025;2947:151-160. doi: 10.1007/978-1-0716-4662-5_8.
ABSTRACT
Protein function prediction has long been a significant challenge in protein bioinformatics. Protein domains, as the structural and functional units, carry strong functional signatures. However, the vast number of domains and the limited availability of functional annotations introduce the issues of high dimensionality and sparsity when developing in-silico protein function prediction methods. To address these challenges, we have developed Domain-PFP, which leverages self-supervised learning to generate functionally aware representations of protein domains, effectively overcoming these limitations. By employing a lightweight shallow neural network, Domain-PFP captures the associations and co-occurrence relationships between protein domains and Gene Ontology (GO) terms, resulting in functionally informative domain embeddings. These embeddings demonstrate substantial functional relevance, as confirmed by multiple assessments, and are highly competitive in protein function prediction, outperforming current state-of-the-art methods. Additionally, we have created a Google Colab web service for Domain-PFP, allowing users to analyze domain-GO co-occurrence likelihoods, extract functionally aware protein representations, and predict protein functions through a user-friendly interface.
PMID:40728612 | DOI:10.1007/978-1-0716-4662-5_8
A Survey of Deep Learning Methods and Tools for Protein Binding Site Prediction
Methods Mol Biol. 2025;2947:89-108. doi: 10.1007/978-1-0716-4662-5_5.
ABSTRACT
Proteins carry out many cellular functions by interacting with ligands, including peptides, small molecules, ions, and nucleic acids. Accurate prediction of protein binding sites is essential for understanding these interactions and their biological significance. Recent advancements in deep learning (DL) have greatly enhanced the accuracy of protein binding site prediction. This chapter provides a comprehensive review of state-of-the-art DL methods for predicting protein binding sites, offering a detailed guide for developing and implementing DL models. It compiles and evaluates methods from recent literature, presenting key details such as model types, input/output data, databases, and evaluation metrics. Additionally, the chapter discusses the data collection and preprocessing steps required for training DL models, highlighting the utility of databases such as UniProt, Dockground, and PDBbind. The application of prominent DL architectures, including Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs), is explored, along with recent advancements in these approaches. By providing a thorough overview of these methods and resources, this chapter equips researchers with the tools to advance AI-driven protein binding site prediction.
PMID:40728609 | DOI:10.1007/978-1-0716-4662-5_5
Multitask Learning-Based Approaches for Protein Function Prediction
Methods Mol Biol. 2025;2947:75-88. doi: 10.1007/978-1-0716-4662-5_4.
ABSTRACT
Advancements in sequencing technologies have resulted in a massive growth in the number of sequences available, yet only a small fraction of the proteins in UniProtKB have been functionally annotated. Understanding the roles and studying the mechanisms of newly discovered proteins is one of the most important biological problems in the post-genomic era. To address the sequence-function gap, many computational methods have been developed. This chapter reviews Multitask Learning (MTL)-based approaches for protein function prediction, highlighting their potential to enhance both predictive accuracy and computational efficiency in bioinformatics. The key finding is that MTL, by using shared representations to leverage common information across related tasks, improves predictive performance.
PMID:40728608 | DOI:10.1007/978-1-0716-4662-5_4
MCST-AFN: A Multichannel Spatiotemporal Feature Adaptive Fusion Network Framework Based on a Low-Fidelity Molecular Dynamics Model
ACS Omega. 2025 Jul 11;10(28):30232-30249. doi: 10.1021/acsomega.5c01443. eCollection 2025 Jul 22.
ABSTRACT
The capability of predicting molecular properties plays a crucial role in drug development, and the learning of molecular representations stands as the primary step in tasks aimed at predicting molecular properties. Static three-dimensional (3D) structural information has been shown to significantly aid in molecular representation; however, molecules are in constant motion and change, implying that their properties should be closely linked with dynamic molecular conformations. Traditional four-dimensional (4D) Quantitative Structure-Property Relationship (QSPR) methods, while incorporating time as a dimension, have high computational costs and fail to fully integrate the temporal dimension, leading to ineffective integration of molecular conformation ensembles. Inspired by deep learning-based molecular dynamics (DLMD) techniques and multifidelity learning (MFL) strategies, in this work, a multichannel spatiotemporal feature adaptive fusion network framework (MCST-AFN) based on a low-fidelity molecular dynamics model is proposed. This framework integrates deep learning technology with molecular dynamics (MD) simulations, effectively enhancing molecular representation while significantly reducing computational costs. Initially, a low-fidelity molecular dynamics simulation model is trained using real molecular dynamics simulation data. Compared to existing tools such as Amber, this low-fidelity model can update atomic coordinates at a lower computational cost and output multichannel atom-level embeddings that encapsulate information across different time scales. Subsequently, an attention-based network is constructed to achieve adaptive fusion of multichannel spatiotemporal features, and a self-supervised learning task for atom masking prediction is designed to further enhance molecular representation. The MCST-AFN was tested on 13 benchmark data sets for molecular property prediction, achieving an average performance improvement of 2.10% across 12 data sets. The most significant enhancement was seen in the ESOL data set, with a performance boost of 19.70%.
PMID:40727795 | PMC:PMC12290669 | DOI:10.1021/acsomega.5c01443
Stacking Ensemble Neural Network for Chemical Safety Assessment: A Case Study of Thyroid Peroxidase and Natural Product Screening
ACS Omega. 2025 Jul 10;10(28):30450-30466. doi: 10.1021/acsomega.5c02188. eCollection 2025 Jul 22.
ABSTRACT
Stacking ensemble learning is a method to improve model generalization and robustness. Deep neural networks have demonstrated significant potential for predicting chemical properties due to their effectiveness in learning complex patterns within the chemical space. Nevertheless, an individual model may rely on a single molecular feature set that might not explicitly explain all of the relationships between drugs and targets. Integrating a stacking ensemble with deep learning (DL) and various molecular features could potentially enhance the learning process and improve the ability to capture complex relationships between molecular structures and bioactivities. Chemicals binding to thyroid peroxidase (TPO) are associated with thyroid dysfunction, highlighting the importance of assessing their potential risks to human health and the environment. In this study, we developed a novel stacking ensemble neural network model to predict TPO inhibitory activity. This model integrates convolutional neural networks, bidirectional long short-term memory networks, and attention mechanisms combined with top-performing molecular fingerprints to generate three probability features. These features were used as inputs in a meta-decision model, enhancing learning probability. The meta-model was validated through y-randomization, ensuring that the model does not produce outputs randomly. The applicability domain of this model was also assessed to affirm the reliability and trustworthiness of each prediction. The final attention-based meta-model achieved a recall of 0.55, specificity of 0.95, Matthews correlation coefficient of 0.56, area under the curve of 0.85, balanced accuracy of 0.75, and precision of 0.70. Furthermore, the developed model was generalized to other external test sets, effectively predicting TPO inhibition and identifying potentially toxic compounds from a selected Thai indigenous vegetable. These findings will contribute to the application of stacking ensemble neural networks in the toxicity screening of chemical compounds, enhancing their learning ability to capture more diverse chemical risk assessments.
PMID:40727784 | PMC:PMC12290627 | DOI:10.1021/acsomega.5c02188
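The metrics reported for the TPO meta-model are related to one another: balanced accuracy is the mean of recall and specificity, and (0.55 + 0.95) / 2 = 0.75 matches the reported value. The sketch below shows how such metrics follow from a binary confusion matrix. The function name and the counts are hypothetical, picked only so that recall and specificity reproduce 0.55 and 0.95; they are not the study's confusion matrix, so the other metrics will not match the abstract.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion counts.

    Assumes every denominator is non-zero except for MCC, where a zero
    denominator is mapped to 0.0 by convention.
    """
    recall = tp / (tp + fn)            # sensitivity, true-positive rate
    specificity = tn / (tn + fp)       # true-negative rate
    precision = tp / (tp + fp)
    balanced_accuracy = (recall + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=55, fp=5, tn=95, fn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

High specificity combined with moderate recall, as reported, describes a classifier that rarely flags inactive compounds but misses a share of true inhibitors.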
Reinforcement Learning-Based Nonlinear Model Predictive Controller for a Jacketed Reactor: A Machine Learning Concept Validation Using Jetson Orin
ACS Omega. 2025 Jul 9;10(28):30864-30878. doi: 10.1021/acsomega.5c03219. eCollection 2025 Jul 22.
ABSTRACT
In this work, the authors experimentally validate a framework that combines machine learning with Nonlinear Model Predictive Control (NMPC) to track the temperature profile in a batch reactor (BR), using an actor-critic reinforcement learning (A2CRL) methodology for dynamic weight updates. A Recurrent Neural Network (RNN)-based approach is used to model the open-loop data collected from the lab-scale batch reactor. Batch reactors are extensively utilized in industries like specialty chemicals, pharmaceuticals, and food processing because of their adaptability, especially for small-to-medium-scale production, intricate reaction dynamics, and diverse operational conditions. Thermal runaway in batch reactors remains an open problem in the process industry. The actor-critic method integrates policy optimization and value function estimates to dynamically regulate the heat produced by exothermic reactions. RNNs are employed to capture temporal dependencies in the system dynamics, enabling more accurate predictions and efficient control actions. The proposed framework is trained using open-loop experimental data and optimized to dynamically adjust the coolant flow rate, ensuring precise temperature regulation and stability. Compared to existing deep learning-based NMPC implementations, the proposed actor-critic methodology enhances NMPC controller performance by balancing prediction accuracy and real-time computational efficiency. Results demonstrate significant improvements in process efficiency, energy consumption reduction, and operational safety, validating the potential of this approach for deployment in industrial-scale batch reactor systems.
PMID:40727728 | PMC:PMC12290641 | DOI:10.1021/acsomega.5c03219
CPI-MIF: Compound-Protein Interaction Prediction with Multiview Information Fusion
ACS Omega. 2025 Jul 13;10(28):30155-30166. doi: 10.1021/acsomega.5c00113. eCollection 2025 Jul 22.
ABSTRACT
Compound-protein interaction (CPI) prediction is a critical step in the drug discovery process. Deep learning approaches have played a significant role in CPI prediction in recent years. However, existing studies often overlook the role of proteins in CPI recognition and fail to incorporate the complex interaction information between substructures. To this end, we propose a multiview information fusion model named CPI-MIF, which mines the structural information on compounds and biological information on proteins, and uses the multiview interaction module to aggregate compound and protein information from both the micro and macro views. In the micro view, CPI-MIF focuses on the mechanism of interaction between compound atoms and protein amino acids, while in the macro view, it explores the relationship between compound sequences and protein sequences, enabling the aggregation of multilevel feature information and relationship prediction. We conducted CPI prediction experiments on three real-world data sets and demonstrated that CPI-MIF outperforms existing CPI prediction methods in accuracy, AUC, and AUPR, while exhibiting strong stability on imbalanced data sets.
PMID:40727722 | PMC:PMC12290966 | DOI:10.1021/acsomega.5c00113
Glo-In-One-v2: holistic identification of glomerular cells, tissues, and lesions in human and mouse histopathology
J Med Imaging (Bellingham). 2025 Nov;12(6):061406. doi: 10.1117/1.JMI.12.6.061406. Epub 2025 Jul 28.
ABSTRACT
PURPOSE: Segmenting intraglomerular tissue and glomerular lesions traditionally depends on detailed morphological evaluations by expert nephropathologists, a labor-intensive process susceptible to interobserver variability. Our group previously developed the Glo-In-One toolkit for integrated glomerulus detection and segmentation. We extend the Glo-In-One toolkit to version 2 (Glo-In-One-v2), which adds fine-grained segmentation capabilities. We curated 14 distinct labels spanning tissue regions, cells, and lesions across 23,529 annotated glomeruli from human and mouse histopathology data. To our knowledge, this dataset is among the largest of its kind to date.
APPROACH: We present a single dynamic-head deep learning architecture for segmenting 14 classes within partially labeled images from human and mouse kidney pathology. The model was trained on data derived from 368 annotated kidney whole-slide images with five key intraglomerular tissue types and nine glomerular lesion types.
RESULTS: The glomerulus segmentation model achieved a decent performance compared with baselines and achieved a 76.5% average Dice similarity coefficient. In addition, transfer learning from rodent to human for the glomerular lesion segmentation model has enhanced the average segmentation accuracy across different types of lesions by more than 3%, as measured by Dice scores.
CONCLUSIONS: We introduce a convolutional neural network for multiclass segmentation of intraglomerular tissue and lesions. The Glo-In-One-v2 model and pretrained weights are publicly available at https://github.com/hrlblab/Glo-In-One_v2.
PMID:40727720 | PMC:PMC12303538 | DOI:10.1117/1.JMI.12.6.061406
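The Dice similarity coefficient used to report the Glo-In-One-v2 results can be sketched as follows. This is a minimal illustration on flat label sequences, not the toolkit's evaluation code; the function names, the convention for empty masks, and the toy labels are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for binary masks given as flat
    sequences of 0/1 values."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2 * intersection / total


def mean_dice(pred_labels, target_labels, classes):
    """Average the per-class Dice over the given class ids."""
    scores = []
    for cls in classes:
        pred_mask = [1 if p == cls else 0 for p in pred_labels]
        target_mask = [1 if t == cls else 0 for t in target_labels]
        scores.append(dice_coefficient(pred_mask, target_mask))
    return sum(scores) / len(scores)


# Hypothetical per-pixel class labels for a tiny image (0 = background).
predicted = [0, 1, 1, 2, 2, 0]
reference = [0, 1, 2, 2, 2, 0]
print("mean Dice:", round(mean_dice(predicted, reference, classes=[1, 2]), 3))
```

Averaging per-class scores, as in `mean_dice`, gives small structures the same weight as large ones, which matters when lesion classes differ greatly in area.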
Risk level prediction for problematic internet use: A digital health perspective
Internet Interv. 2025 Jul 21;41:100863. doi: 10.1016/j.invent.2025.100863. eCollection 2025 Sep.
ABSTRACT
Problematic Internet Usage (PIU) research has long been a topic of interest across disciplines, and numerous theoretical and empirical studies have been conducted over the past decade. This study systematically reviews the existing literature to identify key research objectives, datasets, methodologies, and applications, and to highlight important gaps and challenges. To improve understanding and detection of PIU, we designed a comprehensive machine learning pipeline that combines detailed preprocessing, feature extraction, modeling, and performance validation strategies. Systematic evaluations demonstrate that model performance is significantly improved by addressing missing values and data imbalance. In particular, we identified key predictive features such as physiological indicators, physical activity, sleep quality, and Internet usage patterns, and clearly elucidated the differences in the positive or negative impact of these key features on PIU detection at different severity levels. These results have practical implications, especially for promoting early detection and enabling tailored interventions. Ultimately, this study contributes to digital health initiatives by providing actionable insights for the development of effective Internet addiction prevention and intervention programs.
PMID:40727605 | PMC:PMC12301810 | DOI:10.1016/j.invent.2025.100863
Ubigo-X: Protein ubiquitination site prediction using ensemble learning with image-based feature representation and weighted voting
Comput Struct Biotechnol J. 2025 Jul 14;27:3137-3146. doi: 10.1016/j.csbj.2025.07.025. eCollection 2025.
ABSTRACT
Accurate ubiquitination identification is crucial in biological function analysis. We developed Ubigo-X, a novel protein ubiquitination prediction tool. Our training data, sourced from the Protein Lysine Modification Database (PLMD 3.0), comprised 53,338 ubiquitination and 71,399 non-ubiquitination sites, retained after CD-HIT and CD-HIT-2d sequence filtering. Three sub-models: Single-Type sequence-based features (Single-Type SBF), k-mer sequence-based features (Co-Type SBF), and structure-based and function-based features (S-FBF), were developed. Single-Type SBF used amino acid composition (AAC), amino acid index (AAindex), and one-hot encoding; Co-Type SBF used Single-Type SBF via k-mer encoding; and S-FBF used secondary structure, relative solvent accessibility (RSA)/absolute solvent-accessible area (ASA), and signal peptide cleavage sites. S-FBF was trained using XGBoost, while Single-Type SBF and Co-Type SBF were transformed into image-based features and trained using Resnet34. Ubigo-X was developed by combining the three models via a weighted voting strategy. Independent testing using PhosphoSitePlus data (65,421 ubiquitination and 61,222 non-ubiquitination sites) retained after filtering yielded 0.85, 0.79, and 0.58 for area under the curve (AUC), accuracy (ACC), and Matthews correlation coefficient (MCC), respectively. Further testing on imbalanced PhosphoSitePlus data (1:8 positive-to-negative sample ratio) yielded 0.94 AUC, 0.85 ACC, and 0.55 MCC. Using the GPS-Uber data, the AUC, ACC, and MCC were 0.81, 0.59, and 0.27, respectively. In conclusion, Ubigo-X outperformed existing tools in MCC (for both balanced and unbalanced data) and AUC and ACC (for balanced data), highlighting the efficacy of integrating image-based feature representation and weighted voting in ubiquitination prediction. Ubigo-X is a potential species-neutral ubiquitination site prediction tool, accessible at http://merlin.nchu.edu.tw/ubigox/.
PMID:40727425 | PMC:PMC12303043 | DOI:10.1016/j.csbj.2025.07.025
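The weighted voting step that combines the three Ubigo-X sub-models can be illustrated with a minimal sketch. The abstract does not give the weights or the decision threshold, so the values below, like the function name, are hypothetical placeholders rather than the published configuration.

```python
def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-model positive-class probabilities by weighted average.

    Returns the combined score and the resulting binary decision.
    """
    if len(probabilities) != len(weights) or not weights:
        raise ValueError("need one weight per model")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probabilities, weights)) / total_weight
    return score, score >= threshold


# Hypothetical outputs of the three sub-models for one candidate lysine site,
# in the order Single-Type SBF, Co-Type SBF, S-FBF.
sub_model_probs = [0.9, 0.4, 0.2]
# Hypothetical weights (for example, proportional to validation performance).
sub_model_weights = [2.0, 1.0, 1.0]

score, is_ubiquitination_site = weighted_vote(sub_model_probs, sub_model_weights)
print(f"combined score = {score:.2f}, predicted positive = {is_ubiquitination_site}")
```

Changing the weights can flip the decision for the same sub-model outputs, which is why the weighting scheme is a tunable part of such an ensemble.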
A Deep-Learning Approach to Detect and Classify Heavy-Duty Trucks in Satellite Images
IEEE Trans Intell Transp Syst. 2024 Oct;25(10):13323-13338. doi: 10.1109/tits.2024.3431452. Epub 2024 Aug 29.
ABSTRACT
Heavy-duty trucks serve as the backbone of the supply chain and have a tremendous effect on the economy. However, they severely impact the environment and public health. This study presents a novel truck detection framework by combining satellite imagery with Geographic Information System (GIS)-based OpenStreetMap data to capture the distribution of heavy-duty trucks and shipping containers in both on-road and off-road locations with extensive spatial coverage. The framework involves modifying the CenterNet detection algorithm to detect randomly oriented trucks in satellite images and enhancing the model through ensembling with Mask RCNN, a segmentation-based algorithm. GIS information refines and improves the model's prediction results. Applied to part of Southern California, including the Port of Los Angeles and Long Beach, the framework helps assess the environmental impact of heavy-duty trucks in port-adjacent communities and understand truck density patterns along major freight corridors. This research has implications for policy, practice, and future research.
PMID:40727422 | PMC:PMC12302943 | DOI:10.1109/tits.2024.3431452