Deep learning
Heterogeneous virus classification using a functional deep learning model based on transmission electron microscopy images
Sci Rep. 2024 Nov 22;14(1):28954. doi: 10.1038/s41598-024-80013-0.
ABSTRACT
Viruses are submicroscopic agents that can infect other lifeforms and use their hosts' cells to replicate themselves. Despite having some of the simplest genetic structures among all living beings, viruses are highly adaptable, resilient, and capable of causing severe complications in their hosts' bodies. Due to their multiple transmission pathways, high contagion rate, and lethality, viruses pose the biggest biological threat that both animal and plant species face. It is often challenging to promptly detect a virus in a host and accurately determine its type using manual examination techniques. However, computer-based automatic diagnosis methods, especially the ones using Transmission Electron Microscopy (TEM) images, have proven effective in instant virus identification. Using TEM images collected from a recent dataset, this article proposes a deep learning-based classification model to identify the virus type within those images. The methodology of this study includes two coherent image processing techniques to reduce the noise present in raw microscopy images and a functional Convolutional Neural Network (CNN) model for classification. Experimental results show that the model can differentiate among 14 types of viruses with a maximum of 97.44% classification accuracy and F1-score, which supports the effectiveness and reliability of the proposed method. Implementing this scheme will provide fast and dependable virus identification that complements more thorough diagnostic procedures.
PMID:39578636 | DOI:10.1038/s41598-024-80013-0
CelloType: a unified model for segmentation and classification of tissue images
Nat Methods. 2024 Nov 22. doi: 10.1038/s41592-024-02513-1. Online ahead of print.
ABSTRACT
Cell segmentation and classification are critical tasks in spatial omics data analysis. Here we introduce CelloType, an end-to-end model designed for cell segmentation and classification for image-based spatial omics data. Unlike the traditional two-stage approach of segmentation followed by classification, CelloType adopts a multitask learning strategy that integrates these tasks, simultaneously enhancing the performance of both. CelloType leverages transformer-based deep learning techniques for improved accuracy in object detection, segmentation and classification. It outperforms existing segmentation methods on a variety of multiplexed fluorescence and spatial transcriptomic images. In terms of cell type classification, CelloType surpasses a model composed of state-of-the-art methods for individual tasks and a high-performance instance segmentation model. Using multiplexed tissue images, we further demonstrate the utility of CelloType for multiscale segmentation and classification of both cellular and noncellular elements in a tissue. The enhanced accuracy and multitask learning ability of CelloType facilitate automated annotation of rapidly growing spatial omics data.
PMID:39578628 | DOI:10.1038/s41592-024-02513-1
Attention-based multi-residual network for lung segmentation in diseased lungs with custom data augmentation
Sci Rep. 2024 Nov 22;14(1):28983. doi: 10.1038/s41598-024-79494-w.
ABSTRACT
Lung disease analysis in chest X-rays (CXR) using deep learning presents significant challenges due to the wide variation in lung appearance caused by disease progression and differing X-ray settings. While deep learning models have shown remarkable success in segmenting lungs from CXR images with normal or mildly abnormal findings, their performance declines when faced with complex structures, such as pulmonary opacifications. In this study, we propose AMRU++, an attention-based multi-residual UNet++ network designed for robust and accurate lung segmentation in CXR images with both normal and severe abnormalities. The model incorporates attention modules to capture relevant spatial information and multi-residual blocks to extract rich contextual and discriminative features of lung regions. To further enhance segmentation performance, we introduce a data augmentation technique that simulates the features and characteristics of CXR pathologies, addressing the issue of limited annotated data. Extensive experiments on public and private datasets comprising 350 cases of pneumoconiosis, COVID-19, and tuberculosis validate the effectiveness of our proposed framework and data augmentation technique.
PMID:39578613 | DOI:10.1038/s41598-024-79494-w
HDBind: encoding of molecular structure with hyperdimensional binary representations
Sci Rep. 2024 Nov 23;14(1):29025. doi: 10.1038/s41598-024-80009-w.
ABSTRACT
Traditional methods for identifying "hit" molecules from a large collection of potential drug-like candidates rely on biophysical theory to compute approximations to the Gibbs free energy of the binding interaction between the drug and its protein target. These approaches have a significant limitation in that they require exceptional computing capabilities for even relatively small collections of molecules. Increasingly large and complex state-of-the-art deep learning approaches have gained popularity with the promise to improve the productivity of drug design, notorious for its numerous failures. However, as deep learning models increase in their size and complexity, their acceleration at the hardware level becomes more challenging. Hyperdimensional Computing (HDC) has recently gained attention in the computer hardware community due to its algorithmic simplicity relative to deep learning approaches. The HDC learning paradigm, which represents data with high-dimension binary vectors, allows the use of low-precision binary vector arithmetic to create models of the data that can be learned without the need for the gradient-based optimization required in many conventional machine learning and deep learning methods. This algorithmic simplicity allows for acceleration in hardware that has been previously demonstrated in a range of application areas (computer vision, bioinformatics, mass spectrometry, remote sensing, edge devices, etc.). To the best of our knowledge, our work is the first to consider HDC for the task of fast and efficient screening of modern drug-like compound libraries. We also propose the first HDC graph-based encoding methods for molecular data, demonstrating consistent and substantial improvement over previous work. We compare our approaches to alternative approaches on the well-studied MoleculeNet dataset and the recently proposed LIT-PCBA dataset derived from high quality PubChem assays.
We demonstrate our methods on multiple target hardware platforms, including Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), showing at least an order of magnitude improvement in energy efficiency versus even our smallest neural network baseline model with a single hidden layer. Our work thus motivates further investigation into molecular representation learning to develop ultra-efficient pre-screening tools. We make our code publicly available at https://github.com/LLNL/hdbind .
PMID:39578580 | DOI:10.1038/s41598-024-80009-w
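To make the HDC idea in the abstract above concrete, the following is a minimal sketch of a binary hyperdimensional classifier trained without gradients. It is not the HDBind implementation: the paper proposes graph-based molecular encodings, whereas this sketch assumes a much simpler position-binding encoding over toy character strings. The dimensionality `D`, the symbols, the labels "hit" and "miss", and every function name are hypothetical choices made for illustration.

```python
import random

D = 2048  # hypervector length; an assumed value, not taken from the paper


def random_hv(rng):
    """Draw a random binary hypervector of length D."""
    return [rng.randint(0, 1) for _ in range(D)]


def bind(a, b):
    """Bind two hypervectors with element-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(hvs):
    """Bundle hypervectors by element-wise majority vote (ties resolve to 0)."""
    n = len(hvs)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*hvs)]


def similarity(a, b):
    """Fraction of positions where two hypervectors agree (1 - normalized Hamming distance)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def encode(tokens, items, positions):
    """Toy encoding: bind each symbol to its position, then bundle the results."""
    return bundle([bind(items[t], positions[i]) for i, t in enumerate(tokens)])


def train(examples, items, positions):
    """Build one prototype per class by bundling its encoded examples (no gradients)."""
    grouped = {}
    for tokens, label in examples:
        grouped.setdefault(label, []).append(encode(tokens, items, positions))
    return {label: bundle(hvs) for label, hvs in grouped.items()}


def predict(tokens, prototypes, items, positions):
    """Return the label of the prototype most similar to the encoded query."""
    query = encode(tokens, items, positions)
    return max(prototypes, key=lambda label: similarity(query, prototypes[label]))


# Hypothetical toy data: three-character strings standing in for molecules.
rng = random.Random(0)
items = {symbol: random_hv(rng) for symbol in "CNOSF"}
positions = [random_hv(rng) for _ in range(3)]
examples = [("CCO", "hit"), ("CCN", "hit"), ("CCC", "hit"),
            ("SSF", "miss"), ("SFF", "miss"), ("FSF", "miss")]
prototypes = train(examples, items, positions)
print(predict("CCS", prototypes, items, positions))
```

Training here is only XOR, counting, and thresholding, which is the kind of low-precision binary arithmetic the abstract credits for making hardware acceleration straightforward.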
The development of an attention mechanism enhanced deep learning model and its application for body composition assessment with L3 CT images
Sci Rep. 2024 Nov 22;14(1):28953. doi: 10.1038/s41598-024-79915-w.
ABSTRACT
Body composition assessment is very useful for evaluating a patient's status in the clinic, but recognizing, labeling, and calculating the body compositions is burdensome. This study aims to develop a web-based service that automates calculating the areas of skeletal muscle (SM), visceral adipose tissue (VAT), and subcutaneous adipose tissue (SAT) from L3 computed tomography (CT) images. A total of 1500 L3 CT images were gathered from Xuzhou Central Hospital. Of these, 70% were used as the training dataset, while the remaining 30% were used as the validation dataset. The UNet framework was combined with attention gate (AG), Squeeze and Excitation block (SEblock), and Atrous Spatial Pyramid Pooling (ASPP) modules to construct the segmentation deep learning model. The model's efficacy was externally validated using two other test datasets with multiple metrics, the consistency test and manual result checking. A graphical user interface was also created and deployed using the Streamlit Python package. The custom deep learning model named L3 Body Composition Segmentation Model (L3BCSM) was constructed. The model's median Dice is 0.954 (0.930, 0.963) (SATA), 0.849 (0.774, 0.901) (VATA), and 0.920 (0.901, 0.936) (SMA), which is equal to or better than classic models, including UNETR and AHNet. L3BCSM also achieved satisfactory metrics in two external test datasets, consistent with the qualified label. An internet-based application was developed using L3BCSM, which has four functional modules: population analysis, time series analysis, consistency analysis, and manual result checking. The body composition assessment application was well developed and should benefit clinical practice and related research.
PMID:39578556 | DOI:10.1038/s41598-024-79915-w
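The Dice score reported in the abstract above measures the overlap between a predicted mask and a reference mask. The sketch below shows the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), on binary masks stored as nested lists, plus a median over per-image scores. The masks and the score list are made-up illustrations, not data from the study, and the function name is an assumption.

```python
import statistics


def dice_coefficient(pred, truth):
    """Dice overlap of two binary masks given as equally shaped nested lists of 0/1."""
    intersection = sum(
        p & t for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Two empty masks agree perfectly; return 1.0 by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 2x3 masks: two pixels overlap, each mask has three foreground pixels.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)

# A median over per-image scores, as a summary statistic (illustrative values).
per_image_scores = [0.91, 0.95, 0.88, 0.97, 0.93]
median_score = statistics.median(per_image_scores)
```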
A multi-perspective deep learning framework for enhancer characterization and identification
Comput Biol Chem. 2024 Nov 19;114:108284. doi: 10.1016/j.compbiolchem.2024.108284. Online ahead of print.
ABSTRACT
Enhancers are vital elements in the genome that boost the transcriptional activity of neighboring genes and are essential in regulating cell-specific gene expression. Therefore, accurately identifying and characterizing enhancers is essential for comprehending gene regulatory networks and the development of related diseases. This study introduces MPDL-Enhancer, a novel multi-perspective deep learning framework aimed at enhancer characterization and identification. In this study, enhancer sequences are encoded using the dna2vec model along with features derived from the structural properties of DNA sequences. Subsequently, these representations are processed through a novel dual-scale deep neural network designed to discern subtle correlations and extended interactions embedded within the semantic content of DNA. The predictive phase of our methodology employs a Support Vector Machine classifier to render the final classification. To rigorously assess the efficacy of our approach, a comprehensive evaluation was executed utilizing an independent test dataset, thereby substantiating the robustness and accuracy of our model. Our methodology demonstrated superior performance over existing computational techniques, with an accuracy (ACC) of 81.00%, a sensitivity (SN) of 79.00%, and specificity (SP) of 83.00%. The innovative dual-scale deep neural network and the unique feature representation strategy contributed to this performance improvement. MPDL-Enhancer has effectively characterized enhancer sequences and achieved excellent predictive performance. Building upon this foundation, we conducted an interpretability analysis of the model, which can assist researchers in identifying key features and patterns that affect the functionality of enhancers, thereby promoting a deeper understanding of gene regulatory networks.
PMID:39577030 | DOI:10.1016/j.compbiolchem.2024.108284
Improved Prediction of Ligand-Protein Binding Affinities by Meta-modeling
J Chem Inf Model. 2024 Nov 22. doi: 10.1021/acs.jcim.4c01116. Online ahead of print.
ABSTRACT
The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Many computational models for binding affinity prediction have been developed, but with varying results across targets. Given that ensembling or meta-modeling approaches have shown great promise in reducing model-specific biases, we develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual base models, training databases, and several meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over base models. Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on 3D structures while allowing for improved database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. We further demonstrate improved generalization capability by our models using a large-scale benchmark of affinity prediction as well as a virtual screening application benchmark. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain meaningful improvement in binding affinity prediction.
PMID:39576762 | DOI:10.1021/acs.jcim.4c01116
Expert-guided protein Language Models enable accurate and blazingly fast fitness prediction
Bioinformatics. 2024 Nov 22:btae621. doi: 10.1093/bioinformatics/btae621. Online ahead of print.
ABSTRACT
MOTIVATION: Exhaustive experimental annotation of the effect of all known protein variants remains daunting and expensive, stressing the need for scalable effect predictions. We introduce VespaG, a blazingly fast missense amino acid variant effect predictor, leveraging protein Language Model (pLM) embeddings as input to a minimal deep learning model.
RESULTS: To overcome the sparsity of experimental training data, we created a dataset of 39 million single amino acid variants from the human proteome applying the multiple sequence alignment-based effect predictor GEMME as a pseudo standard-of-truth. This setup increases interpretability compared to the baseline pLM and is easily retrainable with novel or updated pLMs. Assessed against the ProteinGym benchmark (217 multiplex assays of variant effect, MAVE, with 2.5 million variants), VespaG achieved a mean Spearman correlation of 0.48 ± 0.02, matching top-performing methods evaluated on the same data. VespaG has the advantage of being orders of magnitude faster, predicting all mutational landscapes of all proteins in proteomes such as Homo sapiens or Drosophila melanogaster in under 30 minutes on a consumer laptop (12-core CPU, 16 GB RAM).
AVAILABILITY: VespaG is available freely at https://github.com/jschlensok/vespag. The associated training data and predictions are available at https://doi.org/10.5281/zenodo.11085958.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PMID:39576695 | DOI:10.1093/bioinformatics/btae621
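The Spearman correlation used to score predictors in the abstract above is the Pearson correlation of the ranks of two lists, so it rewards getting the ordering of variant effects right rather than their absolute values. The sketch below is a plain-Python version with average ranks for ties. It is not VespaG or ProteinGym code, and the measured and predicted values are invented for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long lists with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical assay measurements and predictor scores for four variants.
measured = [0.10, 0.40, 0.35, 0.80]
predicted = [0.20, 0.30, 0.50, 0.90]
rho = spearman(measured, predicted)
```

For a benchmark made of many assays, one such coefficient would be computed per assay and then averaged, which is one plausible reading of the "mean Spearman correlation" in the abstract.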
Computed tomography-based radiomics and body composition model for predicting hepatic decompensation
Oncotarget. 2024 Nov 22;15:809-813. doi: 10.18632/oncotarget.28673.
ABSTRACT
Primary sclerosing cholangitis (PSC) is a chronic liver disease characterized by inflammation and scarring of the bile ducts, which can lead to cirrhosis and hepatic decompensation. The study aimed to explore the potential value of computational radiomics, a field that extracts quantitative features from medical images, in predicting whether or not PSC patients had hepatic decompensation. We used an in-house developed deep learning model called the body composition model, which quantifies body composition from computed tomography (CT) into four compartments: subcutaneous adipose tissue (SAT), skeletal muscle (SKM), visceral adipose tissue (VAT), and intermuscular adipose tissue (IMAT). We extracted radiomics features from all four body composition compartments and used them to build a predictive model in the training cohort. The predictive model demonstrated good performance in validation cohorts for predicting hepatic decompensation, with an accuracy score of 0.97, a precision score of 1.0, and an area under the curve (AUC) score of 0.97. Computational radiomics using CT images shows promise in predicting hepatic decompensation in primary sclerosing cholangitis patients. Our model achieved high accuracy, but predicting future events remains challenging. Further research is needed to validate clinical utility and limitations.
PMID:39576671 | DOI:10.18632/oncotarget.28673
MCNN-AAPT: accurate classification and functional prediction of amino acid and peptide transporters in secondary active transporters using protein language models and multi-window deep learning
J Biomol Struct Dyn. 2024 Nov 22:1-10. doi: 10.1080/07391102.2024.2431664. Online ahead of print.
ABSTRACT
Secondary active transporters play a crucial role in cellular physiology by facilitating the movement of molecules across cell membranes. Identifying the functional classes of these transporters, particularly amino acid and peptide transporters, is essential for understanding their involvement in various physiological processes and disease pathways, including cancer. This study aims to develop a robust computational framework that integrates pre-trained protein language models and deep learning techniques to classify amino acid and peptide transporters within the secondary active transporter (SAT) family and predict their functional association with solute carrier (SLC) proteins. The study leverages a comprehensive dataset of 448 secondary active transporters, including 36 solute carrier proteins, obtained from UniProt and the Transporter Classification Database (TCDB). Three state-of-the-art protein language models, ProtTrans, ESM-1b, and ESM-2, are evaluated within a deep learning neural network architecture that employs a multi-window scanning technique to capture local and global sequence patterns. The ProtTrans-based feature set demonstrates exceptional performance, achieving a classification accuracy of 98.21% with 87.32% sensitivity and 99.76% specificity for distinguishing amino acid and peptide transporters from other SATs. Furthermore, the model maintains strong predictive ability for SLC proteins, with an overall accuracy of 88.89% and a Matthews Correlation Coefficient (MCC) of 0.7750. This study showcases the power of integrating pre-trained protein language models and deep learning techniques for the functional classification of secondary active transporters and the prediction of associated solute carrier proteins. The findings have significant implications for drug development, disease research, and the broader understanding of cellular transport mechanisms.
PMID:39576667 | DOI:10.1080/07391102.2024.2431664
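The abstract above reports accuracy, sensitivity, specificity, and the Matthews Correlation Coefficient (MCC). All four follow from the entries of a binary confusion matrix, as the sketch below shows using the standard definitions. The counts are hypothetical and are not taken from the study.

```python
import math


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),   # true positive rate
        "specificity": safe_div(tn, tn + fp),   # true negative rate
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts: 10 positives (8 found), 10 negatives (9 correctly rejected).
metrics = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

When negatives greatly outnumber positives, accuracy is dominated by specificity, which is why a classifier can show very high accuracy alongside a noticeably lower sensitivity. MCC uses all four counts and is less affected by that imbalance.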
Predicting mortality in hospitalized influenza patients: integration of deep learning-based chest X-ray severity score (FluDeep-XR) and clinical variables
J Am Med Inform Assoc. 2024 Nov 22:ocae286. doi: 10.1093/jamia/ocae286. Online ahead of print.
ABSTRACT
OBJECTIVES: To pioneer the first artificial intelligence system integrating radiological and objective clinical data, simulating the clinical reasoning process, for the early prediction of high-risk influenza patients.
MATERIALS AND METHODS: Our system was developed using a cohort from National Taiwan University Hospital in Taiwan, with external validation data from ASST Grande Ospedale Metropolitano Niguarda in Italy. Convolutional neural networks pretrained on ImageNet were regressively trained using a 5-point scale to develop the influenza chest X-ray (CXR) severity scoring model, FluDeep-XR. Early, late, and joint fusion structures, incorporating varying weights of CXR severity with clinical data, were designed to predict 30-day mortality and compared with models using only CXR or clinical data. The best-performing model was designated as FluDeep. The explainability of FluDeep-XR and FluDeep was illustrated through activation maps and SHapley Additive exPlanations (SHAP).
RESULTS: The Xception-based model, FluDeep-XR, achieved a mean square error of 0.738 in the external validation dataset. The Random Forest-based late fusion model, FluDeep, outperformed all the other models, achieving an area under the receiver operating characteristic curve of 0.818 and a sensitivity of 0.706 in the external dataset. Activation maps highlighted clear lung fields. Shapley additive explanations identified age, C-reactive protein, hematocrit, heart rate, and respiratory rate as the top 5 important clinical features.
DISCUSSION: The integration of medical imaging with objective clinical data outperformed single-modality models in predicting 30-day mortality in influenza patients. We ensured that the explainability of our models aligned with clinical knowledge and validated their applicability across foreign institutions.
CONCLUSION: FluDeep highlights the potential of combining radiological and clinical information in a late fusion design, enhancing diagnostic accuracy and offering an explainable and generalizable decision support system.
PMID:39576664 | DOI:10.1093/jamia/ocae286
LMPTMSite: A Platform for PTM Site Prediction in Proteins Leveraging Transformer-Based Protein Language Models
Methods Mol Biol. 2025;2867:261-297. doi: 10.1007/978-1-0716-4196-5_16.
ABSTRACT
Protein post-translational modifications (PTMs) introduce new functionalities and play a critical role in the regulation of protein functions. Characterizing these modifications, especially PTM sites, is essential for unraveling complex biological systems. However, traditional experimental approaches, such as mass spectrometry, are time-consuming and expensive. Machine learning and deep learning techniques offer promising alternatives for predicting PTM sites. In this chapter, we introduce our LMPTMSite (language model-based post-translational modification site predictor) platform, which emphasizes two transformer-based protein language model (pLM) approaches: pLMSNOSite and LMSuccSite, for the prediction of S-nitrosylation sites and succinylation sites in proteins, respectively. We highlight the various methods of using pLM-based sequence encoding, explain the underlying deep learning architectures, and discuss the superior efficacy of these tools compared to other state-of-the-art tools. Subsequently, we present an analysis of runtime and memory usage for pLMSNOSite, with a focus on CPU and RAM usage as the input sequence length is scaled up. Finally, we showcase a case study predicting succinylation sites in proteins active within the tricarboxylic acid (TCA) cycle pathway using LMSuccSite, demonstrating its potential utility and efficiency in real-world biological contexts. The LMPTMSite platform, inclusive of pLMSNOSite and LMSuccSite, is freely available both as a web server ( http://kcdukkalab.org/pLMSNOSite/ and http://kcdukkalab.org/LMSuccSite/ ) and as standalone packages ( https://github.com/KCLabMTU/pLMSNOSite and https://github.com/KCLabMTU/LMSuccSite ), providing valuable tools for researchers in the field.
PMID:39576587 | DOI:10.1007/978-1-0716-4196-5_16
Accurate and Fast Prediction of Intrinsic Disorder Using flDPnn
Methods Mol Biol. 2025;2867:201-218. doi: 10.1007/978-1-0716-4196-5_12.
ABSTRACT
Intrinsically disordered proteins (IDPs) that include one or more intrinsically disordered regions (IDRs) are abundant across all domains of life and viruses and play numerous functional roles in various cellular processes. Due to a relatively low throughput and high cost of experimental techniques for identifying IDRs, there is a growing need for fast and accurate computational algorithms that accurately predict IDRs/IDPs from protein sequences. We describe one of the leading disorder predictors, flDPnn. Results from a recent community-organized Critical Assessment of Intrinsic Disorder (CAID) experiment show that flDPnn provides fast and state-of-the-art predictions of disorder, which are supplemented with the predictions of several major disorder functions. This chapter provides a practical guide to flDPnn, which includes a brief explanation of its predictive model, descriptions of its web server and standalone versions, and a case study that showcases how to read and understand flDPnn's predictions.
PMID:39576583 | DOI:10.1007/978-1-0716-4196-5_12
Protein Secondary Structure and DNA/RNA Detection for Cryo-EM and Cryo-ET Using Emap2sec and Emap2sec+
Methods Mol Biol. 2025;2867:105-120. doi: 10.1007/978-1-0716-4196-5_6.
ABSTRACT
Cryo-electron microscopy (cryo-EM) has become a powerful tool for determining the structures of macromolecules, such as proteins and DNA/RNA complexes. While high-resolution cryo-EM maps are increasingly available, there is still a substantial number of maps determined at intermediate or low resolution. These maps present challenges when it comes to extracting structural information. In response to this, two computational methods, Emap2sec and Emap2sec+, have been developed by our group to address these challenges and benefit the analysis of cryo-EM maps. In this chapter, we describe how to use the web servers of two of our structure analysis programs for cryo-EM, Emap2sec and Emap2sec+. Both methods identify local structures in medium-resolution EM maps of 5-10 Å to help find and fit protein and DNA/RNA structures in EM maps. Emap2sec identifies the secondary structures of proteins, while Emap2sec+ also identifies DNA/RNA locations in cryo-EM maps. As cryo-electron tomography (cryo-ET) has started to produce data of this resolution, these methods would be useful for cryo-ET, too. Both methods are available in the form of web servers and source code at https://kiharalab.org/emsuites/ .
PMID:39576577 | DOI:10.1007/978-1-0716-4196-5_6
Machine Learning Techniques to Infer Protein Structure and Function from Sequences: A Comprehensive Review
Methods Mol Biol. 2025;2867:79-104. doi: 10.1007/978-1-0716-4196-5_5.
ABSTRACT
The elucidation of protein structure and function plays a pivotal role in understanding biological processes and facilitating drug discovery. With the exponential growth of protein sequence data, machine learning techniques have emerged as powerful tools for predicting protein characteristics from sequences alone. This review provides a comprehensive overview of the importance and application of machine learning in inferring protein structure and function. We discuss various machine learning approaches, primarily focusing on convolutional neural networks and natural language processing, and their utilization in predicting protein secondary and tertiary structures, residue-residue contacts, protein function, and subcellular localization. Furthermore, we highlight the challenges associated with using machine learning techniques in this context, such as the availability of high-quality training datasets and the interpretability of models. We also delve into the latest progress in the field concerning the advancements made in the development of intricate deep learning architectures. Overall, this review underscores the significance of machine learning in advancing our understanding of protein structure and function, and its potential to revolutionize drug discovery and personalized medicine.
PMID:39576576 | DOI:10.1007/978-1-0716-4196-5_5
The Iconic α-Helix: From Pauling to the Present
Methods Mol Biol. 2025;2867:1-17. doi: 10.1007/978-1-0716-4196-5_1.
ABSTRACT
The protein folding problem dates back to Pauling's insights almost a century ago, but the first venture into actual protein structure was the Pauling-Corey-Branson α-helix in 1951, a proposed model that was confirmed almost immediately using X-ray crystallography. Many subsequent efforts to predict protein helices from the amino acid sequence met with only partial success, as discussed here. Surprisingly, in 2021, these efforts were superseded by deep-learning artificial intelligence, especially AlphaFold2, a machine learning program based on neural nets. This approach can predict most protein structures successfully at or near atomic resolution. Deservedly, deep-learning artificial intelligence was named Science magazine's 2021 "breakthrough of the year." Today, ~200 million predicted protein structures can be downloaded from the AlphaFold2 Protein Structure Database. Deep learning represents a deep conundrum because these successfully predicted macromolecular structures are based on methods that are completely devoid of a hypothesis or of any physical chemistry. Perhaps we are now poised to transcend five centuries of reductive science.
PMID:39576572 | DOI:10.1007/978-1-0716-4196-5_1
Artificial intelligence improves risk prediction in cardiovascular disease
Geroscience. 2024 Nov 22. doi: 10.1007/s11357-024-01438-z. Online ahead of print.
ABSTRACT
Cardiovascular disease (CVD) represents a major public health issue, claiming numerous lives. This study aimed to demonstrate the advantages of employing artificial intelligence (AI) models to improve the prediction of CVD risk using a large cohort of relatively healthy adults aged 70 years or more. In this study, deep learning (DL) models provide enhanced predictions (DeepSurv: C-index = 0.662, Integrated Brier Score (IBS) = 0.046; Neural Multi-Task Logistic Regression (NMTLR): C-index = 0.660, IBS = 0.047), as compared to the conventional (Cox: C-index = 0.634, IBS = 0.048) and machine learning (Random Survival Forest (RSF): C-index = 0.641, IBS = 0.048) models. The risk scores generated by the DL models also demonstrated superior performance. Moreover, AI models (NMTLR, DeepSurv, and RSF) were more effective, requiring the treatment of only 9 to 10 patients to prevent one CVD event, compared to the conventional model, which required treating nearly four times as many patients (number needed to treat, NNT = 38). In summary, AI models, particularly DL models, possess superior predictive capabilities that can enhance patient treatment in a more cost-effective manner. Nonetheless, AI tools should serve to complement and assist healthcare professionals, rather than supplant them. The DeepSurv model, selected due to its relatively superior performance, is deployed as a locally run web application and is accessible on GitHub ( https://github.com/Robidar/Chuchu_Depl ). Finally, as we have demonstrated the benefit of using AI for reassessment of an existing CVD risk score, we recommend other infamous risk scores undergo similar reassessment.
PMID:39576563 | DOI:10.1007/s11357-024-01438-z
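The C-index values quoted in the abstract above summarize how well a model's risk scores order patients by time to event. The sketch below implements Harrell's concordance index for right-censored data in its usual pairwise form: a pair is comparable when the patient with the shorter follow-up time had an observed event, and it is concordant when that patient also has the higher predicted risk. This is a generic illustration with invented follow-up data, not the study's evaluation code, and the function name is an assumption.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ordered correctly by risk.

    times  -- follow-up time per subject
    events -- 1 if the event was observed, 0 if the subject was censored
    risks  -- predicted risk score per subject (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four subjects; the third one is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the differences between 0.634 and 0.662 in the abstract into context.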
Impact of Alignments on the Accuracy of Protein Subcellular Localization Predictions
Proteins. 2024 Nov 22. doi: 10.1002/prot.26767. Online ahead of print.
ABSTRACT
Alignments in bioinformatics refer to the arrangement of sequences to identify regions of similarity that can indicate functional, structural, or evolutionary relationships. They are crucial for bioinformaticians as they enable accurate predictions and analyses in various applications, including protein subcellular localization. The predictive model used in this article is based on a deep convolutional architecture. We tested configurations of Deep N-to-1 convolutional neural networks of various depths and widths to find better-performing values across a diverse set of eight classes. For the assessment without alignments, sequences are encoded using one-hot encoding, converting each character into a numerical representation, which is straightforward for non-numerical data and useful for machine learning models. For the assessment with alignments, multiple sequence alignments (MSAs) are created using PSI-BLAST, capturing evolutionary information by calculating frequencies of residues and gaps. The average difference in peak performance between models with alignments and without alignments is approximately 15.82%. The average difference in the highest accuracy achieved with alignments compared with without alignments is approximately 15.16%. Thus, extensive experimentation indicates that higher alignment accuracy implies a more reliable model and improved prediction accuracy, which can be trusted to deliver consistent performance across different layers and classes of subcellular localization predictions. This research provides valuable insights into prediction accuracies with and without alignments, offering bioinformaticians an effective tool for better understanding while potentially reducing the need for extensive experimental validations. The source code and datasets are available at http://distilldeep.ucd.ie/SCL8/.
PMID:39575640 | DOI:10.1002/prot.26767
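The two input encodings named in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `one_hot` and `msa_profile`, the 20-letter alphabet ordering, and the toy alignment are assumptions made for illustration, and the alignment rows are taken as already produced by some tool (the abstract uses PSI-BLAST, which is not called here).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed ordering)
GAP = "-"

def one_hot(sequence):
    """Encode a sequence as one 20-dimensional indicator vector per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:  # unknown symbols (e.g. X) stay all-zero
            row[index[residue]] = 1.0
        rows.append(row)
    return rows

def msa_profile(aligned_rows):
    """Per-column frequencies of the 20 residues plus the gap symbol."""
    alphabet = AMINO_ACIDS + GAP
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    length = len(aligned_rows[0])
    if any(len(row) != length for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for col in range(length):
        counts = [0.0] * len(alphabet)
        for row in aligned_rows:
            symbol = row[col].upper()
            if symbol in index:  # symbols outside the alphabet are ignored
                counts[index[symbol]] += 1.0
        profile.append([c / len(aligned_rows) for c in counts])
    return profile

# Toy usage: one query sequence and a hypothetical four-row alignment.
query_encoding = one_hot("ACG")
alignment_profile = msa_profile(["AC-", "AC-", "AG-", "TCA"])
```

The one-hot matrix carries only the identity of each residue, whereas each profile column summarizes how often every residue or a gap occurs at that position across the aligned sequences, which is the evolutionary information the abstract refers to.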
A self-driven ESN-DSS approach for effective COVID-19 time series prediction and modelling
Epidemiol Infect. 2024 Nov 22;152:e146. doi: 10.1017/S0950268824000992.
ABSTRACT
Since its outbreak, the COVID-19 epidemic has posed a great crisis to the health and economy of the world. The objective of this study is to provide a simple deep-learning approach for predicting, modelling, and evaluating the time evolution of the COVID-19 epidemic. The Dove Swarm Search (DSS) algorithm is integrated with the echo state network (ESN) to optimize its weights. The ESN-DSS model is constructed to predict the evolution of the COVID-19 time series. Specifically, the self-driven ESN-DSS forms a closed feedback loop by replacing the input with the output. The prediction results, which cover COVID-19 temporal evolutions in multiple countries worldwide, indicate that our model performs excellently compared with several artificial intelligence prediction methods from the literature (e.g., recurrent neural network, long short-term memory, gated recurrent units, variational autoencoder) at the same time scale. Moreover, the model parameters of the self-driven ESN-DSS, which have a significant impact on prediction performance, are determined, and the network parameters are adjusted accordingly to improve the prediction accuracy. The prediction results can serve as proposals to help governments and medical institutions formulate pertinent precautionary measures to prevent further spread. In addition, this approach is not limited to COVID-19 time series forecasting but is also applicable to other nonlinear time series prediction problems.
PMID:39575546 | DOI:10.1017/S0950268824000992
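The closed feedback loop described in the abstract above, in which each output replaces the next input, can be sketched with a small echo state network. This is a rough illustration under stated assumptions rather than the authors' ESN-DSS: the Dove Swarm Search step is not implemented, and plain ridge regression stands in for it when fitting the readout. All function names, the reservoir size, the row-sum scaling, and the sine-wave toy series are hypothetical.

```python
import math
import random

def make_esn(size=20, seed=0, scale=0.9):
    """Random input and reservoir weights; each reservoir row is rescaled to an
    absolute sum of `scale` (< 1), which bounds the spectral radius by `scale`."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(size)]
    w = []
    for _ in range(size):
        row = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        norm = sum(abs(v) for v in row)
        w.append([scale * v / norm for v in row])
    return w_in, w

def step(state, u, w_in, w):
    """One reservoir update driven by the scalar input u."""
    return [math.tanh(w_in[i] * u + sum(a * s for a, s in zip(w[i], state)))
            for i in range(len(state))]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def train_readout(series, w_in, w, ridge=1e-4, washout=10):
    """Drive the reservoir with the true series, then fit a ridge-regression
    readout mapping each state to the next value (stand-in for DSS)."""
    size = len(w_in)
    state = [0.0] * size
    xtx = [[ridge if i == j else 0.0 for j in range(size)] for i in range(size)]
    xty = [0.0] * size
    for t in range(len(series) - 1):
        state = step(state, series[t], w_in, w)
        if t < washout:  # discard the initial transient
            continue
        for i in range(size):
            xty[i] += state[i] * series[t + 1]
            for j in range(size):
                xtx[i][j] += state[i] * state[j]
    return solve(xtx, xty), state

def free_run(w_out, state, last_value, steps, w_in, w):
    """Self-driven mode: every prediction is fed back as the next input."""
    u, outputs = last_value, []
    for _ in range(steps):
        state = step(state, u, w_in, w)
        u = sum(a * s for a, s in zip(w_out, state))
        outputs.append(u)
    return outputs

# Toy usage on a synthetic series (not COVID-19 data).
series = [math.sin(0.2 * t) for t in range(200)]
w_in, w = make_esn(size=20, seed=1)
w_out, state = train_readout(series, w_in, w)
forecast = free_run(w_out, state, series[-1], 15, w_in, w)
```

During training the reservoir sees the observed values; in `free_run` it sees only its own predictions, so errors can compound over the forecast horizon. That sensitivity is one plausible reason the abstract stresses the choice of model parameters.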
Artificial intelligence application in the diagnosis and treatment of bladder cancer: advance, challenges, and opportunities
Front Oncol. 2024 Nov 7;14:1487676. doi: 10.3389/fonc.2024.1487676. eCollection 2024.
ABSTRACT
Bladder cancer (BC) is a serious and common malignant tumor of the urinary system. Accurate and convenient diagnosis and treatment of BC is a major challenge for the medical community. Because medical resources are limited, existing diagnosis and treatment protocols for BC that do not use artificial intelligence (AI) still have certain shortcomings. In recent years, as AI technologies such as deep learning and machine learning have matured, AI has been increasingly applied in the medical field, including improving the speed and accuracy of BC diagnosis and providing more powerful treatment options and recommendations related to prognosis. Advances in medical imaging technology and molecular-level research have also contributed to the further development of such AI applications. However, because of differences in the sources of training information and issues in algorithm design, accuracy and transparency still need to improve before AI can be used more broadly in clinical practice. With the spread of digitized clinical information and the introduction of new algorithms, AI is expected to learn more effectively and analyze similar cases more accurately and reliably, promoting the development of precision medicine, reducing resource consumption, and speeding up diagnosis and treatment. This review focuses on the application of AI in the diagnosis and treatment of BC, points out some of the challenges it faces, and looks ahead to its future development.
PMID:39575423 | PMC:PMC11578829 | DOI:10.3389/fonc.2024.1487676