Deep learning
LMPTMSite: A Platform for PTM Site Prediction in Proteins Leveraging Transformer-Based Protein Language Models
Methods Mol Biol. 2025;2867:261-297. doi: 10.1007/978-1-0716-4196-5_16.
ABSTRACT
Protein post-translational modifications (PTMs) introduce new functionalities and play a critical role in the regulation of protein functions. Characterizing these modifications, especially PTM sites, is essential for unraveling complex biological systems. However, traditional experimental approaches, such as mass spectrometry, are time-consuming and expensive. Machine learning and deep learning techniques offer promising alternatives for predicting PTM sites. In this chapter, we introduce our LMPTMSite (language model-based post-translational modification site predictor) platform, which emphasizes two transformer-based protein language model (pLM) approaches: pLMSNOSite and LMSuccSite, for the prediction of S-nitrosylation sites and succinylation sites in proteins, respectively. We highlight the various methods of using pLM-based sequence encoding, explain the underlying deep learning architectures, and discuss the superior efficacy of these tools compared to other state-of-the-art tools. Subsequently, we present an analysis of runtime and memory usage for pLMSNOSite, with a focus on CPU and RAM usage as the input sequence length is scaled up. Finally, we showcase a case study predicting succinylation sites in proteins active within the tricarboxylic acid (TCA) cycle pathway using LMSuccSite, demonstrating its potential utility and efficiency in real-world biological contexts. The LMPTMSite platform, inclusive of pLMSNOSite and LMSuccSite, is freely available both as a web server ( http://kcdukkalab.org/pLMSNOSite/ and http://kcdukkalab.org/LMSuccSite/ ) and as standalone packages ( https://github.com/KCLabMTU/pLMSNOSite and https://github.com/KCLabMTU/LMSuccSite ), providing valuable tools for researchers in the field.
PMID:39576587 | DOI:10.1007/978-1-0716-4196-5_16
Accurate and Fast Prediction of Intrinsic Disorder Using flDPnn
Methods Mol Biol. 2025;2867:201-218. doi: 10.1007/978-1-0716-4196-5_12.
ABSTRACT
Intrinsically disordered proteins (IDPs) that include one or more intrinsically disordered regions (IDRs) are abundant across all domains of life and viruses and play numerous functional roles in various cellular processes. Due to a relatively low throughput and high cost of experimental techniques for identifying IDRs, there is a growing need for fast and accurate computational algorithms that accurately predict IDRs/IDPs from protein sequences. We describe one of the leading disorder predictors, flDPnn. Results from a recent community-organized Critical Assessment of Intrinsic Disorder (CAID) experiment show that flDPnn provides fast and state-of-the-art predictions of disorder, which are supplemented with the predictions of several major disorder functions. This chapter provides a practical guide to flDPnn, which includes a brief explanation of its predictive model, descriptions of its web server and standalone versions, and a case study that showcases how to read and understand flDPnn's predictions.
PMID:39576583 | DOI:10.1007/978-1-0716-4196-5_12
Protein Secondary Structure and DNA/RNA Detection for Cryo-EM and Cryo-ET Using Emap2sec and Emap2sec+
Methods Mol Biol. 2025;2867:105-120. doi: 10.1007/978-1-0716-4196-5_6.
ABSTRACT
Cryo-electron microscopy (cryo-EM) has become a powerful tool for determining the structures of macromolecules, such as proteins and DNA/RNA complexes. While high-resolution cryo-EM maps are increasingly available, there is still a substantial number of maps determined at intermediate or low resolution. These maps present challenges when it comes to extracting structural information. In response to this, two computational methods, Emap2sec and Emap2sec+, have been developed by our group to address these challenges and benefit the analysis of cryo-EM maps. In this chapter, we describe how to use the web servers of two of our structure analysis tools for cryo-EM, Emap2sec and Emap2sec+. Both methods identify local structures in medium-resolution EM maps of 5-10 Å to help find and fit protein and DNA/RNA structures in EM maps. Emap2sec identifies the secondary structures of proteins, while Emap2sec+ also identifies DNA/RNA locations in cryo-EM maps. As cryo-electron tomography (cryo-ET) has started to produce data of this resolution, these methods would be useful for cryo-ET, too. Both methods are available in the form of webservers and source code at https://kiharalab.org/emsuites/ .
PMID:39576577 | DOI:10.1007/978-1-0716-4196-5_6
Machine Learning Techniques to Infer Protein Structure and Function from Sequences: A Comprehensive Review
Methods Mol Biol. 2025;2867:79-104. doi: 10.1007/978-1-0716-4196-5_5.
ABSTRACT
The elucidation of protein structure and function plays a pivotal role in understanding biological processes and facilitating drug discovery. With the exponential growth of protein sequence data, machine learning techniques have emerged as powerful tools for predicting protein characteristics from sequences alone. This review provides a comprehensive overview of the importance and application of machine learning in inferring protein structure and function. We discuss various machine learning approaches, primarily focusing on convolutional neural networks and natural language processing, and their utilization in predicting protein secondary and tertiary structures, residue-residue contacts, protein function, and subcellular localization. Furthermore, we highlight the challenges associated with using machine learning techniques in this context, such as the availability of high-quality training datasets and the interpretability of models. We also delve into the latest progress in the field concerning the advancements made in the development of intricate deep learning architectures. Overall, this review underscores the significance of machine learning in advancing our understanding of protein structure and function, and its potential to revolutionize drug discovery and personalized medicine.
PMID:39576576 | DOI:10.1007/978-1-0716-4196-5_5
The Iconic α-Helix: From Pauling to the Present
Methods Mol Biol. 2025;2867:1-17. doi: 10.1007/978-1-0716-4196-5_1.
ABSTRACT
The protein folding problem dates back to Pauling's insights almost a century ago, but the first venture into actual protein structure was the Pauling-Corey-Branson α-helix in 1951, a proposed model that was confirmed almost immediately using X-ray crystallography. Many subsequent efforts to predict protein helices from the amino acid sequence met with only partial success, as discussed here. Surprisingly, in 2021, these efforts were superseded by deep-learning artificial intelligence, especially AlphaFold2, a machine learning program based on neural nets. This approach can predict most protein structures successfully at or near atomic resolution. Deservedly, deep-learning artificial intelligence was named Science magazine's 2021 "breakthrough of the year." Today, ~200 million predicted protein structures can be downloaded from the AlphaFold2 Protein Structure Database. Deep learning represents a deep conundrum because these successfully predicted macromolecular structures are based on methods that are completely devoid of a hypothesis or of any physical chemistry. Perhaps we are now poised to transcend five centuries of reductive science.
PMID:39576572 | DOI:10.1007/978-1-0716-4196-5_1
Artificial intelligence improves risk prediction in cardiovascular disease
Geroscience. 2024 Nov 22. doi: 10.1007/s11357-024-01438-z. Online ahead of print.
ABSTRACT
Cardiovascular disease (CVD) represents a major public health issue, claiming numerous lives. This study aimed to demonstrate the advantages of employing artificial intelligence (AI) models to improve the prediction of CVD risk using a large cohort of relatively healthy adults aged 70 years or more. In this study, deep learning (DL) models provide enhanced predictions (DeepSurv: C-index = 0.662, Integrated Brier Score (IBS) = 0.046; Neural Multi-Task Logistic Regression (NMTLR): C-index = 0.660, IBS = 0.047), as compared to the conventional (Cox: C-index = 0.634, IBS = 0.048) and machine learning (Random Survival Forest (RSF): C-index = 0.641, IBS = 0.048) models. The risk scores generated by the DL models also demonstrated superior performance. Moreover, AI models (NMTLR, DeepSurv, and RSF) were more effective, requiring the treatment of only 9 to 10 patients to prevent one CVD event, whereas the conventional model required treating nearly four times as many patients (NNT = 38). In summary, AI models, particularly DL models, possess superior predictive capabilities that can enhance patient treatment in a more cost-effective manner. Nonetheless, AI tools should serve to complement and assist healthcare professionals, rather than supplant them. The DeepSurv model, selected due to its relatively superior performance, is deployed locally as a web application and is accessible on GitHub ( https://github.com/Robidar/Chuchu_Depl ). Finally, as we have demonstrated the benefit of using AI for reassessment of an existing CVD risk score, we recommend that other well-known risk scores undergo similar reassessment.
PMID:39576563 | DOI:10.1007/s11357-024-01438-z
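The C-index values quoted above measure how well a model ranks patients by risk. As a rough illustration only, the sketch below computes Harrell's concordance index in plain Python. The function name and the toy data are hypothetical and not taken from the study, and the abstract does not state which C-index variant or tie handling the authors used.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: the share of comparable patient pairs in which
    the patient who had the event earlier was given the higher risk score.

    times       -- follow-up time per patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (illustrative, not from the study).
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.634 to 0.662 values should be read.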
Impact of Alignments on the Accuracy of Protein Subcellular Localization Predictions
Proteins. 2024 Nov 22. doi: 10.1002/prot.26767. Online ahead of print.
ABSTRACT
Alignments in bioinformatics refer to the arrangement of sequences to identify regions of similarity that can indicate functional, structural, or evolutionary relationships. They are crucial for bioinformaticians as they enable accurate predictions and analyses in various applications, including protein subcellular localization. The predictive model used in this article is based on a deep convolutional architecture. We tested configurations of Deep N-to-1 convolutional neural networks of various depths and widths to identify the better-performing settings across a diverse set of eight classes. For the assessment without alignments, sequences are encoded using one-hot encoding, which converts each character into a numerical representation; this is straightforward for non-numerical data and useful for machine learning models. For the assessment with alignments, multiple sequence alignments (MSAs) are created using PSI-BLAST, capturing evolutionary information by calculating frequencies of residues and gaps. The average difference in peak performance between models with alignments and without alignments is approximately 15.82%. The average difference in the highest accuracy achieved with alignments compared with without alignments is approximately 15.16%. Thus, extensive experimentation indicates that higher alignment accuracy implies a more reliable model and improved prediction accuracy, which can be trusted to deliver consistent performance across different layers and classes of subcellular localization predictions. This research provides valuable insights into prediction accuracies with and without alignments, offering bioinformaticians an effective tool for better understanding while potentially reducing the need for extensive experimental validations. The source code and datasets are available at http://distilldeep.ucd.ie/SCL8/.
PMID:39575640 | DOI:10.1002/prot.26767
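The two input encodings described above can be sketched as follows. This is a minimal illustration assuming a 20-letter amino acid alphabet plus a gap symbol. The function names are hypothetical, and the alignments are supplied by hand rather than produced by PSI-BLAST, so this is not the authors' pipeline.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def one_hot_encode(sequence):
    """Encode each residue as a 20-dimensional 0/1 vector.
    Residues outside the alphabet (e.g. 'X') become all-zero rows."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        encoded.append(row)
    return encoded


def column_frequencies(aligned_sequences):
    """Per-position frequencies of the 20 residues and the gap '-'
    across equal-length rows of a multiple sequence alignment."""
    symbols = AMINO_ACIDS + "-"
    length = len(aligned_sequences[0])
    profile = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned_sequences]
        profile.append([column.count(s) / len(column) for s in symbols])
    return profile


no_alignment_input = one_hot_encode("ACD")
alignment_input = column_frequencies(["ACD", "A-D", "GCD"])
```

The one-hot matrix carries only the identity of each residue, while the frequency profile also reflects which substitutions and gaps occur at each position in related sequences, which is the evolutionary information the abstract refers to.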
A self-driven ESN-DSS approach for effective COVID-19 time series prediction and modelling
Epidemiol Infect. 2024 Nov 22;152:e146. doi: 10.1017/S0950268824000992.
ABSTRACT
Since its outbreak, the COVID-19 epidemic has posed a great crisis to the health and economy of the world. The objective of this study is to provide a simple deep-learning approach for predicting, modelling, and evaluating the time evolutions of the COVID-19 epidemic. The Dove Swarm Search (DSS) algorithm is integrated with the echo state network (ESN) to optimize the weight. The ESN-DSS model is constructed to predict the evolution of the COVID-19 time series. Specifically, the self-driven ESN-DSS is created to form a closed feedback loop by replacing the input with the output. The prediction results, which involve COVID-19 temporal evolutions of multiple countries worldwide, indicate the excellent prediction performances of our model compared with several artificial intelligence prediction methods from the literature (e.g., recurrent neural network, long short-term memory, gated recurrent units, variational auto encoder) at the same time scale. Moreover, the model parameters of the self-driven ESN-DSS, which have a significant impact on the prediction performance, are determined. As a result, the network parameters are adjusted to improve the prediction accuracy. The prediction results can be used as proposals to help governments and medical institutions formulate pertinent precautionary measures to prevent further spread. In addition, this study is not limited to COVID-19 time series forecasting but is also applicable to other nonlinear time series prediction problems.
PMID:39575546 | DOI:10.1017/S0950268824000992
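The closed feedback loop described above, in which each prediction is fed back as the next input, can be illustrated with a small echo state network. This sketch is an assumption-laden stand-in, not the authors' model: the output weights are fitted by ridge regression instead of Dove Swarm Search, and the reservoir size, spectral radius, function names, and sine-wave data are all hypothetical.

```python
import math

import numpy as np


def train_esn(series, n_reservoir=50, spectral_radius=0.9, ridge=1e-6, seed=0):
    """Fit a one-step-ahead ESN readout on a 1-D series.
    Ridge regression replaces the DSS weight optimization here."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale the fixed random reservoir to the chosen spectral radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))

    state = np.zeros(n_reservoir)
    states = []
    for u in series[:-1]:
        state = np.tanh(w_in * u + w @ state)
        states.append(state.copy())

    x = np.array(states)          # reservoir states, one row per time step
    y = np.asarray(series[1:])    # next-step targets
    w_out = np.linalg.solve(x.T @ x + ridge * np.eye(n_reservoir), x.T @ y)
    return w_in, w, w_out, state


def self_driven_forecast(w_in, w, w_out, state, last_value, steps):
    """Closed loop: each output replaces the input at the next step."""
    predictions = []
    u = last_value
    for _ in range(steps):
        state = np.tanh(w_in * u + w @ state)
        u = float(w_out @ state)
        predictions.append(u)
    return predictions


# Illustrative data only: a sine wave rather than case counts.
series = [math.sin(0.2 * t) for t in range(200)]
w_in, w, w_out, state = train_esn(series)
forecast = self_driven_forecast(w_in, w, w_out, state, series[-1], steps=5)
```

Only the readout weights are trained. The reservoir stays fixed, which is why a search heuristic such as DSS or a closed-form fit like the one above is sufficient.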
Artificial intelligence application in the diagnosis and treatment of bladder cancer: advance, challenges, and opportunities
Front Oncol. 2024 Nov 7;14:1487676. doi: 10.3389/fonc.2024.1487676. eCollection 2024.
ABSTRACT
Bladder cancer (BC) is a serious and common malignant tumor of the urinary system. Accurate and convenient diagnosis and treatment of BC is a major challenge for the medical community. Due to the limited medical resources, the existing diagnosis and treatment protocols for BC without the assistance of artificial intelligence (AI) still have certain shortcomings. In recent years, with the development of AI technologies such as deep learning and machine learning, the maturity of AI has made it more and more applied to the medical field, including improving the speed and accuracy of BC diagnosis and providing more powerful treatment options and recommendations related to prognosis. Advances in medical imaging technology and molecular-level research have also contributed to the further development of such AI applications. However, due to differences in the sources of training information and algorithm design issues, there is still room for improvement in terms of accuracy and transparency for the broader use of AI in clinical practice. With the popularization of digitization of clinical information and the proposal of new algorithms, artificial intelligence is expected to learn more effectively and analyze similar cases more accurately and reliably, promoting the development of precision medicine, reducing resource consumption, and speeding up diagnosis and treatment. This review focuses on the application of artificial intelligence in the diagnosis and treatment of BC, points out some of the challenges it faces, and looks forward to its future development.
PMID:39575423 | PMC:PMC11578829 | DOI:10.3389/fonc.2024.1487676
Artificial intelligence-assisted delineation for postoperative radiotherapy in patients with lung cancer: a prospective, multi-center, cohort study
Front Oncol. 2024 Oct 22;14:1388297. doi: 10.3389/fonc.2024.1388297. eCollection 2024.
ABSTRACT
BACKGROUND: Postoperative radiotherapy (PORT) is an important treatment for lung cancer patients with poor prognostic features, but accurate delineation of the clinical target volume (CTV) and organs at risk (OARs) is challenging and time-consuming. Recently, deep learning-based artificial intelligence (AI) algorithms have shown promise in automating this process.
OBJECTIVE: To evaluate the clinical utility of a deep learning-based auto-segmentation model for AI-assisted delineation of CTV and OARs in patients undergoing PORT, and to compare its accuracy and efficiency with manual delineation by radiation oncology residents from different levels of medical institutions.
METHODS: We previously developed an AI auto-segmentation model in 664 patients and validated its contouring performance in 149 patients. In this multi-center, validation trial, we prospectively involved 55 patients and compared the accuracy and efficiency of 3 contouring methods: (i) unmodified AI auto-segmentation, (ii) fully manual delineation by junior radiation oncology residents from different medical centers, and (iii) manual modifications based on AI segmentation model (AI-assisted delineation). The ground truth of CTV and OARs was delineated by 3 senior radiation oncologists. Contouring accuracy was evaluated by Dice similarity coefficient (DSC), Hausdorff distance (HD), and mean distance of agreement (MDA). Inter-observer consistency was assessed by volume and coefficient of variation (CV).
RESULTS: AI-assisted delineation achieved significantly higher accuracy compared to unmodified AI auto-contouring and fully manual delineation by radiation oncologists, with median HD, MDA, and DSC values of 20.03 vs. 21.55 mm, 2.57 vs. 3.06 mm, 0.745 vs. 0.703 (all P<0.05) for CTV, respectively. The results of OARs contours were similar. CV for OARs was reduced by approximately 50%. In addition to better contouring accuracy, the AI-assisted delineation significantly reduced contouring time and improved efficiency.
CONCLUSION: AI-assisted CTV and OARs delineation for PORT significantly improves the accuracy and efficiency in the real-world setting, compared with pure AI auto-segmentation or fully manual delineation by junior oncologists. The AI-assisted approach has promising clinical potential to enhance the quality of radiotherapy planning and further improve treatment outcomes of patients with lung cancer.
PMID:39575415 | PMC:PMC11579590 | DOI:10.3389/fonc.2024.1388297
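Two of the contouring metrics named above, the Dice similarity coefficient (DSC) and the Hausdorff distance (HD), can be sketched in a few lines. This is a minimal illustration on hypothetical 2-D voxel coordinates. The study's actual software, voxel spacing, and any HD percentile convention are not given in the abstract.

```python
import math


def dice_coefficient(mask_a, mask_b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty contours agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(points_a, points_b):
    """Largest distance from any point in one non-empty set to its
    nearest neighbour in the other, taken in both directions."""
    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(points_a, points_b), directed(points_b, points_a))


# Hypothetical contours: a reference and a candidate delineation.
reference = {(0, 0), (0, 1), (1, 0), (1, 1)}
candidate = {(0, 1), (1, 1), (2, 1)}

dsc = dice_coefficient(reference, candidate)
hd = hausdorff_distance(reference, candidate)
```

DSC rewards volume overlap (higher is better), whereas HD penalizes the single worst boundary error (lower is better), which is why the abstract reports both.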
Implementation and evaluation of the three action teaching model with learning plan guidance in preventive medicine course
Front Psychol. 2024 Nov 7;15:1508432. doi: 10.3389/fpsyg.2024.1508432. eCollection 2024.
ABSTRACT
BACKGROUND: Toward the close of the 20th century, Chinese scholars introduced a novel pedagogical approach to education in China, distinguished by its divergence from conventional teaching methods. This instructional strategy assumes a pivotal role in imparting indispensable medical knowledge to students within a meticulously structured and all-encompassing framework.
OBJECTIVE: The objective of this study is to assess the effectiveness of a novel teaching approach that integrates the three action teaching model with learning plan guidance within a preventive medicine course. Through this investigation, empirical evidence will be provided regarding the impact of utilizing learning guided by the three action teaching model with learning plan guidance as an innovative instructional method, thereby shedding light on its potential to enhance students' autonomous learning in the field of preventive medicine.
METHODS: The control group consisted of 48 students from Class 2 of clinical medicine in grade 2021, who were taught using the traditional classroom teaching mode. Meanwhile, Class 1 served as the experimental group comprising 47 individuals, who received instruction through the three-action teaching model with learning plan guidance. Evaluation was conducted using course tests and questionnaires, and data analysis was performed utilizing t-tests, analysis of variance, and rank sum tests in SPSS software.
RESULTS: The average total score of the experimental group (79.44 ± 10.13) was significantly higher than that of the control group (70.00 ± 13.57) (t = 3.943, p < 0.001). Moreover, more students in the experimental group than in the control group had total scores in the 80 to 89 and 90 to 100 ranges (Z = 5.324, p = 0.002). The Subjective Evaluation System (SES) indicated that the experimental group (69.11 ± 8.39) outperformed the control group (61.23 ± 6.59) in terms of total scores (t = 5.095, p < 0.001), demonstrating superior performance in learning methods, emotions, engagement, and performance metrics (p < 0.05). Specifically, analysis using the Biggs study process questionnaire revealed that the experimental group exhibited higher levels of deep learning (t = 6.100, p < 0.001) and lower levels of superficial learning (t = -3.783, p < 0.001) when compared to the control group.
CONCLUSION: The implementation of a novel teaching approach that integrates the three-action teaching model with learning plan guidance significantly enhances students' academic achievements and fosters their intrinsic motivation for learning. The success of this pedagogical method can be attributed to the enhanced classroom efficiency exhibited by teachers as well as the heightened enthusiasm for learning displayed by students.
PMID:39575329 | PMC:PMC11578742 | DOI:10.3389/fpsyg.2024.1508432
Contributing to the prediction of prognosis for treated hepatocellular carcinoma: Imaging aspects that sculpt the future
World J Gastrointest Surg. 2024 Oct 27;16(10):3377-3380. doi: 10.4240/wjgs.v16.i10.3377.
ABSTRACT
A novel nomogram model to predict the prognosis of hepatocellular carcinoma (HCC) treated with radiofrequency ablation and transarterial chemoembolization was recently published in the World Journal of Gastrointestinal Surgery. This model includes clinical and laboratory factors, but emerging imaging aspects, particularly from magnetic resonance imaging (MRI) and radiomics, could enhance the predictive accuracy thereof. Multiparametric MRI and deep learning radiomics models significantly improve prognostic predictions for the treatment of HCC. Incorporating advanced imaging features, such as peritumoral hypointensity and radiomics scores, alongside clinical factors, can refine prognostic models, aiding in personalized treatment and better predicting outcomes. This letter underscores the importance of integrating novel imaging techniques into prognostic tools to better manage and treat HCC.
PMID:39575286 | PMC:PMC11577411 | DOI:10.4240/wjgs.v16.i10.3377
VINNA for neonates: Orientation independence through latent augmentations
Imaging Neurosci (Camb). 2024 May 30;2:1-26. doi: 10.1162/imag_a_00180. eCollection 2024 May 1.
ABSTRACT
A robust, fast, and accurate segmentation of neonatal brain images is highly desired to better understand and detect changes during development and disease, specifically considering the rise in imaging studies for this cohort. Yet, the limited availability of ground truth datasets, lack of standardized acquisition protocols, and wide variations of head positioning in the scanner pose challenges for method development. A few automated image analysis pipelines exist for newborn brain Magnetic Resonance Image (MRI) segmentation, but they often rely on time-consuming non-linear spatial registration procedures and require resampling to a common resolution, subject to loss of information due to interpolation and down-sampling. Without registration and image resampling, variations with respect to head positions and voxel resolutions have to be addressed differently. In deep learning, external augmentations such as rotation, translation, and scaling are traditionally used to artificially expand the representation of spatial variability, which subsequently increases both the training dataset size and robustness. However, these transformations in the image space still require resampling, reducing accuracy specifically in the context of label interpolation. We recently introduced the concept of resolution-independence with the Voxel-size Independent Neural Network framework, VINN. Here, we extend this concept by additionally shifting all rigid-transforms into the network architecture with a four degree of freedom (4-DOF) transform module, enabling resolution-aware internal augmentations (VINNA) for deep learning. In this work, we show that VINNA (i) significantly outperforms state-of-the-art external augmentation approaches, (ii) effectively addresses the head variations present specifically in newborn datasets, and (iii) retains high segmentation accuracy across a range of resolutions (0.5-1.0 mm). Furthermore, the 4-DOF transform module together with internal augmentations is a powerful, general approach to implement spatial augmentation without requiring image or label interpolation. The specific network application to newborns will be made publicly available as VINNA4neonates.
PMID:39575178 | PMC:PMC11576933 | DOI:10.1162/imag_a_00180
Geometric deep learning for diffusion MRI signal reconstruction with continuous samplings (DISCUS)
Imaging Neurosci (Camb). 2024 Apr 2;2:1-18. doi: 10.1162/imag_a_00121. eCollection 2024 Apr 1.
ABSTRACT
Diffusion-weighted magnetic resonance imaging (dMRI) permits a detailed in-vivo analysis of neuroanatomical microstructure, invaluable for clinical and population studies. However, many measurements with different diffusion-encoding directions and possibly b-values are necessary to infer the underlying tissue microstructure within different imaging voxels accurately. Two challenges particularly limit the utility of dMRI: long acquisition times limit feasible scans to only a few directional measurements, and the heterogeneity of acquisition schemes across studies makes it difficult to combine datasets. Left unaddressed by previous learning-based methods that only accept dMRI data adhering to the specific acquisition scheme used for training, there is a need for methods that accept and predict signals for arbitrary diffusion encodings. Addressing these challenges, we describe the first geometric deep learning method for continuous dMRI signal reconstruction for arbitrary diffusion sampling schemes for both the input and output. Our method combines the reconstruction accuracy and robustness of previous learning-based methods with the flexibility of model-based methods, for example, spherical harmonics or SHORE. We demonstrate that our method outperforms model-based methods and performs on par with discrete learning-based methods on single-, multi-shell, and grid-based diffusion MRI datasets. Relevant for dMRI-derived analyses, we show that our reconstruction translates to higher-quality estimates of frequently used microstructure models compared to other reconstruction methods, enabling high-quality analyses even from very short dMRI acquisitions.
PMID:39575177 | PMC:PMC11576935 | DOI:10.1162/imag_a_00121
Generative forecasting of brain activity enhances Alzheimer's classification and interpretation
ArXiv [Preprint]. 2024 Oct 30:arXiv:2410.23515v1.
ABSTRACT
Understanding the relationship between cognition and intrinsic brain activity through purely data-driven approaches remains a significant challenge in neuroscience. Resting-state functional magnetic resonance imaging (rs-fMRI) offers a non-invasive method to monitor regional neural activity, providing a rich and complex spatiotemporal data structure. Deep learning has shown promise in capturing these intricate representations. However, the limited availability of large datasets, especially for disease-specific groups such as Alzheimer's Disease (AD), constrains the generalizability of deep learning models. In this study, we focus on multivariate time series forecasting of independent component networks derived from rs-fMRI as a form of data augmentation, using both a conventional LSTM-based model and the novel Transformer-based BrainLM model. We assess their utility in AD classification, demonstrating how generative forecasting enhances classification performance. Post-hoc interpretation of BrainLM reveals class-specific brain network sensitivities associated with AD.
PMID:39575120 | PMC:PMC11581107
Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis
ArXiv [Preprint]. 2024 Oct 31:arXiv:2410.23595v1.
ABSTRACT
The success of machine learning models relies heavily on effectively representing high-dimensional data. However, ensuring data representations capture human-understandable concepts remains difficult, often requiring the incorporation of prior knowledge and decomposition of data into multiple subspaces. Traditional linear methods fall short in modeling more than one space, while more expressive deep learning approaches lack interpretability. Here, we introduce Supervised Independent Subspace Principal Component Analysis ($\texttt{sisPCA}$), a PCA extension designed for multi-subspace learning. Leveraging the Hilbert-Schmidt Independence Criterion (HSIC), $\texttt{sisPCA}$ incorporates supervision and simultaneously ensures subspace disentanglement. We demonstrate $\texttt{sisPCA}$'s connections with autoencoders and regularized linear regression and showcase its ability to identify and separate hidden data structures through extensive applications, including breast cancer diagnosis from image features, learning aging-associated DNA methylation changes, and single-cell analysis of malaria infection. Our results reveal distinct functional pathways associated with malaria colonization, underscoring the essentiality of explainable representation in high-dimensional data analysis.
PMID:39575118 | PMC:PMC11581103
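The Hilbert-Schmidt Independence Criterion mentioned above scores statistical dependence between two sets of paired samples. The sketch below uses the standard biased empirical estimator, trace(KHLH) / (n - 1)^2. The choice of a linear kernel and the function name are assumptions made for illustration. The abstract does not say which kernel or estimator sisPCA uses.

```python
import numpy as np


def hsic_linear(x, y):
    """Biased empirical HSIC with linear kernels.

    x, y -- arrays of shape (n_samples, n_features) with paired rows.
    Returns trace(K H L H) / (n - 1)^2, where K = x x^T, L = y y^T,
    and H is the centering matrix.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    k = x @ x.T
    l = y @ y.T
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2


samples = np.array([[1.0], [2.0], [3.0], [4.0]])
dependent = hsic_linear(samples, samples)           # y varies with x
independent = hsic_linear(samples, np.ones((4, 1)))  # y is constant
```

A value near zero indicates no dependence detectable by the chosen kernel, so a method can maximize it against a supervision target or minimize it between subspaces, depending on the goal.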
Ion channel classification through machine learning and protein language model embeddings
J Integr Bioinform. 2024 Nov 25. doi: 10.1515/jib-2023-0047. Online ahead of print.
ABSTRACT
Ion channels are critical membrane proteins that regulate ion flux across cellular membranes, influencing numerous biological functions. The resource-intensive nature of traditional wet lab experiments for ion channel identification has led to an increasing emphasis on computational techniques. This study extends our previous work on protein language models for ion channel prediction, significantly advancing the methodology and performance. We employ a comprehensive array of machine learning algorithms, including k-Nearest Neighbors, Random Forest, Support Vector Machines, and Feed-Forward Neural Networks, alongside a novel Convolutional Neural Network (CNN) approach. These methods leverage fine-tuned embeddings from ProtBERT, ProtBERT-BFD, and MembraneBERT to differentiate ion channels from non-ion channels. Our empirical findings demonstrate that TooT-BERT-CNN-C, which combines features from ProtBERT-BFD and a CNN, substantially surpasses existing benchmarks. On our original dataset, it achieves a Matthews Correlation Coefficient (MCC) of 0.8584 and an accuracy of 98.35 %. More impressively, on a newly curated, larger dataset (DS-Cv2), it attains an MCC of 0.9492 and an ROC AUC of 0.9968 on the independent test set. These results not only highlight the power of integrating protein language models with deep learning for ion channel classification but also underscore the importance of using up-to-date, comprehensive datasets in bioinformatics tasks. Our approach represents a significant advancement in computational methods for ion channel identification, with potential implications for accelerating research in ion channel biology and aiding drug discovery efforts.
PMID:39572876 | DOI:10.1515/jib-2023-0047
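The Matthews Correlation Coefficient reported above summarizes a binary confusion matrix in one number between -1 and 1. A minimal sketch follows. The function name is hypothetical and the counts in the example are invented for illustration, not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal total is zero (undefined case)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts: ion channels as positives, non-ion channels as negatives.
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
```

Unlike accuracy, MCC stays informative when one class is much rarer than the other, which is one plausible reason both metrics are reported for this task.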
Comparison Between Conventional and Artificial Intelligence-Assisted Setup for Digital Implant Planning: Accuracy, Time-Efficiency, and User Experience
Clin Oral Implants Res. 2024 Nov 21. doi: 10.1111/clr.14382. Online ahead of print.
ABSTRACT
OBJECTIVES: To investigate the reliability and time efficiency of the conventional compared to the automatic artificial intelligence (AI) segmentation of the mandibular canal and registration of the CBCT with the model scan data, in relation to clinician's experience.
MATERIALS AND METHODS: Twenty clinicians, 10 with a moderate and 10 with a high experience in computer-assisted implant planning, were asked to perform a bilateral localization of the mandibular canal, followed by a registration of the intraoral model scan with the CBCT. Subsequently, for each data set and each participant, the same operations were performed utilizing the AI tool. Statistical significance was assessed via a mixed model (using the PROC MIXED statement and the compound symmetry covariance structure).
RESULTS: The mean time for the segmentation of the mandibular canals and the registration of the models was 4.75 (2.03) min for the manual and 2.03 (0.36) min for the AI-automated operations (p < 0.001). The mean discrepancy in the mandibular canals was 0.71 (1.80) mm RMS error for the manual segmentation and 0.68 (0.36) mm RMS error for the AI-assisted segmentation (p > 0.05). For the registration between the CBCT and the intraoral scans, the mean discrepancy was 0.45 (0.16) mm for the manual and 0.37 (0.07) mm for the AI-assisted superimposition (p > 0.05).
CONCLUSIONS: AI-automated implant planning tools are feasible options that can lead to a similar or better accuracy compared to the conventional manual workflow, providing improved time efficiency for both experienced and less experienced users. Further research including a variety of software and data sets is required to be able to generalize the outcomes of the present study.
PMID:39572789 | DOI:10.1111/clr.14382
The diagnostic value of MRI segmentation technique for shoulder joint injuries based on deep learning
Sci Rep. 2024 Nov 21;14(1):28885. doi: 10.1038/s41598-024-80441-y.
ABSTRACT
This work investigates the diagnostic value of a deep learning-based magnetic resonance imaging (MRI) image segmentation (IS) technique for shoulder joint injuries (SJIs) in swimmers. A novel multi-scale feature fusion network (MSFFN) is developed by optimizing and integrating the AlexNet and U-Net algorithms for the segmentation of MRI images of the shoulder joint. The model is evaluated using metrics such as the Dice similarity coefficient (DSC), positive predictive value (PPV), and sensitivity (SE). A cohort of 52 swimmers with SJIs from Guangzhou Hospital serves as the subjects for this study, wherein the accuracy of the developed shoulder joint MRI IS model in diagnosing swimmers' SJIs is analyzed. The results reveal that the DSC for segmenting joint bones in MRI images based on the MSFFN algorithm is 92.65%, with PPV of 95.83% and SE of 96.30%. Similarly, the DSC for segmenting humerus bones in MRI images is 92.93%, with PPV of 95.56% and SE of 92.78%. The MRI IS algorithm exhibits an accuracy of 86.54% in diagnosing types of SJIs in swimmers, surpassing the conventional diagnostic accuracy of 71.15%. The consistency between the diagnostic results of complete tear, superior surface tear, inferior surface tear, and intratendinous tear of SJIs in swimmers and arthroscopic diagnostic results yields a Kappa value of 0.785 and an accuracy of 87.89%. These findings underscore the significant diagnostic value and potential of the MRI IS technique based on the MSFFN algorithm in diagnosing SJIs in swimmers.
PMID:39572780 | DOI:10.1038/s41598-024-80441-y
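The Kappa value above expresses agreement between two sets of diagnoses after correcting for agreement expected by chance. The sketch below assumes the unweighted Cohen's kappa, which the abstract does not confirm. The function name and the label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is
    observed agreement and p_e is agreement expected by chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical tear types from MRI-based and arthroscopic diagnosis.
mri_based = ["complete", "complete", "superior", "superior"]
arthroscopy = ["complete", "complete", "superior", "complete"]
kappa = cohens_kappa(mri_based, arthroscopy)
```

Here three of four cases agree (75%), but half of that agreement is expected by chance, so kappa is lower than raw accuracy. The same effect explains why the reported Kappa of 0.785 sits below the reported 87.89% accuracy.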
Large language modeling and deep learning shed light on RNA structure prediction
Nat Methods. 2024 Nov 21. doi: 10.1038/s41592-024-02488-z. Online ahead of print.
NO ABSTRACT
PMID:39572717 | DOI:10.1038/s41592-024-02488-z