Deep learning
What is the role of artificial intelligence in general surgery?
Ewha Med J. 2024 Apr;47(2):e22. doi: 10.12771/emj.2024.e22. Epub 2024 Apr 30.
ABSTRACT
The capabilities of artificial intelligence (AI) have recently surged, largely due to advancements in deep learning inspired by the structure and function of the neural networks of the human brain. In the medical field, the impact of AI spans from diagnostics and treatment recommendations to patient engagement and monitoring, considerably improving efficiency and outcomes. The clinical integration of AI has also been examined in specialties including pathology, radiology, and oncology. General surgery primarily involves manual manipulation and includes preoperative, intraoperative, and postoperative care, all of which are critical for saving lives. Other fields have strived to utilize and adopt AI; general surgery, by contrast, appears to have lagged behind. In this review, we analyzed the published research to understand how the application of AI in general surgery differs from that in other medical fields. Based on previous research in other fields, the application of AI in the preoperative stage is nearing feasibility. Ongoing research efforts aim to utilize AI to improve and predict operative outcomes, enhance performance, and improve patient care. However, the use of AI in the operating room remains significantly understudied. Moreover, ethical responsibilities are associated with such research, necessitating extensive work to gather evidence. By fostering interdisciplinary collaboration and leveraging lessons from AI success stories in other fields, AI tools could be specifically tailored for general surgery. Surgeons should be prepared for the integration of AI into clinical practice to achieve better outcomes; therefore, the time has come to consider the ethical and legal implications.
PMID:40703691 | PMC:PMC12093534 | DOI:10.12771/emj.2024.e22
A hybrid model for detecting motion artifacts in ballistocardiogram signals
Biomed Eng Online. 2025 Jul 23;24(1):92. doi: 10.1186/s12938-025-01426-0.
ABSTRACT
BACKGROUND: The field of contactless health monitoring has witnessed significant advancements with the advent of piezoelectric sensing technology, which enables the monitoring of vital signs such as heart rate and respiration without requiring direct contact with the subject. This is especially advantageous for home sleep monitoring, where traditional wearable devices may be intrusive. However, the acquisition of piezoelectric signals is often impeded by motion artifacts, which are distortions caused by the subject's movements and can obscure the underlying physiological signals. These artifacts can significantly impair the reliability of signal analysis, necessitating effective identification and mitigation strategies. Various methods, including filtering techniques and machine learning approaches, have been employed to address this issue, but the challenge persists due to the complexity and variability of motion artifacts.
METHODS: This study introduces a hybrid model for detecting motion artifacts in ballistocardiogram (BCG) signals, utilizing a dual-channel approach. The first channel uses a deep learning model, specifically a temporal Bidirectional Gated Recurrent Unit combined with a Fully Convolutional Network (BiGRU-FCN), to identify motion artifacts. The second channel employs multi-scale standard deviation empirical thresholds to detect motion. The model was designed to address the randomness and complexity of motion artifacts by integrating deep learning capabilities with manual feature judgment. The data used for this study were collected from patients with sleep apnea using piezoelectric sensors, and the model's performance was evaluated using a set of predefined metrics.
RESULTS: Our analysis confirms that the proposed hybrid model exhibits exceptional accuracy in detecting motion artifacts in ballistocardiogram (BCG) signals. Employing a dual-channel approach, the model integrates multi-scale feature judgment with a BiGRU-FCN deep learning model. It achieved a classification accuracy of 98.61% and incurred only a 4.61% loss of valid signals in non-motion intervals. When tested on data from ten patients with sleep apnea, the model demonstrated robust performance, highlighting its potential for practical use in home sleep monitoring.
CONCLUSION: The proposed hybrid model presents a significant advancement in the detection of motion artifacts in BCG signals. Compared to existing methods such as the Alivar method [29], Enayati method [22], and Wiard method [20], our hybrid model achieves higher classification accuracy (98.61%) and lower valid signal loss ratio (4.61%). This demonstrates the effectiveness of integrating multi-scale standard deviation empirical thresholds with a deep learning model in enhancing the accuracy and robustness of motion artifact detection. This approach is particularly effective for home sleep monitoring, where motion artifacts can significantly impact the reliability of health monitoring data. The study findings suggest that the proposed hybrid model could serve as a valuable tool for improving the accuracy of motion artifact detection in various health monitoring applications.
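The second channel of the hybrid model, multi-scale standard-deviation thresholding, can be sketched in a few lines. This is an illustrative reading of the idea only: the window sizes and the multiplier `k` below are hypothetical parameters, not the values used in the paper.

```python
from statistics import pstdev

def motion_flags(signal, window_sizes=(50, 100, 200), k=3.0):
    # Flag samples whose local standard deviation at any scale exceeds
    # k times the global baseline deviation of the recording.
    baseline = pstdev(signal)
    n = len(signal)
    flags = [False] * n
    for w in window_sizes:
        for start in range(0, n - w + 1, w):
            if pstdev(signal[start:start + w]) > k * baseline:
                for i in range(start, start + w):
                    flags[i] = True
    return flags
```

In the paper's dual-channel design, flags like these would be fused with the BiGRU-FCN channel's predictions; the sketch shows only the threshold side of that fusion.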
PMID:40702570 | DOI:10.1186/s12938-025-01426-0
Advancing ADMET prediction for major CYP450 isoforms: graph-based models, limitations, and future directions
Biomed Eng Online. 2025 Jul 23;24(1):93. doi: 10.1186/s12938-025-01412-6.
ABSTRACT
Understanding Cytochrome P450 (CYP) enzyme-mediated metabolism is critical for accurate Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) predictions, which play a pivotal role in drug discovery. Traditional approaches, while foundational, often face challenges related to cost, scalability, and translatability. This review provides a comprehensive exploration of how graph-based computational techniques, including Graph Neural Networks (GNNs), Graph Convolutional Networks (GCNs), and Graph Attention Networks (GATs), have emerged as powerful tools for modeling complex CYP enzyme interactions and predicting ADMET properties with improved precision. Focusing on key CYP isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4), we synthesize current research advancements and methodologies, emphasizing the integration of multi-task learning, attention mechanisms, and explainable AI (XAI) in enhancing the accuracy and interpretability of ADMET predictions. Furthermore, we address ongoing challenges, such as dataset variability and the generalization of models to novel chemical spaces. The review concludes by identifying future research opportunities, particularly in improving scalability, incorporating real-time experimental validation, and expanding focus on enzyme-specific interactions. These insights underscore the transformative potential of graph-based approaches in advancing drug development and optimizing safety evaluations.
PMID:40702475 | DOI:10.1186/s12938-025-01412-6
Geospatial analysis enables combined poultry-fish farm monitoring in the fragile state of Myanmar
Nat Food. 2025 Jul;6(7):664-667. doi: 10.1038/s43016-025-01192-1. Epub 2025 Jul 23.
ABSTRACT
Food security is challenging to measure in fragile contexts. Here we combine data from previous field surveys with remotely sensed images and apply deep-learning techniques to estimate changes in the number and area of chicken houses on integrated chicken-fish farms and the supply of chicken meat and eggs from 2010 to 2023 in Yangon region, Myanmar. Yangon's poultry sector grew ~10% annually from 2010 to 2020 but contracted ~8% annually from 2020 to 2023.
PMID:40702332 | DOI:10.1038/s43016-025-01192-1
Non-invasive meningitis screening in neonates and infants: multicentre international study
Pediatr Res. 2025 Jul 23. doi: 10.1038/s41390-025-04179-7. Online ahead of print.
ABSTRACT
BACKGROUND AND OBJECTIVES: Meningitis diagnosis requires a lumbar puncture (LP) to obtain cerebrospinal fluid (CSF) for a laboratory-based analysis. In high-income settings, LPs are part of the systematic approach to screen for meningitis, and most yield negative results. In low- and middle-income settings, LPs are seldom performed, and suspected cases are often treated empirically. The aim of this study was to validate a non-invasive transfontanellar white blood cell (WBC) counter in CSF to screen for meningitis.
METHODS: We conducted a prospective study across three Spanish hospitals, one Mozambican and one Moroccan hospital (2020-2023). We included patients under 24 months with suspected meningitis, an open fontanelle, and an LP performed within 24 h from recruitment. High-resolution ultrasound (HRUS) images of the CSF were obtained using a customized probe. A deep-learning model was trained to classify CSF patterns based on the LPs' WBC counts, using a threshold of 30 cells/mm3.
RESULTS: The algorithm was applied to 3782 images from 76 patients. It correctly classified 17/18 CSFs with ≥ 30 WBC, and 55/58 controls (sensitivity 94.4%, specificity 94.8%). The only false negative corresponded to a traumatic LP with a corrected count of 40 WBC/mm3.
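The reported sensitivity and specificity follow directly from the confusion counts given above; a quick arithmetic check:

```python
# Confusion counts reported in the abstract:
tp, fn = 17, 1   # CSFs with >= 30 WBC: 17 of 18 correctly classified
tn, fp = 55, 3   # controls: 55 of 58 correctly classified
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
# prints: sensitivity = 94.4%, specificity = 94.8%
```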
CONCLUSIONS: This non-invasive device could be an accurate tool for screening meningitis in neonates and young infants, modulating LP indications.
IMPACT: Our non-invasive, high-resolution ultrasound device achieved 94% accuracy in detecting elevated leukocyte counts in neonates and infants with suspected meningitis, compared to the gold standard (lumbar punctures and laboratory analysis). This first-in-class screening device introduces the first non-invasive method for neonatal and infant meningitis screening, potentially modulating lumbar puncture indications. This technology could substantially reduce lumbar punctures in low-suspicion cases and provides a viable alternative for critically ill patients worldwide and for settings where lumbar punctures are unfeasible, especially in low-income countries.
PMID:40702208 | DOI:10.1038/s41390-025-04179-7
Multi-camera spatiotemporal deep learning framework for real-time abnormal behavior detection in dense urban environments
Sci Rep. 2025 Jul 23;15(1):26813. doi: 10.1038/s41598-025-12388-7.
ABSTRACT
The growing density of today's urban environments requires a robust multi-camera architecture for real-time abnormality detection and behavior analysis. Most existing methods fail to detect unusual behaviors due to occlusion, dynamic scene changes, and high computational inefficiency. These failures often result in high false-positive rates and poor generalization to unseen anomalies. Both traditional graph-based systems and current CNN-RNN systems fail to capture complex social interactions and spatiotemporal dependencies, limiting their performance in crowded scenes. To address these drawbacks, this research proposes a multi-camera deep learning framework for abnormal behavior detection that exploits spatiotemporal information and integrates several new methodologies. Multi-Scale Graph Attention Networks (MS-GAT) achieve interaction-aware anomaly detection, yielding up to a 30% reduction in false positives. The Reinforcement Learning-based Dynamic Camera Attention Transformer (RL-DCAT) optimizes surveillance focus, reducing computational overhead by 40% and increasing recall by 15%. Spatiotemporal Inverse Contrastive Learning (STICL), which uses an inverse contrastive anomaly memory, improves generalization to unseen rare anomalies with a 25% gain in recall. Neuromorphic event-based encoding enables high-speed action analysis through spiking neural networks, lowering detection latency by 60%. Finally, BGS-MFA synthesizes new abnormal behaviors through generative behavior synthesis and meta-learned few-shot adaptation, improving anomaly detection generalization by 35%. Evaluation on the UCF-Crime, ShanghaiTech, and Avenue datasets showed a 40% reduction in false alarms, 50% lower computational demand, and 98% real-time efficiency for this multi-faceted framework. This framework enables multi-camera crowd surveillance with adaptive scalability and resource provisioning for real-time dynamic behavioral anomaly detection in real-world settings.
PMID:40702170 | DOI:10.1038/s41598-025-12388-7
Diabetes diagnosis using a hybrid CNN LSTM MLP ensemble
Sci Rep. 2025 Jul 23;15(1):26765. doi: 10.1038/s41598-025-12151-y.
ABSTRACT
Diabetes is a chronic condition brought on by either the body's inability to use insulin effectively or insufficient insulin production. Left untreated, the disease can be lethal. With early diagnosis, diabetes can be managed and patients can lead a good life. The conventional method of identifying diabetes using clinical and physical data is laborious; hence, an automated method is required. This research presents an ensemble deep learning model for the diagnosis of diabetes that comprises three steps. The first step is preprocessing, which includes cleaning, normalizing, and organizing the data so that it can be fed into deep learning models. The second step employs two neural networks to extract features: a convolutional neural network (CNN) extracts the spatial characteristics of the data, while Long Short-Term Memory (LSTM) networks, specifically an LSTM stack, capture the time-dependent structure of the patients' medical information. The last step combines the two feature sets acquired by the CNN and LSTM models to form the input for a Multi-layer Perceptron (MLP) classifier. The MLP model serves as a meta-learner that fuses the outputs of the two feature extraction algorithms and maps them to the target variable. According to the implementation results, the suggested approach outperformed the compared approaches in terms of average accuracy and precision, achieving 98.28% and 0.99, respectively, indicating a strong capacity to identify diabetes.
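The three-step ensemble reads as a standard stacked-generalization pipeline. A minimal structural sketch follows, with the extractors and classifier left as placeholder callables; the actual CNN, LSTM stack, and MLP from the paper are not reproduced here.

```python
def diagnose(records, cnn_extract, lstm_extract, mlp_classify):
    # Step 2: the two extractors produce per-patient feature vectors --
    # spatial features from the CNN, temporal features from the LSTM stack.
    spatial = cnn_extract(records)
    temporal = lstm_extract(records)
    # Step 3: concatenate the two feature sets so the MLP meta-learner
    # maps each fused vector to the target variable (diagnosis).
    fused = [s + t for s, t in zip(spatial, temporal)]
    return mlp_classify(fused)

# Toy stand-ins showing the data flow only (not trained models):
cnn = lambda recs: [[0.2, 0.8] for _ in recs]
lstm = lambda recs: [[0.5] for _ in recs]
mlp = lambda feats: [int(sum(v) > 1.0) for v in feats]
print(diagnose([{"glucose": 148}, {"glucose": 85}], cnn, lstm, mlp))
# prints: [1, 1]
```

The design point is that the meta-learner sees both feature views at once, rather than averaging two separate predictions.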
PMID:40702146 | DOI:10.1038/s41598-025-12151-y
Achieving environmental sustainability via an integrated shampoo optimized BiLSTM-Transformer model for enhanced time-series forecasting
Sci Rep. 2025 Jul 23;15(1):26855. doi: 10.1038/s41598-025-11301-6.
ABSTRACT
Accurate forecasting plays a vital role in enhancing the efficiency of power systems, ensuring better resource management, and supporting strategic decision-making. This work presents BiLSTM-Transformer, a hybrid deep learning model that integrates Bidirectional Long Short-Term Memory (BiLSTM) networks with Transformer architecture to improve predictive performance in complex time-series tasks. The model employs a second-order optimization approach using Shampoo, which strengthens convergence stability and promotes better generalization during training. By effectively modeling both short-term variations and long-range dependencies in meteorological data, BiLSTM-Transformer achieves superior forecast accuracy across multiple evaluation benchmarks. The results highlight its potential as a reliable tool for supporting sustainable energy planning and smart grid operations.
PMID:40702122 | DOI:10.1038/s41598-025-11301-6
Histology image analysis of 13 healthy tissues reveals molecular-histological correlations
Sci Rep. 2025 Jul 23;15(1):26812. doi: 10.1038/s41598-025-11853-7.
ABSTRACT
Gene expression is an important process in which genes guide the synthesis of proteins, and molecular-level differences often lead to individual phenotypic variations. Combining molecular information at the nano-level with phenotypic information at the micron level allows the identification of a series of gene-level biomarkers related to image phenotypes and provides a more comprehensive way to understand the impact of genes on cell morphology. Currently, most studies in imaging genomics focus on tumors. However, tumor heterogeneity undermines the reproducibility of gene-micro-correlations. Furthermore, research on the association between imaging features and gene expression patterns in multiple tissues is still lacking. This study aims to explore the correlations between the nuclear features of healthy tissue cells and RNA expression patterns. Based on 4306 samples of 13 organs from the largest human healthy tissue database, the Genotype-Tissue Expression (GTEx) project, a deep learning-based automatic analysis framework was constructed to investigate the geno-micro-correlations across tissues. The proposed framework was used to quantitatively evaluate the nuclear morphological features of each healthy organ and identify gene sets specific to nuclear features in functionally similar organs. It revealed the biological significance of these gene sets through a pathway analysis, including cell growth, development, metabolism, and immunity. The results show that differences in nuclear morphological features of healthy organs are associated with differential RNA expression. By analyzing the correlation of differential patterns in multiple healthy organs, this study revealed the associations between gene expressions and phenotypes in multiple organs.
PMID:40702078 | DOI:10.1038/s41598-025-11853-7
Development and validation of a deep learning image quality feedback system for infant fundus photography
Sci Rep. 2025 Jul 23;15(1):26852. doi: 10.1038/s41598-025-10859-5.
ABSTRACT
Retinopathy of prematurity (ROP) is a significant cause of childhood blindness. Many healthcare institutions face a shortage of well-trained ophthalmologists for conducting screenings. Hence, we have developed the Deep Learning Infant Fundus Quality Feedback System (DLIF-QFS) to assess the overall quality of infant retinal photographs and detect common operational errors to support ROP screening and diagnosis. Our DLIF-QFS has been developed and rigorously validated using datasets comprising 13,372 images. In terms of overall quality classification, the DLIF-QFS demonstrated remarkable performance. The area under the curve (AUC) values for discriminating poor quality, adequate quality, and excellent quality images in the external validation dataset were 0.802, 0.691, and 0.926, respectively. For most classification tasks related to identifying issues in adequate and poor quality images, the AUC values consistently exceeded 0.8. In expert diagnostic tests, the DLIF-QFS improved accuracy and enhanced consistency. Its capability to identify the causes of poor image quality, enhance image quality and assist clinicians in improving diagnostic efficiency makes it a valuable tool for advancing ROP diagnosis.
PMID:40702071 | DOI:10.1038/s41598-025-10859-5
Multilingual identification of nuanced dimensions of hope speech in social media texts
Sci Rep. 2025 Jul 23;15(1):26783. doi: 10.1038/s41598-025-10683-x.
ABSTRACT
Hope plays a crucial role in human psychology and well-being, yet its expression and detection across languages remain underexplored in natural language processing (NLP). This study presents MIND-HOPE, the first multiclass hope speech detection datasets for Spanish and German, collected from Twitter. The annotated datasets comprise 19,183 Spanish tweets and 21,043 German tweets, categorized into four classes: Generalized Hope, Realistic Hope, Unrealistic Hope, and Not Hope. The paper also provides a comprehensive review of existing hope speech datasets and detection techniques, and conducts a comparative evaluation of traditional machine learning, deep learning, and transformer-based approaches. Experimental results, obtained using 5-fold cross-validation, show that monolingual transformer models (e.g., bert-base-german-dbmdz-uncased and bert-base-spanish-wwm-uncased) consistently outperform multilingual models (e.g., mBERT, XLM-RoBERTa) in both binary and multiclass hope detection tasks. These findings underscore the value of language-specific fine-tuning for nuanced affective computing tasks. This study advances sentiment analysis by addressing a novel and underrepresented affective dimension, hope, and proposes robust multilingual benchmarks for future research. Theoretically, it contributes to a deeper understanding of hope as a complex emotional state with practical implications for mental health monitoring, social well-being analysis, and positive content recommendation in online spaces. By modeling hope across languages and categories, this research opens new directions in affective NLP and cross-cultural computational social science.
PMID:40702008 | DOI:10.1038/s41598-025-10683-x
Decoding natural visual scenes via learnable representations of neural spiking sequences
Neural Netw. 2025 Jul 16;192:107863. doi: 10.1016/j.neunet.2025.107863. Online ahead of print.
ABSTRACT
Visual input underpins cognitive function by providing the brain with essential environmental information. Neural decoding of visual scenes seeks to reconstruct pixel-level images from neural activity, a vital capability for vision restoration via brain-computer interfaces. However, extracting visual content from time-resolved spiking activity remains a significant challenge. Here, we introduce the Wavelet-Informed Spike Augmentation (WISA) model, which applies multilevel wavelet transforms to spike trains to learn compact representations that can be directly fed into deep reconstruction networks. When tested on recorded retinal spike data responding to natural video stimuli, WISA substantially improves reconstruction accuracy, especially in recovering fine-grained details. These results emphasize the value of temporal spike patterns for high-fidelity visual decoding and demonstrate WISA as a promising model for visual decoding.
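As an illustration of the wavelet front end, a multilevel Haar decomposition of a binned spike train can be written in a few lines. This is a stand-in sketch only: WISA's actual wavelet family, decomposition depth, and learned representation are not specified here.

```python
import math

def haar_multilevel(signal, levels=2):
    # Repeatedly split the signal into a coarse approximation and detail
    # coefficients; the collected coefficients form a compact
    # multi-resolution summary of the spike train.
    # Assumes len(signal) is a multiple of 2**levels.
    approx = list(signal)
    coeffs = []
    for _ in range(levels):
        nxt, detail = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            nxt.append((a + b) / math.sqrt(2))
            detail.append((a - b) / math.sqrt(2))
        coeffs.append(detail)
        approx = nxt
    coeffs.append(approx)  # final low-frequency approximation
    return coeffs

# A hypothetical binned spike-count vector:
print(haar_multilevel([1, 1, 0, 0, 2, 2, 0, 0], levels=1))
```

In a WISA-like pipeline the coefficients at each level, rather than the raw spike bins, would be fed to the downstream reconstruction network.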
PMID:40700800 | DOI:10.1016/j.neunet.2025.107863
Imaging-aided diagnosis and treatment based on artificial intelligence for pulmonary nodules: A review
Phys Med. 2025 Jul 21;136:105050. doi: 10.1016/j.ejmp.2025.105050. Online ahead of print.
ABSTRACT
BACKGROUND: Pulmonary nodules are critical indicators for the early detection of lung cancer; however, their diagnosis and management pose significant challenges due to the variability in nodule characteristics, reader fatigue, and limited clinical expertise, often leading to diagnostic errors. The rapid advancement of artificial intelligence (AI) presents promising solutions to address these issues.
METHODS: This review surveys traditional rule-based methods, handcrafted feature-based machine learning, radiomics, deep learning, and hybrid models incorporating Transformers or attention mechanisms. It systematically compares their methodologies, clinical applications (diagnosis, treatment, prognosis), and dataset usage to evaluate performance, applicability, and limitations in pulmonary nodule management.
RESULTS: AI advances have significantly improved pulmonary nodule management, with transformer-based models achieving leading accuracy in segmentation, classification, and subtyping. The fusion of multimodal imaging (CT, PET, and MRI) further enhances diagnostic precision. Additionally, AI aids treatment planning and prognosis prediction by integrating radiomics with clinical data. Despite these advances, challenges remain, including domain shift, high computational demands, limited interpretability, and variability across multi-center datasets.
CONCLUSION: AI has transformative potential in improving the diagnosis and treatment of lung nodules, especially in enhancing the accuracy of lung cancer treatment and patient prognosis, where significant progress has been made.
PMID:40700795 | DOI:10.1016/j.ejmp.2025.105050
Developing a multivariable deep learning model to predict psychiatric illness in patients with epilepsy
Epilepsy Behav. 2025 Jul 21;171:110596. doi: 10.1016/j.yebeh.2025.110596. Online ahead of print.
ABSTRACT
Psychiatric disorders, such as depression, anxiety, and psychosis, are significantly more common in individuals with epilepsy, affecting nearly one-third of patients. These comorbidities often exacerbate the burden of epilepsy, impacting quality of life, social interactions, and treatment outcomes. Understanding the risk factors and predicting the occurrence of psychiatric disorders in epilepsy is crucial for early intervention. This study aims to develop and validate neural network-based predictive models to estimate the risk of psychiatric disorders in patients with epilepsy, leveraging clinical and demographic data. Retrospective data of 2,258 epilepsy patients collected from 2013 to 2023 were analyzed. Neural network models were developed using the keras and neuralnet frameworks in R. Input variables included demographic data, seizure characteristics, antiseizure drug usage, and diagnostic findings. Model performance was evaluated using accuracy, sensitivity, specificity, and ROC-AUC. Feature importance was assessed using SHAP values. Among the cohort, 27.6% of patients had psychiatric disorders. The keras model achieved an accuracy of 92.4% (ROC-AUC: 0.973), while the neuralnet model demonstrated superior performance with an accuracy of 97.16% (ROC-AUC: 0.974). Key predictors included age of onset, seizure duration, and antiseizure drug profiles. SHAP analysis highlighted the contribution of specific features to risk predictions. Cross-validation confirmed model robustness, with consistent accuracy across folds. Neural network-based models demonstrated high accuracy in predicting psychiatric comorbidities in epilepsy patients. These models offer potential for early identification of at-risk individuals, enabling personalized care strategies to improve outcomes in epilepsy management.
PMID:40700773 | DOI:10.1016/j.yebeh.2025.110596
Deep-Learning Model for Real-Time Prediction of Recurrence in Early-Stage Non-Small Cell Lung Cancer: A Multimodal Approach (RADAR CARE Study)
JCO Precis Oncol. 2025 Jul;9:e2500172. doi: 10.1200/PO-25-00172. Epub 2025 Jul 23.
ABSTRACT
PURPOSE: The surveillance protocol for early-stage non-small cell lung cancer (NSCLC) is not contingent upon individualized risk factors for recurrence. This study aimed to use comprehensive data from clinical practice to develop a deep-learning model for practical longitudinal monitoring.
METHODS: A multimodal deep-learning model with transformers was developed for real-time recurrence prediction using baseline clinical, pathological, and molecular data with longitudinal laboratory and radiologic data collected during surveillance. Patients with NSCLC (stage I to III) who underwent surgery with curative intent between January 2008 and September 2022 were included. The primary outcome was predicting recurrence within 1 year after the monitoring point. This study demonstrates the timely provision of risk scores (RADAR scores) and reports the determined thresholds and corresponding AUCs.
RESULTS: A total of 14,177 patients were enrolled (10,262 with stage I, 2,380 with stage II, and 1,703 with stage III). The model incorporated 64 clinical-pathological-molecular factors at baseline, along with longitudinal laboratory and computed tomography imaging interpretation data. The mean baseline RADAR score was 0.324 (standard deviation [SD], 0.256) in stage I, 0.660 (SD, 0.210) in stage II, and 0.824 (SD, 0.140) in stage III. The AUC for predicting relapse within 1 year of the monitoring point was 0.854 across all stages, with a sensitivity of 86.0% and a specificity of 71.3% (AUC = 0.872 in stage I, AUC = 0.737 in stage II, and AUC = 0.724 in stage III).
CONCLUSION: This pilot study introduces a deep-learning model that uses multimodal data from routine clinical practice to predict relapses in early-stage NSCLC. It demonstrates the timely provision of RADAR risk scores to clinicians for recurrence prediction, potentially guiding risk-adapted surveillance strategies and aggressive adjuvant systemic treatment.
PMID:40700672 | DOI:10.1200/PO-25-00172
Deep Learning for Bidirectional Translation between Molecular Structures and Vibrational Spectra
J Am Chem Soc. 2025 Jul 23. doi: 10.1021/jacs.5c05010. Online ahead of print.
ABSTRACT
Two deep learning models, TranSpec and SpecGNN, were developed to establish a bidirectional mapping between molecular vibrational spectra and simplified molecular input line entry system (SMILES) representations, akin to a "translation" between the language of spectra and the language of molecular structures. Initially, TranSpec achieved accuracy rates of 55 and 63% for quantum chemistry (QC)-calculated IR and Raman spectral data sets, respectively, but its performance dropped to 11% for the NIST experimental IR data set. To address this, we combined IR and Raman spectra as input; augmented the data set; employed model fusion, transfer learning, and multisource learning; applied molecular mass filtering; and leveraged SpecGNN for spectral simulation and candidate reordering. These improvements boosted TranSpec's accuracy to 53.6% for the experimental IR data set. Notably, SpecGNN outperformed traditional QC methods in terms of both spectral accuracy and computational efficiency. Finally, we demonstrated TranSpec's ability to recognize functional groups and distinguish isomers or homologues. Together, TranSpec and SpecGNN models provide an efficient and accurate AI-driven framework for interpreting molecular structures and spectra, advancing applications in spectroscopy and cheminformatics.
PMID:40700648 | DOI:10.1021/jacs.5c05010
Implementation of convolutional neural networks for microbial colony recognition
Microbiol Spectr. 2025 Jul 23:e0288524. doi: 10.1128/spectrum.02885-24. Online ahead of print.
ABSTRACT
Initial classification of microorganisms based on visual identification of colonies remains challenging for skilled microbiologists and is influenced by the proficiency and subjective interpretation of professionals. To overcome these challenges, we applied deep learning to microbial colony recognition to provide microbial data to microbiologists to assist in clinical classification. Photographs of clinically isolated microbial colonies were captured to produce a 48 × 48 pixel colony data set, which was divided into training, validation, and test data sets. Eight convolutional neural networks (CNNs) were adapted to the colony classification task. The classification performance of the models was evaluated based on accuracy, precision, recall, and F1 score. The data set included five categories, namely gram-negative bacilli, gram-positive cocci, Candida, Aspergillus, and background of blood agar medium, with corresponding labels of 0, 1, 2, 3, and 4, respectively; each category contained 1,000 images. Among the trained CNNs, AlexNet showed the lowest performance, with an accuracy of 93.40%, whereas GoogLeNet had the highest performance, with an accuracy of 98.80%. MobileNet and ShuffleNet were more than 98% accurate. GoogLeNet misclassified only 6 of the 500 images in the test data set, and the algorithm was highly capable of identifying both clinical and standard strains that were not included in the data set. The various trained CNNs demonstrated excellent performance in microbial colony recognition. These data-driven CNNs are expected to provide auxiliary decision-making tools for microbiologists. IMPORTANCE: Currently, the classification of microorganisms is highly subjective because it is dependent upon the skill level of the microbiologist. In this study, we used deep learning for microbial colony recognition to provide objective information to guide the identification of microbial colonies.
We used photographs of clinically isolated microbial colonies divided into training, validation, and test data sets. Eight convolutional neural networks (CNNs) were applied, and the classification performance of each model was evaluated using accuracy, precision, recall, and F1 scores. Our study confirmed that CNNs can classify colonies into four broad categories: gram-negative bacilli, gram-positive cocci, Candida, and Aspergillus, with excellent predictive performance. Our method does not require specialized photographic equipment and exhibits high generalization performance, even for unknown bacteria.
PMID:40700097 | DOI:10.1128/spectrum.02885-24
LncRNA Subcellular Localization Across Diverse Cell Lines: An Exploration Using Deep Learning with Inexact q-mers
Noncoding RNA. 2025 Jun 25;11(4):49. doi: 10.3390/ncrna11040049.
ABSTRACT
Background: Long non-coding RNAs (lncRNAs) can be localized to different cellular compartments, such as the nuclear and cytoplasmic regions, and their biological functions are influenced by the region of the cell where they are located. Compared to the vast number of lncRNAs, only a relatively small proportion have annotations regarding their subcellular localization. It would be helpful if those few annotated lncRNAs could be leveraged to develop predictive models for the localization of other lncRNAs. Methods: Conventional computational methods derive q-mer profiles from lncRNA sequences and train machine learning models, such as support vector machines and logistic regression, on those profiles. These methods rely on exact q-mer matches. Given possible sequence mutations and other uncertainties in genomic sequences and their role in biological function, accounting for this variability might improve our ability to model lncRNAs and their localization. Thus, we build on inexact q-mers and use machine learning/deep learning techniques to study three specific problems in lncRNA subcellular localization: prediction of lncRNA localization using inexact q-mers, whether lncRNA localization is cell-type-specific, and the notion of switching (lncRNA) genes. Results: We performed our analysis using data on lncRNA localization across 15 cell lines. Our results showed that using inexact q-mers (with q = 6) can improve lncRNA localization prediction performance compared to using exact q-mers. Further, we showed that lncRNA localization, in general, is not cell-line-specific. We also identified a category of lncRNAs that switch cellular compartments between different cell lines (we call them switching lncRNAs). These switching lncRNAs complicate the problem of predicting lncRNA localization using machine learning models, showing that lncRNA localization remains a major challenge.
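To make the distinction between exact and inexact q-mers concrete, here is a minimal sketch (not the authors' implementation; the small q, the mismatch tolerance m, and the RNA alphabet are assumptions for illustration — the paper uses q = 6) of a profile that credits every sequence window within a given Hamming distance of each q-mer:

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def inexact_qmer_profile(seq, q=3, m=1, alphabet="ACGU"):
    """Count each possible q-mer over the alphabet, crediting every
    length-q window of seq within Hamming distance m (an inexact match).
    Setting m = 0 recovers the conventional exact q-mer profile."""
    profile = {"".join(k): 0 for k in product(alphabet, repeat=q)}
    for i in range(len(seq) - q + 1):
        window = seq[i:i + q]
        for kmer in profile:
            if hamming(window, kmer) <= m:
                profile[kmer] += 1
    return profile
```

This naive enumeration costs O(L · 4^q) per sequence; a practical implementation for q = 6 would instead expand each window into its Hamming-ball neighbors.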
PMID:40700092 | DOI:10.3390/ncrna11040049
Progressive Training for Learning From Label Proportions
IEEE Trans Neural Netw Learn Syst. 2025 Jul 23;PP. doi: 10.1109/TNNLS.2025.3590131. Online ahead of print.
ABSTRACT
Learning from label proportions (LLP), which aims to learn an instance-level classifier from proportion-based grouped training data, has garnered increasing attention in the field of machine learning. Existing deep learning-based LLP methods employ end-to-end pipelines that derive proportional loss functions from the Kullback-Leibler (KL) divergence between bag-level prior and posterior class distributions. However, the optimal solutions of these methods often fail to conform to the given proportions, inevitably degrading the final classification performance. In this article, we address this issue by proposing a novel progressive training method for LLP, termed PT-LLP, which strives to meet the proportion constraints from the bag level down to the instance level. Specifically, we first train a model using the existing KL-divergence-based LLP methods, which are consistent with bag-level proportion information. Then, we impose additional constraints of strict proportion consistency on the classifier to move toward a more ideal solution, reformulating this as a constrained optimization problem that can be efficiently solved using optimal transport (OT) algorithms. In particular, knowledge distillation is employed as a transition stage that transfers bag-level information to the instance level via a teacher-student framework. Our framework is model-agnostic and, when existing deep LLP methods are incorporated as its first training stage, demonstrates significant performance improvements in extensive experiments on different datasets.
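The bag-level proportional loss described above — the KL divergence between a bag's given label proportions (the prior) and the average of its instances' predicted posteriors — can be sketched in a few lines (illustrative only; a real deep LLP pipeline would compute this on network outputs inside an autodiff framework):

```python
import math

def bag_posterior(instance_probs):
    """Average instance-level class posteriors to get the bag-level posterior."""
    n = len(instance_probs)
    k = len(instance_probs[0])
    return [sum(p[c] for p in instance_probs) / n for c in range(k)]

def proportion_loss(prior, instance_probs, eps=1e-12):
    """KL(prior || bag posterior): the proportional loss used by
    KL-divergence-based deep LLP methods. Zero when the averaged
    predictions exactly match the bag's given label proportions."""
    post = bag_posterior(instance_probs)
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(prior, post))
```

Note that this loss can reach zero even when individual instances are mislabeled, as long as the errors cancel within the bag — which is precisely the bag-to-instance gap that PT-LLP's later stages target.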
PMID:40699973 | DOI:10.1109/TNNLS.2025.3590131
A Comprehensive Survey on 3D Single-View Object Reconstruction
IEEE Trans Vis Comput Graph. 2025 Jul 23;PP. doi: 10.1109/TVCG.2025.3591770. Online ahead of print.
ABSTRACT
Single-view 3D object reconstruction (SVOR) aims to recover the 3D shape of an object from a single 2D image. Despite advances in deep learning (DL), challenges such as incomplete image information, scarce 3D data annotation, and highly variable object shapes still limit the performance of SVOR. Meanwhile, the rapid development of novel view synthesis (NVS) techniques has driven significant advances in the SVOR field. However, existing reviews have not comprehensively covered these rapid developments in NVS-based approaches. This paper aims to fill this gap by highlighting the latest progress in SVOR, particularly advancements related to NVS-based methods. Additionally, we observed discrepancies between existing quality evaluation metrics in SVOR and human visual perception, because some critical object parts must be considered during evaluation. For example, when reconstructing airplanes, critical parts such as the empennage and wings are often overlooked by evaluation metrics because of their smaller size relative to the fuselage; consequently, poor reconstruction of these parts may not significantly affect overall evaluation scores. To address this issue, we propose a more comprehensive evaluation method that accurately reflects human visual perception: a weighted evaluation method that considers part saliency, together with a novel technique for automatically perceiving reconstruction discrepancies. These approaches enhance the accuracy and consistency of evaluations, offering new insights and methodologies, filling a void in the existing literature, and providing valuable contributions to both research and practical applications in SVOR.
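A saliency-weighted score of the kind this abstract proposes can be sketched as follows (a toy formulation; the part names, error values, and saliency weights are invented for illustration and are not the paper's actual metric):

```python
def weighted_part_score(part_errors, part_saliency):
    """Aggregate per-part reconstruction errors using saliency weights,
    so that small but perceptually critical parts (e.g. wings, empennage)
    are not drowned out by large parts such as the fuselage."""
    total = sum(part_saliency[p] for p in part_errors)
    return sum(part_errors[p] * part_saliency[p] for p in part_errors) / total
```

Compared with a plain mean over parts, up-weighting salient parts raises the overall error whenever those parts are reconstructed poorly, bringing the score closer to human judgment.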
PMID:40699970 | DOI:10.1109/TVCG.2025.3591770