Deep learning
Artificial intelligence chatbots in endodontic education-Concepts and potential applications
Int Endod J. 2025 Mar 31. doi: 10.1111/iej.14231. Online ahead of print.
ABSTRACT
The integration of artificial intelligence (AI) into education is transforming learning across various domains, including dentistry. Endodontic education can significantly benefit from AI chatbots; however, knowledge gaps regarding their potential and limitations hinder their effective utilization. This narrative review aims to: (A) explain the core functionalities of AI chatbots, including their reliance on natural language processing (NLP), machine learning (ML), and deep learning (DL); (B) explore their applications in endodontic education for personalized learning, interactive training, and clinical decision support; (C) discuss the challenges posed by technical limitations, ethical considerations, and the potential for misinformation. The review highlights that AI chatbots provide learners with immediate access to knowledge, personalized educational experiences, and tools for developing clinical reasoning through case-based learning. Educators benefit from streamlined curriculum development, automated assessment creation, and evidence-based resource integration. Despite these advantages, concerns such as chatbot hallucinations, algorithmic biases, potential for plagiarism, and the spread of misinformation require careful consideration. Analysis of current research reveals limited endodontic-specific studies, emphasizing the need for tailored chatbot solutions validated for accuracy and relevance. Successful integration will require collaborative efforts among educators, developers, and professional organizations to address challenges, ensure ethical use, and establish evaluation frameworks.
PMID:40164964 | DOI:10.1111/iej.14231
LEyes: A lightweight framework for deep learning-based eye tracking using synthetic eye images
Behav Res Methods. 2025 Mar 31;57(5):129. doi: 10.3758/s13428-025-02645-y.
ABSTRACT
Deep learning methods have significantly advanced the field of gaze estimation, yet the development of these algorithms is often hindered by a lack of appropriate publicly accessible training datasets. Moreover, models trained on the few available datasets often fail to generalize to new datasets due to both discrepancies in hardware and biological diversity among subjects. To mitigate these challenges, the research community has frequently turned to synthetic datasets, although this approach also has drawbacks, such as the computational cost and labor involved in creating photorealistic eye images to be used as training data. In response, we introduce "Light Eyes" (LEyes), a novel framework that diverges from traditional photorealistic methods by utilizing simple synthetic image generators to train neural networks for detecting key image features like pupils and corneal reflections. LEyes facilitates the generation of synthetic data on the fly that is adaptable to any recording device and enhances the efficiency of training neural networks for a wide range of gaze-estimation tasks. Presented evaluations show that LEyes, in many cases, outperforms existing methods in accurately identifying and localizing pupils and corneal reflections across diverse datasets. Additionally, models trained using LEyes data outperform standard eye trackers while employing more cost-effective hardware, offering a promising avenue to overcome the current limitations in gaze estimation technology.
PMID:40164925 | DOI:10.3758/s13428-025-02645-y
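As a rough illustration of the kind of simple, non-photorealistic generator the LEyes abstract describes, the following Python sketch draws a dark elliptical pupil and a small bright corneal reflection on a gray background and returns their ground-truth centers. The function name, image size, intensity values, and noise level are all assumptions made for this example; none of them is taken from the LEyes framework itself.

```python
import numpy as np

def synthetic_eye(height=64, width=64, seed=0):
    """Hypothetical generator: returns (image, pupil_center, reflection_center).

    Centers are (x, y) tuples in pixel coordinates; intensities lie in [0, 1].
    """
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:height, 0:width]

    # Random pupil center and ellipse semi-axes (assumed ranges).
    cx = rng.uniform(0.3 * width, 0.7 * width)
    cy = rng.uniform(0.3 * height, 0.7 * height)
    ax, ay = rng.uniform(6.0, 12.0, size=2)

    # Gray background with a dark elliptical pupil.
    img = np.full((height, width), 0.5)
    img[((xx - cx) / ax) ** 2 + ((yy - cy) / ay) ** 2 <= 1.0] = 0.1

    # Small bright disk near the pupil, standing in for a corneal reflection.
    rx = cx + rng.uniform(-4.0, 4.0)
    ry = cy + rng.uniform(-4.0, 4.0)
    img[(xx - rx) ** 2 + (yy - ry) ** 2 <= 2.0 ** 2] = 1.0

    # Mild sensor-like noise, clipped back into the valid range.
    img = np.clip(img + rng.normal(0.0, 0.02, img.shape), 0.0, 1.0)
    return img, (cx, cy), (rx, ry)
```

Because every image comes with exact labels, a generator of this kind can produce training pairs on the fly, which is the property the abstract emphasizes.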
Peptide-functionalized nanoparticles for brain-targeted therapeutics
Drug Deliv Transl Res. 2025 Mar 31. doi: 10.1007/s13346-025-01840-w. Online ahead of print.
ABSTRACT
Despite the rapid development of nanoparticle (NP)-based drug delivery systems, intravenous delivery of drugs to the brain remains a major challenge due to various biological barriers. To achieve therapeutic effects, NP-encapsulated drugs must avoid accumulation in off-target organs and selectively deliver to the brain, successfully cross the blood-brain barrier (BBB), and reach the target cells in the brain. Conjugating receptor-specific ligands to the surface of NPs is a promising technique for engineering NPs to overcome these barriers. Specifically, peptides as brain-targeting ligands have been of increasing interest given their ease of synthesis, low cytotoxicity, and strong affinity to target proteins. The success of peptides as targeting ligands is largely due to the diverse strategies of designing and modifying peptides with favorable properties, including membrane permeability and multi-receptor targeting. Here, we review the design and implementation of peptide-functionalized NP systems for neurological disease applications. We also explore advances in rational peptide design strategies for brain targeting, including using generative deep-learning models to computationally design new peptides.
PMID:40164912 | DOI:10.1007/s13346-025-01840-w
Artificial intelligence for intraoperative video analysis in robotic-assisted esophagectomy
Surg Endosc. 2025 Mar 31. doi: 10.1007/s00464-025-11685-6. Online ahead of print.
ABSTRACT
BACKGROUND: Robotic-assisted minimally invasive esophagectomy (RAMIE) is a complex surgical procedure for treating esophageal cancer. Artificial intelligence (AI) is an emerging technology with increasing applications in the surgical field. This scoping review aimed to assess the current AI applications in RAMIE, with a focus on intraoperative video analysis.
METHODS: To identify all articles utilizing AI in RAMIE, a comprehensive literature search of the Medline and Embase databases and the Cochrane Library was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews. Two independent reviewers assessed articles for quality and inclusion.
RESULTS: One hundred and seventeen articles were identified, of which four were included in the final analysis. Results demonstrated that the main AI applications in RAMIE were intraoperative video assessment and the evaluation of surgical technical skills as a measure of surgical performance. AI was also used for surgical phase recognition to support clinical decision-making through intraoperative guidance and to identify key anatomical landmarks. Various deep-learning networks were used to generate AI models, and there was a strong emphasis on using high-quality standardized video frames.
CONCLUSIONS: The use of AI in RAMIE, especially in intraoperative video analysis and surgical phase recognition, is still a relatively new field that should be further explored. The advantages of using AI algorithms to evaluate intraoperative videos in an automated manner may be harnessed to improve technical performance and intraoperative decision-making, achieve a higher quality of surgery, and improve postoperative outcomes.
PMID:40164839 | DOI:10.1007/s00464-025-11685-6
Vision Transformers in Medical Imaging: a Comprehensive Review of Advancements and Applications Across Multiple Diseases
J Imaging Inform Med. 2025 Mar 31. doi: 10.1007/s10278-025-01481-y. Online ahead of print.
ABSTRACT
The rapid advancement of artificial intelligence techniques, particularly deep learning, has transformed medical imaging. This paper presents a comprehensive review of recent research that leverages vision transformer (ViT) models for medical image classification across various disciplines. The medical fields of focus include breast cancer, skin lesions, magnetic resonance imaging brain tumors, lung diseases, retinal and eye analysis, COVID-19, heart diseases, colon cancer, brain disorders, diabetic retinopathy, skin diseases, kidney diseases, lymph node diseases, and bone analysis. Each work is critically analyzed and interpreted with respect to its performance, data preprocessing methodologies, model architecture, transfer learning techniques, model interpretability, and identified challenges. Our findings suggest that ViT shows promising results in the medical imaging domain, often outperforming traditional convolutional neural networks (CNN). A comprehensive overview is presented in the form of figures and tables summarizing the key findings from each field. This paper provides critical insights into the current state of medical image classification using ViT and highlights potential future directions for this rapidly evolving research area.
PMID:40164818 | DOI:10.1007/s10278-025-01481-y
Monocular depth estimation via a detail semantic collaborative network for indoor scenes
Sci Rep. 2025 Mar 31;15(1):10990. doi: 10.1038/s41598-025-96024-4.
ABSTRACT
Monocular image depth estimation is crucial for indoor scene reconstruction, and it plays a significant role in optimizing building energy efficiency, indoor environment modeling, and smart space design. However, the small depth variability of indoor scenes leads to weakly distinguishable detail features. Meanwhile, there are diverse types of indoor objects, and the expression of the correlation among different objects is complicated. Additionally, the robustness of recent models still needs further improvement given these indoor environments. To address these problems, a detail‒semantic collaborative network (DSCNet) is proposed for monocular depth estimation of indoor scenes. First, the contextual features contained in the images are fully captured via the hierarchical transformer structure. Second, a detail‒semantic collaborative structure is established, which builds a selective attention feature map to extract details and semantic information from feature maps. The extracted features are subsequently fused to improve the perception ability of the network. Finally, the complex correlation among indoor objects is addressed by aggregating semantic and detailed features at different levels, and the model accuracy is effectively improved without increasing the number of parameters. The proposed model is tested on the NYU and SUN datasets. The proposed approach produces state-of-the-art results when compared with 14 performance results from recent leading methods. In addition, the proposed approach is fully discussed and analyzed in terms of stability, robustness, ablation experiments and availability in indoor scenes.
PMID:40164814 | DOI:10.1038/s41598-025-96024-4
A streaming brain-to-voice neuroprosthesis to restore naturalistic communication
Nat Neurosci. 2025 Mar 31. doi: 10.1038/s41593-025-01905-6. Online ahead of print.
ABSTRACT
Natural spoken communication happens instantaneously. Speech delays longer than a few seconds can disrupt the natural flow of conversation. This makes it difficult for individuals with paralysis to participate in meaningful dialogue, potentially leading to feelings of isolation and frustration. Here we used high-density surface recordings of the speech sensorimotor cortex in a clinical trial participant with severe paralysis and anarthria to drive a continuously streaming naturalistic speech synthesizer. We designed and used deep learning recurrent neural network transducer models to achieve online large-vocabulary intelligible fluent speech synthesis personalized to the participant's preinjury voice with neural decoding in 80-ms increments. Offline, the models demonstrated implicit speech detection capabilities and could continuously decode speech indefinitely, enabling uninterrupted use of the decoder and further increasing speed. Our framework also successfully generalized to other silent-speech interfaces, including single-unit recordings and electromyography. Our findings introduce a speech-neuroprosthetic paradigm to restore naturalistic spoken communication to people with paralysis.
PMID:40164740 | DOI:10.1038/s41593-025-01905-6
Clinical implications of deep learning based image analysis of whole radical prostatectomy specimens
Sci Rep. 2025 Mar 31;15(1):11006. doi: 10.1038/s41598-025-95267-5.
ABSTRACT
Prostate cancer (PCa) diagnosis faces significant challenges due to its complex pathological characteristics and insufficient pathologist resources. While deep learning-based image analysis (DLIA) shows promise in enhancing diagnostic accuracy, its application to radical prostatectomy (RP) specimens remains underexplored. In this study, we evaluated the clinical feasibility and prognostic value of a DLIA algorithm for Gleason grading and tumor quantification on whole RP specimens. Using 29,646 digitized H&E-stained slides from 992 patients who underwent RP, we compared the case-level algorithm results with pathologist assessments for the International Society of Urological Pathology grade groups (GG), tumor volumes (TV), and percent tumor volumes (PTV). We also evaluated their prognostic performance in predicting biochemical progression-free survival (BPFS). Pathologists identified cancer in 986 cases and assigned GG in 980, while the DLIA algorithm identified cancer and assigned GG to all cases without omission. DLIA-assigned GG showed fair concordance with pathologist assessments (linear-weighted Cohen's kappa: 0.374) and demonstrated similar efficacy in predicting BPFS (c-index: 0.644 for DLIA vs. 0.654 for pathologists; p = 0.52). In tumor quantification, DLIA-measured TV and PTV were strongly correlated with pathologist-based measurements (Pearson's correlation coefficient: 0.830 and 0.846, respectively), but showed stronger efficacy in BPFS prediction, with c-index values of 0.657 and 0.672 compared to 0.622 and 0.641, respectively. Incorporating DLIA-derived PTV into the CAPRA-S score significantly improved its predictive accuracy for biochemical recurrence (BCR) (p = 0.006), increasing the c-index from 0.704 to 0.715. Our findings indicate that DLIA algorithms can enhance the accuracy of Gleason grading and tumor quantification in RP specimens, providing valuable support in clinical decision-making for PCa management.
PMID:40164701 | DOI:10.1038/s41598-025-95267-5
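The concordance statistic quoted in the prostatectomy abstract, linear-weighted Cohen's kappa, can be sketched in a few lines of Python. This is a minimal illustration of the standard definition, assuming ordinal categories coded as integer indices starting at 0 (for example, grade groups 1 to 5 mapped to 0 to 4); the function name is an assumption for this example, and it is not the authors' analysis code.

```python
import numpy as np

def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Linear-weighted Cohen's kappa for two raters on an ordinal scale.

    Ratings must be integer category indices in [0, n_categories).
    """
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(rater_a, rater_b):
        observed[i, j] += 1
    observed /= observed.sum()

    # Agreement expected by chance, from the two raters' marginal distributions.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Linear disagreement weights: 0 on the diagonal, 1 for the farthest categories.
    idx = np.arange(n_categories)
    weights = np.abs(idx[:, None] - idx[None, :]) / (n_categories - 1)

    return float(1.0 - (weights * observed).sum() / (weights * expected).sum())
```

With linear weights, a disagreement of one grade group is penalized less than a disagreement of several, which is why this form of kappa is commonly chosen for ordinal grading scales.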
Deep graph learning of multimodal brain networks defines treatment-predictive signatures in major depression
Mol Psychiatry. 2025 Mar 31. doi: 10.1038/s41380-025-02974-6. Online ahead of print.
ABSTRACT
Major depressive disorder (MDD) presents a substantial health burden with low treatment response rates. Predicting antidepressant efficacy is challenging due to MDD's complex and varied neuropathology. Identifying biomarkers for antidepressant treatment requires thorough analysis of clinical trial data. Multimodal neuroimaging, combined with advanced data-driven methods, can enhance our understanding of the neurobiological processes influencing treatment outcomes. To address this, we analyzed resting-state fMRI and EEG connectivity data from 130 patients treated with sertraline and 135 patients treated with placebo from the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study. A deep learning framework was developed using graph neural networks to integrate data-augmented connectivity and cross-modality correlation, aiming to predict individual symptom changes by revealing multimodal brain network signatures. The results showed that our model demonstrated promising prediction accuracy, with an R² value of 0.24 for sertraline and 0.20 for placebo. It also exhibited potential in transferring predictions using only EEG. Key brain regions identified for predicting sertraline response included the inferior temporal gyrus (fMRI) and posterior cingulate cortex (EEG), while for placebo response, the precuneus (fMRI) and supplementary motor area (EEG) were critical. Additionally, both modalities identified the superior temporal gyrus and posterior cingulate cortex as significant for sertraline response, while the anterior cingulate cortex and postcentral gyrus were common predictors in the placebo arm. Variations in the frontoparietal control, ventral attention, dorsal attention, and limbic networks were also notably associated with MDD treatment. By integrating fMRI and EEG, our study established novel multimodal brain network signatures to predict individual responses to sertraline and placebo in MDD, providing interpretable neural circuit patterns that may guide future targeted interventions. Trial Registration: Establishing Moderators and Biosignatures of Antidepressant Response for Clinical Care for Depression (EMBARC) ClinicalTrials.gov Identifier: NCT#01407094.
PMID:40164695 | DOI:10.1038/s41380-025-02974-6
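For readers interpreting the prediction accuracy reported for the EMBARC models, the sketch below computes the coefficient of determination in its usual form, one minus the ratio of residual to total sum of squares. The abstract does not state which definition the authors used (some studies report the squared Pearson correlation instead), so treating this as their metric is an assumption; the function name is also hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)
```

Under this definition, a model that always predicts the mean symptom change scores 0 and a perfect model scores 1, so a value such as 0.24 corresponds to roughly a quarter of the variance in outcomes being accounted for.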
Well log data generation and imputation using sequence based generative adversarial networks
Sci Rep. 2025 Mar 31;15(1):11000. doi: 10.1038/s41598-025-95709-0.
ABSTRACT
Well log analysis is significant for hydrocarbon exploration, providing detailed insights into subsurface geological formations. However, gaps and inaccuracies in well log data, often due to equipment limitations, operational challenges, and harsh subsurface conditions, can introduce significant uncertainties in reservoir evaluation. Addressing these challenges requires effective methods for both synthetic data generation and precise imputation of missing data, ensuring data completeness and reliability. This study introduces a novel framework utilizing sequence-based generative adversarial networks (GANs) specifically designed for well log data generation and imputation. The framework integrates two distinct sequence-based GAN models: time series GAN (TSGAN) for generating synthetic well log data and sequence GAN (SeqGAN) for imputing missing data. Both models were tested on a dataset from the North Sea, Netherlands region. For the imputation task, the input comprises logs with missing values and the output is the corresponding imputed logs; for the synthetic data generation task, the input is complete real logs and the output is synthetic logs that mimic the statistical properties of the original data. All log measurements are normalized to a 0-1 range using min-max scaling, and error metrics are reported in these normalized units. Different sections of 5, 10, and 50 data points were used. Experimental results demonstrate that this approach achieves superior accuracy in filling data gaps compared to other deep learning models for spatial series analysis. The imputation method yielded [Formula: see text] values of 0.92, 0.86, and 0.57, with corresponding mean absolute percentage error (MAPE) values of 8.320, 0.005, and 166.6, and mean absolute error (MAE) values of 0.012, 0.002, and 0.03, respectively. The synthetic generation yielded [Formula: see text] of 0.92, MAE of 0.35, and MRLE of 0.01. These results set a new benchmark for data integrity and utility in geosciences, particularly in well log data analysis.
PMID:40164658 | DOI:10.1038/s41598-025-95709-0
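The well log abstract states that measurements are min-max scaled to a 0-1 range and that errors are reported in those normalized units. The Python sketch below shows the standard definitions of min-max scaling, MAE, and MAPE; the function names and sample values are assumptions for illustration and do not come from the study.

```python
import numpy as np

def min_max_scale(x):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined when any true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))
```

One general caveat when reading such metrics: because MAPE divides by the true value, it grows very large for true values near zero, and min-max scaling maps the smallest measurement in a log to exactly zero.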
Kolmogorov-Arnold networks for genomic tasks
Brief Bioinform. 2025 Mar 4;26(2):bbaf129. doi: 10.1093/bib/bbaf129.
ABSTRACT
Kolmogorov-Arnold networks (KANs) emerged as a promising alternative to multilayer perceptrons (MLPs) in dense fully connected networks. Multiple attempts have been made to integrate KANs into various deep learning architectures in the domains of computer vision and natural language processing. Integrating KANs into deep learning models for genomic tasks has not been explored. Here, we tested linear KANs (LKANs) and convolutional KANs (CKANs) as a replacement for MLP in baseline deep learning architectures for classification and generation of genomic sequences. We used three genomic benchmark datasets: Genomic Benchmarks, Genome Understanding Evaluation, and Flipon Benchmark. We demonstrated that LKANs outperformed both baseline and CKANs on almost all datasets. CKANs can achieve comparable results but struggle with scaling to a large number of parameters. Ablation analysis demonstrated that the number of KAN layers correlates with the model performance. Overall, linear KANs show promising results in improving the performance of deep learning models with a relatively small number of parameters. Unleashing KAN potential in different state-of-the-art deep learning architectures currently used in genomics requires further research.
PMID:40163820 | DOI:10.1093/bib/bbaf129
An updated compendium and reevaluation of the evidence for nuclear transcription factor occupancy over the mitochondrial genome
PLoS One. 2025 Mar 31;20(3):e0318796. doi: 10.1371/journal.pone.0318796. eCollection 2025.
ABSTRACT
In most eukaryotes, mitochondrial organelles contain their own genome, usually circular, which is the remnant of the genome of the ancestral bacterial endosymbiont that gave rise to modern mitochondria. Mitochondrial genomes are dramatically reduced in their gene content due to the process of endosymbiotic gene transfer to the nucleus; as a result most mitochondrial proteins are encoded in the nucleus and imported into mitochondria. This includes the components of the dedicated mitochondrial transcription and replication systems and regulatory factors, which are entirely distinct from the information processing systems in the nucleus. However, since the 1990s several nuclear transcription factors have been reported to act in mitochondria, and previously we identified 8 human and 3 mouse transcription factors (TFs) with strong localized enrichment over the mitochondrial genome using ChIP-seq (Chromatin Immunoprecipitation) datasets from the second phase of the ENCODE (Encyclopedia of DNA Elements) Project Consortium. Here, we analyze the ENCODE compendium of TF ChIP-seq datasets, which has expanded greatly in the intervening decade (a total of 6,153 ChIP experiments for 942 proteins, of which 763 are sequence-specific TFs), combined with interpretative deep learning models of TF occupancy to create a comprehensive compendium of nuclear TFs that show evidence of association with the mitochondrial genome. We find some evidence for chrM occupancy for 50 nuclear TFs and two other proteins, with bZIP TFs emerging as most likely to be playing a role in mitochondria. However, we also observe that in cases where the same TF has been assayed with multiple antibodies and ChIP protocols, evidence for its chrM occupancy is not always reproducible. In the light of these findings, we discuss the evidential criteria for establishing chrM occupancy and reevaluate the overall compendium of putative mitochondrial-acting nuclear TFs.
PMID:40163815 | DOI:10.1371/journal.pone.0318796
A Tunable Forced Alignment System Based on Deep Learning: Applications to Child Speech
J Speech Lang Hear Res. 2025 Mar 31:1-19. doi: 10.1044/2024_JSLHR-24-00347. Online ahead of print.
ABSTRACT
PURPOSE: Phonetic forced alignment has a multitude of applications in automated analysis of speech, particularly in studying nonstandard speech such as children's speech. Manual alignment is tedious but serves as the gold standard for clinical-grade alignment. Current tools do not support direct training on manual alignments. Thus, a trainable speaker adaptive phonetic forced alignment system, Wav2TextGrid, was developed for children's speech. The source code for the method is publicly available along with a graphical user interface at https://github.com/pkadambi/Wav2TextGrid.
METHOD: We propose a trainable, speaker-adaptive, neural forced aligner developed using a corpus of 42 neurotypical children from 3 to 6 years of age. Evaluation on both child speech and on the TIMIT corpus was performed to demonstrate aligner performance across age and dialectal variations.
RESULTS: The trainable alignment tool markedly improved accuracy over baseline for several alignment quality metrics, for all phoneme categories. Accuracy for plosives and affricates in children's speech improved more than 40% over baseline. Performance matched existing methods using approximately 13 min of labeled data, while approximately 45-60 min of labeled alignments yielded significant improvement.
CONCLUSION: The Wav2TextGrid tool allows alternate alignment workflows where the forced alignments, via training, are directly tailored to match clinical-grade, manually provided alignments.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.28593971.
PMID:40163771 | DOI:10.1044/2024_JSLHR-24-00347
Diagnosis of Oral Cancer With Deep Learning. A Comparative Test Accuracy Systematic Review
Oral Dis. 2025 Mar 31. doi: 10.1111/odi.15330. Online ahead of print.
ABSTRACT
OBJECTIVE: To directly compare the diagnostic accuracy of deep learning models with human experts and other diagnostic methods used for the clinical detection of oral cancer.
METHODS: Comparative diagnostic studies involving patients with photographic images of oral mucosal lesions (cancer or non-cancer) were included. Only studies using deep learning methods were eligible. Medline, EMBASE, Scopus, Google Scholar, and ClinicalTrials.gov were searched until September 2024. QUADAS-C assessed the risk of bias. A Bayesian meta-analysis compared diagnostic test accuracy.
RESULTS: Eight studies were included, none of which had a low risk of bias. Three studies compared deep learning versus human experts. The difference in sensitivity favored deep learning by 0.024 (95% CI: -0.093, 0.206), while the difference in specificity favored human experts by -0.041 (95% CI: -0.218, 0.038). Two studies compared deep learning versus postgraduate medical students. The differences in sensitivity and specificity favored deep learning by 0.108 (95% CI: -0.038, 0.324) and by 0.010 (95% CI: -0.119, 0.111), respectively. Both comparisons provided low-level evidence.
CONCLUSIONS: Deep learning models showed comparable sensitivity and specificity to human experts. These models outperformed postgraduate medical students in terms of sensitivity. Prospective clinical trials are needed to evaluate the real-world performance of deep learning models.
PMID:40163741 | DOI:10.1111/odi.15330
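The oral cancer review compares tests by differences in sensitivity and specificity. As a reminder of how those quantities are defined, here is a minimal Python sketch; the confusion-matrix counts in the usage example are invented for illustration and are not data from the included studies, and the review itself pooled results with a Bayesian meta-analysis rather than a simple subtraction.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of actual cancer cases that the test flags as cancer."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of non-cancer cases that the test correctly clears."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts for two readers on the same 50 cancer / 100 non-cancer images.
model_sens = sensitivity(true_positives=45, false_negatives=5)
expert_sens = sensitivity(true_positives=44, false_negatives=6)
sensitivity_difference = model_sens - expert_sens  # positive favors the model
```

A positive difference favors the first test and a negative one favors the second, which matches the sign convention used in the RESULTS paragraph above.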
Childhood muscle growth: Reference curves for lower leg muscle volumes and their clinical application in cerebral palsy
Proc Natl Acad Sci U S A. 2025 Apr 8;122(14):e2416660122. doi: 10.1073/pnas.2416660122. Epub 2025 Mar 31.
ABSTRACT
Skeletal muscles grow substantially during childhood. However, quantitative information about the size of typically developing children's muscles is sparse. Here, the objective was to construct muscle-specific reference curves for lower leg muscle volumes in children aged 5 to 15 y. Volumes of 10 lower leg muscles were measured from magnetic resonance images of 208 typically developing children and 78 ambulant children with cerebral palsy. Deep learning was used to automatically segment the images. Reference curves for typical childhood muscle volumes were constructed with quantile regression. The median total leg muscle volume of a 15-y-old child is nearly five times that of a 5-y-old child. Between the ages of 5 and 15, boys typically have larger muscles than girls, both in absolute terms (medians are greater by 5 to 20%) and per unit of body weight (1 to 13%). Muscle volumes vary widely between children of a particular age: the range of volumes for the central 80% of the distribution (i.e., between the 10th and 90th centiles) is more than 40% of the median volume. Reference curves for individual muscle volumes have a similar shape to reference curves for total lower leg muscle volume. Confidence bands about the centile curves were wide, especially at the youngest and oldest ages. Nonetheless, the reference curves can be used with confidence to identify small-for-age muscles (centile < 10). We show that 56% of children with cerebral palsy in our cohort had total lower leg muscle volumes that were small-for-age and that 80% had at least one lower leg muscle that was small-for-age.
PMID:40163724 | DOI:10.1073/pnas.2416660122
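The muscle growth study builds its reference curves with quantile regression and flags muscles below the 10th centile as small-for-age. The sketch below shows the pinball (quantile) loss that quantile regression minimizes, plus a trivial centile check. Both function names are assumptions for this example, and the code does not reproduce the study's fitting procedure.

```python
import numpy as np

def pinball_loss(y, prediction, q):
    """Average quantile loss of `prediction` for target quantile q in (0, 1).

    Under-predictions are weighted by q and over-predictions by (1 - q),
    so the loss is minimized when `prediction` sits at the q-th quantile of y.
    """
    diff = np.asarray(y, dtype=float) - prediction
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

def is_small_for_age(volume, tenth_centile_volume):
    """Flag a muscle volume that falls below the 10th-centile reference value."""
    return volume < tenth_centile_volume
```

For q = 0.1 the asymmetric weighting pulls the fitted curve down to the level that about 10% of children fall below, which is how a centile curve differs from an ordinary least-squares mean curve.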
Artificial Intelligence for Classification of Endoscopic Severity of Inflammatory Bowel Disease: A Systematic Review and Critical Appraisal
Inflamm Bowel Dis. 2025 Mar 31:izaf050. doi: 10.1093/ibd/izaf050. Online ahead of print.
ABSTRACT
BACKGROUND: Endoscopic scoring indices for ulcerative colitis and Crohn's disease are subject to inter-endoscopist variability. There is increasing interest in the development of deep learning models to standardize endoscopic assessment of intestinal diseases. Here, we summarize and critically appraise the literature on artificial intelligence-assisted endoscopic characterization of inflammatory bowel disease severity.
METHODS: A systematic search of Ovid MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials, and IEEE Xplore was performed to identify reports of AI systems used for endoscopic severity classification of IBD. Selected studies were critically appraised for methodological and reporting quality using APPRAISE-AI.
RESULTS: Thirty-one studies published between 2019 and 2024 were included. Of the 31 studies, 28 examined endoscopic classification of ulcerative colitis and 3 examined Crohn's disease. Researchers sought to accomplish a wide range of classification tasks, including binary and multilevel classification, based on still images or full-length colonoscopy videos. Overall scores for study quality ranged from 41 (moderate quality) to 64 (high quality) out of 100, with 28 out of 31 studies within the moderate quality range. The highest-scoring domains were clinical relevance and reporting quality, while the lowest-scoring domains were robustness of results and reproducibility.
CONCLUSIONS: Multiple AI models have demonstrated the potential for clinical translation for ulcerative colitis. Research concerning the endoscopic severity assessment of Crohn's disease is limited and should be further explored. More rigorous external validation of AI models and increased transparency of data and codes are needed to improve the quality of AI studies.
PMID:40163659 | DOI:10.1093/ibd/izaf050
A Systematic Review of Advances in Infant Cry Paralinguistic Classification: Methods, Implementation, and Applications
JMIR Rehabil Assist Technol. 2025 Feb 18. doi: 10.2196/69457. Online ahead of print.
ABSTRACT
BACKGROUND: Effective communication is essential for human interaction, yet infants can only express their needs through various types of suggestive cries. Traditional approaches to interpreting infant cries are often subjective, inconsistent, and slow, leaving gaps in timely, precise caregiving responses. A precise interpretation of infant cries can potentially provide valuable insights into the infant's health, needs, and well-being, enabling prompt medical or caregiving actions.
OBJECTIVE: This study seeks to systematically review the advancements in methods, coverage, deployment schemes, and applications of infant cry classification over the last 24 years. The review focuses on the different infant cry classification techniques, feature extraction methods, and the practical applications. Furthermore, we aimed to identify recent trends and directions in the field of infant cry signal processing to address both academic and practical needs.
METHODS: A systematic literature review was conducted by using nine electronic databases: Cochrane Database of Systematic Reviews, JSTOR, Web of Science Core Collection, Scopus, PubMed, ACM, MEDLINE, IEEE Xplore, and Google Scholar. A total of 5904 search results were initially retrieved, with 126 studies meeting the eligibility criteria after screening by two independent reviewers. The methodological quality of the studies was assessed using the Cochrane risk-of-bias tool version 2 (RoB2), with 92% (n=116) of the studies indicating a low risk of bias and 8% (n=10) of the studies showing some concerns regarding bias. The overall quality assessment was performed using the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines. The data analysis was conducted using R version 3.64.
RESULTS: Notable advancements in infant cry classification methods were realized, particularly from 2019 onwards, employing machine learning, deep learning, and hybrid approaches. Common audio features included Mel-frequency cepstral coefficients (MFCCs), spectrograms, pitch, duration, intensity, formants, zero-crossing rate, and chroma. Deployment methods included mobile applications and web-based platforms for real-time analysis, while 90% (n=113) of the models remained undeployed in real-world applications. Denoising techniques and federated learning were employed in only 5% (n=6) of the studies to enhance model robustness and ensure data confidentiality. Some of the practical applications spanned healthcare monitoring, diagnostics, and caregiver support.
CONCLUSIONS: Infant cry classification methods have progressed from classical statistical methods to machine learning models, but with minimal consideration of data privacy, confidentiality, and eventual deployment in practical use. Further research is thus proposed to develop standardized foundational audio multimodal approaches, incorporating a broader range of audio features and ensuring data confidentiality through methods such as federated learning. Furthermore, a preliminary layer is proposed for denoising the cry signal before the feature extraction stage. These improvements will enhance the accuracy, generalizability, and practical applicability of infant cry classification models in diverse healthcare settings.
PMID:40163619 | DOI:10.2196/69457
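As a rough illustration of one of the simpler audio features named in the review above, the zero-crossing rate can be sketched in a few lines of Python. This is a minimal sketch, not code from any reviewed study: the function names, the 8 kHz sample rate, and the synthetic sine-wave inputs are all assumptions chosen for illustration.

```python
import math


def zero_crossing_rate(signal):
    """Return the fraction of adjacent sample pairs whose signs differ."""
    if len(signal) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def sine_wave(freq_hz, sample_rate=8000, duration_s=0.1):
    """Synthetic test tone (hypothetical stand-in for a recorded cry frame)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


# A higher-pitched tone changes sign more often per sample.
low_pitch_zcr = zero_crossing_rate(sine_wave(200))
high_pitch_zcr = zero_crossing_rate(sine_wave(800))
print(low_pitch_zcr, high_pitch_zcr)
```

In a real pipeline this value would typically be computed per short frame and combined with features such as MFCCs, usually through an audio library rather than hand-written loops.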
Deep Learning with Reflection High-Energy Electron Diffraction Images to Predict Cation Ratio in Sr(2x)Ti(2(1-x))O(3) Thin Films
Nano Lett. 2025 Mar 31. doi: 10.1021/acs.nanolett.5c00787. Online ahead of print.
ABSTRACT
Machine learning (ML) with in-situ diagnostics offers a transformative approach to accelerate, understand, and control thin film synthesis by uncovering relationships between synthesis conditions and material properties. In this study, we demonstrate the application of deep learning to predict the stoichiometry of Sr(2x)Ti(2(1-x))O(3) thin films using reflection high-energy electron diffraction images acquired during pulsed laser deposition. A gated convolutional neural network trained for regression of the Sr atomic fraction achieved accurate predictions with a small dataset of 31 samples. Explainable AI techniques revealed a previously unknown correlation between diffraction streak features and cation stoichiometry in Sr(2x)Ti(2(1-x))O(3) thin films. Our results demonstrate how ML can be used to transform a ubiquitous in-situ diagnostic tool that is usually limited to qualitative assessments into a quantitative surrogate measurement of continuously valued thin film properties. Such methods are critically needed to enable real-time control and autonomous workflows and to accelerate traditional synthesis approaches.
PMID:40163590 | DOI:10.1021/acs.nanolett.5c00787
Anticancer drug response prediction integrating multi-omics pathway-based difference features and multiple deep learning techniques
PLoS Comput Biol. 2025 Mar 31;21(3):e1012905. doi: 10.1371/journal.pcbi.1012905. Online ahead of print.
ABSTRACT
Individualized prediction of cancer drug sensitivity is of vital importance in precision medicine. While numerous predictive methodologies for cancer drug response have been proposed, precisely predicting an individual patient's response to a drug and thoroughly understanding differences in drug response among individuals remain significant challenges. This study introduces PASO, a deep learning model that integrates a transformer encoder, multi-scale convolutional networks, and attention mechanisms to predict the sensitivity of cell lines to anticancer drugs, based on the omics data of the cell lines and the SMILES representations of the drug molecules. First, we use statistical methods to compute the differences in gene expression, gene mutation, and gene copy number variation between genes inside and outside biological pathways, and use these pathway difference values as cell line features, combined with the drugs' SMILES chemical structure information, as inputs to the model. Then the model applies multi-scale convolutional networks and a transformer encoder to extract the properties of drug molecules from different perspectives, while an attention network learns complex interactions between the omics features of cell lines and these drug properties. Finally, a multilayer perceptron (MLP) outputs the predicted drug response. Our model predicts sensitivity to anticancer drugs more accurately than other recently proposed methods. Analysis of the drug response predictions for lung cancer cell lines found that small cell lung cancer (SCLC) cell lines were particularly sensitive to PARP inhibitors and topoisomerase I inhibitors. Additionally, the model is capable of highlighting biological pathways related to cancer and accurately capturing critical parts of the drug's chemical structure. We also validated the model's clinical utility using clinical data from The Cancer Genome Atlas. In summary, the PASO model shows potential as a robust support tool for individualized cancer treatment. Our methods are implemented in Python and are freely available from GitHub (https://github.com/queryang/PASO).
PMID:40163555 | DOI:10.1371/journal.pcbi.1012905
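The pathway difference features described in the PASO abstract above can be illustrated with a small sketch. The abstract does not say which statistic the authors use, so the Welch t-statistic below is an assumption, as are the function name, the placeholder gene labels (G1 to G7), and the expression values. It scores how strongly genes inside one pathway differ from genes outside it for a single cell line.

```python
import math
from statistics import mean, variance


def pathway_difference(expression, pathway_genes):
    """Welch t-statistic of in-pathway vs. out-of-pathway gene values.

    Assumed stand-in for the paper's unspecified statistical method.
    `expression` maps gene label -> value for one cell line.
    """
    inside = [v for g, v in expression.items() if g in pathway_genes]
    outside = [v for g, v in expression.items() if g not in pathway_genes]
    if len(inside) < 2 or len(outside) < 2:
        raise ValueError("need at least two genes inside and outside the pathway")
    diff = mean(inside) - mean(outside)
    # Standard error with unequal variances (Welch).
    se = math.sqrt(variance(inside) / len(inside) + variance(outside) / len(outside))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


# Hypothetical expression values for one cell line.
expression = {"G1": 5.0, "G2": 6.0, "G3": 5.5, "G4": 1.0,
              "G5": 1.5, "G6": 0.5, "G7": 1.2}
score = pathway_difference(expression, {"G1", "G2", "G3"})
print(score)  # positive: the pathway genes are expressed above the rest
```

Computing one such score per pathway and per omics type would give a fixed-length feature vector for each cell line, which is the general shape of input the abstract describes.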
Model interpretability on private-safe oriented student dropout prediction
PLoS One. 2025 Mar 31;20(3):e0317726. doi: 10.1371/journal.pone.0317726. eCollection 2025.
ABSTRACT
Student dropout is a significant social issue with extensive implications for individuals and society, including reduced employability and economic downturns, which in turn strongly affect sustainable social development. Identifying students at high risk of dropping out is a major challenge for sustainable education. While existing machine learning and deep learning models can effectively predict dropout risks, they often rely on real student data, raising ethical concerns and the risk of information leakage. Additionally, the poor interpretability of these models complicates their use in educational management, as it is difficult to justify identifying a student as high-risk based on an opaque model. To address these two issues, we introduce, for the first time, a modified Preprocessed Kernel Inducing Points data distillation technique (PP-KIPDD) specialized for distilling tabular datasets. We use PP-KIPDD to reconstruct new samples that serve as qualified training sets simulating the distribution of student information, thereby preventing leakage of private student information; this approach showed better performance and efficiency than traditional data synthesis techniques such as Conditional Generative Adversarial Networks. Furthermore, we strengthen the classifiers' credibility by enhancing model interpretability with SHAP (SHapley Additive exPlanations) values and elucidate the significance of the selected features from an educational management perspective. With features well explained in both quantitative and qualitative terms, our approach enhances the feasibility and reasonableness of dropout predictions using machine learning techniques. We believe our approach represents a novel end-to-end framework for applying artificial intelligence to sustainable education management from the perspective of decision-makers, as it protects against privacy leakage and enhances model credibility for practical management.
PMID:40163446 | DOI:10.1371/journal.pone.0317726
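The SHAP values mentioned in the dropout abstract above are based on Shapley values, which can be computed exactly for a very small model. The sketch below is illustrative only and is not the authors' pipeline: the toy linear risk score, its three unnamed features, and the all-zero baseline are assumptions, and the SHAP library itself uses a background dataset and faster approximations rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def risk(x):
    """Hypothetical linear dropout-risk score over three features."""
    return 0.5 * x[0] - 0.3 * x[1] + 0.1 * x[2]


instance = [2.0, 1.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(risk, instance, baseline)
print(phi)
```

For a linear model each attribution reduces to the weight times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction and the baseline prediction, which is the property that makes these values usable as per-student explanations.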