Literature Watch
KalmanFormer: using transformer to model the Kalman Gain in Kalman Filters
Front Neurorobot. 2025 Jan 7;18:1460255. doi: 10.3389/fnbot.2024.1460255. eCollection 2024.
ABSTRACT
INTRODUCTION: Tracking the hidden states of dynamic systems is a fundamental task in signal processing. Recursive Kalman Filters (KF) are widely regarded as an efficient solution for linear and Gaussian systems, offering low computational complexity. However, real-world applications often involve non-linear dynamics, making it challenging for traditional Kalman Filters to achieve accurate state estimation. Additionally, the accurate modeling of system dynamics and noise in practical scenarios is often difficult. To address these limitations, we propose the KalmanFormer, a hybrid model-driven and data-driven state estimator. By leveraging data, the KalmanFormer improves state estimation performance under non-linear conditions and in partial-information scenarios.
METHODS: The proposed KalmanFormer integrates the classical Kalman Filter with a Transformer framework. Specifically, it utilizes the Transformer to learn the Kalman Gain directly from data without requiring prior knowledge of noise parameters. The learned Kalman Gain is then incorporated into the standard Kalman Filter workflow, enabling the system to better handle non-linearities and model mismatches. The hybrid approach combines the strengths of data-driven learning and model-driven methodologies to achieve robust state estimation.
RESULTS AND DISCUSSION: To evaluate the effectiveness of KalmanFormer, we conducted numerical experiments on both synthetic and real-world datasets. The results demonstrate that KalmanFormer outperforms the classical Extended Kalman Filter (EKF) in the same settings. It achieves superior accuracy in tracking hidden states, demonstrating resilience to non-linearities and imprecise system models.
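The METHODS paragraph above describes plugging a learned gain into the standard Kalman update. The following is a minimal sketch of that update step only, not the authors' implementation: the Transformer is not shown, and the function names (`classical_gain`, `filter_step`) and the scalar toy system are assumptions made for illustration. The textbook gain is computed from noise covariances; a learned estimator would instead supply the `gain` argument directly.

```python
import numpy as np

def classical_gain(P_pred, H, R):
    """Textbook Kalman gain; needs the measurement noise covariance R."""
    S = H @ P_pred @ H.T + R              # innovation covariance
    return P_pred @ H.T @ np.linalg.inv(S)

def filter_step(x_prev, y, F, H, gain):
    """One predict/update cycle using an externally supplied gain matrix.

    `gain` may come from classical_gain() or, hypothetically, from a
    learned model that maps recent observations to a gain matrix.
    """
    x_pred = F @ x_prev                   # predict the next state
    innovation = y - H @ x_pred           # measurement residual
    return x_pred + gain @ innovation     # corrected state estimate

# Toy scalar system (illustrative values only).
F = np.array([[1.0]])
H = np.array([[1.0]])
K = classical_gain(np.array([[1.0]]), H, np.array([[1.0]]))
x_new = filter_step(np.array([0.0]), np.array([2.0]), F, H, K)
```

Because `filter_step` takes the gain as an argument, swapping the classical gain for a data-driven one leaves the rest of the filter unchanged, which is the hybrid idea the abstract describes.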
PMID:39840232 | PMC:PMC11747084 | DOI:10.3389/fnbot.2024.1460255
Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content
Data Brief. 2024 Dec 19;58:111243. doi: 10.1016/j.dib.2024.111243. eCollection 2025 Feb.
ABSTRACT
This paper presents a comprehensive dataset of mid-infrared spectra for dried and roasted cocoa beans (Theobroma cacao L.), along with their corresponding theobromine and caffeine content. Infrared data were acquired using Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectroscopy, while High-Performance Liquid Chromatography (HPLC) was employed to accurately quantify theobromine and caffeine in the dried cocoa beans. The theobromine/caffeine relationship served as a robust chemical marker for distinguishing between different cocoa varieties. This dataset provides a basis for further research, enabling the integration of mid-infrared spectral data with HPLC (as a standard) to fine-tune machine learning and deep learning models that could be used to simultaneously predict the theobromine and caffeine content, as well as cocoa variety in both dried and roasted cocoa samples using a non-destructive approach based on spectral data. The tools developed from this dataset could significantly advance automated processes in the cocoa industry and support decision-making on an industrial scale, facilitating real-time quality control of cocoa-based products, improving cocoa variety classification, and optimizing bean selection, blending strategies, and product formulation, while reducing the need for labor-intensive and costly quantification methods. The dataset is organized into Excel sheets and structured according to experimental conditions and replicates, providing a valuable framework for further analysis, model development, and calibration of multivariate statistical models.
PMID:39840227 | PMC:PMC11748727 | DOI:10.1016/j.dib.2024.111243
Role of Artificial Intelligence in MRI-Based Rectal Cancer Staging: A Systematic Review
Cureus. 2024 Dec 22;16(12):e76185. doi: 10.7759/cureus.76185. eCollection 2024 Dec.
ABSTRACT
Several studies have explored the application of artificial intelligence (AI) in magnetic resonance imaging (MRI)-based rectal cancer (RC) staging, but a comprehensive evaluation remains lacking. This systematic review aims to review the performance of AI models in MRI-based RC staging. PubMed and Embase were searched from the inception of each database until October 2024 without language or year restrictions. Prospective or retrospective studies evaluating AI models (including machine learning (ML) and deep learning (DL)) for diagnostic performance in MRI-based RC staging compared with any comparator were included in this review. The performance metrics were considered as outcomes. Two independent reviewers were involved in the study selection and data extraction to limit bias; any disagreements were resolved through mutual consensus or discussion with a third reviewer. A total of 716 records were identified from the databases. Out of these, 14 studies (1.95%) were finally included in this review. These studies were published between 2019 and 2024. Various MRI technologies were adapted by the studies and multiple AI models were developed. DL was the most common. The MRI images including T1-weighted images (14.28%), T2-weighted images (85.71%), diffusion-weighted images (42.85%), or the combination of these from different landscapes and systems were used to develop the AI models. The models were built using various techniques, mainly DL such as conventional neural network (28.57%), DL reconstruction (14.28%), Weakly supervISed model DevelOpment fraMework (7.12%), deep neural network (7.12%), Faster region-based CNN (7.12%), ResNet, DL-based clinical-radiomics nomogram (7.12%), LASSO (7.12%), and random forest classifier (7.12%). All the models that used single-type images or combined imaging modalities showed a better performance than manual assessment in terms of higher accuracy, sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and area under the curve with a score of >0.75. This is considered to be a good performance. The current study indicates that MRI-based AI models for RC staging show great promise with a high performance.
PMID:39840208 | PMC:PMC11748814 | DOI:10.7759/cureus.76185
Optical coherence tomography-enabled classification of the human venoatrial junction
J Biomed Opt. 2025 Jan;30(1):016005. doi: 10.1117/1.JBO.30.1.016005. Epub 2025 Jan 21.
ABSTRACT
SIGNIFICANCE: Radiofrequency ablation to treat atrial fibrillation (AF) involves isolating the pulmonary vein from the left atrium to prevent AF from occurring. However, creating ablation lesions within the pulmonary veins can cause adverse complications.
AIM: We propose automated classification algorithms to classify optical coherence tomography (OCT) volumes of human venoatrial junctions.
APPROACH: A dataset of comprehensive OCT volumes of 26 venoatrial junctions was used for this study. Texture, statistical, and optical features were extracted from OCT patches. Patches were classified as left atrium or pulmonary vein using random forest (RF), logistic regression (LR), and convolutional neural networks (CNNs). The features were inputs into the RF and LR classifiers. The inputs to the CNNs included: (1) patches and (2) an ensemble of patches and patch-derived features.
RESULTS: Using sevenfold cross-validation, the patch-only CNN balances sensitivity and specificity best, with an area under the receiver operating characteristic (AUROC) curve of 0.84 ± 0.109 across the test sets. RF is more sensitive than LR, with an AUROC curve of 0.78 ± 0.102.
CONCLUSIONS: Cardiac tissues can be identified in benchtop OCT images by automated analysis. Extending this analysis to data obtained in vivo is required to tune automated analysis further. Performing this classification in vivo could aid doctors in identifying substrates of interest and treating AF.
PMID:39840147 | PMC:PMC11747903 | DOI:10.1117/1.JBO.30.1.016005
Artificial intelligence-driven identification and mechanistic exploration of synergistic anti-breast cancer compound combinations from Prunella vulgaris L.-Taraxacum mongolicum Hand.-Mazz. herb pair
Front Pharmacol. 2025 Jan 7;15:1522787. doi: 10.3389/fphar.2024.1522787. eCollection 2024.
ABSTRACT
INTRODUCTION: The Prunella vulgaris L. (PVL) and Taraxacum mongolicum Hand.-Mazz. (TH) herb pair, which is commonly used in traditional Chinese medicine (TCM), has been applied for the treatment of breast cancer. Although its efficacy is validated, the synergistic anti-breast cancer compound combinations within this herb pair and their underlying mechanisms of action remain unclear.
METHODS: This study aimed to identify and validate synergistic anti-breast cancer compound combinations within the PVL-TH pair using large-scale biomedical data, artificial intelligence and experimental methods. The first step was to investigate the anti-breast cancer effects of various PVL and TH extracts using in vitro cellular assays to identify the most effective superior extracts. These superior extracts were subjected to liquid chromatography-mass spectrometry (LC-MS) analysis to identify their constituent compounds. A deep learning-based prediction model, DeepMDS, was applied to predict synergistic anti-breast cancer multi-compound combinations. These predicted combinations were experimentally validated for their anti-breast cancer effects at actual content ratios found in the extracts. Preliminary bioinformatics analyses were conducted to explore the mechanisms of action of these superior combinations. We also compared the anti-breast cancer effects of superior extracts from different geographical origins and analyzed the contents of compounds to assess their representation of the anti-tumor effect of the corresponding TCM.
RESULTS: LC-MS analysis identified 27 and 21 compounds in the superior extracts (50% ethanol extracts) of PVL and TH, respectively. Based on these compounds, the DeepMDS model predicted synergistic anti-breast cancer compound combinations such as F973 (caffeic acid, rosmarinic acid, p-coumaric acid, and esculetin), T271 (chlorogenic acid, cichoric acid, and caffeic acid), and T1685 (chlorogenic acid, rosmarinic acid, and scopoletin) from single PVL, single TH and PVL-TH herb pair, respectively. These combinations, at their actual concentrations in extracts, demonstrated superior anti-breast cancer activity compared to the corresponding extracts. The bioinformatics analysis revealed that these compounds could regulate tumor-related pathways synergistically, inhibiting tumor cell growth, inducing cell apoptosis, and blocking cell cycle progression. Furthermore, the concentration ratio and total content of compounds in F973 and T271 were closely associated with their anti-breast cancer effects in extracts from various geographical origins. The compound combination T1685 could represent the synergistic anti-breast cancer effects of the PVL-TH pair.
DISCUSSION: This study provides insights into exploring the representative synergistic anti-breast cancer compound combinations within the complex TCM.
PMID:39840098 | PMC:PMC11747269 | DOI:10.3389/fphar.2024.1522787
WDR74-Mediated Ribosome Biogenesis and Proteome Dynamics During Mouse Preimplantation Development
Genes Cells. 2025 Jan;30(1):e70001. doi: 10.1111/gtc.70001.
ABSTRACT
Preimplantation embryonic development is orchestrated by dynamic changes in the proteome and transcriptome, regulated by mechanisms such as maternal-to-zygotic transition. Here, we employed label-free quantitative proteomics to comprehensively analyze proteome dynamics from germinal vesicle oocytes to blastocysts in mouse embryos. We identified 3490 proteins, including 715 consistently detected across all stages, revealing stage-specific changes in proteins associated with translation, protein modification, and mitochondrial metabolism. Comparison with transcriptomic data highlighted a low correlation between mRNA and protein levels, underscoring the significance of non-transcriptional regulatory mechanisms during early development. Additionally, we analyzed WD repeat-containing protein 74 (WDR74)-deficient embryos generated using CRISPR-Cas9 genome editing. WDR74, a pre-60S ribosome maturation factor, was found to be critical for ribosome biogenesis and cell division. Furthermore, WDR74 deficiency led to a significant reduction in ribosomal protein large subunit and impaired progression beyond the morula stage. Key ribosomal proteins such as ribosomal protein L24 (RPL24) and ribosomal protein L26 (RPL26), which influence cell division timing, were notably affected, while small subunit proteins remained largely unchanged. Taken together, our study demonstrates the utility of integrating genome editing with proteomic analysis to elucidate molecular mechanisms underlying early embryogenesis, and provides new insights into protein-level regulation of preimplantation development.
PMID:39840464 | DOI:10.1111/gtc.70001
Review of cancer cell volatile organic compounds: their metabolism and evolution
Front Mol Biosci. 2025 Jan 7;11:1499104. doi: 10.3389/fmolb.2024.1499104. eCollection 2024.
ABSTRACT
Cancer is ranked as the top cause of premature mortality. Volatile organic compounds (VOCs) are produced from catalytic peroxidation by reactive oxygen species (ROS) and have become a highly attractive non-invasive cancer screening approach. For future clinical applications, however, the correlation between cancer hallmarks and cancer-specific VOCs requires further study. This review discusses and compares cellular metabolism, signal transduction as well as mitochondrial metabolite translocation in view of cancer evolution and the basic biology of VOCs production. Certain cancerous characteristics as well as the origin of the ROS removal system date back to procaryotes and early eukaryotes and share commonalities with non-cancerous proliferative cells. This calls for future studies on metabolic cross talks and regulation of the VOCs production pathway.
PMID:39840075 | PMC:PMC11747368 | DOI:10.3389/fmolb.2024.1499104
Magnitude and dynamics of the T-cell response to SARS-CoV-2 infection at both individual and population levels
Front Immunol. 2025 Jan 7;15:1488860. doi: 10.3389/fimmu.2024.1488860. eCollection 2024.
ABSTRACT
INTRODUCTION: T cells are involved in the early identification and clearance of viral infections and also support the development of antibodies by B cells. This central role for T cells makes them a desirable target for assessing the immune response to SARS-CoV-2 infection.
METHODS: Here, we combined two high-throughput immune profiling methods to create a quantitative picture of the T-cell response to SARS-CoV-2. First, at the individual level, we deeply characterized 3 acutely infected and 58 recovered COVID-19 subjects by experimentally mapping their CD8 T-cell response through antigen stimulation to 545 Human Leukocyte Antigen (HLA) class I presented viral peptides. Then, at the population level, we performed T-cell repertoire sequencing on 1,815 samples (from 1,521 COVID-19 subjects) as well as 3,500 controls to identify shared "public" T-cell receptors (TCRs) associated with SARS-CoV-2 infection from both CD8 and CD4 T cells.
RESULTS: Collectively, our data reveal that CD8 T-cell responses are often driven by a few immunodominant, HLA-restricted epitopes. As expected, the T-cell response to SARS-CoV-2 peaks about one to two weeks after infection and is detectable for at least several months after recovery. As an application of these data, we trained a classifier to diagnose SARS-CoV-2 infection based solely on TCR sequencing from blood samples, and observed, at 99.8% specificity, high early sensitivity soon after diagnosis (Day 3-7 = 85.1% [95% CI = 79.9-89.7]; Day 8-14 = 94.8% [90.7-98.4]) as well as lasting sensitivity after recovery (Day 29+/convalescent = 95.4% [92.1-98.3]).
DISCUSSION: The approaches described in this work provide detailed insights into the adaptive immune response to SARS-CoV-2 infection, and they have potential applications in clinical diagnostics, vaccine development, and monitoring.
PMID:39840037 | PMC:PMC11747429 | DOI:10.3389/fimmu.2024.1488860
Comparison of sampling and culture methods for the recovery of yeast from hospital surfaces
Antimicrob Steward Healthc Epidemiol. 2025 Jan 17;5(1):e10. doi: 10.1017/ash.2024.481. eCollection 2025.
ABSTRACT
OBJECTIVE: To compare the recovery of yeast from hospital surfaces using two different collection methods: Eswab moistened with molecular water, and premoistened stick-mounted sponge.
DESIGN: Comparison of collection methods for the recovery of yeast in the hospital environment.
SETTING: This study took place at intensive care units of a large academic medical center.
PMID:39839357 | PMC:PMC11748012 | DOI:10.1017/ash.2024.481
MaveDB 2024: a curated community database with over seven million variant effects from multiplexed functional assays
Genome Biol. 2025 Jan 21;26(1):13. doi: 10.1186/s13059-025-03476-y.
ABSTRACT
Multiplexed assays of variant effect (MAVEs) are a critical tool for researchers and clinicians to understand genetic variants. Here we describe the 2024 update to MaveDB (https://www.mavedb.org/) with four key improvements to the MAVE community's database of record: more available data including over 7 million variant effect measurements, an improved data model supporting assays such as saturation genome editing, new built-in exploration and visualization tools, and powerful APIs for data federation and streamlined submission and access. Together these changes support MaveDB's role as a hub for the analysis and dissemination of MAVEs now and into the future.
PMID:39838450 | DOI:10.1186/s13059-025-03476-y
Hepatotoxicity of statins: a real-world study based on the US Food and Drug Administration Adverse Event Reporting System database
Front Pharmacol. 2025 Jan 7;15:1502791. doi: 10.3389/fphar.2024.1502791. eCollection 2024.
ABSTRACT
BACKGROUND: Statins, as an important class of lipid-lowering drugs, play a key role in the prevention and treatment of cardiovascular diseases. However, with their widespread use in clinical practice, some adverse events have gradually emerged. In particular, the hepatotoxicity associated with statins use has become one of the clinical concerns that require sufficient attention.
METHODS: In this study, we conducted a comprehensive and detailed analysis of the hepatotoxicity of statins based on data from the US Food and Drug Administration Adverse Event Reporting System database from the first quarter (Q1) of 2004 to Q1 of 2024 and used Reporting Odds Ratios and Empirical Bayes Geometric Mean to mine adverse event signals.
RESULTS: In this study, all seven statins exhibited positive signals for hepatic disorder. Through signal mining, we identified a total of 14,511 cases of adverse events associated with hepatic disorder caused by these statin drugs, with atorvastatin, simvastatin, and rosuvastatin occurring at a higher rate. A total of 148 positive signals related to adverse events of hepatic disorder were captured. Autoimmune hepatitis and drug-induced liver injury both presented positive signals across multiple statin drugs. Notably, atorvastatin had the most significant signal strength in cholestatic pruritus and bilirubin conjugation abnormal. Fluvastatin also showed notable signal strength in autoimmune hepatitis, while simvastatin had a relatively weaker signal strength for hepatic enzyme increased.
CONCLUSION: This study discovered specific adverse event signal values, revealing potential hepatotoxic risks associated with the use of statin drugs. The results provide an important reference for the safe clinical use of drugs, help to improve the understanding of the safety of statins, and also provide a scientific basis for clinicians to make more accurate and safe decisions when making treatment plans.
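The METHODS paragraph above names the Reporting Odds Ratio (ROR) as one of the disproportionality measures. As a hedged illustration of how an ROR and its 95% confidence interval are conventionally computed from a 2x2 report table, the sketch below uses the standard formula; the function name and the counts are invented for the example and are not taken from the study, and the Empirical Bayes Geometric Mean is not shown.

```python
import math

def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR with a Wald-type confidence interval on the log scale.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with all other drugs and the event of interest
    d: reports with all other drugs and any other event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ror = (a * d) / (b * c)                       # (a/b) / (c/d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(ROR)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper

# Hypothetical counts, for illustration only.
ror, lower, upper = reporting_odds_ratio(20, 80, 10, 90)
```

Signal-detection studies typically pair a point estimate like this with a threshold on the lower confidence bound; the specific criteria used in this study are not stated in the abstract.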
PMID:39840096 | PMC:PMC11747658 | DOI:10.3389/fphar.2024.1502791
Notice to Extend the Expiration Date for PA-22-051 AHRQ Mentored Career Enhancement Awards for Established Investigators in Patient-Centered Outcome Research (K18)
Notice to Extend Expiration dates for AHRQ PA-22-049 and PA-22-050
Notice of NIAMS-specific language in RFA-NR-25-003 Transformative Research to Address Health Disparities and Advance Health Equity (U01 Clinical Trial Optional)
Notice of Intent to Publish a Notice of Funding Opportunity for the Re-issue of RFA-AG-23-016 Transition to Aging Research for Predoctoral Students (F99/K00 - Clinical Trial Not Allowed)
Notice of Special Interest (NOSI): Digital Technology for Early Detection and Monitoring of Alzheimer's Disease (AD) and AD-Related Dementias (ADRD)
Data-driven model discovery and model selection for noisy biological systems
PLoS Comput Biol. 2025 Jan 21;21(1):e1012762. doi: 10.1371/journal.pcbi.1012762. eCollection 2025 Jan.
ABSTRACT
Biological systems exhibit complex dynamics that differential equations can often adeptly represent. Ordinary differential equation models are widespread; until recently their construction has required extensive prior knowledge of the system. Machine learning methods offer alternative means of model construction: differential equation models can be learnt from data via model discovery using sparse identification of nonlinear dynamics (SINDy). However, SINDy struggles with realistic levels of biological noise and is limited in its ability to incorporate prior knowledge of the system. We propose a data-driven framework for model discovery and model selection using hybrid dynamical systems: partial models containing missing terms. Neural networks are used to approximate the unknown dynamics of a system, enabling the denoising of the data while simultaneously learning the latent dynamics. Simulations from the fitted neural network are then used to infer models using sparse regression. We show, via model selection, that model discovery using hybrid dynamical systems outperforms alternative approaches. We find it possible to infer models correctly up to high levels of biological noise of different types. We demonstrate the potential to learn models from sparse, noisy data in application to a canonical cell state transition using data derived from single-cell transcriptomics. Overall, this approach provides a practical framework for model discovery in biology in cases where data are noisy and sparse, of particular utility when the underlying biological mechanisms are partially but incompletely known.
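The abstract above relies on sparse regression over a library of candidate terms, the core step of SINDy. The sketch below illustrates that step alone with sequentially thresholded least squares on noise-free toy data; it does not reproduce the paper's hybrid neural-network framework, and the function name `stlsq`, the threshold, and the toy system dx/dt = -2x are assumptions chosen for the example.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares for a single state variable.

    theta: (n_samples, n_terms) library of candidate terms
    dxdt:  (n_samples,) time derivative of the state
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                   # prune negligible terms
        keep = ~small
        if not keep.any():
            break
        # Refit using only the terms that survived thresholding.
        xi[keep] = np.linalg.lstsq(theta[:, keep], dxdt, rcond=None)[0]
    return xi

# Toy data from dx/dt = -2x with an exact (noise-free) derivative.
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dxdt = -2.0 * x
theta = np.column_stack([np.ones_like(t), x, x**2])  # library: 1, x, x^2
xi = stlsq(theta, dxdt)
```

On this clean example only the coefficient of the `x` term survives. With noisy derivatives the initial fit degrades, which is the limitation the abstract says its denoising step is meant to address.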
PMID:39836686 | DOI:10.1371/journal.pcbi.1012762
A network-based systems genetics framework identifies pathobiology and drug repurposing in Parkinson's disease
NPJ Parkinsons Dis. 2025 Jan 22;11(1):22. doi: 10.1038/s41531-025-00870-y.
ABSTRACT
Parkinson's disease (PD) is the second most prevalent neurodegenerative disorder. However, current treatments only manage symptoms and lack the ability to slow or prevent disease progression. We utilized a systems genetics approach to identify potential risk genes and repurposable drugs for PD. First, we leveraged non-coding genome-wide association studies (GWAS) loci effects on five types of brain-specific quantitative trait loci (xQTLs, including expression, protein, splicing, methylation and histone acetylation) under the protein-protein interactome (PPI) network. We then prioritized 175 PD likely risk genes (pdRGs), such as SNCA, CTSB, LRRK2, DGKQ, and CD44, which are enriched in druggable targets and differentially expressed genes across multiple human brain-specific cell types. Integrating network proximity-based drug repurposing and patient electronic health record (EHR) data observations, we identified Simvastatin as being significantly associated with reduced incidence of PD (hazard ratio (HR) = 0.91 for fall outcome, 95% confidence interval (CI): 0.87-0.94; HR = 0.88 for dementia outcome, 95% CI: 0.86-0.89) after adjusting for 267 covariates. In summary, our network-based systems genetics framework identifies potential risk genes and repurposable drugs for PD and other neurodegenerative diseases if broadly applied.
PMID:39837893 | DOI:10.1038/s41531-025-00870-y
Comprehensive evaluation of pure and hybrid collaborative filtering in drug repurposing
Sci Rep. 2025 Jan 21;15(1):2711. doi: 10.1038/s41598-025-85927-x.
ABSTRACT
Drug development is known to be a costly and time-consuming process, which is prone to high failure rates. Drug repurposing allows drug discovery by reusing already approved compounds. The outcomes of past clinical trials can be used to predict novel drug-disease associations by leveraging drug- and disease-related similarities. To tackle this classification problem, collaborative filtering with implicit feedback (and potentially additional data on drugs and diseases) has become popular. It can handle large imbalances between negative and positive known associations and known and unknown associations. However, properly evaluating the improvement over the state of the art is challenging, as there is no consensus approach to compare models. We propose a reproducible methodology for comparing collaborative filtering-based drug repurposing. We illustrate this method by comparing 11 models from the literature on eight diverse drug repurposing datasets. Based on this benchmark, we derive guidelines to ensure a fair and comprehensive evaluation of the performance of those models. In particular, an uncontrolled bias on unknown associations might lead to severe data leakage and a misestimation of the model's true performance. Moreover, in drug repurposing, the ability of a model to extrapolate beyond its training distribution is crucial and should also be assessed. Finally, we identified a subcategory of collaborative filtering that seems efficient and robust to distribution shifts. Benchmarks constitute an essential step towards increased reproducibility and more accessible development of competitive drug repurposing methods.
PMID:39837888 | DOI:10.1038/s41598-025-85927-x
Mitochondrial DNA variants and their impact on epigenetic and biological aging in young adulthood
Transl Psychiatry. 2025 Jan 22;15(1):16. doi: 10.1038/s41398-025-03235-4.
ABSTRACT
The pace of biological aging varies between people independently of chronological age, and mitochondrial dysfunction is a key hallmark of biological aging. We hypothesized that higher functional impact (FI) score of mitochondrial DNA (mtDNA) variants might contribute to premature aging and tested the relationships between a novel FI score of mtDNA variants and epigenetic and biological aging in young adulthood. A total of 81 participants from the European Longitudinal Study of Pregnancy and Childhood (ELSPAC) prenatal birth cohort had good quality genetic data as well as blood-based markers to estimate biological aging in the late 20s. A subset of these participants (n = 69) also had epigenetic data to estimate epigenetic aging in the early 20s using Horvath's epigenetic clock. The novel FI score was calculated based on 7 potentially pathogenic mtDNA variants. Greater FI score of mtDNA variants was associated with older epigenetic age in the early 20s and older biological age in the late 20s. These medium to large effects were independent of sex, current BMI, cigarette smoking, cannabis, and alcohol use. These findings suggest that elevated FI score of mtDNA variants might contribute to premature aging in young adulthood.
PMID:39837837 | DOI:10.1038/s41398-025-03235-4
