Systems Biology
Transcription factor prediction using protein 3D secondary structures
Bioinformatics. 2025 Jan 9:btae762. doi: 10.1093/bioinformatics/btae762. Online ahead of print.
ABSTRACT
MOTIVATION: Transcription factors (TFs) are DNA-binding proteins that regulate gene expression. Traditional methods predict a protein as a TF if the protein contains any DNA-binding domains (DBDs) of known TFs. However, this approach fails to identify a novel TF that does not contain any known DBDs. Recently proposed TF prediction methods do not rely on DBDs. Such methods use features of protein sequences to train a machine learning model, and then use the trained model to predict whether a protein is a TF or not. Because the 3-dimensional (3D) structure of a protein captures more information than its sequence, using 3D protein structures will likely allow for more accurate prediction of novel TFs.
RESULTS: We propose a deep learning-based TF prediction method (StrucTFactor), which is the first method to utilize 3D secondary structural information of proteins. We compare StrucTFactor with recent state-of-the-art TF prediction methods based on ∼525 000 proteins across 12 datasets, capturing different aspects of data bias (including sequence redundancy) possibly influencing a method's performance. We find that StrucTFactor significantly (p-value<0.001) outperforms the existing TF prediction methods, improving the performance over its closest competitor by up to 17% based on Matthews correlation coefficient.
AVAILABILITY: Data and source code are available at https://github.com/lieboldj/StrucTFactor and on our website at https://apps.cosy.bio/StrucTFactor/.
SUPPLEMENTARY INFORMATION: Included.
PMID:39786868 | DOI:10.1093/bioinformatics/btae762
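For the StrucTFactor entry above, the headline comparison is stated in terms of the Matthews correlation coefficient (MCC). The sketch below shows how MCC is computed from confusion-matrix counts. It is only an illustration: the function name and the toy counts are assumptions and are not taken from the paper or its code. MCC uses all four cells of the confusion matrix, which is why it remains informative when the positive class (here, TFs) is much rarer than the negative class.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention),
    because the coefficient is otherwise undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy, made-up counts for an imbalanced TF / non-TF test set.
example_mcc = matthews_corrcoef(tp=90, tn=900, fp=10, fn=10)
print(round(example_mcc, 3))
```

The abstract does not say whether the reported improvement of up to 17% is relative or absolute, so the sketch deliberately stops at the coefficient itself.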
A new pipeline SPICE identifies novel JUN-IKZF1 composite elements
Elife. 2025 Jan 9;12:RP88833. doi: 10.7554/eLife.88833.
ABSTRACT
Transcription factor partners can cooperatively bind to DNA composite elements to augment gene transcription. Here, we report a novel protein-DNA binding screening pipeline, termed Spacing Preference Identification of Composite Elements (SPICE), that can systematically predict protein binding partners and DNA motif spacing preferences. Using SPICE, we successfully identified known composite elements, such as AP1-IRF composite elements (AICEs) and STAT5 tetramers, and also uncovered several novel binding partners, including JUN-IKZF1 composite elements. One such novel interaction was identified at CNS9, an upstream conserved noncoding region in the human IL10 gene, which harbors a non-canonical IKZF1 binding site. We confirmed the cooperative binding of JUN and IKZF1 and showed that the activity of an IL10-luciferase reporter construct in primary B and T cells depended on both this site and the AP1 binding site within this composite element. Overall, our findings reveal an unappreciated global association of IKZF1 and AP1 and establish SPICE as a valuable new pipeline for predicting novel transcription binding complexes.
PMID:39786853 | DOI:10.7554/eLife.88833
Exploring the Tomato Root Protein Network Exploited by Core Type 3 Effectors from the Ralstonia solanacearum Species Complex
J Proteome Res. 2025 Jan 9. doi: 10.1021/acs.jproteome.4c00757. Online ahead of print.
ABSTRACT
Proteomics has become a powerful approach for the identification and characterization of type III effectors (T3Es). Members of the Ralstonia solanacearum species complex (RSSC) deploy T3Es to manipulate host cells and to promote root infection of, among others, a wide range of solanaceous plants such as tomato, potato, and tobacco. Here, we used TurboID-mediated proximity labeling (PL) in tomato hairy root cultures to explore the proxeomes of the core RSSC T3Es RipU, RipD, and RipB. The RipU proxeome was enriched for multiple protein kinases, suggesting a potential impact on the two branches of the plant immune surveillance system, being the membrane-localized PAMP-triggered immunity (PTI) and the RIN4-dependent effector-triggered immunity (ETI) complexes. In agreement, a transcriptomics analysis in tomato revealed the potential involvement of RipU in modulating reactive oxygen species (ROS) signaling. The proxeome of RipB was putatively enriched for mitochondrial and chloroplast proteins and that of RipD for proteins potentially involved in the endomembrane system. Together, our results demonstrate that TurboID-PL in tomato hairy roots represents a promising tool to study Ralstonia T3E targets and functioning and that it can unravel potential host processes that can be hijacked by the bacterial pathogen.
PMID:39786355 | DOI:10.1021/acs.jproteome.4c00757
Public Health
Alzheimers Dement. 2024 Dec;20 Suppl 7:e091942. doi: 10.1002/alz.091942.
ABSTRACT
BACKGROUND: The Coaching for Cognition in Alzheimer's (COCOA) Trial was a prospective RCT testing a remotely coached multimodal lifestyle intervention for participants early on the Alzheimer's disease spectrum. Intervention focused on diet, exercise, cognitive training, sleep, stress, and social engagement. Enrollment criteria targeted individuals with cognitive decline who were able to engage remotely with a professional coach. COCOA demonstrated cognitive and functional benefits. Dense omics data were collected on 53 individuals (≥ 58 years).
METHODS: We sought to identify blood analytes that mediated the effects of specific elements of the multimodal intervention on specific outcomes. Outcomes were assessed with the MCI Screen (MCIS) and the Functional Assessment Staging Tool (FAST). We combined these and other measures with proteomics and metabolomics data. We analyzed the resulting dataset of over 300,000 distinct molecular data points, reflecting over 1400 measures, assayed over a period of two years. We used MEGENA to perform hierarchical multiscale clustering of analytes based on correlated responses and identified individual metabolites and functional clusters associated with each intervention and outcome. We analyzed individual time courses of key analyte mediators to illustrate personalized effects of interventions and individualized functional and cognitive outcomes.
RESULTS: Distinct sets of correlated serum analytes ("communities") convey effects to functional (FAST) outcome and to cognitive (MCIS) outcome. Distinct communities respond to different modalities of intervention. Participants followed different aspects of the multimodal recommendations to different extents, and the analytes in their blood also responded idiosyncratically; analyte trajectories in different individuals show distinct dynamics. We made personalized predictions of future inflections in outcome based on observed changes in key serum mediators. We validated results with data from the Precision Recommendations for Environmental Variables, Exercise, Nutrition and Training Interventions to Optimize Neurocognition (PREVENTION) Trial.
CONCLUSIONS: Lifestyle interventions have profound effects on blood metabolites (Figure 1). These in turn convey subtler specific effects to cognition and broad-based effects to function. Pathways that ameliorate the impact of AD via lifestyle interventions in some individuals include nitrogen subsystems, kidney function, and mitochondrial metabolism. These highlight the importance of clinical attention to overall health spanning multiple organ systems in individuals across the Alzheimer's disease spectrum.
PMID:39784630 | DOI:10.1002/alz.091942
Public Health
Alzheimers Dement. 2024 Dec;20 Suppl 7:e092449. doi: 10.1002/alz.092449.
ABSTRACT
BACKGROUND: Medical management and lifestyle are potentially crucial interventions for Alzheimer's disease (AD). In this study, we present the relationship between adherence to a personalized multi-modal intervention for AD and change in cerebral blood flow (CBF) after 12 months.
METHODS: The PREVENTION study is an ongoing randomized clinical trial (McEwen, 2001). Thirty-three participants with biomarker evidence of amyloidosis had completed the study at the time of the analysis (Table 1). While both arms received personalized multi-modal lifestyle recommendations and four medical visits, the active arm also received dietary counseling, group physical and cognitive exercise, health coaching, and nutritional supplements free of charge. We examined the effects of 1) the intervention and 2) adherence on CBF. We hypothesized that 1) the active arm and 2) higher intervention adherence would have improved CBF in regions related to level of physical activity (Kleinloog, 2019; Chapman, 2013) and those pertinent to AD. CBF was assessed using arterial spin labeling (ASL). Adherence was measured using the clinician rating scale (CRS), which uses a scale of 1-7 (Kemp, 1998). Participants were divided into two groups based on a cutoff of 5 (passive acceptance). One participant was excluded from this analysis due to missing CRS data. Effects were assessed using a two-tailed t-test.
RESULTS: Treatment arms did not differ in any demographic measures at baseline or CRS. Preliminary findings indicate that regional blood flow declined over one year across the whole sample (Table 2). However, individuals with higher adherence experienced increased blood flow in the fusiform gyrus and less blood flow reduction, compared to those with lower adherence, in the anterior cingulate and hippocampus. Findings were borderline significant in the fusiform and anterior cingulate, but not the hippocampus.
CONCLUSIONS: In this small sample, we found evidence that higher adherence increased CBF, or attenuated its decline, in regions impacted by physical activity, one modality of the PREVENTION intervention. We did not see an effect in the hippocampus, possibly due to small sample size. We did not find an effect of treatment arm, potentially because both arms received recommendations and medical management and did not differ in adherence.
PMID:39784611 | DOI:10.1002/alz.092449
Biomarkers
Alzheimers Dement. 2024 Dec;20 Suppl 2:e092688. doi: 10.1002/alz.092688.
ABSTRACT
BACKGROUND: Faced with a rapidly aging population and the rising prevalence of Alzheimer's disease (AD) and related dementias, the field needs to urgently consider screening tools that utilize widely accessible data modalities. We have previously shown that lower-cost data, operationalized as data modalities accessible at primary care visits, can indeed accurately predict AD clinical diagnosis and that clustering these data can provide useful information. Here, we apply a similar approach to predicting histopathological status.
METHOD: We first applied our previously-developed feature extraction method based on a supervised encoder (SE) to transform potentially noisy input features while maintaining or amplifying relevant information. We next performed classification and clustering to stratify subjects by their neuropathology. Here, we compared two traditional classification methods with a novel Bayesian clustering-classification algorithm called an Infinite Mixture Classifier (IMC). We identified distinct trajectories of subjects based upon changes in cluster assignment over time. Data for this study come from the National Alzheimer's Coordinating Center, funded by NIA/NIH Grant U24 AG072122 and contributed to by NIA-funded ADRCs.
RESULT: We found that relatively high classification accuracy of neuropathologic lesions was possible using widely accessible, lower cost clinical data. In addition, the supervised clusters, derived from using the SE's latent features and from the IMC, held meaningful clinical diagnostic information that differentiates subjects along the clinical and pathologic continuum. When clusters were derived using longitudinal clinical data, we further observed distinct trajectories of subjects across time as their cluster assignments changed. These trajectory subgroups have significantly different risks of exhibiting each type of neuropathologic lesion obtained from postmortem neuropathology.
CONCLUSION: Our framework benefits from the combined strengths of clustering and classification methods while avoiding drawbacks of unsupervised methods. By using lower cost features, which could be obtained at Medicare annual wellness visits, this work is broadly generalizable and has direct implications for screening of neuropathologic lesions of AD and related dementias for the public. As blood biomarkers become more accessible, our framework can be easily extended to include additional data to improve screening for neuropathology using widely accessible data.
PMID:39784083 | DOI:10.1002/alz.092688
Drought induced metabolic shifts and water loss mechanisms in canola: role of cysteine, phenylalanine and aspartic acid
Front Plant Sci. 2024 Dec 23;15:1385414. doi: 10.3389/fpls.2024.1385414. eCollection 2024.
ABSTRACT
Drought conditions severely curtail the ability of plants to accumulate biomass due to the closure of stomata and the decrease of photosynthetic assimilation rate. Additionally, there is a shift in the plant's metabolic processes toward the production of metabolites that offer protection and aid in osmoadaptation, as opposed to those required for development and growth. To limit water loss via non-stomatal transpiration, plants adjust the load and composition of cuticle waxes, which act as an additional barrier. This study investigates the impact of soil water deficit on stomatal and epicuticular water losses, as well as metabolic adjustments in two canola (Brassica napus L.) cultivars, one drought-tolerant and the other drought-sensitive. Specifically, we examined the effect of a drought treatment, which involved reducing water holding capacity to 40%, on the levels of cysteine, sucrose, and abscisic acid (ABA) in the leaves of both cultivars. Next, we looked for potential differences in night, predawn, and early morning transpiration rates and the epicuticular wax load and composition in response to drought. A substantial rise in leaf cysteine was observed in both canola cultivars in response to drought, and a strong correlation was found between cysteine, ABA, and stomatal conductance, indicating that cysteine and sulfur may play a role in controlling stomatal movement during drought stress. Attributes related to CO2 diffusion (stomatal and mesophyll conductance) and photosynthetic capacity were different between the two canola cultivars, suggesting a better management of water relations under stress by the drought-tolerant cultivar. Epicuticular waxes were found to adjust in response to drought, acting as an additional barrier against water loss. Surprisingly, both canola cultivars responded similarly with respect to the metabolites (cysteine, sucrose, and ABA) and epicuticular waxes, indicating that these were not reliable stress markers in our test setup.
However, the higher level of phenylalanine in the drought-tolerant canola cultivar suggests that this amino acid is important for adaptation to drier climates. Furthermore, a multitrait genotype-ideotype distance index (MGIDI) revealed the likely role of aspartic acid in sustaining nitrogen and carbon for immediate photosynthetic resumption after drought episodes. In conclusion, leveraging amino acid knowledge in agriculture can enhance crop yield and bolster resistance to environmental challenges.
PMID:39781188 | PMC:PMC11707614 | DOI:10.3389/fpls.2024.1385414
Analysis of different strains of the turquoise killifish identify transcriptomic signatures associated with heritable lifespan differences
J Gerontol A Biol Sci Med Sci. 2025 Jan 9:glae255. doi: 10.1093/gerona/glae255. Online ahead of print.
ABSTRACT
The African turquoise killifish Nothobranchius furzeri represents an emerging short-lived model for aging research. Captive strains of this species are characterized by large differences in lifespan. To identify the gene expression correlates of these lifespan differences, we analyzed a public transcriptomic dataset consisting of four different tissues in addition to embryos. We focused on the GRZ and the MZM0410 captive strains, which show a near twofold difference in lifespan but similar growth and maturation, and validated the results in a newly generated dataset from a third, longer-lived strain. The two strains show distinct transcriptome expression patterns already as embryos, and the genotype has a larger effect than age on gene expression, both in terms of number of differentially expressed genes and magnitude of regulation. Network analysis detected RNA processing and histone modifications as the most prominent categories upregulated in GRZ, which also showed idiosyncratic expression patterns such as high expression of DND in somatic tissues. The short-lived GRZ strain shows transcriptional aging signatures already at sexual maturity (anticipated aging) in all four tissues, suggesting that short lifespan is the result of events that occur early in life rather than the progressive accumulation of strain-dependent differences. The GRZ strain is the most commonly used N. furzeri strain in intervention studies, and our results warrant replication of at least key intervention studies in longer-lived strains.
PMID:39780413 | DOI:10.1093/gerona/glae255
Effective Mucosal Adjuvantation of the Intranasal Enterovirus A71 Vaccine With Zymosan
Immunology. 2025 Jan 8. doi: 10.1111/imm.13895. Online ahead of print.
ABSTRACT
Enterovirus A71 (EV-A71) has caused hand, foot, and mouth disease with an increased prevalence of neurological complications and acute mortality, threatening young children around the globe. By provoking mucosal immunity, intranasal vaccination has been suggested to prevent EV-A71 infection. However, antigens delivered via the nasal route usually fail to induce a protective memory response. Zymosan has been identified to activate multiple pattern recognition receptors to orchestrate innate and adaptive immunity. Herein, we aimed to investigate the capacity of zymosan to strengthen the vaccine response induced by an intranasal EV-A71 vaccine. First, we confirmed its remarkable capacity to ignite innate signaling by upregulating cytokine production in primary DCs in vitro. Second, we verified its capacity to promote the vaccine immunogenicity in vivo after triple vaccination with EV-A71, especially with the notable induction of virus-specific IgA at multiple mucosae and the IL-17-producing splenic population after antigen reencounter. Lastly, we validated its capacity to improve vaccine efficacy in vivo after dual vaccination by furnishing neonatal protection against lethal infection. Our findings show that zymosan, at a preferable dosage, could augment the benefits of the intranasal vaccination to tackle EV-A71 infection. This research provides a feasible strategy for preventing EV-A71 infection with severe complications and contributes to the development of nasal spray vaccination.
PMID:39780346 | DOI:10.1111/imm.13895
Elevated miR-221-3p inhibits epithelial-mesenchymal transition and biochemical recurrence of prostate cancer via targeting KPNA2: an evidence-based and knowledge-guided strategy
BMC Cancer. 2025 Jan 8;25(1):34. doi: 10.1186/s12885-025-13444-1.
ABSTRACT
BACKGROUND: Prostate cancer (PCa) commonly occurs among males worldwide, and its prognosis can be influenced by biochemical recurrence (BCR). MicroRNAs (miRNAs) are functional regulators in carcinogenesis, and miR-221-3p was reported as one of the significant candidates deregulated in PCa. However, its regulatory pattern in PCa BCR was not consistent across literature reports, and its targets and mechanisms in PCa malignant transition and BCR are less explored.
METHODS: In this study, an evidence-based and knowledge-guided approach was proposed to decipher the role and mechanism of miR-221-3p in PCa development. First, the literature-reported inconsistency between miR-221-3p and PCa BCR was quantitatively measured by meta-analysis. Then a knowledge-guided network strategy was applied to prioritize key targets of miR-221-3p in PCa progression based both on topological and functional characterization of genes in multi-omics miRNA-mRNA and protein-protein interaction networks. Finally, a key gene was computationally identified and experimentally validated using cell line and clinical samples through EdU assay, scratch assay, transwell assay, dual-luciferase reporter assay and the epithelial-to-mesenchymal transition (EMT)-related analysis.
RESULTS: Down-regulation of miR-221-3p was correlated with a lower biochemical recurrence-free survival (BRFS) in PCa (HR: 0.72, 95% CI: 0.64-0.81, P < 0.00001). A significant down-regulation of miR-221-3p was observed in most of the PCa cells compared with the normal control. KPNA2 was identified as a key target of miR-221-3p and it was over-expressed in all the PCa cells and human PCa tissues. Moreover, elevated miR-221-3p inhibited the proliferation, migration, invasion, and EMT of PCa cells in vitro via directly and negatively mediating KPNA2 expression.
CONCLUSIONS: miR-221-3p down-regulation was a risk factor for PCa BRFS, and its over-expression could inhibit the malignant phenotype and EMT of PCa cells by directly targeting KPNA2. Translational and personalized applications of the findings will be conducted in the future.
PMID:39780096 | DOI:10.1186/s12885-025-13444-1
Gene regulation by convergent promoters
Nat Genet. 2025 Jan 6. doi: 10.1038/s41588-024-02025-w. Online ahead of print.
ABSTRACT
Convergent transcription, that is, the collision of sense and antisense transcription, is ubiquitous in mammalian genomes and believed to diminish RNA expression. Recently, antisense transcription downstream of promoters was found to be surprisingly prevalent. However, functional characteristics of affected promoters are poorly investigated. Here we show that convergent transcription marks an unexpected positively co-regulated promoter constellation. By assessing transcriptional dynamic systems, we identified co-regulated constituent promoters connected through a distinct chromatin structure. Within these cis-regulatory domains, transcription factors can regulate both constituting promoters by binding to only one of them. Convergent promoters comprise about a quarter of all active transcript start sites and initiate 5'-overlapping antisense RNAs, an RNA class previously believed to be rare. Visualization of nascent RNA molecules reveals convergent cotranscription at these loci. Together, our results demonstrate that co-regulated convergent promoters substantially expand the cis-regulatory repertoire, reveal limitations of the transcription interference model and call for adjusting the promoter concept.
PMID:39779959 | DOI:10.1038/s41588-024-02025-w
Transcript-specific enrichment enables profiling of rare cell states via single-cell RNA sequencing
Nat Genet. 2025 Jan 8. doi: 10.1038/s41588-024-02036-7. Online ahead of print.
ABSTRACT
Single-cell genomics technologies have accelerated our understanding of cell-state heterogeneity in diverse contexts. Although single-cell RNA sequencing identifies rare populations that express specific marker transcript combinations, traditional flow sorting requires cell surface markers with high-fidelity antibodies, limiting our ability to interrogate these populations. In addition, many single-cell studies require the isolation of nuclei from tissue, eliminating the ability to enrich learned rare cell states based on extranuclear protein markers. In the present report, we addressed these limitations by developing Programmable Enrichment via RNA FlowFISH by sequencing (PERFF-seq), a scalable assay that enables scRNA-seq profiling of subpopulations defined by the abundance of specific RNA transcripts. Across immune populations (n = 184,126 cells) and fresh-frozen and formalin-fixed, paraffin-embedded brain tissue (n = 33,145 nuclei), we demonstrated that programmable sorting logic via RNA-based cytometry can isolate rare cell populations and uncover phenotypic heterogeneity via downstream, high-throughput, single-cell genomics analyses.
PMID:39779958 | DOI:10.1038/s41588-024-02036-7
Profiling the epigenome using long-read sequencing
Nat Genet. 2025 Jan 8. doi: 10.1038/s41588-024-02038-5. Online ahead of print.
ABSTRACT
The advent of single-molecule, long-read sequencing (LRS) technologies by Oxford Nanopore Technologies and Pacific Biosciences has revolutionized genomics, transcriptomics and, more recently, epigenomics research. These technologies offer distinct advantages, including the direct detection of methylated DNA and simultaneous assessment of DNA sequences spanning multiple kilobases along with their modifications at the single-molecule level. This has enabled the development of new assays for analyzing chromatin states and made it possible to integrate data for DNA methylation, chromatin accessibility, transcription factor binding and histone modifications, thereby facilitating comprehensive epigenomic profiling. Owing to recent advancements, alternative, nascent and translating transcripts can be detected using LRS approaches. This Review discusses LRS-based experimental and computational strategies for characterizing chromatin states and highlights their advantages over short-read sequencing methods. Furthermore, we demonstrate how various long-read methods can be integrated to design multi-omics studies to investigate the relationship between chromatin states and transcriptional dynamics.
PMID:39779955 | DOI:10.1038/s41588-024-02038-5
β-Glucan reprograms neutrophils to promote disease tolerance against influenza A virus
Nat Immunol. 2025 Jan 8. doi: 10.1038/s41590-024-02041-2. Online ahead of print.
ABSTRACT
Disease tolerance is an evolutionarily conserved host defense strategy that preserves tissue integrity and physiology without affecting pathogen load. Unlike host resistance, the mechanisms underlying disease tolerance remain poorly understood. In the present study, we investigated whether an adjuvant (β-glucan) can reprogram innate immunity to provide protection against influenza A virus (IAV) infection. β-Glucan treatment reduces the morbidity and mortality against IAV infection, independent of host resistance. The enhanced survival is the result of increased recruitment of neutrophils via RoRγt+ T cells in the lung tissue. β-Glucan treatment promotes granulopoiesis in a type 1 interferon-dependent manner that leads to the generation of a unique subset of immature neutrophils utilizing a mitochondrial oxidative metabolism and producing interleukin-10. Collectively, our data indicate that β-glucan reprograms hematopoietic stem cells to generate neutrophils with a new 'regulatory' function, which is required for promoting disease tolerance and maintaining lung tissue integrity against viral infection.
PMID:39779870 | DOI:10.1038/s41590-024-02041-2
A rare PRIMER cell state in plant immunity
Nature. 2025 Jan 8. doi: 10.1038/s41586-024-08383-z. Online ahead of print.
ABSTRACT
Plants lack specialized and mobile immune cells. Consequently, any cell type that encounters pathogens must mount immune responses and communicate with surrounding cells for successful defence. However, the diversity, spatial organization and function of cellular immune states in pathogen-infected plants are poorly understood [1]. Here we infect Arabidopsis thaliana leaves with bacterial pathogens that trigger or suppress immune responses and integrate time-resolved single-cell transcriptomic, epigenomic and spatial transcriptomic data to identify cell states. We describe cell-state-specific gene-regulatory logic that involves transcription factors, putative cis-regulatory elements and target genes associated with disease and immunity. We show that a rare cell population emerges at the nexus of immune-active hotspots, which we designate as primary immune responder (PRIMER) cells. PRIMER cells have non-canonical immune signatures, exemplified by the expression and genome accessibility of a previously uncharacterized transcription factor, GT-3A, which contributes to plant immunity against bacterial pathogens. PRIMER cells are surrounded by another cell state (bystander) that activates genes for long-distance cell-to-cell immune signalling. Together, our findings suggest that interactions between these cell states propagate immune responses across the leaf. Our molecularly defined single-cell spatiotemporal atlas provides functional and regulatory insights into immune cell states in plants.
PMID:39779856 | DOI:10.1038/s41586-024-08383-z
A foundation model of transcription across human cell types
Nature. 2025 Jan 8. doi: 10.1038/s41586-024-08391-z. Online ahead of print.
ABSTRACT
Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types [1,2]. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types [3]. GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks. We evaluated its performance in prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors and found that it outperforms current models [4] in predicting lentivirus-based massively parallel reporter assay readout [5,6]. In fetal erythroblasts [7], we identified distal (greater than 1 Mbp) regulatory regions that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukaemia risk predisposing germline mutation [8-10]. In sum, we provide a generalizable and accurate model for transcription together with catalogues of gene regulation and transcription factor interactions, all with cell type specificity.
PMID:39779852 | DOI:10.1038/s41586-024-08391-z
Bidirectional histone monoaminylation dynamics regulate neural rhythmicity
Nature. 2025 Jan 8. doi: 10.1038/s41586-024-08371-3. Online ahead of print.
ABSTRACT
Histone H3 monoaminylations at Gln5 represent an important family of epigenetic marks in brain that have critical roles in permissive gene expression [1-3]. We previously demonstrated that serotonylation [4-10] and dopaminylation [9,11-13] of Gln5 of histone H3 (H3Q5ser and H3Q5dop, respectively) are catalysed by transglutaminase 2 (TG2), and alter both local and global chromatin states. Here we found that TG2 additionally functions as an eraser and exchanger of H3 monoaminylations, including H3Q5 histaminylation (H3Q5his), which displays diurnally rhythmic expression in brain and contributes to circadian gene expression and behaviour. We found that H3Q5his, in contrast to H3Q5ser, inhibits the binding of WDR5, a core member of histone H3 Lys4 (H3K4) methyltransferase complexes, thereby antagonizing methyltransferase activities on H3K4. Taken together, these data elucidate a mechanism through which a single chromatin regulatory enzyme has the ability to sense chemical microenvironments to affect the epigenetic states of cells, the dynamics of which have critical roles in the regulation of neural rhythmicity.
PMID:39779849 | DOI:10.1038/s41586-024-08371-3
Saturation genome editing-based clinical classification of BRCA2 variants
Nature. 2025 Jan 8. doi: 10.1038/s41586-024-08349-1. Online ahead of print.
ABSTRACT
Sequencing-based genetic tests have uncovered a vast array of BRCA2 sequence variants [1]. Owing to limited clinical, familial and epidemiological data, thousands of variants are considered to be variants of uncertain significance (VUS) [2-4]. Here we have utilized CRISPR-Cas9-based saturation genome editing in a humanized mouse embryonic stem cell line to determine the functional effect of VUS. We have categorized nearly all possible single nucleotide variants (SNVs) in the region that encodes the carboxy-terminal DNA-binding domain of BRCA2. We have generated function scores for 6,551 SNVs, covering 96.4% of possible SNVs in exons 15-26 spanning BRCA2 residues 2479-3216. These variants include 1,282 SNVs that are categorized as missense VUS in the clinical variant database ClinVar, with 77.2% of these classified as benign and 20.4% classified as pathogenic using our functional score. Our assay provides evidence that 3,384 of the SNVs in the region are benign and 776 are pathogenic. Our classification aligns closely with pathogenicity data from ClinVar, orthogonal functional assays and computational meta predictors. We have integrated our embryonic stem cell-based BRCA2-saturation genome editing dataset with other available evidence and utilized the American College of Medical Genetics and Genomics/Association for Molecular Pathology guidelines for clinical classification of all possible SNVs. This classification is available as a sequence-function map and serves as a valuable resource for interpreting unidentified variants in the population and for physicians and genetic counsellors to assess BRCA2 VUS in patients.
PMID:39779848 | DOI:10.1038/s41586-024-08349-1
Erythema nodosum, malignant melanoma and non-melanoma skin cancer in relation to inflammatory bowel disease: a Mendelian randomization study
Sci Rep. 2025 Jan 8;15(1):1369. doi: 10.1038/s41598-025-85249-y.
ABSTRACT
Inflammatory bowel disease (IBD) is a multisystem condition that can affect the skin, giving rise to cutaneous extraintestinal manifestations (EIMs). It has been suggested that IBD is associated with erythema nodosum (EN), malignant melanoma (MM) and non-melanoma skin cancer (NMSC). However, the potential causal relationship between IBD and these cutaneous EIMs is still unclear. This study aims to determine the effect of IBD on EN, MM and NMSC within a Mendelian randomization (MR) design. Summary-level data for IBD, EN, MM and NMSC were obtained from large-scale genome-wide association studies. We used five MR methods: the inverse variance weighted (IVW) model, MR-Egger, weighted median, simple mode and weighted mode. Cochran's Q test, the MR-Egger pleiotropy test, the MR-PRESSO global pleiotropy test and a leave-one-out sensitivity test were then used to evaluate the heterogeneity and pleiotropy of the identified instrumental variables (IVs). To further ensure the validity of our findings, we evaluated the strength of the IVs using the F-statistic and estimated the statistical power of our study. Findings were verified using an independent validation dataset, as well as through different MR methods with different model assumptions. MR analysis suggested that genetically determined IBD had a detrimental causal effect on NMSC (IVW: odds ratio [OR] = 1.002037, 95% confidence interval [CI] = 1.0001150-1.003962, P = 0.03776677), but not on EN (IVW: OR = 1.0937191, 95% CI = 0.9685831-1.235022, P = 0.1484349) or MM (IVW: OR = 0.9998064, 95% CI = 0.9994885-1.000124, P = 0.2326482). A positive causal effect of IBD on NMSC was also verified in an independent validation dataset (IVW: OR = 1.002651, 95% CI = 1.0006524-1.004654, P = 0.009307506). The present study corroborated the causal relationship between IBD and NMSC. In contrast, our results showed no evidence of a causal effect of IBD on EN or MM. These findings highlight the need for increased attention to patients with IBD to prevent concurrent NMSC.
PMID:39779820 | DOI:10.1038/s41598-025-85249-y
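As an illustration of the IVW model named in the abstract above, the following minimal Python sketch computes a fixed-effect IVW estimate from per-variant summary statistics and converts it to an odds ratio with a 95% confidence interval. It is not the authors' code: the function names and the toy numbers are hypothetical, and the weights assume uncorrelated variants with negligible error in the exposure effects.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect (log-odds scale).

    Equivalent to a weighted regression of the variant-outcome effects on
    the variant-exposure effects through the origin, with weights
    1 / se_outcome**2.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2
        numerator += weight * bx * by
        denominator += weight * bx ** 2
    beta = numerator / denominator
    se_beta = math.sqrt(1.0 / denominator)
    return beta, se_beta

def odds_ratio_with_ci(beta, se_beta, z=1.96):
    """Convert a log-odds estimate to (OR, lower, upper) for a ~95% CI."""
    return (math.exp(beta),
            math.exp(beta - z * se_beta),
            math.exp(beta + z * se_beta))

# Hypothetical toy summary statistics for three variants (not study data).
bx = [0.10, 0.20, 0.15]     # variant-exposure effects
by = [0.010, 0.020, 0.015]  # variant-outcome effects (log odds)
se = [0.01, 0.01, 0.01]     # standard errors of the outcome effects

beta, se_beta = ivw_estimate(bx, by, se)
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta, se_beta)
print(f"OR = {odds_ratio:.4f}, 95% CI = {ci_low:.4f}-{ci_high:.4f}")
```

An OR whose confidence interval excludes 1, as reported for NMSC above, is what the IVW P value below 0.05 reflects. The other four methods named in the abstract relax different assumptions and are not sketched here.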
Artificial intelligence for body composition assessment focusing on sarcopenia
Sci Rep. 2025 Jan 8;15(1):1324. doi: 10.1038/s41598-024-83401-8.
ABSTRACT
This study aimed to address the limitations of conventional methods for measuring skeletal muscle mass for sarcopenia diagnosis by introducing an artificial intelligence (AI) system for direct computed tomography (CT) analysis. The primary focus was on enhancing simplicity, reproducibility, and convenience, and on assessing the accuracy and speed of AI compared with conventional methods. A cohort of 3096 cases undergoing CT imaging up to the third lumbar (L3) level between 2011 and 2021 was included. The cases were randomly divided into preprocessing and sarcopenia cohorts, with further random splits into training and validation cohorts for BMI_AI and Body_AI creation. Sarcopenia_AI utilizes the Skeletal Muscle Index (SMI), which is calculated as (total skeletal muscle area at L3)/(height)^2. The SMI was conventionally measured twice, with the first measurement as the AI label reference and the second for comparison. Agreement and diagnostic change rates were calculated. Three groups were randomly assigned and 10 images before and after L3 were collected for each case. AI models for body region detection (Deeplabv3) and sarcopenia diagnosis (EfficientNetV2-XL) were trained on a supercomputer, and their abilities and speed per image were evaluated. The conventional method showed a low agreement rate (κ coefficient) of 0.478 for the test cohort and 0.236 for the validation cohort, with diagnostic changes in 43% of cases. Conversely, the AI consistently produced identical results across two measurements. The AI demonstrated robust body region detection ability (intersection over union (IoU) = 0.93), accurately detecting only the body region in all images. The AI for sarcopenia diagnosis exhibited high accuracy, with a sensitivity of 82.3%, specificity of 98.1%, and a positive predictive value of 89.5%. In conclusion, the reproducibility of the conventional method for sarcopenia diagnosis was low. The developed sarcopenia diagnostic AI, with its high positive predictive value and convenient diagnostic capabilities, is a promising alternative for addressing the shortcomings of conventional approaches.
PMID:39779762 | DOI:10.1038/s41598-024-83401-8
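The quantities reported in the abstract above (SMI, IoU, the κ coefficient, and sensitivity, specificity and positive predictive value) follow standard definitions, sketched below in Python. This is an illustrative sketch, not the study's pipeline: the function names and example values are hypothetical, the units (cm^2 for muscle area, m for height) are an assumption the abstract does not state, and κ is shown as Cohen's κ for two binary ratings.

```python
def skeletal_muscle_index(muscle_area_cm2, height_m):
    """SMI = (total skeletal muscle area at L3) / (height)^2.

    Units are assumed here to be cm^2 and m, giving cm^2/m^2.
    """
    return muscle_area_cm2 / height_m ** 2

def intersection_over_union(mask_a, mask_b):
    """IoU of two equally sized binary masks given as flat 0/1 sequences."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return intersection / union if union else 1.0

def cohen_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for two binary ratings of the same cases."""
    n = both_pos + first_only + second_only + both_neg
    observed = (both_pos + both_neg) / n
    # Chance agreement from the marginal positive/negative rates.
    expected = ((both_pos + first_only) * (both_pos + second_only)
                + (second_only + both_neg) * (first_only + both_neg)) / n ** 2
    return (observed - expected) / (1.0 - expected)

def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and positive predictive value."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

# Hypothetical example values (not data from the study).
print(skeletal_muscle_index(150.0, 1.70))
print(intersection_over_union([1, 1, 0, 0], [1, 0, 1, 0]))
print(cohen_kappa(20, 5, 10, 15))
print(diagnostic_metrics(tp=8, fp=2, tn=88, fn=2))
```

A κ near 0 means agreement no better than chance and 1 means perfect agreement, which is why the values of 0.478 and 0.236 reported for the repeated conventional measurements are described as low.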