Systems Biology
Correlating the succession of microbial communities from Nigerian soils to petroleum biodegradation
World J Microbiol Biotechnol. 2023 Jul 1;39(9):239. doi: 10.1007/s11274-023-03656-7.
ABSTRACT
Whilst biodegradation of different hydrocarbon components has been widely demonstrated to occur by specialist oil-degrading bacteria, less is known about the impact on microbial communities as a function of oil composition by comparing the biodegradation of chemically complex fuels to synthetic products. The objectives of this study were (i) to assess the biodegradation capacity and succession of microbial communities isolated from Nigerian soils in media with crude oil or synthetic oil as sole sources of carbon and energy, and (ii) to assess the temporal variability of the microbial community size. Community profiling was done using 16S rRNA gene amplicon sequencing (Illumina), and oil profiling using gas chromatography. The biodegradation of natural and synthetic oil differed, probably due to the sulfur content, which may interfere with the biodegradation of hydrocarbons. Both alkanes and PAHs in the natural oil were biodegraded faster than in the synthetic oil. Variable community responses were observed during the degradation of alkanes and simpler aromatic compounds, but at later phases of growth they became more homogeneous. The degradation capacity and the size of the community from the more-contaminated soil were higher than those from the less-contaminated soil. Six abundant organisms isolated from the cultures were found to biodegrade oil molecules in pure cultures. Ultimately, this knowledge may contribute to a better understanding of how to improve the biodegradation of crude oil by optimizing culturing conditions through inoculation or bioaugmentation of specific bacteria during ex-situ biodegradation such as biodigesters or landfarming.
PMID:37392206 | DOI:10.1007/s11274-023-03656-7
Molecular network of the oil palm root response to aluminum stress
BMC Plant Biol. 2023 Jun 30;23(1):346. doi: 10.1186/s12870-023-04354-0.
ABSTRACT
BACKGROUND: The solubilization of aluminum ions (Al3+) that results from soil acidity (pH < 5.5) is a limiting factor in oil palm yield. Al can be taken up by plant roots, affecting DNA replication and cell division and triggering root morphological alterations and nutrient and water deprivation. In different oil palm-producing countries, oil palm is planted in acidic soils, representing a challenge for achieving high productivity. Several studies have reported the morphological, physiological, and biochemical oil palm mechanisms in response to Al-stress. However, the molecular mechanisms are only partially understood.
RESULTS: Differential gene expression and network analysis of four contrasting oil palm genotypes (IRHO 7001, CTR 3-0-12, CR 10-0-2, and CD 19-12) exposed to Al-stress helped to identify a set of genes and modules involved in oil palm early response to the metal. Networks including the ABA-independent transcription factors DREB1F and NAC and the calcium sensor Calmodulin-like (CML) that could induce the expression of internal detoxifying enzymes GRXC1, PER15, ROMT, ZSS1, BBI, and HS1 against Al-stress were identified. Also, some gene networks pinpoint the role of secondary metabolites like polyphenols, sesquiterpenoids, and antimicrobial components in reducing oxidative stress in oil palm seedlings. STOP1 expression could be the first step of the induction of common Al-response genes as an external detoxification mechanism mediated by ABA-dependent pathways.
CONCLUSIONS: Twelve hub genes were validated in this study, supporting the reliability of the experimental design and network analysis. Differential expression analysis and systems biology approaches provide a better understanding of the molecular network mechanisms of the response to aluminum stress in oil palm roots. These findings provide a basis for further functional characterization of candidate genes associated with Al-stress in oil palm.
PMID:37391695 | DOI:10.1186/s12870-023-04354-0
The TNFR1 antagonist Atrosimab reduces neuronal loss, glial activation and memory deficits in an acute mouse model of neurodegeneration
Sci Rep. 2023 Jun 30;13(1):10622. doi: 10.1038/s41598-023-36846-2.
ABSTRACT
Tumor necrosis factor alpha (TNF-α), with its key role in modulating immune responses, has been widely recognized as a therapeutic target for inflammatory and neurodegenerative diseases. Even though inhibition of TNF-α is beneficial for the treatment of certain inflammatory diseases, total neutralization of TNF-α largely failed in the treatment of neurodegenerative diseases. TNF-α exerts distinct functions depending on interaction with its two TNF receptors, whereby TNF receptor 1 (TNFR1) is associated with neuroinflammation and apoptosis and TNF receptor 2 (TNFR2) with neuroprotection and immune regulation. Here, we investigated the effect of administering the TNFR1-specific antagonist Atrosimab, as a strategy to block TNFR1 signaling while maintaining TNFR2 signaling unaltered, in an acute mouse model of neurodegeneration. In this model, an NMDA-induced lesion that mimics various hallmarks of neurodegenerative diseases, such as memory loss and cell death, was created in the nucleus basalis magnocellularis and Atrosimab or control protein was administered centrally. We showed that Atrosimab attenuated cognitive impairments and reduced neuroinflammation and neuronal cell death. Our results demonstrate that Atrosimab is effective in ameliorating disease symptoms in an acute neurodegenerative mouse model. Altogether, our study indicates that Atrosimab may be a promising candidate for the development of a therapeutic strategy for the treatment of neurodegenerative diseases.
PMID:37391534 | DOI:10.1038/s41598-023-36846-2
Small RNA sequencing of field Culex mosquitoes identifies patterns of viral infection and the mosquito immune response
Sci Rep. 2023 Jun 30;13(1):10598. doi: 10.1038/s41598-023-37571-6.
ABSTRACT
Mosquito-borne disease remains a significant burden on global health. In the United States, the major threat posed by mosquitoes is transmission of arboviruses, including West Nile virus by mosquitoes of the Culex genus. Virus metagenomic analysis of mosquito small RNA using deep sequencing and advanced bioinformatic tools enables the rapid detection of viruses and other infecting organisms, both pathogenic and non-pathogenic to humans, without any prior knowledge. In this study, we sequenced small RNA samples from over 60 pools of Culex mosquitoes from two major areas of Southern California from 2017 to 2019 to elucidate the virome and immune responses of Culex. Our results demonstrated that small RNAs not only allowed the detection of viruses but also revealed distinct patterns of viral infection based on location, Culex species, and time. We also identified miRNAs that are most likely involved in Culex immune responses to viruses and Wolbachia bacteria, and show the utility of using small RNA to detect antiviral immune pathways including piRNAs against some pathogens. Collectively, these findings show that deep sequencing of small RNA can be used for virus discovery and surveillance. One could also conceive that such work could be accomplished in various locations across the world and over time to better understand patterns of mosquito infection and immune response to many vector-borne diseases in field samples.
PMID:37391513 | DOI:10.1038/s41598-023-37571-6
Single-cell profiling of lncRNA expression during Ebola virus infection in rhesus macaques
Nat Commun. 2023 Jun 30;14(1):3866. doi: 10.1038/s41467-023-39627-7.
ABSTRACT
Long non-coding RNAs (lncRNAs) are involved in numerous biological processes and are pivotal mediators of the immune response, yet little is known about their properties at the single-cell level. Here, we generate a multi-tissue bulk RNAseq dataset from Ebola virus (EBOV) infected and uninfected rhesus macaques and identify 3979 novel lncRNAs. To profile lncRNA expression dynamics in single circulating immune cells during EBOV infection, we design a metric, Upsilon, to estimate cell-type specificity. Our analysis reveals that lncRNAs are expressed in fewer cells than protein-coding genes, but they are not expressed at lower levels nor are they more cell-type specific when expressed in the same number of cells. In addition, we observe that lncRNAs exhibit similar changes in expression patterns to those of protein-coding genes during EBOV infection, and are often co-expressed with known immune regulators. A few lncRNAs change expression specifically upon EBOV entry in the cell. This study sheds light on the differential features of lncRNAs and protein-coding genes and paves the way for future single-cell lncRNA studies.
PMID:37391481 | DOI:10.1038/s41467-023-39627-7
Cortical somatostatin interneuron subtypes form cell-type-specific circuits
Neuron. 2023 Jun 22:S0896-6273(23)00435-X. doi: 10.1016/j.neuron.2023.05.032. Online ahead of print.
ABSTRACT
The cardinal classes are a useful simplification of cortical interneuron diversity, but such broad subgroupings gloss over the molecular, morphological, and circuit specificity of interneuron subtypes, most notably among the somatostatin interneuron class. Although there is evidence that this diversity is functionally relevant, the circuit implications of this diversity are unknown. To address this knowledge gap, we designed a series of genetic strategies to target the breadth of somatostatin interneuron subtypes and found that each subtype possesses a unique laminar organization and stereotyped axonal projection pattern. Using these strategies, we examined the afferent and efferent connectivity of three subtypes (two Martinotti and one non-Martinotti) and demonstrated that they possess selective connectivity with intratelencephalic or pyramidal tract neurons. Even when two subtypes targeted the same pyramidal cell type, their synaptic targeting proved selective for particular dendritic compartments. We thus provide evidence that subtypes of somatostatin interneurons form cell-type-specific cortical circuits.
PMID:37390821 | DOI:10.1016/j.neuron.2023.05.032
Alternative Identification of Glycosides Using MS/MS Matching with an In Silico-Modified Aglycone Mass Spectra Library
Anal Chem. 2023 Jun 30. doi: 10.1021/acs.analchem.3c00957. Online ahead of print.
ABSTRACT
Glycosylation of metabolites serves multiple purposes. Adding sugars makes metabolites more water soluble and improves their biodistribution, stability, and detoxification. In plants, the increase in melting points enables storing otherwise volatile compounds that are released by hydrolysis when needed. Classically, glycosylated metabolites were identified by mass spectrometry (MS/MS) using [M-sugar] neutral losses. Herein, we studied 71 pairs of glycosides with their respective aglycones, including hexose, pentose, and glucuronide moieties. Using liquid chromatography (LC) coupled to electrospray ionization high-resolution mass spectrometry, we detected the classic [M-sugar] product ions for only 68% of glycosides. Instead, we found that most aglycone MS/MS product ions were conserved in the MS/MS spectra of their corresponding glycosides, even when no [M-sugar] neutral losses were observed. We added pentose and hexose units to the precursor masses of an MS/MS library of 3057 aglycones to enable rapid identification of glycosylated natural products with standard MS/MS search algorithms. When searching unknown compounds in untargeted LC-MS/MS metabolomics data of chocolate and tea, we structurally annotated 108 novel glycosides in standard MS-DIAL data processing. We uploaded this new in silico-glycosylated product MS/MS library to GitHub to enable users to detect natural product glycosides without authentic chemical standards.
PMID:37390485 | DOI:10.1021/acs.analchem.3c00957
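The in silico glycosylation step described in the abstract above (adding pentose and hexose units to aglycone precursor masses while keeping the aglycone product ions) can be sketched roughly as follows. This is a minimal illustration, not the authors' released library code: the dictionary layout, the function names, and the example entry with its m/z values are hypothetical, and singly charged ions are assumed, so one sugar residue shifts the precursor m/z by its full monoisotopic mass.

```python
# Monoisotopic masses of the sugar residues gained on glycosylation
# (sugar minus one water, because forming the glycosidic bond releases H2O).
HEXOSE_RESIDUE = 162.05282   # C6H10O5
PENTOSE_RESIDUE = 132.04226  # C5H8O4


def glycosylate(entry, residue_name, residue_mass):
    """Copy a library entry, shifting only the precursor m/z by one sugar unit.

    Assumes a singly charged precursor; the fragment list is reused unchanged,
    mirroring the observation that aglycone product ions are largely conserved.
    """
    return {
        "name": f"{entry['name']} + {residue_name}",
        "precursor_mz": round(entry["precursor_mz"] + residue_mass, 4),
        "fragments": list(entry["fragments"]),
    }


def expand_library(aglycones):
    """Return hexoside and pentoside variants for every aglycone entry."""
    expanded = []
    for entry in aglycones:
        expanded.append(glycosylate(entry, "hexose", HEXOSE_RESIDUE))
        expanded.append(glycosylate(entry, "pentose", PENTOSE_RESIDUE))
    return expanded


# Hypothetical aglycone entry with illustrative m/z values.
aglycone_library = [
    {"name": "example aglycone", "precursor_mz": 303.0499,
     "fragments": [153.0182, 229.0495]},
]
glycoside_library = expand_library(aglycone_library)
for item in glycoside_library:
    print(item["name"], item["precursor_mz"])
```

The expanded entries could then be searched with a standard precursor-plus-fragment matching tool; glucuronides or multiple sugar units would need additional residue masses.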
A guide to the BRAIN Initiative Cell Census Network data ecosystem
PLoS Biol. 2023 Jun 30;21(6):e3002133. doi: 10.1371/journal.pbio.3002133. eCollection 2023 Jun.
ABSTRACT
Characterizing cellular diversity at different levels of biological organization and across data modalities is a prerequisite to understanding the function of cell types in the brain. Classification of neurons is also essential to manipulate cell types in controlled ways and to understand their variation and vulnerability in brain disorders. The BRAIN Initiative Cell Census Network (BICCN) is an integrated network of data-generating centers, data archives, and data standards developers, with the goal of systematic multimodal brain cell type profiling and characterization. Emphasis of the BICCN is on the whole mouse brain with demonstration of prototype feasibility for human and nonhuman primate (NHP) brains. Here, we provide a guide to the cellular and spatial approaches employed by the BICCN, and to accessing and using these data and extensive resources, including the BRAIN Cell Data Center (BCDC), which serves to manage and integrate data across the ecosystem. We illustrate the power of the BICCN data ecosystem through vignettes highlighting several BICCN analysis and visualization tools. Finally, we present emerging standards that have been developed or adopted toward Findable, Accessible, Interoperable, and Reusable (FAIR) neuroscience. The combined BICCN ecosystem provides a comprehensive resource for the exploration and analysis of cell types in the brain.
PMID:37390046 | DOI:10.1371/journal.pbio.3002133
Optimized quantification of intra-host viral diversity in SARS-CoV-2 and influenza virus sequence data
mBio. 2023 Jun 30:e0104623. doi: 10.1128/mbio.01046-23. Online ahead of print.
ABSTRACT
High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. When replicates are not available, using a combination of multiple callers with more stringent cutoffs is recommended. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intra-host viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intra-host variation, viral diversity, and viral evolution.
IMPORTANCE: When viruses replicate inside a host cell, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
PMID:37389439 | DOI:10.1128/mbio.01046-23
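The filtering idea in the abstract above (allele-frequency and coverage thresholds, combined with agreement between technical replicates) could look roughly like the sketch below. The threshold values, record fields, and function names are assumptions chosen for illustration; the abstract does not state the cutoffs the authors recommend.

```python
def passes_thresholds(variant, min_af, min_depth):
    """Keep a call only if allele frequency and read depth both clear the cutoffs."""
    return variant["af"] >= min_af and variant["depth"] >= min_depth


def variant_key(variant):
    """Identify a single-nucleotide variant by position and alleles."""
    return (variant["pos"], variant["ref"], variant["alt"])


def replicate_consensus(rep1, rep2, min_af=0.03, min_depth=200):
    """Return variants that pass the cutoffs in BOTH replicates.

    min_af and min_depth are hypothetical defaults, not values from the study.
    """
    kept1 = {variant_key(v) for v in rep1 if passes_thresholds(v, min_af, min_depth)}
    kept2 = {variant_key(v) for v in rep2 if passes_thresholds(v, min_af, min_depth)}
    return sorted(kept1 & kept2)


# Made-up calls from two technical replicates of one sample.
replicate_a = [
    {"pos": 241, "ref": "C", "alt": "T", "af": 0.08, "depth": 1500},
    {"pos": 3037, "ref": "C", "alt": "T", "af": 0.01, "depth": 1800},   # AF too low
    {"pos": 14408, "ref": "C", "alt": "T", "af": 0.05, "depth": 90},    # depth too low
]
replicate_b = [
    {"pos": 241, "ref": "C", "alt": "T", "af": 0.07, "depth": 1400},
    {"pos": 23403, "ref": "A", "alt": "G", "af": 0.04, "depth": 1200},  # absent in A
]
consensus = replicate_consensus(replicate_a, replicate_b)
print(consensus)
```

The same set intersection could be applied across several callers run on a single replicate, with stricter `min_af` and `min_depth` values, in the spirit of the recommendation for studies without replicates.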
KG-Hub - Building and Exchanging Biological Knowledge Graphs
Bioinformatics. 2023 Jun 30:btad418. doi: 10.1093/bioinformatics/btad418. Online ahead of print.
ABSTRACT
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of knowledge graphs is lacking.
RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of knowledge graphs. Features include a simple, modular extract-transform-load (ETL) pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate knowledge graphs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification.
AVAILABILITY AND IMPLEMENTATION: https://kghub.org.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PMID:37389415 | DOI:10.1093/bioinformatics/btad418
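To make the extract-transform-load pattern mentioned in the abstract above more concrete, here is a small sketch of a transform step that turns raw association rows into node and edge tables. It does not use KG-Hub's own tooling: the input rows, identifiers, column choice, and function names are hypothetical, and the tab-separated node and edge layout with Biolink categories is assumed as one common way to serialize such graphs.

```python
import csv
import io


def transform(rows):
    """Turn raw gene-disease rows into deduplicated node records and edge records."""
    nodes = {}
    edges = []
    for row in rows:
        nodes[row["gene_id"]] = {
            "id": row["gene_id"], "category": "biolink:Gene", "name": row["gene_name"],
        }
        nodes[row["disease_id"]] = {
            "id": row["disease_id"], "category": "biolink:Disease", "name": row["disease_name"],
        }
        edges.append({
            "subject": row["gene_id"],
            "predicate": "biolink:related_to",
            "object": row["disease_id"],
        })
    return list(nodes.values()), edges


def to_tsv(records, columns):
    """Serialize records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical upstream rows; the identifiers are placeholders, not real CURIEs.
raw_rows = [
    {"gene_id": "EX:G1", "gene_name": "gene one",
     "disease_id": "EX:D1", "disease_name": "disease one"},
    {"gene_id": "EX:G1", "gene_name": "gene one",
     "disease_id": "EX:D2", "disease_name": "disease two"},
]
node_records, edge_records = transform(raw_rows)
nodes_tsv = to_tsv(node_records, ["id", "category", "name"])
edges_tsv = to_tsv(edge_records, ["subject", "predicate", "object"])
print(nodes_tsv)
print(edges_tsv)
```

Keeping the transform as a pure function over rows makes each source module easy to test in isolation, which fits the modular pattern the abstract describes.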
Precise and versatile microplate reader-based analyses of biosensor signals from arrayed microbial colonies
Front Microbiol. 2023 Jun 14;14:1187228. doi: 10.3389/fmicb.2023.1187228. eCollection 2023.
ABSTRACT
Genetically encoded fluorescent biosensors have emerged as a powerful tool to support phenotypic screenings of microbes. Optical analyses of fluorescent sensor signals from colonies grown on solid media can be challenging as imaging devices need to be equipped with appropriate filters matching the properties of fluorescent biosensors. Toward versatile fluorescence analyses of different types of biosensor signals derived from arrayed colonies, we investigate here the use of monochromator-equipped microplate readers as an alternative to imaging approaches. Indeed, for analyses of the LacI-controlled expression of the reporter mCherry in Corynebacterium glutamicum, or promoter activity using GFP as reporter in Saccharomyces cerevisiae, improved sensitivity and dynamic range were observed for microplate reader-based analyses compared with imaging-based analyses. The microplate reader allowed us to capture signals of ratiometric fluorescent reporter proteins (FRPs) with a high sensitivity and thereby to further improve the analysis of internal pH via the pH-sensitive FRP mCherryEA in Escherichia coli colonies. Applicability of this novel technique was further demonstrated by assessing redox states in C. glutamicum colonies using the FRP Mrx1-roGFP2. Using a microplate reader, oxidative redox shifts were measured in a mutant strain lacking the non-enzymatic antioxidant mycothiol (MSH), indicating its major role in maintaining a reduced redox state also in colonies on agar plates. Taken together, analyses of biosensor signals from microbial colonies using a microplate reader allow comprehensive phenotypic screenings and thus facilitate further development of new strains for metabolic engineering and systems biology.
PMID:37389345 | PMC:PMC10303141 | DOI:10.3389/fmicb.2023.1187228
Exploring the crop epigenome: a comparison of DNA methylation profiling techniques
Front Plant Sci. 2023 May 30;14:1181039. doi: 10.3389/fpls.2023.1181039. eCollection 2023.
ABSTRACT
Epigenetic modifications play a vital role in the preservation of genome integrity and in the regulation of gene expression. DNA methylation, one of the key mechanisms of epigenetic control, impacts growth, development, stress response and adaptability of all organisms, including plants. The detection of DNA methylation marks is crucial for understanding the mechanisms underlying these processes and for developing strategies to improve productivity and stress resistance of crop plants. There are different methods for detecting plant DNA methylation, such as bisulfite sequencing, methylation-sensitive amplified polymorphism, genome-wide DNA methylation analysis, methylated DNA immunoprecipitation sequencing, reduced representation bisulfite sequencing, MS and immuno-based techniques. These profiling approaches vary in many aspects, including DNA input, resolution, genomic region coverage, and bioinformatics analysis. Selecting an appropriate methylation screening approach requires an understanding of all these techniques. This review provides an overview of DNA methylation profiling methods in crop plants, along with comparisons of the efficacy of these techniques between model and crop plants. The strengths and limitations of each methodological approach are outlined, and the importance of considering both technical and biological factors is highlighted. Additionally, methods for modulating DNA methylation in model and crop species are presented. Overall, this review will assist scientists in making informed decisions when selecting an appropriate DNA methylation profiling method.
PMID:37389288 | PMC:PMC10306282 | DOI:10.3389/fpls.2023.1181039
Spectrum of PRSS1, SPINK1, CTRC, CFTR, and CPA1 Gene Variants in Chronic Pancreatitis Patients in Russia
Sovrem Tekhnologii Med. 2023;15(2):60-70. doi: 10.17691/stm2023.15.2.06. Epub 2023 Mar 29.
ABSTRACT
The aim of the study was to define the spectrum of genetic risk factors of chronic pancreatitis (CP) development in patients living in the European part of the Russian Federation.
MATERIALS AND METHODS: The study group included 105 patients with CP, with the age of the disease onset under 40 years old (the average age of onset was 26.9 years). The control group consisted of 76 persons without clinical signs of pancreatitis. The diagnosis of chronic pancreatitis in patients was made on the basis of clinical manifestations and the results of laboratory and instrumental investigations. Genetic examination of patients was conducted using the next-generation sequencing (NGS) technology and included targeted sequencing of all exons and exon-intron boundaries of the PRSS1, SPINK1, CTRC, CFTR, and CPA1 genes. The genotyping of the rs61734659 locus of the PRSS2 gene was also conducted.
RESULTS: Genetic risk factors of the CP development were found in 61% of patients. Pathogenic and likely-pathogenic variants associated with the risk of CP development were identified in the following genes: CTRC (37.1% of patients), CFTR (18.1%), SPINK1 (8.6%), PRSS1 (8.6%), and CPA1 (6.7%). The frequent gene variants in Russian patients with CP were as follows: CTRC gene - c.180C>T (rs497078), c.760C>T (rs121909293), c.738_761del24 (rs746224507); cumulative odds ratio (OR) for all risk alleles was 1.848 (95% CI: 1.054-3.243); CFTR gene - c.3485G>T (rs1800120), c.1521_1523delCTT (p.Phe508del, rs113993960), and c.650A>G (rs121909046); OR=2.432 (95% CI: 1.066-5.553). In the SPINK1, PRSS1, and CPA1 genes, pathogenic variants were found only in the group of patients with CP. The frequent variants of the SPINK1 gene include c.101A>G (p.Asn34Ser, rs17107315) and c.194+2T>C (rs148954387); of the PRSS1 gene - c.86A>T (p.Asn29Ile, rs111033566); of the CPA1 gene - c.586-30C>T (rs782335525) and c.696+23_696+24delGG. The OR for the CP development for the c.180TT genotype (rs497078) CTRC according to the recessive model (TT vs. CT+CC) was 7.05 (95% CI: 0.86-263, p=0.011). In the CTRC gene, the variant c.493+49G>C (rs6679763) appeared to be benign, the c.493+51C>A (rs10803384) variant was frequently detected among both the diseased and healthy persons and did not demonstrate a protective effect. The protective factor c.571G>A (p.Gly191Arg, rs61734659) of the PRSS2 gene was detected only in the group of healthy individuals and confirmed its protective role. 12.4% of the patients with CP had risk factors in 2 or 3 genes.
CONCLUSION: Sequencing of the coding regions of the PRSS1, SPINK1, CTRC, CFTR, and CPA1 genes made it possible to identify genetic risk factors for CP development in 61% of cases. Determining the genetic cause of CP helps to predict the disease course, perform preventive measures in the proband's relatives, and facilitate a personalized treatment of the patient in the future.
PMID:37389024 | PMC:PMC10306969 | DOI:10.17691/stm2023.15.2.06
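For readers less familiar with the odds ratios and 95% confidence intervals reported in the abstract above, the sketch below shows one standard way to compute them from a 2x2 table, using the Wald interval on the log scale. The counts are invented for illustration and are not taken from the study, and the abstract does not say which interval method the authors used, so the numbers here should not be compared with the reported values.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: cases carrying the risk allele     b: cases without it
    c: controls carrying the risk allele  d: controls without it
    All four counts must be positive; a zero cell needs a correction
    or an exact method instead.
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Invented counts: 20 of 100 cases and 10 of 100 controls carry the allele.
estimate, ci_low, ci_high = odds_ratio_wald(a=20, b=80, c=10, d=90)
print(f"OR = {estimate:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

An interval whose lower bound lies above 1 is conventionally read as evidence of increased risk; with very small cell counts the Wald interval becomes unreliable and exact methods are usually preferred.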
A systems biology approach uncovers novel disease mechanisms in age-related macular degeneration
Cell Genom. 2023 Apr 18;3(6):100302. doi: 10.1016/j.xgen.2023.100302. eCollection 2023 Jun 14.
ABSTRACT
Age-related macular degeneration (AMD) is a leading cause of blindness, affecting 200 million people worldwide. To identify genes that could be targeted for treatment, we created a molecular atlas at different stages of AMD. Our resource is comprised of RNA sequencing (RNA-seq) and DNA methylation microarrays from bulk macular retinal pigment epithelium (RPE)/choroid of clinically phenotyped normal and AMD donor eyes (n = 85), single-nucleus RNA-seq (164,399 cells), and single-nucleus assay for transposase-accessible chromatin (ATAC)-seq (125,822 cells) from the retina, RPE, and choroid of 6 AMD and 7 control donors. We identified 23 genome-wide significant loci differentially methylated in AMD, over 1,000 differentially expressed genes across different disease stages, and an AMD Müller state distinct from normal or gliosis. Chromatin accessibility peaks in genome-wide association study (GWAS) loci revealed putative causal genes for AMD, including HTRA1 and C6orf223. Our systems biology approach uncovered molecular mechanisms underlying AMD, including regulators of WNT signaling, FRZB and TLE2, as mechanistic players in disease.
PMID:37388919 | PMC:PMC10300496 | DOI:10.1016/j.xgen.2023.100302
Cardiovascular disease biomarkers derived from circulating cell-free DNA methylation
NAR Genom Bioinform. 2023 Jun 28;5(2):lqad061. doi: 10.1093/nargab/lqad061. eCollection 2023 Jun.
ABSTRACT
Acute coronary syndrome (ACS) remains a major cause of worldwide mortality. The syndrome occurs when blood flow to the heart muscle is decreased or blocked, causing muscle tissues to die or malfunction. There are three main types of ACS: Non-ST-elevation myocardial infarction, ST-elevation myocardial infarction, and unstable angina. The treatment depends on the type of ACS, and this is decided by a combination of clinical findings, such as electrocardiogram and plasma biomarkers. Circulating cell-free DNA (ccfDNA) is proposed as an additional marker for ACS since the damaged tissues can release DNA to the bloodstream. We used ccfDNA methylation profiles for differentiating between the ACS types and provided computational tools to repeat similar analysis for other diseases. We leveraged cell type specificity of DNA methylation to deconvolute the ccfDNA cell types of origin and to find methylation-based biomarkers that stratify patients. We identified hundreds of methylation markers associated with ACS types and validated them in an independent cohort. Many such markers were associated with genes involved in cardiovascular conditions and inflammation. ccfDNA methylation showed promise as a non-invasive diagnostic for acute coronary events. These methods are not limited to acute events, and may be used for chronic cardiovascular diseases as well.
PMID:37388821 | PMC:PMC10304763 | DOI:10.1093/nargab/lqad061
Higher-order genetic interaction discovery with network-based biological priors
Bioinformatics. 2023 Jun 30;39(Supplement_1):i523-i533. doi: 10.1093/bioinformatics/btad273.
ABSTRACT
MOTIVATION: Complex phenotypes, such as many common diseases and morphological traits, are controlled by multiple genetic factors, namely genetic mutations and genes, and are influenced by environmental conditions. Deciphering the genetics underlying such traits requires a systemic approach, where many different genetic factors and their interactions are considered simultaneously. Many association mapping techniques available nowadays follow this reasoning, but have some severe limitations. In particular, they require binary encodings for the genetic markers, forcing the user to decide beforehand whether to use, e.g. a recessive or a dominant encoding. Moreover, most methods cannot include any biological prior or are limited to testing only lower-order interactions among genes for association with the phenotype, potentially missing a large number of marker combinations.
RESULTS: We propose HOGImine, a novel algorithm that expands the class of discoverable genetic meta-markers by considering higher-order interactions of genes and by allowing multiple encodings for the genetic variants. Our experimental evaluation shows that the algorithm has a substantially higher statistical power compared to previous methods, allowing it to discover genetic mutations statistically associated with the phenotype at hand that could not be found before. Our method can exploit prior biological knowledge on gene interactions, such as protein-protein interaction networks, genetic pathways, and protein complexes, to restrict its search space. Since computing higher-order gene interactions poses a high computational burden, we also develop a more efficient search strategy and support computation to make our approach applicable in practice, leading to substantial runtime improvements compared to state-of-the-art methods.
AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/BorgwardtLab/HOGImine.
PMID:37387173 | DOI:10.1093/bioinformatics/btad273
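The binary encodings mentioned in the abstract above can be illustrated with a short sketch. It assumes genotypes are stored as minor-allele counts (0, 1, or 2), which is a common convention but is not stated in the abstract, and it shows only the dominant and recessive encodings themselves, not the HOGImine algorithm.

```python
def encode(genotypes, model):
    """Binarize minor-allele counts (0, 1, 2) under a dominant or recessive model."""
    if model == "dominant":
        # One copy of the minor allele is enough to set the marker.
        return [1 if g >= 1 else 0 for g in genotypes]
    if model == "recessive":
        # Both copies must be the minor allele.
        return [1 if g == 2 else 0 for g in genotypes]
    raise ValueError(f"unknown model: {model}")


minor_allele_counts = [0, 1, 2, 1]
dominant = encode(minor_allele_counts, "dominant")
recessive = encode(minor_allele_counts, "recessive")
print(dominant, recessive)
```

Heterozygous samples (count 1) are the ones that switch value between the two encodings, which is why fixing a single encoding in advance can hide an association.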
Trap spaces of multi-valued networks: definition, computation, and applications
Bioinformatics. 2023 Jun 30;39(Supplement_1):i513-i522. doi: 10.1093/bioinformatics/btad262.
ABSTRACT
MOTIVATION: Boolean networks are a simple but efficient mathematical formalism for modelling complex biological systems. However, having only two levels of activation is sometimes not enough to fully capture the dynamics of real-world biological systems. This motivates multi-valued networks (MVNs), a generalization of Boolean networks. Despite the importance of MVNs for modelling biological systems, only limited progress has been made on developing theories, analysis methods, and tools that can support them. In particular, the recent use of trap spaces in Boolean networks made a great impact on the field of systems biology, but there has been no similar concept defined and studied for MVNs to date.
RESULTS: In this work, we generalize the concept of trap spaces in Boolean networks to that in MVNs. We then develop the theory and the analysis methods for trap spaces in MVNs. In particular, we implement all proposed methods in a Python package called trapmvn. Besides showing the applicability of our approach via a realistic case study, we also evaluate the time efficiency of the method on a large collection of real-world models. The experimental results confirm the time efficiency, which we believe enables more accurate analysis on larger and more complex multi-valued models.
AVAILABILITY AND IMPLEMENTATION: Source code and data are freely available at https://github.com/giang-trinh/trap-mvn.
PMID:37387165 | DOI:10.1093/bioinformatics/btad262
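As a rough intuition for the trap-space concept discussed in the abstract above, the brute-force sketch below checks whether a candidate space of a tiny multi-valued network is closed under its update functions. It relies on assumptions that may differ from the paper's formal definition: a space is modelled as a non-empty set of allowed levels per variable, and closure is tested against the target level of each update function (as in a general asynchronous update). It does not use the trapmvn package, and the two-variable network is invented.

```python
from itertools import product


def is_trap_space(space, update):
    """Check that no update can leave the space.

    space:  dict mapping variable name -> set of allowed levels
    update: dict mapping variable name -> function(state dict) -> target level
    Enumerates every state in the space, so it only suits very small models.
    """
    names = sorted(space)
    for levels in product(*(sorted(space[name]) for name in names)):
        state = dict(zip(names, levels))
        for name in names:
            if update[name](state) not in space[name]:
                return False
    return True


# Invented network: x has levels 0..2, y has levels 0..1.
update_functions = {
    "x": lambda s: max(s["x"], s["y"]),
    "y": lambda s: 1 if s["x"] >= 1 else 0,
}

active_space = {"x": {1, 2}, "y": {1}}   # closed: x stays >= 1 and y stays 1
leaky_space = {"x": {0, 1}, "y": {1}}    # not closed: at x = 0, y is pushed to 0
fixed_point = {"x": {0}, "y": {0}}       # a single state that maps to itself

print(is_trap_space(active_space, update_functions))
print(is_trap_space(leaky_space, update_functions))
print(is_trap_space(fixed_point, update_functions))
```

Exhaustive enumeration grows exponentially with the number of variables, which is one reason dedicated methods such as those the abstract describes are needed for realistic models.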
Transfer learning for drug-target interaction prediction
Bioinformatics. 2023 Jun 30;39(Supplement_1):i103-i110. doi: 10.1093/bioinformatics/btad234.
ABSTRACT
MOTIVATION: Utilizing AI-driven approaches for drug-target interaction (DTI) prediction requires large volumes of training data which are not available for the majority of target proteins. In this study, we investigate the use of deep transfer learning for the prediction of interactions between drug candidate compounds and understudied target proteins with scarce training data. The idea here is to first train a deep neural network classifier with a generalized source training dataset of large size and then to reuse this pre-trained neural network as an initial configuration for re-training/fine-tuning purposes with a small-sized specialized target training dataset. To explore this idea, we selected six protein families that have critical importance in biomedicine: kinases, G-protein-coupled receptors (GPCRs), ion channels, nuclear receptors, proteases, and transporters. In two independent experiments, the protein families of transporters and nuclear receptors were individually set as the target datasets, while the remaining five families were used as the source datasets. Several size-based target family training datasets were formed in a controlled manner to assess the benefit provided by the transfer learning approach.
RESULTS: Here, we present a systematic evaluation of our approach by pre-training a feed-forward neural network with source training datasets and applying different modes of transfer learning from the pre-trained source network to a target dataset. The performance of deep transfer learning is evaluated and compared with that of training the same deep neural network from scratch. We found that when the training dataset contains fewer than 100 compounds, transfer learning outperforms the conventional strategy of training the system from scratch, suggesting that transfer learning is advantageous for predicting binders to under-studied targets.
AVAILABILITY AND IMPLEMENTATION: The source code and datasets are available at https://github.com/cansyl/TransferLearning4DTI. Our web-based service containing the ready-to-use pre-trained models is accessible at https://tl4dti.kansil.org.
PMID:37387156 | DOI:10.1093/bioinformatics/btad234
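The pre-train then fine-tune idea in the drug-target interaction abstract above can be sketched in plain Python. This is only a toy under stated assumptions: a tiny one-hidden-layer network, synthetic two-dimensional points instead of compound or protein features, and two hypothetical transfer modes (freezing the hidden layer versus updating all weights). None of the names, data, or hyperparameters come from the paper or from the TransferLearning4DTI code.

```python
import math
import random


def init_params(n_in, n_hidden, rng):
    """Random initial weights for a one-hidden-layer network (no bias terms)."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    }


def forward(params, x):
    """Return the tanh hidden activations and the sigmoid output probability."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in params["W1"]]
    logit = sum(w * h for w, h in zip(params["w2"], hidden))
    return hidden, 1.0 / (1.0 + math.exp(-logit))


def train(params, data, epochs=50, lr=0.1, freeze_hidden=False):
    """Per-sample gradient descent on cross-entropy loss.

    Works on a copy, so the parameters passed in stay untouched.
    With freeze_hidden=True only the output weights are updated.
    """
    p = {"W1": [row[:] for row in params["W1"]], "w2": params["w2"][:]}
    for _ in range(epochs):
        for x, y in data:
            hidden, prob = forward(p, x)
            err = prob - y  # derivative of the loss with respect to the logit
            # Hidden-layer gradients use the output weights before their update.
            grad_hidden = [err * w * (1.0 - h * h) for w, h in zip(p["w2"], hidden)]
            p["w2"] = [w - lr * err * h for w, h in zip(p["w2"], hidden)]
            if not freeze_hidden:
                for j, g in enumerate(grad_hidden):
                    p["W1"][j] = [w - lr * g * xi for w, xi in zip(p["W1"][j], x)]
    return p


def make_data(n, slope, rng):
    """Synthetic stand-in data: label is 1 when x0 + slope * x1 > 0."""
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(x, 1.0 if x[0] + slope * x[1] > 0 else 0.0) for x in points]


def accuracy(params, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((forward(params, x)[1] > 0.5) == (y == 1.0) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
source_data = make_data(200, 1.0, rng)  # large, general source task
target_data = make_data(10, 0.5, rng)   # small, related target task

# Step 1: pre-train on the large source dataset.
pretrained = train(init_params(2, 3, rng), source_data)
# Step 2a: fine-tune only the output layer on the small target dataset.
finetuned_frozen = train(pretrained, target_data, freeze_hidden=True)
# Step 2b: fine-tune every layer, starting from the pre-trained weights.
finetuned_full = train(pretrained, target_data)
# Baseline: train on the target dataset from a fresh random initialization.
from_scratch = train(init_params(2, 3, rng), target_data)

held_out = make_data(200, 0.5, rng)
for name, model in [("frozen", finetuned_frozen), ("full", finetuned_full),
                    ("scratch", from_scratch)]:
    print(name, accuracy(model, held_out))
```

The printed accuracies depend entirely on the synthetic data and say nothing about the paper's results. The sketch only shows the mechanics: the fine-tuned models start from the pre-trained weights, while the baseline starts from a fresh random initialization.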
scKINETICS: inference of regulatory velocity with single-cell transcriptomics data
Bioinformatics. 2023 Jun 30;39(Supplement_1):i394-i403. doi: 10.1093/bioinformatics/btad267.
ABSTRACT
MOTIVATION: Transcriptional dynamics are governed by the action of regulatory proteins and are fundamental to systems ranging from normal development to disease. RNA velocity methods for tracking phenotypic dynamics ignore information on the regulatory drivers of gene expression variability through time.
RESULTS: We introduce scKINETICS (Key regulatory Interaction NETwork for Inferring Cell Speed), a dynamical model of gene expression change which is fit with the simultaneous learning of per-cell transcriptional velocities and a governing gene regulatory network. Fitting is accomplished through an expectation-maximization approach designed to learn the impact of each regulator on its target genes, leveraging biologically motivated priors from epigenetic data, gene-gene coexpression, and constraints on cells' future states imposed by the phenotypic manifold. Applying this approach to an acute pancreatitis dataset recapitulates a well-studied axis of acinar-to-ductal transdifferentiation whilst proposing novel regulators of this process, including factors with previously appreciated roles in driving pancreatic tumorigenesis. In benchmarking experiments, we show that scKINETICS successfully extends and improves existing velocity approaches to generate interpretable, mechanistic models of gene regulatory dynamics.
AVAILABILITY AND IMPLEMENTATION: All python code and an accompanying Jupyter notebook with demonstrations are available at http://github.com/dpeerlab/scKINETICS.
PMID:37387147 | DOI:10.1093/bioinformatics/btad267
A multilocus approach for accurate variant calling in low-copy repeats using whole-genome sequencing
Bioinformatics. 2023 Jun 30;39(Supplement_1):i279-i287. doi: 10.1093/bioinformatics/btad268.
ABSTRACT
MOTIVATION: Low-copy repeats (LCRs) or segmental duplications are long segments of duplicated DNA that cover > 5% of the human genome. Existing tools for variant calling using short reads exhibit low accuracy in LCRs due to ambiguity in read mapping and extensive copy number variation. Variants in more than 150 genes overlapping LCRs are associated with risk for human diseases.
METHODS: We describe a short-read variant calling method, ParascopyVC, that performs variant calling jointly across all repeat copies and utilizes reads independent of mapping quality in LCRs. To identify candidate variants, ParascopyVC aggregates reads mapped to different repeat copies and performs polyploid variant calling. Subsequently, paralogous sequence variants that can differentiate repeat copies are identified using population data and used for estimating the genotype of variants for each repeat copy.
RESULTS: On simulated whole-genome sequence data, ParascopyVC achieved higher precision (0.997) and recall (0.807) than three state-of-the-art variant callers (best precision = 0.956 for DeepVariant and best recall = 0.738 for GATK) in 167 LCR regions. Benchmarking of ParascopyVC using the Genome in a Bottle high-confidence variant calls for the HG002 genome showed that it achieved a very high precision of 0.991 and a high recall of 0.909 across LCR regions, significantly better than FreeBayes (precision = 0.954 and recall = 0.822), GATK (precision = 0.888 and recall = 0.873) and DeepVariant (precision = 0.983 and recall = 0.861). ParascopyVC demonstrated a consistently higher accuracy (mean F1 = 0.947) than other callers (best F1 = 0.908) across seven human genomes.
AVAILABILITY AND IMPLEMENTATION: ParascopyVC is implemented in Python and is freely available at https://github.com/tprodanov/ParascopyVC.
PMID:37387146 | DOI:10.1093/bioinformatics/btad268
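As a worked example for the ParascopyVC abstract above, the F1 score (the harmonic mean of precision and recall) can be computed from the HG002 precision and recall values quoted there. The helper name `f1_score` and the dictionary layout are illustrative choices. The resulting values apply to HG002 only and are not the same quantity as the mean F1 across seven genomes reported in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) on HG002 LCR regions, as quoted in the abstract.
hg002_metrics = {
    "ParascopyVC": (0.991, 0.909),
    "FreeBayes": (0.954, 0.822),
    "GATK": (0.888, 0.873),
    "DeepVariant": (0.983, 0.861),
}

f1_by_caller = {name: f1_score(p, r) for name, (p, r) in hg002_metrics.items()}

# Print callers from highest to lowest F1.
for name, f1 in sorted(f1_by_caller.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: F1 = {f1:.3f}")
```

With these inputs ParascopyVC comes out at about 0.948 and DeepVariant, the closest competitor on HG002, at about 0.918.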