Systems Biology

Bigtools: a high-performance BigWig and BigBed library in rust

Wed, 2024-06-05 06:00

Bioinformatics. 2024 Jun 5:btae350. doi: 10.1093/bioinformatics/btae350. Online ahead of print.

ABSTRACT

MOTIVATION: The BigWig and BigBed file formats were originally designed for the visualization of next-generation sequencing data through a genome browser. Due to their versatility, these formats have long since become ubiquitous for the storage of processed sequencing data and regularly serve as the basis for downstream data analysis. As the number and size of sequencing experiments continue to grow, there is an increasing demand to efficiently generate and query BigWig and BigBed files in a scalable and robust manner, and to efficiently integrate these functionalities into data analysis environments and third-party applications.

RESULTS: Here, we present Bigtools, a feature-complete, high-performance, and integrable software library for generating and querying both BigWig and BigBed files. Bigtools is written in the Rust programming language and includes a flexible suite of command line tools as well as bindings to Python.

AVAILABILITY AND IMPLEMENTATION: Bigtools is cross-platform and released under the MIT license. It is distributed on Crates.io, Bioconda, and the Python Package Index, and the source code is available at https://github.com/jackh726/bigtools.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

PMID:38837370 | DOI:10.1093/bioinformatics/btae350

Categories: Literature Watch

Integrated systems biology analysis of acute lymphoblastic leukemia: unveiling molecular signatures and drug repurposing opportunities

Wed, 2024-06-05 06:00

Ann Hematol. 2024 Jun 5. doi: 10.1007/s00277-024-05821-w. Online ahead of print.

ABSTRACT

Acute lymphoblastic leukemia (ALL) is a hematological malignancy characterized by aberrant proliferation and accumulation of lymphoid precursor cells within the bone marrow. The tyrosine kinase inhibitor (TKI) imatinib mesylate has played a significant role in the treatment of Philadelphia chromosome-positive ALL (Ph+ ALL). However, the achievement of durable and sustained therapeutic success remains a challenge due to the development of TKI resistance during the clinical course. The primary objective of this investigation is to propose a novel and efficacious treatment approach through drug repositioning, targeting ALL and its Ph+ subtype by identifying and addressing differentially expressed genes (DEGs). This study involves a comprehensive analysis of transcriptome datasets pertaining to ALL and Ph+ ALL in order to identify DEGs associated with the progression of these diseases and possible repurposable drugs that target the identified hub proteins. The outcomes of this research have unveiled 698 disease-related DEGs for ALL and 100 for Ph+ ALL. Furthermore, a subset of drugs, specifically glipizide for Ph+ ALL, and maytansine and isoprenaline for ALL, have been identified as potential candidates for therapeutic intervention. Subsequently, cytotoxicity assessments were performed to confirm the in vitro cytotoxic effects of these selected drugs on both ALL and Ph+ ALL cell lines. In conclusion, this study offers a promising avenue for the management of ALL and Ph+ ALL through repurposed drugs. Further investigations are necessary to elucidate the mechanisms underlying cell death, and clinical trials are recommended to validate the promising results obtained through drug repositioning strategies.

PMID:38836918 | DOI:10.1007/s00277-024-05821-w

Categories: Literature Watch

Transient stabilization of human cardiovascular progenitor cells from hPSCs in vitro reflects stage-specific heart development in vivo

Wed, 2024-06-05 06:00

Cardiovasc Res. 2024 Jun 5:cvae118. doi: 10.1093/cvr/cvae118. Online ahead of print.

ABSTRACT

AIM: Understanding the molecular identity of human pluripotent stem cell (hPSC)-derived cardiac progenitors and the mechanisms controlling their proliferation and differentiation is valuable for developmental biology and regenerative medicine.

METHODS AND RESULTS: Here we show that chemical modulation of Histone Acetyl Transferases (HATs; by IQ-1) and WNT (by CHIR99021) synergistically enables the transient and reversible block of directed cardiac differentiation progression on hPSCs. The resulting stabilized cardiovascular progenitors (SCPs) are characterized by ISL1pos/KI-67pos/NKX2-5neg expression. In the presence of the chemical inhibitors, SCPs maintain a proliferation quiescent state. Upon removal of the small molecules, SCPs resume proliferation, and concomitant NKX2-5 upregulation triggers cell-autonomous differentiation into cardiomyocytes. Directed differentiation of SCPs into the endothelial and smooth muscle lineages confirms their full developmental potential typical of bona fide cardiovascular progenitors. Single-cell RNAseq-based transcriptional profiling of our in vitro generated human SCPs notably reflects the dynamic cellular composition of the E8.25-E9.25 posterior second heart field (pSHF) of mouse hearts, hallmarked by NR2F2 expression. Investigating molecular mechanisms of SCP stabilization, we found that cell-autonomously regulated Retinoic Acid (RA) and BMP signaling governs the transition of SCPs from quiescence towards proliferation and cell-autonomous differentiation, reminiscent of a niche-like behavior.

CONCLUSION: The chemically defined and reversible nature of our stabilization approach provides an unprecedented opportunity to dissect mechanisms of cardiovascular progenitors' specification and reveal their cellular and molecular properties.

PMID:38836637 | DOI:10.1093/cvr/cvae118

Categories: Literature Watch

Oxygen-resistant [FeFe]hydrogenases: new biocatalysis tools for clean energy and cascade reactions

Wed, 2024-06-05 06:00

Faraday Discuss. 2024 Jun 5. doi: 10.1039/d4fd00010b. Online ahead of print.

ABSTRACT

The use of enzymes to generate hydrogen, instead of using rare metal catalysts, is an exciting area of study in modern biochemistry and biotechnology, as well as biocatalysis driven by sustainable hydrogen. Thus far, the oxygen sensitivity of the fastest hydrogen-producing/exploiting enzymes, [FeFe]hydrogenases, has hindered their practical application, thereby restricting innovations mainly to their [NiFe]-based, albeit slower, counterparts. Recent exploration of the biodiversity of clostridial hydrogen-producing enzymes has yielded the isolation of representatives from a relatively understudied group. These enzymes possess an inherent defense mechanism against oxygen-induced damage. This discovery unveils fresh opportunities for applications such as electrode interfacing, biofuel cells, immobilization, and entrapment for enhanced stability in practical uses. Furthermore, it suggests potential combinations with cascade reactions for CO2 conversion or cofactor regeneration, like NADPH, facilitating product separation in biotechnological processes. This work provides an overview of this new class of biocatalysts, incorporating unpublished protein engineering strategies to further investigate the dynamic mechanism of oxygen protection and to address crucial details remaining elusive such as still unidentified switching hot-spots and their effects. Variants with improved kcat as well as chimeric versions with promising features to attain gain-of-function variants and applications in various biotechnological processes are also presented.

PMID:38836410 | DOI:10.1039/d4fd00010b

Categories: Literature Watch

How to find genomic regions relevant for gene regulation

Wed, 2024-06-05 06:00

Med Genet. 2021 Aug 14;33(2):157-165. doi: 10.1515/medgen-2021-2074. eCollection 2021 Jun.

ABSTRACT

Genetic variants associated with human diseases are often located outside the protein coding regions of the genome. Identification and functional characterization of the regulatory elements in the non-coding genome is therefore of crucial importance for understanding the consequences of genetic variation and the mechanisms of disease. The past decade has seen rapid progress in high-throughput analysis and mapping of chromatin accessibility, looping, structure, and occupancy by transcription factors, as well as epigenetic modifications, all of which contribute to the proper execution of regulatory functions in the non-coding genome. Here, we review the current technologies for the definition and functional validation of non-coding regulatory regions in the genome.

PMID:38836026 | PMC:PMC11007629 | DOI:10.1515/medgen-2021-2074

Categories: Literature Watch

Emerging role of a systems biology approach to elucidate factors of reduced penetrance: transcriptional changes in THAP1-linked dystonia as an example

Wed, 2024-06-05 06:00

Med Genet. 2022 Aug 12;34(2):131-141. doi: 10.1515/medgen-2022-2126. eCollection 2022 Jun.

ABSTRACT

Pathogenic variants in THAP1 can cause dystonia with a penetrance of about 50%. The underlying mechanisms are unknown and can be considered as means of endogenous disease protection. Since THAP1 encodes a transcription factor, drivers of this variability putatively act at the transcriptome level. Several transcriptome studies tried to elucidate THAP1 function in diverse cellular and mouse models, including mutation carrier-derived cells and iPSC-derived neurons, unveiling various differentially expressed genes and affected pathways. These include nervous system development, dopamine signalling, myelination, or cell-cell adhesion. A network diffusion analysis revealed mRNA splicing, mitochondria, DNA repair, and metabolism as significant pathways that may represent potential targets for therapeutic interventions.

PMID:38835919 | PMC:PMC11006298 | DOI:10.1515/medgen-2022-2126

Categories: Literature Watch

MONET: a database for prediction of neoantigens derived from microsatellite loci

Wed, 2024-06-05 06:00

Front Immunol. 2024 May 21;15:1394593. doi: 10.3389/fimmu.2024.1394593. eCollection 2024.

ABSTRACT

BACKGROUND: Microsatellite instability (MSI) secondary to mismatch repair (MMR) deficiency is characterized by insertions and deletions (indels) in short DNA sequences across the genome. These indels can generate neoantigens, which are ideal targets for precision immune interception. However, current neoantigen databases lack information on neoantigens arising from coding microsatellites. To address this gap, we introduce The MicrOsatellite Neoantigen Discovery Tool (MONET).

METHOD: MONET identifies potential mutated tumor-specific neoantigens (neoAgs) by predicting frameshift mutations in coding microsatellite sequences of the human genome. Then MONET annotates these neoAgs with key features such as binding affinity, stability, expression, frequency, and potential pathogenicity using established algorithms, tools, and public databases. A user-friendly web interface (https://monet.mdanderson.org/) facilitates access to these predictions.

RESULTS: MONET predicts over 4 million and 15 million Class I and Class II potential frameshift neoAgs, respectively. Compared to existing databases, MONET demonstrates superior coverage (>85% vs. <25%) using a set of experimentally validated neoAgs.

CONCLUSION: MONET is a freely available, user-friendly web tool that leverages publicly available resources to identify neoAgs derived from microsatellite loci. This systems biology approach empowers researchers in the field of precision immune interception.
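The frameshift idea described in the METHOD paragraph above can be illustrated with a minimal Python sketch. This is not MONET's implementation: the function names, the `min_len` threshold, and the toy coding sequence are all hypothetical, and the sketch models only a single-base deletion in the longest homopolymer run, translated with the standard genetic code. Annotation steps such as binding affinity or expression are out of scope.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate up to the first stop codon or the last complete codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_homopolymer(cds, min_len=6):
    """Return (start, end) of the longest single-base run of at least min_len, else None."""
    runs = [m.span() for m in re.finditer(r"A+|C+|G+|T+", cds)]
    runs = [r for r in runs if r[1] - r[0] >= min_len]
    return max(runs, key=lambda r: r[1] - r[0]) if runs else None


def frameshift_peptide(cds, min_len=6):
    """Delete one base from the longest homopolymer run (hypothetical model)
    and return the part of the mutant protein that differs from wild type."""
    run = longest_homopolymer(cds, min_len)
    if run is None:
        return None
    start, _ = run
    mutant = cds[:start] + cds[start + 1:]
    wild_type, shifted = translate(cds), translate(mutant)
    i = 0
    while i < min(len(wild_type), len(shifted)) and wild_type[i] == shifted[i]:
        i += 1
    return shifted[i:]


# Toy coding sequence (made up for illustration) containing an A8 run.
toy_cds = "ATGGCCAAAAAAAAGGATTGCTAA"
print(translate(toy_cds))           # wild-type protein
print(frameshift_peptide(toy_cds))  # novel C-terminal peptide after the -1 frameshift
```

A real pipeline would also consider insertions, other repeat units, and every coding microsatellite in the genome rather than a single run.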

PMID:38835776 | PMC:PMC11148240 | DOI:10.3389/fimmu.2024.1394593

Categories: Literature Watch

Identification and structure of AIMP2-DX2 for therapeutic perspectives

Wed, 2024-06-05 06:00

BMB Rep. 2024 Jun 5:6233. Online ahead of print.

ABSTRACT

Regulation of cell fate and lung cell differentiation is associated with Aminoacyl-tRNA synthetases (ARS)-interacting multifunctional protein 2 (AIMP2), which acts as a non-enzymatic component required for the multi-tRNA synthetase complex. In response to DNA damage, a component of AIMP2 separates from the multi-tRNA synthetase complex, binds to p53, and prevents its degradation by MDM2, inducing apoptosis. Additionally, AIMP2 reduces proliferation in TGF-β and Wnt pathways, while enhancing apoptotic signaling induced by tumor necrosis factor-α. Given the crucial role of these pathways in tumorigenesis, AIMP2 is expected to function as a broad-spectrum tumor suppressor. The full-length AIMP2 transcript consists of four exons, with a small section of the pre-mRNA undergoing alternative splicing to produce a variant (AIMP2-DX2) lacking the second exon. AIMP2-DX2 binds to FBP, TRAF2, and p53 similarly to AIMP2, but competes with AIMP2 for binding to these target proteins, thereby impairing its tumor-suppressive activity. AIMP2-DX2 is specifically expressed in a diverse range of cancer cells, including breast cancer, liver cancer, bone cancer, and stomach cancer. There is growing interest in AIMP2-DX2 as a promising biomarker for prognosis and diagnosis, with AIMP2-DX2 inhibition attracting significant interest as a potentially effective therapeutic approach for the treatment of lung, ovarian, prostate, and nasopharyngeal cancers.

PMID:38835119

Categories: Literature Watch

A microbiota-directed complementary food intervention in 12-18-month-old Bangladeshi children improves linear growth

Tue, 2024-06-04 06:00

EBioMedicine. 2024 Jun 3;104:105166. doi: 10.1016/j.ebiom.2024.105166. Online ahead of print.

ABSTRACT

BACKGROUND: Globally, stunting affects ∼150 million children under five, while wasting affects nearly 50 million. Current interventions have had limited effectiveness in ameliorating long-term sequelae of undernutrition including stunting, cognitive deficits and immune dysfunction. Disrupted development of the gut microbiota has been linked to the pathogenesis of undernutrition, providing potentially new treatment approaches.

METHODS: 124 Bangladeshi children with moderate acute malnutrition (MAM) enrolled (at 12-18 months) in a previously reported 3-month RCT of a microbiota-directed complementary food (MDCF-2) were followed for two years. Weight and length were monitored by anthropometry, the abundances of bacterial strains were assessed by quantifying metagenome-assembled genomes (MAGs) in serially collected fecal samples and levels of growth-associated proteins were measured in plasma.

FINDINGS: Children who had received MDCF-2 were significantly less stunted during follow-up than those who received a standard ready-to-use supplementary food (RUSF) [linear mixed-effects model, β for treatment group × study week (95% CI) = 0.002 (0.001, 0.003); P = 0.004]. They also had elevated fecal abundances of Agathobacter faecis, Blautia massiliensis, Lachnospira and Dialister, plus increased levels of a group of 37 plasma proteins (linear model; FDR-adjusted P < 0.1), including IGF-1, neurotrophin receptor NTRK2 and multiple proteins linked to musculoskeletal and CNS development, that persisted for 6 months post-intervention.

INTERPRETATION: MDCF-2 treatment of Bangladeshi children with MAM, which produced significant improvements in wasting during intervention, also reduced stunting during follow-up. These results suggest that the effectiveness of supplementary foods for undernutrition may be improved by including ingredients that sponsor healthy microbiota-host co-development.

FUNDING: This work was supported by the BMGF (Grants OPP1134649/INV-000247).

PMID:38833839 | DOI:10.1016/j.ebiom.2024.105166

Categories: Literature Watch

GJA4 expressed on cancer associated fibroblasts (CAFs)-A 'promoter' of the mesenchymal phenotype

Tue, 2024-06-04 06:00

Transl Oncol. 2024 Jun 3;46:102009. doi: 10.1016/j.tranon.2024.102009. Online ahead of print.

ABSTRACT

BACKGROUND: Colorectal cancer (CRC) is the third most common cancer worldwide. Connexin is a transmembrane protein involved in gap junctions (GJs) formation. Our previous study found that connexin 37 (Cx37), encoded by gap junction protein alpha 4 (GJA4), expressed on fibroblasts acts as a promoter of CRC and is closely related to epithelial-mesenchymal transition (EMT) and tumor immune microenvironment. However, to date, the mechanism concerning the malignancy of GJA4 in tumor stroma has not been studied.

METHODS: Hematoxylin-eosin (HE) and immunohistochemical (IHC) staining were used to validate the expression and localization of GJA4. Using single-cell analysis, enrichment analysis, spatial transcriptomics, immunofluorescence staining (IF), Sirius red staining, wound healing and transwell assays, western blotting (WB), Cell Counting Kit-8 (CCK8) assay and in vivo experiments, we investigated the possible mechanisms of GJA4 in promoting CRC.

RESULTS: We discovered that in CRC, GJA4 on fibroblasts is involved in promoting fibroblast activation and promoting EMT through a fibroblast-dependent pathway. Furthermore, GJA4 may act synergistically with M2 macrophages to limit T cell infiltration by stimulating the formation of an immune-excluded desmoplastic barrier. Finally, we found a significant correlation between GJA4 and pathological staging (P < 0.0001) or D2 dimer (R = 0.03, P < 0.05).

CONCLUSION: We have identified that GJA4 expressed on fibroblasts is a promoter of the tumor mesenchymal phenotype. Our findings suggest that the interaction between GJA4+ fibroblasts and M2 macrophages may be an effective target for enhancing tumor immunotherapy.

PMID:38833783 | DOI:10.1016/j.tranon.2024.102009

Categories: Literature Watch

Author Correction: Myt1l safeguards neuronal identity by actively repressing many non-neuronal fates

Tue, 2024-06-04 06:00

Nature. 2024 Jun 4. doi: 10.1038/s41586-024-07594-8. Online ahead of print.

NO ABSTRACT

PMID:38834754 | DOI:10.1038/s41586-024-07594-8

Categories: Literature Watch

Palaeoproteomic identification of the original binder and modern contaminants in distemper paints from Uvdal stave church, Norway

Tue, 2024-06-04 06:00

Sci Rep. 2024 Jun 4;14(1):12858. doi: 10.1038/s41598-024-63455-4.

ABSTRACT

Two distemper paint samples taken from decorative boards in Uvdal stave church, Norway, were analysed using palaeoproteomics, with the aim of identifying their binder and possible contaminants. The results point to the use of calfskin to produce hide glue as the original paint binder, and are consistent with the instructions for binder production and resource allocation in the historical records of Norway. Although we did not observe any evidence of prior restoration treatments using protein-based materials, we found abundant traces of human saliva proteins, as well as a few oat and barley peptides, likely deposited together on the boards during their discovery in the 1970s. This work illustrates the need to fully consider contamination sources in palaeoproteomics and to inform those working with such objects about the potential for their contamination.

PMID:38834702 | DOI:10.1038/s41598-024-63455-4

Categories: Literature Watch

yQTL Pipeline: A structured computational workflow for large scale quantitative trait loci discovery and downstream visualization

Tue, 2024-06-04 06:00

PLoS One. 2024 Jun 4;19(6):e0298501. doi: 10.1371/journal.pone.0298501. eCollection 2024.

ABSTRACT

Quantitative trait loci (QTL) denote regions of DNA whose variation is associated with variations in quantitative traits. QTL discovery is a powerful approach to understand how changes in molecular and clinical phenotypes may be related to DNA sequence changes. However, QTL discovery analysis encompasses multiple analytical steps and the processing of multiple input files, which can be laborious, error prone, and hard to reproduce if performed manually. To facilitate and automate large-scale QTL analysis, we developed the yQTL Pipeline, where the 'y' indicates the dependent quantitative variable being modeled. Prior to the association test, the pipeline supports the calculation or the direct input of pre-defined genome-wide principal components and genetic relationship matrix when applicable. User-specified covariates can also be provided. Depending on whether familial relatedness exists among the subjects, genome-wide association tests will be performed using either a linear mixed-effect model or a linear model. Options to run an ANOVA model or to test the interaction with a covariate are also available. Using the workflow management tool Nextflow, the pipeline parallelizes the analysis steps to optimize run-time and ensure reproducibility of results. In addition, a user-friendly R Shiny App was developed to facilitate result visualization. It can generate Manhattan and Miami plots of phenotype traits, genotype-phenotype boxplots, and trait-QTL connection networks. We applied the yQTL Pipeline to analyze metabolomics profiles of blood serum from the New England Centenarians Study (NECS) participants. A total of 9.1M SNPs and 1,052 metabolites across 194 participants were analyzed. Using a p-value cutoff of 5e-8, we found 14,983 mQTLs associated with 312 metabolites. The built-in parallelization of our pipeline reduced the run time from ~90 min to ~26 min. Visualization using the R Shiny App revealed multiple mQTLs shared across multiple metabolites. The yQTL Pipeline is available with documentation on GitHub at https://github.com/montilab/yQTLpipeline.
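The post-association summary described above (applying the 5e-8 cutoff, counting associated traits, and spotting QTLs shared across traits) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the yQTL Pipeline: the `(snp, trait, p_value)` tuple layout, the function names, and the toy rows are hypothetical.

```python
from collections import defaultdict

GENOME_WIDE_CUTOFF = 5e-8  # the p-value cutoff quoted in the abstract


def significant_qtls(results, cutoff=GENOME_WIDE_CUTOFF):
    """Keep (snp, trait, p_value) rows whose p-value is below the cutoff."""
    return [row for row in results if row[2] < cutoff]


def traits_with_hits(hits):
    """Return the set of traits that have at least one significant QTL."""
    return {trait for _, trait, _ in hits}


def snps_shared_across_traits(hits):
    """Map each SNP associated with more than one trait to its sorted trait list."""
    by_snp = defaultdict(set)
    for snp, trait, _ in hits:
        by_snp[snp].add(trait)
    return {snp: sorted(traits) for snp, traits in by_snp.items() if len(traits) > 1}


# Toy association results (made up for illustration).
toy_results = [
    ("rs1", "metA", 1e-9),
    ("rs1", "metB", 3e-8),
    ("rs2", "metA", 4e-8),
    ("rs2", "metC", 6e-8),  # just above the cutoff, so it is dropped
    ("rs3", "metC", 0.01),
]

hits = significant_qtls(toy_results)
print(len(hits), "significant QTLs across", len(traits_with_hits(hits)), "traits")
print(snps_shared_across_traits(hits))
```

Applied to real output, the same grouping would surface the mQTLs shared across multiple metabolites that the abstract mentions.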

PMID:38833463 | DOI:10.1371/journal.pone.0298501

Categories: Literature Watch

Diabetes and Infectious Diseases with a Focus on Melioidosis

Tue, 2024-06-04 06:00

Curr Microbiol. 2024 Jun 4;81(7):208. doi: 10.1007/s00284-024-03748-z.

ABSTRACT

Diabetes mellitus (DM) leads to impaired innate and adaptive immune responses. This renders individuals with DM highly susceptible to microbial infections such as COVID-19, tuberculosis and melioidosis. Melioidosis is a tropical disease caused by the bacterial pathogen Burkholderia pseudomallei, where diabetes is consistently reported as the most significant risk factor associated with the disease. Type-2 diabetes is observed in 39% of melioidosis patients, and the risk of infection is 13-fold higher than in non-diabetic individuals. B. pseudomallei is found in the environment and is an opportunistic pathogen in humans, often exhibiting severe clinical manifestations in immunocompromised patients. The pathophysiology of diabetes significantly affects the host immune responses that play a critical role in fighting the infection, such as leukocyte and neutrophil impairment, macrophage and monocyte inhibition and natural killer cell dysfunction. These defects result in delayed recruitment as well as activation of immune cells to target the invading B. pseudomallei. This provides an advantage for the pathogen to survive and adapt within immunocompromised diabetic patients. Nevertheless, knowledge gaps on diabetes-infectious disease comorbidity, in particular melioidosis-diabetes comorbidity, need to be filled to fully understand the dysfunctional host immune responses and adaptation of the pathogen under diabetic conditions to guide therapeutic options.

PMID:38833191 | DOI:10.1007/s00284-024-03748-z

Categories: Literature Watch

Heterogeneous distribution of kinesin-streptavidin complexes revealed by mass photometry

Tue, 2024-06-04 06:00

Soft Matter. 2024 Jun 4. doi: 10.1039/d3sm01702h. Online ahead of print.

ABSTRACT

Kinesin-streptavidin complexes are widely used in microtubule-based active-matter studies. The stoichiometry of the complexes is empirically tuned but experimentally challenging to determine. Here, mass photometry measurements reveal heterogeneous distributions of kinesin-streptavidin complexes. Our binding model indicates that heterogeneity arises from both the kinesin-streptavidin mixing ratio and the kinesin-biotinylation efficiency.

PMID:38832814 | DOI:10.1039/d3sm01702h

Categories: Literature Watch

Host tracheal and intestinal microbiomes inhibit Coccidioides growth in vitro

Tue, 2024-06-04 06:00

Microbiol Spectr. 2024 Jun 4:e0297823. doi: 10.1128/spectrum.02978-23. Online ahead of print.

ABSTRACT

Coccidioidomycosis, also known as Valley fever, is a disease caused by the fungal pathogen Coccidioides. Unfortunately, patients are often misdiagnosed with bacterial pneumonia, leading to inappropriate antibiotic treatment. The soil Bacillus subtilis-like species exhibits antagonistic properties against Coccidioides in vitro; however, the antagonistic capabilities of host microbiota against Coccidioides are unexplored. We sought to examine the potential of the tracheal and intestinal microbiomes to inhibit the growth of Coccidioides in vitro. We hypothesized that an uninterrupted lawn of microbiota obtained from antibiotic-free mice would inhibit the growth of Coccidioides, while partial in vitro depletion through antibiotic disk diffusion assays would allow a niche for fungal growth. We observed that the microbiota grown on 2×GYE (GYE) and Columbia colistin and nalidixic acid with 5% sheep's blood agar inhibited the growth of Coccidioides, but microbiota grown on chocolate agar did not. Partial depletion of the microbiota through antibiotic disk diffusion revealed diminished inhibition and comparable growth of Coccidioides to controls. To characterize the bacteria grown and identify potential candidates contributing to the inhibition of Coccidioides, 16S rRNA sequencing was performed on tracheal and intestinal agar cultures and murine lung extracts. We found that the host bacteria likely responsible for this inhibition primarily included Lactobacillus and Staphylococcus. The results of this study demonstrate the potential of the host microbiota to inhibit the growth of Coccidioides in vitro and suggest that an altered microbiome through antibiotic treatment could negatively impact effective fungal clearance and allow a niche for fungal growth in vivo.

IMPORTANCE: Coccidioidomycosis is caused by a fungal pathogen that invades the host lungs, causing respiratory distress. In 2019, 20,003 cases of Valley fever were reported to the CDC. However, this number likely vastly underrepresents the true number of Valley fever cases, as many go undetected due to poor testing strategies and a lack of diagnostic models. Valley fever is also often misdiagnosed as bacterial pneumonia, resulting in 60%-80% of patients being treated with antibiotics prior to an accurate diagnosis. Misdiagnosis contributes to a growing problem of antibiotic resistance and antibiotic-induced microbiome dysbiosis; the implications for disease outcomes are currently unknown. About 5%-10% of symptomatic Valley fever patients develop chronic pulmonary disease. Valley fever causes a significant financial burden and a reduced quality of life. Little is known regarding what factors contribute to the development of chronic infections and treatments for the disease are limited.

PMID:38832766 | DOI:10.1128/spectrum.02978-23

Categories: Literature Watch

Statistical and computational methods for integrating microbiome, host genomics, and metabolomics data

Tue, 2024-06-04 06:00

Elife. 2024 Jun 4;13:e88956. doi: 10.7554/eLife.88956.

ABSTRACT

Large-scale microbiome studies are progressively utilizing multiomics designs, which include the collection of microbiome samples together with host genomics and metabolomics data. Despite the increasing number of data sources, there remains a bottleneck in understanding the relationships between different data modalities due to the limited number of statistical and computational methods for analyzing such data. Furthermore, little is known about the portability of general methods to the metagenomic setting and few specialized techniques have been developed. In this review, we summarize and implement some of the commonly used methods. We apply these methods to real data sets where shotgun metagenomic sequencing and metabolomics data are available for microbiome multiomics data integration analysis. We compare results across methods, highlight strengths and limitations of each, and discuss areas where statistical and computational innovation is needed.

PMID:38832759 | DOI:10.7554/eLife.88956

Categories: Literature Watch

Hecatomb: an integrated software platform for viral metagenomics

Tue, 2024-06-04 06:00

Gigascience. 2024 Jan 2;13:giae020. doi: 10.1093/gigascience/giae020.

ABSTRACT

BACKGROUND: Modern sequencing technologies offer extraordinary opportunities for virus discovery and virome analysis. Annotation of viral sequences from metagenomic data requires a complex series of steps to ensure accurate annotation of individual reads and assembled contigs. In addition, varying study designs will require project-specific statistical analyses.

FINDINGS: Here we introduce Hecatomb, a bioinformatic platform coordinating commonly used tasks required for virome analysis. Hecatomb means "a great sacrifice." In this setting, Hecatomb is "sacrificing" false-positive viral annotations using extensive quality control and tiered-database searches. Hecatomb processes metagenomic data obtained from both short- and long-read sequencing technologies, providing annotations to individual sequences and assembled contigs. Results are provided in commonly used data formats useful for downstream analysis. Here we demonstrate the functionality of Hecatomb through the reanalysis of a primate enteric and a novel coral reef virome.

CONCLUSION: Hecatomb provides an integrated platform to manage many commonly used steps for virome characterization, including rigorous quality control, host removal, and both read- and contig-based analysis. Each step is managed using the Snakemake workflow manager with dependency management using Conda. Hecatomb outputs several tables properly formatted for immediate use within popular data analysis and visualization tools, enabling effective data interpretation for a variety of study designs. Hecatomb is hosted on GitHub (github.com/shandley/hecatomb) and is available for installation from Bioconda and PyPI.

PMID:38832467 | DOI:10.1093/gigascience/giae020

Categories: Literature Watch

Unveiling the mysteries: Functional insights into hypothetical proteins from Bacteroides fragilis 638R

Tue, 2024-06-04 06:00

Heliyon. 2024 May 22;10(11):e31713. doi: 10.1016/j.heliyon.2024.e31713. eCollection 2024 Jun 15.

ABSTRACT

Humans benefit from a vast community of microorganisms in their gastrointestinal tract, known as the gut microbiota, numbering in the tens of trillions. An imbalance in the gut microbiota, known as dysbiosis, can lead to changes in the metabolite profile, elevating the levels of toxins like Bacteroides fragilis toxin (BFT), colibactin, and cytolethal distending toxin. These toxins are implicated in the process of oncogenesis. However, a significant portion of the Bacteroides fragilis genome consists of functionally uncharacterized and hypothetical proteins. This study delves into the functional characterization of hypothetical proteins (HPs) encoded by the Bacteroides fragilis genome, employing a systematic in silico approach. A total of 379 HPs were subjected to a BlastP homology search against the NCBI non-redundant protein sequence database, resulting in 162 HPs devoid of identity to known proteins. CDD-Blast identified 106 HPs with functional domains, which were then annotated using Pfam, InterPro, SUPERFAMILY, SCANPROSITE, SMART, and CATH. Physicochemical properties, such as molecular weight, isoelectric point, and stability indices, were assessed for 60 HPs whose functional domains were identified by at least three of the aforementioned bioinformatic tools. Subsequently, subcellular localization was examined, and the gene ontology analysis revealed diverse biological processes, cellular components, and molecular functions. Remarkably, E1WPR3 was identified as a virulent and essential gene among the HPs. This study presents a comprehensive exploration of B. fragilis HPs, shedding light on their potential roles and contributing to a deeper understanding of this organism's functional landscape.
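The consensus step mentioned above, keeping only proteins whose functional domains were identified by at least three annotation tools, amounts to a simple vote count. The Python sketch below illustrates that filter under stated assumptions: the function name, the dictionary layout, and the protein identifiers `HP_A`, `HP_B`, and `HP_C` are hypothetical placeholders, and only the tool names come from the abstract.

```python
def consensus_proteins(domain_hits, min_tools=3):
    """Return sorted protein IDs whose domains were reported by at least
    min_tools distinct annotation tools.

    domain_hits maps a protein ID to the set of tool names that reported
    a functional domain for it (hypothetical input layout).
    """
    return sorted(pid for pid, tools in domain_hits.items() if len(tools) >= min_tools)


# Toy input: protein IDs are placeholders, tool names are those in the abstract.
toy_hits = {
    "HP_A": {"Pfam", "InterPro", "SMART", "CATH"},
    "HP_B": {"Pfam", "InterPro"},
    "HP_C": {"SUPERFAMILY", "SCANPROSITE", "InterPro"},
}

print(consensus_proteins(toy_hits))
```

Using sets for the tool names means repeated hits from one tool count once, so the threshold reflects agreement between distinct tools.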

PMID:38832264 | PMC:PMC11145332 | DOI:10.1016/j.heliyon.2024.e31713

Categories: Literature Watch

PERCEPT: Replacing binary p-value thresholding with scaling for more nuanced identification of sample differences

Tue, 2024-06-04 06:00

iScience. 2024 May 3;27(6):109891. doi: 10.1016/j.isci.2024.109891. eCollection 2024 Jun 21.

ABSTRACT

Key to a biologist's capacity to understand data is the ability to make meaningful conclusions about differences in experimental observations. Typically, data are noisy, and conventional methods rely on replicates to average out noise and enable univariate statistical tests to assign p-values. Yet thresholding p-values to determine significance is controversial and often misleading, especially for omics datasets with few replicates. This study introduces PERCEPT, an alternative that transforms data using an ad-hoc scaling factor derived from p-values. By applying this method, low confidence effects are suppressed compared to high confidence ones, enabling clearer patterns to emerge from noisy datasets. The effectiveness of PERCEPT scaling is demonstrated using simulated datasets and published omics studies. The approach reduces the exclusion of datapoints, enhances accuracy, and enables nuanced interpretation of data. PERCEPT is easy to apply for the non-expert in statistics and provides researchers a straightforward way to improve data-driven analyses.
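The contrast between binary thresholding and p-value-based scaling can be made concrete with a small Python sketch. The abstract does not give the PERCEPT scaling formula, so the weight `(1 - p) ** power` used here is purely an assumed stand-in chosen for illustration, as are the function names and the `power` parameter. The sketch only shows the general idea that low-confidence effects are shrunk rather than discarded.

```python
def threshold_effect(effect, p_value, alpha=0.05):
    """Conventional binary approach: keep the effect only if p < alpha."""
    return effect if p_value < alpha else 0.0


def scale_effect(effect, p_value, power=2):
    """Continuous alternative: shrink the effect by a weight derived from p.

    NOTE: (1 - p) ** power is an assumed illustrative weight, not the
    published PERCEPT scaling factor.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return effect * (1.0 - p_value) ** power


# Two effects of equal size on either side of the 0.05 threshold.
for p in (0.04, 0.06):
    print(p, threshold_effect(2.0, p), scale_effect(2.0, p))
```

With thresholding, the two nearly identical observations end up at 2.0 and 0.0; with scaling, both are retained at similar, slightly reduced magnitudes, which is the behaviour the abstract describes as reducing the exclusion of datapoints.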

PMID:38832020 | PMC:PMC11145341 | DOI:10.1016/j.isci.2024.109891

Categories: Literature Watch
