Genome-wide identification of binding sites and gene targets of Alx1, a pivotal regulator of echinoderm skeletogenesis.
Development. 2019 08 19;146(16):
Authors: Khor JM, Guerrero-Santoro J, Ettensohn CA
Abstract
Alx1 is a conserved regulator of skeletogenesis in echinoderms and evolutionary changes in Alx1 sequence and expression have played a pivotal role in modifying programs of skeletogenesis within the phylum. Alx1 regulates a large suite of effector genes that control the morphogenetic behaviors and biomineral-forming activities of skeletogenic cells. To better understand the gene regulatory control of skeletogenesis by Alx1, we used genome-wide ChIP-seq to identify Alx1-binding sites and direct gene targets. Our analysis revealed that many terminal differentiation genes receive direct transcriptional inputs from Alx1. In addition, we found that intermediate transcription factors previously shown to be downstream of Alx1 all receive direct inputs from Alx1. Thus, Alx1 appears to regulate effector genes by indirect, as well as direct, mechanisms. We tested 23 high-confidence ChIP-seq peaks using GFP reporters and identified 18 active cis-regulatory modules (CRMs); this represents a high success rate for CRM discovery. Detailed analysis of a representative CRM confirmed that a conserved, palindromic Alx1-binding site was essential for expression. Our work significantly advances our understanding of the gene regulatory circuitry that controls skeletogenesis in sea urchins and provides a framework for evolutionary studies.
PMID: 31331943 [PubMed - indexed for MEDLINE]
ELMER v.2: an R/Bioconductor package to reconstruct gene regulatory networks from DNA methylation and transcriptome profiles.
Bioinformatics. 2019 06 01;35(11):1974-1977
Authors: Silva TC, Coetzee SG, Gull N, Yao L, Hazelett DJ, Noushmehr H, Lin DC, Berman BP
Abstract
MOTIVATION: DNA methylation has been used to identify functional changes at transcriptional enhancers and other cis-regulatory modules (CRMs) in tumors and other disease tissues. Our R/Bioconductor package ELMER (Enhancer Linking by Methylation/Expression Relationships) provides a systematic approach that reconstructs altered gene regulatory networks (GRNs) by combining enhancer methylation and gene expression data derived from the same sample set.
RESULTS: We present a completely revised version 2 of ELMER that provides numerous new features including an optional web-based interface and a new Supervised Analysis mode to use pre-defined sample groupings. We show that Supervised mode significantly increases statistical power and identifies additional GRNs and associated Master Regulators, such as SOX11 and KLF5 in Basal-like breast cancer.
AVAILABILITY AND IMPLEMENTATION: ELMER v.2 is available as an R/Bioconductor package at http://bioconductor.org/packages/ELMER/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
PMID: 30364927 [PubMed - indexed for MEDLINE]
Enhancer Evolution in Chordates: Lessons from Functional Analyses of Cephalochordate Cis-Regulatory Modules.
Dev Growth Differ. 2020 Jun 01;:
Authors: Yasuoka Y
Abstract
Chordates comprise three major groups, cephalochordates (amphioxus), tunicates (urochordates), and vertebrates. Since cephalochordates were the early branching group, comparisons between amphioxus and other chordates help us to speculate about ancestral chordates. Here I summarize accumulating data from functional studies analyzing amphioxus cis-regulatory modules (CRMs) in model systems of other chordate groups, such as mice, chickens, clawed frogs, fish, and ascidians. Conservatism and variability of CRM functions illustrate how gene regulatory networks have evolved in chordates. Amphioxus CRMs, which correspond to CRMs deeply conserved among animal phyla, govern reporter gene expression in conserved expression domains of the putative target gene in host animals. In addition, some CRMs located in similar genomic regions (intron, upstream, or downstream) also possess conserved activity, even though their sequences are divergent. These conservative CRM functions imply ancestral genomic structures and gene regulatory networks in chordates. However, interestingly, if expression patterns of amphioxus genes do not correspond to those of orthologs of experimental models, some amphioxus CRMs recapitulate expression patterns of amphioxus genes, but not those of endogenous genes, suggesting that these amphioxus CRMs are close to the ancestral states of chordate CRMs, while vertebrates/tunicates innovated new CRMs to reconstruct gene regulatory networks subsequent to the divergence of the cephalochordates. Alternatively, amphioxus CRMs may have secondarily lost ancestral CRM activity and evolved independently. These data help to solve fundamental questions of chordate evolution, such as neural crest cells, placodes, a forebrain / midbrain, and genome duplication. Experimental validation is crucial to verify CRM functions and evolution.
PMID: 32479656 [PubMed - as supplied by publisher]
Twist-dependent ratchet functioning downstream from Dorsal revealed using a light-inducible degron.
Genes Dev. 2020 May 28;:
Authors: Irizarry J, McGehee J, Kim G, Stein D, Stathopoulos A
Abstract
Graded transcription factors are pivotal regulators of embryonic patterning, but whether their role changes over time is unclear. A light-regulated protein degradation system was used to assay temporal dependence of the transcription factor Dorsal in dorsal-ventral axis patterning of Drosophila embryos. Surprisingly, the high-threshold target gene snail only requires Dorsal input early but not late when Dorsal levels peak. Instead, late snail expression can be supported by action of the Twist transcription factor, specifically, through one enhancer, sna.distal. This study demonstrates that continuous input is not required for some Dorsal targets, and that downstream responses, such as twist, function as molecular ratchets.
PMID: 32467225 [PubMed - as supplied by publisher]
Early Xenopus gene regulatory programs, chromatin states, and the role of maternal transcription factors.
Curr Top Dev Biol. 2020;139:35-60
Authors: Paraiso KD, Cho JS, Yong J, Cho KWY
Abstract
For decades, the early development of the Xenopus embryo has been an essential model system to study the gene regulatory mechanisms that govern cellular specification. At the top of the hierarchy of gene regulatory networks, maternally deposited transcription factors initiate this process and regulate the expression of zygotic genes that give rise to three distinctive germ layer cell types (ectoderm, mesoderm, and endoderm), and subsequent generation of organ precursors. The onset of germ layer specification is also closely coupled with changes associated with chromatin modifications. This review will examine the timing of maternal transcription factors initiating the zygotic genome activation, the epigenetic landscape of embryonic chromatin, and the network structure that governs the process.
PMID: 32450966 [PubMed - in process]
The notochord gene regulatory network in chordate evolution: Conservation and divergence from Ciona to vertebrates.
Curr Top Dev Biol. 2020;139:325-374
Authors: Di Gregorio A
Abstract
The notochord is a structure required for support and patterning of all chordate embryos, from sea squirts to humans. An increasing amount of information on notochord development and on the molecular strategies that ensure its proper morphogenesis has been gleaned through studies in the sea squirt Ciona. This invertebrate chordate offers a fortunate combination of experimental advantages, ranging from translucent, fast-developing embryos to a compact genome and impressive biomolecular resources. These assets have enabled the rapid identification of numerous notochord genes and cis-regulatory regions, and provide a rather unique opportunity to reconstruct the gene regulatory network that controls the formation of this developmental and evolutionary chordate landmark. This chapter summarizes the morphogenetic milestones that punctuate notochord formation in Ciona, their molecular effectors, and the current knowledge of the gene regulatory network that ensures the accurate spatial and temporal orchestration of these processes.
PMID: 32450965 [PubMed - in process]
fcScan: a versatile tool to cluster combinations of sites using genomic coordinates.
BMC Bioinformatics. 2020 May 19;21(1):194
Authors: El-Kurdi A, Khalil GA, Khazen G, Khoueiry P
Abstract
BACKGROUND: Finding combinations of homotypic or heterotypic genomic sites obeying a specific grammar in DNA sequences is a frequent task in bioinformatics. A typical case corresponds to the identification of cis-regulatory modules characterized by a combination of transcription factor binding sites in a defined window size. Although previous studies identified clusters of genomic sites in species with varying genome sizes, a dedicated and versatile tool to search for such clusters is lacking.
RESULTS: We present fcScan, an R/Bioconductor package to search for clusters of genomic sites based on user-defined criteria including cluster size, inter-cluster distances, and site order and orientation, allowing users to adapt their search criteria to specific biological questions. It supports GRanges, data frame and VCF/BED files as input and returns data in GRanges format. By performing clustering on vectorized data, fcScan can search for genomic clusters among millions of input sites in a short time and is thus ideal for scanning data generated by high-throughput methods including next-generation sequencing.
CONCLUSIONS: fcScan is ideal for detecting cis-regulatory modules of transcription factor binding sites with a specific grammar as well as genomic loci enriched for mutations. The flexibility in input parameters allows users to perform searches targeting specific research questions. It is released under Artistic-2.0 License. The source code is freely available through Bioconductor (https://bioconductor.org/packages/fcScan) and GitHub (https://github.com/pkhoueiry/fcScan).
PMID: 32429868 [PubMed - in process]
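The window-based search described in the fcScan abstract can be illustrated with a small Python sketch. This is not fcScan's implementation or API (fcScan is an R package); the function name `find_clusters`, the tuple layout of a site, and the `required` dictionary are assumptions made only for illustration. The sketch reports groups of sites on one chromosome that fit within a window and contain at least the requested number of each site type, and it ignores order and orientation constraints.

```python
from collections import Counter, defaultdict


def find_clusters(sites, window, required):
    """Hypothetical sketch of window-based site clustering.

    sites    -- iterable of (chrom, start, end, name) tuples
    window   -- maximum span, from first start to last end, of a cluster
    required -- dict mapping site name to the minimum count needed
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in sites:
        by_chrom[chrom].append((start, end, name))

    clusters = []
    for chrom in sorted(by_chrom):
        chrom_sites = sorted(by_chrom[chrom])
        last_reported = -1  # index of the last site in the previous cluster
        for i, (start_i, _, _) in enumerate(chrom_sites):
            # Extend the window to the right while the next site still fits.
            j = i
            while (j + 1 < len(chrom_sites)
                   and chrom_sites[j + 1][1] - start_i <= window):
                j += 1
            if j <= last_reported:
                continue  # fully contained in a cluster already reported
            members = chrom_sites[i:j + 1]
            counts = Counter(name for _, _, name in members)
            if all(counts[name] >= k for name, k in required.items()):
                cluster_end = max(end for _, end, _ in members)
                names = [name for _, _, name in members]
                clusters.append((chrom, start_i, cluster_end, names))
                last_reported = j
    return clusters


# Toy coordinates, invented for the example.
toy_sites = [
    ("chr1", 100, 110, "A"), ("chr1", 150, 160, "B"),
    ("chr1", 1000, 1010, "A"), ("chr1", 5000, 5010, "B"),
    ("chr2", 10, 20, "A"), ("chr2", 30, 40, "B"),
]
toy_clusters = find_clusters(toy_sites, window=200, required={"A": 1, "B": 1})
```

With the toy input, only the two A/B pairs that lie within 200 bp of each other are reported; the isolated sites on chr1 are not.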
BICORN: An R package for integrative inference of de novo cis-regulatory modules.
Sci Rep. 2020 May 14;10(1):7960
Authors: Chen X, Gu J, Neuwald AF, Hilakivi-Clarke L, Clarke R, Xuan J
Abstract
Genome-wide transcription factor (TF) binding signal analyses reveal co-localization of TF binding sites based on inferred cis-regulatory modules (CRMs). CRMs play a key role in understanding the cooperation of multiple TFs under specific conditions. However, the functions of CRMs and their effects on nearby gene transcription are highly dynamic and context-specific and therefore are challenging to characterize. BICORN (Bayesian Inference of COoperative Regulatory Network) builds a hierarchical Bayesian model and infers context-specific CRMs based on TF-gene binding events and gene expression data for a particular cell type. BICORN automatically searches for a list of candidate CRMs based on the input TF bindings at regulatory regions associated with genes of interest. Applying Gibbs sampling, BICORN iteratively estimates model parameters of CRMs, TF activities, and corresponding regulation on gene transcription, which it models as a sparse network of functional CRMs regulating target genes. The BICORN package is implemented in R (version 3.4 or later) and is publicly available on the CRAN server at https://cran.r-project.org/web/packages/BICORN/index.html.
PMID: 32409786 [PubMed - in process]
Zygotic pioneer factor activity of Odd-paired/Zic is necessary for late function of the Drosophila segmentation network.
Elife. 2020 Apr 29;9:
Authors: Soluri IV, Zumerling LM, Payan Parra OA, Clark EG, Blythe SA
Abstract
Because chromatin determines whether information encoded in DNA is accessible to transcription factors, dynamic chromatin states in development may constrain how gene regulatory networks impart embryonic pattern. To determine the interplay between chromatin states and regulatory network function, we performed ATAC-seq on Drosophila embryos during the establishment of the segmentation network, comparing wild-type and mutant embryos in which all graded maternal patterning inputs are eliminated. While during the period between zygotic genome activation and gastrulation many regions maintain stable accessibility, cis-regulatory modules (CRMs) within the network undergo extensive patterning-dependent changes in accessibility. A component of the network, Odd-paired (opa), is necessary for pioneering accessibility of late segmentation network CRMs. opa-driven changes in accessibility are accompanied by equivalent changes in gene expression. Interfering with the timing of opa activity impacts the proper patterning of expression. These results indicate that dynamic systems for chromatin regulation directly impact the reading of embryonic patterning information.
PMID: 32347792 [PubMed - as supplied by publisher]
Bone morphogenetic protein signaling regulates Id1 mediated neural stem cell quiescence in the adult zebrafish brain via a phylogenetically conserved enhancer module.
Stem Cells. 2020 Apr 03;:
Authors: Zhang G, Ferg M, Lübke L, Takamiya M, Beil T, Gourain V, Diotel N, Strähle U, Rastegar S
Abstract
In the telencephalon of adult zebrafish, the inhibitor of DNA binding 1 (id1) gene is expressed in radial glial cells (RGCs), behaving as neural stem cells (NSCs), during constitutive and regenerative neurogenesis. Id1 controls the balance between resting and proliferating states of RGCs by promoting quiescence. Here, we identified a phylogenetically conserved cis-regulatory module (CRM) mediating the specific expression of id1 in RGCs. Systematic deletion mapping and mutation of conserved transcription factor binding sites in stable transgenic zebrafish lines reveal that this CRM operates via conserved smad1/5 and 4 binding motifs (SBMs) under both homeostatic and regenerative conditions. Transcriptome analysis of injured and uninjured telencephala as well as pharmacological inhibition experiments identify a crucial role of bone morphogenetic protein (BMP) signaling for the function of the CRM. Our data highlight that BMP signals control id1 expression and thus NSC proliferation during constitutive and induced neurogenesis.
SIGNIFICANCE STATEMENT: In the adult brain, tight control of the balance between quiescence and proliferation is crucial to maintain a continuous supply of new neurons and to avoid exhaustion of the neural stem cell pool. The id1 gene controls the balance between dividing and resting neural stem cells by promoting quiescence. We identified a regulatory sequence of id1 that mediates the input from BMP signaling into adult neural stem cells. This regulatory sequence has high potential to serve as an interface that makes it possible to alter the balance between proliferation and maintenance of stem cells in experimental as well as medical applications.
PMID: 32246536 [PubMed - as supplied by publisher]
[Endogenous retroviruses: friend or foe of the immune system?]
Med Sci (Paris). 2020 Mar;36(3):253-260
Authors: Adoue V, Joffre O
Abstract
Upon priming by dendritic cells, naïve CD4 T lymphocytes are exposed to distinct molecular environments depending on the nature of the pathological stimulus. In response, they mobilize different gene networks that establish lineage-specific developmental programs and coordinate the acquisition of specific phenotypes and functions. Accordingly, CD4 T cells are capable of differentiating into a large variety of functionally distinct T helper (Th) cell subsets. In this review, we describe the molecular events that control CD4 T cell differentiation at the level of the chromatin. We emphasize recent work that has highlighted the key role of H3K9me3-dependent epigenetic mechanisms in the regulation of T cell identity. Interestingly, these pathways shape and control the developmental programs at least in part through the regulation of endogenous retrovirus-derived sequences that have been exapted into cis-regulatory modules of Th genes.
PMID: 32228844 [PubMed - in process]
Functional effects of variation in transcription factor binding highlight long-range gene regulation by epromoters.
Nucleic Acids Res. 2020 Feb 29;:
Authors: Mitchelmore J, Grinberg NF, Wallace C, Spivakov M
Abstract
Identifying DNA cis-regulatory modules (CRMs) that control the expression of specific genes is crucial for deciphering the logic of transcriptional control. Natural genetic variation can point to the possible gene regulatory function of specific sequences through their allelic associations with gene expression. However, comprehensive identification of causal regulatory sequences in brute-force association testing without incorporating prior knowledge is challenging due to limited statistical power and effects of linkage disequilibrium. Sequence variants affecting transcription factor (TF) binding at CRMs have a strong potential to influence gene regulatory function, which provides a motivation for prioritizing such variants in association testing. Here, we generate an atlas of CRMs showing predicted allelic variation in TF binding affinity in human lymphoblastoid cell lines and test their association with the expression of their putative target genes inferred from Promoter Capture Hi-C and immediate linear proximity. We reveal >1300 CRM TF-binding variants associated with target gene expression, the majority of them undetected with standard association testing. A large proportion of CRMs showing associations with the expression of genes they contact in 3D localize to the promoter regions of other genes, supporting the notion of 'epromoters': dual-action CRMs with promoter and distal enhancer activity.
PMID: 32112106 [PubMed - as supplied by publisher]
Establishment of chromatin accessibility by the conserved transcription factor Grainy head is developmentally regulated.
Development. 2020 Feb 25;:
Authors: Nevil M, Gibson TJ, Bartolutti C, Iyengar A, Harrison MM
Abstract
The dramatic changes in gene expression required for development necessitate the establishment of cis-regulatory modules defined by regions of accessible chromatin. Pioneer transcription factors have the unique property of binding closed chromatin and facilitating the establishment of these accessible regions. Nonetheless, much of how pioneer transcription factors coordinate changes in chromatin accessibility during development remains unknown. To determine whether pioneer-factor function is intrinsic to the protein or whether pioneering activity is developmentally modulated, we studied the highly conserved, essential transcription factor, Grainy head (Grh). Prior work established that Grh is expressed throughout Drosophila development and is a pioneer factor in the larva. We demonstrated that Grh remains bound to mitotic chromosomes, a property shared with other pioneer factors. By assaying chromatin accessibility in embryos lacking maternal and/or zygotic Grh at three stages of development, we discovered that Grh is not required for chromatin accessibility in early embryogenesis, in contrast to its essential functions later in development. Our data reveal that the pioneering activity of Grh is temporally regulated and likely influenced by additional factors expressed at a given developmental stage.
PMID: 32098765 [PubMed - as supplied by publisher]
Candidate Cancer Driver Mutations in Distal Regulatory Elements and Long-Range Chromatin Interaction Networks.
Mol Cell. 2020 Jan 17;:
Authors: Zhu H, Uusküla-Reimand L, Isaev K, Wadi L, Alizada A, Shuai S, Huang V, Aduluso-Nwaobasi D, Paczkowska M, Abd-Rabbo D, Ocsenas O, Liang M, Thompson JD, Li Y, Ruan L, Krassowski M, Dzneladze I, Simpson JT, Lupien M, Stein LD, Boutros PC, Wilson MD, Reimand J
Abstract
A comprehensive catalog of cancer driver mutations is essential for understanding tumorigenesis and developing therapies. Exome-sequencing studies have mapped many protein-coding drivers, yet few non-coding drivers are known because genome-wide discovery is challenging. We developed a driver discovery method, ActiveDriverWGS, and analyzed 120,788 cis-regulatory modules (CRMs) across 1,844 whole tumor genomes from the ICGC-TCGA PCAWG project. We found 30 CRMs with enriched SNVs and indels (FDR < 0.05). These frequently mutated regulatory elements (FMREs) were ubiquitously active in human tissues, showed long-range chromatin interactions and mRNA abundance associations with target genes, and were enriched in motif-rewiring mutations and structural variants. Genomic deletion of one FMRE in human cells caused proliferative deficiencies and transcriptional deregulation of cancer genes CCNB1IP1, CDH1, and CDKN2B, validating observations in FMRE-mutated tumors. Pathway analysis revealed further sub-significant FMREs at cancer genes and processes, indicating an unexplored landscape of infrequent driver mutations in the non-coding genome.
PMID: 31954095 [PubMed - as supplied by publisher]
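The FDR < 0.05 threshold mentioned in the ActiveDriverWGS abstract can be illustrated with a generic multiple-testing correction. The abstract does not say which FDR procedure was used, so the sketch below assumes the Benjamini-Hochberg procedure purely as an example; the function name and the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented per-element p-values, not data from the study.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_fdr = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(toy_fdr) if q < 0.05]
```

Elements whose adjusted value falls below 0.05 would be the ones called significant under this assumed procedure.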
Combinatorial interactions of the LEC1 transcription factor specify diverse developmental programs during soybean seed development.
Proc Natl Acad Sci U S A. 2019 Dec 31;:
Authors: Jo L, Pelletier JM, Hsu SW, Baden R, Goldberg RB, Harada JJ
Abstract
The LEAFY COTYLEDON1 (LEC1) transcription factor is a central regulator of seed development, because it controls diverse biological programs during seed development, such as embryo morphogenesis, photosynthesis, and seed maturation. To understand how LEC1 regulates different gene sets during development, we explored the possibility that LEC1 acts in combination with other transcription factors. We identified and compared genes that are directly transcriptionally regulated by ABA-RESPONSIVE ELEMENT BINDING PROTEIN3 (AREB3), BASIC LEUCINE ZIPPER67 (bZIP67), and ABA INSENSITIVE3 (ABI3) with those regulated by LEC1. We showed that LEC1 operates with specific sets of transcription factors to regulate different gene sets and, therefore, distinct developmental processes. Thus, LEC1 controls diverse processes through its combinatorial interactions with other transcription factors. DNA binding sites for the transcription factors are closely clustered in genomic regions upstream of target genes, defining cis-regulatory modules that are enriched for DNA sequence motifs that resemble sequences known to be bound by these transcription factors. Moreover, cis-regulatory modules for genes regulated by distinct transcription factor combinations are enriched for different sets of DNA motifs. Expression assays with embryo cells indicate that the enriched DNA motifs are functional cis elements that regulate transcription. Together, the results suggest that combinatorial interactions between LEC1 and other transcription factors are mediated by cis-regulatory modules containing clustered cis elements and by physical interactions that are documented to occur between the transcription factors.
PMID: 31892538 [PubMed - as supplied by publisher]
RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding.
Comput Struct Biotechnol J. 2019;17:1415-1428
Authors: Santana-Garcia W, Rocha-Acevedo M, Ramirez-Navarro L, Mbouamboua Y, Thieffry D, Thomas-Chollier M, Contreras-Moreira B, van Helden J, Medina-Rivera A
Abstract
Gene regulatory regions contain short and degenerate DNA binding sites recognized by transcription factors (TFBS). When TFBS harbor SNPs, the DNA binding site may be affected, thereby altering the transcriptional regulation of the target genes. Such regulatory SNPs have been implicated as causal variants in Genome-Wide Association Studies (GWAS). In this study, we describe improved versions of the Variation-tools programs, designed to predict regulatory variants, and present four case studies to illustrate their usage and applications. In brief, Variation-tools facilitate i) obtaining variation information, ii) interconversion of variation file formats, iii) retrieval of sequences surrounding variants, and iv) calculating the change in predicted transcription factor affinity scores between alleles, using motif scanning approaches. Notably, the tools support the analysis of haplotypes. The tools are included within the well-maintained suite Regulatory Sequence Analysis Tools (RSAT, http://rsat.eu), and accessible through a web interface that currently enables analysis of five metazoa and ten plant genomes. Variation-tools can also be used on the command line with any locally installed Ensembl genome. Users can input personal collections of variants and motifs, providing flexibility in the analysis.
PMID: 31871587 [PubMed]
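Step iv) of the Variation-tools abstract, the change in predicted affinity score between alleles, can be sketched with a plain log-odds position weight matrix. This is not RSAT code and does not reproduce its scoring scheme; the function names, the pseudocount, the uniform background, and the toy motif are all assumptions. The sketch scans only the forward strand and only the windows that overlap the variant.

```python
import math


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds scoring matrix."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def best_score(sequence, matrix, variant_index):
    """Best forward-strand score among windows covering variant_index."""
    width = len(matrix)
    first = max(0, variant_index - width + 1)
    last = min(variant_index, len(sequence) - width)
    best = -math.inf
    for offset in range(first, last + 1):
        score = sum(matrix[k][sequence[offset + k]] for k in range(width))
        best = max(best, score)
    return best


def allele_score_change(left_flank, ref, alt, right_flank, matrix):
    """Score(alt) minus score(ref) for a single-nucleotide variant."""
    index = len(left_flank)
    ref_seq = left_flank + ref + right_flank
    alt_seq = left_flank + alt + right_flank
    return best_score(alt_seq, matrix, index) - best_score(ref_seq, matrix, index)


# Toy three-column motif that strongly prefers "TGA" (invented counts).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_matrix = log_odds_matrix(toy_counts)
toy_change = allele_score_change("CT", "G", "C", "AC", toy_matrix)
```

A negative change means the alternative allele weakens the best predicted site; here the G-to-C substitution disrupts the central position of the toy motif.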
Exploring a Drosophila Transcription Factor Interaction Network to Identify Cis-Regulatory Modules.
J Comput Biol. 2019 Dec 20;:
Authors: Mahmud AKMF, Yang D, Stenberg P, Ioshikhes I, Nandi S
Abstract
Multiple transcription factors (TFs) bind to specific sites in the genome and interact among themselves to form the cis-regulatory modules (CRMs). They are essential in modulating the expression of genes, and it is important to study this interplay to understand gene regulation. In the present study, we integrated experimentally identified TF binding sites collected from published studies with computationally predicted TF binding sites to identify Drosophila CRMs. Along with the detection of the previously known CRMs, this approach identified novel protein combinations. We determined high-occupancy target sites, where a large number of TFs bind. Investigating these sites revealed that Giant, Dichaete, and Knirps are highly enriched in these locations. A common TAG team motif was observed at these sites, which might play a role in recruiting other TFs. While comparing the binding sites at distal and proximal promoters, we found that certain regulatory TFs, such as Zelda, were highly enriched in enhancers. Our study has shown that, from the information available concerning the TF binding sites, the real CRMs could be predicted accurately and efficiently. Although we can claim only co-occurrence of these proteins in this study, it may actually point to their interaction (as known interacting proteins typically co-occur). Such an integrative approach can, therefore, help to provide a better understanding of the interplay among the factors, even though further experimental verification is required.
PMID: 31855461 [PubMed - as supplied by publisher]
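The notion of a high-occupancy target site in the abstract above, a location where many different TFs bind, can be illustrated by counting distinct factors per genomic bin. The abstract does not describe how such sites were defined, so the fixed bin size, the threshold, and the function name below are assumptions chosen for a minimal example, and the coordinates are invented.

```python
from collections import defaultdict


def high_occupancy_bins(binding_sites, bin_size=500, min_factors=3):
    """Return bins bound by at least min_factors distinct factors.

    binding_sites -- iterable of (chrom, position, factor) tuples
    """
    factors_per_bin = defaultdict(set)
    for chrom, position, factor in binding_sites:
        factors_per_bin[(chrom, position // bin_size)].add(factor)
    return sorted(
        (chrom, index * bin_size, (index + 1) * bin_size, sorted(factors))
        for (chrom, index), factors in factors_per_bin.items()
        if len(factors) >= min_factors
    )


# Placeholder factor names and positions, not data from the study.
toy_binding = [
    ("2L", 120, "TF1"), ("2L", 300, "TF2"), ("2L", 450, "TF3"),
    ("2L", 900, "TF1"), ("2L", 950, "TF1"),
]
toy_hot = high_occupancy_bins(toy_binding)
```

Repeated binding by the same factor does not raise the count, so the second bin in the toy data is not reported.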
What Do Neighbors Tell About You: The Local Context of Cis-Regulatory Modules Complicates Prediction of Regulatory Variants.
Front Genet. 2019;10:1078
Authors: Penzar DD, Zinkevich AO, Vorontsov IE, Sitnik VV, Favorov AV, Makeev VJ, Kulakovskiy IV
Abstract
Many problems of modern genetics and functional genomics require the assessment of functional effects of sequence variants, including gene expression changes. Machine learning is considered to be a promising approach for solving this task, but its practical applications remain a challenge due to the insufficient volume and diversity of training data. A promising source of valuable data is a saturation mutagenesis massively parallel reporter assay, which quantitatively measures changes in transcription activity caused by sequence variants. Here, we explore the computational predictions of the effects of individual single-nucleotide variants on gene transcription measured in the massively parallel reporter assays, based on the data from the recent "Regulation Saturation" Critical Assessment of Genome Interpretation challenge. We show that the estimated prediction quality strongly depends on the structure of the training and validation data. Particularly, training on the sequence segments located next to the validation data results in the "information leakage" caused by the local context. This information leakage allows reproducing the prediction quality of the best CAGI challenge submissions with a fairly simple machine learning approach, and even obtaining notably better-than-random predictions using irrelevant genomic regions. Validation scenarios preventing such information leakage dramatically reduce the measured prediction quality. The performance at independent regulatory regions entirely excluded from the training set appears to be much lower than needed for practical applications, and even the performance estimation will become reliable only in the future with richer data from multiple reporters. The source code and data are available at https://bitbucket.org/autosomeru_cagi2018/cagi2018_regsat and https://genomeinterpretation.org/content/expression-variants.
PMID: 31737053 [PubMed]
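The validation scenario described in the abstract above, where regulatory regions used for validation are entirely excluded from training, can be sketched as a region-level split. This is not the authors' code (their source is linked in the abstract); the record layout, the `region` key, and the function name are assumptions for illustration.

```python
import random


def split_by_region(records, validation_fraction=0.25, seed=0):
    """Split variant records so no region appears in both sets."""
    regions = sorted({record["region"] for record in records})
    rng = random.Random(seed)
    rng.shuffle(regions)
    n_validation = max(1, round(len(regions) * validation_fraction))
    validation_regions = set(regions[:n_validation])
    train = [r for r in records if r["region"] not in validation_regions]
    validation = [r for r in records if r["region"] in validation_regions]
    return train, validation


# Invented records: four regions with three variants each.
toy_records = [
    {"region": name, "position": pos, "effect": 0.0}
    for name in ("region_A", "region_B", "region_C", "region_D")
    for pos in (10, 11, 12)
]
toy_train, toy_validation = split_by_region(toy_records)
```

Splitting individual variants at random instead would place neighbors from the same region on both sides of the split, which is the leakage the abstract warns about.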
ANISEED 2019: 4D exploration of genetic data for an extended range of tunicates.
Nucleic Acids Res. 2019 Nov 04;:
Authors: Dardaillon J, Dauga D, Simion P, Faure E, Onuma TA, DeBiasse MB, Louis A, Nitta KR, Naville M, Besnardeau L, Reeves W, Wang K, Fagotto M, Guéroult-Bellone M, Fujiwara S, Dumollard R, Veeman M, Volff JN, Roest Crollius H, Douzery E, Ryan JF, Davidson B, Nishida H, Dantec C, Lemaire P
Abstract
ANISEED (https://www.aniseed.cnrs.fr) is the main model organism database for the worldwide community of scientists working on tunicates, the vertebrate sister-group. Information provided for each species includes functionally annotated gene and transcript models with orthology relationships within tunicates, and with echinoderms, cephalochordates and vertebrates. Beyond genes, the system describes other genetic elements, including repeated elements and cis-regulatory modules. Gene expression profiles for several thousand genes are formalized in both wild-type and experimentally manipulated conditions, using formal anatomical ontologies. These data can be explored through three complementary types of browsers, each offering a different viewpoint. A developmental browser summarizes the information in a gene- or territory-centric manner. Advanced genomic browsers integrate the genetic features surrounding genes or gene sets within a species. A Genomicus synteny browser explores the conservation of local gene order across deuterostomes. This new release covers an extended taxonomic range of 14 species, including for the first time a non-ascidian species, the appendicularian Oikopleura dioica. Functional annotations, provided for each species, were enhanced through a combination of manual curation of gene models and the development of an improved orthology detection pipeline. Finally, gene expression profiles and anatomical territories can be explored in 4D online through the newly developed Morphonet morphogenetic browser.
PMID: 31680137 [PubMed - as supplied by publisher]
Low affinity binding sites in an activating CRM mediate negative autoregulation of the Drosophila Hox gene Ultrabithorax.
PLoS Genet. 2019 Oct 07;15(10):e1008444
Authors: Delker RK, Ranade V, Loker R, Voutev R, Mann RS
Abstract
Specification of cell identity and the proper functioning of a mature cell depend on precise regulation of gene expression. Both binary ON/OFF regulation of transcription, as well as more fine-tuned control of transcription levels in the ON state, are required to define cell types. The Drosophila melanogaster Hox gene, Ultrabithorax (Ubx), exhibits both of these modes of control during development. While ON/OFF regulation is needed to specify the fate of the developing wing (Ubx OFF) and haltere (Ubx ON), the levels of Ubx within the haltere differ between compartments along the proximal-distal axis. Here, we identify and molecularly dissect the novel contribution of a previously identified Ubx cis-regulatory module (CRM), anterobithorax (abx), to a negative auto-regulatory loop that decreases Ubx expression in the proximal compartment of the haltere as compared to the distal compartment. We find that Ubx, in complex with the known Hox cofactors, Homothorax (Hth) and Extradenticle (Exd), acts through low-affinity Ubx-Exd binding sites to reduce the levels of Ubx transcription in the proximal compartment. Importantly, we also reveal that Ubx-Exd-binding site mutations sufficient to result in de-repression of abx activity in a transgenic context are not sufficient to de-repress Ubx expression when mutated at the endogenous locus, suggesting the presence of multiple mechanisms through which Ubx-mediated repression occurs. Our results underscore the complementary nature of CRM analysis through transgenic reporter assays and genome modification of the endogenous locus; but, they also highlight the increasing need to understand gene regulation within the native context to capture the potential input of multiple genomic elements on gene control.
PMID: 31589607 [PubMed - as supplied by publisher]