In this illustration, a transcription factor (blue) contacts DNA (green) and recruits RNA polymerase (gray) for gene transcription. The start of the gene is shown with a flash of light.
Credit: Eva Nogales, Berkeley Lab
Transcription factors are proteins that control gene expression—the degree to which specific genes are turned “on” or “off”—by binding to nearby DNA. Each transcription factor recognizes and binds to a specific sequence in the DNA alphabet (A, C, G, and T) known as a consensus site.
Although scientists have developed experimental techniques to identify consensus sites, transcription factors often bind to only a fraction of the consensus sites found in the genome. For example, the yeast transcription factor Gcn4 is predicted to bind to the sequence TGACTCA. This sequence occurs at 1,078 sites in the yeast genome, yet Gcn4 binds to fewer than a third of them.
One potential explanation is that many consensus sites are blocked by nucleosomes, sections of DNA wrapped around proteins. Another possibility is that consensus sites are imperfectly defined, such that only some of the predicted sites are true sites. To distinguish between these explanations, the Clark Lab designed a new method to determine preferred binding sites for a DNA-binding protein and applied it to Gcn4.
Their analysis confirmed that Gcn4 binds to the predicted TGACTCA sequence. However, the researchers also identified a more precise sequence: RTGACTCAY, in which R represents G or A and Y represents C or T. The RTGACTCAY sequence occurs at only 166 sites in the yeast genome, and Gcn4 binds to all 166, regardless of whether they fall within nucleosome regions. To a lesser extent, Gcn4 also binds to other variants of the TGACTCA sequence, such as RTGACTCAR and YTGACTCAY.
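To make the R and Y notation concrete, here is a minimal Python sketch of how a sequence written with these codes could be searched for in a stretch of DNA. It is an illustration only, not the Clark Lab's method: the function names and the toy sequence are hypothetical, and scanning both DNA strands is an assumption, since the article does not say how the sites were counted.

```python
import re

# Codes used in the text: R = A or G, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R"}


def motif_to_regex(motif):
    """Translate a motif such as RTGACTCAY into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def reverse_complement(motif):
    """Return the motif as it would read on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def find_sites(sequence, motif):
    """Return sorted 0-based start positions of the motif on either strand."""
    starts = set()
    # Assumption: a site counts if it appears on either strand.
    for pattern in (motif, reverse_complement(motif)):
        # The lookahead allows overlapping matches to be reported.
        regex = re.compile("(?=" + motif_to_regex(pattern) + ")")
        starts.update(match.start() for match in regex.finditer(sequence))
    return sorted(starts)


# Hypothetical toy sequence, not real yeast DNA.
toy = "CCATGACTCATGGTTGACTCAGG"
core_sites = find_sites(toy, "TGACTCA")        # the shorter consensus
extended_sites = find_sites(toy, "RTGACTCAY")  # the more precise sequence
print(core_sites, extended_sites)
```

In this toy sequence the shorter TGACTCA pattern matches twice, but only the first occurrence is flanked by an R on one side and a Y on the other, so the longer pattern matches once. The same narrowing, applied genome-wide, is what reduces 1,078 candidate sites to 166 in the study.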
The authors propose that transcription factor binding sites should be defined more precisely to allow greater insight into gene regulation.
Learn more about the Genomics and Basic Mechanisms of Growth and Development Group: https://www.nichd.nih.gov/about/org/dir/affinity-groups/GBMGD