


Myc – Everything but not ANYthing

When we look at data, do we really know the best way to analyze the data in order to obtain meaningful conclusions about the biology?

A classical method to investigate the impact of transcription factors is to analyze total transcriptional output, made possible on a global scale by our knowledge of the genomic sequence. The great advantage of this type of experiment is that it is unbiased: in analyzing the totality of gene transcription, every transcript has the opportunity to shine and no transcript is left behind. The great challenge of this technique is the enormity of the data generated in analyzing a genome’s worth of transcription.

How do we “read” the data in order to understand the biological reality that is represented by the data?

Typically what happens is that the data is “normalized”. So, if we wish to know which genes are responding to a particular transcription factor, two sets of transcripts are collected from cells, one set in the presence of the transcription factor and one set in its absence. Each set is analyzed separately for the level of all transcripts present. The data is then normalized according to genes whose expression is expected to be constant across samples, so-called “house-keeping” genes. Any transcript showing a higher or lower level in the normalized data is then much easier to spot.

This “normalization” approach works particularly well when looking for the presence of infectious disease agents. If we take an uninfected cell, and analyze the transcripts for the presence of cellular genes and viral genes, only the cellular genes are being expressed (green dots) while the viral genes are not present and not expressed (empty dots).

 

Looking at the same cell in the virally infected state, both the cellular and viral genes are being expressed (red dots).

To compare the two data sets, the data is normalized. This involves setting the level of cellular gene expression to an arbitrary 100% and then overlapping the data sets. Now the cellular genes are represented by orange dots (overlap of red and green) and the viral gene expression is obvious.

For these types of situations, this normalization approach works wonderfully, giving a good signal-to-noise ratio over the background data. A superb exponent of this approach is Joe DeRisi of UCSF.

 

However, there IS a problem in applying this type of normalization approach to other situations. Take the situation where transcript levels change across many different genes. This is represented by the diagram below, which compares the levels of transcripts from genes A-H in two different cells, where one cell in general expresses all transcripts at a higher level.

If we were to take the same “normalization” approach to compare these two sets of data as illustrated above, the result would look like this:

 

In this situation, the assumption was made that the level of transcripts from genes A and B is equivalent between the two cells. Normalizing to these genes as 100% and then comparing the rest of the data suggests that some of the other genes have a reduced transcript level and others an increased one. Representing the results of this analysis as a fold-change plot, a common way to present such data, gives the (erroneous) impression of many effects on transcript levels, both positive and negative. In fact, we know that the real difference is a global increase in transcription, which is a TOTALLY DIFFERENT biological reality.
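The problem can be reproduced with a handful of made-up numbers. The following Python snippet is a minimal sketch, not the analysis from any of the studies discussed here: the counts for genes A-H, the per-gene increases, and the function name normalize_to_reference are all hypothetical. Every gene is expressed at a higher level in cell 2, yet normalizing to genes A and B makes several genes look as if they had gone down.

```python
from statistics import mean

# Hypothetical transcript counts per cell for genes A-H (made-up numbers).
cell_1 = {"A": 100, "B": 100, "C": 200, "D": 50,
          "E": 400, "F": 80, "G": 150, "H": 300}

# Hypothetical true change: cell 2 expresses EVERY gene at a higher level,
# but not all genes increase by the same factor.
true_fold = {"A": 2.0, "B": 2.0, "C": 1.5, "D": 3.0,
             "E": 1.2, "F": 2.5, "G": 1.6, "H": 4.0}
cell_2 = {gene: cell_1[gene] * true_fold[gene] for gene in cell_1}


def normalize_to_reference(counts, reference_genes):
    """Express each gene as a percentage of the mean reference-gene level."""
    reference_level = mean(counts[gene] for gene in reference_genes)
    return {gene: 100.0 * value / reference_level
            for gene, value in counts.items()}


# Assume (wrongly, in this example) that A and B are constant across cells.
norm_1 = normalize_to_reference(cell_1, ["A", "B"])
norm_2 = normalize_to_reference(cell_2, ["A", "B"])

# Fold change as it would appear on a fold-change plot.
apparent_fold = {gene: norm_2[gene] / norm_1[gene] for gene in cell_1}

for gene in sorted(cell_1):
    print(f"{gene}: true x{true_fold[gene]:.2f}, "
          f"apparent x{apparent_fold[gene]:.2f}")
```

With these numbers, genes C, E and G have an apparent fold change below 1 even though their true transcript levels rose, because the two-fold rise in the reference genes A and B has been divided out of everything.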

So in this situation we need a new “normal”, a different method of data processing. This is accomplished by addition of a known quantity of RNA into each sample, a “spike-in” control which serves as an exogenous reference point for data normalization. The results are normalized to the exogenous controls, allowing a more accurate representation of the biological reality.
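As a rough illustration of why an exogenous reference helps, here is a minimal Python sketch using made-up numbers. It assumes that the same known amount of spike-in RNA is added per cell to each sample, and that each sample's raw signal is scaled by an unknown technical factor. The gene names, counts, factors, and the function names measure and normalize_to_spike_in are hypothetical and not taken from the studies discussed here.

```python
# Assumed known quantity of spike-in RNA added per cell (hypothetical value).
SPIKE_IN_PER_CELL = 1000.0

# Hypothetical true transcript counts per cell; cell 2 is higher for every gene.
true_cell_1 = {"A": 100.0, "C": 200.0, "E": 400.0}
true_cell_2 = {"A": 200.0, "C": 300.0, "E": 480.0}


def measure(true_counts, technical_factor):
    """Simulate a raw readout: add the spike-in, then scale everything by an
    unknown sample-specific factor (loading, labeling, detection efficiency)."""
    sample = dict(true_counts)
    sample["spike"] = SPIKE_IN_PER_CELL
    return {name: value * technical_factor for name, value in sample.items()}


def normalize_to_spike_in(measured, spike_name="spike"):
    """Rescale a sample so that its spike-in signal equals the known input."""
    scale = SPIKE_IN_PER_CELL / measured[spike_name]
    return {name: value * scale
            for name, value in measured.items() if name != spike_name}


# The two samples are measured with different (unknown) technical factors.
raw_1 = measure(true_cell_1, technical_factor=0.5)
raw_2 = measure(true_cell_2, technical_factor=0.25)

norm_1 = normalize_to_spike_in(raw_1)
norm_2 = normalize_to_spike_in(raw_2)

for gene in sorted(true_cell_1):
    raw_fold = raw_2[gene] / raw_1[gene]
    spike_fold = norm_2[gene] / norm_1[gene]
    print(f"{gene}: raw x{raw_fold:.2f}, spike-in normalized x{spike_fold:.2f}")
```

Because the spike-in goes through the same technical scaling as the cellular transcripts, dividing by its signal removes that scaling without assuming that any cellular gene stays constant. In this toy example the normalized fold changes match the true ones, so the global increase stays visible.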

 

 

To quote: “Complex techniques require complex controls” (TIBS Vol.28 No.5 May 2003). Reading the data correctly allows us to get closer to decoding biological reality.

This approach (let’s call it the “new normal” approach) was used by the group of Rick Young in a ground-breaking study published recently on the myc oncogene.

Searching PubMed for “myc” in the title of an article brings up ~8000 entries, an enormous body of scientific and medical investigation. We know it is a potent human oncogene, implicated in many cancers, and that its major role is to act as a transcription factor. But there is a huge number of reports in the literature showing that myc affects different transcripts differently in different cell types, a confusing morass of data with no clear inference.

Using the “spike-in” controls, this landmark study, together with another study published in the same journal issue, showed that c-Myc can amplify the gene expression program of cells, producing two to three times more total RNA and generating larger cells. Not only is this an important discovery, but it also puts into context much of the confusing data surrounding the transcriptional impact of myc. That is to say, we can all agree that c-myc DOES do everything, but not ANYTHING (see Notes).

 

Notes:

1. Diagrams used in this post are directly adapted from the article Revisiting Global Gene Expression Analysis (2012) Cell Vol. 151 pages 476-482.

2. “c-myc DOES do everything, but not ANYTHING”. This phrase summarizes the recent advances in understanding of myc function, but is also vague. There is still much to discover about the molecular mechanism of c-myc and other myc family members, their precise role(s) in altering the transcriptional landscape, and additional functions such as the role of cytoplasmic myc-nick. These studies can have a new focus following the discovery of Young and colleagues (chief credit goes to the study’s first authors, Peter B. Rahl, Charles Lin and Jakob Lovén), and are critical if we are to translate our new knowledge of biological reality into clinical application.

3. The previous work of Peter Rahl in Rick Young’s lab, looking at the role of c-myc in embryonic stem cells, was a key building block, providing novel insights that directly led to the new assessment of myc function in cancer. This article, and the review article from cancer biologist Gerard Evan that puts the discoveries in broader context, are recommended reading.

4. The title of this post was inspired by the work of my distinguished Cornell colleague Malcolm Bilson. If you are looking for an awesome holiday gift, you could do no better than to purchase this DVD boxed set, probably the most valuable package of scholarship available on DVD anywhere, inspiring and stimulating for all levels (even my 8-year-old was fascinated).
