[BioC] degraded RNA and background correction
Jenny Drnevich
drnevich at uiuc.edu
Tue Feb 28 20:43:53 CET 2006
Hi everyone,
I have an interesting situation that involves samples with degraded RNA. I
would like to get some comments on it, and a search of the archives
indicated that an example of what degraded RNA looks like on an Affy chip
would be useful for others.
The samples are from E. coli, which has notoriously unstable RNA with
half-lives on the order of minutes. When grown in anaerobic conditions, the
RNA becomes even more unstable. Many of the total RNA samples that our core
facility was receiving from the researchers were degraded, even though they
were supposedly fine after extraction. For a variety of reasons, our
core decided to label and hyb a sample that was completely degraded
according to a Bioanalyzer. We eventually were able to get non-degraded
total RNA for all of the samples. When I compare the degraded sample to
the other samples, it is indeed an outlier, but it has HIGHER pm and mm
signals than the non-degraded samples. The density plots show an
interesting bimodal distribution for all the samples, and a disturbing
trend towards the shape of the degraded sample. I say disturbing because
the samples grown in anaerobic conditions are closer to the degraded sample
than the samples grown in aerobic conditions. The pm, mm and both density
plots can be seen here (the degraded sample is the green line with the
largest right-hand peak): ftp://ftp.biotec.uiuc.edu/pub/Ecoli_figures/
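
To make the density comparison concrete, here is a minimal Python sketch of a histogram-based density estimate on log2 probe intensities. It is not the code behind the plots at the link above; the function name `log2_density` and the bin count are assumptions chosen for illustration, and a smoothed kernel density would normally be used instead of raw bins.

```python
import math

def log2_density(intensities, n_bins=20):
    """Histogram-based density estimate of log2 intensities.

    Returns (bin_centers, densities); the densities integrate to 1.
    Hypothetical helper, for illustration only.
    """
    logs = [math.log2(x) for x in intensities if x > 0]
    lo, hi = min(logs), max(logs)
    # Fall back to a unit width if every value is identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in logs:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    dens = [c / (len(logs) * width) for c in counts]
    return centers, dens
```

Running this once per array for the pm and mm intensities and overlaying the curves gives the kind of comparison described above; a bimodal array shows two separated groups of non-empty bins.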
Oddly, when I look at the RNA digestion plot (which may or may not show
degradation, according to what I found in the archives), all of the samples,
including the degraded sample, have flat slopes; only a couple had p<0.05,
and they had slightly negative slopes; see the above link for the plot (degraded
sample in green, anaerobic samples in red; plotting was done with
transform="neither").
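
The idea behind the RNA digestion plot can be sketched as follows: average the log2 PM intensity at each probe position (ordered 5' to 3') across probe sets, then fit a straight line through those means. This is a simplified Python illustration under stated assumptions, not the affy package's implementation, which likely differs in detail (for example in how the means are scaled and how the p-value is computed). The names `position_means` and `slope` are hypothetical, and no shifting or scaling is applied.

```python
import math

def position_means(probe_sets):
    """Mean log2 intensity at each probe position.

    probe_sets: list of lists of PM intensities, each ordered 5' -> 3'
    and all of the same length (an assumption of this sketch).
    """
    n_pos = len(probe_sets[0])
    return [
        sum(math.log2(ps[k]) for ps in probe_sets) / len(probe_sets)
        for k in range(n_pos)
    ]

def slope(y):
    """Ordinary least-squares slope of y against positions 0..len(y)-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx
```

A flat slope means the average signal does not change systematically from the 5' end to the 3' end of the probe sets, which is the pattern reported above for all of the samples.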
What also surprises me is the huge difference between gc-based background
correction of GCRMA and either of the background corrections used by RMA or
MAS5 (I've traced it to the background correction). When the gc-based
background correction is followed by median polish summarization
(no normalization), the degraded sample has, as expected, almost no signal,
but using either RMA or MAS5 (without normalization) results in the
degraded sample having the HIGHEST signals. And the closer a sample's raw
distribution was to the degraded sample, the more it followed the same
pattern for background correction (see the above link for boxplots; the
degraded sample is the last one on the right, #423; even numbers are
aerobic samples, odd are anaerobic samples).
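
For readers unfamiliar with the summarization step mentioned above, here is a minimal Python sketch of Tukey's median polish applied to one probe set, with rows as probes and columns as arrays, on background-corrected log2 values. It is an illustration under assumptions, not the code used by the Bioconductor packages; the function names, iteration limit and tolerance are hypothetical choices.

```python
from statistics import median

def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey median polish on a rows x cols matrix of log2 values.

    Returns (overall, row_effects, col_effects, residuals) such that
    matrix[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep out row (probe) medians.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            for j in range(n_cols):
                resid[i][j] -= row_med[i]
            row_eff[i] += row_med[i]
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep out column (array) medians.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for j in range(n_cols):
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
            col_eff[j] += col_med[j]
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop once a full sweep no longer changes anything.
        if max(abs(v) for v in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid

def array_summaries(matrix):
    """Per-array expression value: overall level plus the array effect."""
    overall, _, col_eff, _ = median_polish(matrix)
    return [overall + c for c in col_eff]
```

Because the summary for each array is the overall level plus that array's column effect, a background correction that shifts all probes of one array up or down carries straight through to the summarized value, which is consistent with the difference being traced to the background step.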
Based on the behavior of the degraded sample, I would say that the gc-based
background correction is the one to use. It also appears that all of the
aerobic samples, but only 3 of the anaerobic samples (417, 419, 423B) have
relatively little degradation, and the rest of the anaerobic samples are
severely degraded. Re-doing these samples is unfortunately NOT an option at
this point. The researchers want me to go ahead with the statistical
analysis, even though I have told them that any changes will be primarily
driven by degradation, and not real expression levels. I probably should
not use any normalization method because the assumption of few changes is
not met.
Would you agree with my conclusions above, or do you have alternative
interpretations and suggestions? Is it possible that prokaryotic RNA and
its terminal labeling method are different enough from eukaryotic RNA and
the biotin-labeled nucleotide labeling method that these functions
(particularly gc-based bg correction) should not be used?
Thanks in advance,
Jenny
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu