[BioC] Inherits(x,"data.frame") error in SamR

McGee, Monnie mmcgee at mail.smu.edu
Thu Feb 16 18:03:10 CET 2006


Dear Group,

 I trying to use samr.  I have read a previous post about the ease of use of siggenes vs. samr.  It is so true!   I used siggenes originally, but that doesn't help me with the problem I am having.  I still need to use samr because I want to assess sample size using sam.assess.samplesize.  To assess sample size using sam, I need to supply sam.assess.samplesize with a "data" vector.  I can't understand how to form this vector - perhaps a manual would help, but the manual is not on the SAM website as the R-help files claim.    I am using a PowerMac G5 with R Version 2.2.1  (2005-12-20 r36812) installed.

I would like to use samr to assess the sample size requirements for an experiment I am planning.   I have some training data, which is the drosophila spike-in experiment data given in Choe, S. E., Boutros, M., Michelson, A. M., Church, G. M., & Halfon, M. S. (2005). Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biology, 6, R16.

Here is what I have done:
gsbatch = ReadAffy() # the experiment consists of 3 technical replicates from "control" chips
# and 3 technical replicates from Spike-in chips on th DrosGenome1 chip
> gs.rma = rma(gsbatch) # get expression values
## get the exprSet into a format that samr can manage:
> gs.rma.fr = as.data.frame.exprSet(gs.rma)
> gs.mat = matrix(gs.rma.fr$exprs,nrow=14010,ncol=6)
> gs.mat.con = gs.mat[,1:3]
> gs.mat.si = gs.mat[,4:6]
> gs.mat.sam = rbind(gs.mat.con,gs.mat.si)  ## this is a matrix with dim 28020 by 3, control arrays on top, spike-ins on bottom
## grouping vector
> y = c(rep(1,14010),rep(2,14010))
> geneid = as.character(1:nrow(gs.mat.sam))
> genenames = gs.rma.fr$genenames[1:14010]
> data = list(x=gs.mat.sam, y =y , geneid = geneid, genenames = rep(genenames,2),logged2=TRUE)
> samr(data,resp.type="Two class unpaired",nperms=20)
Error in inherits(x, "data.frame") : (subscript) logical subscript too long
I also tried deleting the geneid & genenames vectors from the "data" list, but still received the same error.

I can't figure this out.  I am sure the problem is in the way that I defined the "data" list, but, without a manual, I really don't understand what I did wrong.

Thank you for your help,
Monnie

Monnie McGee, Ph.D.
Assistant Professor
Department of Statistical Science
Southern Methodist University
Ph: 214-768-2462
Fax: 214-768-4035



-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch on behalf of bioconductor-request at stat.math.ethz.ch
Sent: Thu 2/16/2006 5:00 AM
To: bioconductor at stat.math.ethz.ch
Subject: Bioconductor Digest, Vol 36, Issue 14
 
Send Bioconductor mailing list submissions to
	bioconductor at stat.math.ethz.ch

To subscribe or unsubscribe via the World Wide Web, visit
	https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
	bioconductor-request at stat.math.ethz.ch

You can reach the person managing the list at
	bioconductor-owner at stat.math.ethz.ch

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."


Today's Topics:

   1. Affy Package - MAplot Function help needed... (Joern Wessels)
   2. Re: Affy Package - MAplot Function help needed...
      (James W. MacDonald)
   3. Re: RMA normalization when using subsets of samples (Ron Ophir)
   4.  LPE (Nicolas Servant)
   5. Re: Timeseries loop design analysis using Limma or Maanova? (Pete)
   6. Re: Timeseries loop design analysis using Limma or Maanova?
      (James W. MacDonald)
   7. interpretation of vsn normalized data (Maurice Melancon)
   8. interpretation of vsn normalized data (Maurice Melancon)
   9. Fold Change values after RMA (kfbargad at ehu.es)
  10. Re: interpretation of vsn normalized data (Wolfgang Huber)
  11. Re: Fold Change values after RMA (James W. MacDonald)
  12.  ANN: BioC2006 Conference Scheduled for August in Seattle
      (Nianhua Li)
  13. Run GOHyperG without specifying a chip
      (knaxerov at ix.urz.uni-heidelberg.de)
  14.  Data Frame Error in Affycomp (John.Brozek at it-omics.com)
  15. Re: Fold Change values after RMA (Ben Bolstad)
  16. Differences: mas5/mas5calls vs. call.expr/pairwise.comparison
      (Benjamin Otto)
  17. comparing correlation coefficients (fwd) (Ilhem Diboun)
  18. Re: Differences: mas5/mas5calls vs.
      call.expr/pairwise.comparison (Benjamin Otto)
  19. GeneR (Antoine Lucas)


----------------------------------------------------------------------

Message: 1
Date: Wed, 15 Feb 2006 13:09:19 +0100
From: "Joern Wessels" <wessels at staff.uni-marburg.de>
Subject: [BioC] Affy Package - MAplot Function help needed...
To: <bioconductor at stat.math.ethz.ch>
Message-ID: <000101c63228$a93548c0$b8acf889 at pc17368>
Content-Type: text/plain;	charset="iso-8859-1"

Hi everybody,
this is my very first post to this mailing list :-) I have got a problem I
could not solve via manual or google:
When I use the Maplot function of the affy package on more than three array
data sets, the text in the fields showing Median and IQR is to big to be
shown correctly.
I tried to use the "cex.main, cex.lab, cex.axis, cex.sub but aside from
changing axis text I could not change anything. Using ?MAplot did not
produce any useable help (for me as a beginner at least).

I used the following line to get my graphs:

MAplot(Data, pairs = TRUE)

I sombody could help me with that I would be one happy R rookie user.

Thanks,
J?rn
 
 
________________
J?rn We?els
Diplom-Biologe
 
Philipps-Universit?t Marburg
Fachbereich Biologie - Tierphysiologie
Karl-von-Frisch-Str. 8
35032 Marburg an der Lahn
 
Tel.: +49 (0) 6421 28 23547
Fax: +49 (0) 6421 28 28937
Mobil: +49 (0) 170 9346198
E-Mail: wessels at staff.uni-marburg.de
Webseite:
http://cgi-host.uni-marburg.de/~omtierph/stoff/member.php?mem=wessels&lang=d
e



------------------------------

Message: 2
Date: Wed, 15 Feb 2006 09:36:39 -0500
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Subject: Re: [BioC] Affy Package - MAplot Function help needed...
To: Joern Wessels <wessels at staff.uni-marburg.de>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <43F33C77.8070705 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Joern,

Joern Wessels wrote:
> Hi everybody,
> this is my very first post to this mailing list :-) I have got a problem I
> could not solve via manual or google:
> When I use the Maplot function of the affy package on more than three array
> data sets, the text in the fields showing Median and IQR is to big to be
> shown correctly.
> I tried to use the "cex.main, cex.lab, cex.axis, cex.sub but aside from
> changing axis text I could not change anything. Using ?MAplot did not
> produce any useable help (for me as a beginner at least).
> 
> I used the following line to get my graphs:
> 
> MAplot(Data, pairs = TRUE)
> 
> I sombody could help me with that I would be one happy R rookie user.

Well, unfortunately the cex arguments for the text is hard coded, so no 
combination of cex.xxx = ? is going to change things. I will probably 
fix this so you can pass the cex argument to the function, but that will 
take a few days to propagate to the devel download repository and would 
require that you use the devel version of R. Neither of these things is 
likely to be a good thing for a rookie, so the best fix in this case is 
for you to make a modified function that will do what you want (which 
has the added benefit of helping you to learn R).

When you call MAplot() with pairs = TRUE, this function calls another 
function called mva.pairs(). If you type mva.pairs at an R prompt, it 
will be printed to the screen:

 > mva.pairs
function (x, labels = colnames(x), log.it = TRUE, span = 2/3,
     family.loess = "gaussian", digits = 3, line.col = 2, main = "MVA 
plot", ...)
{
     if (log.it)
         x <- log2(x)
     J <- dim(x)[2]
     frame()
     old.par <- par(no.readonly = TRUE)
     on.exit(par(old.par))
     par(mfrow = c(J, J), mgp = c(0, 0.2, 0), mar = c(1, 1, 1,
         1), oma = c(1, 1.4, 2, 1))
     for (j in 1:(J - 1)) {
         par(mfg = c(j, j))
         plot(1, 1, type = "n", xaxt = "n", yaxt = "n", xlab = "",
             ylab = "")
         text(1, 1, labels[j], cex = 2)
         for (k in (j + 1):J) {
             par(mfg = c(j, k))
             yy <- x[, j] - x[, k]
             xx <- (x[, j] + x[, k])/2
             sigma <- IQR(yy)
             mean <- median(yy)
             ma.plot(xx, yy, tck = 0, show.statistics = FALSE,
                 pch = ".", xlab = "", ylab = "", tck = 0, span = span,
                 ...)
             par(mfg = c(k, j))
             txt <- format(sigma, digits = digits)
             txt2 <- format(mean, digits = digits)
             plot(c(0, 1), c(0, 1), type = "n", ylab = "", xlab = "",
                 xaxt = "n", yaxt = "n")
             text(0.5, 0.5, paste(paste("Median:", txt2), paste("IQR:",
                 txt), sep = "\n"), cex = 2)
         }
     }
     par(mfg = c(J, J))
     plot(1, 1, type = "n", xaxt = "n", yaxt = "n", xlab = "",
         ylab = "")
     text(1, 1, labels[J], cex = 2)
     mtext("A", 1, outer = TRUE, cex = 1.5)
     mtext("M", 2, outer = TRUE, cex = 1.5, las = 1)
     mtext(main, 3, outer = TRUE, cex = 1.5)
     invisible()
}


You can then copy this output and paste it into your favorite text 
editor. Fix the first line to say something like this:

my.mva.pairs <- function (x, labels = colnames(x), log.it = TRUE, span = 
2/3, family.loess = "gaussian", digits = 3, line.col = 2, main = "MVA 
plot", cex.text = 2, ...)

Note the change in the function name (along with the <- ), and the 
addition of cex.text = 2.

Now change the line that reads

text(0.5, 0.5, paste(paste("Median:", txt2), paste("IQR:",
                 txt), sep = "\n"), cex = 2)

to say

text(0.5, 0.5, paste(paste("Median:", txt2), paste("IQR:",
                 txt), sep = "\n"), cex = cex.text)

Now you can either copy/paste this function back into your R session, or 
save it as something like my.mva.pairs.R and source() it into your R 
session.

You are almost there - MAplot does some pre-processing of the data first 
that you will have to do by hand. You need to extract the raw intensity 
data from your AffyBatch and pass that to your new function:

pms <- unlist(indexProbes(Data, "both"))
x <- intensity(Data)[pms, ]
my.mva.pairs(x, cex.text = 1)

Poking around in other people's code and seeing what it does/changing it 
to do slightly different things is one of the best ways IMO to learn R.

HTH,

Jim



> 
> Thanks,
> J?rn
>  
>  
> ________________
> J?rn We?els
> Diplom-Biologe
>  
> Philipps-Universit?t Marburg
> Fachbereich Biologie - Tierphysiologie
> Karl-von-Frisch-Str. 8
> 35032 Marburg an der Lahn
>  
> Tel.: +49 (0) 6421 28 23547
> Fax: +49 (0) 6421 28 28937
> Mobil: +49 (0) 170 9346198
> E-Mail: wessels at staff.uni-marburg.de
> Webseite:
> http://cgi-host.uni-marburg.de/~omtierph/stoff/member.php?mem=wessels&lang=d
> e
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor


-- 
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623



------------------------------

Message: 3
Date: Wed, 15 Feb 2006 17:05:13 +0200
From: "Ron Ophir" <ron.ophir at weizmann.ac.il>
Subject: Re: [BioC] RMA normalization when using subsets of samples
To: <bioconductor at stat.math.ethz.ch>
Message-ID: <s3f35f58.078 at wisemail.weizmann.ac.il>
Content-Type: text/plain; charset=US-ASCII

Dear all,
I think that D.R. Godstein has tried to  answer Sylvia's question in

http://ludwig-sun2.unil.ch/~darlene/ms/prRMA.pdf

Ron

>>> <Larry.Lapointe at csiro.au> 02/15/06 11:55 AM >>>
Dear Martin,

We have run up to 550 chips achieving a reasonable processing time --
not more than an hour or so.  The practical limits seem to be more
related to machine RAM and R memory management.  RMA normalization of
550 chips occupies about 12 GB or so on our quad processor Opteron-based
system.

Larry


Lawrence LaPointe
CSIRO Bioinformatics for Human Health
Sydney, Australia




-----Original Message-----
From:	bioconductor-bounces at stat.math.ethz.ch on behalf of
martin.schumacher at novartis.com 
Sent:	Wed 2/15/2006 7:43 PM
To:	bioconductor at stat.math.ethz.ch 
Cc:	
Subject:	Re: [BioC] RMA normalization when using subsets of
samples

Dear Colleagues,

Greetings from Switzerland !
I agree with the statements of Wolfgang and Adai. Using all chips will

certainly put you on the safe side. 
I wonder what you feel would be the minimal number of chips for a
"stable" 
(meaning that a larger set would give essentially the same results) RMA

processing? People from GeneLogic told me that about 20 chips are 
sufficient.
Is it possible to run RMA using Bioconductor with 200 chips and get the

results back within a reasonable time?

Best regards,
Martin






Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
Sent by: bioconductor-bounces at stat.math.ethz.ch 
15.02.2006 01:01
Please respond to ramasamy

 
        To:     Wolfgang Huber <huber at ebi.ac.uk>
        cc:     Sylvia.Merk at ukmuenster.de,
bioconductor at stat.math.ethz.ch, (bcc: Martin 
Schumacher/PH/Novartis)
        Subject:        Re: [BioC] RMA normalization when using subsets
of samples
        Category: 



This would be a problem if one or more of the resulting subsets is
small
and contains outliers.

My preference is to preprocess all arrays together. My reasoning is
that
doing this will give RMA median polish (and to a lesser extent with
the
quantile normalisation) steps much more information to work with.

Regards, Adai




On Wed, 2006-02-15 at 17:16 +0000, Wolfgang Huber wrote:
> Dear Sylvia,
> 
> this might not be the answer that you want to hear, but for the end 
> result it should not matter (substantially) which of the two 
> possibilities you take, and I would be worried if it did.
> 
> Best wishes
>   Wolfgang
> 
> -------------------------------------
> Wolfgang Huber
> European Bioinformatics Institute
> European Molecular Biology Laboratory
> Cambridge CB10 1SD
> England
> Phone: +44 1223 494642
> Fax:   +44 1223 494486
> Http:  www.ebi.ac.uk/huber 
> -------------------------------------
> 
> Sylvia.Merk at ukmuenster.de wrote:
> > Dear bioconductor list,
> > 
> > I have a question concerning RMA-normalization:
> > 
> > There are for example 200 CEL-Files and the clinicians have
several
> > research questions - each concernig only a subset of the 200
samples
> > including the possibility that some samples are included in more
than
> > one question.
> > 
> > There are two possibilities to normalize the CEL-Files: 
> > 
> > 1.: Read all 200 chips in an affybatch-object and normalize all
200
> > chips together and further analyze the required subset. 
> > 
> > 2.: Read only the required chips in an affybatch-object, normalize

these
> > chips and then perform further analysis 
> > I think that this approach is the better one but it has the 
disadvantage
> > that some samples are included in several normalizations ending in
> > different gene expression levels for a single sample.
> > 
> > What is (from a statisticians view) the appropriate approach to
> > normalize CEL-Files in this case?
> > 
> > Thank you in advance
> > Sylvia 
> >
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch 
> https://stat.ethz.ch/mailman/listinfo/bioconductor 
>

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 
https://stat.ethz.ch/mailman/listinfo/bioconductor 



	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 
https://stat.ethz.ch/mailman/listinfo/bioconductor 

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 
https://stat.ethz.ch/mailman/listinfo/bioconductor



------------------------------

Message: 4
Date: Wed, 15 Feb 2006 16:07:16 +0100
From: Nicolas Servant <Nicolas.Servant at curie.fr>
Subject: [BioC]  LPE
To: bioconductor at stat.math.ethz.ch
Message-ID: <43F343A4.9030500 at curie.fr>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed


-- 
Nicolas Servant
Equipe Bioinformatique
Institut Curie 
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE

Email: Nicolas.Servant at curie.fr
Tel: 01 44 32 42 75



------------------------------

Message: 5
Date: Wed, 15 Feb 2006 15:38:57 -0000
From: "Pete" <p.underhill at har.mrc.ac.uk>
Subject: Re: [BioC] Timeseries loop design analysis using Limma or
	Maanova?
To: <bioconductor at stat.math.ethz.ch>, "Jenny Drnevich"
	<drnevich at uiuc.edu>
Message-ID: <03d001c63245$ef9799e0$a2f1f682 at mrc153>
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
	reply-type=original

Thanks for your response,

>>Hello all,
>>
>>I have been asked to analyse a set of timecourse data with an unusual
>>incomplete loop design. This is the design of this type I have looked at
>>and I'm not entirely sure how to treat it.
>>
>>The initial (and fairly easy) question asked of the data is, what are the
>>differences between the mutant and the control animals at each timepoint?
>
> I am interested in how you are going to analyze the differences between
> mutants and controls at each time point given that there is no replication
> of the control animals (only 1 control pool). I just advised a researcher
> against this kind of experimental design because I could not think offhand
> of a way to analyze it statistically. If there is a statistically valid
> method, I would like to know about it.
>

I'm not quite sure I understand your point here? I was going to treat this 
as a simple dye swap experiment, ignoring time and comparing mutant to WT.
Is this not a statistically valid approach? There are 3 independ mutant 
samples compared in dyeswaps to the WT pool. I understand that there is no 
biological replicate for the WT pool, however it is technically replicated 
at the dyeswap level and cDNA synthesis level. The biological variation of 
the WT population is not of immediate interest in this case, hence a pool 
was used. Individual mutant samples were used instead of a pool, because 
only a limited number of mutants were available.



>
>>The second question is how the mutant changes across the timeseries. The
>>authors
>>wish to use a bayesian timeseries clustering algorithmn to analyse this, 
>>but
>>this requires a standardised measure for the mutant at each timepoint.
>
> How are you going to implement this bayesian timeseries clustering? My
> interpretation of clustering algorithms in general is that they should not
> be used to determine which genes are "differentially" expressed, but 
> rather
> one should first use a statistical model to determine differential
> expression, then only cluster the genes that show a significant difference
> somewhere along the time series to find groups of genes that show a 
> similar
> expression pattern. My approach to this situation would be something along
> the lines of a single-channel analysis using a mixed model with array + 
> dye
> + treatment + time + treatment*time, and then cluster genes that showed a
> significant time effect, using the lsmeans for each mutant*timepoint 
> group.
> The lack of replication of the controls may cause this not to work...
>
> Cheers,
> Jenny


I agree with your statement about clustering, and prehaps I didn't word my 
question very clearly. The timeseires clustering will indeed be performed on 
genes selected as differential with respect to time. In the past I have used 
the MAANOVA package to select these differential genes, however in that 
particular case, the samples were all compared back to a single reference 
sample rather than multiple references that are then compared to eachother 
in an incomplete loop.

The issue I am concerned with is how to both, select genes that have a time 
effect, and how/what to use as a standardised expression level for these 
genes so that it can then be used in the clustering.


Cheers

Pete

>
>
>
>>I am unsure quite how to achieve this second point and welcome any
>>suggestions or references that may help. Is this something I could do in
>>Limma or MAanova?
>>
>>
>>The data are from spotted, two-colour, oligo arrays. There are 6 
>>timepoints.
>>At each timepoint, tissue samples from 3 individual mutant animals are
>>compared to a control pool of WT animals at the same timepoint, with dye
>>swaps. In addition each control pool has then been compared in a dye swap 
>>to
>>the next timepoint control pool. See diagram below (if it comes out
>>correctly!) or the table further below where a1 a2 a3 represent any 3
>>individual animals.
>>
>>
>>
>>a1t1    a2t1    a3t1           a1t2    a2t2    a3t2      etc............
>>     \\        ||        //                \\        ||        //
>>      Control t1  ========= Control t2    ==== etc...............
>>
>>or
>>
>>SLIDE        CY3            CY5
>>1                a1t1            control t1
>>2                control t1    a1t1
>>3                a2t1            control t1
>>4                control t1     a2t1
>>5                a3t1            control t1
>>6                control t1    a3t1
>>7                a1t2            control t2
>>8                control t2    a1t2
>>9                a2t2            control t2
>>10                control t2        a2t2
>>11                a3t2                control t2
>>12                control t2        a3t2
>>13                a1t3                control t3
>>14                control t3        a1t3
>>15                a2t3                control t3
>>16                control t3        a2t3
>>17                a3t3                control t3
>>18                control t3        a3t3
>>19                a1t4                control t4
>>20                control t4        a1t4
>>21                a2t4                control t4
>>22                control t4        a2t4
>>23                a3t4                control t4
>>24                control t4        a3t4
>>25                a1t5                control t5
>>26                control t5        a1t5
>>27                a2t5                control t5
>>28                control t5        a2t5
>>29                a3t5                control t5
>>30                control t5        a3t5
>>31                a1t6                control t6
>>32                control t6        a1t6
>>33                a2t6                control t6
>>34                control t6        a2t6
>>35                a3t6                control t6
>>36                control t6        a3t6
>>37                control t1        control t2
>>38                control t2        control t1
>>39                control t2        control t3
>>40                control t3        control t2
>>41                control t3        control t4
>>42                control t4        control t3
>>43                control t4        control t5
>>44                control t5        control t4
>>45                control t5        control t6
>>46                control t6        control t5
>>
>>
>>Many thanks
>>
>>Pete
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at uiuc.edu
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>



------------------------------

Message: 6
Date: Wed, 15 Feb 2006 11:01:13 -0500
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Subject: Re: [BioC] Timeseries loop design analysis using Limma or
	Maanova?
To: Pete <p.underhill at har.mrc.ac.uk>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <43F35049.6000701 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Pete,

Pete wrote:

> I'm not quite sure I understand your point here? I was going to treat this 
> as a simple dye swap experiment, ignoring time and comparing mutant to WT.
> Is this not a statistically valid approach? There are 3 independ mutant 
> samples compared in dyeswaps to the WT pool. I understand that there is no 
> biological replicate for the WT pool, however it is technically replicated 
> at the dyeswap level and cDNA synthesis level. The biological variation of 
> the WT population is not of immediate interest in this case, hence a pool 
> was used. Individual mutant samples were used instead of a pool, because 
> only a limited number of mutants were available.

You can certainly do something like this, but there are some caveats. 
First, by comparing WT to mutant and ignoring time you are essentially 
looking at a main effect that might not be of much interest (hence why 
would you make the effort to do a time series?). Usually a more 
interesting question is to look for genes that are differentially 
expressed between mutant and WT at particular times, which I assume is 
why Jenny said you have no replication.

Second, when you compare biological replicates to technical replicates 
you are underestimating the true variability of the WT samples, which 
may result in apparent significance where there may have been none had 
biological replicates been used for WT samples as well. This is usually 
only a problem when you try to validate the results (using new 
biologically replicated samples), if there are many genes that fail to 
validate. Since the validation step is usually much slower and 
laborious, decreasing the number of false positives in the microarray 
step is often worth the time and effort.

Best,

Jim


-- 
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623



------------------------------

Message: 7
Date: Wed, 15 Feb 2006 09:49:12 -0800
From: Maurice Melancon <dmso12 at gmail.com>
Subject: [BioC] interpretation of vsn normalized data
To: bioconductor at stat.math.ethz.ch
Message-ID:
	<87ba8bf70602150949s6e9e2af4t5ca024b1b77d394b at mail.gmail.com>
Content-Type: text/plain

Hello All,

I used vsn to normalze my one-channel cDNA microarray experiment.  I'm sorry
if this is an elementary question (I'm not a math person) but can vsn data
be interpreted in similar fashion to log2 data, e.g. 1 log vale = 2-fold
induction?  What would be the appropriate transformation to get to either
log2 or raw data from vsn data?

Briefly, what I did was to normalize using vsn, then I used SAS to run
anovas with pairwise comparisons and anova slicing.  Using the estimate
function returns estimated differences between the reported means.  I am
seeking then to bridge the gap between these estimates and actual fold
changes.  I think this can be done, but I am unsure about how to either
reverse-transform the vsn data or how to interpret it biologically (e.g. 1
log = 2x fold change)

WIth thanks

Maurice

	[[alternative HTML version deleted]]



------------------------------

Message: 8
Date: Tue, 14 Feb 2006 16:16:49 -0800
From: Maurice Melancon <dmso12 at gmail.com>
Subject: [BioC] interpretation of vsn normalized data
To: bioconductor at stat.math.ethz.ch
Message-ID:
	<87ba8bf70602141616t5fb5275o29c41c9a950f4825 at mail.gmail.com>
Content-Type: text/plain

Hello All,

I used vsn to normalze my one-channel cDNA microarray experiment.  I'm sorry
if this is an elementary question (I'm not a math person) but can vsn data
be interpreted in similar fashion to log2 data, e.g. 1 log vale = 2-fold
induction?  What would be the appropriate transformation to get to either
log2 or raw data from vsn data?

WIth thanks

Maurice

	[[alternative HTML version deleted]]



------------------------------

Message: 9
Date: Tue, 14 Feb 2006 10:30:23 +0100 (CET)
From: kfbargad at ehu.es
Subject: [BioC] Fold Change values after RMA
To: bioconductor at stat.math.ethz.ch
Message-ID: <1759269561kfbargad at ehu.es>
Content-Type: text/plain;	charset="ISO-8859-1"


Dear List,

I have come across an article (Choudary et al. 2005, PNAS 102,15653-
15658) where they state that the FC values after RMA 
preprocessing "always" remain below a maximum of 2.0 Is this right?

I have performed some analyses using RMA, quantile normalisation and 
limma and am getting M values higher than 2, and if M = log2(FC), then 
FC values are higher than 4. What am I missing? Any comments on this?

Thanks in advance,

David



------------------------------

Message: 10
Date: Thu, 16 Feb 2006 18:16:24 +0000
From: Wolfgang Huber <huber at ebi.ac.uk>
Subject: Re: [BioC] interpretation of vsn normalized data
To: Maurice Melancon <dmso12 at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <43F4C178.6020305 at ebi.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed


Hi Maurice,

in statistics it is sometimes useful to differentiate between (a) the 
estimator and (b) the true underlying quantity that you want to estimate.

For example, if you want to estimate the expectation value of a 
symmetric distribution, you can use the mean, or the median as 
estimators. They are both correct, but depending on the data they can 
provide different, and more or less appropriate answers.

With microarrays, (b) is the fold-change, that is the change in mRNA 
abundance. The log-ratio of fluorescence intensities is a simple and 
intuitive estimator for this, but if the fluorescence intensities become 
small, this estimator can have unpleasant properties, like large 
variance. The glog-ratio (what vsn provides) is an alternative 
estimator, which avoids the variance explosion, for the price of being 
biased towards 0 when the fluorescence intensities are small.

Note that the vsn function returns glog to base e (so a glog-ratio of 1 
corresponds to an estimated fold change of exp(1) = 2.718..) while many 
other packages use log2.

  Hope this helps
  Wolfgang




Maurice Melancon wrote:
> Hello All,
> 
> I used vsn to normalze my one-channel cDNA microarray experiment.  I'm sorry
> if this is an elementary question (I'm not a math person) but can vsn data
> be interpreted in similar fashion to log2 data, e.g. 1 log vale = 2-fold
> induction?  What would be the appropriate transformation to get to either
> log2 or raw data from vsn data?
> 
> Briefly, what I did was to normalize using vsn, then I used SAS to run
> anovas with pairwise comparisons and anova slicing.  Using the estimate
> function returns estimated differences between the reported means.  I am
> seeking then to bridge the gap between these estimates and actual fold
> changes.  I think this can be done, but I am unsure about how to either
> reverse-transform the vsn data or how to interpret it biologically (e.g. 1
> log = 2x fold change)
> 
> WIth thanks
> 
> Maurice
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor


-- 
Best regards
   Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax:   +44 1223 494486
Http:  www.ebi.ac.uk/huber



------------------------------

Message: 11
Date: Wed, 15 Feb 2006 13:37:00 -0500
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Subject: Re: [BioC] Fold Change values after RMA
To: kfbargad at ehu.es
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <43F374CC.4060407 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi David,

kfbargad at ehu.es wrote:
> Dear List,
> 
> I have come across an article (Choudary et al. 2005, PNAS 102,15653-
> 15658) where they state that the FC values after RMA 
> preprocessing "always" remain below a maximum of 2.0 Is this right?

No, this is not correct. There are other factual errors in that paper as 
well, which makes me wonder if there was a breakdown in communication 
between the statisticians and those who wrote the paper.

That said, it is my understanding that fold change values in brain are 
often very small, so they may simply be trying to indicate that using a 
fold change of two is not reasonable in that context.

Best,

Jim


> 
> I have performed some analyses using RMA, quantile normalisation and 
> limma and am getting M values higher than 2, and if M = log2(FC), then 
> FC values are higher than 4. What am I missing? Any comments on this?
> 
> Thanks in advance,
> 
> David
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor


-- 
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623



------------------------------

Message: 12
Date: Wed, 15 Feb 2006 11:28:39 -0800
From: Nianhua Li <nli at fhcrc.org>
Subject: [BioC]  ANN: BioC2006 Conference Scheduled for August in
	Seattle
To: bioconductor at stat.math.ethz.ch, r-help at stat.math.ethz.ch
Message-ID: <43F380E7.9010603 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

==================================
    BioC2006
Where Software and Biology Connect
==================================

This conference will highlight current developments within and
beyond Bioconductor, a world-wide open source and open development
software project for the analysis and comprehension of genomic data.

Our goal is to provide a forum in which to discuss the use and
design of software for analyzing data arising in biology with a
focus on Bioconductor and genomic data.

Where: Fred Hutchinson Cancer Research Center
Seattle WA.

When: August 3 and 4, 2006

What:    Morning Talks: 8:30-12:00
    Afternoon Practicals: 2:00-5:00
    Thursday Evening 5:00-7:30 Posters and Wine & Cheese

Fees:    300 USD for attendees registered before July 1
    250 USD for Bioconductor package maintainers
            or FHCRC employees
    125 USD for enrolled full-time students
    
The online registration form and conference details are now available at
    http://www.bioconductor.org/BioC2006
(You will be redirected to our secure server: 
https://cobra.fhcrc.org/BioC2006



------------------------------

Message: 13
Date: Wed, 15 Feb 2006 20:41:53 -0500
From: knaxerov at ix.urz.uni-heidelberg.de
Subject: [BioC] Run GOHyperG without specifying a chip
To: bioconductor at stat.math.ethz.ch
Message-ID:
	<20060215204153.9bs7tl84f4c4s0sw at wwwmail-new.urz.uni-heidelberg.de>
Content-Type: text/plain;	charset=ISO-8859-1

Hello everyone,

here's a really trivial question, but I can't find the answer anywhere: can I
use GOHyperG() in GOstats without specifying a particular microarray chip? I'd
simply like to pass a list of EntrezGene IDs as my "total population".

Thanks!!!!
Kamila



------------------------------

Message: 14
Date: Thu, 16 Feb 2006 09:07:58 +0100
From: John.Brozek at it-omics.com
Subject: [BioC]  Data Frame Error in Affycomp
To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
Message-ID:
	<OFBF9D5073.DF6D05AD-ONC1257117.002C3B0E-C1257117.002CAD01 at genfit.com>
Content-Type: text/plain

Dear all,

I encountered the same problem described by monnie McGee using the 
Affycomp library. As adviced by the vignette I've used the "
read.newspikein" function with HG-U133A spikein data.


>library(affy)
>library(affycomp)
>spike133 <- ReadAffy(...)
>eset <- expresso(spike133, bgcorrect.method="rma", 
normalize.method="quantiles", pmcorrect.method="pmonly", 
summary.method="medianpolish")
>new.eset <- exprs(eset)
>write.table(data.frame(new.eset,check.names=FALSE),"rma-133.csv",sep=",",col.names=NA,quote=FALSE)
>read.newspikein("rma-133.csv")
Error in "[.data.frame"(s, , rownames(pData(pd))) :
        undefined columns selected

Any suggestions ?

Thanks

John Brozek

	[[alternative HTML version deleted]]



------------------------------

Message: 15
Date: Wed, 15 Feb 2006 10:15:34 -0800
From: Ben Bolstad <bmb at bmbolstad.com>
Subject: Re: [BioC] Fold Change values after RMA
To: kfbargad at ehu.es
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <1140027334.3781.19.camel at localhost.localdomain>
Content-Type: text/plain

Two points:

1. In general estimates of FC off microarrays tend to be smaller than
the truth, irrespective of processing algorithm. 

2. There is no specific reason why RMA should limit to FC values of 2.0
(and it does not do this in general, as you have observed with your own
dataset). 

In the case of Choudary et al they are studying gene expression changes
in the brain and my understanding is that these fold changes are
typically small, perhaps explaining the comment.

Ben



On Tue, 2006-02-14 at 10:30 +0100, kfbargad at ehu.es wrote:
> Dear List,
> 
> I have come across an article (Choudary et al. 2005, PNAS 102,15653-
> 15658) where they state that the FC values after RMA 
> preprocessing "always" remain below a maximum of 2.0 Is this right?
> 
> I have performed some analyses using RMA, quantile normalisation and 
> limma and am getting M values higher than 2, and if M = log2(FC), then 
> FC values are higher than 4. What am I missing? Any comments on this?
> 
> Thanks in advance,
> 
> David
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor



------------------------------

Message: 16
Date: Thu, 16 Feb 2006 10:24:34 +0100
From: "Benjamin Otto" <b.otto at uke.uni-hamburg.de>
Subject: [BioC] Differences: mas5/mas5calls vs.
	call.expr/pairwise.comparison
To: "BioClist" <bioconductor at stat.math.ethz.ch>
Message-ID: <NOEOKKCPBGIAIPPDONMGCELJCBAA.b.otto at uke.uni-hamburg.de>
Content-Type: text/plain;	charset="iso-8859-1"

Dear BioC members,

in my last calculations I noticed that the affy-package combination of
mas5() and mas5calls() results in different Present/Absent calls than the
simpleaffy-package version with call.exprs() and pairwise.comparison(). That
was the more surprising for me as I thought simpleaffy was just some wrapper
around the affy-package automating some high-level steps. A comparison of
the results with the ones returned by the affymetrix software revealed that
the simpleaffy version is nearly identical while the affy version is
different one. Is there some error in my code?
The exact commands I used were:

x <- read.affy()

#version 1:
mas <- mas5(x,sc=sometgt)
mas.call <- mas5calls(x)

#version 2:
simplemas <- call.exprs(x,"mas5",sc=sometgt)
simplemas.cmp <- pairwise.comparison(simplemas,"treatment",spots=x)


regards
Benjamin



------------------------------

Message: 17
Date: Thu, 16 Feb 2006 10:03:50 +0000 (GMT)
From: Ilhem Diboun <idiboun at biochemistry.ucl.ac.uk>
Subject: [BioC] comparing correlation coefficients (fwd)
To: bioconductor at stat.math.ethz.ch
Message-ID: <Pine.LNX.4.44.0602161001490.19097-100000 at w3pain>
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear all

I would greaty appreaciate any help with the following. Can Pearson 
correlation coefficients from data on different range or scale be 
compared??.For example, if I compute the (r) value from a pair of intensity 
ratio datasets spanning the range -10 to +10, can I compare it with an 
the (r) value from another pair of intensity ratio datasets spanning a 
different range say -5 to +5. Similarily, can I compare the (r) value from 
correlating a pair of intensity ratio datasets with that from 
correlating a pair of absolute intensity datasets (where the scale of the 
data is different). The question that I would want to address from such 
comparison is whether the ratios covary better than the raw intensities.

Please let me know if this is not clear enough...
Many thanks.



------------------------------

Message: 18
Date: Thu, 16 Feb 2006 11:21:44 +0100
From: "Benjamin Otto" <b.otto at uke.uni-hamburg.de>
Subject: Re: [BioC] Differences: mas5/mas5calls vs.
	call.expr/pairwise.comparison
To: "Benjamin Otto" <b.otto at uke.uni-hamburg.de>,	"BioClist"
	<bioconductor at stat.math.ethz.ch>
Message-ID: <NOEOKKCPBGIAIPPDONMGGELKCBAA.b.otto at uke.uni-hamburg.de>
Content-Type: text/plain;	charset="us-ascii"

I just checked the expresson values returned by the two methods. The
interesting thing is, mas5(x,sc=sometgt) yields results nearly identical to
the affymetrix ones. The simplaffy call.exprs(x,"mas5",sc=sometgt) return
values a little bit different different. Looking the resulting values it
makes no big difference if I call call.exprs() with "mas5" or "mas5-R". So
why do I get such different results? Here are my first five rows:

> smp2 <- call.exprs(x,"mas5-R",sc=sometgt)
> exprs(smp2)[1:5,]
          1026_HG-U133.CEL 62_HG-U133.CEL
1007_s_at         4.850861       4.657905
1053_at           2.438049       3.517322
117_at            4.199809       4.803548
121_at            6.758776       6.241071
1255_g_at         4.450771       1.855238

> exprs(simplemas)[1:5,]
          1026_HG-U133.CEL 62_HG-U133.CEL
1007_s_at         4.851125       4.658149
1053_at           2.438313       3.517566
117_at            4.200072       4.803792
121_at            6.759039       6.241314
1255_g_at         4.451034       1.855482


> exprs(mas)[1:5,]
          1026_HG-U133.CEL 62_HG-U133.CEL
1007_s_at        28.857236      25.244638
1053_at           5.419085      11.450371
117_at           18.376736      27.926217
121_at          108.291468      75.639653
1255_g_at        21.868322       3.618115


> log(exprs(mas))[1:5,]
          1026_HG-U133.CEL 62_HG-U133.CEL
1007_s_at         3.362361       3.228614
1053_at           1.689927       2.438022
117_at            2.911086       3.329566
121_at            4.684826       4.325981
1255_g_at         3.085039       1.285953


and here is a copy of the corresponding function calls of mas5() and
call.exprs():

> mas5
function (object, normalize = TRUE, sc = 500, analysis = "absolute",
    ...)
{
    res <- expresso(object, bgcorrect.method = "mas", pmcorrect.method =
"mas",
        normalize = FALSE, summary.method = "mas", ...)
    if (normalize)
        res <- affy.scalevalue.exprSet(res, sc = sc, analysis = analysis)
    return(res)
}

#calls.exprs
...
    else if (algorithm == "mas5-R") {
        if (is.na(method)) {
            tmp1 <- expresso(x, normalize = FALSE, bgcorrect.method = "mas",
                pmcorrect.method = "mas", summary.method = "mas")
            tmp <- affy.scalevalue.exprSet(tmp1, sc = sc)
        }
        else {
            tmp1 <- expresso(x, normalize.method = method, bgcorrect.method
= "mas",
                pmcorrect.method = "mas", summary.method = "mas")
            tmp <- affy.scalevalue.exprSet(tmp1, sc = sc)
        }
...
    else if (algorithm == "mas5") {
        tmp <- justMAS(x, tgt = sc)
        if (!do.log) {
            exprs(tmp) <- 2^exprs(tmp)
        }
...



I don't really see the difference.

regards,
benjamin

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Benjamin
> Otto
> Sent: 16 February 2006 10:25
> To: BioClist
> Subject: [BioC] Differences: mas5/mas5calls vs.
> call.expr/pairwise.comparison
>
>
> Dear BioC members,
>
> in my last calculations I noticed that the affy-package combination of
> mas5() and mas5calls() results in different Present/Absent calls than the
> simpleaffy-package version with call.exprs() and
> pairwise.comparison(). That
> was the more surprising for me as I thought simpleaffy was just
> some wrapper
> around the affy-package automating some high-level steps. A comparison of
> the results with the ones returned by the affymetrix software
> revealed that
> the simpleaffy version is nearly identical while the affy version is
> different one. Is there some error in my code?
> The exact commands I used were:
>
> x <- read.affy()
>
> #version 1:
> mas <- mas5(x,sc=sometgt)
> mas.call <- mas5calls(x)
>
> #version 2:
> simplemas <- call.exprs(x,"mas5",sc=sometgt)
> simplemas.cmp <- pairwise.comparison(simplemas,"treatment",spots=x)
>
>
> regards
> Benjamin
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>



------------------------------

Message: 19
Date: Thu, 16 Feb 2006 11:55:11 +0100
From: Antoine Lucas <antoinelucas at libertysurf.fr>
Subject: [BioC] GeneR
To: bioconductor at stat.math.ethz.ch
Message-ID: <20060216115511.1817fcb4.antoinelucas at libertysurf.fr>
Content-Type: text/plain; charset=ISO-8859-15

Dear R users,

There is a new release of package GeneR.

Briefly, GeneR package provides tools to manipulate large DNA/protein
sequences (like a whole chromosome).

By "manipulate" I mean of course extract, append, concatenate; but also
look for "word", orfs or masked positions, or to returns compositions
of nono-di-tri nucleotides.

It has been designed to work with large vectors so that it can concatenate all exons
of a chromosome at once, or the composition of same exons.

There are I/O functions to work with standard bank files like Fasta, 
Genebank and Embl.

And last, but not least we provide a complete arithmetic on "segments" i.e.
functions like "union", "intersection", "not" between two sets of segments. 
One application could be to "substract" all CDS from genes and deduce UTR
regions.

In new release we add some functions working on Embl files (to read headers, 
or features), and correct some bugs.

I hope you will enjoy it ! 

Regards,

	Antoine.

-- 
Antoine Lucas
Centre de g?n?tique Mol?culaire, CNRS
91198 Gif sur Yvette Cedex
Tel: (33)1 69 82 38 89
Fax: (33)1 69 82 38 77



------------------------------

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor


End of Bioconductor Digest, Vol 36, Issue 14
********************************************



More information about the Bioconductor mailing list