[BioC] Data analysis

Naomi Altman naomi at stat.psu.edu
Thu Dec 4 23:14:10 MET 2003


I also have a data set with differing numbers of spot replications.  I used 
lme to analyze these data, gene by gene.

Basically, I wrote a little function that pulls the spot information out of 
the array, removes the flagged spots and does other data cleaning, and then 
runs lme (using "try" in case it bombs).  Then I use "split" to split the 
array data by geneID, and lapply to apply the function to every gene.
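
A minimal sketch of the workflow described above, not the actual function. It assumes a long-format table with one row per spot per array; the column names (geneID, array, treatment, logratio, flagged), the simulated data, and the model formula (treatment as a fixed effect, array as a random intercept) are all hypothetical stand-ins for whatever the real experiment calls for.

```r
library(nlme)  # provides lme()

# Simulated stand-in data: three genes with differing numbers of replicate spots.
set.seed(1)
n_arrays <- 6
reps <- c(g1 = 2, g2 = 4, g3 = 3)
spots <- do.call(rbind, lapply(names(reps), function(g) {
  expand.grid(geneID = g, spot = seq_len(reps[[g]]), array = seq_len(n_arrays),
              stringsAsFactors = FALSE)
}))
spots$array     <- factor(spots$array)
spots$treatment <- factor(ifelse(as.integer(spots$array) <= n_arrays / 2,
                                 "ctl", "trt"))
spots$logratio  <- rnorm(nrow(spots))
spots$flagged   <- runif(nrow(spots)) < 0.1   # roughly 10% of spots flagged

# Per-gene function: clean the data, then fit, trapping any failure with try().
fit_gene <- function(d) {
  d <- d[!d$flagged & is.finite(d$logratio), ]  # drop flagged or missing spots
  fit <- try(lme(logratio ~ treatment, random = ~ 1 | array, data = d),
             silent = TRUE)
  if (inherits(fit, "try-error")) return(NULL)  # NULL marks a gene that failed
  fit
}

# Split by gene and apply the function to every gene.
by_gene <- split(spots, spots$geneID)
fits    <- lapply(by_gene, fit_gene)

# Estimated treatment effect per gene (NA where the fit failed).
effects <- sapply(fits, function(f) {
  if (is.null(f)) NA_real_ else fixef(f)[["treatmenttrt"]]
})
```

Because each gene is cleaned and fitted separately, nothing here requires the same number of spots per gene, and a failed fit for one gene does not stop the loop.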

Is this slow?  Yes.  But once it is tested I just get it started on Friday 
at 5, and by Monday at 9 I have my results.
The major drawback is that I am doing a gene by gene ANOVA.  The major 
advantage is that I can safely remove flagged spots, instead of trying to 
fudge in some values to maintain the balance.

--Naomi Altman

At 11:40 PM 10/16/2003, Gordon Smyth wrote:
>At 11:53 PM 16/10/2003, Jason Skelton wrote:
>>Gordon Smyth wrote:
>>>
>>>I would use the limma commands lmFit (or lm.series or gls.series) 
>>>followed by makeContrasts, eBayes and classifyTests. See the earlier posts:
>>Thanks for this information, Gordon. I'll try this and see what results I 
>>get.
>>
>>On a different note
>>The arrays I have tested LIMMA on have 2 duplicates, spaced evenly 
>>throughout the array, and so I have no problems running your functions.
>>
>>Someone else at the Sanger Institute would like to be able to use LIMMA, 
>>but the number of duplicates for each gene differs on their array, e.g. for 
>>some genes there are two copies and for others there would be four copies 
>>or more, which in turn obviously affects the spacing between replicates.
>>I'm not sure why they would want differing numbers of copies of genes, but 
>>they would like to be able to estimate the correlation between these 
>>genes anyway and obviously see the results as one data point per merged gene.
>
>I haven't implemented this in limma because it seems to me that it might 
>invalidate the assumptions behind the duplicate correlation approach. See 
>the earlier post:
>
>https://stat.ethz.ch/pipermail/bioconductor/2003-August/002224.html
>
>>I've tried to think of how this can be done but it seems overly complex 
>>and I'm not sure if it is at all possible in R or Limma.
>>
>>I'm guessing there is no way of carrying out the correlation, series model 
>>fits etc. based simply on the "Name" specified in the GAL files?
>
>No.
>
>Cheers
>Gordon
>
>>or somehow specifying the duplicate number for each gene separately
>>and then merging this information for use as a parameter?
>>
>>I'm doubting very much that this can be done at all but it's worth 
>>asking  ;-)
>>
>>thanks
>>
>>Jason
>

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111


