[BioC] Data analysis
Naomi Altman
naomi at stat.psu.edu
Thu Dec 4 23:14:10 MET 2003
I also have a data set with differing numbers of spot replications. I used
lme to analyze these data, gene by gene.
Basically, I wrote a little function that pulls the spot information out of
the array, removes the flagged spots and does other data cleaning, and then
runs lme (using "try" in case it bombs). Then I use
"split" to split the array data by geneID, and lapply to apply the function
to every gene.
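
A minimal sketch of that workflow, not the original function. The column
names (geneID, flag, M, array), the toy data and the model formula are all
assumptions made for illustration; a real design would need its own fixed
and random effects.

```r
library(nlme)  # provides lme()

# Hypothetical per-gene function: `spots` holds all rows for one gene.
# Column names (flag, M, array) and the model itself are assumptions.
fit_gene <- function(spots) {
  # data cleaning: drop flagged spots and missing log-ratios
  spots <- spots[spots$flag == 0 & !is.na(spots$M), ]
  # try() keeps the loop running when lme fails for a gene
  try(lme(M ~ 1, random = ~ 1 | array, data = spots), silent = TRUE)
}

# Toy data: 3 genes on 4 arrays with unequal numbers of replicate spots
set.seed(1)
reps <- c(g1 = 2, g2 = 4, g3 = 3)
dat <- do.call(rbind, lapply(names(reps), function(g) {
  expand.grid(geneID = g, array = factor(1:4), rep = seq_len(reps[[g]]),
              stringsAsFactors = FALSE)
}))
dat$M <- rnorm(nrow(dat))              # simulated log-ratios
dat$flag <- rbinom(nrow(dat), 1, 0.1)  # 1 = flagged spot

by_gene <- split(dat, dat$geneID)      # one data frame per gene
fits <- lapply(by_gene, fit_gene)      # gene-by-gene fits

# Genes where the fit failed come back as "try-error" objects
failed <- sapply(fits, inherits, what = "try-error")
```

Because flagged spots are simply dropped before fitting, each gene can keep
a different number of replicates; the genes marked in `failed` can be
inspected separately afterwards.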
Is this slow? Yes. But once it is tested I just get it started on Friday
at 5, and by Monday at 9 I have my results.
The major drawback is that I am doing a gene by gene ANOVA. The major
advantage is that I can safely remove flagged spots, instead of trying to
fudge in some values to maintain the balance.
--Naomi Altman
At 11:40 PM 10/16/2003, Gordon Smyth wrote:
>At 11:53 PM 16/10/2003, Jason Skelton wrote:
>>Gordon Smyth wrote:
>>>
>>>I would use the limma commands lmFit (or lm.series or gls.series)
>>>followed by makeContrasts, eBayes and classifyTests. See the earlier posts:
>>Thanks for this information, Gordon. I'll try this and see what results I
>>get.
>>
>>On a different note
>>The arrays I have tested LIMMA on have 2 duplicates and are spaced evenly
>>throughout the array and so have no problems running your functions.
>>
>>Someone else at the Sanger Institute would like to be able to use LIMMA
>>but the number of duplicates for each gene differs on their array, e.g. for
>>some genes there are two copies and for others there would be four copies
>>or more, which in turn obviously affects spacing etc. between replicates.
>>I'm not sure why they would want differing numbers of copies of genes but
>>they would like to be able to estimate the correlation between these
>>genes anyway and obviously see the results as one data point per merged gene.
>
>I haven't implemented this in limma because it seems to me that it might
>invalidate the assumptions behind the duplicate correlation approach. See
>the earlier post:
>
>https://stat.ethz.ch/pipermail/bioconductor/2003-August/002224.html
>
>>I've tried to think of how this can be done but it seems overly complex
>>and I'm not sure if it is at all possible in R or Limma.
>>
>>I'm guessing there is no way of carrying out the correlation, series model
>>fits etc. based simply on the "Name" specified in the GAL files?
>
>No.
>
>Cheers
>Gordon
>
>>or somehow specifying the duplicate number for each gene separately
>>and somehow merging this information for use as a parameter?
>>
>>I'm doubting very much that this can be done at all but it's worth
>>asking ;-)
>>
>>thanks
>>
>>Jason
>
Naomi S. Altman
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice)
814-863-7114 (fax)
814-865-1348 (Statistics)