[BioC] Read single channel GenePix in limma [was: Analyze miRNA experiment in Bioconductor]

Wed May 14 15:44:51 CEST 2008

On Wed, May 14, 2008 at 9:34 AM, Paul Geeleher <paulgeeleher at gmail.com> wrote:
> By the way, I've uploaded one of the .gpr files if it would help to have a look:
>
> http://frink.nuigalway.ie/~pat/2007-02-19_130_0532.gpr

Hi, Paul.  I'm not sure what the "ID" column represents, but it
appears that each miR could be represented by several IDs.  If each ID
represents a unique probe sequence, then I think you could summarize the
replicate spots that share an ID.  I'm not sure that I would suggest going
further and summarizing all the IDs for each miR, as each sequence will
likely have different hybridization characteristics.
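
Here is a minimal sketch of what averaging replicate spots by ID could look
like, using base R only.  The matrix, the ID labels and the array names are
made-up toy values (not taken from your .gpr file), and I am assuming the
values have already been normalized and sit in a spots-by-arrays matrix:

```r
# Hypothetical toy data: rows are spots, columns are arrays.
# 'id' stands in for the "ID" column of the GPR file; all values are invented.
expr <- matrix(c(8.1, 8.3, 5.0, 5.2, 9.9,
                 8.0, 8.4, 5.1, 5.3, 9.7),
               ncol = 2,
               dimnames = list(NULL, c("array1", "array2")))
id <- c("A", "A", "B", "B", "C")

# Sum the spots within each ID (rowsum sorts the groups by name) ...
sums <- rowsum(expr, group = id)

# ... then divide each row by the number of spots carrying that ID.
n <- as.vector(table(id)[rownames(sums)])
summarized <- sums / n

print(summarized)
```

The result has one row per ID, so it could be passed on to lmFit() in place
of the spot-level matrix.  If your limma version has avereps(), it should do
much the same job in one call.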

All that said, I have not used Exiqon arrays, so I am just guessing
about how they are designed.

Sean

> On Wed, May 14, 2008 at 1:29 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Wed, May 14, 2008 at 7:54 AM, Paul Geeleher <paulgeeleher at gmail.com> wrote:
>>> Hi Gordon,
>>>
>>> Thanks for your email. I've followed your steps and am getting some output now.
>>>
>>> One problem though. When should the summarization step occur? What I
>>> mean is that, between miRNA and control signals, my GPR file contains
>>> about 3000 entries and when I am done with analysis topTable will
>>> return all of these individually. But many of the miRNAs have multiple
>>> entries in the ".gpr" file. So how, and when, should I go about
>>> combining these into one value?
>>
>> Paul,
>>
>> What is the manufacturer of these arrays?  The summarization method
>> may depend on that somewhat.
>>
>> Sean
>>> On Sun, May 11, 2008 at 4:59 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>> Dear Paul,
>>>>
>>>>  The limma User's Guide doesn't discuss how to read single channel data, but
>>>> how to do this has been described half a dozen times on this mailing list.
>>>> Since limma is designed for two colours, you can fool it by giving two
>>>> column names and ignoring the one you don't need.  If you only have the Cy3
>>>> channel foreground for example you might use
>>>>
>>>>   Cy3 <- "F532 Mean"
>>>>   RG <- read.maimages(files, source="genepix", columns=list(R=Cy3, G=Cy3))
>>>>
>>>>  then
>>>>
>>>>   RG$R <- NULL
>>>>
>>>>  to remove the extraneous values.
>>>>
>>>>  Then RG$G could be given as input to vsnMatrix() and the output analysed
>>>> with lmFit().
>>>>
>>>>  Please don't edit your GenePix files manually, there's no need.  It's prone
>>>> to introducing errors and is non-reproducible.
>>>>
>>>>  The error message "number of items to replace is not a multiple of
>>>> replacement length" is not caused by having only one channel.  limma gives a
>>>> far more informative message in that case.  The most likely explanation is
>>>> that your GenePix files are not of equal lengths.  If that is indeed the
>>>> problem, then the limma package doesn't offer any easy solution.  Your only
>>>> approach would be to read the files in individually, then align the
>>>> expression values yourself.
>>>>
>>>>  You cannot use read.maimages() with source="imagene" because you do not
>>>> have ImaGene files.
>>>>
>>>>  Best wishes
>>>>  Gordon
>>>>
>>>>
>>>>
>>>> > Date: Fri, 9 May 2008 15:54:39 +0100
>>>> > From: "Paul Geeleher" <paulgeeleher at gmail.com>
>>>> > Subject: Re: [BioC] Analyze miRNA experiment in Bioconductor
>>>> > To: "Wolfgang Huber" <huber at ebi.ac.uk>
>>>> > Cc: bioconductor at stat.math.ethz.ch
>>>> >
>>>> > Doesn't seem to be anything in the users guide specific to this kind
>>>> > of analysis unfortunately.
>>>> >
>>>> > -Paul
>>>> >
>>>> > On Thu, May 8, 2008 at 10:31 AM, Wolfgang Huber <huber at ebi.ac.uk> wrote:
>>>> >
>>>> > > Dear Paul,
>>>> > >
>>>> > >
>>>> > > > Hmm interesting. I might try introducing the extra columns into the
>>>> > > > files and specifying all the values as 0. I can't see why that
>>>> > > > shouldn't work?
>>>> > > >
>>>> > >
>>>> > > It might, but Narendra's suggestion of reading the limma users guide is
>>>> > > a worthwhile option to consider.
>>>> > >
>>>> > >  Best wishes
>>>> > >       Wolfgang
>>>> > >
>>>> > > ------------------------------------------------------------------
>>>> > > Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>>>> > >
>>>> > >
>>>> > > >
>>>> > > > -Paul
>>>> > > >
>>>> > > > On Wed, May 7, 2008 at 1:39 PM, Narendra Kaushik
>>>> > > > <kaushiknk at cardiff.ac.uk> wrote:
>>>> > > >
>>>> > > > >
>>>> > > > > You can specify your red channel like this:
>>>> > > > >
>>>> > > > >  RG <- read.maimages(files, source="genepix",
>>>> > > > >    columns=list(R="F635 Median", G="F532 Median", Rb="B635", Gb="B532"))
>>>> > > > >
>>>> > > > >  I suggest you read the limma guide.
>>>> > > > >
>>>> > > > >  But I think you have data from the ImaGene package, which gives one
>>>> > > > >  file for each channel, in which case you can use:
>>>> > > > >
>>>> > > > >  files <- targets[,c("FileNameCy3","FileNameCy5")]
>>>> > > > >  RG <- read.maimages(files, source="imagene")
>>>> > > > >
>>>> > > > >  Hope this helps
>>>> > > > >
>>>> > > > >  Narendra
>>>> > > > >
>>>> > > > > >>> "Paul Geeleher" <paulgeeleher at gmail.com> 07/05/2008 13:24:01 >>>
>>>> > > > >
>>>> > > > >
>>>> > > > > Hi Deepayan,
>>>> > > > >
>>>> > > > >  Thanks for your reply. I suppose my main concern is how I should read
>>>> > > > >  in the data initially in order to be able to use the normal tools to
>>>> > > > >  analyze the data. Reading the data normally like this:
>>>> > > > >
>>>> > > > >  RG <- read.maimages( files, source="genepix")
>>>> > > > >
>>>> > > > >  Gives the following error:
>>>> > > > >
>>>> > > > >  Error in RG[[a]][, i] <- obj[, columns[[a]]] :
>>>> > > > >  number of items to replace is not a multiple of replacement length
>>>> > > > >
>>>> > > > >
>>>> > > > >  I'm assuming this is down to the fact that the files only contain
>>>> > > > >  intensity data for one color rather than two?
>>>> > > > >
>>>> > > > >  How should I go about reading the data?
>>>> > > > >
>>>> > > > >  Thanks a lot,
>>>> > > > >
>>>> > > > >  -Paul.
>>>> > > > >
>>>> > > > >  On Tue, May 6, 2008 at 10:15 PM, Deepayan Sarkar
>>>> > > > >  <deepayan.sarkar at gmail.com> wrote:
>>>> > > > > > On 5/6/08, Paul Geeleher <paulgeeleher at gmail.com> wrote:
>>>> > > > > > > Dear Members,
>>>> > > > > > >
>>>> > > > > > >  I've inherited a bunch of GenePix files from an miRNA experiment.
>>>> > > > > > >  They are single color arrays (as opposed to 2 color as is the norm
>>>> > > > > > >  for GenePix I think). There is a subset of 7 arrays and I wish to
>>>> > > > > > >  compare a group of 4 of these to the other group of 3 and analyze
>>>> > > > > > >  differential expression between the two groups. I was hoping
>>>> > > > > > >  somebody could point me in the right direction of how I'd go about
>>>> > > > > > >  doing this with Bioconductor? Is it possible using the Limma
>>>> > > > > > >  package? Is there any code out there to assist me?
>>>> > > > > > >
>>>> > > > > > >  I've experience in analyzing Affymetrix data using Limma and PUMA,
>>>> > > > > > >  but not GenePix, and the Limma Users Guide seems to focus on
>>>> > > > > > >  analyzing two dye experiments.
>>>> > > > > >
>>>> > > > > >  Any analysis ultimately boils down to some sort of normalization, and
>>>> > > > > >  the actual differential expression analysis. The second part in limma
>>>> > > > > >  (lmFit, etc.) can work with any expression matrix, irrespective of
>>>> > > > > >  whether it's 2-color or 1-color (or affy).
>>>> > > > > >
>>>> > > > > >  We have been working with a miRNA array dataset recently, and we used
>>>> > > > > >  limma to read in the GPR files and do the differential expression
>>>> > > > > >  analysis (on one channel). For normalization, many of the standard
>>>> > > > > >  microarray algorithms probably don't make much sense, but VSN seems
>>>> > > > > >  to work fine.
>>>> > > > > >
>>>> > > > > >  We don't really have code (beyond what's already in limma and vsn)
>>>> > > > > >  that is generally useful; most of the work is in figuring out which
>>>> > > > > >  rows are of interest (i.e., those representing human miRNAs),
>>>> > > > > >  combining the replicates (you seem to have four of each), etc. I'm
>>>> > > > > >  happy to give you more details if you are interested.
>>>> > > > > >
>>>> > > > > >  -Deepayan
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor