[BioC] annotation files for Agilent: a bit off-topic

Wed Jul 18 01:20:35 CEST 2007

Hi Weiwei,

Look at the $genes element of the object returned by read.maimages.
Hopefully it'll have what you need.

Otherwise, you'll have to depend on the data provider. It's a custom
chip and only they will have the annotations for it.

Francois

On Tue, 2007-07-17 at 19:04 -0400, Weiwei Shi wrote:
> Hi, Francois:
> First, thanks for the detailed reply.
> 
> The matching is done, and only ~7,700 of the ~10,100 probes matched
> (I assume those are the ones starting with A_).
> 
> However, some probe IDs look like this:
> > tail(x0, 10)
>  [1] "A_24_P913609" "Hs345093.1"   "A_23_P144999" "A_23_P399001"
>  [5] "A_23_P340617" "A_32_P104088" "A_32_P34372"  "A_23_P62764"
>  [9] "Hs132898.3"   "A_32_P370539"
> 
> Since it is a custom array, I think they might be using UniGene IDs(?),
> but what's the ".3"? Should it be Hs.132898? I'm confused!
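A minimal base-R sketch of separating the standard Agilent probe names (the A_ prefix) from everything else in a listing like the one above. The vector `ids` is just a stand-in holding the ten IDs shown, and treating every non-A_ ID as a custom probe that needs the provider's annotation is an assumption, not something Agilent documents here.

```r
# Stand-in for x0: the ten probe IDs from the tail(x0, 10) output above
ids <- c("A_24_P913609", "Hs345093.1", "A_23_P144999", "A_23_P399001",
         "A_23_P340617", "A_32_P104088", "A_32_P34372", "A_23_P62764",
         "Hs132898.3", "A_32_P370539")

# Assumption: standard Agilent probe names start with "A_";
# anything else is treated as a custom probe of unknown origin
is_agilent <- grepl("^A_", ids)

# Tally the two groups and list the IDs that need the provider's annotation
print(table(ifelse(is_agilent, "A_ probe", "other")))
print(ids[!is_agilent])
```

Running the same `grepl()` on the full x0 vector would give a quick count of how many probes cannot be looked up in Agilent's own annotation files.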
> 
> FeatureExtractor_DesignFileName gives
> D:\Array_Data\Kinder-Onko\Design Files
> KinderOnko\Custom_Final_280904\012714_d_20040819.xml
> 
> Is that right?
> 
> To be honest, I hate it when people provide data w/o good annotation :(
> 
> Kinda asking us to play the guessing game.
> 
> Best,
> 
> Weiwei
> 
> 
> 
> On 7/17/07, Francois Pepin <fpepin at cs.mcgill.ca> wrote:
> > Hi Weiwei,
> >
> > I'd assume the last one:
> >
> > 1st is a very old chip
> > 2nd & 3rd are for CGH, not expression
> > 4th is their basic human whole-genome expression chip.
> >
> > Keep in mind that they now have 4x44 arrays that use the same
> > non-control probes but have them in different positions. If the chips
> > were purchased recently, they are likely the 4x44 ones, as they end up
> > being a lot cheaper.
> >
> > The quick and dirty way of finding out: look in the Feature Extraction
> > output file. You'll see a column called FeatureExtractor_DesignFileName
> > in the header, whose value should be a file name that looks like
> > 014868_D_F_20060807.xml. The first part (014868) gives the chip type
> > (design ID, actually), while the last part (20060807) gives the
> > annotation release date.
> > Then go to http://www.chem.agilent.com/cag/bsp/array_list.asp and search
> > in the list. In this case, you'd see this is the 4x44 whole genome mouse
> > chip.
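A minimal base-R sketch of the first step of that lookup, pulling the design ID and annotation date out of a FeatureExtractor_DesignFileName value. `parse_design_file()` is a hypothetical helper, and it assumes the file name always follows the `<designID>_..._<YYYYMMDD>.xml` pattern of the example above.

```r
# Hypothetical helper: split a FeatureExtractor_DesignFileName value into
# design ID and annotation release date, assuming <designID>_..._<YYYYMMDD>.xml
parse_design_file <- function(path) {
  # Drop any directory part; handles both Windows (\) and Unix (/) separators
  fname <- sub(".*[\\\\/]", "", path)
  pattern <- "^([0-9]+)_.*_([0-9]{8})\\.xml$"
  if (!grepl(pattern, fname, ignore.case = TRUE)) {
    stop("unexpected design file name: ", fname)
  }
  list(
    design_id = sub(pattern, "\\1", fname, ignore.case = TRUE),
    date      = as.Date(sub(pattern, "\\2", fname, ignore.case = TRUE),
                        format = "%Y%m%d")
  )
}

info <- parse_design_file("014868_D_F_20060807.xml")
print(info)
```

The design ID is what you would then search for in Agilent's array list. Applied to a name like 012714_d_20040819.xml, the same pattern gives design ID 012714 and date 2004-08-19.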
> >
> > There is a Bioconductor package for the human whole genome chips:
> > hgug4112a. This does not include any control probes, so it should
> > work with both the 1x44 and 4x44.
> >
> > Also, read.maimages should grab the gene annotation that the Feature
> > Extraction software includes in its output. It might be out of date,
> > but it should help you keep going.
> >
> > Hope this helps,
> >
> > Francois
> >
> > On Tue, 2007-07-17 at 17:43 -0400, Weiwei Shi wrote:
> > > I am doing the latter now because I don't know the answer to the
> > > former question. The data provider is sloooooowwww to reply.
> > >
> > > On 7/17/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> > > > Weiwei Shi wrote:
> > > > > Hi, there:
> > > > >
> > > > > I know this is a bit off-topic, but I hope someone has knowledge to share:
> > > > >
> > > > > I found 4 zipped files about annotation from agilent:
> > > > >
> > > > > Human 1A(v2)
> > > > > Human Genome CGH 44A
> > > > > Human Genome CGH 44B
> > > > > Human Genome, Whole
> > > > >
> > > > > I assume I can use the last one for my arrays, but without knowing
> > > > > the difference between them, I am not quite sure.
> > > >
> > > > You will need to find out what platform your arrays use or do some probe
> > > > ID matching between your arrays and the annotation packages.  The former
> > > > is preferred.
> > > >
> > > > Sean
> > > >
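A minimal sketch of the probe ID matching Sean suggests, using base R's `%in%`. Both vectors are made-up stand-ins: in practice the array IDs might come from the $genes element returned by read.maimages and the annotation IDs from one of the Agilent annotation files, so every name here is hypothetical.

```r
# Hypothetical stand-ins: probe IDs read from the arrays, and probe IDs
# listed in one candidate annotation file
array_ids <- c("A_23_P144999", "A_23_P399001", "Hs345093.1", "A_32_P104088")
annot_ids <- c("A_23_P144999", "A_23_P399001", "A_24_P913609")

# Keep unique IDs so replicate spots on the array are counted once
array_ids <- unique(array_ids)
matched <- array_ids %in% annot_ids

cat(sum(matched), "of", length(array_ids), "probe IDs found in the annotation\n")
print(array_ids[!matched])  # IDs this annotation does not cover
```

Repeating the comparison against each candidate annotation file gives a rough indication of which platform fits best. If no file covers a large share of the IDs, the design is probably custom and only the provider's annotation will do.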
> > >
> > >
> >
> >
> 
>