[BioC] problems with pd.genomewidesnp.6
MacDonald, James
jmacdon at med.umich.edu
Wed Dec 21 14:56:44 CET 2011
Hi Sebastian,
On 12/20/11 6:01 PM, Sebastian Thieme wrote:
> Hi all,
>
> I have some problems with the pd.genomewidesnp.6 package and I hope
> someone can help me. The info from
> get(objects("package:pd.genomewidesnp.6")) is
>
> #Class........: AffySNPCNVPDInfo
> #Manufacturer.: Affymetrix
> #Genome Build.: HG19
> #Chip Geometry: 2572 rows x 2680 columns
>
> I want to match the man_fsetid of each probe to one gene. To do that, I
> look in the gene_assoc field and take the gene with the minimum distance
> to the respective probe as the corresponding gene. My commands for
> getting the raw information are:
>
> snp.f<- dbGetQuery(con6, "select * from featureSet")
> snp.f<- snp.f[,c("fsetid","man_fsetid","chrom","physical_pos","strand","cytoband","gene_assoc")]
>
> cn.f<- dbGetQuery(con6, "select * from featureSetCNV")
> cn.f<- cn.f[,c("fsetid","man_fsetid","chrom","chrom_start","strand","cytoband","gene_assoc")]
>
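> # rbind() on data frames needs matching column names, so rename chrom_start first
> names(cn.f)[names(cn.f) == "chrom_start"]<- "physical_pos"
> snp6.f<- rbind(snp.f,cn.f)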
>
> and then process the gene_assoc field. The problem is that gene_assoc
> lists genes that are not on the same chromosome as the respective
> probes, e.g.
>
> fsetid man_fsetid chrom physical_pos strand cytoband
> 650443 CN_618877 12 93793083 - q22
> gene_assoc
> ENST00000358888 // upstream // 315610 // Hs.112553 // RPL41 // 6171
> //ribosomal protein L41 /// ENST00000318066 // downstream // 8981 //
> Hs.524630 // UBE2N // 7334 // ubiquitin-conjugating enzyme E2N (UBC13
> homolog, yeast) /// NR_002212 // exon // 0 // --- // NUDT4P1 // 440672
> // nudix (nucleoside diphosphate linked moiety X)-type motif 4
> pseudogene 1 /// NM_199040 // CDS // 0 // Hs.506325 // NUDT4 // 11163
> // nudix (nucleoside diphosphate linked moiety X)-type motif 4
> ///NM_019094 // CDS // 0 // Hs.506325 // NUDT4 // 11163 // nudix
> (nucleoside diphosphate linked moiety X)-type motif 4
>
> The gene "NUDT4P1" is annotated on chromosome 1, not 12, and this is
> only one example. Another example is
In what build is that true? UCSC claims that NUDT4 and NUDT4P1 are
overlapping, on chr12 (hg19).
Anyway, the larger point here is what a SNP is and how SNPs are
localized. Essentially, a SNP is a single base that has been found to
vary with a certain frequency in a population. SNPs are localized by
their flanking sequence, which means that when a gene has a pseudogene
(which may or may not be on the same chromosome), you will see the same
flanking sequence in both places and cannot reliably say where the SNP
is really located.
Since DNA chips work by binding to the SNP and its flanking sequence,
you cannot say whether you have measured the gene, the pseudogene, or
some combination thereof.
Listing all possibilities for the SNP location is therefore not a
'problem', it just reflects our lack of precision.
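To make the ambiguity concrete, here is a minimal sketch in base R that
splits the gene_assoc string quoted above for CN_618877 into one row per
annotated transcript and then applies the minimum-distance rule. The
helper parse_gene_assoc() and its column names are hypothetical (they are
not part of pd.genomewidesnp.6 or oligo), and it assumes each record has
seven fields separated by "//", with records separated by "///".

```r
# Hypothetical helper: split one gene_assoc string into a data frame
# with one row per annotated transcript.
parse_gene_assoc <- function(x) {
  # Records are separated by "///", fields within a record by "//".
  records <- strsplit(x, "///", fixed = TRUE)[[1]]
  fields <- lapply(records, function(r) {
    trimws(strsplit(r, "//", fixed = TRUE)[[1]])
  })
  # Keep only records that have the seven fields assumed here.
  fields <- fields[lengths(fields) == 7]
  if (length(fields) == 0) return(data.frame())
  out <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(out) <- c("transcript", "relation", "distance", "unigene",
                  "symbol", "entrez_id", "description")
  out$distance <- as.numeric(out$distance)
  out
}

# The gene_assoc value for CN_618877, rejoined from the wrapped lines above.
gene_assoc <- paste(
  "ENST00000358888 // upstream // 315610 // Hs.112553 // RPL41 // 6171 // ribosomal protein L41",
  "ENST00000318066 // downstream // 8981 // Hs.524630 // UBE2N // 7334 // ubiquitin-conjugating enzyme E2N (UBC13 homolog, yeast)",
  "NR_002212 // exon // 0 // --- // NUDT4P1 // 440672 // nudix (nucleoside diphosphate linked moiety X)-type motif 4 pseudogene 1",
  "NM_199040 // CDS // 0 // Hs.506325 // NUDT4 // 11163 // nudix (nucleoside diphosphate linked moiety X)-type motif 4",
  "NM_019094 // CDS // 0 // Hs.506325 // NUDT4 // 11163 // nudix (nucleoside diphosphate linked moiety X)-type motif 4",
  sep = " /// "
)

hits <- parse_gene_assoc(gene_assoc)

# Minimum-distance rule: keep every record tied for the smallest distance.
nearest <- hits[hits$distance == min(hits$distance), ]
nearest_symbols <- unique(nearest$symbol)
print(nearest_symbols)
```

For this probe the smallest distance is 0 and it is shared by NUDT4P1 and
NUDT4, so the rule does not single out one gene. It is probably safer to
keep all tied candidates than to pick one arbitrarily.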
Best,
Jim
> fsetid man_fsetid chrom physical_pos strand cytoband
> 186938 SNP_A-4227519 12 31784081 - p11.21
>
> gene_assoc
> ENST00000294419 // upstream // 14576 // Hs.10862 // AK3L1 // 205 //
> adenylate kinase 3-like 1 /// ENST00000412352 // upstream // 16012 //
> Hs.585084 // C12orf72 // 254013 // chromosome 12 open reading frame 72
> /// NM_013410 // upstream // 14564 // Hs.10862 // AK3L1 // 205 //
> adenylate kinase 3-like 1 /// NM_001135864 // upstream // 16012 //
> Hs.585084 // C12orf72 // 254013 // chromosome 12 open reading frame 72
>
> AK3L1 is annotated on chromosome 9, not 12. The corresponding Ensembl
> ID (ENST00000294419) is mapped to AK4-201, which is annotated on
> chromosome 1. These are only two examples; there are a lot more. Can
> someone help?
>
>
> best regards
>
> Basti
>
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826