[BioC] pd.mapping250k.sty package: featureSet:fragment_length
Julie.Zhu at umassmed.edu
Fri Sep 17 19:25:28 CEST 2010
Thank you very much for the detailed information! It all makes sense.
On 9/17/10 12:58 PM, "James W. MacDonald" <jmacdon at med.umich.edu> wrote:
> Hi Julie,
> On 9/17/2010 9:53 AM, Zhu, Julie wrote:
>> Could someone please tell me whether the fragment_length in the featureSet
>> of pd.mapping250k.sty is the fragment_length of the sample? Is there
>> documentation available for looking up the meaning of each field?
> The fragment_length is the length of the restriction fragment. You could
> hypothetically have figured this out yourself by comparing the fragment
> length to the data on the NetAffx site. Unfortunately, it looks like the
> current version of the pd.mapping250k.sty package is out of date when
> compared to what NetAffx has, as the fragment length data for these two
> probesets don't agree.
> This is not true of the pd.genomewidesnp.6 package, which is what I have
> installed. So for instance,
> > dbGetQuery(con, "select fragment_length, fragment_length2, man_fsetid from featureSet limit 10;")
>    fragment_length fragment_length2    man_fsetid
> 1              395              217 SNP_A-2131660
> 2               NA              702 SNP_A-1967418
> 3              633              883 SNP_A-1969580
> 4              831              399 SNP_A-4263484
> 5              970              611 SNP_A-1978185
> 6             1508              711 SNP_A-4264431
> 7               NA              921 SNP_A-1980898
> 8               NA              243 SNP_A-1983139
> 9               NA              194 SNP_A-4265735
> 10             420              858 SNP_A-1995832
> The fragment_length and fragment_length2 data here do agree (well, at
> least the two I checked agree ;-P) with NetAffx.
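As a rough illustration of the lookup described above, the R sketch below copies the ten featureSet rows printed above into a throwaway in-memory SQLite table and queries it with the same DBI calls used in this thread. The in-memory table is a hypothetical stand-in for the real `db(pd.genomewidesnp.6)` connection, and the sketch assumes current DBI and RSQLite releases rather than the 2010 versions used here. It shows that SQL NULL comes back to R as NA, so `is null` finds the probesets with no fragment_length, and how to pull specific probesets to compare against NetAffx by hand.

```r
# Sketch only: a hypothetical in-memory copy of the ten featureSet rows
# shown above, standing in for the real pd package database connection.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

fs <- data.frame(
  fragment_length  = c(395, NA, 633, 831, 970, 1508, NA, NA, NA, 420),
  fragment_length2 = c(217, 702, 883, 399, 611, 711, 921, 243, 194, 858),
  man_fsetid = c("SNP_A-2131660", "SNP_A-1967418", "SNP_A-1969580",
                 "SNP_A-4263484", "SNP_A-1978185", "SNP_A-4264431",
                 "SNP_A-1980898", "SNP_A-1983139", "SNP_A-4265735",
                 "SNP_A-1995832"),
  stringsAsFactors = FALSE
)
# R's NA is stored as SQL NULL when the table is written
dbWriteTable(con, "featureSet", fs)

# Probesets whose fragment_length is missing (NULL in SQL, NA in R)
no_frag <- dbGetQuery(con,
  "select man_fsetid from featureSet where fragment_length is null")
print(no_frag$man_fsetid)

# Pull specific probesets to check against NetAffx by hand
hits <- dbGetQuery(con,
  "select man_fsetid, fragment_length, fragment_length2
   from featureSet
   where man_fsetid in ('SNP_A-2131660', 'SNP_A-1969580')")
print(hits)

dbDisconnect(con)
```

With a real pd package installed, the same queries would be run against `con <- db(pd.genomewidesnp.6)` instead of the in-memory copy.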
> As for the other field names, most seem clear to me. Is there one in
> particular that is not clear?
>> Some rows have NAs for most of the fields even though the allele information is
>> known. Is this expected?
> It is expected, depending on when the package was built. We are simply
> taking data from Affymetrix and re-packaging into an object that is
> easier to use, so we are dependent on the data we get from Affy. Since
> annotation of genetic data is a moving target, things are always changing.
> We only build these packages on a semi-annual basis, so we end up out of
> date quite quickly. This is a tradeoff between having the most
> up-to-date data, and having stable data packages that people can rely on.
> We do provide the functionality to build your own, so if you desire the
> most up-to-date package, you can build a personal package using the
> pdInfoBuilder package.
>> Thanks so much for your help!
>> con = db(pd.mapping250k.sty)
>> dbListFields(con, "featureSet")
>>  [1] "fsetid"          "man_fsetid"      "dbsnp_rs_id"     "chrom"
>>  [5] "physical_pos"    "strand"          "cytoband"        "allele_a"
>>  [9] "allele_b"        "gene_assoc"      "fragment_length" "dbsnp"
>> [13] "cnv"
>> dbGetQuery(con, "select * from featureSet order by fsetid desc limit 2")
>>   fsetid    man_fsetid dbsnp_rs_id chrom physical_pos strand cytoband
>> 1 238378 SNP_A-4301986   rs6989223     8      5214036      -    p23.2
>> 2 238377 SNP_A-2291495  rs11644392  <NA>           NA   <NA>     <NA>
>>   allele_a allele_b
>> 1        A        G
>> 2        A        G
>>   fragment_length dbsnp
>> 1            1667     0
>> 2              NA    NA
>> Best regards,
>> R version 2.11.1 (2010-05-31)
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> other attached packages:
>> [1] pd.mapping250k.sty_1.0.0 RSQLite_0.9-2            DBI_0.2-5
>> [4] oligo_1.12.2             oligoClasses_1.10.0      Biobase_2.8.0
>> [7] affxparser_1.20.0
>> loaded via a namespace (and not attached):
>> [1] affyio_1.16.0     Biostrings_2.16.9 IRanges_1.6.11
>> [4] splines_2.11.1    tools_2.11.1
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch