[BioC] pd.mapping250k.sty package: featureSet:fragment_length

James W. MacDonald jmacdon at med.umich.edu
Fri Sep 17 18:58:42 CEST 2010


Hi Julie,

On 9/17/2010 9:53 AM, Zhu, Julie wrote:
> Hi,
>
> Could someone please tell me whether the fragment_length in the featureSet
> of pd.mapping250k.sty is the fragment_length of the sample? Is there
> documentation available for looking up the meaning of each field?

The fragment_length is the length of the restriction fragment. In
principle you can confirm this by comparing the fragment lengths in the
package with the data on the netaffx site. Unfortunately, it looks like
the current version of the pd.mapping250k.sty package is out of date
compared to what netaffx has: the fragment length data for the two
probesets in your example don't agree.

This is not true of the pd.genomewidesnp.6 package, which is what I have 
installed. So for instance,

 > library(pd.genomewidesnp.6)
 > con <- db(pd.genomewidesnp.6)
 > dbGetQuery(con, "select fragment_length, fragment_length2, man_fsetid from featureSet limit 10;")
    fragment_length fragment_length2    man_fsetid
1              395              217 SNP_A-2131660
2               NA              702 SNP_A-1967418
3              633              883 SNP_A-1969580
4              831              399 SNP_A-4263484
5              970              611 SNP_A-1978185
6             1508              711 SNP_A-4264431
7               NA              921 SNP_A-1980898
8               NA              243 SNP_A-1983139
9               NA              194 SNP_A-4265735
10             420              858 SNP_A-1995832

the fragment_length and fragment_length2 data here do agree (well, at 
least the two I checked agree ;-P) with netaffx.
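
If you want to check particular SNPs against netaffx, something along
these lines should work. This is only a sketch: it uses a small
in-memory SQLite table, filled with the first two rows of the output
above, as a stand-in for the package database so that it runs without
the annotation package installed. With the real package you would use
con <- db(pd.genomewidesnp.6) instead. The names toy, snps, sql and res
are just illustrative.

```r
library(DBI)
library(RSQLite)

# Stand-in for db(pd.genomewidesnp.6): an in-memory SQLite database
# holding the first two rows of the query output shown above.
con <- dbConnect(SQLite(), ":memory:")
toy <- data.frame(
  man_fsetid       = c("SNP_A-2131660", "SNP_A-1967418"),
  fragment_length  = c(395L, NA),
  fragment_length2 = c(217L, 702L),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "featureSet", toy)

# Pull the fragment lengths for the SNPs to be compared with netaffx.
snps <- c("SNP_A-2131660", "SNP_A-1967418")
sql <- paste0(
  "select man_fsetid, fragment_length, fragment_length2 ",
  "from featureSet where man_fsetid in ('",
  paste(snps, collapse = "', '"),
  "') order by man_fsetid"
)
res <- dbGetQuery(con, sql)
print(res)

dbDisconnect(con)
```

The values returned for each man_fsetid can then be compared by hand
with what netaffx reports for the same SNP.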

As for the other field names, most seem clear to me. Is there one in 
particular that is not clear?

>
> Some rows have NAs for most of the fields even though the allele
> information is known. Is this expected?

It is expected, depending on when the package was built. We are simply 
taking data from Affymetrix and re-packaging into an object that is 
easier to use, so we are dependent on the data we get from Affy. Since 
annotation of genetic data is a moving target, things are always changing.
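
To get a feel for how many rows are affected, you could count the
missing values per column. The following is a rough sketch, not
something taken from the package itself: it builds a tiny in-memory
SQLite table from the two rows in your query output (only a few of the
columns) as a stand-in for db(pd.mapping250k.sty), and the names toy
and na_counts are made up for illustration.

```r
library(DBI)
library(RSQLite)

# Stand-in for db(pd.mapping250k.sty): the two rows from the quoted
# query, restricted to a few columns.
con <- dbConnect(SQLite(), ":memory:")
toy <- data.frame(
  fsetid          = c(238378L, 238377L),
  man_fsetid      = c("SNP_A-4301986", "SNP_A-2291495"),
  chrom           = c("8", NA),
  physical_pos    = c(5214036L, NA),
  fragment_length = c(1667L, NA),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "featureSet", toy)

# count(col) counts only non-NULL values, so count(*) - count(col)
# is the number of rows where that column is missing (NA in R).
na_counts <- dbGetQuery(con, "
  select count(*)                          as n_rows,
         count(*) - count(chrom)           as na_chrom,
         count(*) - count(physical_pos)    as na_physical_pos,
         count(*) - count(fragment_length) as na_fragment_length
  from featureSet")
print(na_counts)

dbDisconnect(con)
```

Running the same query against the real featureSet table shows how
much of the annotation was missing when the package was built.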

We only build these packages on a semi-annual basis, so we end up out of 
date quite quickly. This is a tradeoff between having the most 
up-to-date data, and having stable data packages that people can rely on.

We do provide the functionality to build your own, so if you desire the 
most up-to-date package, you can build a personal package using the 
pdInfoBuilder package.

Best,

Jim


>
> Thanks so much for your help!
>
> library("pd.mapping250k.sty")
> con = db(pd.mapping250k.sty)
> dbListFields(con, "featureSet")
>  [1] "fsetid"          "man_fsetid"      "dbsnp_rs_id"     "chrom"
>  [5] "physical_pos"    "strand"          "cytoband"        "allele_a"
>  [9] "allele_b"        "gene_assoc"      "fragment_length" "dbsnp"
> [13] "cnv"
>
> dbGetQuery(con, "select * from featureSet order by fsetid desc limit 2")
>   fsetid    man_fsetid dbsnp_rs_id chrom physical_pos strand cytoband allele_a allele_b
> 1 238378 SNP_A-4301986   rs6989223     8      5214036      -    p23.2        A        G
> 2 238377 SNP_A-2291495  rs11644392  <NA>           NA   <NA>     <NA>        A        G
>   fragment_length dbsnp
> 1            1667     0
> 2              NA    NA
>
>
> Best regards,
>
> Julie
>
> sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] pd.mapping250k.sty_1.0.0 RSQLite_0.9-2            DBI_0.2-5
> [4] oligo_1.12.2             oligoClasses_1.10.0      Biobase_2.8.0
> [7] affxparser_1.20.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.16.0         Biostrings_2.16.9     IRanges_1.6.11        preprocessCore_1.10.0
> [5] splines_2.11.1        tools_2.11.1
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826