[BioC] How can I remove control probesets from the expressionset object in gene expression analysis with Affy Human Gene 1.0ST microarray
James W. MacDonald
jmacdon at med.umich.edu
Tue Jun 21 22:37:50 CEST 2011
Hi Virginia,
On 6/21/2011 6:46 AM, Virginia Garcia wrote:
> Dear list,
>
> I am quite new to R as well as to microarray analysis.
> I am dealing with some gene expression analysis performed on Affymetrix Human
> Gene 1.0ST microarray.
>
> So far, I have learnt how to filter data using the genefilter package and its
> nsFilter function.
>
> Now, I would like to know how to filter out from the expressionset object all
> the control probesets (~4000) that Affymetrix includes in the microarray (for
> quality assay, normalization, background correction, etc.). However, none of
> the aforementioned functions worked for me.
>
> How can I recognize those probesets and remove them? I would like to filter
> them out before statistical analysis with limma package.
How much do you like database stuff? Lots? Great, I have some fun for you.
I'll assume you have pd.hugene.1.0.st.v1 installed (I have 1.1 installed,
but the queries will be the same).
> library(pd.hugene.1.1.st.v1)
First, get a connection to the database
> con <- db(pd.hugene.1.1.st.v1)
Now, what's in this thing?
> dbListTables(con)
[1] "chrom_dict" "core_mps" "featureSet" "level_dict" "pmfeature"
[6] "table_info" "type_dict"
OK, let's dig.
> dbGetQuery(con, "select * from pmfeature limit 5;")
      fid  fsetid atom   x    y
1  704656 7892501    1 765  711
2 1060101 7892501    2 800 1070
3 1046459 7892501    3  28 1057
4  403586 7892501    4 655  407
5  473527 7892502    5 306  478
Boring.
> dbGetQuery(con, "select * from featureSet limit 5;")
   fsetid strand start stop transcript_cluster_id exon_id crosshyb_type level
1 7892501     NA     0    0                     0       0             0    NA
2 7892502     NA     0    0                     0       0             0    NA
3 7892503     NA     0    0                     0       0             0    NA
4 7892504     NA     0    0                     0       0             0    NA
5 7892505     NA     0    0                     0       0             0    NA
  chrom type
1    NA    6
2    NA    7
3    NA    7
4    NA    7
5    NA    7
Maybe more interesting. What's this 'type' business?
> dbGetQuery(con, "select * from type_dict limit 5;")
  type                   type_id
1    1                      main
2    2             control->affx
3    3             control->chip
4    4 control->bgp->antigenomic
5    5     control->bgp->genomic
Now that looks like some reasonable info. What different types are there?
> dbGetQuery(con, "select * from type_dict;")
  type                   type_id
1    1                      main
2    2             control->affx
3    3             control->chip
4    4 control->bgp->antigenomic
5    5     control->bgp->genomic
6    6            normgene->exon
7    7          normgene->intron
8    8  rescue->FLmRNA->unmapped
So it looks like pretty much everything but type 1 is a control of some
kind.
> tab <- dbGetQuery(con, "select * from featureSet;")
> table(tab$type)
     1      2      4      6      7
253002     57     45   1195   2904
So that's about 4200 control probesets (types 2, 4, 6 and 7:
57 + 45 + 1195 + 2904 = 4201).
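The same tally can probably be done inside the database, joined against
type_dict so each count carries its label. This is only a sketch: it assumes
the table and column names shown in the output above (featureSet.type,
type_dict.type, type_dict.type_id), and the column alias n is my own choice.

```r
# Assumes the annotation package is installed; loading it is assumed to
# make db() and dbGetQuery() available, as in the session above.
library(pd.hugene.1.1.st.v1)

con <- db(pd.hugene.1.1.st.v1)

# Count probesets per type and attach the human-readable label.
# A left join keeps any type code that has no entry in type_dict.
counts <- dbGetQuery(con, "
  select f.type, t.type_id, count(*) as n
  from featureSet f
  left join type_dict t on f.type = t.type
  group by f.type, t.type_id
  order by f.type;")

counts
```

Rows with a type other than 1 should then correspond to the control
probesets tallied above.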
How to subset from here depends on the package you are using for
analysis (oligo, affy, xps), so I won't go into that. But you can now
get the IDs of the probesets you care about and use them to filter.
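As one hedged illustration of that last step, here is a minimal sketch for a
Biobase ExpressionSet. It uses toy data so it runs without the annotation
package: the tab data frame stands in for the featureSet query result, and
the IDs and sample names are made up. It also assumes that featureNames() of
your object hold fsetid values; if your data were summarized at a different
level, the IDs may need to be mapped first.

```r
library(Biobase)

# Toy stand-in for the query result. With the real database this would be
# something like: tab <- dbGetQuery(con, "select fsetid, type from featureSet;")
tab <- data.frame(fsetid = c(7892501, 7892502, 7896736, 7896738),
                  type   = c(6, 7, 1, 1))

# Toy ExpressionSet whose feature names are ASSUMED to be fsetid values.
m <- matrix(rnorm(8), nrow = 4,
            dimnames = list(as.character(tab$fsetid), c("s1", "s2")))
eset <- ExpressionSet(assayData = m)

# Keep only type 1 ("main") probesets; %in% treats an NA type as no match.
mainIds  <- as.character(tab$fsetid[tab$type %in% 1])
esetMain <- eset[featureNames(eset) %in% mainIds, ]

dim(esetMain)
```

The filtered object can then go into the usual limma workflow in place of
the original one.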
Best,
Jim
>
> Thank you very much in advance for your help.
>
> Virginia.
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826