[BioC] Help with subsetting data frame of DE genes
Ochsner, Scott A
sochsner at bcm.tmc.edu
Fri Apr 4 17:31:34 CEST 2008
Dear list,
I have a data frame with three columns. The first column is the probe set ID, the second column is the associated gene symbol, and the third column is a combined p-value:
hgu133a ID    Gene Symbol    Combined p-value
217757_at     A2M            0.787923912
214440_at     NAT1           0.240689023
206797_at     NAT2           0.497092074
202376_at     SERPINA3       3.88E-13
Etc....
I would like to end up with a data frame in which each row has a unique Gene Symbol. Where a gene symbol appears in more than one row, I want to keep only the row with the lowest Combined p-value. The above case would transform into:
hgu133a ID    Gene Symbol    Combined p-value
217757_at     A2M            0.787923912
214440_at     NAT1           0.240689023
202376_at     SERPINA3       3.88E-13
Etc....
Could someone point me to a function which would help me in this regard? If this is more of an R mailing list post I apologize and will post there.
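To make the goal concrete, here is a minimal base R sketch of one way the reduction could be done. It is an illustration under stated assumptions, not a definitive answer: the column names ID, Symbol and P are hypothetical stand-ins for the real headers, and the row with probe "dummy_at" is made up purely to give one gene symbol a second probe set. The idea is to sort by p-value and then keep the first row seen for each symbol.

```r
# Toy data: the four rows shown above plus one made-up duplicate row
# ("dummy_at" is hypothetical and only there to create a repeated symbol).
de <- data.frame(
  ID     = c("217757_at", "214440_at", "206797_at", "202376_at", "dummy_at"),
  Symbol = c("A2M", "NAT1", "NAT2", "SERPINA3", "A2M"),
  P      = c(0.787923912, 0.240689023, 0.497092074, 3.88e-13, 0.9),
  stringsAsFactors = FALSE
)

# Sort by p-value so that, for each symbol, the row with the
# smallest p-value comes first.
de_sorted <- de[order(de$P), ]

# duplicated() flags every occurrence of a symbol after the first,
# so negating it keeps only the lowest-p-value row per symbol.
de_unique <- de_sorted[!duplicated(de_sorted$Symbol), ]

# Optional: put the result back in alphabetical order by gene symbol.
de_unique <- de_unique[order(de_unique$Symbol), ]
de_unique
```

One caveat with this approach: duplicated() treats missing values as equal to each other, so probe sets with an NA gene symbol would collapse to a single row unless they are set aside first.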
Thanks,
> sessionInfo()
R version 2.6.0 (2007-10-03)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] splines tools stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] lumi_1.4.0 mgcv_1.3-29 affycoretools_1.10.0
[4] annaffy_1.10.0 KEGG_2.0.0 GO_2.0.0
[7] gcrma_2.10.0 matchprobes_1.10.0 biomaRt_1.12.0
[10] RCurl_0.8-1 GOstats_2.4.0 Category_2.4.0
[13] genefilter_1.16.0 survival_2.32 RBGL_1.14.0
[16] annotate_1.16.0 xtable_1.5-1 GO.db_2.0.0
[19] AnnotationDbi_1.0.4 RSQLite_0.6-3 DBI_0.2-3
[22] graph_1.16.1 affy_1.16.0 preprocessCore_1.0.0
[25] affyio_1.6.0 Biobase_1.16.0 limma_2.12.0
loaded via a namespace (and not attached):
[1] cluster_1.11.10 XML_1.93-2.2
Scott A. Ochsner, Ph.D.
NURSA Bioinformatics
Molecular and Cellular Biology
Baylor College of Medicine
Houston, TX. 77030
phone: 713-798-6227