[R] problem with merge
Mark W Kimpel
mwkimpel at gmail.com
Tue Mar 18 01:40:47 CET 2008
I have used merge regularly and thought I understood how it worked, but
I must not. I have two data frames with identical colnames from two
different experiments, TL01 and LC01. Each data frame has a column named
"Entrez.Gene", which I have converted with as.character() just to make
sure merge is not matching on factor levels. Because I have done some
filtering, the Entrez.Gene values in the two experiments overlap but are
not identical. I want to produce a summary report containing only those
identifiers found in both experiments. I could do this with intersect and
match, but I thought merge could easily do this.
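For comparison, the intersect-and-match alternative mentioned above might look roughly like the sketch below. The data frames `a` and `b`, the `logFC` column, and all values are made up for illustration and are not the real TL01/LC01 data. Note that match() returns only the first occurrence of each identifier, so this approach yields exactly one row per shared identifier.

```r
# Toy data frames; 'a', 'b', 'logFC', and all values are hypothetical.
a <- data.frame(Entrez.Gene = c("1", "2", "3", "4"),
                logFC = c(0.1, 0.2, 0.3, 0.4),
                stringsAsFactors = FALSE)
b <- data.frame(Entrez.Gene = c("3", "4", "5"),
                logFC = c(1.3, 1.4, 1.5),
                stringsAsFactors = FALSE)

# Identifiers present in both experiments (intersect() returns unique values).
common <- intersect(a$Entrez.Gene, b$Entrez.Gene)

# match() gives the position of the FIRST occurrence of each shared identifier.
a.common <- a[match(common, a$Entrez.Gene), ]
b.common <- b[match(common, b$Entrez.Gene), ]

# Assemble the report side by side, with experiment suffixes on the columns.
report <- data.frame(Entrez.Gene = common,
                     logFC.TL01 = a.common$logFC,
                     logFC.LC01 = b.common$logFC,
                     stringsAsFactors = FALSE)

nrow(report) == length(common)  # TRUE: one row per shared identifier
```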
Below is my code and sessionInfo. For some reason there are over twice
as many rows as I would expect. I can't quite figure out which arguments
I have screwed up. What am I missing? It has to be something simple, I'm
just not seeing it. Thanks, Mark
> TL01.LC01.data <- merge(TL01.data, LC01.data, by = "Entrez.Gene",
+     all.x = FALSE, all.y = FALSE, suffixes = c(".TL01", ".LC01"))
> length(intersect(TL01.data$Entrez.Gene, LC01.data$Entrez.Gene))
[1] 13401
> dim(TL01.LC01.data)
[1] 29471 57
> dim(TL01.data)
[1] 16479 29
> dim(LC01.data)
[1] 16479 29
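One possibility worth checking, offered only as a guess since the real data are not shown, is that Entrez.Gene is not unique within each data frame. merge returns one row for every matching pair of rows, whereas intersect counts each shared identifier once. With default settings merge also matches NA keys to each other. The toy sketch below uses hypothetical data frames `x` and `y` to show how a few repeated identifiers inflate the row count, and how to predict that count from the inputs.

```r
# Toy data; 'x', 'y', and 'val' are hypothetical, not the real data frames.
x <- data.frame(Entrez.Gene = c("1", "2", "2", "3"), val = 1:4,
                stringsAsFactors = FALSE)
y <- data.frame(Entrez.Gene = c("2", "2", "3", "4"), val = 5:8,
                stringsAsFactors = FALSE)

# Unique identifiers shared by both inputs.
n.shared <- length(intersect(x$Entrez.Gene, y$Entrez.Gene))  # 2

# Inner join: one output row per matching PAIR of input rows.
m <- merge(x, y, by = "Entrez.Gene", suffixes = c(".x", ".y"))
nrow(m)                                                      # 5

# How many rows in each input repeat an earlier identifier?
sum(duplicated(x$Entrez.Gene))                               # 1
sum(duplicated(y$Entrez.Gene))                               # 1

# Predicted merge size: for each shared identifier, the product of the
# number of times it occurs in x and in y ("2": 2 * 2, "3": 1 * 1).
tx <- table(x$Entrez.Gene)
ty <- table(y$Entrez.Gene)
shared <- intersect(names(tx), names(ty))
expected <- sum(as.integer(tx[shared]) * as.integer(ty[shared]))
expected == nrow(m)                                          # TRUE
```

Running the same duplicated() and table() checks on TL01.data$Entrez.Gene and LC01.data$Entrez.Gene would show whether this accounts for the extra rows.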
--
> sessionInfo()
R version 2.7.0 Under development (unstable) (2008-03-05 r44683)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] splines tools stats graphics grDevices datasets utils
[8] methods base
other attached packages:
[1] affycoretools_1.11.4 annaffy_1.11.5 KEGG.db_2.1.3
[4] gcrma_2.11.4 matchprobes_1.11.1 biomaRt_1.13.9
[7] RCurl_0.8-3 GOstats_2.5.2 Category_2.5.7
[10] genefilter_1.17.12 survival_2.34 RBGL_1.15.7
[13] annotate_1.17.11 xtable_1.5-2 GO.db_2.1.3
[16] AnnotationDbi_1.1.26 RSQLite_0.6-8 DBI_0.2-4
[19] graph_1.17.17 limma_2.13.6 affy_1.17.9
[22] preprocessCore_1.1.5 affyio_1.7.15 Biobase_1.99.2
loaded via a namespace (and not attached):
[1] cluster_1.11.10 XML_1.93-2
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)
mwkimpel<at>gmail<dot>com