[R] use table function with data frame subsets
David L Carlson
dcarlson at tamu.edu
Mon Feb 20 22:27:28 CET 2017
The default for read.csv() in R 3.x is stringsAsFactors=TRUE (R 4.0.0 changed it to FALSE), so when the data frame is created all the character strings in your .csv file are converted to factors:
> testtable <- read.csv("clipboard", header=F)
> str(testtable)
'data.frame': 6 obs. of 5 variables:
$ V1: int 20170101 20170101 20170101 20170102 20170102 20170102
$ V2: int 10020 10020 10020 20001 20001 20001
$ V3: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 3 4 5
$ V4: Factor w/ 4 levels "a","b","d","m": 2 2 3 3 4 1
$ V5: Factor w/ 2 levels "N","Y": 2 1 2 2 2 2
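Reading from "clipboard" depends on the platform, so here is a minimal sketch that rebuilds the same data frame from the six rows in the original message with read.csv(text = ...). The name csv_text is only illustrative, and stringsAsFactors = TRUE is spelled out so the example behaves the same on R 4.0.0 and later, where the default is FALSE:

```r
# The six data rows from the original message, supplied inline
# (csv_text is an illustrative name, not part of the original post)
csv_text <- "20170101,10020,A,b,Y
20170101,10020,B,b,N
20170101,10020,C,d,Y
20170102,20001,C,d,Y
20170102,20001,D,m,Y
20170102,20001,L,a,Y"

# stringsAsFactors = TRUE reproduces the pre-4.0.0 default explicitly
testtable <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = TRUE)

str(testtable)          # V3, V4 and V5 are factors
levels(testtable$V4)    # "a" "b" "d" "m"
```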
When you subset a data frame, unused factor levels are not removed automatically, and table() reports every level of a factor, so 'a' and 'm' still appear with counts of 0:
> testtablea<-testtable[grep('^10',testtable[,2]),]
> str(testtablea)
'data.frame': 3 obs. of 5 variables:
$ V1: int 20170101 20170101 20170101
$ V2: int 10020 10020 10020
$ V3: Factor w/ 5 levels "A","B","C","D",..: 1 2 3
$ V4: Factor w/ 4 levels "a","b","d","m": 2 2 3
$ V5: Factor w/ 2 levels "N","Y": 2 1 2
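A small sketch of why 'a' and 'm' show up, assuming the same six rows from the original message read with stringsAsFactors = TRUE: table() makes one cell per factor level, whereas a character vector has no levels, so only the observed values are counted. The names csv_text, tab_factor and tab_char are illustrative:

```r
csv_text <- "20170101,10020,A,b,Y
20170101,10020,B,b,N
20170101,10020,C,d,Y
20170102,20001,C,d,Y
20170102,20001,D,m,Y
20170102,20001,L,a,Y"
testtable <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = TRUE)
testtablea <- testtable[grep("^10", testtable$V2), ]

# The row subset still carries all four levels of V4
nlevels(testtablea$V4)    # 4

# table() on a factor: one cell per level, unused levels counted as 0
tab_factor <- table(testtablea$V4)

# table() on a character vector: only values actually present
tab_char <- table(as.character(testtablea$V4))
```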
To drop the unused levels from all of the factors, use the droplevels() function:
> testtablea <- droplevels(testtablea)
> str(testtablea)
'data.frame': 3 obs. of 5 variables:
$ V1: int 20170101 20170101 20170101
$ V2: int 10020 10020 10020
$ V3: Factor w/ 3 levels "A","B","C": 1 2 3
$ V4: Factor w/ 2 levels "b","d": 1 1 2
$ V5: Factor w/ 2 levels "N","Y": 2 1 2
> table(testtablea[,4],testtablea[,5])
    N Y
  b 1 1
  d 0 1
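If you would rather leave the data frame alone, droplevels() also works on a single factor, and calling factor() on an existing factor keeps only the levels that are present. A sketch assuming the six rows from the original message read with stringsAsFactors = TRUE; tab1 and tab2 are illustrative names:

```r
csv_text <- "20170101,10020,A,b,Y
20170101,10020,B,b,N
20170101,10020,C,d,Y
20170102,20001,C,d,Y
20170102,20001,D,m,Y
20170102,20001,L,a,Y"
testtable <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = TRUE)
testtablea <- testtable[grep("^10", testtable$V2), ]

# Drop unused levels only for the two columns being tabulated
tab1 <- table(droplevels(testtablea$V4), droplevels(testtablea$V5))

# Same result: factor() on a factor rebuilds the levels from the values present
tab2 <- table(factor(testtablea$V4), factor(testtablea$V5))

tab1
```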
OR use stringsAsFactors=FALSE with read.csv() when you create the original data frame, so the columns stay character vectors and table() counts only the values that are present:
> testtable <- read.csv("clipboard", header=F, stringsAsFactors=FALSE)
> str(testtable)
'data.frame': 6 obs. of 5 variables:
$ V1: int 20170101 20170101 20170101 20170102 20170102 20170102
$ V2: int 10020 10020 10020 20001 20001 20001
$ V3: chr "A" "B" "C" "C" ...
$ V4: chr "b" "b" "d" "d" ...
$ V5: chr "Y" "N" "Y" "Y" ...
> testtablea<-testtable[grep('^10',testtable[,2]),]
> table(testtablea[,4],testtablea[,5])
    N Y
  b 1 1
  d 0 1
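Another option, not part of the original reply: xtabs() has a drop.unused.levels argument that removes empty factor levels before cross-tabulating. A sketch assuming the six rows from the original message read with stringsAsFactors = TRUE; xt is an illustrative name:

```r
csv_text <- "20170101,10020,A,b,Y
20170101,10020,B,b,N
20170101,10020,C,d,Y
20170102,20001,C,d,Y
20170102,20001,D,m,Y
20170102,20001,L,a,Y"
testtable <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = TRUE)
testtablea <- testtable[grep("^10", testtable$V2), ]

# Formula interface: rows from V4, columns from V5;
# drop.unused.levels = TRUE discards the empty "a" and "m" levels
xt <- xtabs(~ V4 + V5, data = testtablea, drop.unused.levels = TRUE)
xt
```

Unlike table(), xtabs() labels the dimensions with the column names (V4, V5), which can make the printed table easier to read.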
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of message
Sent: Monday, February 20, 2017 3:10 PM
To: r-help at r-project.org
Subject: [R] use table function with data frame subsets
Readers,
Data set:
20170101,10020,A,b,Y
20170101,10020,B,b,N
20170101,10020,C,d,Y
20170102,20001,C,d,Y
20170102,20001,D,m,Y
20170102,20001,L,a,Y
testtable<-read.csv('~/tmp/data.csv',header=F)
testtablea<-testtable[grep('^10',testtable[,2]),]
> testtable
        V1    V2 V3 V4 V5
1 20170101 10020  A  b  Y
2 20170101 10020  B  b  N
3 20170101 10020  C  d  Y
4 20170102 20001  C  d  Y
5 20170102 20001  D  m  Y
6 20170102 20001  L  a  Y
> testtablea
        V1    V2 V3 V4 V5
1 20170101 10020  A  b  Y
2 20170101 10020  B  b  N
3 20170101 10020  C  d  Y
> table(testtable[,4],testtable[,5])
    N Y
  a 0 1
  b 1 1
  d 0 2
  m 0 1
> table(testtablea[,4],testtablea[,5])
    N Y
  a 0 0
  b 1 1
  d 0 1
  m 0 0
Why do rows 'a' and 'm' appear in the table when none of the records with those values
satisfy the regular expression used to create the object 'testtablea'?
How can I use the 'table' function to show:
> table(testtablea[,4],testtablea[,5])
    N Y
  b 1 1
  d 0 1
Thanks.