[R] parts of data frames: subset vs. [-c()]
Stefan Th. Gries
stgries_lists at arcor.de
Fri Aug 26 18:17:51 CEST 2005
Dear all
I have a problem with splitting up a data frame called ReVerb:
» str(ReVerb)
`data.frame': 92713 obs. of 16 variables:
$ CHILD : Factor w/ 7 levels "ABE","ADA","EVE",..: 1 1 1 1 1 1 1 1 1 1 ...
$ AGE : Factor w/ 484 levels "1;06.00","1;06.16",..: 43 43 43 99 99 99 99 99 99 99 ...
$ AGE_Q : num 2.0 2.0 2.0 2.4 2.4 ...
$ INTERVALS: num 2 2 2 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
$ RND : int 34368 38311 14949 20586 72516 27186 88019 10767 114448 86146 ...
$ SYNTAX : Factor w/ 17 levels "Acmp","Amats",..: 15 12 8 15 7 16 7 7 16 7 ...
$ LEXICAL : Factor w/ 1643 levels "$ACHE","$ACT",..: 194 803 803 294 299 803 1562 299 679 1562 ...
$ MORPH : Factor w/ 337 levels "$","$ =inf","$ =prs",..: 9 20 9 39 184 231 57 67 231 39 ...
$ COMPLEM : Factor w/ 1989 levels "$","$ V PR=Lp [1.2]",..: 203 547 220 203 1101 368 1834 1667 368 1834 ...
$ MATRIX : Factor w/ 906 levels "$ ???","$ be PR=Aen",..: 5 5 5 308 5 856 5 5 856 308 ...
$ SITUATION: Factor w/ 9 levels "[imitation of Mom: you know what I said]",..: 2 2 2 2 2 2 2 2 2 2 ...
$ V_ANN : int 1 1 1 4 4 4 4 3 3 3 ...
$ QUEST : int 0 0 0 0 0 0 0 0 0 0 ...
$ EXCL : int 0 0 0 1 1 1 1 0 0 0 ...
$ U_LEN : int 3 4 5 13 13 13 13 8 8 8 ...
$ UTTERANCE: Factor w/ 55113 levels "","# (be)cause he wanted to .",..: 5696 39091 52180 2262 2262 2262 2262 3593 3593 3593 ...
The variable causing the problem is the factor SYNTAX:
» as.data.frame(sort(table(SYNTAX)))
              sort(table(SYNTAX))
Particles                     100
PR=N1                         144
Amats                         271
Trans_PR=A2                   787
Ditrans                      1181
Intrans_PR=A1                1399
Acmp                         2402
Trans_PR=V2                  2433
CPcmps                       2769
Vpreps                       4896
Intrans_V0                   5182
Trans_PR=L2                  7653
Trans_V02                    8117
Intrans_PR=L1                8457
Intrans_V1                   9643
Intrans_PR=V1               14987
Trans_V12                   22288
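A quick check on the table above: the 17 counts add up to 92709, four fewer than the 92713 rows reported by str(). One thing that could produce such a gap is missing values, since table() leaves NA out by default. A minimal sketch on a toy factor (the vector `syn` is made up for illustration and is not taken from ReVerb; the `useNA` argument is from current R and may not exist in older versions):

```r
# Toy factor with two missing values (hypothetical data, not ReVerb)
syn <- factor(c("Ditrans", "Acmp", NA, "Ditrans", "Amats", NA))

counts <- table(syn)            # NA is not counted by default
n_counted <- sum(counts)        # number of non-missing values
n_total   <- length(syn)        # total number of values
n_missing <- sum(is.na(syn))    # how many values table() skipped

# Ask table() to report the NA count explicitly
counts_with_na <- table(syn, useNA = "ifany")
print(counts_with_na)
```

If `n_counted` and `n_total` differ, the difference is the number of NA entries in the factor.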
I would like to extract all cases where SYNTAX=="Ditrans" from ReVerb, store them in a file, and then generate ReVerb again without these cases and without the then-unused factor level. My problem is probably obvious from the following lines of code:
» ditrans<-which(SYNTAX=="Ditrans")
» ReVerb1<-ReVerb[-c(ditrans),]; dim(ReVerb1)
[1] 91532 16
»
» # ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been removed, but ...
»
» ReVerb1<-subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
[1] 91528 16
»
» # ... so why don't I get 91532 again as the number of rows?
»
Any ideas??
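One possible source of the difference, sketched on toy data rather than on ReVerb: which() ignores comparisons that evaluate to NA, so negative indexing keeps rows where SYNTAX is NA, whereas subset() treats an NA condition as FALSE and drops those rows. The data frame `toy`, the temporary output file, and all variable names below are hypothetical, and whether ReVerb actually contains NA values in SYNTAX would need to be checked with sum(is.na(SYNTAX)).

```r
# Toy stand-in for ReVerb (hypothetical data)
toy <- data.frame(
  SYNTAX = factor(c("Ditrans", "Acmp", NA, "Ditrans", "Amats", NA)),
  U_LEN  = c(3, 4, 5, 13, 8, 8)
)

# Approach 1: which() skips NA, so the NA rows survive negative indexing
ditrans  <- which(toy$SYNTAX == "Ditrans")
by_index <- toy[-ditrans, ]

# Approach 2: subset() drops rows whose condition evaluates to NA
by_subset <- subset(toy, SYNTAX != "Ditrans")

# The two results differ by the number of NA values in SYNTAX
row_gap <- nrow(by_index) - nrow(by_subset)

# NA-aware split: decide explicitly where the NA rows go (kept here)
is_ditrans <- !is.na(toy$SYNTAX) & toy$SYNTAX == "Ditrans"
ditrans_df <- toy[is_ditrans, ]
rest       <- toy[!is_ditrans, ]

# Store the extracted cases, then rebuild the factor without the unused level
out_file <- tempfile(fileext = ".txt")
write.table(ditrans_df, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
rest$SYNTAX <- factor(rest$SYNTAX)
print(levels(rest$SYNTAX))
```

The logical index also avoids a separate pitfall of negative indexing: if which() finds no matches, toy[-ditrans, ] returns zero rows instead of the full data frame.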
» R.version # on Windows XP with service Pack 2
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.1
year     2005
month    06
day      20
language R
Thanks a lot,
STG
--
Stefan Th. Gries
----------------------------------------
Max Planck Inst. for Evol. Anthropology
http://people.freenet.de/Stefan_Th_Gries
----------------------------------------