[R] Data Manipulation using R
Charilaos Skiadas
skiadas at hanover.edu
Wed Apr 18 03:09:26 CEST 2007
On Apr 17, 2007, at 8:03 PM, Anup Nandialath wrote:
> Dear Friends,
>
> I have data set with around 220,000 rows and 17 columns. One of the
> columns is an id variable which is grouped from 1000 through 9000.
> I need to perform the following operations.
>
> 1) Remove all the observations with id's between 6000 and 6999
>
> I tried using this method.
>
> remdat1 <- subset(data, ID<6000)
> remdat2 <- subset(data, ID>=7000)
> donedat <- rbind(remdat1, remdat2)
>
> I check the last and first entry and found that it did not have ID
> values 6000. Therefore I think that this might be correct, but is
> this the most efficient way of doing this?
>
The rbind is probably unnecessary.
I think all you are missing for both questions is the "or" operator,
"|" (see ?"|").
Simply:
donedat <- subset(data, ID < 6000 | ID >= 7000)
would do for this. Not sure about efficiency, but if the code is fast
as it stands I wouldn't worry too much about it.
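To make the one-liner above concrete, here is a minimal sketch on a made-up data frame (the ID values and the extra column x are invented for illustration, not taken from the real data). It also shows what I believe is an equivalent form using plain logical indexing:

```r
# Toy data standing in for the real 220,000-row data set (invented values).
data <- data.frame(ID = c(1000, 5999, 6000, 6500, 6999, 7000, 9000),
                   x  = 1:7)

# Keep rows whose ID lies outside the range 6000..6999.
donedat <- subset(data, ID < 6000 | ID >= 7000)

# Same selection with logical indexing, negating the range to drop.
# Caveat: if ID contained NAs, this form would return all-NA rows for them,
# whereas subset() silently drops rows where the condition is NA.
donedat2 <- data[!(data$ID >= 6000 & data$ID <= 6999), ]
```

Either form avoids building two intermediate data frames and binding them back together.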
> 2) I need to remove observations within columns 3, 4, 6 and 8 when
> they are negative. For instance if the number in column 3 is -4,
> then I need to delete the entire observation. Can somebody help me
> with this too.
The following should do it (untested, and I am not sure how it handles NAs):
toremove <- data[,3] < 0 | data[,4] < 0 | data[,6] < 0 | data[,8] < 0
data[!toremove,]
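On the NA question: a comparison such as NA < 0 gives NA, and indexing rows with an NA produces a row filled with NAs. The sketch below, on an invented 8-column data frame (column names and values are hypothetical), shows two possible ways to deal with that, depending on whether rows with missing values should be kept or dropped:

```r
# Invented example; columns 3, 4, 6 and 8 are the ones being checked.
data <- data.frame(a  = 1:4, b = 1:4,
                   c3 = c(1, -4, NA, 2),
                   c4 = c(1, 1, 1, 2),
                   e  = 1:4,
                   c6 = c(0, 1, 1, 2),
                   g  = 1:4,
                   c8 = c(5, 1, 1, -1))

toremove <- data[, 3] < 0 | data[, 4] < 0 | data[, 6] < 0 | data[, 8] < 0
# Row 3 has an NA in column 3 and no negative value elsewhere,
# so its entry in toremove is NA rather than TRUE or FALSE.

# Option A: treat "unknown" as "not negative" and keep such rows.
keep_na <- data[!(toremove %in% TRUE), ]

# Option B: drop rows where the test is NA as well.
# which() returns only the positions that are TRUE, skipping NAs.
drop_na <- data[which(!toremove), ]
```

Which option is right depends on what a missing value means in the data, so this is a choice for the analyst rather than something the code can decide.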
If you need to check more columns than those four, we could look for
something more compact than the first line above.
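One possible way to scale this to many columns is to test all of them at once and count the negatives per row with rowSums. This is only a sketch: the vector name cols and the toy data are made up, and na.rm = TRUE is my assumption that an NA should not count as negative.

```r
# Invented example data with 8 columns.
data <- data.frame(a  = 1:4, b = 1:4,
                   c3 = c(1, -4, NA, 2),
                   c4 = c(1, 1, 1, 2),
                   e  = 1:4,
                   c6 = c(0, 1, 1, 2),
                   g  = 1:4,
                   c8 = c(5, 1, 1, -1))

# Columns to check; extend this vector as needed.
cols <- c(3, 4, 6, 8)

# data[, cols] < 0 is a logical matrix; rowSums counts the TRUEs in each row.
# na.rm = TRUE ignores NAs, so a missing value is not treated as negative.
toremove <- rowSums(data[, cols] < 0, na.rm = TRUE) > 0

cleaned <- data[!toremove, ]
```

Because na.rm = TRUE removes the NAs before summing, toremove never contains NA here, so the indexing in the last line cannot produce all-NA rows.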
> Thank and Regards
>
> Anup
Haris Skiadas
Department of Mathematics and Computer Science
Hanover College