[R] Data Manipulation using R

Wed Apr 18 03:09:26 CEST 2007

On Apr 17, 2007, at 8:03 PM, Anup Nandialath wrote:

> Dear Friends,
>
> I have a data set with around 220,000 rows and 17 columns. One of the
> columns is an id variable which is grouped from 1000 through 9000.  
> I need to perform the following operations.
>
> 1) Remove all the observations with id's between 6000 and 6999
>
> I tried using this method.
>
> remdat1 <- subset(data, ID<6000)
> remdat2 <- subset(data, ID>=7000)
> donedat <- rbind(remdat1, remdat2)
>
> I checked the first and last entries and found that they did not have
> ID values in the 6000s. Therefore I think that this might be correct,
> but is this the most efficient way of doing this?
>
The rbind is probably unnecessary.

I think all you are missing for both questions is the "or" operator,
"|" (see ?"|").

Simply:

donedat <- subset(data, ID < 6000 | ID >= 7000)

would do it. I'm not sure about efficiency, but if the code is fast
enough as it stands, I wouldn't worry too much about it.
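
As a rough check that the single call matches the two-step version, here is a minimal sketch on a toy data frame. The name `dat` and its values are made up for illustration and are not the poster's data.

```r
# Toy data frame standing in for the real data (hypothetical values).
dat <- data.frame(ID = c(1000, 5999, 6000, 6500, 6999, 7000, 9000),
                  x  = c(1, 2, 3, 4, 5, 6, 7))

# Two-step approach from the question.
remdat1  <- subset(dat, ID < 6000)
remdat2  <- subset(dat, ID >= 7000)
two_step <- rbind(remdat1, remdat2)

# Single call using the element-wise "or" operator.
one_step <- subset(dat, ID < 6000 | ID >= 7000)

# The same selection written with plain logical indexing.
indexed <- dat[dat$ID < 6000 | dat$ID >= 7000, ]
```

Note that `|` is the element-wise operator; `||` is meant for single conditions and is not what is wanted here. Also, subset() drops rows where the condition is NA, whereas plain logical indexing returns an all-NA row for them, so the two forms can differ if ID has missing values.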

> 2) I need to remove observations within columns 3, 4, 6 and 8 when  
> they are negative. For instance if the number in column 3 is -4,  
> then I need to delete the entire observation. Can somebody help me  
> with this too.

The following should do it (untested, and I'm not sure how it would handle NAs):

toremove <- data[,3] < 0 | data[,4] < 0 | data[,6] < 0 | data[,8] < 0
data[!toremove,]
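
On the NA question: a comparison with NA gives NA, and indexing rows with an NA logical gives an all-NA row instead of dropping or keeping the original. The sketch below shows this on a toy data frame (`dat` and its column names are hypothetical) and assumes that a missing value should count as "not negative", so such rows are kept. If missing values should be removed instead, the condition would need to change.

```r
# Toy data with 8 columns; columns 3, 4, 6 and 8 are the ones checked.
dat <- data.frame(a  = 1:4,
                  b  = 1:4,
                  c3 = c(1, -4, NA, 2),
                  c4 = c(1, 1, 1, 1),
                  e  = 1:4,
                  c6 = c(1, 1, 1, NA),
                  g  = 1:4,
                  c8 = c(1, 1, 1, -1))

# Same test as above: NA | FALSE is NA, but NA | TRUE is TRUE.
toremove <- dat[, 3] < 0 | dat[, 4] < 0 | dat[, 6] < 0 | dat[, 8] < 0

# Assumption: treat NA as "no negative value found", so only rows
# where the test is definitely TRUE are dropped.
drop_row <- !is.na(toremove) & toremove
kept <- dat[!drop_row, ]
```

Here row 2 is dropped for the -4 in column 3 and row 4 for the -1 in column 8, while row 3, whose only issue is an NA, stays.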

If you need to check more columns than those four, we could perhaps
look for a more compact way to write the first line above.
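
One possible way to write that line for an arbitrary set of columns is to count the negative entries per row with rowSums(). This is only a sketch: `dat` and `cols` are hypothetical names, the columns are assumed to be numeric, and `na.rm = TRUE` means NAs are treated as not negative.

```r
# Toy data with 8 columns (hypothetical values).
dat <- data.frame(a  = 1:4,
                  b  = 1:4,
                  c3 = c(1, -4, NA, 2),
                  c4 = c(1, 1, 1, 1),
                  e  = 1:4,
                  c6 = c(1, 1, 1, NA),
                  g  = 1:4,
                  c8 = c(1, 1, 1, -1))

# Columns to check; extend this vector as needed.
cols <- c(3, 4, 6, 8)

# dat[, cols] < 0 is a logical matrix; rowSums() counts the TRUEs per
# row, ignoring NAs, so a count above zero marks a row for removal.
toremove <- rowSums(dat[, cols] < 0, na.rm = TRUE) > 0
kept <- dat[!toremove, ]
```

Adding a column to the check then only means adding its index to `cols`.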

> Thanks and Regards
>
> Anup

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College