[R] Delete the first instances of the unique values of a vector in R

Rui Barradas ruipbarradas at sapo.pt
Wed Jan 11 18:31:54 CET 2017


Hello,

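Try the following. It drops the first row of each run of equal values,
so it assumes that equal values are stored next to each other, as in
your example.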

x <- scan(text = "
1
4
4
4
4
4
4
6
6")
dat <- data.frame(x, y = rnorm(length(x)))

dat[-which(c(TRUE, dat$x[-1] != dat$x[-length(dat$x)])), ]
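
To see why this works, the expression can be split into steps. The
sketch below uses the same x values; the names 'first_of_run', 'by_run'
and 'by_dup' are made up for illustration, and y is just a stand-in for
the other columns. It also shows duplicated(), which flags every
occurrence of a value after its first one and so should give the same
rows here, without relying on equal values being adjacent.

```r
x <- c(1, 4, 4, 4, 4, 4, 4, 6, 6)
dat <- data.frame(x, y = seq_along(x))  # y stands in for the other columns

# TRUE where a value differs from the one before it, i.e. at the start
# of a run; the first element always starts a run
first_of_run <- c(TRUE, dat$x[-1] != dat$x[-length(dat$x)])

by_run <- dat[!first_of_run, ]      # same rows as dat[-which(first_of_run), ]
by_dup <- dat[duplicated(dat$x), ]  # keeps every row except first occurrences

by_run$x
```

Indexing with the negated logical vector avoids which() altogether; both
forms select the same rows for this data.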

Now replace 'dat' with your dataset 'rwrdatafile', and 'x' with the
column of interest.
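
If the for loop has to stay close to the MATLAB version, one possible
sketch is the following. It computes the unique values once, before the
loop, so that deleting rows cannot change what the loop iterates over,
and which(...)[1] plays the role of find(..., 1, 'first'). The small
data frame and its column names 'id' and 'v' are assumptions made only
for illustration, and the sketch assumes the column has no NA values.

```r
# Stand-in for the real data; the second column holds the values of interest
rwrdatafile <- data.frame(id = 1:9, v = c(1, 4, 4, 4, 4, 4, 4, 6, 6))

vals <- unique(rwrdatafile[, 2])             # fixed before any row is deleted
for (i in vals) {
  first <- which(rwrdatafile[, 2] == i)[1]   # like find(..., 1, 'first')
  rwrdatafile <- rwrdatafile[-first, ]       # drop that row
}

rwrdatafile[, 2]
```

Because each value in 'vals' is deleted exactly once, it is still present
when its turn comes, so which(...)[1] never comes back as NA and no
adjustment of the loop length is needed.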

Hope this helps,

Rui Barradas

On 10-01-2017 21:39, Tunga Kantarcı wrote:
> Consider a data frame which I name as rwrdatafile. It includes several
> variables stored in columns. For each variable there are 1000
> observations and hence 1000 rows. The interest lies in the values of
> the second column of this data frame, that is in rwrdatafile[,2]. What
> I am trying to accomplish is to delete the rows of the data frame if
> it is the first instance of a unique value in rwrdatafile[,2]. That
> is, the values stored in rwrdatafile[,2] look like
>
> 1
> 4
> 4
> 4
> 4
> 4
> 4
> 6
> 6
>
> and the routine should delete 1 (and the other values in that row),
> the first 4 (and the other values in that row), and the first 6 (and
> the other values in that row). I did an online search, and indeed
> there are similar examples, but they did not help for what I am trying
> to achieve. What is specific to what I am trying to achieve is that
> the routine should use a for loop. I have written a routine that is
> not using a for loop and it works fine and I paste it below
> (Vector-oriented coding in R). I need to write a for loop that
> accomplishes the same task. In fact, I have written this for loop but
> it has a problem (Scalar-oriented coding in R pasted below). Note that
> the data stored in rwrdatafile[,2] has three unique values (there are
> more but for making the example that does not matter) which are 1, 4,
> 6. The for loop I have written first determines the number of unique
> values in rwrdatafile[,2], with length(unique(rwrdatafile[,2])), and
> uses that number in the sequence of the for loop. The length is 3 so
> the sequence is 1:3. But there is a catch! When 1 is deleted (and
> other values row wise), the length decreases to 2 but the for loop
> attempts 3 and therefore it returns NULL at the end of the loop.
> Therefore I subtract 1 from the length. But this is not good coding. I
> wondered about the NULL result and it took me a while to figure out
> the problem, and worse is that I could have never found the problem.
> So the for loop here is not reliable because it requires that the user
> knows that there are multiple instances of the unique values (so
> multiple instances of 1). How can I fix the problem? The restriction I
> have is that I need to keep the for loop and it should resemble the
> for loop I have written for MATLAB (pasted below). The aim is to
> translate the MATLAB routine as close as possible in R. So I do not
> want to deviate (much) from the MATLAB version of the code because
> otherwise I cannot compare the routines while I am teaching this. That
> is, I need to use a function in the for loop in R that is as close as
> possible to the find function (with the first option) of MATLAB.
>
> # Scalar-oriented coding in R
> length(unique(rwrdatafile[,2]))
> for (i in 1:(.Last.value-1)){
>    rwrdatafile = rwrdatafile[-(which(rwrdatafile[,2] ==
> unique(rwrdatafile[,2])[i])[1]),]
> }
>
> # Vector-oriented coding in R
> unique(rwrdatafile[,2])
> tag = match(.Last.value,rwrdatafile[,2])
> rwrdatafile = rwrdatafile[!row.names(rwrdatafile) %in% tag,]
>
> # Scalar-oriented coding in MATLAB
> unique(mwmatfile.data(:,2));
> for i = ans'
>      mwmatfile.data(find(mwmatfile.data(:,2) == i,1,'first'),:) = [];
> end