[R] Help with dplyr

peter dalgaard pdalgd at gmail.com
Fri Nov 6 01:44:49 CET 2015


> On 06 Nov 2015, at 00:59 , Axel Urbiz <axel.urbiz at gmail.com> wrote:
> 
> Hello, 
> 
> Is there a way to avoid the warning below in dplyr? I’m performing an operation within groups, and the warning says that the factors created from each group do not have the same levels, so it coerces the factor to character. I’m using this inside a package I’m developing. I’d appreciate your recommendation on how to handle this.

Well, what did you intend? If you cut according to quantiles, the levels of the result will reflect the values of the quantiles, as in

> y <- runif(10)
> cut(y, quantile(y,c(0,.25,.5,.75, 1)), include.lowest=T)
 [1] (0.65,0.765]  [0.108,0.281] [0.108,0.281] (0.65,0.765]  (0.281,0.528]
 [6] [0.108,0.281] (0.528,0.65]  (0.281,0.528] (0.65,0.765]  (0.528,0.65] 
Levels: [0.108,0.281] (0.281,0.528] (0.528,0.65] (0.65,0.765]

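If you do this within groups, the quantiles differ from group to group, and hence so do the factor levels. Concatenating the resulting factors will then get you in trouble: there is no single set of levels to keep, which is why dplyr falls back to character.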
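
To make the point concrete, here is a minimal base-R sketch (the helper `quartile_bins` and the vectors `g1`, `g2` are illustrative, not from the original post). Each "group" is cut at its own quartiles, so the two factors end up with different level sets, which is exactly what cannot be stacked into one factor.

```r
# Hypothetical helper: cut a numeric vector at its own quartiles
quartile_bins <- function(x) {
  cut(x, quantile(x, c(0, .25, .5, .75, 1)), include.lowest = TRUE)
}

set.seed(1)
g1 <- runif(10)        # "group 1": values in [0, 1)
g2 <- runif(10) + 1    # "group 2": shifted so its quartiles are clearly different

b1 <- quartile_bins(g1)
b2 <- quartile_bins(g2)

# The level labels are the group-specific intervals
print(levels(b1))
print(levels(b2))
print(identical(levels(b1), levels(b2)))  # FALSE
```

Both factors have four levels, but the labels describe different intervals, so any combined result has to either merge the level sets or give up on the factor.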

If you don't mind losing the information about what the quantile intervals are, you could consider standardizing the levels with something like levels(bin$bin) <- 1:nBins inside create_bins().
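
One way to sketch that idea is to ask cut() for integer codes (`labels = FALSE`) and then fix the level set explicitly, so every group returns a factor with the same levels "1".."nBins". This is a variant of the suggestion above, not the only way to do it; `create_bins_std` is a hypothetical name, and the grouping is done in base R (split/lapply/rbind) so the sketch runs without dplyr.

```r
# Sketch: variant of Axel's create_bins() whose bin factor always has levels 1..nBins
# (create_bins_std is a hypothetical name, not part of dplyr or base R)
create_bins_std <- function(pred, nBins) {
  Breaks <- unique(quantile(pred, probs = seq(0, 1, 1/nBins)))
  # labels = FALSE makes cut() return integer bin codes instead of interval labels
  codes <- cut(pred, breaks = Breaks, include.lowest = TRUE, labels = FALSE)
  # Fix the level set explicitly; if unique() dropped duplicate breaks,
  # the highest levels simply go unused instead of changing the level set
  data.frame(pred = pred, bin = factor(codes, levels = seq_len(nBins)))
}

set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))

# Base-R stand-in for group_by(models) %>% do(...): bin each group, then stack
pieces <- lapply(split(df, df$models), function(d)
  data.frame(models = d$models, create_bins_std(d$pred, nBins = 10)))
res <- do.call(rbind, pieces)

str(res)  # bin is a factor with the same 10 levels for both models
```

Because every group now yields identical levels, passing create_bins_std() instead of create_bins() to the original do() call should avoid the unequal-levels coercion, though I have only reasoned about this rather than checked it against every dplyr version.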

-pd

> 
> library(dplyr)
> 
> set.seed(4)
> df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels = c("model1", "model2")))
> 
> create_bins <- function (pred, nBins) {
>  Breaks <- unique(quantile(pred, probs = seq(0, 1, 1/nBins)))
>  bin <- data.frame(pred = pred, bin = cut(pred, breaks = Breaks, include.lowest = TRUE))
>  bin
> }
> 
> res_dplyr <- df %>% group_by(models) %>% do(create_bins(.$pred, 10))
> Warning message:
>  In rbind_all(out[[1]]) : Unequal factor levels: coercing to character
> 
> Thank you,
> Axel.
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list