[R] MASS fitdistr with plyr or data.table?
Dennis Murphy
djmuser at gmail.com
Thu Apr 28 01:31:22 CEST 2011
Hi:
Here's one way to do this with plyr and data.table.
# plyr
As Hadley inferred, when using ddply(), it's convenient to write a
function for a generic (sub-)data frame and have it return a data
frame. Here's my function and call:
f <- function(d) {
require(MASS)
est <- fitdistr(d$wind_speed, 'weibull')$estimate
data.frame(shape = est[1], scale = est[2])
}
Notice that f() takes a data frame d as input, uses the wind_speed
component of d to fit the Weibull, and returns a data frame with names
for the estimates.
> ddply(weib.test.too, 'site', f)
site shape scale
1 1 3.063853 8.049467
2 2 2.945982 8.067252
3 3 2.879392 7.999636
4 4 3.097191 8.084453
5 5 3.091117 8.012450
6 6 2.943254 7.912792
7 7 2.957455 7.947545
8 8 2.975732 7.901587
9 9 3.045563 8.061838
10 10 2.995324 8.056820
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In dweibull(x, shape, scale, log) : NaNs produced
<snip the other eight>
## data.table
In writing a function for data.table, you want a variable as input and
a list as output. We therefore modify the above function slightly:
g <- function(x) {
require(MASS)
est <- fitdistr(x, 'weibull')$estimate
list(shape = est[1], scale = est[2])
}
library(data.table)
weib.test.dt <- data.table(weib.test.too, key = 'site')
> weib.test.dt[, g(wind_speed), by = site]
site shape scale
[1,] 1 3.063853 8.049467
[2,] 2 2.945982 8.067252
[3,] 3 2.879392 7.999636
[4,] 4 3.097191 8.084453
[5,] 5 3.091117 8.012450
[6,] 6 2.943254 7.912792
[7,] 7 2.957455 7.947545
[8,] 8 2.975732 7.901587
[9,] 9 3.045563 8.061838
[10,] 10 2.995324 8.056820
## <warnings snipped>
HTH,
Dennis
On Wed, Apr 27, 2011 at 1:55 PM, Justin Haynes <jtor14 at gmail.com> wrote:
> I am trying to extract the shape and scale parameters of a wind speed
> distribution for different sites. I can do this in a clunky way, but
> I was hoping to find a way using data.table or plyr. However, when I
> try I am met with the following:
>
> set.seed(144)
> weib.dist<-rweibull(10000,shape=3,scale=8)
> weib.test<-data.table(cbind(1:10,weib.dist))
> names(weib.test)<-c('site','wind_speed')
>
> fitted<-weib.test[,fitdistr(wind_speed,'weibull'),by=site]
>
> Error in class(ans[[length(byval) + jj]]) = class(testj[[jj]]) :
> invalid to set the class to matrix unless the dimension attribute is
> of length 2 (was 0)
> In addition: Warning messages:
> 1: In dweibull(x, shape, scale, log) : NaNs produced
> ...
> 10: In dweibull(x, shape, scale, log) : NaNs produced
>
> (the warning messages are normal from what I can tell)
>
> or using plyr:
>
> set.seed(144)
> weib.dist<-rweibull(10000,shape=3,scale=8)
> weib.test.too<-data.frame(cbind(1:10,weib.dist))
> names(weib.test.too)<-c('site','wind_speed')
>
> fitted<-ddply(weib.test.too,.(site),fitdistr,'weibull')
>
> Error in .fun(piece, ...) : 'x' must be a non-empty numeric vector
>
> those sound like similar errors to me, but I can't figure out how to
> make them go away!
>
> to prove I'm not crazy:
>
> fitdistr(weib.dist,'weibull')$estimate
> shape scale
> 2.996815 8.009757
> Warning messages:
> 1: In dweibull(x, shape, scale, log) : NaNs produced
> 2: In dweibull(x, shape, scale, log) : NaNs produced
> 3: In dweibull(x, shape, scale, log) : NaNs produced
> 4: In dweibull(x, shape, scale, log) : NaNs produced
>
> Thanks
>
> Justin
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list