[R] Optimization algorithm to be applied to S4 classes - specifically sparse matrices

spencerg spencer.graves at prodsyse.com
Fri May 15 18:21:33 CEST 2009


Dear Doug, et al.: 


      What would you recommend for analyzing a longitudinal abundance 
survey of 22 species, when the species were not selected at random?  A 
prominent scientist tried to tell me that mixed-effects modeling is 
inappropriate in that case because the species were selected 
purposefully not at random. 


      My response is that even in that case, one should still use 
mixed-effects modeling, because it will tend to produce more appropriate 
estimates for the deviations of individual species from the average of 
all species -- potentially much lower variance with slight bias -- than 
naive ordinary least squares.  The estimated variance components will 
not represent the between-species variance for the actual population of 
all hypothetical species of the particular type, but will represent the 
between-species variability in a hypothetical population from which the 
selected species might be considered a random sample. 


      Best Wishes,
      Spencer Graves
p.s.  I appreciate very much Doug's comment on this.  I thought about 
adding something like that to my reply but didn't feel I could afford 
the time then. 


Douglas Bates wrote:
> On Wed, May 13, 2009 at 5:21 PM,  <Avraham.Adler at guycarp.com> wrote:
>   
>> Hello.
>>
>> I am trying to optimize a set of parameters using /optim/ in which the
>> actual function to be minimized contains matrix multiplication and is of
>> the form:
>>
>> SUM ((A%*%X - B)^2)
>>
>> where A is a matrix and X and B are vectors, with X as parameter vector.
>>     
>
> As Spencer Graves pointed out, what you are describing here is a
> linear least squares problem, which has a direct (i.e. non-iterative)
> solution.  A comparison of the speed of various ways of solving such a
> system is given in one of the vignettes in the Matrix package.
>
>   
>> This has worked well so far. Recently, I was given a data set A of size
>> 360440 x 1173, which could not be handled as a normal matrix. I brought it
>> into 'R' as a sparse matrix (dgCMatrix - using sparseMatrix from the Matrix
>> package), and the formulæ and gradient work, but /optim/ returns an error
>> of the form "no method for coercing this S4 class to a vector".
>>     
>
> If you just want the least squares solution X then
>
> X <- solve(crossprod(A), crossprod(A, B))
>
> will likely be the fastest method where A is the sparse matrix.
>
> I do feel obligated to point out that the least squares solution for
> such large systems is rarely a sensible solution to the underlying
> problem.  If you have over 1000 columns in A and it is very sparse
> then likely at least parts of A are based on indicator columns for a
> categorical variable.  In such situations a model with random effects
> for the category is often preferable to the fixed-effects model you
> are fitting.
>
>
>   
>> After briefly looking into methods and classes, I realize I am in way over
>> my head. Is there any way I could use /optim/ or another optimization
>> algorithm, on sparse matrices?
>>
>> Thank you very much,
>>
>> --Avraham Adler
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>     
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>




More information about the R-help mailing list