[R] [FORGED] Regression with factors ?
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Wed Jul 13 20:09:35 CEST 2016
The formula interface as used in lm and nls searches for separate
coefficients for each variable. It will take someone more clever than I
to figure out how to get the formula interface to think of two variables
as instances of one factor.
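To illustrate the point about separate coefficients, here is a minimal sketch using a made-up toy data frame (the names and values in toy are invented for illustration and are not the poster's data). The design matrix built by the formula interface gets one set of columns for p1 and another for p2, even though both refer to the same people:

```r
# Toy data, invented for this illustration only
toy <- data.frame( y  = c( 1, 2, 3, 4, 5, 6 )
                 , p1 = c( "alice", "alice", "bob", "bob", "carol", "carol" )
                 , p2 = c( "bob", "carol", "alice", "carol", "alice", "bob" )
                 , stringsAsFactors = TRUE
                 )
# Design matrix that lm() would build for this formula
mm <- model.matrix( y ~ p1 + p2, data = toy )
colnames( mm )
# "(Intercept)" "p1bob" "p1carol" "p2bob" "p2carol"
```

With the default treatment contrasts, "bob" shows up once as p1bob and once as p2bob, so lm would estimate two unrelated coefficients for him.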
However, R can do nonlinear optimization just fine:
##------------
# as if read in using read.csv( fname, as.is=TRUE )
dta <- data.frame( y = observed_data$y
                 , p1 = as.character( observed_data$p1 )
                 , p2 = as.character( observed_data$p2 )
                 , stringsAsFactors = FALSE
                 )
# one shared set of levels, so p1 and p2 refer to the same persons
lvls <- with( dta, unique( c( p1, p2 ) ) )
dta$p1f <- factor( dta$p1, levels = lvls )
dta$p2f <- factor( dta$p2, levels = lvls )
# positions of the two extra parameters at the end of the vector
idxvmult <- length( lvls ) + 1L
idxvoffs <- length( lvls ) + 2L
# all values in a numeric vector
# x = c( valice, vbob, ..., vmult, voffs )
calcY <- function( x ) {
  vmult <- x[ idxvmult ]
  voffs <- x[ idxvoffs ]
  # indexing by a factor uses its integer codes, i.e. positions in lvls
  vp1 <- x[ dta$p1f ]
  vp2 <- x[ dta$p2f ]
  vmult * ( voffs - ( vp1 - vp2 )^2 )
}
# objective: sum of squared residuals
optfcn <- function( x ) {
  sum( ( dta$y - calcY( x ) ) ^ 2 )
}
oresult <- optim( par = rep( 1, idxvoffs ), optfcn )
result <- list( multiplier = oresult$par[ idxvmult ]
              , offset = oresult$par[ idxvoffs ]
              , values = data.frame( lvls = lvls
                                   , values = oresult$par[ seq.int(
                                       length( lvls ) ) ] )
              )
result
#---------
I highly recommend reading the help page for optim and the CRAN Task View
on optimization [1].
[1] https://cran.r-project.org/web/views/Optimization.html
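For anyone who wants to try the approach without the original data, the following is a rough, self-contained sketch on simulated data. Everything in it (the persons table, true_mult, true_offs, the noise level, the starting values, and the choice of method = "BFGS") is an assumption made up for the illustration, not something taken from the thread. Note also that the model y = vmult * ( voffs - ( vp1 - vp2 )^2 ) only depends on differences between person values, and rescaling all values can be compensated by vmult and voffs, so the individual estimates appear to be determined only up to a shift and a scale. The sketch therefore looks at the fitted sum of squares rather than comparing estimates with the simulated abilities.

```r
set.seed( 42 )
# Hypothetical "true" parameters, invented for this sketch
persons <- data.frame( name = c( "alice", "bob", "carol", "dave" )
                     , ability = c( 1, 2, 3.5, 5 )
                     , stringsAsFactors = FALSE
                     )
true_mult <- 2
true_offs <- 30
# every ordered pair of two different persons
prs <- expand.grid( p1 = persons$name, p2 = persons$name
                  , stringsAsFactors = FALSE )
prs <- prs[ prs$p1 != prs$p2, ]
a1 <- persons$ability[ match( prs$p1, persons$name ) ]
a2 <- persons$ability[ match( prs$p2, persons$name ) ]
sim <- data.frame( p1 = prs$p1
                 , p2 = prs$p2
                 , y = true_mult * ( true_offs - ( a1 - a2 )^2 ) +
                       rnorm( nrow( prs ), sd = 0.1 )
                 , stringsAsFactors = FALSE
                 )
# shared levels and integer positions into the parameter vector
lvls <- unique( c( sim$p1, sim$p2 ) )
i1 <- match( sim$p1, lvls )
i2 <- match( sim$p2, lvls )
n <- length( lvls )
# sum of squared residuals; x = c( person values, vmult, voffs )
sse <- function( x ) {
  pred <- x[ n + 1L ] * ( x[ n + 2L ] - ( x[ i1 ] - x[ i2 ] )^2 )
  sum( ( sim$y - pred )^2 )
}
# distinct starting values for the persons: if they all start equal,
# the gradient with respect to them is zero and BFGS cannot separate them
start <- c( seq_len( n ), 1, 1 )
fit <- optim( par = start, fn = sse, method = "BFGS"
            , control = list( maxit = 1000 ) )
fit$convergence            # 0 means optim reports convergence
c( start = sse( start ), fitted = fit$value )
```

Whether this lands on a good fit depends on the starting values; trying several starts, or another method from the task view, is probably worthwhile.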
On Wed, 13 Jul 2016, stn021 wrote:
>> Is this what is intended?
>>
>>> observed_data$p1ab <- persons$ability[ match(observed_data$p1, persons$name) ]
>>> observed_data$p2ab <- persons$ability[ match(observed_data$p2, persons$name) ]
>
>
> Hello David,
>
> thank you for your answer.
>
>
> The code in my previous post was intended as an answer to the question
> in an earlier post about example-data, quote:
>
>>> Would you like me to make a complete example dataset with more records and noise ?
>> Yes. And preferably do it with R code.
>
> I should have re-stated this connection in the post.
>
>
> The code generates a matrix 'observed_data' which is the data the
> experimenter would get during the experiment.
>
> This matrix is output in the last line. All other output is only meant
> to document the generation-process.
>
> So the only thing visible to the experimenter before analysis is
> exactly that matrix 'observed_data' (usually in the form of some
> written documentation which is later entered into statistical
> software). Everything before that last line simulates those unknown
> parameters that the experiment is supposed to reveal.
>
> The unknown parameters are specifically
> - the matrix 'persons'
> - and the variable 'multiplyer'
>
> Both are supposed to be revealed by the analysis. p1ab and p2ab would
> therefore depend on the unknown parameters and could not be added to
> 'observed_data' before the analysis.
>
> Sorry again for omitting the back-reference.
>
>
> I would like to know:
>
> - how to get R to use p1 and p2 as levels of the same factor
> (=persons) instead of levels of two different factors.
>
> - how to get R to multiply the numerical levels of factors during the
> search for the solution. Factors cannot be multiplied before running
> lm() or some other package because before the analysis their numerical
> values are not known.
>
>
> THX, Stefan
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k