[R] Running multidimensional regressions
andrews Nikolaiv
andrewstayloy2 at hotmail.com
Tue Jan 14 18:22:36 CET 2014
Dear R helpers,
I have a question about how to run a regression on data with many indices.
To give a practical example, let y_{itabp} be a dependent variable (prices) indexed by
i=country, t=time, a=area, b=brand and p=package size.
In particular, we collected prices for a particular product from i=1,...,I countries over a period of t=1,...,T_{i} months. For example, for Italy we have
price information over 24 months, whereas for Germany we have price information over 36 months.
For each country, we have price information by area (a=1,...,A_{i}); for example, for Italy we have
price information for 5 areas, whereas for Germany we have price information for 9 areas. For each area, we have prices by brand (b=1,...,4).
Finally, for each brand, prices are broken down by package size (p=1,2,3).
I want to run a semiparametric regression to see the effect of package size on y_{itabp}.
Here is an example of my data structure:
x = data.frame( c("AA", "AA","AA","AA","AA","AA","AA","AA","AA","AA","AA","AA", "AA", "AA", "AA", "AA", "AA", "AA","BB","BB","BB","BB","BB","BB"),c("AA1", "AA1","AA1","AA1","AA1","AA1","AA1","AA1","AA1","AA2","AA2","AA2","AA2","AA2","AA2","AA2","AA2","AA2","BB1","BB1","BB1","BB1","BB1","BB1"),c("b1", "b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1"),c("ps1", "ps1","ps1", "ps2", "ps2","ps2", "ps3", "ps3","ps3","ps1", "ps1","ps1", "ps2", "ps2","ps2", "ps3", "ps3","ps3","ps1", "ps1","ps2", "ps2","ps3", "ps3"),c("01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/01/2009", "01/02/2009", "01/01/2009" , "01/02/2009", "01/01/2009" , "01/02/2009"),c(1.760342013, 1.786738677, 1.786393476, 1.725465745, 1.678327481, 1.843653536, 1.930568273, 1.941369212, 1.874848166, 1.902682713, 1.769559151, 1.802301798, 1.730695908, 1.89508712, 1.860365501, 1.907621204, 1.776247731, 1.81093127, 1.84776311, 1.801920074, 1.804968098, 1.830213005, 1.783453061, 1.58952),c(0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.05,0.05,0.05,0.05,0.05,0.05))
colnames(x) = c("Country", "Area", "brand", "packsize", "dates", "price", "Package_size")
> x
   Country Area brand packsize      dates    price Package_size
1       AA  AA1    b1      ps1 01/10/2008 1.760342        0.075
2       AA  AA1    b1      ps1 01/11/2008 1.786739        0.075
3       AA  AA1    b1      ps1 01/12/2008 1.786393        0.075
4       AA  AA1    b1      ps2 01/10/2008 1.725466        0.075
5       AA  AA1    b1      ps2 01/11/2008 1.678327        0.075
6       AA  AA1    b1      ps2 01/12/2008 1.843654        0.075
7       AA  AA1    b1      ps3 01/10/2008 1.930568        0.075
8       AA  AA1    b1      ps3 01/11/2008 1.941369        0.075
9       AA  AA1    b1      ps3 01/12/2008 1.874848        0.075
10      AA  AA2    b1      ps1 01/10/2008 1.902683        0.075
11      AA  AA2    b1      ps1 01/11/2008 1.769559        0.075
12      AA  AA2    b1      ps1 01/12/2008 1.802302        0.075
13      AA  AA2    b1      ps2 01/10/2008 1.730696        0.075
14      AA  AA2    b1      ps2 01/11/2008 1.895087        0.075
15      AA  AA2    b1      ps2 01/12/2008 1.860366        0.075
16      AA  AA2    b1      ps3 01/10/2008 1.907621        0.075
17      AA  AA2    b1      ps3 01/11/2008 1.776248        0.075
18      AA  AA2    b1      ps3 01/12/2008 1.810931        0.075
19      BB  BB1    b1      ps1 01/01/2009 1.847763        0.050
20      BB  BB1    b1      ps1 01/02/2009 1.801920        0.050
21      BB  BB1    b1      ps2 01/01/2009 1.804968        0.050
22      BB  BB1    b1      ps2 01/02/2009 1.830213        0.050
23      BB  BB1    b1      ps3 01/01/2009 1.783453        0.050
24      BB  BB1    b1      ps3 01/02/2009 1.589520        0.050
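The unbalanced structure described above (the number of months T_{i} and the number of areas A_{i} differing by country) can be checked directly from the data. The following is a minimal sketch in base R; it rebuilds only the index columns of the example with rep() so that it is self-contained, and the names d, n_distinct, T_i and A_i are introduced here for illustration only.

```r
# Index columns of the example data, rebuilt compactly (illustrative copy)
d <- data.frame(
  Country = rep(c("AA", "BB"), times = c(18, 6)),
  Area    = rep(c("AA1", "AA2", "BB1"), times = c(9, 9, 6)),
  dates   = c(rep(c("01/10/2008", "01/11/2008", "01/12/2008"), times = 6),
              rep(c("01/01/2009", "01/02/2009"), times = 3)),
  stringsAsFactors = FALSE
)

# Number of distinct months (T_i) and distinct areas (A_i) per country
n_distinct <- function(v) length(unique(v))
T_i <- tapply(d$dates, d$Country, n_distinct)
A_i <- tapply(d$Area,  d$Country, n_distinct)

print(T_i)
print(A_i)

# Number of observations per country and area
print(table(d$Country, d$Area))
```

In this sample, country AA has 3 months and 2 areas, and country BB has 2 months and 1 area.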
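I also created the numeric index variables
countryN, which takes 1 for AA, 2 for BB, etc.,
AreaN, which takes 1 for AA1, 2 for AA2, etc. (numbered within each country),
packsizeN, which takes 1 for ps1, 2 for ps2, etc.,
and timeN, which numbers the months within each country; that is,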
x = data.frame( c("AA", "AA","AA","AA","AA","AA","AA","AA","AA","AA","AA","AA", "AA", "AA", "AA", "AA", "AA", "AA","BB","BB","BB","BB","BB","BB"),c("AA1", "AA1","AA1","AA1","AA1","AA1","AA1","AA1","AA1","AA2","AA2","AA2","AA2","AA2","AA2","AA2","AA2","AA2","BB1","BB1","BB1","BB1","BB1","BB1"),c("b1", "b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1"),c("ps1", "ps1","ps1", "ps2", "ps2","ps2", "ps3", "ps3","ps3","ps1", "ps1","ps1", "ps2", "ps2","ps2", "ps3", "ps3","ps3","ps1", "ps1","ps2", "ps2","ps3", "ps3"),c("01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/01/2009", "01/02/2009", "01/01/2009" , "01/02/2009", "01/01/2009" , "01/02/2009"),c(1.760342013, 1.786738677, 1.786393476, 1.725465745, 1.678327481, 1.843653536, 1.930568273, 1.941369212, 1.874848166, 1.902682713, 1.769559151, 1.802301798, 1.730695908, 1.89508712, 1.860365501, 1.907621204, 1.776247731, 1.81093127, 1.84776311, 1.801920074, 1.804968098, 1.830213005, 1.783453061, 1.58952),c(0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.05,0.05,0.05,0.05,0.05,0.05),c(1,2,3, 1,2,3, 1,2,3,1,2,3,1,2,3,1,2,3,1,2,1,2,1,2),c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1),c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,1,1,2,2,3,3))
colnames(x) = c("Country", "Area", "brand", "packsize", "dates", "price", "Package_size", "timeN", "countryN", "AreaN", "packsizeN")
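Instead of typing the numeric indices by hand, they could also be derived from the character columns. The sketch below uses base R only and rests on some assumptions: the dates are taken to be in day/month/year format, AreaN and timeN are numbered within each country (as in the hand-coded values above), and the names d and within_index are hypothetical helpers introduced here.

```r
# Character index columns of the example data, rebuilt compactly
d <- data.frame(
  Country  = rep(c("AA", "BB"), times = c(18, 6)),
  Area     = rep(c("AA1", "AA2", "BB1"), times = c(9, 9, 6)),
  packsize = c(rep(rep(c("ps1", "ps2", "ps3"), each = 3), times = 2),
               rep(c("ps1", "ps2", "ps3"), each = 2)),
  dates    = c(rep(c("01/10/2008", "01/11/2008", "01/12/2008"), times = 6),
               rep(c("01/01/2009", "01/02/2009"), times = 3)),
  stringsAsFactors = FALSE
)

# Number the sorted distinct values of v as 1, 2, ... separately within each group g
within_index <- function(v, g) {
  ave(as.integer(factor(v)), g,
      FUN = function(z) match(z, sort(unique(z))))
}

# Assumption: dates are day/month/year, so 01/10/2008 is 1 October 2008
month_start <- as.numeric(as.Date(d$dates, format = "%d/%m/%Y"))

d$countryN  <- as.integer(factor(d$Country))   # 1 for AA, 2 for BB
d$packsizeN <- as.integer(factor(d$packsize))  # 1 for ps1, 2 for ps2, 3 for ps3
d$AreaN     <- within_index(d$Area, d$Country) # restarts at 1 in each country
d$timeN     <- within_index(month_start, d$Country)

print(d)
```

Note that factor() orders levels alphabetically, so labels such as ps10 would sort before ps2; with more levels than in this example, the level order may need to be set explicitly.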
I then run the following semiparametric regression to see the effect of package size on y_{itabp}, taking into account that prices are broken down by country, area, brand, package size and time.
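library(np)
# Pass the data frame via data = x rather than attach(x), which can mask other objects
model <- npreg(price ~ factor(Package_size) + factor(timeN) + factor(countryN) + factor(AreaN) + ordered(packsizeN), data = x)
summary(model)
# Plot the estimated effect of each regressor, each panel on its own scale
plot(model, common.scale = FALSE)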
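In the notation above, the npreg call can be read as a kernel estimate of a conditional mean of roughly the following form. This is only a sketch: g, s_{itabp} (the Package_size value) and \varepsilon_{itabp} are symbols introduced here for illustration, and brand b does not enter g because the call does not include it.

```latex
y_{itabp} = g\left(s_{itabp},\, t,\, i,\, a,\, p\right) + \varepsilon_{itabp},
\qquad
E\left[\varepsilon_{itabp} \mid s_{itabp}, t, i, a, p\right] = 0
```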
Do you think these commands serve my goal (estimating the effect of package size on y_{itabp})?
Any code would be greatly appreciated.
Thank you very much in advance,
andrews