[R] Running multidimensional regressions

andrews Nikolaiv andrewstayloy2 at hotmail.com
Tue Jan 14 18:22:36 CET 2014

```
Dear R helpers,

I have a question on how to run a regression with many indices.
To give you a practical example, let y_{itabp} be a dependent variable (representing prices) indexed by
i=country, t=time, a=area, b=brand and p=package size.
In particular, we collected prices on a particular product from i=1,...,I countries over a period of t=1,...,T_{i} months. For example, for Italy we have
price information over 24 months, whereas for Germany we have price information over 36 months.
For each country, we have price information by area (a=1,...,A_{i}; for example, for Italy we have
price information for 5 areas, whereas for Germany we have price information for 9 areas). For each area we have information on prices by brand (b=1,...,4).
Finally, for each brand, prices are broken down by package size (p=1,2,3).

I want to run a semiparametric regression to see the effect of package size on y_{itabp}.

Here is an example of my data structure:

x = data.frame(  c("AA", "AA","AA","AA","AA","AA","AA","AA","AA","AA","AA","AA", "AA", "AA", "AA", "AA", "AA", "AA","BB","BB","BB","BB","BB","BB"),c("AA1", "AA1","AA1","AA1","AA1","AA1","AA1","AA1","AA1","AA2","AA2","AA2","AA2","AA2","AA2","AA2","AA2","AA2","BB1","BB1","BB1","BB1","BB1","BB1"),c("b1", "b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1"),c("ps1", "ps1","ps1",   "ps2", "ps2","ps2",  "ps3", "ps3","ps3","ps1", "ps1","ps1",   "ps2", "ps2","ps2",  "ps3", "ps3","ps3","ps1", "ps1","ps2", "ps2","ps3", "ps3"),c("01/10/2008", "01/11/2008" , "01/12/2008" ,  "01/10/2008", "01/11/2008" , "01/12/2008" ,  "01/10/2008", "01/11/2008" , "01/12/2008" ,  "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" ,  "01/01/2009", "01/02/2009", "01/01/2009" , "01/02/2009",  "01/01/2009" , "01/02/2009"),c(1.760342013, 1.786738677, 1.786393476, 1.725465745, 1.678327481, 1.843653536, 1.930568273, 1.941369212, 1.874848166, 1.902682713, 1.769559151, 1.802301798, 1.730695908, 1.89508712, 1.860365501, 1.907621204, 1.776247731, 1.81093127, 1.84776311, 1.801920074, 1.804968098, 1.830213005, 1.783453061, 1.58952),c(0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.05,0.05,0.05,0.05,0.05,0.05))

colnames(x) = c("Country", "Area", "brand", "packsize", "dates", "price", "Package_size")

> x
   Country Area brand packsize      dates    price Package_size
1       AA  AA1    b1      ps1 01/10/2008 1.760342        0.075
2       AA  AA1    b1      ps1 01/11/2008 1.786739        0.075
3       AA  AA1    b1      ps1 01/12/2008 1.786393        0.075
4       AA  AA1    b1      ps2 01/10/2008 1.725466        0.075
5       AA  AA1    b1      ps2 01/11/2008 1.678327        0.075
6       AA  AA1    b1      ps2 01/12/2008 1.843654        0.075
7       AA  AA1    b1      ps3 01/10/2008 1.930568        0.075
8       AA  AA1    b1      ps3 01/11/2008 1.941369        0.075
9       AA  AA1    b1      ps3 01/12/2008 1.874848        0.075
10      AA  AA2    b1      ps1 01/10/2008 1.902683        0.075
11      AA  AA2    b1      ps1 01/11/2008 1.769559        0.075
12      AA  AA2    b1      ps1 01/12/2008 1.802302        0.075
13      AA  AA2    b1      ps2 01/10/2008 1.730696        0.075
14      AA  AA2    b1      ps2 01/11/2008 1.895087        0.075
15      AA  AA2    b1      ps2 01/12/2008 1.860366        0.075
16      AA  AA2    b1      ps3 01/10/2008 1.907621        0.075
17      AA  AA2    b1      ps3 01/11/2008 1.776248        0.075
18      AA  AA2    b1      ps3 01/12/2008 1.810931        0.075
19      BB  BB1    b1      ps1 01/01/2009 1.847763        0.050
20      BB  BB1    b1      ps1 01/02/2009 1.801920        0.050
21      BB  BB1    b1      ps2 01/01/2009 1.804968        0.050
22      BB  BB1    b1      ps2 01/02/2009 1.830213        0.050
23      BB  BB1    b1      ps3 01/01/2009 1.783453        0.050
24      BB  BB1    b1      ps3 01/02/2009 1.589520        0.050

I also created the variables

countryN, which takes 1 for AA, 2 for BB, etc.,
AreaN, which takes 1 for AA1, 2 for AA2, etc.,
packsizeN, which takes 1 for ps1, 2 for ps2, etc.,
and timeN; that is,

x = data.frame(  c("AA", "AA","AA","AA","AA","AA","AA","AA","AA","AA","AA","AA", "AA", "AA", "AA", "AA", "AA", "AA","BB","BB","BB","BB","BB","BB"),c("AA1", "AA1","AA1","AA1","AA1","AA1","AA1","AA1","AA1","AA2","AA2","AA2","AA2","AA2","AA2","AA2","AA2","AA2","BB1","BB1","BB1","BB1","BB1","BB1"),c("b1", "b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1","b1"),c("ps1", "ps1","ps1",   "ps2", "ps2","ps2",  "ps3", "ps3","ps3","ps1", "ps1","ps1",   "ps2", "ps2","ps2",  "ps3", "ps3","ps3","ps1", "ps1","ps2", "ps2","ps3", "ps3"),c("01/10/2008", "01/11/2008" , "01/12/2008" ,  "01/10/2008", "01/11/2008" , "01/12/2008" ,  "01/10/2008", "01/11/2008" , "01/12/2008" ,  "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" , "01/10/2008", "01/11/2008" , "01/12/2008" ,  "01/01/2009", "01/02/2009", "01/01/2009" , "01/02/2009",  "01/01/2009" , "01/02/2009"),c(1.760342013, 1.786738677, 1.786393476, 1.725465745, 1.678327481, 1.843653536, 1.930568273, 1.941369212, 1.874848166, 1.902682713, 1.769559151, 1.802301798, 1.730695908, 1.89508712, 1.860365501, 1.907621204, 1.776247731, 1.81093127, 1.84776311, 1.801920074, 1.804968098, 1.830213005, 1.783453061, 1.58952),c(0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.075,0.05,0.05,0.05,0.05,0.05,0.05),c(1,2,3, 1,2,3, 1,2,3,1,2,3,1,2,3,1,2,3,1,2,1,2,1,2),c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1),c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,1,1,2,2,3,3))

colnames(x) = c("Country", "Area", "brand", "packsize", "dates", "price", "Package_size", "timeN", "countryN", "AreaN", "packsizeN")

I then run the following semiparametric regression to see the effect of package size on y_{itabp}, taking into account that prices are broken down by country, area, brand, package size and time.

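library(np)
# Pass the data frame explicitly instead of attach(x), so the variables
# cannot be masked by other objects in the workspace.
model <- npreg(price ~ factor(Package_size) + factor(timeN) + factor(countryN) + factor(AreaN) + ordered(packsizeN), data = x)
summary(model)
plot(model, common.scale = FALSE)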

Do you think that these commands serve my goal (to estimate the effect of package size on y_{itabp})?

Any code provided is greatly appreciated.

Thank you very much in advance,

andrews

```
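The integer index columns typed by hand in the post (countryN, AreaN, timeN, packsizeN) could probably be derived from the label columns instead. The following is a minimal base-R sketch under some assumptions: the dates are read as day/month/year, AreaN and timeN are numbered within each country (as the hand-typed values suggest), the price column is left out for brevity, and the helper `index_within` is a hypothetical name, not part of any package.

```r
# Rebuild the example's label columns compactly (price omitted for brevity).
x <- data.frame(
  Country      = rep(c("AA", "BB"), times = c(18, 6)),
  Area         = rep(c("AA1", "AA2", "BB1"), times = c(9, 9, 6)),
  brand        = "b1",
  packsize     = c(rep(rep(c("ps1", "ps2", "ps3"), each = 3), 2),
                   rep(c("ps1", "ps2", "ps3"), each = 2)),
  dates        = c(rep(c("01/10/2008", "01/11/2008", "01/12/2008"), 6),
                   rep(c("01/01/2009", "01/02/2009"), 3)),
  Package_size = rep(c(0.075, 0.05), times = c(18, 6)),
  stringsAsFactors = FALSE
)

# Assumption: the date strings are day/month/year (monthly observations).
x$date <- as.Date(x$dates, format = "%d/%m/%Y")

# Hypothetical helper: number the sorted distinct values of `value`
# separately inside each level of `group` (1, 2, ... per group).
index_within <- function(value, group) {
  out <- integer(length(value))
  for (g in unique(group)) {
    sel <- group == g
    out[sel] <- match(value[sel], sort(unique(value[sel])))
  }
  out
}

x$countryN  <- as.integer(factor(x$Country))    # AA = 1, BB = 2
x$AreaN     <- index_within(x$Area, x$Country)  # area index within country
x$timeN     <- index_within(x$date, x$Country)  # month index within country
x$packsizeN <- as.integer(factor(x$packsize))   # ps1 = 1, ps2 = 2, ps3 = 3

# Cross-tabulate country against Package_size to see how much the
# regressor of interest varies inside each country in this sample.
print(table(x$Country, x$Package_size))
```

In this 24-row sample the cross-tabulation shows a single Package_size value per country (0.075 for AA, 0.05 for BB), so within the sample its effect may not be separable from the country effect; the full data set may of course behave differently.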