[R] How to create MULTILEVELS in a dataset??

Ista Zahn istazahn at gmail.com
Mon Oct 19 18:13:51 CEST 2009


Hi,
I wouldn't combine the year and country codes in the first place, and
certainly not as a numeric value. Do you have the raw data with
country and year listed separately? From the output you listed it
looks like you may indeed have a single value (2e+07) for yearctry,
although str() abbreviates numbers when printing, so the display alone
is not conclusive. You can check with

unique(e$yearctry)

to see how many unique values there are. But combined with the fact
that lmer is telling you that you only have one, I'm guessing there
really is only one value. You've got your data in an unmanageable
state I think. Go back to the raw data. How many countries do you
have? How many years does the data span?
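
Here is a minimal sketch of that check, and of keeping the identifiers
separate. The toy data frame and the column names year and country are
assumptions standing in for your raw data, not your actual file:

```r
# Toy stand-in for the real data; the columns `year` and `country` are assumed names.
e <- data.frame(year    = c(2000, 2000, 2005, 2005, 2008),
                country = c(1, 44, 1, 44, 1))

# The composite as described in the thread: year * 10000 + country.
e$yearctry <- e$year * 10000 + e$country

# Printed output may be abbreviated; these look at the stored values instead.
n_groups <- length(unique(e$yearctry))             # number of distinct codes
shown    <- format(e$yearctry, scientific = FALSE) # full digits as text

# If only the composite survives, it can be split back apart.
year_back    <- e$yearctry %/% 10000
country_back <- e$yearctry %% 10000

# Grouping variables are safer as factors than as large numbers.
e$country  <- factor(e$country)
e$year     <- factor(e$year)
e$ctryyear <- interaction(e$country, e$year, drop = TRUE)
```

With the real data, the grouping term would then be (1 | country), or
(1 | ctryyear) if you really want country-year cells. As far as I know,
current lme4 releases fit a binomial model like this with glmer() rather
than lmer().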

On Mon, Oct 19, 2009 at 11:43 AM, saurav pathak <pathak.saurav at gmail.com> wrote:
> Hi Ista,
> You got that correct: yearctry is a composite created as yearctry =
> year*10000+country. For example, the USA (country code 1) in year 2000
> becomes 20000001, and in 2005 it becomes 20050001. The years run from
> 2000 to 2008 for many countries; for the UK (code 44) the values are
> 20000044, 20050044, and so on. Here is the result of str(e):
>
> 'data.frame':   902533 obs. of  18 variables:
>  $ yearctry    : num  2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 ...
>  $ discent     : int  0 0 0 NA 0 1 0 0 0 NA ...
>  $ age         : int  51 46 26 24 19 18 20 19 25 19 ...
>  $ gender      : int  1 2 1 1 1 1 1 1 1 1 ...
>  $ gemeduc     : int  0 0 111 111 111 111 111 111 111 111 ...
>  $ gemhhinc    : int  33 33 33 33 33 33 33 33 33 33 ...
>  $ ref_group   : int  1 2 3 3 3 3 3 3 3 3 ...
>  $ fearfail_ref: num  1 NA 0.473 0.473 0.473 ...
>  $ knowent_ref : num  0 NA 0.484 0.484 0.484 ...
>  $ nbgoodc_ref : num  NA 0 0.84 0.84 0.84 0.84 0.84 0.84 0.84 0.84 ...
>  $ nbstatus_ref: num  NA 1 0.846 0.846 0.846 ...
>  $ estbbuso_ref: num  0 0 0.0172 0.0172 0.0172 ...
>  $ lngdp       : num  8.99 9.08 9.29 9.13 8.99 ...
>  $ lngdpsq     : num  19.5 19.4 19.2 19.4 19.5 ...
>  $ es_gdppcppp : num  7995 8804 10872 9189 7995 ...
>  $ sq_gdppcppp : num  3.01e+08 2.74e+08 2.10e+08 2.61e+08 3.01e+08 2.74e+08 2.10e+08 3.01e+08 2.10e+08 2.61e+08 ...
>  $ estbbo_m    : num  0.1063 0.078 0.049 0.0355 0.1063 ...
>  $ es_gdpchg   : num  -10.9 8.837 9.179 -0.789 -10.9 ...
>
> A portion of the printed yearctry values is also listed:
>
>  2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
> 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
> 2e+07 2e+07
> [... remaining output omitted; every printed value is 2e+07 ...]
>
> Looking at the above, I don't know whether R recognises these as different
> numbers. The exponential format does not reveal whether R distinguishes
> 20000001 from 20050001; all of them appear as 2e+07. If it does not, how
> do I make R recognise the value as 20000001, and so on? Stata also lists
> them in exponential format, but I know that it treats yearctry values as
> different for different years and countries. Please help.
>
> I have to shift to R because Stata is taking days and days to run gllamm.
>
> Kindly help
>
>
>
> On Mon, Oct 19, 2009 at 2:19 PM, Ista Zahn <istazahn at gmail.com> wrote:
>>
>> Hi,
>> Please keep r-help copied on the reply -- hopefully someone will pick
>> up this thread and help us out.
>>
>> On Mon, Oct 19, 2009 at 2:17 AM, saurav pathak <pathak.saurav at gmail.com> wrote:
>> > Dear Ista,
>> > Thanks for answering. The previous question was a primer for what I
>> > wanted. I did just what you said, with "yearctry" below as the country
>> > code or group variable; that is, yearctry (the data are grouped by
>> > yearctry) was the variable I passed as the country id.
>>
>> I suggested using country as the grouping variable. What is yearctry?
>> From the name it sounds like a composite of year and country.
>>
>> > Kindly notice that after running the lmer model, it recognises yearctry
>> > as the group but reports the number of groups as "groups: yearctry, 1".
>> > This means it did not recognise yearctry as the variable by which the
>> > data are grouped. The number should be 239, not 1.
>>
>> That's weird. What does
>>
>> str(e)
>>
>> say?
>>
>> >
>> > But please see below:
>> >
>> > My data set is e
>> >
>> >> names(e)
>> >  [1] "yearctry"     "discent"      "age"          "gender"       "gemeduc"
>> >      "gemhhinc"     "ref_group"    "fearfail_ref" "knowent_ref"  "nbgoodc_ref"
>> > [11] "nbstatus_ref" "estbbuso_ref" "lngdp"        "lngdpsq"      "es_gdppcppp"
>> >      "sq_gdppcppp"  "estbbo_m"     "es_gdpchg"
>> >
>> > Here I have variables representing two levels, namely the individual
>> > level and the country level, so my data are two-level data. The
>> > country-level (level-2) variables are "lngdp", "lngdpsq", "es_gdppcppp",
>> > "sq_gdppcppp", "estbbo_m", and "es_gdpchg", grouped by "yearctry"; the
>> > rest of the variables are individual level (level-1).
>> >
>> > The number of individual observations is 655078 and the number of
>> > yearctry groups is 239. However, when I fit a probit model to see the
>> > influence of four individual-level variables (age, gender, gemeduc, and
>> > gemhhinc) and one country-level variable (es_gdppcppp) using
>> >
>> >> prb1 <- lmer(discent ~ age + gender + gemeduc + gemhhinc + es_gdppcppp +
>> >              (1 | yearctry), family = binomial(link = "probit"), data = e)
>> >
>> > I get
>> >
>> > Generalized linear mixed model fit by the Laplace approximation
>> > Formula: discent ~ age + gender + gemeduc + gemhhinc + es_gdppcppp + (1 | yearctry)
>> >    Data: e
>> >     AIC    BIC logLik deviance
>> >  194043 194122 -97014   194029
>> > Random effects:
>> >  Groups   Name        Variance   Std.Dev.
>> >  yearctry (Intercept) 4.0708e-06 0.0020176
>> > Number of obs: 655078, groups: yearctry, 1
>> > Fixed effects:
>> >               Estimate Std. Error z value Pr(>|z|)
>> > (Intercept) -7.578e-01  1.839e-02  -41.20  < 2e-16 ***
>> > age         -2.441e-03  2.990e-04   -8.16 3.30e-16 ***
>> > gender      -2.886e-01  7.710e-03  -37.43  < 2e-16 ***
>> > gemeduc      9.244e-05  6.930e-06   13.34  < 2e-16 ***
>> > gemhhinc    -8.938e-07  1.359e-07   -6.58 4.75e-11 ***
>> > es_gdppcppp -2.459e-05  2.691e-07  -91.40  < 2e-16 ***
>> > ---
>> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> > Correlation of Fixed Effects:
>> >             (Intr) age    gender gemedc gmhhnc
>> > age         -0.580
>> > gender      -0.563 -0.138
>> > gemeduc     -0.373  0.166  0.011
>> > gemhhinc    -0.009 -0.132 -0.024 -0.201
>> > es_gdppcppp -0.490  0.071  0.314 -0.297  0.256
>> > The model did not recognise the grouping by yearctry and shows 1 group
>> > instead of 239. Can somebody help me make my model recognise es_gdppcppp
>> > as a country-level variable grouped by yearctry (so that the number of
>> > yearctry groups is 239)?
>>
>> I think we need more information. How many levels does str(e) say
>> yearctry has? Also, do you really have data from 239 countries, or is
>> yearctry a composite of year and country? If the latter, it might make
>> sense to split it out into separate year and country variables.
>>
>> hope it helps,
>> Ista
>> >
>> > On Mon, Oct 19, 2009 at 5:00 AM, Ista Zahn <istazahn at gmail.com> wrote:
>> >>
>> >> Hi Saurav,
>> >> I was waiting for someone else to answer you, because I'm not sure
>> >> I'll be able to explain clearly. But since no one is jumping on it,
>> >> I'll take a stab.
>> >>
>> >> On Sun, Oct 18, 2009 at 5:52 PM, saurav pathak <pathak.saurav at gmail.com> wrote:
>> >> > Dear R users
>> >> >
>> >> > I have a data set which has five variables: one dependent variable, y,
>> >> > and four independent variables (education-level, householdincome,
>> >> > countrygdp and countrygdpsquare). The first two describe the
>> >> > individual and the next two correspond to the country to which the
>> >> > individual belongs. My data set does not make this distinction
>> >> > between the individual level and the country level. Is there a way to
>> >> > make R treat countrygdp and countrygdpsquare as being at a different
>> >> > level than the individual-level data? In other words, I wish to
>> >> > transform my dataset so that it recognizes the two individual-level
>> >> > variables as Level-1 and the two country-level variables as Level-2.
>> >> >
>> >>
>> >> If you're using lmer I don't think you need to do anything special in
>> >> terms of data preparation. You will need an explicit country code I
>> >> think.
>> >>
>> >> > I need to run a multilevel model, but first I must make my dataset
>> >> > recognise data at Level-1 and Level-2. How can I create this
>> >> > country-level group (gdp and gdp^2) so that I can fit a multilevel
>> >> > model as follows:
>> >> >
>> >> > lmer(y ~ education-level + householdincome + countrygdp +
>> >> >      countrygdpsquare + (1 | Level2),
>> >> >      family=binomial(link="probit"), data=dataset)
>> >>
>> >> I think you just need to specify country as the grouping variable:
>> >>
>> >>  lmer(y ~ education-level + householdincome + countrygdp +
>> >>       countrygdpsquare + (1 | country),
>> >>       family=binomial(link="probit"), data=dataset)
>> >>
>> >> >
>> >> > Please kindly help me with the relevant commands for creating this
>> >> > Level2 (having two variables).
>> >>
>> >> I hope this helps -- I think it's less complicated than you were
>> >> assuming.
>> >>
>> >> -Ista
>> >> >
>> >> > Thanks
>> >> > Saurav
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > Dr.Saurav Pathak
>> >> > PhD, Univ.of.Florida
>> >> > Mechanical Engineering
>> >> > Doctoral Student
>> >> > Innovation and Entrepreneurship
>> >> > Imperial College Business School
>> >> > s.pathak08 at imperial.ac.uk
>> >> > 0044-7795321121
>> >>
>> >>
>> >>
>> >> --
>> >> Ista Zahn
>> >> Graduate student
>> >> University of Rochester
>> >> Department of Clinical and Social Psychology
>> >> http://yourpsyche.org



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



