[R] Equivalent to Stata egen

ronggui ronggui.huang at gmail.com
Fri Apr 17 03:35:31 CEST 2009


Certainly, different people have different expectations of a help
system. Personally, I think Stata's on-line help system is too brief,
though the manual may be a different story. Perhaps it all comes down
to habit: what you are used to, and how much you already know about
the system.

2009/4/17 Stas Kolenikov <skolenik at gmail.com>:
> See, we just have different expectations of what is to be seen in the
> help system, and are used to different formats. Yes, Stata thinks of
> data as a rectangular array (although it stores it in memory, unlike
> SAS). The inputs to -egen-, as well as the values produced, depend on
> the particular function -fcn- and are described in subsections on
> those individual functions. That is mentioned at the top of the page.
> There is a pretty much standard syntax for most Stata commands (command
> name, followed by the variables it is applied to or the expression to be
> computed, followed by if conditions on observations, followed by comma
> options), and -egen- more or less satisfies that syntax. A Stata user
> equipped with the basic concepts of the assignment command -generate-
> (which -egen- is said to extend) and variable lists (-varlist- here
> and there in the help file) would be able to make sense of this all.
>
> I would rather translate R's ave() to Stata's -by- expression. Not all
> of the -egen- functionality can be implemented via ave().
>
> Looks like terseness is a prerequisite to doing anything in R though.
> If I am telling you I am a newbie, the book abbreviations, although
> standard to everybody on this list, may not mean much to me. I could
> figure out "Regression Modeling Strategies" (although I was not
> thinking about it as a book on R -- I probably did not read it far
> enough :) ), and V&R is Venables & Ripley. Right?
>
> On 4/16/09, David Winsemius <dwinsemius at comcast.net> wrote:
>> Terse is OK by me as long as I get told what goes in (allowable data types,
>> argument names and effects) and what comes out. What seemed to be lacking in
>> that Stata doc for egen was a description of the purpose or behavior, and
>> then I could find no description of the values produced. Perhaps it is because
>> Stata has an approach that everything is a rectangular array? Is everything
>> assumed to create a new column of data as in SAS?
>>
>>  At any rate it looked to this casual non-user, reading that document, that
>> egen creates a new variable aligned with its argument variables by applying
>> various functions within groupings. That is pretty much what ave does. "ave"
>> is not restricted to mean as a functional argument. As I said it was a
>> guess.
>>
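
The behaviour described above (a new variable aligned with the original
rows, computed by applying a function within groups) can be sketched in R
as follows. This is only an illustration: the data frame and the column
names grp, x, grp_mean and grp_max are made up for the example, and the
Stata lines in the comments are rough counterparts, not exact equivalents.

```r
# Hypothetical example data; all names here are invented for illustration.
d <- data.frame(
  grp = c("a", "a", "b", "b", "b"),
  x   = c(1, 3, 10, 20, 60)
)

# Group mean, returned as a vector aligned with the rows of d.
# Rough Stata counterpart: egen grp_mean = mean(x), by(grp)
d$grp_mean <- ave(d$x, d$grp)

# ave() is not restricted to mean: any function can be passed via FUN.
# Rough Stata counterpart: egen grp_max = max(x), by(grp)
d$grp_max <- ave(d$x, d$grp, FUN = max)

print(d)
```

Because ave() returns a vector of the same length as its input, the result
can be assigned straight back into the data frame, which is what makes it
feel similar to -egen- with by().
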
>>  The texts I used to get up to speed in R are several downloaded from the
>> Contributed documents (including anything written by Venables), V&R MASS v
>> 2, Harrell's RMS, Sarkar's Lattice, Chambers&Hastie SMiS and reading a lot
>> of Q&A on this list.
>>
>>  --
>>  David Winsemius
>>
>>  On Apr 16, 2009, at 11:57 AM, Stas Kolenikov wrote:
>>
>>
>> > http://www.stata.com/help.cgi?egen -- it creates new variables dealing
>> > with some special relatively non-standard tasks that don't boil down
>> > to one-line arithmetic expressions. For that reason, there will be
>> > no equivalent to -egen- in general, as it has so many functions that
>> > are so different. -rowtotal- is of course just a shorthand for sum(),
>> > except for treatment of missing values (ifelse(is.na(x), 0, x)). But
>> > -anycount- is a moderately complicated double cycle over variables and
>> > list of values (40 lines of underlying Stata code, including parsing
>> > and labeling the resulting variables)... which will probably become a
>> > triple R cycle including the cycle over observations, although the
>> > latter can probably be avoided.
>> >
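
For what it is worth, the two -egen- functions mentioned above can
probably be approximated in R without an explicit loop over observations.
The sketch below is an assumption-laden illustration, not a port of the
Stata code: the data, the column names v1 to v3, and the value list
c(2, 5) are invented, and the labelling and parsing that -anycount-
performs are ignored.

```r
# Hypothetical example data; names and values are invented for illustration.
d <- data.frame(
  v1 = c(1, NA, 3, NA),
  v2 = c(2, 5, NA, NA),
  v3 = c(2, 2, 7, NA)
)
vars <- c("v1", "v2", "v3")

# rowtotal-like: row sums with missing values treated as 0,
# so a row with all values missing gets a total of 0.
d$total <- rowSums(d[vars], na.rm = TRUE)

# anycount-like: for each row, count how many of the variables
# take one of the listed values. NA never matches, so it counts as 0.
vals <- c(2, 5)
hits <- sapply(d[vars], function(col) col %in% vals)  # logical matrix
d$n_match <- rowSums(hits)

print(d)
```

The cycle over variables is still there (inside sapply), but the cycles
over values and over observations are handled by the vectorised %in% and
rowSums().
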
>> > Yes, R documentation looks extremely terse to me as a regular Stata
>> > user. I am used to seeing the concepts explained well, even in the
>> > help files, and certainly more so in the shelved books. As every
>> > option and every part of the syntax is given at least three to five
>> > sentences, and the most common uses are exemplified, I can usually
>> > figure out how to run a particular task relatively quickly. (The data
>> > management tricks, which is what Peter was asking about above, are
>> > probably an exception: you either know them, or you don't. In this
>> > example, I don't know the corresponding R tricks, although I can
>> > probably brute force the solution if I needed to.) The fraction of
>> > commands in R that I personally have been coming across that are
>> > comparably well documented is about a quarter. For the others, it is either
>> > guesswork+CRANning+googling around or "Forget it, I'll just go back
>> > to Stata to do it" after a few futile attempts. Maybe I just don't
>> > know where to look for the good stuff, but it is certainly outside R
>> > as a package+its documentation.
>> >
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
HUANG Ronggui, Wincent
PhD Candidate
Dept of Public and Social Administration
City University of Hong Kong
Home page: http://asrr.r-forge.r-project.org/rghuang.html



