[R] Aggregate

Jagat Sheth shethj at epi.wustl.edu
Mon Nov 6 18:51:32 CET 2000



>>> Prof Brian Ripley <ripley at stats.ox.ac.uk> 11/06 9:01 AM >>>

> Date: Mon, 06 Nov 2000 08:48:04 -0600
> From: "Jagat Sheth" <shethj at epi.wustl.edu>
> 
> Hello to all,
> 
> I recently downloaded R to my PC and am enjoying getting acquainted with it.
> Thank you to everyone involved in the R-project!
> 
> I am interested in doing a log-linear analysis with R on a data set
> with dichotomous variables. There are 11 variables (columns) and around
> 1000 subjects (rows).  How do I aggregate my data, i.e. how do I make a
> new dataset that includes the variable giving the counts for rows with
> the same configuration of responses? I know this is possible using the
> package 'cfa' (configural frequency analysis, a contributed package in
> R) but I can't coerce the output from the cfa command into a data
> frame.

I'm not fully sure I understand.  You have a data frame, one row per
subject with 11 columns?  What's the response here?  

Sorry for my ambiguity! The response variable I want is not in my original dataset, which has one row per observation and 11 columns. I would like to make a new dataset having one row per 'configuration' from my original dataset and 12 columns. The 12th column will be the dependent variable I want, namely 'freq', the number of times the given 'configuration' appeared in the original data set. I would like to do a log-linear analysis on this new data set, e.g. loglm(freq ~ ., newdataset).

If I instead set 'freq' equal to 1 for each row in my original data set, R prompts me to increase the heap size for memory when running loglin or loglm.
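
For illustration only, here is a minimal sketch of the aggregation described above: one row per configuration plus a 'freq' column. The data are simulated stand-ins (the real data set is not shown in this thread), the column names v1..v11 are hypothetical, and it assumes a current version of R in which table() accepts a data frame directly.

```r
# Simulated stand-in for the original data: 1000 subjects, 11 binary columns.
# (Hypothetical column names v1..v11; the real data are not shown here.)
set.seed(1)
dataset <- as.data.frame(matrix(rbinom(1000 * 11, 1, 0.5), ncol = 11))
names(dataset) <- paste0("v", 1:11)

# Make each column a factor with both levels so every configuration is a cell.
dataset[] <- lapply(dataset, factor, levels = 0:1)

# Cross-classify all 11 variables, then flatten to one row per configuration.
tab <- table(dataset)
newdataset <- as.data.frame(tab, responseName = "freq")

# Optionally keep only the configurations that actually occur in the data.
observed <- newdataset[newdataset$freq > 0, ]

dim(newdataset)        # 2^11 = 2048 rows, 11 factors + 'freq' = 12 columns
sum(newdataset$freq)   # equals the number of original rows
```

Note that 'newdataset' keeps zero-count configurations, which a log-linear fit normally needs, while 'observed' drops them.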

Thanks again.

J. Sheth

 

If one of the
columns is the response (and it is dichotomous) then you can just use
logistic regression without any transformation.  To do a log-linear
analysis using a few of the variables as a joint response you can use
multinom from package nnet on the original data, and it will summarize
the data as you request en route.  To use loglin all you need to do is
use table on the data frame

do.call("table", dataset)

*BUT* I am not sure that is at all appropriate as an analysis.
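
As a rough sketch of the table-then-loglin route mentioned above, the following uses a small simulated data frame with three hypothetical binary columns (a, b, c) rather than the real 11-variable data. The mutual-independence model is chosen only to show the call; it is not a suggestion that this model suits the real data.

```r
# Simulated stand-in data: 200 subjects, 3 binary variables (hypothetical names).
set.seed(2)
dataset <- as.data.frame(matrix(rbinom(200 * 3, 1, 0.4), ncol = 3))
names(dataset) <- c("a", "b", "c")
dataset[] <- lapply(dataset, factor, levels = 0:1)

# Build the contingency table of counts from the data frame.
tab <- do.call("table", dataset)

# Fit the mutual-independence model: one margin per variable.
fit <- loglin(tab, margin = list(1, 2, 3), fit = TRUE, print = FALSE)

fit$lrt   # likelihood-ratio statistic for the fitted model
fit$df    # residual degrees of freedom
```

loglin() works on the table of counts itself, so no separate 'freq' column is needed for this route.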

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk 
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/ 
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




