[R] Splitting up large set of survey data into categories

ak13 andreas.karpf at gmail.com
Tue Jan 24 17:54:35 CET 2012


Hi Tal,

Thank you very much for your reply. The xts data frame I am using looks like this:

example <- as.data.frame(structure(c(" 1", " 2", " 1", " 2", " 1", " 1",
" 2", " 1", " 2", 
" 1", " 2", " 3", " 1", " 1", " 2", " 2", " 3", " 1", " 2", " 2", 
" 1", " 2", " 1", " 1", " 2", NA, " 2", NA, NA, " 1", " 3", " 1", 
" 3", " 3", " 2", " 3", " 3", " 3", " 2", " 2", " 2", " 3", " 3", 
" 3", " 2", " 2", " 3", " 3", " 3", " 3", " 1", " 2", " 1", " 2", 
" 2", " 1", " 2", " 1", " 2", " 2", " 2", " 3", " 1", " 1", " 2", 
" 2", " 3", " 3", " 2", " 2", " 1", " 2", " 1", " 1", " 2", NA, 
" 2", NA, NA, " 1", " 3", " 2", " 3", " 2", " 0", " 3", " 3", 
" 3", " 2", " 0", " 2", " 3", " 3", " 3", " 0", " 2", " 2", " 3", 
" 3", " 0", "12", " 5", " 9", "14", " 5", "tra", "tra", "man", 
"inf", "agc", "07-2011", "07-2011", "07-2011", "07-2011", "07-2011"
), .indexCLASS = c("POSIXlt", "POSIXt"), .indexTZ = "", class = c("xts", 
"zoo"), .indexFORMAT = "%U-%Y", index = structure(c(1297642226, 
1297672737, 1297741204, 1297748893, 1297749513), tzone = "", tclass =
c("POSIXlt", 
"POSIXt")), .Dim = c(5L, 23L), .Dimnames = list(NULL, c("rev_sit", 
"prof_sit", "emp_nr_sit", "inv_sit", "ord_home_sit", "ord_abr_sit", 
"emp_cost_sit", "usage_cost_sit", "tax_cost_sit", "gov_cost_sit", 
"rev_exp", "prof_exp", "emp_nr_exp", "inv_exp", "ord_home_exp", 
"ord_abr_exp", "emp_cost_exp", "usage_cost_exp", "tax_cost_exp", 
"gov_cost_exp", "land", "nace", "index")))) 

What I want to do now is to use the category values from "land" and "nace",
count the occurrences of {1,2,3} (the answering options in the survey) in
each variable, and aggregate everything to calendar weeks (the observations
are unevenly distributed over calendar weeks). I managed to do this with a
normal data frame, but then the data are not in the right time order. I used
this command (e.g. for the first variable and a single answering option,
here 3):

pos <- as.data.frame(tapply((example[,1]==3)*1, list(example$index,
example$land, example$nace), sum)) 

If I use the same command on an xts data frame, it gives me the following
warning:  "some methods for “zoo” objects do not work if the index entries
in 'order.by' are not unique". 
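
A rough sketch of one way that may sidestep this warning, assuming the values and timestamps are first copied out of the xts object (for instance with zoo's coredata() and index()) into a plain data frame. The matrix `vals` and the vector `stamps` below are hypothetical stand-ins for those two pieces, and the three columns are a cut-down selection of the real ones:

```r
# Hypothetical stand-in for the character matrix stored inside the xts object
vals <- matrix(c(" 1", " 2", NA, " 3",
                 "12", " 5", " 9", "12",
                 "tra", "tra", "man", "inf"),
               nrow = 4,
               dimnames = list(NULL, c("rev_sit", "land", "nace")))

# Hypothetical timestamps standing in for the xts index (seconds since epoch)
stamps <- as.POSIXct(c(1297642226, 1297672737, 1297741204, 1297748893),
                     origin = "1970-01-01", tz = "UTC")

# Plain data frame: answers and region codes become integers,
# the sector code stays a character string
plain <- data.frame(
  time    = stamps,
  rev_sit = as.integer(vals[, "rev_sit"]),
  land    = as.integer(vals[, "land"]),
  nace    = vals[, "nace"],
  stringsAsFactors = FALSE
)
```

Because `plain` is an ordinary data frame, tapply() on it never has to build a zoo object, so the uniqueness check on the index should not come into play.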

I created example$index myself with the usual time-formatting commands
because I couldn't find any other way to get close to the results I want.
The result for variable 1 (for example) should look like this:

   .        nace.land  nace.land  nace.land  nace.land
10-1995            32         45         10          9
15-1995             2         47          5          6

where the row for 10-1995 gives the number of observations with answering
option 1 in the 10th calendar week of 1995 for the respective combination of
region (land) and sector (nace). I would really appreciate your help with
this!
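
A minimal base R sketch of the weekly counts described above, under the assumption that the timestamps are available as a POSIXct column. The data frame `survey` and its values are made up for illustration. The week key is written year first (`%Y-%U`), because a week-first key such as "10-1995" is sorted as text and so does not come out in time order:

```r
# Hypothetical toy data, loosely modelled on the example above
survey <- data.frame(
  time    = as.POSIXct(c("2011-02-14 01:10:26", "2011-02-14 09:38:57",
                         "2011-02-15 04:40:04", "2011-02-22 06:48:13",
                         "2011-02-23 06:58:33"), tz = "UTC"),
  rev_sit = c(1, 2, 1, 1, 1),
  land    = c(12, 5, 12, 12, 5),
  nace    = c("tra", "tra", "tra", "tra", "man"),
  stringsAsFactors = FALSE
)

# Year-first calendar-week key; sorting it as text matches time order
survey$week <- format(survey$time, "%Y-%U")

# One group (and later one column) per nace.land combination present in the data
survey$group <- interaction(survey$nace, survey$land, drop = TRUE)

# Count answers equal to 1 per week and group; NA answers are ignored
counts <- tapply(survey$rev_sit == 1,
                 list(survey$week, survey$group),
                 sum, na.rm = TRUE)

# Week/group cells without any observation come back as NA; treat them as 0
counts[is.na(counts)] <- 0
```

`counts` has one row per calendar week and one column per nace.land combination. Repeating the call with `== 2` and `== 3` would give the other answering options.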

Best regards, 

Andreas Karpf  



--
View this message in context: http://r.789695.n4.nabble.com/Splitting-up-large-set-of-survey-data-into-categories-tp4323327p4324340.html
Sent from the R help mailing list archive at Nabble.com.


