[R] Operating on count lists of non-equal lengths
Kari Manninen
kari at econadvisor.com
Sun Jan 9 07:19:51 CET 2011
This is my first post to R-help, and I look forward to receiving some
advice for a novice like me.
I've got a simple repeated 10-question survey (4 periods so far) that is
very easy to work with in Excel. However, I'd like to move the
compilation to R, but I'm having some trouble operating on count list
data in a neat way.
The data frame C:
> str(C)
'data.frame': 551 obs. of 13 variables:
$ TIME : int 1 1 1 1 1 1 1 1 1 1 ...
$ Sector : Factor w/ 6 levels "D","F","G","H",..: 1 1 1 1 1 1 1 1 1 1 ...
$ COMP : Factor w/ 196 levels " (_____ __ _____) ",..: 73 133 128 109 153 147 56 26 142 34 ...
$ Q1 : int 0 0 1 1 0 -1 -1 1 1 -1 ...
$ Q2 : int 0 0 0 -1 0 -1 0 0 1 -1 ...
$ Q3 : int 0 0 0 1 0 -1 -1 1 1 -1 ...
$ Q4 : int -1 0 0 0 0 -1 0 -1 0 -1 ...
$ Q5 : int 0 0 0 -1 0 -1 0 -1 0 0 ...
$ Q6 : int 0 0 0 1 0 -1 0 -1 0 0 ...
$ Q7 : int 0 1 1 0 0 0 1 0 1 1 ...
$ Q8 : int 0 0 0 0 0 -1 0 0 1 0 ...
$ Q9 : int 0 1 0 0 0 -1 0 -1 1 -1 ...
$ Q10 : int 0 0 0 0 -1 -1 0 -1 0 0 ...
> summary(C)
      TIME       Sector       COMP           Q1               Q2
 Min.   :1.000   D:130   A      :  4   Min.   :-1.000   Min.   :-1.0000
 1st Qu.:2.000   F:126   B      :  4   1st Qu.: 0.000   1st Qu.: 0.0000
 Median :3.000   G:158   C      :  4   Median : 1.000   Median : 0.0000
 Mean   :2.684   H: 26   D      :  4   Mean   : 0.446   Mean   : 0.2178
 3rd Qu.:4.000   I: 20   E      :  4   3rd Qu.: 1.000   3rd Qu.: 1.0000
 Max.   :4.000   J: 91   F      :  4   Max.   : 1.000   Max.   : 1.0000
                         (Other):527   NA's   :60.000   NA's   :69.0000
The aim is to produce balance scores from the shares of positive and
negative answers in the data. First, I count the -1, 0 and 1 answers
(negative, neutral, positive) and the missing values, NA (it would be so
much simpler without the missing values), for each question Q1-Q10, for
each period (TIME) in the 6 sectors:
library(plyr)  # assuming count() is plyr::count (returns x and freq columns)
b <- apply(C[, 4:13], 2, function(x) tapply(x, C[, 1:2], count))
As I understand it, b is a list with one element per question, and each
element is a 4x6 (TIME x Sector) matrix of lists in which every cell
holds a count data frame.
For example, for Question 1, Time period 2, Sector 1:
> str(b$Q1[2,1])
List of 1
$ :data.frame: 4 obs. of 2 variables:
..$ x : int [1:4] -1 0 1 NA
..$ freq : int [1:4] 3 9 12 2
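As a possible alternative to a list of count data frames, base R's table() can return the same counts as one ordinary array per question, indexed by TIME, Sector and answer. A minimal sketch, assuming a hypothetical toy data frame `toy` with the same column layout as C (only two questions shown):

```r
# Hypothetical toy data with the same layout as C (TIME, Sector, Q1, Q2)
toy <- data.frame(
  TIME   = c(1, 1, 1, 2, 2, 2),
  Sector = factor(c("D", "D", "F", "D", "F", "F")),
  Q1     = c(-1, 0, 1, 1, NA, 1),
  Q2     = c(0, 0, -1, 1, 1, NA)
)

# One TIME x Sector x answer array of counts per question.
# Fixing the levels to -1, 0, 1 keeps a zero count for answers that never
# occur in a cell; useNA = "ifany" adds an NA column when answers are missing.
counts <- lapply(toy[c("Q1", "Q2")], function(q)
  table(TIME = toy$TIME, Sector = toy$Sector,
        answer = factor(q, levels = c(-1, 0, 1)), useNA = "ifany"))

# Counts for Q1, TIME 2, Sector "F" (answers -1, 0, 1, NA)
counts$Q1["2", "F", ]
```

Because every cell of such an array has the same answer categories, counts can be summed or divided across questions with ordinary arithmetic instead of cell-by-cell list handling.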
Now I would like to group the questions (C[, 4:6], C[, 7], C[, 8:9],
C[, 10:11] and C[, 12:13], i.e. Q1-Q3, Q4, Q5-Q6, Q7-Q8 and Q9-Q10),
sum the counts of -1, 0 and 1 within each group, and present them as
percentages. I don't know how to do this efficiently for the whole data
set, and I would rather not go through each cell separately.
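A minimal base-R sketch of the grouping step, under stated assumptions: the toy data, the two-group mapping `group_of` and the column names `question`, `answer` and `group` are hypothetical, and percentages are taken over non-missing answers only. Reshaping to one row per answer turns the question grouping into a simple lookup, after which a single table() call gives the summed counts. For C itself, `group_of` would instead map Q1-Q10 onto the five groups.

```r
# Hypothetical toy data laid out like C: TIME, Sector, then Q1..Q4
toy <- data.frame(
  TIME   = c(1, 1, 2, 2),
  Sector = c("D", "F", "D", "F"),
  Q1 = c(1, 0, -1, 1),
  Q2 = c(1, NA, 0, 1),
  Q3 = c(-1, 1, 1, 0),
  Q4 = c(0, 1, NA, -1)
)

# Hypothetical grouping: Q1-Q2 form group "G1", Q3-Q4 form group "G2"
qcols <- paste0("Q", 1:4)
group_of <- c(Q1 = "G1", Q2 = "G1", Q3 = "G2", Q4 = "G2")

# Long format: one row per (respondent, question) answer
long <- data.frame(
  TIME     = rep(toy$TIME, times = length(qcols)),
  Sector   = rep(toy$Sector, times = length(qcols)),
  question = rep(qcols, each = nrow(toy)),
  answer   = unlist(toy[qcols], use.names = FALSE)
)
long$group <- unname(group_of[long$question])

# Summed counts of -1/0/1 per TIME x Sector x group; table() drops NAs here
cnt <- table(TIME = long$TIME, Sector = long$Sector, group = long$group,
             answer = factor(long$answer, levels = c(-1, 0, 1)))

# Percentages within each TIME x Sector x group cell (non-missing answers only)
pct <- 100 * prop.table(cnt, margin = 1:3)
pct["2", "D", "G1", ]
```

A cell with no non-missing answers would give NaN percentages, since its total is zero.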
Then I'd give each group a balance score, something like
Score = 100 + 100*(pos - neg), where pos and neg are the shares (as
fractions) of positive (1) and negative (-1) answers, computed for each
group by TIME and Sector while excluding the missing observations.
### This is not working (pseudo-code of what I want):
Score <- 100 + 100*(sum(count == "1")/sum(count(c("-1", "0", "1")))
                    - sum(count == "-1")/sum(count(c("-1", "0", "1"))))
## for each of the 5 groups defined above, by TIME and Sector
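One possible working version of the score, as a sketch: it assumes the answers have already been reshaped to one row per answer with a group column (the data frame `long` below is hypothetical toy data) and uses a hypothetical helper `balance_score()` with aggregate() from base R. Missing answers are dropped before the shares are computed.

```r
# Hypothetical long-format data: one row per answer, with its question group
long <- data.frame(
  TIME   = c(1, 1, 1, 1, 2, 2, 2),
  Sector = c("D", "D", "D", "F", "D", "D", "F"),
  group  = "G1",
  answer = c(1, 1, -1, NA, 0, -1, 1)
)

# Hypothetical helper: Score = 100 + 100 * (pos - neg), where pos and neg
# are the shares of 1s and -1s among the non-missing answers
balance_score <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  100 + 100 * (mean(x == 1) - mean(x == -1))
}

# One score per TIME x Sector x group; na.pass hands the NAs to
# balance_score() so that cells with only missing answers still appear
scores <- aggregate(answer ~ TIME + Sector + group, data = long,
                    FUN = balance_score, na.action = na.pass)
names(scores)[names(scores) == "answer"] <- "score"
scores

# Using the Q1, TIME 2, Sector 1 counts from the str() output
# (3 negative, 9 neutral, 12 positive, 2 missing)
balance_score(rep(c(-1, 0, 1, NA), c(3, 9, 12, 2)))  # 137.5
```

Because the answers are coded -1/0/1, pos - neg equals the mean of the non-missing answers, so 100 + 100 * mean(x, na.rm = TRUE) gives the same score (NaN rather than NA for a cell with no answers).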
I would greatly appreciate your help on this.
Regards,
- Kari Manninen