[R] creating a scale (factor) based on a continuous variable nested within levels of factor
hind lazrak
hindstata at gmail.com
Sun Nov 7 07:15:24 CET 2010
Hello R-helpers,
I hope that my subject line is not deterring anyone from helping me out :)
I have been stuck for a few hours now, and I can't seem to pinpoint
where the problem is.
I have a data.frame which is structured as follows:
str(hDatPretty)
'data.frame': 1665 obs. of 8 variables:
$ time : num 0 1.02 2.05 3.07 4.09 ...
$ hr : num 62.4 63.6 64.6 65.5 66.2 ...
$ emg : num 3.3 3.42 3.52 3.57 3.6 ...
$ respRate: num 50.4 50.6 50.7 50.8 50.9 ...
$ scr : num 1.7 1.72 1.73 1.74 1.75 ...
$ skinTemp: num 28.1 28.2 28.2 28.2 28.2 ...
$ rating : num 4 4 4 4 4 4 4 4 4 4 ...
$ songId : Factor w/ 37 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
It consists of ratings ($rating) given by people (here the id variable
is not indicated as this is a subset with only one person) for each of
the 37 songs ($songId) they listen to.
While they are listening we measure physiological responses (emg,
hr,...) every second over a period of 45 seconds.
Here's a quick peek at the data:
head(hDatPretty)
time hr emg respRate scr skinTemp rating songId
1.1 0.000000 62.42135 3.300562 50.40538 1.703105 28.14489 4 1
1.2 1.022727 63.59057 3.424884 50.59292 1.718110 28.16189 4 1
1.3 2.045455 64.59840 3.515219 50.73523 1.730594 28.17836 4 1
1.4 3.068182 65.47707 3.573151 50.83909 1.740594 28.19422 4 1
1.5 4.090909 66.22192 3.597183 50.90466 1.748086 28.20948 4 1
1.6 5.113636 66.89209 3.588530 50.91911 1.753385 28.22414 4 1
So, every study participant gives one rating (from -10 to 10) for each song.
If we tabulate the data, this is what we have (for the first 10 songs):
table(hDatPretty$songId, hDatPretty$rating)
   -10 -9 -7 -3  0  1  3  4  5  7  8  9 10
1    0  0  0  0  0  0  0 45  0  0  0  0  0   # song 1 gets a score of 4
2    0  0  0  0  0  0 45  0  0  0  0  0  0   # song 2 gets a score of 3
3    0  0 45  0  0  0  0  0  0  0  0  0  0   # ...
4    0 45  0  0  0  0  0  0  0  0  0  0  0
5    0  0  0  0  0  0  0  0  0 45  0  0  0
6    0  0  0  0  0  0  0  0  0  0  0  0 45
7    0  0  0  0  0  0  0  0  0  0 45  0  0   # song 7 gets a score of 8
8    0  0  0 45  0  0  0  0  0  0  0  0  0
9    0  0  0  0  0  0  0 45  0  0  0  0  0
10   0  0  0  0  0 45  0  0  0  0  0  0  0
What I would like to do is to create another scale (a factor) based
on the ratings, with the following levels:
-10;-4 == dislike where -4 is included
-4;4 == neutral where -4 is excluded
4;10 == like where 4 is excluded
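To make the intended boundaries concrete, here is a minimal sketch on made-up ratings (not the real data). It assumes that cut(), with its default right-closed intervals plus include.lowest = TRUE, expresses the three ranges above; the vector name rating is used only for illustration.

```r
# Made-up ratings covering both boundaries (-4 and 4); not the real data.
rating <- c(-10, -4, -3, 0, 4, 5, 10)

# right = TRUE (the default) closes each interval on the right:
#   [-10, -4] -> dislike, (-4, 4] -> neutral, (4, 10] -> like
# include.lowest = TRUE keeps -10 itself inside the first interval.
liking <- cut(rating,
              breaks = c(-10, -4, 4, 10),
              labels = c("dislike", "neutral", "like"),
              include.lowest = TRUE)

# Show each rating next to the level it was assigned.
print(data.frame(rating, liking))
```

With these breaks, -4 lands in 'dislike' and 4 lands in 'neutral', which is the behaviour described in the list above.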
My code to obtain this new variable:
liking <- numeric(length(hDatPretty$rating))
liking[hDatPretty$rating <= -4] <- 'dislike'
liking[hDatPretty$rating > -4 & hDatPretty$rating <= 4] <- 'neutral'
liking[hDatPretty$rating > 4] <- 'like'
hDatPretty['liking']<- factor(liking)
The problem is that, for some reason, it assigns different 'liking'
values to the same rating for some songs, but not for all of them (?)
See for example:
   dislike like neutral
1        0    8      37   ## here is one problem: song 1 gets two 'liking' scores while the rating is constant
2        0    0      45
3       45    0       0
4       45    0       0
5        0   45       0
6        0   45       0
7        0   45       0
8        0    0      45
9        0   10      35   ## here is a similar problem
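One thing that could be checked here (an assumption, nothing in the output above confirms it) is whether the stored ratings are really identical within a song, or only look identical when printed. Both affected songs have a rating that displays as 4, which sits exactly on a boundary. The sketch below uses an invented data frame toy in place of hDatPretty, with a deliberately planted deviation of about 1e-15.

```r
# Toy stand-in for hDatPretty; the values just above 4 are invented to
# mimic ratings that differ from 4 only far beyond the displayed digits.
toy <- data.frame(
  songId = factor(rep(c("1", "2"), each = 4)),
  rating = c(4, 4, 4 + 1e-15, 4 + 1e-15, 3, 3, 3, 3)
)

# Number of distinct stored ratings per song; anything above 1 means the
# rating is not truly constant within that song.
nDistinct <- tapply(toy$rating, toy$songId, function(z) length(unique(z)))
print(nDistinct)

# Largest deviation from the nearest whole number, per song.
maxDev <- tapply(toy$rating, toy$songId, function(z) max(abs(z - round(z))))
print(maxDev)

# If the ratings are meant to be whole numbers, rounding before recoding
# removes the ambiguity at the boundaries.
toy$ratingClean <- round(toy$rating)
```

If the same check on the real data reports more than one distinct value for songs 1 and 9, a comparison such as rating > 4 would split those songs exactly as in the table above.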
Could you PLEASE help me with the proper code to obtain my 'liking'
variable for each song, based on the rating each song gets?
Many thanks.
Hind
P.S.: I have also tried cut(), as follows, unsuccessfully:
hDatPretty$liking <- by(hDatPretty$rating, hDatPretty$songId,
function (z) { cut(hDatPretty$z, c(-10, -4,4,10),
labels=c('dislike', 'neutral', 'like'))})
Error in cut.default(hDatPretty$z, c(-10, -4, 4, 10), labels = c("dislike", :
'x' must be numeric
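A likely reading of this error, sketched on an invented data frame toy (not the real data): inside the function, z is already the subset of ratings, whereas hDatPretty$z asks for a column literally named "z", which does not exist and therefore yields NULL. Since cut() is vectorised, the sketch also assumes that by() is not needed at all.

```r
# Toy stand-in for hDatPretty (values invented for illustration).
toy <- data.frame(
  songId = factor(rep(c("1", "2", "3"), each = 2)),
  rating = c(4, 4, -7, -7, 9, 9)
)

# `$z` looks for a column named "z"; there is none, so the result is NULL,
# and NULL is not numeric.
print(is.null(toy$z))

# cut() works on the whole column in one call, no per-song loop required.
toy$liking <- cut(toy$rating,
                  breaks = c(-10, -4, 4, 10),
                  labels = c("dislike", "neutral", "like"),
                  include.lowest = TRUE)

# Each song should now fall under exactly one liking level.
print(table(toy$songId, toy$liking))
```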
Again, thank you.