[R] creating a scale (factor) based on a continuous variable nested within levels of factor

hind lazrak hindstata at gmail.com
Sun Nov 7 07:15:24 CET 2010


Hello R-helpers


I hope that my subject line is not deterring anyone from helping me out :)
I have been stuck for a few hours now, and I can't seem to pinpoint
where the problem is.


I have a data.frame which is structured as follows:
str(hDatPretty)
'data.frame': 1665 obs. of  8 variables:
$ time    : num  0 1.02 2.05 3.07 4.09 ...
$ hr      : num  62.4 63.6 64.6 65.5 66.2 ...
$ emg     : num  3.3 3.42 3.52 3.57 3.6 ...
$ respRate: num  50.4 50.6 50.7 50.8 50.9 ...
$ scr     : num  1.7 1.72 1.73 1.74 1.75 ...
$ skinTemp: num  28.1 28.2 28.2 28.2 28.2 ...
$ rating  : num  4 4 4 4 4 4 4 4 4 4 ...
$ songId  : Factor w/ 37 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...

It consists of ratings ($rating) given by people (here the id variable
is not indicated as this is a subset with only one person) for each of
the 37 songs ($songId) they listen to.
While they are listening we measure physiological responses (emg,
hr,...) every second over a period of 45 seconds.
Here's a quick peek at the data
head(hDatPretty)

        time       hr      emg respRate      scr skinTemp rating songId
1.1 0.000000 62.42135 3.300562 50.40538 1.703105 28.14489      4      1
1.2 1.022727 63.59057 3.424884 50.59292 1.718110 28.16189      4      1
1.3 2.045455 64.59840 3.515219 50.73523 1.730594 28.17836      4      1
1.4 3.068182 65.47707 3.573151 50.83909 1.740594 28.19422      4      1
1.5 4.090909 66.22192 3.597183 50.90466 1.748086 28.20948      4      1
1.6 5.113636 66.89209 3.588530 50.91911 1.753385 28.22414      4      1

So, every study participant gives one rating (from -10 to 10) for each song.
If we tabulate the data, this is what we have (for the first 10 songs):
table(hDatPretty$songId, hDatPretty$rating)


    -10 -9 -7 -3  0  1  3  4  5  7  8  9 10
 1    0  0  0  0  0  0  0 45  0  0  0  0  0  # song 1 gets a score of 4
 2    0  0  0  0  0  0 45  0  0  0  0  0  0  # song 2 gets a score of 3
 3    0  0 45  0  0  0  0  0  0  0  0  0  0  #.
 4    0 45  0  0  0  0  0  0  0  0  0  0  0
 5    0  0  0  0  0  0  0  0  0 45  0  0  0
 6    0  0  0  0  0  0  0  0  0  0  0  0 45
 7    0  0  0  0  0  0  0  0  0  0 45  0  0  #song 7 gets a score of 8
 8    0  0  0 45  0  0  0  0  0  0  0  0  0
 9    0  0  0  0  0  0  0 45  0  0  0  0  0
 10   0  0  0  0  0 45  0  0  0  0  0  0  0

What I would like to do is create another scale (a factor) based
on the ratings, with the following levels:
-10;-4 == dislike, where -4 is included
-4;4 == neutral, where -4 is excluded and 4 is included
4;10 == like, where 4 is excluded

My code to obtain this new variable

liking <- numeric(length(hDatPretty$rating))
liking[hDatPretty$rating <= -4] <- 'dislike'
liking[hDatPretty$rating > -4 & hDatPretty$rating <= 4] <- 'neutral'
liking[hDatPretty$rating > 4] <- 'like'

hDatPretty['liking']<- factor(liking)
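
For comparison, the same three intervals can also be written with cut(), which is vectorized and returns a factor directly. This is only a sketch on a made-up vector (toyRating and toyLiking are hypothetical names, not part of hDatPretty); include.lowest = TRUE is there so that -10 itself falls into 'dislike'.

```r
# Hypothetical ratings covering the interval boundaries
toyRating <- c(-10, -9, -4, -3, 0, 4, 5, 10)

# right = TRUE (the default) gives intervals of the form (a, b];
# include.lowest = TRUE closes the first one, so it becomes [-10, -4]
toyLiking <- cut(toyRating,
                 breaks = c(-10, -4, 4, 10),
                 labels = c("dislike", "neutral", "like"),
                 include.lowest = TRUE)

# Side-by-side view of each rating and the class it was assigned
data.frame(toyRating, toyLiking)
```

With these settings -4 should land in 'dislike' and 4 in 'neutral', matching the levels described above.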

The problem is that, for some reason, it assigns different values
to the same rating for some songs but not all (?).
See for example:

  dislike like neutral
1        0    8      37   ## here is one problem: song #1 gets two 'liking' scores while the rating is constant
2        0    0      45
3       45    0       0
4       45    0       0
5        0   45       0
6        0   45       0
7        0   45       0
8        0    0      45
9        0   10      35  ## here is a similar problem
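
One possible explanation, offered only as a guess from the output above: the time stamps advance in steps of about 1.02 s, which suggests the series were resampled, and if rating went through the same step, some values may differ from 4 by a tiny floating-point amount. table() labels numeric values through as.character(), which keeps about 15 significant digits, so such values could still be counted under "4" while comparing as > 4. The sketch below reproduces that effect on a made-up vector (toyRating and cleanRating are hypothetical names) and shows rounding as one way to check for it.

```r
# Hypothetical ratings: three exact 4s and one that is off by rounding error
toyRating <- c(4, 4, 4, 4 + 1e-15)

# Printing hides the difference, but the comparison does not
print(toyRating)
toyRating > 4

# Largest distance from the nearest whole number; anything non-zero is suspicious
max(abs(toyRating - round(toyRating)))

# Rounding first makes every row of the song fall into the same class
cleanRating <- round(toyRating)
cleanRating > 4
```

If the same check on hDatPretty$rating returns exactly 0, this guess does not apply and the cause lies elsewhere.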

Could you PLEASE help me with the proper code to obtain my 'liking'
variable for each song, based on the rating each song gets?

Many thanks.


Hind
P.S.: I have also tried cut(), as follows, without success:

hDatPretty$liking <- by(hDatPretty$rating, hDatPretty$songId,
   function (z) { cut(hDatPretty$z, c(-10, -4,4,10),
   labels=c('dislike', 'neutral', 'like'))})

Error in cut.default(hDatPretty$z, c(-10, -4, 4, 10), labels = c("dislike",  :
 'x' must be numeric

again thank you.


