[R] set dataframe field value from lookup table

Jon Erik Ween jween at klaru-baycrest.on.ca
Thu Dec 9 19:10:10 CET 2010


Just for clarification: This all has to do with the Wechsler Memory Scale revised (WMS-R). The task is digit span forward and backwards (digit span total). A raw score is recorded based on subject performance and translated into an age-adjusted standard score in one table and the standard score is then translated into a z-score in a second table. My DSTz table combines these two steps, going from the raw score to the tabulated age-adjusted z-score that was calculated based on a normative sample by the WMS-R folks.

Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit 
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
    University of Toronto Faculty of Medicine

Kimel Family Building, 6th Floor, Room 644 
Baycrest Centre
3560 Bathurst Street 
Toronto, Ontario M6A 2E1
Canada 

Phone: 416-785-2500 x3648
Fax: 416-785-2484
Email: jween at klaru-baycrest.on.ca


Confidential: This communication and any attachment(s) may contain confidential or privileged information and is intended solely for the address(es) or the entity representing the recipient(s). If you have received this information in error, you are hereby advised to destroy the document and any attachment(s), make no copies of same and inform the sender immediately of the error. Any unauthorized use or disclosure of this information is strictly prohibited.



On 2010-12-09, at 12:53 PM, William Dunlap wrote:

>> -----Original Message-----
>> From: r-help-bounces at r-project.org 
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Jon Erik Ween
>> Sent: Thursday, December 09, 2010 8:27 AM
>> To: David Winsemius
>> Cc: r-help at r-project.org
>> Subject: Re: [R] set dataframe field value from lookup table
>> 
>> Sorry, I should have included the error I get when using the 
>> initial vesion of step 2):
>> 
>> Error in `$<-.data.frame`(`*tmp*`, "DSTz", value = list(Age7 
>> = c(-1.55,  : 
>>  replacement has 20 rows, data has 955
>> In addition: Warning message:
>> In DSTzlook[, 1] == df$DSF + df$DSB :
>>  longer object length is not a multiple of shorter object length
>> 
>> So, regardless of how you calculate [r,c], the step
>> 
>> df$DSTz<-DSTzlook[r,c]
> 
> You probably want to use [cbind(r,c)], where r
> and c are vectors of row and column numbers.
> 
> Supplying an example that helpers could copy and
> paste into an R session would really help.  E.g.,
> instead of showing the usual printout of the table
> of zscores, show the output of dput(thatTable) or
> the command you used to build it.  Here is my
> guess, given your printout
> 
>  ZScoreTable <- matrix(byrow=TRUE,
>     c( 2.6, 2.6, 2.6, 2.6, 2.6, 2.6,
>        1.8, 1.8, 1.8, 2.0, 2.6, 2.6,
>        1.0, 1.0, 1.8, 1.8, 2.6, 2.6,
>        0.0, 0.5, 1.0, 1.8, 2.6, 2.6,
>        -.5, 0.0, 0.0, 1.0, 1.8, 2.6),
>     nrow=5,
>     ncol=6,
>     dimnames = list(
>        StdScore=c("30", "29", "28", "27", "26"),
>        AgeClass=c("17", "19", "24", "29", "34", "44")
>     )
>  )
> 
> Your structure may be different, but given that
> that is your table of that encodes the mapping of
> the order pair (StdScore,AgeClass) to a z-score
> here is some code to do the mapping:
> 
>  ZScore <- function(age, stdScore) {
>     AgeToColumnNumber <- function(age,
>        ageClassBottoms=as.numeric(colnames(ZScoreTable)))
>     {
>        retval <- findInterval(age, c(ageClassBottoms, Inf))
>        retval[retval==0] <- NA
>        retval
>     }
> 
>     StdScoreToRowNumber <- function(stdScore,
>        knownScores = as.numeric(rownames(ZScoreTable)))
>     {
>        match(stdScore, knownScores)
>     }
> 
>     ZScoreTable[cbind(StdScoreToRowNumber(stdScore),
> AgeToColumnNumber(age))]
>  }
> 
> where a typical usage would be
> 
>> ZScore(age=c(29,44, 10), stdScore=c(28,29,30))
>  [1] 1.8 2.6  NA
> 
> (Age 10 is not in the table so it gets an NA for a z-score).
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com 
> 
>> doesn't work. I've tried various permutations with "apply", 
>> but that didn't work either. Any suggestions?
>> 
>> Jon
>> 
>> Soli Deo Gloria
>> 
>> Jon Erik Ween, MD, MS
>> Scientist, Kunin-Lunenfeld Applied Research Unit 
>> Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
>> Assistant Professor, Dept. of Medicine, Div. of Neurology
>>    University of Toronto Faculty of Medicine
>> 
>> Kimel Family Building, 6th Floor, Room 644 
>> Baycrest Centre
>> 3560 Bathurst Street 
>> Toronto, Ontario M6A 2E1
>> Canada 
>> 
>> Phone: 416-785-2500 x3648
>> Fax: 416-785-2484
>> Email: jween at klaru-baycrest.on.ca
>> 
>> 
>> Confidential: This communication and any attachment(s) may 
>> contain confidential or privileged information and is 
>> intended solely for the address(es) or the entity 
>> representing the recipient(s). If you have received this 
>> information in error, you are hereby advised to destroy the 
>> document and any attachment(s), make no copies of same and 
>> inform the sender immediately of the error. Any unauthorized 
>> use or disclosure of this information is strictly prohibited.
>> 
>> 
>> 
>> On 2010-12-09, at 11:06 AM, David Winsemius wrote:
>> 
>>> 
>>> On Dec 9, 2010, at 10:51 AM, Jon Erik Ween wrote:
>>> 
>>>> Thanks David
>>>> 
>>>> What I am trying to do is set up a script that assigns 
>> z-scores to a large dataframe (2500x300, but has Age in years 
>> and test scores as columns.) from a published table of 
>> age-corrected standard scores on this cognitive test.
>>>> 
>>>> 1) The age intervals in the lookup table are given and not 
>> my choice.
>>> 
>>> You may want to skip the intermediate translation to the 
>> row and column labels and just use the results of findInterval:
>>> 
>>>> findInterval( 16, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 
>> 69, 74, 79, 84, 89) )
>>> [1] 1
>>>> findInterval( 90, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 
>> 69, 74, 79, 84, 89) )
>>> [1] 14
>>> 
>>> Those look like appropriate indices for the column argument
>>>> 
>>>> 2) Sorry I didn't post an example table, it looks 
>> something like this ("Age" is in the first row, standard 
>> scores in the first column):
>>>> 
>>>>      17   19   24   29   34   44 ....
>>>> 30   2.6  2.6  2.6  2.6  2.6  2.6
>>>> 29  1.8  1.8  1.8  2.0  2.6  2.6
>>>> 28  1.0  1.0  1.8  1.8  2.6  2.6
>>>> 27  0.0  0.5  1.0  1.8  2.6  2.6
>>>> 26   -.5  0.0  0.0  1.0  1.8  2.6
>>>> .
>>>> .
>>>> .
>>>> .
>>>> 
>>>> So, if a subject (row) has age==29 and a standard score of 
>> 28, the value should be 1.8, etc.
>>> 
>>> Looks like a job for two findInterval indices to be used 
>> used with "[ r , c ] ".
>>> 
>>> -- 
>>> David.
>>> 
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> Jon
>>>> 
>>>> Soli Deo Gloria
>>>> 
>>>> Jon Erik Ween, MD, MS
>>>> Scientist, Kunin-Lunenfeld Applied Research Unit
>>>> Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
>>>> Assistant Professor, Dept. of Medicine, Div. of Neurology
>>>>  University of Toronto Faculty of Medicine
>>>> 
>>>> Kimel Family Building, 6th Floor, Room 644
>>>> Baycrest Centre
>>>> 3560 Bathurst Street
>>>> Toronto, Ontario M6A 2E1
>>>> Canada
>>>> 
>>>> Phone: 416-785-2500 x3648
>>>> Fax: 416-785-2484
>>>> Email: jween at klaru-baycrest.on.ca
>>>> 
>>>> 
>>>> Confidential: This communication and any attachment(s) may 
>> contain confidential or privileged information and is 
>> intended solely for the address(es) or the entity 
>> representing the recipient(s). If you have received this 
>> information in error, you are hereby advised to destroy the 
>> document and any attachment(s), make no copies of same and 
>> inform the sender immediately of the error. Any unauthorized 
>> use or disclosure of this information is strictly prohibited.
>>>> 
>>>> 
>>>> 
>>>> On 2010-12-09, at 10:33 AM, David Winsemius wrote:
>>>> 
>>>>> 
>>>>> On Dec 9, 2010, at 9:34 AM, Jon Erik Ween wrote:
>>>>> 
>>>>>> 
>>>>>> Hi
>>>>>> 
>>>>>> This is (hopefully) a bit more cogent phrasing of a 
>> previous post. I'm
>>>>>> trying to compute a z-score to rows in a large dataframe 
>> based on values in
>>>>>> another dataframe. Here's the script (that does not 
>> work). 2 questons,
>>>>>> 
>>>>>> 1) Anyone know of a more elegant way to calculate the 
>> "rounded" age value
>>>>>> than the nested ifelse's I've used?
>>>>>> 
>>>>>> 2) how to reference the lookup table based on computed indices?
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> Jon
>>>>>> 
>>>>>> # Define tables
>>>>>> DSTzlook <-
>>>>>> 
>> read.table("/Users/jween/Documents/ResearchProjects/ABC/data/D
>> STz.txt",
>>>>>> header=TRUE, sep="\t", na.strings="NA", dec=".", 
>> strip.white=TRUE)
>>>>>> df<-stroke
>>>>>> 
>>>>>> # Compute rounded age.
>>>>>> df$Agetmp
>>>>>> 
>> <-ifelse(df$Age>=89,89,ifelse(df$Age>=84,84,ifelse(df$Age>=79,
> 79,ifelse(df$Age>=74,74,ifelse(df$Age>=69,69,ifelse(df$Age>=>
> 64,64,ifelse(df$Age>=54,54,ifelse(df$Age>=44,44,ifelse(df$Age>
>> =34,34,ifelse(df$Age>=29,29,ifelse(df$Age>=24,24,ifelse(df$Age
>> =19,19,17))))))))))))
>>>>> 
>>>>> Ew, painful. If you want categorized ages (since what the 
>> above coding is producing is not "rounded" in any sense of 
>> that word as I understand it, then why not findInterval() as 
>> an index into the ages you wnat to label these case with?
>>>>> 
>>>>> df$Agetmp <- c(17,19,24,29,34,44,54,64,69,74,79,84)[  # 
>> note Extract operation
>>>>>          findInterval(runif(100,0,100), 
>> c(17,19,24,29,34,44,54,64,69,74,79,84,110) )
>>>>>         ]  # close extraction
>>>>> 
>>>>> 
>>>>> The other option, of course, and a more "honest" one in 
>> this instance would be
>>>>> 
>>>>> cut(vec, breaks=c(...), labels=c(...) )
>>>>> 
>>>>> (It's not clear why you are not picking midpoint ages 
>> within those brackets to me.)
>>>>> 
>>>>>> 
>>>>>> # Reference the lookup table based on computed indices
>>>>>> df$DSTz
>>>>>> 
>> <-DSTzlook[which(DSTzlook[,1]==df$Agetmp),which(DSTzlook[1,]==
> df$DSF+df$DSB)]
>>>>> 
>>>>> I have not been able to figure out what you are trying to 
>> do here. Trying to use a 2d lookup looks promising a a way to 
>> emulate what an Excel user might attempt, but an example (as 
>> requested in the message at the bottom of every posting) 
>> would really be of great help in making this more concrete 
>> for those of us with insufficient abstractive abilities.
>>>>> 
>>>>> -- 
>>>>> David.
>>>>> 
>>>>>> 
>>>>>> # Cleanup
>>>>>> #rm(df)
>>>>>> #df$Agetmp<-NULL
>>>>>> -- 
>>>>>> View this message in context: 
>> http://r.789695.n4.nabble.com/set-dataframe-field-value-from-l
> ookup-table-tp3080245p3080245.html
>>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>> 
>>>>> 
>>>>> 
>>>>> David Winsemius, MD
>>>>> West Hartford, CT
>>>>> 
>>>> 
>>> 
>>> David Winsemius, MD
>>> West Hartford, CT
>>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 



More information about the R-help mailing list