[R] set dataframe field value from lookup table
David Winsemius
dwinsemius at comcast.net
Thu Dec 9 17:58:09 CET 2010
On Dec 9, 2010, at 11:27 AM, Jon Erik Ween wrote:
> Sorry, I should have included the error I get when using the initial
> version of step 2):
>
> Error in `$<-.data.frame`(`*tmp*`, "DSTz", value = list(Age7 =
> c(-1.55, :
> replacement has 20 rows, data has 955
> In addition: Warning message:
> In DSTzlook[, 1] == df$DSF + df$DSB :
> longer object length is not a multiple of shorter object length
>
> So, regardless of how you calculate [r,c], the step
>
> df$DSTz<-DSTzlook[r,c]
It's possible that mapply would offer a mechanism. The devil is in the
details.
>
> doesn't work. I've tried various permutations with "apply", but that
> didn't work either. Any suggestions?
I have yet to see a fully responsive reply to the request for a
reproducible example. That is my suggestion at this point.
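
For readers following the thread, here is a minimal sketch of why indexing with two vectors as [r, c] returns a sub-table rather than one value per subject, and of two element-wise alternatives: matrix indexing with cbind(), and the mapply approach mentioned above. The table values and the names lookup, r and c_idx are invented for illustration; they are not the poster's data.

```r
# Toy lookup table (values invented; only the layout follows the thread).
lookup <- matrix(
  c(2.6, 2.6, 2.6,
    1.8, 1.8, 1.8,
    1.0, 1.0, 1.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("30", "29", "28"), c("17", "19", "24"))
)

# Hypothetical per-subject row and column indices (one pair per subject).
r     <- c(1, 3, 2, 3)
c_idx <- c(1, 3, 2, 1)

# Two index vectors cross every row with every column: a 4 x 4 result,
# not a vector of length 4.
crossed <- lookup[r, c_idx]

# A two-column index matrix picks exactly one cell per (row, column) pair.
z <- lookup[cbind(r, c_idx)]

# mapply does the same thing one pair at a time.
z2 <- mapply(function(i, j) lookup[i, j], r, c_idx)
```

If the lookup table is a data frame, as it would be after read.table, converting it with as.matrix() first is one way to make the cbind() form apply.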
--
David.
>
> Jon
>
> Soli Deo Gloria
>
> Jon Erik Ween, MD, MS
> Scientist, Kunin-Lunenfeld Applied Research Unit
> Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
> Assistant Professor, Dept. of Medicine, Div. of Neurology
> University of Toronto Faculty of Medicine
>
> Kimel Family Building, 6th Floor, Room 644
> Baycrest Centre
> 3560 Bathurst Street
> Toronto, Ontario M6A 2E1
> Canada
>
> Phone: 416-785-2500 x3648
> Fax: 416-785-2484
> Email: jween at klaru-baycrest.on.ca
>
>
> Confidential: This communication and any attachment(s) may contain
> confidential or privileged information and is intended solely for
> the address(es) or the entity representing the recipient(s). If you
> have received this information in error, you are hereby advised to
> destroy the document and any attachment(s), make no copies of same
> and inform the sender immediately of the error. Any unauthorized use
> or disclosure of this information is strictly prohibited.
>
>
>
> On 2010-12-09, at 11:06 AM, David Winsemius wrote:
>
>>
>> On Dec 9, 2010, at 10:51 AM, Jon Erik Ween wrote:
>>
>>> Thanks David
>>>
>>> What I am trying to do is set up a script that assigns z-scores to
>>> a large dataframe (2500x300, with Age in years and test scores
>>> as columns) from a published table of age-corrected standard
>>> scores on this cognitive test.
>>>
>>> 1) The age intervals in the lookup table are given and not my
>>> choice.
>>
>> You may want to skip the intermediate translation to the row and
>> column labels and just use the results of findInterval:
>>
>>> findInterval( 16, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,
>>> 84, 89) )
>> [1] 1
>>> findInterval( 90, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,
>>> 84, 89) )
>> [1] 14
>>
>> Those look like appropriate indices for the column argument
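
A short sketch of the same call applied to a whole vector of ages at once, since findInterval is vectorized over its first argument. The break points are the ones quoted above; the ages vector is made up. Whether the resulting indices line up with the columns of the real table depends on how that table is laid out, which is an assumption here.

```r
# Break points as quoted in the thread; they must be sorted in
# non-decreasing order for findInterval.
breaks <- c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 89)

# Hypothetical ages, one per subject.
ages <- c(16, 18, 29, 63, 90)

# One interval index per age; intervals are closed on the left by default,
# so an age of exactly 29 falls in the interval that starts at 29.
idx <- findInterval(ages, breaks)
idx
```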
>>>
>>> 2) Sorry I didn't post an example table, it looks something like
>>> this ("Age" is in the first row, standard scores in the first
>>> column):
>>>
>>>       17   19   24   29   34   44  ....
>>> 30   2.6  2.6  2.6  2.6  2.6  2.6
>>> 29   1.8  1.8  1.8  2.0  2.6  2.6
>>> 28   1.0  1.0  1.8  1.8  2.6  2.6
>>> 27   0.0  0.5  1.0  1.8  2.6  2.6
>>> 26  -0.5  0.0  0.0  1.0  1.8  2.6
>>> .
>>> .
>>> .
>>> .
>>>
>>> So, if a subject (row) has age==29 and a standard score of 28, the
>>> value should be 1.8, etc.
>>
>> Looks like a job for two findInterval indices to be used with
>> "[ r , c ] ".
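
A sketch of how the two indices might be derived for the table excerpt quoted above. It rests on several assumptions: only the rows and columns shown in the excerpt are included; the column headers are treated as lower bounds of the age brackets, with anything younger than 19 mapped to the first column, following the ifelse chain elsewhere in this thread; and because the scores are listed in descending order, and findInterval needs ascending break points, the row is found with match() on the exact score instead. The subjects are made up.

```r
# Table excerpt from the thread (score in rows, age bracket in columns).
age_hdr <- c(17, 19, 24, 29, 34, 44)
scores  <- c(30, 29, 28, 27, 26)
tab <- matrix(
  c( 2.6, 2.6, 2.6, 2.6, 2.6, 2.6,
     1.8, 1.8, 1.8, 2.0, 2.6, 2.6,
     1.0, 1.0, 1.8, 1.8, 2.6, 2.6,
     0.0, 0.5, 1.0, 1.8, 2.6, 2.6,
    -0.5, 0.0, 0.0, 1.0, 1.8, 2.6),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(scores), as.character(age_hdr))
)

# Hypothetical subjects.
age   <- c(29, 18, 40)
score <- c(28, 30, 26)

# Column index: the bracket whose lower bound is <= age; ages below the
# first header would give 0, so pmax() maps them to column 1.
col_idx <- pmax(1L, findInterval(age, age_hdr))

# Row index: exact match on the score (NA if the score is not in the table).
row_idx <- match(score, scores)

# One value per subject via a two-column index matrix.
z <- tab[cbind(row_idx, col_idx)]
z
```

The first subject (age 29, score 28) comes out as 1.8, which agrees with the expected value stated in the quoted message.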
>>
>> --
>> David.
>>
>>>
>>> Thanks
>>>
>>>
>>> Jon
>>>
>>> On 2010-12-09, at 10:33 AM, David Winsemius wrote:
>>>
>>>>
>>>> On Dec 9, 2010, at 9:34 AM, Jon Erik Ween wrote:
>>>>
>>>>>
>>>>> Hi
>>>>>
>>>>> This is (hopefully) a bit more cogent phrasing of a previous
>>>>> post. I'm
>>>>> trying to compute a z-score to rows in a large dataframe based
>>>>> on values in
>>>>> another dataframe. Here's the script (that does not work). 2
>>>>> questions,
>>>>>
>>>>> 1) Anyone know of a more elegant way to calculate the "rounded"
>>>>> age value
>>>>> than the nested ifelse's I've used?
>>>>>
>>>>> 2) how to reference the lookup table based on computed indices?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Jon
>>>>>
>>>>> # Define tables
>>>>> DSTzlook <- read.table(
>>>>>   "/Users/jween/Documents/ResearchProjects/ABC/data/DSTz.txt",
>>>>>   header=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)
>>>>> df<-stroke
>>>>>
>>>>> # Compute rounded age.
>>>>> df$Agetmp <- ifelse(df$Age>=89,89,ifelse(df$Age>=84,84,
>>>>>   ifelse(df$Age>=79,79,ifelse(df$Age>=74,74,ifelse(df$Age>=69,69,
>>>>>   ifelse(df$Age>=64,64,ifelse(df$Age>=54,54,ifelse(df$Age>=44,44,
>>>>>   ifelse(df$Age>=34,34,ifelse(df$Age>=29,29,ifelse(df$Age>=24,24,
>>>>>   ifelse(df$Age>=19,19,17))))))))))))
>>>>
>>>> Ew, painful. If you want categorized ages (since what the above
>>>> coding is producing is not "rounded" in any sense of that word as
>>>> I understand it), then why not use findInterval() as an index into
>>>> the ages you want to label these cases with?
>>>>
>>>> df$Agetmp <- c(17,19,24,29,34,44,54,64,69,74,79,84)[  # note Extract operation
>>>>     findInterval(runif(100,0,100),
>>>>                  c(17,19,24,29,34,44,54,64,69,74,79,84,110) )
>>>> ]  # close extraction
>>>>
>>>>
>>>> The other option, of course, and a more "honest" one in this
>>>> instance would be
>>>>
>>>> cut(vec, breaks=c(...), labels=c(...) )
>>>>
>>>> (It's not clear to me why you are not picking midpoint ages
>>>> within those brackets.)
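
A minimal sketch of what the cut() form could look like with concrete arguments. The break points and labels below are assumptions chosen to reproduce the bracket labels of the nested ifelse chain in the original post (everything below 19 labelled 17, everything from 89 up labelled 89); the ages vector is made up.

```r
# Bracket labels used by the ifelse chain in the original post.
lower <- c(17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 89)

# Open-ended first and last bins; 14 break points give 13 bins,
# one per label.
brks <- c(-Inf, lower[-1], Inf)

# Hypothetical ages, one per subject.
ages <- c(16, 18, 29, 63, 90)

# right = FALSE makes each bin closed on the left, e.g. [29, 34).
age_cat <- cut(ages, breaks = brks, labels = as.character(lower),
               right = FALSE)

# cut() returns a factor; convert through character to get numbers back.
age_num <- as.numeric(as.character(age_cat))
age_num
```

Keeping the result as a factor makes it explicit that these are categories rather than rounded ages, which is the point made above.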
>>>>
>>>>>
>>>>> # Reference the lookup table based on computed indices
>>>>> df$DSTz <- DSTzlook[which(DSTzlook[,1]==df$Agetmp),
>>>>>                     which(DSTzlook[1,]==df$DSF+df$DSB)]
>>>>
>>>> I have not been able to figure out what you are trying to do
>>>> here. Trying to use a 2d lookup looks promising as a way to
>>>> emulate what an Excel user might attempt, but an example (as
>>>> requested in the message at the bottom of every posting) would
>>>> really be of great help in making this more concrete for those of
>>>> us with insufficient abstractive abilities.
>>>>
>>>> --
>>>> David.
>>>>
>>>>>
>>>>> # Cleanup
>>>>> #rm(df)
>>>>> #df$Agetmp<-NULL
>>>>> --
>>>>> View this message in context: http://r.789695.n4.nabble.com/set-dataframe-field-value-from-lookup-table-tp3080245p3080245.html
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>
>>>>
>>>>
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>>>
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>
David Winsemius, MD
West Hartford, CT