[R] Help with Hmisc, cut2, split and quantile

David Freedman 3.14david at gmail.com
Tue Mar 9 04:26:23 CET 2010


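Try
as.numeric(read_data$DEC)

This converts the factor returned by cut2() into its integer level codes
(1 to 10 for deciles), which gives you a numeric variable that you can work
with.  Assign the result back to read_data$DEC before calling split().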
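
For what it's worth, here is a minimal sketch of the whole thing. The data
frame below is a made-up stand-in for Sample_table.txt (63 random rows), so
the values are not Guy's. As far as I can tell cut2() has no labels argument,
and in the quoted call labels=1:10 sits outside cut2() anyway, so it ends up
as an extra argument to with() and is ignored. Two routes are shown: base R
cut() with quantile() breaks (a different technique from cut2, and one that
does accept labels=1:10), and cut2() followed by as.numeric(), which only
runs if Hmisc is installed.

```r
# Made-up stand-in for Sample_table.txt: 63 rows, columns Actual and Target
set.seed(1)
read_data <- data.frame(Actual = rnorm(63), Target = runif(63, 0, 0.4))

# Route 1 (base R): cut at the decile boundaries of Target, label bins 1..10
brks <- quantile(read_data$Target, probs = seq(0, 1, by = 0.1))
read_data$DEC <- cut(read_data$Target, breaks = brks,
                     labels = 1:10, include.lowest = TRUE)

# Route 2 (Hmisc, skipped if not installed): cut2() into 10 quantile groups,
# then replace the interval labels with the numeric level codes
if (requireNamespace("Hmisc", quietly = TRUE)) {
  read_data$DEC2 <- as.numeric(Hmisc::cut2(read_data$Target, g = 10))
}

# One data frame per decile, named "1" to "10"
L <- split(read_data, read_data$DEC)
names(L)
sapply(L, nrow)   # 6 or 7 rows per decile for 63 rows
L[["10"]]         # top decile
```

Either way the list names are plain "1" to "10", so L$`10` or L[["10"]]
picks out the top decile.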

hth
David Freedman
CDC, Atlanta


Guy Green wrote:
> 
> Hi Peter & others,
> 
> Thanks (Peter) - that gets me really close to what I was hoping for.
> 
> The one problem I have is that the "cut" approach breaks the data into
> intervals based on the absolute value of the "Target" data, rather than
> their frequency.  In other words, if the data ranged from 0 to 50, the
> data would be separated into 0-5, 5-10 and so on, regardless of the
> frequency within those categories.  However I want to get the data into
> deciles.
> 
> The code that does this (incorporating Peter's) is:
> 
> read_data <- read.table("C:/Sample table.txt", header = TRUE)
> read_data$DEC <- with(read_data, cut(Target, breaks=10, labels=1:10))
> L <- split(read_data, read_data$DEC)
> 
> This means that I can get separate data frames, such as L$'10', which
> comes out tidy, but only containing 2 data items (the sample has 63 rows,
> so each decile should have 6+ data items):
>    Actual    Target DEC
> 9   0.572 0.3778386  10
> 31  0.299 0.3546606  10
> 
> If I try to adjust this to get deciles using cut2(), I can break the data
> into deciles as follows:
> 
> read_data <- read.table("C:/Sample table.txt", header = TRUE)
> read_data$DEC <- with(read_data, cut2(read_data$Target, g=10),
> labels=1:10)
> L <- split(read_data, read_data$DEC)
> 
> However this time, while the data is broken into evenly sized data frames,
> the labels for the separate data frames are unusable, e.g.:
> $`[ 0.26477, 0.37784]`
>     Actual    Target                 DEC
> 6   0.243   0.2650960    [ 0.26477, 0.37784]
> 9   0.572   0.3778386    [ 0.26477, 0.37784]
> 10 -0.049  0.3212681    [ 0.26477, 0.37784]
> 15  0.780  0.2778518    [ 0.26477, 0.37784]
> 31  0.299  0.3546606    [ 0.26477, 0.37784]
> 33  0.105  0.2647676    [ 0.26477, 0.37784]
> 
> Could anyone suggest a way of rearranging this to make the labels usable
> again?  Sample data is reattached: Sample_table.txt
> (http://n4.nabble.com/file/n1585427/Sample_table.txt).
> 
> Thanks,
> Guy
> 
> 
> 
> Peter Ehlers wrote:
>> 
>> On 2010-03-08 8:47, Guy Green wrote:
>>>
>>> Hello,
>>> I have a set of data with two columns: "Target" and "Actual".
>>> Sample_table.txt (http://n4.nabble.com/file/n1584647/Sample_table.txt) is
>>> attached, but the data looks like this:
>>>
>>> Actual   Target
>>> -0.125   0.016124906
>>>  0.135   0.120799865
>>>  ...     ...
>>>  ...     ...
>>>
>>> I want to be able to break the data into tables based on quantiles in the
>>> "Target" column.  I can see (using cut2, and also quantile) how to get the
>>> barrier points between the different quantiles, and I can see how I would
>>> achieve this if I was just looking to split up a vector.  However I am
>>> trying to break up the whole table based on those quantiles, not just the
>>> vector.
>>>
>>> However I would like to be able to break the table into ten separate tables,
>>> each with both "Actual" and "Target" data, based on the "Target" data
>>> deciles:
>>>
>>> top_decile = ...(top decile of "read_data", based on Target data)
>>> next_decile = ...and so on...
>>> bottom_decile = ...
>> 
>> I would just add a factor variable indicating to which decile
>> a particular observation belongs:
>> 
>>   dat$DEC <- with(dat, cut(Target, breaks=10, labels=1:10))
>> 
>> If you really want to have separate data frames you can then
>> split on the decile:
>> 
>>   L <- split(dat, dat$DEC)
>> 
>>     -Peter Ehlers
>> -- 
>> Peter Ehlers
>> University of Calgary
>> 
>> 
> 
> 
-- 
View this message in context: http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1585503.html
Sent from the R help mailing list archive at Nabble.com.


