[R] Help with Hmisc, cut2, split and quantile

Guy Green guygreen at netvigator.com
Tue Mar 9 02:00:23 CET 2010


Hi Peter & others,

Thanks (Peter) - that gets me really close to what I was hoping for.

The one problem I have is that the "cut" approach breaks the data into
equal-width intervals across the range of the "Target" values, rather than
into groups of equal frequency.  In other words, if the data ranged from 0
to 50, the data would be separated into 0-5, 5-10 and so on, regardless of
how many observations fall within each interval.  However, I want to get
the data into deciles.

The code that does this (incorporating Peter's) is:

read_data <- read.table("C:/Sample table.txt", header = TRUE)
read_data$DEC <- with(read_data, cut(Target, breaks=10, labels=1:10))
L <- split(read_data, read_data$DEC)

This means that I can get separate data frames, such as L$'10', which comes
out tidy but contains only 2 rows (the sample has 63 rows, so each decile
should have 6 or 7):
   Actual    Target DEC
9   0.572 0.3778386  10
31  0.299 0.3546606  10

If I try to adjust this to get deciles using cut2(), I can break the data
into deciles as follows:

library(Hmisc)  # provides cut2()
read_data <- read.table("C:/Sample table.txt", header = TRUE)
read_data$DEC <- with(read_data, cut2(read_data$Target, g=10), labels=1:10)
L <- split(read_data, read_data$DEC)

However, this time, while the data is broken into evenly sized data frames,
the labels for the separate data frames are unusable, e.g.:
$`[ 0.26477, 0.37784]`
    Actual    Target                 DEC
6   0.243   0.2650960    [ 0.26477, 0.37784]
9   0.572   0.3778386    [ 0.26477, 0.37784]
10 -0.049  0.3212681    [ 0.26477, 0.37784]
15  0.780  0.2778518    [ 0.26477, 0.37784]
31  0.299  0.3546606    [ 0.26477, 0.37784]
33  0.105  0.2647676    [ 0.26477, 0.37784]

Could anyone suggest a way of rearranging this to make the labels usable
again?  The sample data is reattached:
http://n4.nabble.com/file/n1585427/Sample_table.txt
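
One possible direction, sketched on simulated data rather than the attached
file: cut2() returns an ordinary factor whose levels are the interval
strings in increasing order, so the integer codes of that factor can serve
as decile numbers.  The column names RANGE and DEC below are just
illustrative choices.  Note also that in the call above labels=1:10 sits
outside cut2(), so it appears to be handed to with() and silently ignored.

```r
library(Hmisc)  # provides cut2()

# Simulated stand-in for the real data (assumption: 63 rows)
set.seed(1)
dat <- data.frame(Actual = rnorm(63), Target = rexp(63))

# Factor with ten levels; the level names are the interval strings
grp <- cut2(dat$Target, g = 10)

# Keep the interval text for reference, and use the level codes as labels
dat$RANGE <- grp
dat$DEC   <- as.integer(grp)

L <- split(dat, dat$DEC)
names(L)     # "1" "2" ... "10"
L[["10"]]    # rows in the top decile
```

An alternative with the same effect on the list names would be to overwrite
the factor levels directly, e.g. levels(grp) <- seq_along(levels(grp)),
before splitting.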

Thanks,
Guy



Peter Ehlers wrote:
> 
> On 2010-03-08 8:47, Guy Green wrote:
>>
>> Hello,
>> I have a set of data with two columns: "Target" and "Actual".  A sample
>> table (http://n4.nabble.com/file/n1584647/Sample_table.txt) is
>> attached but the data looks like this:
>>
>> Actual	Target
>> -0.125	0.016124906
>> 0.135		0.120799865
>> ...		...
>> ...		...
>>
>> I want to be able to break the data into tables based on quantiles in the
>> "Target" column.  I can see (using cut2, and also quantile) how to get the
>> barrier points between the different quantiles, and I can see how I would
>> achieve this if I was just looking to split up a vector.  However I am
>> trying to break up the whole table based on those quantiles, not just the
>> vector.
>>
>> However I would like to be able to break the table into ten separate tables,
>> each with both "Actual" and "Target" data, based on the "Target" data
>> deciles:
>>
>> top_decile = ...(top decile of "read_data", based on Target data)
>> next_decile = ...and so on...
>> bottom_decile = ...
> 
> I would just add a factor variable indicating to which decile
> a particular observation belongs:
> 
>   dat$DEC <- with(dat, cut(Target, breaks=10, labels=1:10))
> 
> If you really want to have separate data frames you can then
> split on the decile:
> 
>   L <- split(dat, dat$DEC)
> 
>     -Peter Ehlers
> -- 
> Peter Ehlers
> University of Calgary
> 
> 
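
To illustrate the suggestion quoted above, here is a small sketch (again on
simulated data, which is an assumption, not the attached sample) of using
the factor directly for per-group summaries, next to the split() version.

```r
# Simulated stand-in for the real data (assumption: 63 rows)
set.seed(1)
dat <- data.frame(Actual = rnorm(63), Target = rexp(63))
dat$DEC <- cut(dat$Target, breaks = 10, labels = 1:10)

# Per-group summaries straight from the factor, no separate data frames
tapply(dat$Actual, dat$DEC, mean)
aggregate(Actual ~ DEC, data = dat, FUN = mean)

# The same groups as a list of data frames; empty bins stay as 0-row frames
L <- split(dat, dat$DEC)
sapply(L, nrow)
```

Keeping a single data frame with a grouping factor usually makes later
steps (summaries, plots, models) simpler than juggling ten separate objects.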

-- 
View this message in context: http://n4.nabble.com/Help-with-Hmisc-cut2-split-and-quantile-tp1584647p1585427.html
Sent from the R help mailing list archive at Nabble.com.


