[R] Help with Hmisc, cut2, split and quantile
David Freedman
3.14david at gmail.com
Tue Mar 9 04:26:23 CET 2010
Try
as.numeric(read_data$DEC)
This converts the factor to its integer level codes (1 to 10 for deciles), giving you a numeric variable that you can work with.
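
A minimal sketch of that conversion, for illustration only. It assumes made-up data in place of the attached sample file, and it uses base R's cut() with quantile() break points as a stand-in for cut2(Target, g=10), so it runs without Hmisc. The data values and the name brks are hypothetical.

```r
# Hypothetical stand-in for the sample file: 63 rows with the same column names
set.seed(1)
read_data <- data.frame(Actual = rnorm(63), Target = runif(63))

# Decile break points of Target (base R substitute for cut2(Target, g = 10))
brks <- quantile(read_data$Target, probs = seq(0, 1, by = 0.1))

# Factor with interval labels, similar in spirit to what cut2() returns
read_data$DEC <- cut(read_data$Target, breaks = brks, include.lowest = TRUE)

# as.numeric() on a factor returns its integer level codes, here 1 to 10
read_data$DEC <- as.numeric(read_data$DEC)

# split() now names the list elements "1" to "10"
L <- split(read_data, read_data$DEC)
names(L)
sapply(L, nrow)
```

With labels like these, the top decile should be reachable as L$'10' or L[["10"]]. The same as.numeric() step ought to apply to a factor produced by cut2(), since the levels are ordered by interval.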
hth
David Freedman
CDC, Atlanta
Guy Green wrote:
>
> Hi Peter & others,
>
> Thanks (Peter) - that gets me really close to what I was hoping for.
>
> The one problem I have is that the "cut" approach breaks the data into
> intervals based on the absolute value of the "Target" data, rather than
> their frequency. In other words, if the data ranged from 0 to 50, the
> data would be separated into 0-5, 5-10 and so on, regardless of the
> frequency within those categories. However I want to get the data into
> deciles.
>
> The code that does this (incorporating Peter's) is:
>
> read_data=read.table("C:/Sample table.txt", head = T)
> read_data$DEC <- with(read_data, cut(Target, breaks=10, labels=1:10))
> L <- split(read_data, read_data$DEC)
>
> This means that I can get separate data frames, such as L$'10', which
> comes out tidy, but only containing 2 data items (the sample has 63 rows,
> so each decile should have 6+ data items):
>    Actual    Target DEC
> 9   0.572 0.3778386  10
> 31  0.299 0.3546606  10
>
> If I try to adjust this to get deciles using cut2(), I can break the data
> into deciles as follows:
>
> read_data=read.table("C:/Sample table.txt", head = T)
> read_data$DEC <- with(read_data, cut2(read_data$Target, g=10),
> labels=1:10)
> L <- split(read_data, read_data$DEC)
>
> However this time, while the data is broken into even data frames, the
> labels for the separate data frames are unusable, e.g.:
> $`[ 0.26477, 0.37784]`
>    Actual    Target                  DEC
> 6   0.243 0.2650960 [ 0.26477, 0.37784]
> 9   0.572 0.3778386 [ 0.26477, 0.37784]
> 10 -0.049 0.3212681 [ 0.26477, 0.37784]
> 15  0.780 0.2778518 [ 0.26477, 0.37784]
> 31  0.299 0.3546606 [ 0.26477, 0.37784]
> 33  0.105 0.2647676 [ 0.26477, 0.37784]
>
> Could anyone suggest a way of rearranging this to make the labels useable
> again? Sample data is reattached (Sample_table.txt):
> http://n4.nabble.com/file/n1585427/Sample_table.txt
>
> Thanks,
> Guy
>
>
>
> Peter Ehlers wrote:
>>
>> On 2010-03-08 8:47, Guy Green wrote:
>>>
>>> Hello,
>>> I have a set of data with two columns: "Target" and "Actual". A sample
>>> table (http://n4.nabble.com/file/n1584647/Sample_table.txt) is
>>> attached, but the data looks like this:
>>>
>>> Actual  Target
>>> -0.125  0.016124906
>>>  0.135  0.120799865
>>>  ...    ...
>>>  ...    ...
>>>
>>> I want to be able to break the data into tables based on quantiles in
>>> the "Target" column. I can see (using cut2, and also quantile) how to
>>> get the barrier points between the different quantiles, and I can see
>>> how I would achieve this if I was just looking to split up a vector.
>>> However I am trying to break up the whole table based on those
>>> quantiles, not just the vector.
>>>
>>> Specifically, I would like to be able to break the table into ten
>>> separate tables, each with both "Actual" and "Target" data, based on
>>> the "Target" data deciles:
>>>
>>> top_decile = ...(top decile of "read_data", based on Target data)
>>> next_decile = ...and so on...
>>> bottom_decile = ...
>>
>> I would just add a factor variable indicating to which decile
>> a particular observation belongs:
>>
>> dat$DEC <- with(dat, cut(Target, breaks=10, labels=1:10))
>>
>> If you really want to have separate data frames you can then
>> split on the decile:
>>
>> L <- split(dat, dat$DEC)
>>
>> -Peter Ehlers
>> --
>> Peter Ehlers
>> University of Calgary
>>
>>
>
>