[R] Data Frame housekeeping

David Winsemius dwinsemius at comcast.net
Wed May 25 20:13:37 CEST 2011


On May 25, 2011, at 1:16 PM, Scott Hatcher wrote:

> Hello Dr. Winsemius,
>
> First of all, thank you for your prompt and helpful reply. Also, for  
> providing something I hoped would be produced from joining this  
> mailing list: a means of discovering incredibly useful packages such  
> as the "reshape2" one you have introduced me to.
>
> I have a follow up question to your solution (which should produce  
> exactly what I need):
>
> when I run the cast function to reassemble the data frame I get:

I used `dcast`.

>
> Error in names(data) <- array_names(res$labels[[2]]) :
>  'names' attribute [7] must be the same length as the vector [1]

And I obviously didn't get that error, so there might be a difference  
in either the code (which you did not show), or the data (which you  
did not offer in a reproducible form).

>
> This signaled to me that the function was returning 7 values where  
> it expected only 1. To test this I applied a summary function "mean"  
> to the cast, and the result processed (however it only produced NA's  
> because my values were class:factors). What I don't understand is  
> where these multiple values are coming from; there should be only a  
> single value corresponding to the 4 id.vars given in the cast  
> function (STN_ID,YEAR,MM,variable).
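
One way to test that hypothesis is sketched below. It assumes the melted data frame is named `mtst` and has the columns shown in the earlier `str()` output; the toy rows are made up for illustration, and the last two deliberately share one id combination. The sketch counts rows per combination and shows the usual idiom for converting factor values to numbers before a summary such as `mean` is applied.

```r
# Toy melted data (hypothetical values); rows 3 and 4 share one id combination
mtst <- data.frame(
  STN_ID   = 2402594,
  YEAR     = 1997,
  MM       = c(9, 9, 9, 9),
  ELEM     = c(1, 2, 1, 1),
  variable = factor(c("X1", "X1", "X2", "X2")),
  value    = factor(c("-00233", "-00339", "-00204", "-00204"))
)

# Count rows for each combination of the id columns plus ELEM
counts <- aggregate(n ~ STN_ID + YEAR + MM + variable + ELEM,
                    data = transform(mtst, n = 1L), FUN = sum)

# Any combination with n > 1 supplies several values for a single cell
dups <- counts[counts$n > 1, ]
print(dups)

# Factor values must pass through as.character() before as.numeric();
# as.numeric() alone would return the internal level codes
mtst$value_num <- as.numeric(as.character(mtst$value))
```

If `dups` has no rows, each cell of the cast receives exactly one value and the cause of the error lies elsewhere.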

If you want further effort you should address the inadequacies of your  
question. It is very possible that you will need to acquaint yourself  
with the use of either `dump` or `dput`.
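
A minimal sketch of what that could look like with `dput`, assuming the data frame is named `tst`; the small data frame here is only a hypothetical stand-in for the real one. The printed text can be pasted into a message so that readers can rebuild the object.

```r
# Hypothetical stand-in for the real data frame
tst <- data.frame(
  STN_ID = c(2402594, 2402594),
  YEAR   = c(1997, 1997),
  MM     = c(9, 10),
  ELEM   = c(1, 1),
  X1     = c("-00233", "-00003"),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that recreates the object;
# head() keeps the posted sample small
dput(head(tst, 20))

# Round trip: capture the dput() text and rebuild the object from it
txt  <- paste(capture.output(dput(tst)), collapse = "\n")
tst2 <- eval(parse(text = txt))
```

Posting the output of `dput(head(tst, 20))` along with the exact cast call would let others reproduce the error.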

-- 
David.
>
> Thanks again for your help,
>
> Scott Hatcher
>
> On 24/05/2011 5:16 PM, David Winsemius wrote:
>>
>> On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:
>>
>>> Hello,
>>>
>>> I have a large data frame that is organized by date in a peculiar way. I
>>> am seeking advice on how to transform the data into a format that is of
>>> more use to me.
>>>
>>> The data is organized as follows:
>>>
>>>     STN_ID YEAR MM ELEM       X1       X2       X3     X4     X5     X6     X7
>>> 1  2402594 1997  9    1 *-00233* *-00204* *-00119* -00190 -00251 -00243 -00249
>>> 2  2402594 1997 10    1   -00003   -00005   -00001 -00039 -00031 -00036 -00033
>>> 3  2402594 1997 11    1   000025   000065   000070 000069 000115 000072 000093
>>>
>>> Where "MM" is the month of the year, and ELEM is the variable that the
>>> values in the X* columns describe (in the actual data there are 31 X
>>> columns, one for each day of the month). The values in bold are the
>>> values that are transferred into the small chart below (which is the
>>> result I hope to get). This is to give a sense of how the data is picked
>>> out of the original data frame.
>>
>> assuming this dataframe is named 'tst':
>>
>> require(reshape2)
>> mtst <- melt(tst[, 1:7], id.vars=1:4)  # Only select id.vars and X1:X3
>> str(mtst)
>> #----------
>> 'data.frame':    54 obs. of  6 variables:
>> $ STN_ID  : num  2402594 2402594 2402594 2402594 2402594 ...
>> $ YEAR    : num  1997 1997 1997 1997 1998 ...
>> $ MM      : num  9 10 11 12 1 2 3 4 5 9 ...
>> $ ELEM    : num  1 1 1 1 1 1 1 1 1 2 ...
>> $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
>> $ value   : chr  "-00233" "-00003" "000025" "000160" ...
>>
>> dcast(mtst, STN_ID +YEAR+ MM  + variable ~ ELEM)
>> #---------
>>    STN_ID YEAR MM variable      1      2
>> 1  2402594 1997  9       X1 -00233 -00339
>> 2  2402594 1997  9       X2 -00204 -00339
>> 3  2402594 1997  9       X3 -00119 -00343
>> 4  2402594 1997 10       X1 -00003 -00207
>> 5  2402594 1997 10       X2 -00005 -00289
>> 6  2402594 1997 10       X3 -00001 -00278
>> 7  2402594 1997 11       X1 000025 -00242
>> snipped output
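
For anyone who wants to run the two steps end to end, here is a self-contained sketch. It assumes the reshape2 package is installed; the toy data frame uses a few values borrowed from the output above and only two day columns, and `value.var` is named explicitly rather than left for `dcast` to guess.

```r
library(reshape2)

# Toy data: one row per station, year, month and ELEM code
tst <- data.frame(
  STN_ID = 2402594,
  YEAR   = 1997,
  MM     = c(9, 9, 10, 10),
  ELEM   = c(1, 2, 1, 2),
  X1     = c("-00233", "-00339", "-00003", "-00207"),
  X2     = c("-00204", "-00339", "-00005", "-00289"),
  stringsAsFactors = FALSE
)

# Long format: one row per id combination and day column
mtst <- melt(tst, id.vars = c("STN_ID", "YEAR", "MM", "ELEM"))

# Wide again, with one column per ELEM code
wide <- dcast(mtst, STN_ID + YEAR + MM + variable ~ ELEM, value.var = "value")
print(wide)
```

Because every combination of STN_ID, YEAR, MM, variable and ELEM occurs once in the toy data, no aggregation function is needed.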
>>
>>>
>>> I would like to organize the data so it looks like this:
>>>
>>>     STN_ID YEAR MM DAY  ELEM1  ELEM2
>>> 1  2402594 1997  9  X1 -00233 -00339
>>> 2  2402594 1997  9  X2 -00204 000077
>>> 3  2402594 1997  9  X3 -00119 000030
>>
>> Where is that second column coming from? I don't see it in the data
>> example.
>>>
>>> Such that I create a new column named "DAY" that is made up of the
>>> numbers following "X" in the original data.frame columns. Also, the ELEM
>>> values are converted to columns and parsed with the ELEM code (in this
>>> case 1 and 2).
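
The DAY column and the ELEM1/ELEM2 names described here could be derived with base R roughly as follows. This is a sketch only: the data frame `wide` is a hypothetical stand-in shaped like the cast result, with a `variable` column holding the former X column names and one column per ELEM code.

```r
# Hypothetical cast result: `variable` holds X1, X2, ...; columns "1" and "2"
# hold the values for each ELEM code
wide <- data.frame(
  STN_ID   = 2402594,
  YEAR     = 1997,
  MM       = 9,
  variable = factor(c("X1", "X2", "X3")),
  `1`      = c("-00233", "-00204", "-00119"),
  `2`      = c("-00339", "-00339", "-00343"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# DAY: strip the leading "X" from the former column names
wide$DAY <- as.integer(sub("^X", "", as.character(wide$variable)))

# Prefix the ELEM code columns so they become ELEM1 and ELEM2
codes <- names(wide) %in% c("1", "2")
names(wide)[codes] <- paste0("ELEM", names(wide)[codes])

# Reorder to the requested layout
wide <- wide[, c("STN_ID", "YEAR", "MM", "DAY", "ELEM1", "ELEM2")]
```

Storing DAY as an integer rather than as "X1", "X2" makes it easier to sort the rows or build a date from YEAR, MM and DAY later.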
>>>
>>> I have tried to split apart the columns, transform them, and bind them
>>> back together, but my ability to do so just isn't there yet. I am still
>>> fairly new to R, and would really appreciate some help in working
>>> towards organizing this data frame.
>>>
>>> Thanks in advance,
>>> Scott Hatcher
>>>
>>>    [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>

David Winsemius, MD
West Hartford, CT


