[R] Data Frame housekeeping
David Winsemius
dwinsemius at comcast.net
Wed May 25 20:13:37 CEST 2011
On May 25, 2011, at 1:16 PM, Scott Hatcher wrote:
> Hello Dr. Winsemius,
>
> First of all, thank you for your prompt and helpful reply. Also, for
> providing something I hoped would be produced from joining this
> mailing list: a means of discovering incredibly useful packages such
> as the "reshape2" one you have introduced me to.
>
> I have a follow up question to your solution (which should produce
> exactly what I need):
>
> when I run the cast function to reassemble the data frame I get:
I used `dcast`.
>
> Error in names(data) <- array_names(res$labels[[2]]) :
> 'names' attribute [7] must be the same length as the vector [1]
And I obviously didn't get that error, so there might be a difference
in either the code (which you did not show), or the data (which you
did not offer in a reproducible form).
>
> This signaled to me that the function was returning 7 values where
> it expected only 1. To test this I applied a summary function "mean"
> to the cast, and the result processed (however it only produced NA's
> because my values were class:factors). What I don't understand is
> where these multiple values are coming from; there should be only a
> single value corresponding to the 4 id.vars given in the cast
> function (STN_ID,YEAR,MM,variable).
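One way to test that hypothesis is to look for combinations of the cast variables that occur more than once in the melted data. The following is only a sketch in base R: the small `mtst` data frame is made up for illustration (it is not the poster's data), and it also shows the usual idiom for turning factor values into numbers so that `mean` does not return NA.

```r
# Hypothetical melted data: the (9, X1, ELEM 2) combination appears twice.
mtst <- data.frame(
  STN_ID   = rep(2402594, 5),
  YEAR     = rep(1997, 5),
  MM       = c(9, 9, 9, 10, 10),
  ELEM     = c(1, 2, 2, 1, 2),
  variable = factor(rep("X1", 5)),
  value    = factor(c("-00233", "-00339", "-00340", "-00003", "-00207"))
)

# Every variable named in the cast formula, on either side of the "~".
keys <- mtst[, c("STN_ID", "YEAR", "MM", "variable", "ELEM")]

# Flag all rows whose key combination is not unique.
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
dup_rows <- mtst[is_dup, ]
print(dup_rows)

# Factors must go through character first; as.numeric() on a factor
# would return the internal level codes instead of the values.
mtst$value_num <- as.numeric(as.character(mtst$value))
```

If `dup_rows` is empty for the real data, duplicated keys are not the cause and the difference is more likely in the code that was run.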
If you want further effort you should address the inadequacies of your
question. It is very possible that you will need to acquaint yourself
with the use of either `dump` or `dput`.
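As a rough illustration of what `dput` offers (the data frame `tst_small` below is a made-up stand-in, with values borrowed from the first columns of the original example), it prints R code that recreates an object exactly, so the text can be pasted into a message and run by anyone:

```r
# Hypothetical small example standing in for the first rows of 'tst'.
tst_small <- data.frame(
  STN_ID = c(2402594, 2402594, 2402594),
  YEAR   = c(1997, 1997, 1997),
  MM     = c(9, 10, 11),
  ELEM   = c(1, 1, 1),
  X1     = c("-00233", "-00003", "000025"),
  stringsAsFactors = FALSE
)

# Print a text representation that can be pasted into an email.
dput(tst_small)

# Round trip: capture that text and rebuild the object from it.
txt <- paste(capture.output(dput(tst_small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, tst_small)
```

For a large data frame, something like `dput(head(tst, 20))` keeps the posted text short while preserving column classes, which is the detail that printed output loses.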
--
David.
>
> Thanks again for your help,
>
> Scott Hatcher
>
> On 24/05/2011 5:16 PM, David Winsemius wrote:
>>
>> On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:
>>
>>> Hello,
>>>
>>> I have a large data frame that is organized by date in a peculiar
>>> way. I am seeking advice on how to transform the data into a format
>>> that is of more use to me.
>>>
>>> The data is organized as follows:
>>>
>>>    STN_ID YEAR MM ELEM       X1       X2       X3     X4     X5     X6     X7
>>> 1 2402594 1997  9    1 *-00233* *-00204* *-00119* -00190 -00251 -00243 -00249
>>> 2 2402594 1997 10    1   -00003   -00005   -00001 -00039 -00031 -00036 -00033
>>> 3 2402594 1997 11    1   000025   000065   000070 000069 000115 000072 000093
>>>
>>> Where "MM" is the month of the year, and ELEM is the variable that
>>> the values in the X* columns describe (in the actual data there are
>>> 31 X columns, one for each day of the month). The values in bold are
>>> the values that are transferred into the small chart below (which is
>>> the result I hope to get). This is to give a sense of how the data
>>> is picked out of the original data frame.
>>
>> assuming this dataframe is named 'tst':
>>
>> require(reshape2)
>> mtst <- melt(tst[, 1:7], id.vars=1:4)  # Only select id.vars and X1:X3
>> str(mtst)
>> #----------
>> 'data.frame': 54 obs. of 6 variables:
>> $ STN_ID : num 2402594 2402594 2402594 2402594 2402594 ...
>> $ YEAR : num 1997 1997 1997 1997 1998 ...
>> $ MM : num 9 10 11 12 1 2 3 4 5 9 ...
>> $ ELEM : num 1 1 1 1 1 1 1 1 1 2 ...
>> $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
>> $ value : chr "-00233" "-00003" "000025" "000160" ...
>>
>> dcast(mtst, STN_ID +YEAR+ MM + variable ~ ELEM)
>> #---------
>> STN_ID YEAR MM variable 1 2
>> 1 2402594 1997 9 X1 -00233 -00339
>> 2 2402594 1997 9 X2 -00204 -00339
>> 3 2402594 1997 9 X3 -00119 -00343
>> 4 2402594 1997 10 X1 -00003 -00207
>> 5 2402594 1997 10 X2 -00005 -00289
>> 6 2402594 1997 10 X3 -00001 -00278
>> 7 2402594 1997 11 X1 000025 -00242
>> snipped output
>>
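To get from that cast result to the requested layout, the remaining steps are deriving DAY from the "X" labels, renaming the ELEM columns, and converting the zero-padded strings to numbers. A minimal sketch in base R follows; the `wide` data frame is a hand-built stand-in for the first three rows of the output above, and the names `wide`, `elem_cols`, `new_cols`, and `result` are chosen here for illustration only.

```r
# Hypothetical stand-in for the first rows of the dcast() output above;
# columns "1" and "2" are the ELEM codes.
wide <- data.frame(
  STN_ID   = rep(2402594, 3),
  YEAR     = rep(1997, 3),
  MM       = rep(9, 3),
  variable = factor(c("X1", "X2", "X3")),
  `1`      = c("-00233", "-00204", "-00119"),
  `2`      = c("-00339", "-00339", "-00343"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# DAY: strip the leading "X" from the former column names.
wide$DAY <- as.integer(sub("^X", "", as.character(wide$variable)))

# Prefix the ELEM codes so the column names are syntactically valid.
elem_cols <- c("1", "2")
new_cols <- paste0("ELEM", elem_cols)
names(wide)[match(elem_cols, names(wide))] <- new_cols

# Convert the zero-padded strings to numbers.
wide[new_cols] <- lapply(wide[new_cols],
                         function(x) as.numeric(as.character(x)))

result <- wide[, c("STN_ID", "YEAR", "MM", "DAY", new_cols)]
print(result)
```

Going through `as.character` first means the conversion also behaves if the value columns arrive as factors rather than character vectors.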
>>>
>>> I would like to organize the data so it looks like this:
>>>
>>> STN_ID YEAR MM DAY ELEM1 ELEM2
>>> 1 2402594 1997 9 X1 -00233 -00339
>>> 2 2402594 1997 9 X2 -00204 000077
>>> 3 2402594 1997 9 X3 -00119 000030
>>
>> Where is that second column coming from? I don't see it in the data
>> example.
>>>
>>> Such that I create a new column named "DAY" that is made up of the
>>> numbers following "X" in the original data.frame columns. Also, the
>>> ELEM values are converted to columns and parsed with the ELEM code
>>> (in this case 1 and 2).
>>>
>>> I have tried to split apart the columns, transform them, and bind
>>> them back together, but my ability to do so just isn't there yet. I
>>> am still fairly new to R, and would really appreciate some help in
>>> working towards organizing this data frame.
>>>
>>> Thanks in advance,
>>> Scott Hatcher
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
David Winsemius, MD
West Hartford, CT