[R] how can I convert a long to wide matrix?
Jeff Newmiller
jdnewmil using dcn.davis.ca.us
Wed May 2 06:33:08 CEST 2018
Here is a stab in the dark. I agree with Jim that the description of the
problem is hard to follow. The original posting being in HTML format did
not help.
#########
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
# indenting was just a side-effect of me cleaning up the HTML mess
dat <- structure( list( ID = structure( c( 1L, 1L, 1L, 2L, 2L)
                                      , .Label = c("id_X","id_Y")
                                      , class = "factor"
                                      )
                      , EventDate = structure( c( 4L, 5L, 2L
                                                , 3L, 1L )
                                             , .Label = c( "9/15/16"
                                                         , "9/15/17"
                                                         , "9/7/16"
                                                         , "9/8/16"
                                                         , "9/9/16"
                                                         )
                                             , class = "factor"
                                             )
                      , timeGroup = structure( c( 1L, 1L, 2L, 1L, 2L)
                                             , .Label = c("B1", "B2")
                                             , class = "factor"
                                             )
                      , SITE = structure( c( 1L, 1L, 2L, 1L, 2L)
                                        , .Label = c("A", "B" )
                                        , class = "factor"
                                        )
                      )
                , .Names = c( "ID", "EventDate"
                            , "timeGroup", "SITE")
                , class = "data.frame"
                , row.names = c(NA, -5L)
                )
dat2 <- ( dat
        %>% mutate( EventDate = as.Date( as.character( EventDate )
                                       , format = "%m/%d/%y"
                                       )
                  )
        %>% arrange( ID, timeGroup, EventDate )
        %>% group_by( ID, timeGroup )
        %>% top_n( 1, EventDate )
        %>% ungroup
        )
dat2
#> # A tibble: 4 x 4
#>   ID    EventDate  timeGroup SITE
#>   <fct> <date>     <fct>     <fct>
#> 1 id_X  2016-09-09 B1        A
#> 2 id_X  2017-09-15 B2        B
#> 3 id_Y  2016-09-07 B1        A
#> 4 id_Y  2016-09-15 B2        B
dat3a <- ( dat2
         %>% mutate( timeGroup = paste( "EventDate"
                                      , timeGroup
                                      , sep = "_"
                                      )
                   )
         %>% select( ID, timeGroup, EventDate )
         %>% spread( timeGroup, EventDate )
         )
dat3a
#> # A tibble: 2 x 3
#>   ID    EventDate_B1 EventDate_B2
#>   <fct> <date>       <date>
#> 1 id_X  2016-09-09   2017-09-15
#> 2 id_Y  2016-09-07   2016-09-15
dat3b <- ( dat2
         %>% mutate( timeGroup = paste( "SITE"
                                      , timeGroup
                                      , sep = "_"
                                      )
                   )
         %>% select( ID, timeGroup, SITE )
         %>% spread( timeGroup, SITE )
         )
dat3b
#> # A tibble: 2 x 3
#>   ID    SITE_B1 SITE_B2
#>   <fct> <fct>   <fct>
#> 1 id_X  A       B
#> 2 id_Y  A       B
dat4 <- ( dat3a
        %>% left_join( dat3b, by = "ID" )
        )
dat4
#> # A tibble: 2 x 5
#>   ID    EventDate_B1 EventDate_B2 SITE_B1 SITE_B2
#>   <fct> <date>       <date>       <fct>   <fct>
#> 1 id_X  2016-09-09   2017-09-15   A       B
#> 2 id_Y  2016-09-07   2016-09-15   A       B
#########
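
One caveat on the code above: top_n( 1, EventDate ) keeps the latest EventDate within each ID/timeGroup, whereas the request quoted below asks for the first reading. Here is a minimal base-R sketch of that variant, assuming "first" means the earliest date in each group. The names first_rows and wide are illustrative only, and FinalData1 is borrowed from the original question. The data are re-entered as plain character columns rather than factors for brevity.

```r
# Same example data as in the thread, rebuilt with plain character columns.
dat <- data.frame(
  ID        = c("id_X", "id_X", "id_X", "id_Y", "id_Y"),
  EventDate = c("9/8/16", "9/9/16", "9/15/17", "9/7/16", "9/15/16"),
  timeGroup = c("B1", "B1", "B2", "B1", "B2"),
  SITE      = c("A", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Parse the dates so that sorting is chronological rather than alphabetical.
dat$EventDate <- as.Date(dat$EventDate, format = "%m/%d/%y")

# Sort, then keep only the first (earliest) row of each ID/timeGroup pair.
dat <- dat[order(dat$ID, dat$timeGroup, dat$EventDate), ]
first_rows <- dat[!duplicated(dat[c("ID", "timeGroup")]), ]

# One row per ID, with one EventDate_* and one SITE_* column per timeGroup.
wide <- reshape(first_rows, idvar = "ID", timevar = "timeGroup",
                direction = "wide", sep = "_")

# Character matrix in the shape requested in the original question.
FinalData1 <- as.matrix(wide[c("SITE_B1", "SITE_B2")])
dimnames(FinalData1) <- list(wide$ID, c("B1", "B2"))
print(FinalData1)
```

For this particular data set the SITE values come out the same either way ("A" for B1 and "B" for B2, for both IDs), because the repeated readings share a site. Only the retained EventDate for id_X in B1 differs.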
On Wed, 2 May 2018, Jim Lemon wrote:
> Hi Marna,
> This is a condition that the function cannot handle. It would be
> possible to reformat the result based on the time intervals, but the
> stretch_df function doesn't try to interpret the values, just
> stretches them out to a wide format.
>
> Jim
>
>
> On Wed, May 2, 2018 at 9:16 AM, Marna Wagley <marna.wagley using gmail.com> wrote:
>> Hi Jim,
>> The data set is correct. I took two readings from SITE "A" within a
>> short time interval, so when a value is repeated within the same
>> "timeGroup" I want to keep only the first one.
>> The result I want is the following:
>>
>> FinalData1
>>
>> B1 B2
>> id_X "A" "B"
>> id_Y "A" "B"
>>
>> thanks,
>>
>>
>>
>> On Tue, May 1, 2018 at 4:05 PM, Jim Lemon <drjimlemon using gmail.com> wrote:
>>>
>>> Hi Marna,
>>> I think this is due to having three rows for id_X and only two for
>>> id_Y. The function creates a data frame with enough columns to hold
>>> the greatest number of values for each ID variable. Notice that the
>>> SITE_n columns contain three values for id_X (A, A, B) and two for
>>> id_Y (A, B, NA) as there was no third occasion of measurement for the
>>> latter. Even though there are only two _values_ for SITE, there must
>>> be enough space for three. In your desired output, SITE for the second
>>> occasion of measurement is wrong (it should be "A"), and for the third
>>> occasion it is unknown. Even if there was only one value for SITE in
>>> the original data frame, it should be repeated for the correct number
>>> of observations. I think you may be mixing up case ID with location of
>>> observation.
>>>
>>> Jim
>>>
>>>
>>> On Wed, May 2, 2018 at 8:48 AM, Marna Wagley <marna.wagley using gmail.com>
>>> wrote:
>>>> Hi Jim,
>>>> Thank you very much for your suggestions. I used it, but it gave me three
>>>> sites, whereas I actually have only two sites, "Id_X" and "Id_y". In fact,
>>>> "A" is repeated twice for "Id_X". When a value is repeated, I would like
>>>> to take the first one among the repeated values.
>>>>
>>>> dat<-structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L), .Label =
>>>> c("id_X", "id_Y"), class = "factor"), EventDate = structure(c(4L, 5L, 2L,
>>>> 3L, 1L), .Label = c("9/15/16", "9/15/17", "9/7/16", "9/8/16",
>>>> "9/9/16"), class = "factor"), timeGroup = structure(c(1L, 1L,
>>>> 2L, 1L, 2L), .Label = c("B1", "B2"), class = "factor"), SITE =
>>>> structure(c(1L, 1L, 2L, 1L, 2L), .Label = c("A", "B"), class = "factor")),
>>>> .Names = c("ID", "EventDate", "timeGroup", "SITE"), class = "data.frame",
>>>> row.names = c(NA, -5L))
>>>>
>>>> library(prettyR)
>>>>
>>>> stretch_df(dat,idvar="ID",to.stretch=c("EventDate","SITE"))
>>>>
>>>>
>>>>     ID timeGroup EventDate_1 EventDate_2 EventDate_3 SITE_1 SITE_2 SITE_3
>>>> 1 id_X        B1      9/8/16      9/9/16     9/15/17      A      A      B
>>>> 2 id_Y        B1      9/7/16     9/15/16        <NA>      A      B   <NA>
>>>>>
>>>>
>>>> Basically, I am looking for a table like the following:
>>>>
>>>>     ID timeGroup EventDate_1 EventDate_2 EventDate_3 SITE_1 SITE_2
>>>> 1 id_X        B1      9/8/16      9/9/16     9/15/17      A      B
>>>> 2 id_Y        B1      9/7/16     9/15/16        <NA>      A      B
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Tue, May 1, 2018 at 3:32 PM, Jim Lemon <drjimlemon using gmail.com> wrote:
>>>>>
>>>>> Hi Marna,
>>>>> Try this:
>>>>>
>>>>> library(prettyR)
>>>>> stretch_df(dat,idvar="ID",to.stretch=c("EventDate","SITE"))
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>> On Wed, May 2, 2018 at 8:24 AM, Marna Wagley <marna.wagley using gmail.com>
>>>>> wrote:
>>>>>> Hi R users,
>>>>>> I am trying to convert a long matrix to a wide one. I have an example
>>>>>> and would like to get the following table (FinalData1):
>>>>>>
>>>>>>
>>>>>> FinalData1
>>>>>> B1 B2
>>>>>> id_X "A" "B"
>>>>>> id_Y "A" "B"
>>>>>>
>>>>>> but I got the following table using the following code.
>>>>>>
>>>>>> FinalData1
>>>>>>      B1  B2
>>>>>> id_X "A" "A"
>>>>>> id_Y "A" "B"
>>>>>>
>>>>>>
>>>>>> The code and the example data I used are given below. Are there any
>>>>>> suggestions for fixing the problem?
>>>>>>
>>>>>>
>>>>>> dat<-structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L), .Label =
>>>>>> c("id_X", "id_Y"), class = "factor"), EventDate = structure(c(4L, 5L, 2L,
>>>>>> 3L, 1L), .Label = c("9/15/16", "9/15/17", "9/7/16", "9/8/16",
>>>>>> "9/9/16"), class = "factor"), timeGroup = structure(c(1L, 1L,
>>>>>> 2L, 1L, 2L), .Label = c("B1", "B2"), class = "factor"), SITE =
>>>>>> structure(c(1L, 1L, 2L, 1L, 2L), .Label = c("A", "B"), class = "factor")),
>>>>>> .Names = c("ID", "EventDate", "timeGroup", "SITE"), class = "data.frame",
>>>>>> row.names = c(NA, -5L))
>>>>>>
>>>>>>
>>>>>> tmp <- split(dat, dat$ID)
>>>>>> tmp1 <- do.call(rbind, lapply(tmp, function(dat){
>>>>>>   tb <- table(dat$timeGroup)
>>>>>>   idx <- which(tb>0)
>>>>>>   tb1 <- replace(tb, idx, as.character(dat$SITE))
>>>>>> }))
>>>>>>
>>>>>> tmp1
>>>>>> FinalData<-print(tmp1, quote=FALSE)
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>
>>
>
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil using dcn.davis.ca.us>  Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k