[R] Spread data.frame on 2 variables

Ulrik Stervbo ulrik.stervbo at gmail.com
Thu Jul 28 14:21:55 CEST 2016


Hi Georg,

it is difficult to figure out what happens between your expectation and the
outcome if we cannot see a minimal dataset.

Based on your description I did this

library(tidyr)
library(dplyr)

test_df <- data_frame(channel = LETTERS[1:5], unit = letters[1:5], custID =
c(1:5), dummy = 1)
test_df %>% spread(channel, dummy) %>% mutate(dummy = 1) %>% spread(unit,
dummy)

which seems to be working fine as I get wide data. If a combination is
missing in the long form it will also be missing in the wide form. Maybe
you are looking for something like this:

channel_wide <- test_df  %>% select(channel, custID) %>% spread(channel,
custID)
unit_wide <- test_df  %>% select(unit, custID) %>% spread(unit, custID)
bind_cols(channel_wide, unit_wide)

Apologies for the HTML - it's gmail

Best wishes,
Ulrik

On Thu, 28 Jul 2016 at 13:54 <G.Maubach at weinwolf.de> wrote:

> Hi Ulrik,
>
> I have included a reproducable example. I ran the code and it did exactly
> what I wanted to show you.
>
> You are right: the solution shall merge cases in the end cause the values
> on the variables are either missing or the same.
>
> Example 1: Values are the same
> If you look at 6 and 7 and variable 70 the value is 1 in both cases. This
> is in this context the same information and cases 6 and 7 with custID can
> be merged to 1 for variable 70.
>
> Example 2: Values are missing and not missing
> If you look at cases 8 and 9 the value for case 8 at variable 40, 50 and
> 2000 is missing whereas the variables 40, 50 and 2000 have all 1 for case
> 9. Case 8 and 9 could be merged together cause the missing values are
> overwritten what is correct in this case.
>
> The solution I am looking for is to transform the data from long into wide
> form and keep all but missing value information.
>
> Did I explain my problem in a comprehensible way? Are there any further
> questions?
>
> Kind regards
>
> Georg
>
>
>
>
>
> Von:    Ulrik Stervbo <ulrik.stervbo at gmail.com>
> An:     G.Maubach at weinwolf.de, r-help at r-project.org,
> Datum:  28.07.2016 12:59
> Betreff:        Re: [R] Spread data.frame on 2 variables
>
>
>
> Hi Georg,
>
> it's hard to tell without a reproducible example.
>
> Should spread really merge elements? Does spread know anything about
> CustID? Maybe you need to make a useful key of the CustIDs first and
> spread on that?
>
> Maybe I'm all off, because I'm really just guessing.
>
> Best,
> Ulrik
>
> On Thu, 28 Jul 2016 at 12:36 <G.Maubach at weinwolf.de> wrote:
> Hi All,
>
> I need to spread a data.frame on 2 variables, e. g. "channel" and "unit".
>
> If I do it in two steps spreads keeps all cases that does not look like
> the one before although it contains the same values for a specific case.
>
> Here is what I have right now:
>
> -- cut --
>
> test1$dummy <- 1
> test2 <- spread(data = test1, key = 'channel', value = "dummy")
> test2
> cat("First spread is OK!")
>
> test2$dummy <- 1
> test3 <- spread(data = test2, key = 'unit', value = 'dummy')
>
> test1
> # test2
> test3
> warning(paste0("Second spread is not OK cause spread does not merge
> cases\n",
>                "with CustID 700 and 800 into one case,\n",
>                "cause they have values on different variables,\n",
>                "although the corresponding values of the cases with",
>                "custID 700 and 800 are missing."))
>
> cat("What I would like to have is:\n")
> target4 <- structure(list(custID = c(100, 200, 300, 500, 600, 700, 800,
> 900),
>   `10` = c(1, NA, NA, NA, NA, NA, NA, NA),
>   `20` = c(1, NA, NA, NA, NA, NA, NA, NA),
>   `30` = c(NA, NA, NA, NA, NA, NA, 1, 1),
>   `40` = c(NA, NA, NA, NA, 1, NA, 1, 1),
>   `50` = c(NA, NA, 1, NA, NA, NA, 1, 1),
>   `60` = c(NA, NA, NA, NA, NA, 1, NA, NA),
>   `70` = c(NA, NA, NA, NA, NA, 1, NA, NA),
>   `99` = c(NA, 1, NA, 1, NA, NA, NA, NA),
>   `1000` = c(1, NA, NA, NA, NA, NA, 1, 1),
>   `2000` = c(NA, NA, NA, NA, 1, 1, 1, NA),
>   `3000` = c(NA, NA, 1, NA, NA, 1, NA, NA),
>   `4000` = c(NA, NA, 1, NA, NA, NA, NA, NA),
>   `6000` = c(NA, NA, NA, NA, 1, NA, NA, NA),
>   `9999` = c(NA, 1, NA, 1, NA, NA, NA, NA)),
> .Names = c("custID",
>  "10",  "20",  "30",  "40",  "50",  "60",  "70",  "99",
>  "1000",  "2000",  "3000",  "4000",  "6000",  "9999"),
> row.names = c(NA, 8L), class = "data.frame")
>
> target4
>
> cat("What would be a proper way to create target4 from test1?")
>
> -- cut --
>
> What would be the proper way to create target4 from test1?
>
> Kind regards
>
> Georg
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list