[R] Spread data.frame on 2 variables
Ulrik Stervbo
ulrik.stervbo at gmail.com
Thu Jul 28 14:21:55 CEST 2016
Hi Georg,
it is difficult to figure out what happens between your expectation and the
outcome if we cannot see a minimal dataset.
Based on your description I did this
library(tidyr)
library(dplyr)
test_df <- data_frame(channel = LETTERS[1:5], unit = letters[1:5], custID =
c(1:5), dummy = 1)
test_df %>% spread(channel, dummy) %>% mutate(dummy = 1) %>% spread(unit,
dummy)
which seems to be working fine as I get wide data. If a combination is
missing in the long form it will also be missing in the wide form. Maybe
you are looking for something like this:
channel_wide <- test_df %>% select(channel, custID) %>% spread(channel,
custID)
unit_wide <- test_df %>% select(unit, custID) %>% spread(unit, custID)
bind_cols(channel_wide, unit_wide)
Apologies for the HTML - it's gmail
Best wishes,
Ulrik
On Thu, 28 Jul 2016 at 13:54 <G.Maubach at weinwolf.de> wrote:
> Hi Ulrik,
>
> I have included a reproducable example. I ran the code and it did exactly
> what I wanted to show you.
>
> You are right: the solution shall merge cases in the end cause the values
> on the variables are either missing or the same.
>
> Example 1: Values are the same
> If you look at 6 and 7 and variable 70 the value is 1 in both cases. This
> is in this context the same information and cases 6 and 7 with custID can
> be merged to 1 for variable 70.
>
> Example 2: Values are missing and not missing
> If you look at cases 8 and 9 the value for case 8 at variable 40, 50 and
> 2000 is missing whereas the variables 40, 50 and 2000 have all 1 for case
> 9. Case 8 and 9 could be merged together cause the missing values are
> overwritten what is correct in this case.
>
> The solution I am looking for is to transform the data from long into wide
> form and keep all but missing value information.
>
> Did I explain my problem in a comprehensible way? Are there any further
> questions?
>
> Kind regards
>
> Georg
>
>
>
>
>
> Von: Ulrik Stervbo <ulrik.stervbo at gmail.com>
> An: G.Maubach at weinwolf.de, r-help at r-project.org,
> Datum: 28.07.2016 12:59
> Betreff: Re: [R] Spread data.frame on 2 variables
>
>
>
> Hi Georg,
>
> it's hard to tell without a reproducible example.
>
> Should spread really merge elements? Does spread know anything about
> CustID? Maybe you need to make a useful key of the CustIDs first and
> spread on that?
>
> Maybe I'm all off, because I'm really just guessing.
>
> Best,
> Ulrik
>
> On Thu, 28 Jul 2016 at 12:36 <G.Maubach at weinwolf.de> wrote:
> Hi All,
>
> I need to spread a data.frame on 2 variables, e. g. "channel" and "unit".
>
> If I do it in two steps spreads keeps all cases that does not look like
> the one before although it contains the same values for a specific case.
>
> Here is what I have right now:
>
> -- cut --
>
> test1$dummy <- 1
> test2 <- spread(data = test1, key = 'channel', value = "dummy")
> test2
> cat("First spread is OK!")
>
> test2$dummy <- 1
> test3 <- spread(data = test2, key = 'unit', value = 'dummy')
>
> test1
> # test2
> test3
> warning(paste0("Second spread is not OK cause spread does not merge
> cases\n",
> "with CustID 700 and 800 into one case,\n",
> "cause they have values on different variables,\n",
> "although the corresponding values of the cases with",
> "custID 700 and 800 are missing."))
>
> cat("What I would like to have is:\n")
> target4 <- structure(list(custID = c(100, 200, 300, 500, 600, 700, 800,
> 900),
> `10` = c(1, NA, NA, NA, NA, NA, NA, NA),
> `20` = c(1, NA, NA, NA, NA, NA, NA, NA),
> `30` = c(NA, NA, NA, NA, NA, NA, 1, 1),
> `40` = c(NA, NA, NA, NA, 1, NA, 1, 1),
> `50` = c(NA, NA, 1, NA, NA, NA, 1, 1),
> `60` = c(NA, NA, NA, NA, NA, 1, NA, NA),
> `70` = c(NA, NA, NA, NA, NA, 1, NA, NA),
> `99` = c(NA, 1, NA, 1, NA, NA, NA, NA),
> `1000` = c(1, NA, NA, NA, NA, NA, 1, 1),
> `2000` = c(NA, NA, NA, NA, 1, 1, 1, NA),
> `3000` = c(NA, NA, 1, NA, NA, 1, NA, NA),
> `4000` = c(NA, NA, 1, NA, NA, NA, NA, NA),
> `6000` = c(NA, NA, NA, NA, 1, NA, NA, NA),
> `9999` = c(NA, 1, NA, 1, NA, NA, NA, NA)),
> .Names = c("custID",
> "10", "20", "30", "40", "50", "60", "70", "99",
> "1000", "2000", "3000", "4000", "6000", "9999"),
> row.names = c(NA, 8L), class = "data.frame")
>
> target4
>
> cat("What would be a proper way to create target4 from test1?")
>
> -- cut --
>
> What would be the proper way to create target4 from test1?
>
> Kind regards
>
> Georg
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list