[R] Dplyr question

Wed Jun 22 01:38:41 CEST 2022

dplyr != tidyverse.

And questions that presuppose a solution method demonstrate closed-mindedness, and reduce the chance that novel solutions will be put forth, or that the questioner will actually learn something instead of using the list as a glorified search engine.

It is not that answers should avoid contributed packages, but questioners should be open to all answers. This is not the tidyverse help list, it is the R-help list, and telling people to stay quiet if they cannot conform to the biases of the questioner is self-defeating, since the point of the list is to learn how problems can be solved. An open mind can learn, for instance, that searching dplyr for functions that reshape data frames is doomed and that those functions are found elsewhere, and can then use that information to form successful Google queries later without coming back to the list with the same question.

FWIW I use certain tidyverse packages frequently. But solutions using them are not always as useful for learning opportunities as base R solutions are.
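
As a rough illustration of what a base R answer can teach, here is a minimal sketch of the wide-to-long reshape from the original question using nothing beyond base R. It is not necessarily what was posted earlier in the thread; the sample data are copied from the quoted messages below, and the names `measure_cols` and `long` are my own, chosen only for this example.

```r
# Sample data from the original question (copied from the quoted thread below)
x <- "Time_stamp    P1A0B0D P190-90D
'Jun-10 10:34'  -0.000208  -0.000195
'Jun-10 10:51'  -0.000228  -0.000188
'Jun-10 11:02'  -0.000234  -0.000204
'Jun-10 11:17'  -0.00022    -0.000205
'Jun-10 11:25'  -0.000238  -0.000195"
df1 <- read.table(text = x, header = TRUE, check.names = FALSE)

# Every column except the identifier is treated as a measurement column
measure_cols <- setdiff(names(df1), "Time_stamp")

# Build the long form by hand: repeat the identifier once per measurement
# column, label each block with its column name, and concatenate the values
long <- data.frame(
  Time_stamp  = rep(df1$Time_stamp, times = length(measure_cols)),
  Location    = rep(measure_cols, each = nrow(df1)),
  Measurement = unlist(df1[measure_cols], use.names = FALSE)
)

print(long)
```

Writing it out this way shows what a reshape actually is: repetition of the identifier plus concatenation of the value columns. That is the part worth learning, whichever package ends up doing it for you.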

Oh, and pipes are orthogonal to reshaping... searching for magrittr and reshaping isn't a productive association for searches.
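
To make the "orthogonal" point concrete, here is a small sketch assuming R >= 4.1 (for the native `|>` pipe). The helper `to_long()` is hypothetical, written here for illustration and not taken from any package; the two-row data frame is a shortened copy of the sample data in the question.

```r
# Hypothetical helper (not from any package): wide -> long using base R only.
# The data frame is the first argument, so the function can sit in a pipeline.
to_long <- function(df, id, names_to = "Location", values_to = "Measurement") {
  measure_cols <- setdiff(names(df), id)
  out <- data.frame(
    rep(df[[id]], times = length(measure_cols)),   # identifier, repeated
    rep(measure_cols, each = nrow(df)),            # source column name
    unlist(df[measure_cols], use.names = FALSE)    # values, column by column
  )
  names(out) <- c(id, names_to, values_to)
  out
}

# Shortened copy of the sample data from the question
wide <- data.frame(
  Time_stamp = c("Jun-10 10:34", "Jun-10 10:51"),
  P1A0B0D    = c(-0.000208, -0.000228),
  `P190-90D` = c(-0.000195, -0.000188),
  check.names = FALSE
)

long_piped <- wide |> to_long("Time_stamp")   # with the pipe
long_plain <- to_long(wide, "Time_stamp")     # without the pipe

# The pipe only changes how the call is written, not what it computes
stopifnot(identical(long_piped, long_plain))
```

The reshaping lives entirely inside the function; the pipe is just another way of spelling the call, which is why searching for pipe packages will not lead you to reshaping functions.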

On June 21, 2022 4:13:50 PM PDT, Avi Gross via R-help <r-help using r-project.org> wrote:
>Bert and Others,
>
>Now that newer versions of R support a reasonable pipeline method, I think there may be more interest in using functions designed to be easy to use in pipelines, including wrappers that just re-arrange the order for existing functions to make the first argument the one passed along the pipeline.
>
>When people say "dplyr" now, it is indeed a specific package, but some use it to mean something more like the "tidyverse" group of packages that are meant to operate well together, and that group includes the "tidyr" package.
>
>The intuitively OBVIOUS solution using base R that is shown is actually a bit restricted and does not trivially scale up to deal with lots more columns that are to be consolidated, perhaps in multiple batches and based on things like suffixes in the names and so on that the tidyverse functions are able to handle. And if it matters, you may want to keep the order of the rows relatively intact and the solution offered does not.
>
>But packages like dplyr are not a full solution and most people would be better off learning all about what core R offers and only supplementing it here and there with selected packages. If you ever have to read code others wrote or modified, ...
>
>In any case, THIS forum seems dedicated for a purpose that precludes more than an aside about packages. Very little that is in packages cannot in theory be done using mostly regular R but I am not sure if that is any longer true or wise. Many packages re-write R functionality as something like much faster code in C or C++ or make use of some R code that is more efficient than you might cobble together on your own. Some is also very general and allows programming at higher levels of abstraction and I specifically include the pipeline methods (now also in R) as such a level of abstraction.
>The topic, loosely, was how to transform your data.frame (or equivalent) from what some call WIDE form to LONG form. That is often done in pipelines where after some steps, the resulting data has to be transformed before being given to a program like one doing graphics with ggplot() and no amount of lecturing suggesting we use native R graphics for everything will in the slightest bit convince me.
>
>So the supplied method, unless suitably placed in a function that takes a data.frame as a first argument and returns the modified new one as a result, will only help for some purposes and be a pain for others as you pause and leave any pipeline to make the change and then ...
>
>As was said, intuitive is fairly meaningless as my personal intuition often intuits multiple ways of looking at something, each one being its own intuitive way and the task often is simply to pick one based on additional factors. It may be intuitively obvious to do it the shortest and easiest way imaginable but also obvious if you will need this again, to make it properly commented and documented and even at times do more error checking or do more general tasks ...
>At MY stage I think I know enough but also see no reason to waste lots of time doing things in many steps with lots of possible mistakes on my part when a few well-coordinated and tested packages make it easy.
>To each their own. But I am NOT suggesting this forum should change; there are others that can accommodate people. And there are way more packages out there than most of us are even aware exist!
>
>
>-----Original Message-----
>From: Bert Gunter <bgunter.4567 using gmail.com>
>To: Rui Barradas <ruipbarradas using sapo.pt>
>Cc: r-help using r-project.org <r-help using r-project.org>; Thomas Subia <thomas.subia using fmindustries.com>
>Sent: Tue, Jun 21, 2022 2:25 pm
>Subject: Re: [R] Dplyr question
>
>Heh heh. Well "intuitiveness" is in the mind of the intuiter. ;-)
>One might even say that Jeff's and John's solutions were the most
>"intuitive" as they involved nothing more than the "straightforward"
>application of standard base R functionality. (Do note the scare quotes
>around 'straightforward'.) Of course, other factors may well be decisive,
>such as efficiency, generalizability to the *real* problem and data, and so
>forth.
>
>Best to all,
>Bert
>
>On Tue, Jun 21, 2022 at 10:50 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
>> Hello,
>>
>> pivot_longer is a function from the tidyr package, not dplyr. I find its
>> syntax very intuitive. Here is a solution.
>>
>>
>>
>> x <- "Time_stamp    P1A0B0D P190-90D
>> 'Jun-10 10:34'  -0.000208  -0.000195
>> 'Jun-10 10:51'  -0.000228  -0.000188
>> 'Jun-10 11:02'  -0.000234  -0.000204
>> 'Jun-10 11:17'  -0.00022    -0.000205
>> 'Jun-10 11:25'  -0.000238  -0.000195"
>> df1 <- read.table(textConnection(x), header = TRUE, check.names = FALSE)
>>
>> suppressPackageStartupMessages({
>>    library(dplyr)
>>    library(tidyr)
>> })
>>
>> df1 %>%
>>    pivot_longer(
>>      cols = -Time_stamp,    # or starts_with("P1")
>>      names_to = "Location",
>>      values_to = "Measurement"
>>    ) %>%
>>    arrange(desc(Location), Time_stamp)
>> #> # A tibble: 10 × 3
>> #>    Time_stamp  Location Measurement
>> #>    <chr>        <chr>          <dbl>
>> #>  1 Jun-10 10:34 P1A0B0D    -0.000208
>> #>  2 Jun-10 10:51 P1A0B0D    -0.000228
>> #>  3 Jun-10 11:02 P1A0B0D    -0.000234
>> #>  4 Jun-10 11:17 P1A0B0D    -0.00022
>> #>  5 Jun-10 11:25 P1A0B0D    -0.000238
>> #>  6 Jun-10 10:34 P190-90D  -0.000195
>> #>  7 Jun-10 10:51 P190-90D  -0.000188
>> #>  8 Jun-10 11:02 P190-90D  -0.000204
>> #>  9 Jun-10 11:17 P190-90D  -0.000205
>> #> 10 Jun-10 11:25 P190-90D  -0.000195
>>
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> At 17:22 on 21/06/2022, Thomas Subia wrote:
>> > Colleagues:
>> >
>> > The header of my data set is:
>> > Time_stamp    P1A0B0D P190-90D
>> > Jun-10 10:34  -0.000208      -0.000195
>> > Jun-10 10:51  -0.000228      -0.000188
>> > Jun-10 11:02  -0.000234      -0.000204
>> > Jun-10 11:17  -0.00022        -0.000205
>> > Jun-10 11:25  -0.000238      -0.000195
>> >
>> > I want my data set to resemble:
>> >
>> > Time_stamp    Location        Measurement
>> > Jun-10 10:34  P1A0B0D -0.000208
>> > Jun-10 10:51  P1A0B0D -0.000228
>> > Jun-10 11:02  P1A0B0D -0.000234
>> > Jun-10 11:17  P1A0B0D -0.00022
>> > Jun-10 11:25  P1A0B0D -0.000238
>> > Jun-10 10:34  P190-90D        -0.000195
>> > Jun-10 10:51  P190-90D        -0.000188
>> > Jun-10 11:02  P190-90D        -0.000204
>> > Jun-10 11:17  P190-90D        -0.000205
>> > Jun-10 11:25  P190-90D        -0.000195
>> >
>> > I need some advice on how to do this using dplyr.
>> >
>> > V/R
>> > Thomas Subia
>> >
>> > FM Industries, Inc. - NGK Electronics, USA | www.fmindustries.com
>> > 221 Warren Ave, Fremont, CA 94539
>> >
>> > "In God we trust, all others must bring data"
>> >
>> > ______________________________________________
>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>    [[alternative HTML version deleted]]

-- 
Sent from my phone. Please excuse my brevity.