[R] Dplyr question

Wed Jun 22 01:13:50 CEST 2022

Bert and Others,

Now that newer versions of R support a reasonable pipeline method, I think there may be more interest in using functions designed to be easy to use in pipelines, including wrappers that just re-arrange the order for existing functions to make the first argument the one passed along the pipeline.
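As a rough sketch of the kind of wrapper I mean (the name gsub_first is made up for illustration, and this assumes R 4.1 or later for the native |> pipe), here is gsub() re-arranged so the data comes first:

```r
# Hypothetical wrapper: does the same work as gsub(), but takes the data
# as its FIRST argument so it can sit in the middle of a pipeline.
gsub_first <- function(x, pattern, replacement, ...) {
  gsub(pattern, replacement, x, ...)
}

labels <- c("P1A0B0D", "P190-90D")

# The native pipe passes the left-hand side as the first argument.
result <- labels |>
  gsub_first("^P1", "") |>
  tolower()

print(result)
```

Nothing new is computed here; the wrapper only changes the argument order so the value flowing down the pipeline lands where the function expects it.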

When people say "dplyr" now, it is indeed a specific package, but some use it to mean something more like the "tidyverse" group of packages that are meant to operate well together, and that group includes the "tidyr" package.

The intuitively OBVIOUS solution using base R that is shown is actually a bit restricted and does not trivially scale up to many more columns that are to be consolidated, perhaps in multiple batches based on things like suffixes in the names, which the tidyverse functions are able to handle. And if it matters, you may want to keep the order of the rows relatively intact, and the solution offered does not.
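To illustrate the scaling point, here is a small sketch with made-up columns (temp_A, press_B and so on are my assumptions, not the original poster's data) where the suffix after the underscore decides which batch a column belongs to. It assumes the tidyr package is installed:

```r
library(tidyr)

# Made-up wide data: two kinds of measurement, each taken at sites A and B.
wide <- data.frame(
  Time_stamp = c("Jun-10 10:34", "Jun-10 10:51"),
  temp_A  = c(20.1, 20.4),
  temp_B  = c(19.8, 19.9),
  press_A = c(1.01, 1.02),
  press_B = c(0.99, 1.00)
)

# Split each column name at "_": the first piece names the output column
# (the special ".value" marker), the second piece becomes the "site" value.
long <- pivot_longer(
  wide,
  cols = -Time_stamp,
  names_to = c(".value", "site"),
  names_sep = "_"
)

print(long)
```

The same call keeps working if more sites or more kinds of measurement are added, as long as the column names follow the same pattern.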

But packages like dplyr are not a full solution and most people would be better off learning all about what core R offers and only supplementing it here and there with selected packages. If you ever have to read code others wrote or modified, ...

In any case, THIS forum seems dedicated to a purpose that precludes more than an aside about packages. Very little that is in packages cannot, in theory, be done using mostly regular R, but I am not sure that is still true, or wise. Many packages re-implement R functionality as much faster code in C or C++, or make use of R code that is more efficient than what you might cobble together on your own. Some are also very general and allow programming at higher levels of abstraction, and I specifically include the pipeline methods (now also in R) as such a level of abstraction.

The topic, loosely, was how to transform your data.frame (or equivalent) from what some call WIDE form to LONG form. That is often done in pipelines where, after some steps, the resulting data has to be transformed before being given to a program like one doing graphics with ggplot(), and no amount of lecturing suggesting we use native R graphics for everything will convince me in the slightest.

So the supplied method, unless suitably placed in a function that takes a data.frame as a first argument and returns the modified new one as a result, will only help for some purposes and be a pain for others as you pause and leave any pipeline to make the change and then ...
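For what it is worth, here is one way such a function might look using only base R. This is a sketch, not the method posted earlier in the thread; the name to_long and its arguments are hypothetical, and it assumes all the non-id columns hold the same type of value:

```r
# Hypothetical helper: data.frame in as the first argument, long-form
# data.frame out, so it can be dropped into a pipeline.
to_long <- function(df, id_col, names_to = "Location",
                    values_to = "Measurement") {
  stopifnot(is.data.frame(df), id_col %in% names(df))
  value_cols <- setdiff(names(df), id_col)
  out <- data.frame(
    rep(df[[id_col]], times = length(value_cols)),  # ids repeated per column
    rep(value_cols, each = nrow(df)),               # source column names
    unlist(df[value_cols], use.names = FALSE),      # values, column by column
    stringsAsFactors = FALSE
  )
  names(out) <- c(id_col, names_to, values_to)
  out
}

df1 <- data.frame(
  Time_stamp = c("Jun-10 10:34", "Jun-10 10:51"),
  "P1A0B0D"  = c(-0.000208, -0.000228),
  "P190-90D" = c(-0.000195, -0.000188),
  check.names = FALSE
)

# The pipeline does not have to stop for the reshaping step.
long <- df1 |> to_long("Time_stamp")
flagged <- long |> subset(Measurement < -0.0002)

print(long)
```

Rows come out grouped by source column, with the original row order kept inside each group.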

As was said, "intuitive" is fairly meaningless: my personal intuition often finds multiple ways of looking at something, each intuitive in its own way, and the task is often simply to pick one based on additional factors. It may be intuitively obvious to do it the shortest and easiest way imaginable, but it is also obvious that, if you will need this again, you should make it properly commented and documented, and at times even do more error checking or handle more general tasks ...

At MY stage, I think I know enough, but I also see no reason to waste lots of time doing things in many steps, with lots of possible mistakes on my part, when a few well-coordinated and tested packages make it easy.

To each their own. But I am NOT suggesting this forum should change; there are others that can accommodate people. And there are far more packages out there than most of us are even aware exist!

-----Original Message-----
From: Bert Gunter <bgunter.4567 using gmail.com>
To: Rui Barradas <ruipbarradas using sapo.pt>
Cc: r-help using r-project.org <r-help using r-project.org>; Thomas Subia <thomas.subia using fmindustries.com>
Sent: Tue, Jun 21, 2022 2:25 pm
Subject: Re: [R] Dplyr question

Heh heh. Well "intuitiveness" is in the mind of the intuiter. ;-)
One might even say that Jeff's and John's solutions were the most
"intuitive" as they involved nothing more than the "straightforward"
application of standard base R functionality. (Do note the scare quotes
around 'straightforward'.) Of course, other factors may well be decisive,
such as efficiency, generalizability to the *real* problem and data, and so
forth.

Best to all,
Bert

On Tue, Jun 21, 2022 at 10:50 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:

> Hello,
>
> pivot_longer is a function from the tidyr package, not dplyr. I find its
> syntax very intuitive. Here is a solution.
>
>
>
> x <- "Time_stamp    P1A0B0D P190-90D
> 'Jun-10 10:34'  -0.000208  -0.000195
> 'Jun-10 10:51'  -0.000228  -0.000188
> 'Jun-10 11:02'  -0.000234  -0.000204
> 'Jun-10 11:17'  -0.00022    -0.000205
> 'Jun-10 11:25'  -0.000238  -0.000195"
> # text = avoids leaving an unclosed textConnection behind
> df1 <- read.table(text = x, header = TRUE, check.names = FALSE)
>
> suppressPackageStartupMessages({
>    library(dplyr)
>    library(tidyr)
> })
>
> df1 %>%
>    pivot_longer(
>      cols = -Time_stamp,    # or starts_with("P1")
>      names_to = "Location",
>      values_to = "Measurement"
>    ) %>%
>    arrange(desc(Location), Time_stamp)
> #> # A tibble: 10 × 3
> #>    Time_stamp   Location Measurement
> #>    <chr>        <chr>          <dbl>
> #>  1 Jun-10 10:34 P1A0B0D    -0.000208
> #>  2 Jun-10 10:51 P1A0B0D    -0.000228
> #>  3 Jun-10 11:02 P1A0B0D    -0.000234
> #>  4 Jun-10 11:17 P1A0B0D    -0.00022
> #>  5 Jun-10 11:25 P1A0B0D    -0.000238
> #>  6 Jun-10 10:34 P190-90D   -0.000195
> #>  7 Jun-10 10:51 P190-90D   -0.000188
> #>  8 Jun-10 11:02 P190-90D   -0.000204
> #>  9 Jun-10 11:17 P190-90D   -0.000205
> #> 10 Jun-10 11:25 P190-90D   -0.000195
>
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 17:22 on 21/06/2022, Thomas Subia wrote:
> > Colleagues:
> >
> > The header of my data set is:
> > Time_stamp    P1A0B0D    P190-90D
> > Jun-10 10:34  -0.000208  -0.000195
> > Jun-10 10:51  -0.000228  -0.000188
> > Jun-10 11:02  -0.000234  -0.000204
> > Jun-10 11:17  -0.00022   -0.000205
> > Jun-10 11:25  -0.000238  -0.000195
> >
> > I want my data set to resemble:
> >
> > Time_stamp    Location  Measurement
> > Jun-10 10:34  P1A0B0D   -0.000208
> > Jun-10 10:51  P1A0B0D   -0.000228
> > Jun-10 11:02  P1A0B0D   -0.000234
> > Jun-10 11:17  P1A0B0D   -0.00022
> > Jun-10 11:25  P1A0B0D   -0.000238
> > Jun-10 10:34  P190-90D  -0.000195
> > Jun-10 10:51  P190-90D  -0.000188
> > Jun-10 11:02  P190-90D  -0.000204
> > Jun-10 11:17  P190-90D  -0.000205
> > Jun-10 11:25  P190-90D  -0.000195
> >
> > I need some advice on how to do this using dplyr.
> >
> > V/R
> > Thomas Subia
> >
> > FM Industries, Inc. - NGK Electronics, USA | www.fmindustries.com
> > 221 Warren Ave, Fremont, CA 94539
> >
> > "In God we trust; all others must bring data"
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
