[R] Gather columns based on multiple columns using tidyr

fahad.usman at openreach.co.uk fahad.usman at openreach.co.uk
Fri Dec 18 18:55:44 CET 2015


Hi David,

I did something similar:

years <- scotland_weather%>%
                        select(starts_with("Year"))%>%
                        gather(year.col,year)%>%
                        select(-year.col)

months <- scotland_weather[seq(1, 24, 2)]%>%
          gather(month,rainfall_mm)


scotland_weather <- cbind(years,months)

However, would have liked more sophisticated solution using tidyr

Regards,

Fahad Usman
Network Engineering | CIO | Openreach 
Web: www.openreach.co.uk

-----Original Message-----
From: David L Carlson [mailto:dcarlson at tamu.edu] 
Sent: 17 December 2015 20:02
To: Ista Zahn; Usman,F,Fahad,BVG1C3 R
Cc: r-help at r-project.org
Subject: RE: [R] Gather columns based on multiple columns using tidyr

Here's another approach, but it doesn't use tidy or dplyr:

> Rain.col <- seq(1, 24, by=2)
> Year.col <- seq(2, 24, by =2)
> Scotland.lst <- lapply(1:12, function(x) data.frame(Year =
+      scotland_weather[ , Year.col[x]], month = month.abb[x],
+       rainfall_mm = scotland_weather[ , Rain.col[x]]))
> Scotland.df <- do.call(rbind, Scotland.lst)
> head(Scotland.df)
  Year month rainfall_mm
1 1993   Jan       293.8
2 1928   Jan       292.2
3 2008   Jan       275.6
4 2015   Jan       252.3
5 1974   Jan       246.2
6 1975   Jan       245.0
> tail(Scotland.df)
   Year month rainfall_mm
67 2013   Dec       300.7
68 1986   Dec       268.5
69 1929   Dec       267.2
70 2011   Dec       265.4
71 2006   Dec       264.0
72 1912   Dec       261.0

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ista Zahn
Sent: Thursday, December 17, 2015 1:49 PM
To: fahad.usman at openreach.co.uk
Cc: r-help at r-project.org
Subject: Re: [R] Gather columns based on multiple columns using tidyr

Hi Fahad,

Easier than what? You didn't tell us what you tried, nor why you were unhappy with it. I'm only passingly familiar with tidyr, but I came up with


library(tidyr)
library(dplyr)

read.table("scotland_rainfall.txt", skip = 7, header=TRUE, fill = TRUE) %>%
  select(-WIN, -SPR, -SUM, -AUT, -ANN) %>%
  gather(junk1, year, starts_with("Year")) %>%
  gather(month, rainfall_mm, one_of(toupper(month.abb))) %>%
  arrange(year, month) %>%
  select(-junk1) -> scotland_weather

Best,
Ista

On Thu, Dec 17, 2015 at 10:24 AM,  <fahad.usman at openreach.co.uk> wrote:
> Hi,
>
> Sorry for this direct approach but I am stuck with a stupid data that I would like to reformat.
>
> The datafile is location at: fileURL <- 
> http://www.metoffice.gov.uk/pub/data/weather/uk/climate/datasets/Rainf
> all/ranked/Scotland.txt
>
> You can read the data by:
>
> if(!file.exists("scotland_rainfall.txt")){
>         #this will download the file in the current working directory
>         download.file(fileURL,destfile = "scotland_rainfall.txt")
>         dateDownload <- Sys.Date() #15-12-2015
>
> }
>
>> head(scotland_weather)
>     Jan Year.1   Feb Year.2   Mar Year.3   Apr Year.4   May Year.5   Jun Year.6   Jul Year.7   Aug Year.8   Sep Year.9   Oct Year.10   Nov Year.11   Dec Year.12
> 1 293.8   1993 278.1   1990 238.5   1994 191.1   1947 191.4   2011 155.0   1938 185.6   1940 216.5   1985 267.6   1950 258.1    1935 262.0    2009 300.7    2013
> 2 292.2   1928 258.8   1997 233.4   1990 149.0   1910 168.7   1986 137.9   2002 181.4   1988 211.9   1992 221.2   1981 254.0    1954 245.3    2015 268.5    1986
> 3 275.6   2008 244.7   2002 201.3   1992 146.8   1934 155.9   1925 137.8   1948 170.1   1939 202.3   2009 193.9   1982 248.8    2014 244.8    1938 267.2    1929
> 4 252.3   2015 227.9   1989 200.2   1967 142.1   1949 149.5   2015 137.7   1931 165.8   2010 191.4   1962 189.7   2011 247.7    1938 242.2    2006 265.4    2011
> 5 246.2   1974 224.9   2014 180.2   1979 133.5   1950 137.4   2003 135.0   1966 162.9   1956 190.3   2014 189.7   1927 242.3    1983 231.3    1917 264.0    2006
> 6 245.0   1975 195.6   1995 180.0   1989 132.9   1932 129.7   2007 131.7   2004 159.9   1985 189.1   2004 189.6   1985 240.9    2001 229.9    1981 261.0    1912
>
>
>> tail(scotland_weather)
>
>      Jan Year.1  Feb Year.2  Mar Year.3  Apr Year.4  May Year.5  Jun 
> Year.6  Jul Year.7  Aug Year.8  Sep Year.9  Oct Year.10  Nov Year.11  
> Dec Year.12
>
> 101 71.2   1987 34.4   1947 50.9   1918 44.6   1982 34.1   1978 38.8   1940 49.2   2005 46.2   2003 50.7   2015 76.5    1973 57.1    1942 62.7    2010
>
> 102 57.9   1997 33.7   1917 44.4   1953 38.5   1918 32.1   1919 36.9   1932 47.8   1989 46.1   1983 49.6   1959 74.6    1922 54.9    1958 59.9    1963
>
> 103 57.9   1941 31.8   1963 39.7   1924 31.7   1981 28.8   1994 33.2   1921 45.8   1983 37.6   1955 48.5   1910 69.9    1972 53.9    1925 55.0    1995
>
> 104 57.6   1940 24.2   1930 38.8   1969 29.0   1938 26.1   2008 32.8   1925 39.7   1919 33.0   1995 40.0   1933 62.9    1914 53.6    1983 43.4    1927
>
> 105 51.9   1929 20.0   1986 37.4   1931 19.8   1980 24.0   1980 30.9   1941 33.7   1955 21.9   1976 39.2   2014 60.7    1951 42.3    1937 40.2    1933
>
> 106 38.6   1963 10.3   1932 28.7   1929 14.0   1974 22.5   1984 30.1   1988 32.7   1913  5.1   1947 31.7   1972 19.4    1946 28.8    1945   NA      NA
>
> How can I format this into:
>
> Year   month     rainfall_mm
> 1993    Jan            293.8
> 1990   Feb           278.1
> ....
>
>> dput(head(scotland_weather,6))
> structure(list(Jan = c(293.8, 292.2, 275.6, 252.3, 246.2, 245 ), 
> Year.1 = c(1993L, 1928L, 2008L, 2015L, 1974L, 1975L), Feb = c(278.1,
> 258.8, 244.7, 227.9, 224.9, 195.6), Year.2 = c(1990L, 1997L, 2002L, 
> 1989L, 2014L, 1995L), Mar = c(238.5, 233.4, 201.3, 200.2, 180.2, 180),
> Year.3 = c(1994L, 1990L, 1992L, 1967L, 1979L, 1989L ), Apr = c(191.1,
> 149, 146.8, 142.1, 133.5, 132.9), Year.4 = c(1947L, 1910L, 1934L, 
> 1949L, 1950L, 1932L), May = c(191.4, 168.7, 155.9, 149.5, 137.4, 
> 129.7), Year.5 = c(2011L, 1986L, 1925L, 2015L, 2003L, 2007L), Jun = 
> c(155, 137.9, 137.8, 137.7, 135, 131.7),
>     Year.6 = c(1938L, 2002L, 1948L, 1931L, 1966L, 2004L), Jul = c(185.6,
>     181.4, 170.1, 165.8, 162.9, 159.9), Year.7 = c(1940L, 1988L,
>     1939L, 2010L, 1956L, 1985L), Aug = c(216.5, 211.9, 202.3,
>     191.4, 190.3, 189.1), Year.8 = c(1985L, 1992L, 2009L, 1962L,
>     2014L, 2004L), Sep = c(267.6, 221.2, 193.9, 189.7, 189.7,
>     189.6), Year.9 = c(1950L, 1981L, 1982L, 2011L, 1927L, 1985L
>     ), Oct = c(258.1, 254, 248.8, 247.7, 242.3, 240.9), Year.10 = c(1935L,
>     1954L, 2014L, 1938L, 1983L, 2001L), Nov = c(262, 245.3, 244.8,
>     242.2, 231.3, 229.9), Year.11 = c(2009L, 2015L, 1938L, 2006L,
>     1917L, 1981L), Dec = c(300.7, 268.5, 267.2, 265.4, 264, 261
>     ), Year.12 = c(2013L, 1986L, 1929L, 2011L, 2006L, 1912L),
>     X1.12 = c(743.6, 649.5, 645.4, 638.3, 608.9, 592.8), Year.13 = c(2014L,
>     1995L, 2000L, 2007L, 1990L, 2015L), X1.13 = c(409.5, 401.3,
>     393.7, 393.2, 391.7, 389.1), Year.14 = c(1986L, 2015L, 1994L,
>     1967L, 1992L, 1913L), X1.14 = c(455.6, 435.6, 427.8, 422.6,
>     397, 390.1), Year.15 = c(1985L, 1948L, 2009L, 1956L, 2004L,
>     1938L), X1.15 = c(661.2, 633.8, 615.8, 594.5, 590.6, 589.2
>     ), Year.16 = c(1981L, 1954L, 1938L, 1935L, 1982L, 2006L),
>     X1.16 = structure(c(105L, 104L, 103L, 102L, 101L, 100L), .Label = c("  1091.2",
>     "  1138.2", "  1158.2", "  1166.0", "  1168.8", "  1174.1",
>     "  1189.4", "  1214.2", "  1219.3", "  1220.0", "  1222.0",
>     "  1231.5", "  1239.5", "  1250.0", "  1255.4", "  1266.1",
>     "  1269.7", "  1274.2", "  1276.0", "  1281.1", "  1283.5",
>     "  1301.7", "  1305.4", "  1306.4", "  1311.0", "  1311.1",
>     "  1314.3", "  1315.8", "  1324.6", "  1325.3", "  1337.6",
>     "  1348.6", "  1351.5", "  1355.6", "  1356.1", "  1356.7",
>     "  1357.8", "  1366.9", "  1374.7", "  1376.5", "  1377.9",
>     "  1378.5", "  1390.2", "  1397.6", "  1406.7", "  1406.9",
>     "  1407.5", "  1407.9", "  1414.0", "  1425.3", "  1426.5",
>     "  1429.6", "  1430.8", "  1431.6", "  1436.4", "  1438.0",
>     "  1438.8", "  1445.9", "  1446.6", "  1448.6", "  1455.0",
>     "  1458.6", "  1459.0", "  1460.9", "  1461.3", "  1464.4",
>     "  1465.7", "  1466.4", "  1467.3", "  1473.9", "  1478.4",
>     "  1478.6", "  1491.3", "  1493.2", "  1503.9", "  1520.3",
>     "  1530.4", "  1532.5", "  1536.3", "  1558.0", "  1561.4",
>     "  1566.8", "  1579.2", "  1582.3", "  1585.0", "  1585.5",
>     "  1592.6", "  1607.8", "  1623.8", "  1627.8", "  1631.0",
>     "  1657.1", "  1670.7", "  1672.8", "  1683.6", "  1686.1",
>     "  1690.4", "  1692.9", "  1696.7", "  1716.5", "  1720.0",
>     "  1735.8", "  1756.8", "  1828.1", "  1886.4", "NA"), class = "factor"),
>     Year.17 = structure(c(102L, 81L, 105L, 29L, 99L, 45L), .Label = c("  1910",
>     "  1911", "  1912", "  1913", "  1914", "  1915", "  1916",
>     "  1917", "  1918", "  1919", "  1920", "  1921", "  1922",
>     "  1923", "  1924", "  1925", "  1926", "  1927", "  1928",
>     "  1929", "  1930", "  1931", "  1932", "  1933", "  1934",
>     "  1935", "  1936", "  1937", "  1938", "  1939", "  1940",
>     "  1941", "  1942", "  1943", "  1944", "  1945", "  1946",
>     "  1947", "  1948", "  1949", "  1950", "  1951", "  1952",
>     "  1953", "  1954", "  1955", "  1956", "  1957", "  1958",
>     "  1959", "  1960", "  1961", "  1962", "  1963", "  1964",
>     "  1965", "  1966", "  1967", "  1968", "  1969", "  1970",
>     "  1971", "  1972", "  1973", "  1974", "  1975", "  1976",
>     "  1977", "  1978", "  1979", "  1980", "  1981", "  1982",
>     "  1983", "  1984", "  1985", "  1986", "  1987", "  1988",
>     "  1989", "  1990", "  1991", "  1992", "  1993", "  1994",
>     "  1995", "  1996", "  1997", "  1998", "  1999", "  2000",
>     "  2001", "  2002", "  2003", "  2004", "  2005", "  2006",
>     "  2007", "  2008", "  2009", "  2010", "  2011", "  2012",
>     "  2013", "  2014", "NA"), class = "factor")), .Names = c("Jan",
> "Year.1", "Feb", "Year.2", "Mar", "Year.3", "Apr", "Year.4", "May", 
> "Year.5", "Jun", "Year.6", "Jul", "Year.7", "Aug", "Year.8", "Sep", 
> "Year.9", "Oct", "Year.10", "Nov", "Year.11", "Dec", "Year.12", 
> "X1.12", "Year.13", "X1.13", "Year.14", "X1.14", "Year.15", "X1.15", 
> "Year.16", "X1.16", "Year.17"), row.names = c(NA, 6L), class = 
> "data.frame")
>
> Is there an easier way to do it with tidyr?
>
> Regards,
>
> Fahad Usman
> Network Engineering | CIO | Openreach
> Web: www.openreach.co.uk<http://www.openreach.co.uk/>
> Openreach is delivering fibre broadband services to communities across the UK as well as installing and maintaining the communications infrastructure that links homes, businesses, public and voluntary sector organisations to their Communications Providers' networks.
> Think before you print! Consider the environment before printing this e-mail.
>
> This email contains BT information, which may be privileged or confidential.
> It's meant only for the individual(s) or entity named above. If you're 
> not the intended recipient, note that disclosing, copying, 
> distributing or using this information is prohibited. If you've 
> received this email in error, please let me know immediately on the email address above. Thank you.
> We monitor our email system, and may record your emails.
> British Telecommunications plc
> Registered office: 81 Newgate Street London EC1A 7AJ Registered in 
> England no: 1800000
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list