[R] COVID-19 datasets...

Deepayan Sarkar deep@y@n@@@rk@r @end|ng |rom gm@||@com
Thu May 7 13:12:50 CEST 2020


On Thu, May 7, 2020 at 4:16 PM Thomas Petzoldt <thpe using simecol.de> wrote:
>
> On 07.05.2020 at 11:19 Deepayan Sarkar wrote:
> > On Thu, May 7, 2020 at 12:58 AM Thomas Petzoldt <thpe using simecol.de> wrote:
> >>
> >> Sorry if I'm joining a little bit late.
> >>
> >> I've put some related links and scripts together a few weeks ago. Then I
> >> stopped with this, because there is so much.
> >>
> >> The data format employed by John Hopkins CSSE was sort of a big surprise
> >> to me.
> >
> > Why? I find it quite convenient to drop the first few columns and
> > extract the data as a matrix (using data.matrix()).
> >
> > -Deepayan
>
> Many thanks for the hint to use data.matrix
>
> My aim was not to say that it is difficult, especially as R has all the
> tools for data mangling.
>
> My surprise was that "wide tables" and non-ISO dates as column names are
> not the "data base way" that we in general teach to our students

Well, I am all for long format data when it makes sense, but I would
disagree that that is always the "right approach". In the case of
regular multiple time series, as in this context, a matrix-like
structure seems much more natural (and nicely handled by ts() in R),
and I wouldn't even bother reshaping the data in the first place.

See, for example,

https://github.com/deepayan/deepayan.github.io/blob/master/covid-19/deaths.rmd

and

https://deepayan.github.io/covid-19/deaths.html

-Deepayan

> With reshape2::melt or tidyr::gather resp. pivot_longer, conversion is
> quite easy, regardless if one wants to use tidyverse or not, see example
> below.
>
> Again, thanks, Thomas
>
>
> library("dplyr")
> library("readr")
> library("tidyr")
>
> file <-
> "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
>
> dat <- read_delim(file, delim=",")
> names(dat)[1:2] <- c("Province_State", "Country_Region")
> dat2 <-
>    dat %>%
>    ## summarize Country/Region duplicates
>    group_by(Country_Region) %>% summarise_at(vars(-(1:4)), sum) %>%
>    ## make it a long table
>    pivot_longer(cols = -Country_Region, names_to = "time") %>%
>    ## convert to ISO 8601 date
>    mutate(time = as.POSIXct(time, format="%m/%e/%y"))
>
>
>
> >
> >> An opposite approach was taken in Germany, that organized it as a
> >> big JSON trees.
> >>
> >> Fortunately, both can be "tidied" with R, and represent good didactic
> >> examples for our students.
> >>
> >> Here yet another repo linking to the data:
> >>
> >> https://github.com/tpetzoldt/covid
> >>
> >>
> >> Thomas
> >>
> >>
> >> On 04.05.2020 at 20:48 James Spottiswoode wrote:
> >>> Sure. COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University is available here:
> >>>
> >>> https://github.com/CSSEGISandData/COVID-19
> >>>
> >>> All in csv fiormat.
> >>>
> >>>
> >>>> On May 4, 2020, at 11:31 AM, Bernard McGarvey <mcgarvey.bernard using comcast.net> wrote:
> >>>>
> >>>> Just curious does anyone know of a website that has data available in a format that R can download and analyze?
> >>>>
> >>>> Thanks
> >>>>
> >>>>
> >>>> Bernard McGarvey
> >>>>
> >>>>
> >>>> Director, Fort Myers Beach Lions Foundation, Inc.
> >>>>
> >>>>
> >>>> Retired (Lilly Engineering Fellow).
> >>>>
> >>>> ______________________________________________
> >>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>
> >>> James Spottiswoode
> >>> Applied Mathematics & Statistics
> >>> (310) 270 6220
> >>> jamesspottiswoode Skype
> >>> james using jsasoc.com
> >>>
>
> --
> Dr. Thomas Petzoldt
> senior scientist
>
> Technische Universitaet Dresden
> Faculty of Environmental Sciences
> Institute of Hydrobiology
> 01062 Dresden, Germany
>
> https://tu-dresden.de/Members/thomas.petzoldt



More information about the R-help mailing list