[R] COVID-19 datasets...

Thomas Petzoldt thpe using simecol.de
Thu May 7 12:46:34 CEST 2020


On 07.05.2020 at 11:19 Deepayan Sarkar wrote:
> On Thu, May 7, 2020 at 12:58 AM Thomas Petzoldt <thpe using simecol.de> wrote:
>>
>> Sorry if I'm joining a little bit late.
>>
>> I've put some related links and scripts together a few weeks ago. Then I
>> stopped with this, because there is so much.
>>
>> The data format employed by Johns Hopkins CSSE was sort of a big surprise
>> to me.
> 
> Why? I find it quite convenient to drop the first few columns and
> extract the data as a matrix (using data.matrix()).
> 
> -Deepayan

Many thanks for the hint to use data.matrix().
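
If I read the hint correctly, it amounts to something like the following 
sketch. The small table "wide" is a made-up stand-in for the CSSE file 
(same layout, invented numbers), so that the lines run without a download. 
Using rowsum() to add up the rows of one country is my own addition and 
not part of the hint.

```r
## toy stand-in for the CSSE wide table (values are made up)
wide <- data.frame(
  Province_State = c(NA, "Hubei", "Beijing"),
  Country_Region = c("Germany", "China", "China"),
  Lat  = c(51.2, 31.0, 40.2),
  Long = c(10.4, 112.3, 116.4),
  `1/22/20` = c(0, 444, 14),
  `1/23/20` = c(0, 444, 22),
  `1/24/20` = c(0, 549, 36),
  check.names = FALSE
)

## drop the four metadata columns, keep the counts as a numeric matrix
m <- data.matrix(wide[, -(1:4)])
rownames(m) <- make.unique(wide$Country_Region)

## add up rows that belong to the same country
by_country <- rowsum(m, group = wide$Country_Region)

## the column names can be parsed into proper dates
dates <- as.Date(colnames(m), format = "%m/%d/%y")
```

With the matrix in hand, something like matplot(dates, t(by_country), 
type = "l") would plot one line per country.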

My point was not that it is difficult, especially as R has all the 
tools for data wrangling.

My surprise was that "wide tables" and non-ISO dates as column names are 
not the "database way" that we generally teach our students.

With reshape2::melt(), tidyr::gather() or its successor 
tidyr::pivot_longer(), conversion is quite easy, regardless of whether 
one wants to use the tidyverse or not; see the example below.

Again, thanks, Thomas


library("dplyr")
library("readr")
library("tidyr")

file <- 
"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"

dat <- read_delim(file, delim = ",")
## replace "Province/State" and "Country/Region" by syntactic names
names(dat)[1:2] <- c("Province_State", "Country_Region")
dat2 <-
   dat %>%
   ## summarize Country/Region duplicates; the metadata columns are
   ## dropped by name, because positions may not count the grouping column
   group_by(Country_Region) %>%
   summarise(across(-c(Province_State, Lat, Long), sum), .groups = "drop") %>%
   ## make it a long table
   pivot_longer(cols = -Country_Region, names_to = "time") %>%
   ## convert the m/d/yy column names to ISO 8601 dates
   mutate(time = as.Date(time, format = "%m/%d/%y"))
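
For those who prefer to stay without the tidyverse, the same reshaping 
can be sketched in base R. Again "wide" is a made-up toy table that 
mimics the already summarized data (one row per country, invented 
numbers), and the column names "time" and "value" are simply chosen to 
match the output of pivot_longer() above.

```r
## toy table: one row per country, one column per day (values are made up)
wide <- data.frame(
  Country_Region = c("Germany", "China"),
  `1/22/20` = c(0, 458),
  `1/23/20` = c(0, 466),
  check.names = FALSE
)

date_cols <- setdiff(names(wide), "Country_Region")

## stack the date columns on top of each other, column by column
long <- data.frame(
  Country_Region = rep(wide$Country_Region, times = length(date_cols)),
  time  = rep(as.Date(date_cols, format = "%m/%d/%y"), each = nrow(wide)),
  value = unlist(wide[date_cols], use.names = FALSE)
)

## sort like a time series per country
long <- long[order(long$Country_Region, long$time), ]
```

stats::reshape() or reshape2::melt() would do the same job; the explicit 
version only shows that nothing special is needed.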



> 
>> The opposite approach was taken in Germany, where the data were
>> organized as big JSON trees.
>>
>> Fortunately, both can be "tidied" with R, and represent good didactic
>> examples for our students.
>>
>> Here is yet another repo linking to the data:
>>
>> https://github.com/tpetzoldt/covid
>>
>>
>> Thomas
>>
>>
>> On 04.05.2020 at 20:48 James Spottiswoode wrote:
>>> Sure. COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University is available here:
>>>
>>> https://github.com/CSSEGISandData/COVID-19
>>>
>>> All in CSV format.
>>>
>>>
>>>> On May 4, 2020, at 11:31 AM, Bernard McGarvey <mcgarvey.bernard using comcast.net> wrote:
>>>>
>>>> Just curious: does anyone know of a website that has data available in a format that R can download and analyze?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Bernard McGarvey
>>>>
>>>>
>>>> Director, Fort Myers Beach Lions Foundation, Inc.
>>>>
>>>>
>>>> Retired (Lilly Engineering Fellow).
>>>>
>>>>
>>>
>>> James Spottiswoode
>>> Applied Mathematics & Statistics
>>> (310) 270 6220
>>> jamesspottiswoode Skype
>>> james using jsasoc.com
>>>

-- 
Dr. Thomas Petzoldt
senior scientist

Technische Universitaet Dresden
Faculty of Environmental Sciences
Institute of Hydrobiology
01062 Dresden, Germany

https://tu-dresden.de/Members/thomas.petzoldt
