[R] Dataverse (reading files with .tab and .7z suffixes)

Sun May 13 14:04:44 CEST 2018

Ilio Fornasero writes:
> Yet, I am at this point.
>
>
>
>
> ## 01. Finding the dataverse server and making a search
> Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
> dataverse_search(".Hunger")
>
>
> ## 02. Loading the dataset (in this example, I have chosen the word ".Hunger" to get
>    # one list and then picked up one out of hundreds results.
>    # The get-dataset() function has to be picked on the dynamic web address)
> (dataset_ifpri <- get_dataset("https://doi.org/10.7910/DVN/ZTCWYQ"))
>
> ## 03. Grabbing the (1st) file we are interested on
> AppendixC <- get_file("001_AppendixC.tab",
>                       "https://doi.org/10.7910/DVN/ZTCWYQ")
> writeBin(AppendixC, "001_AppendixC.tab")
>
> read.table("001_AppendixC.tab")

I imagine you are using the dataverse package.

7z is more straightforward because the file format is clear.

You need to figure out the format of the 001_AppendixC.tab file.
At first glance it looks to me like an Excel (.xlsx) spreadsheet rather
than tab-delimited text, despite the .tab suffix:

  $ file /tmp/001_AppendixC.tab
  /tmp/001_AppendixC.tab: Zip archive data, at least v2.0 to extract
  $ cd /tmp && unzip 001_AppendixC.tab
  $ head -n2 /tmp/xl/workbook.xml | cut -c 1-75
  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"

Once you have figured out the format manually, write an R function that
detects it, and ask here again if you need help finding an R function
that reads that format.
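
Here is a minimal sketch of what such a detection function might look
like, using only base R. It inspects the leading "magic" bytes of the
file instead of trusting the suffix. The name detect_format() and its
return values are my own invention for illustration, and it only knows
about the two formats mentioned in this thread.

```r
# Hypothetical helper: guess a file's real format from its leading
# "magic" bytes, ignoring whatever the file suffix claims.
detect_format <- function(path) {
  magic <- readBin(path, what = "raw", n = 6L)
  starts_with <- function(sig) {
    length(magic) >= length(sig) && all(magic[seq_along(sig)] == sig)
  }
  if (starts_with(as.raw(c(0x50, 0x4b, 0x03, 0x04)))) {
    # Zip container ("PK\003\004"); an .xlsx workbook is a Zip archive
    # that contains xl/workbook.xml
    entries <- tryCatch(utils::unzip(path, list = TRUE)$Name,
                        error = function(e) character(0))
    if ("xl/workbook.xml" %in% entries) "xlsx" else "zip"
  } else if (starts_with(as.raw(c(0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c)))) {
    # 7z signature: "7z" followed by BC AF 27 1C
    "7z"
  } else {
    "unknown"
  }
}

# Usage, assuming the file from step 03 is in the working directory
# and the readxl package is installed (both are assumptions).
path <- "001_AppendixC.tab"
if (file.exists(path) && detect_format(path) == "xlsx" &&
    requireNamespace("readxl", quietly = TRUE)) {
  appendix_c <- readxl::read_xlsx(path)
  str(appendix_c)
}
```

Calling read_xlsx() directly, rather than read_excel(), avoids any
guessing based on the misleading .tab suffix.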