[R] Dataverse (reading files with .tab and .7z suffixes)
Thomas Levine
Sun May 13 14:04:44 CEST 2018
Ilio Fornasero writes:
> Yet, I am at this point.
>
> ## 01. Finding the dataverse server and making a search
> Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
> dataverse_search(".Hunger")
>
>
> ## 02. Loading the dataset (in this example, I have chosen the word
> # ".Hunger" to get one list and then picked up one out of hundreds results.
> # The get-dataset() function has to be picked on the dynamic web address)
> (dataset_ifpri <- get_dataset("https://doi.org/10.7910/DVN/ZTCWYQ"))
>
> ## 03. Grabbing the (1st) file we are interested on
> AppendixC <- get_file("001_AppendixC.tab",
> "https://doi.org/10.7910/DVN/ZTCWYQ")
> writeBin(AppendixC, "001_AppendixC.tab")
>
> read.table("001_AppendixC.tab")
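I imagine you are using the dataverse package.
The 7z case is more straightforward because the file format is clear.
For 001_AppendixC.tab, you first need to figure out the file format.
At first glance it looks to me like a spreadsheet: it is a zip archive
containing xl/workbook.xml, which is how an Excel .xlsx workbook is laid out.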
$ file /tmp/001_AppendixC.tab
/tmp/001_AppendixC.tab: Zip archive data, at least v2.0 to extract
$ cd /tmp && unzip 001_AppendixC.tab
$ head -n2 /tmp/xl/workbook.xml | cut -c 1-75
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
Once you have figured out the format manually, write an R function that
detects the format, then ask again here for an R function that reads
that format.
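
A detection function along those lines might look roughly like the sketch
below. It is only an illustration, not part of the dataverse package: the
name guess_format is made up here, and it recognizes just the two containers
that came up in this thread, by comparing the first bytes of the file with
the usual ZIP and 7z signatures.

```r
# Hypothetical helper (not from any package): guess a file's container
# format from its leading "magic" bytes, ignoring the file extension.
guess_format <- function(path) {
  magic <- readBin(path, what = "raw", n = 6)

  # TRUE if the file starts with the given byte signature
  starts_with <- function(sig) {
    length(magic) >= length(sig) &&
      identical(magic[seq_along(sig)], as.raw(sig))
  }

  zip_sig    <- c(0x50, 0x4b, 0x03, 0x04)             # "PK\003\004"
  sevenz_sig <- c(0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c) # "7z" + 4 bytes

  if (starts_with(zip_sig)) {
    # An .xlsx workbook is a zip archive that contains xl/workbook.xml
    members <- tryCatch(
      utils::unzip(path, list = TRUE)$Name,
      error = function(e) character(0)
    )
    if ("xl/workbook.xml" %in% members) "xlsx" else "zip"
  } else if (starts_with(sevenz_sig)) {
    "7z"
  } else {
    "unknown"
  }
}
```

Calling guess_format("001_AppendixC.tab") on the file examined above should
then report a spreadsheet rather than a tab-delimited table. If it does, one
option (an assumption on my part, not something tested against this dataset)
is to try readxl::read_xlsx() on it instead of read.table().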