[R] Dataverse (reading files with .tab and .7z suffixes)

Sun May 13 19:05:15 CEST 2018

> On May 13, 2018, at 5:04 AM, Thomas Levine <_ using thomaslevine.com> wrote:
> 
> Ilio Fornasero writes:
>> Yet, I am at this point.
>> 
>> ## 01. Finding the dataverse server and making a search
>> Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
>> dataverse_search(".Hunger")
>> 
>> 
>> ## 02. Loading the dataset (in this example, I have chosen the word ".Hunger" to get
>>   # one list and then picked up one out of hundreds results.
>>   # The get-dataset() function has to be picked on the dynamic web address)
>> (dataset_ifpri <- get_dataset("https://doi.org/10.7910/DVN/ZTCWYQ"))
>> 
>> ## 03. Grabbing the (1st) file we are interested on
>> AppendixC <- get_file("001_AppendixC.tab",
>>                      "https://doi.org/10.7910/DVN/ZTCWYQ")
>> writeBin(AppendixC, "001_AppendixC.tab")
>> 
>> read.table("001_AppendixC.tab")
> 
> I imagine you are using the dataverse package.
> 
> 7z is more straightforward because the file format is clear.
> 
> You need to figure out the 001_AppendixC.tab file format.
> On first glance it looks to me like a spreadsheet.

That website says it's tab-delimited. The read.delim function (in base R) is designed for that case. However, the download pull-down menu that appears seems to offer delivery in a variety of formats:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Untitled.pdf
Type: application/pdf
Size: 21204 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20180513/6ed61785/attachment-0002.pdf>

-------------- next part --------------
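
If the file really is plain tab-delimited text, reading it could look roughly like the sketch below. The temporary file and its three lines are a made-up stand-in (the values are borrowed from the str() output further down); the real download is not used here, and it may turn out not to be plain text at all.

```r
# Hypothetical stand-in for a tab-delimited download (not the real file).
tmp <- tempfile(fileext = ".tab")
writeLines(c("Country\tUN9193",
             "Afghanistan\t37.4",
             "Albania\t7.7"), tmp)

# read.delim() is read.table() with sep = "\t" and header = TRUE as defaults.
dat <- read.delim(tmp, stringsAsFactors = FALSE)
str(dat)
```

By contrast, read.table() with its defaults splits on any whitespace and assumes no header row, which is a common source of trouble with files like this.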

When I choose the Rdata option I get:

 fil <- load("/Users/davidwinsemius/001_AppendixC.RData")
 fil
#[1] "x"

str(x)
#-------------------
'data.frame':	132 obs. of  17 variables:
 $ Country :Class 'AsIs'  atomic [1:132] Afghanistan Albania Algeria Angola ...
  .. ..- attr(*, "comment")= chr "Country"
 $ UN9193  :Class 'AsIs'  atomic [1:132] 37.4 7.7 9.1 65.400000000000006 ...
  .. ..- attr(*, "comment")= chr "UN9193"
 $ UN9901  :Class 'AsIs'  atomic [1:132] 46.1 7.2 10.7 50 ...
------ snipped --------
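
The 'AsIs' columns can be awkward to work with. Here is a minimal sketch of turning them into ordinary vectors, assuming the values are stored as character strings (the long decimal 65.400000000000006 suggests that, but I have not checked every column). The small data frame is a hypothetical stand-in built from the values shown above, not the real object:

```r
# Hypothetical miniature of the loaded data frame; I() gives the 'AsIs' class.
x_small <- data.frame(
  Country = I(c("Afghanistan", "Albania", "Algeria", "Angola")),
  UN9193  = I(c("37.4", "7.7", "9.1", "65.400000000000006"))
)

clean <- x_small
# unclass() drops the 'AsIs' wrapper; then convert to the intended type.
clean$Country <- as.character(unclass(x_small$Country))
clean$UN9193  <- as.numeric(unclass(x_small$UN9193))
str(clean)
```

With the real object, the numeric conversion would need to be applied to each of the measurement columns, and any non-numeric entries would become NA with a warning.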

-- 
David.

> 
>  $ file /tmp/001_AppendixC.tab
>  /tmp/001_AppendixC.tab: Zip archive data, at least v2.0 to extract
>  $ cd /tmp && unzip 001_AppendixC.tab
>  $ head -n2 /tmp/xl/workbook.xml | cut -c 1-75
>  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>  <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
> 
> Once you figure out the format manually, write an R function that
> figures out the format, and ask again here to find an R function that
> reads the format.
> 
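
A rough sketch of such a format-guessing function, checking only the leading "magic" bytes of the file. The name sniff_format() and the two temporary test files are made up for illustration; only the zip and 7z signatures are checked, and anything else is reported as "unknown":

```r
# Hypothetical helper: guess a container format from the first bytes of a file.
sniff_format <- function(path) {
  magic <- readBin(path, what = "raw", n = 6L)
  zip_sig      <- as.raw(c(0x50, 0x4b, 0x03, 0x04))              # "PK\003\004"
  sevenzip_sig <- as.raw(c(0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c))  # "7z" + 4 bytes
  if (length(magic) >= 4L && identical(magic[1:4], zip_sig)) {
    "zip"      # also the container used by .xlsx workbooks
  } else if (length(magic) >= 6L && identical(magic, sevenzip_sig)) {
    "7z"
  } else {
    "unknown"  # possibly plain text, e.g. tab-delimited
  }
}

# Made-up test files: one starting with the zip signature, one plain text.
zip_like <- tempfile()
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00)), zip_like)
txt_like <- tempfile()
writeLines("Country\tUN9193", txt_like)

sniff_format(zip_like)
sniff_format(txt_like)
```

If the result is "zip", unzip(path, list = TRUE) lists the archive members, and an xl/workbook.xml entry would point to an Excel workbook, as in the shell session quoted above.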

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law