[R] Dataverse (reading files with .tab and .7z suffixes)
David Winsemius
dw|n@em|u@ @end|ng |rom comc@@t@net
Sun May 13 19:05:15 CEST 2018
> On May 13, 2018, at 5:04 AM, Thomas Levine <_ using thomaslevine.com> wrote:
>
> Ilio Fornasero writes:
>> So far, I have got to this point:
>>
>> ## 01. Finding the dataverse server and making a search
>> Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
>> dataverse_search(".Hunger")
>>
>>
>> ## 02. Loading the dataset (in this example, I searched for ".Hunger" to get
>> # one list of results and then picked one out of the hundreds returned).
>> # The get_dataset() function has to be given the dataset's web address (here the DOI URL).
>> (dataset_ifpri <- get_dataset("https://doi.org/10.7910/DVN/ZTCWYQ"))
>>
>> ## 03. Grabbing the (1st) file we are interested in
>> AppendixC <- get_file("001_AppendixC.tab",
>> "https://doi.org/10.7910/DVN/ZTCWYQ")
>> writeBin(AppendixC, "001_AppendixC.tab")
>>
>> read.table("001_AppendixC.tab")
>
> I imagine you are using the dataverse package.
>
> 7z is more straightforward because the file format is clear.
>
> You need to figure out the 001_AppendixC.tab file format.
> At first glance it looks to me like a spreadsheet.
That website says it's tab-delimited, and the read.delim() function in base R is designed for that case. However, the download pull-down menu that appears seems to offer the option of delivering the file in a variety of formats:
[Attachment scrubbed by the list archive: Untitled.pdf (application/pdf, 21204 bytes), <https://stat.ethz.ch/pipermail/r-help/attachments/20180513/6ed61785/attachment-0002.pdf>]
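For a file that really is tab-delimited, as the Dataverse page describes the .tab download, base R's read.delim() reads it directly: it is essentially read.table() with sep = "\t" and header = TRUE. A minimal sketch, assuming a small inline stand-in rather than the real file (the Country, UN9193 and UN9901 values for the first two countries are taken from the str() output in this message). Note that the actual 001_AppendixC.tab turned out to be a zip archive, so this would not work on it.

```r
# Inline stand-in for a genuinely tab-delimited .tab file (NOT the real
# download, which is a zip archive).
tsv <- "Country\tUN9193\tUN9901\nAfghanistan\t37.4\t46.1\nAlbania\t7.7\t7.2\n"

# read.delim() defaults: sep = "\t", header = TRUE, fill = TRUE.
appendix <- read.delim(text = tsv)
str(appendix)
```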
When I choose the RData option I get:
fil <- load("/Users/davidwinsemius/001_AppendixC.RData")
fil
#[1] "x"
str(x)
#-------------------
#'data.frame': 132 obs. of 17 variables:
# $ Country :Class 'AsIs' atomic [1:132] Afghanistan Albania Algeria Angola ...
# .. ..- attr(*, "comment")= chr "Country"
# $ UN9193 :Class 'AsIs' atomic [1:132] 37.4 7.7 9.1 65.400000000000006 ...
# .. ..- attr(*, "comment")= chr "UN9193"
# $ UN9901 :Class 'AsIs' atomic [1:132] 46.1 7.2 10.7 50 ...
------ snipped --------
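Each column in that export is wrapped in I() (class 'AsIs') and carries a "comment" attribute. If those get in the way, one option is to load the file into its own environment (load() returns the names of the objects it created, "x" above) and strip them. A sketch under stated assumptions: clean_dataverse_df() is a hypothetical helper, the toy data frame only mimics the first two rows shown by str(), and because str() does not make clear whether the real columns are character or numeric, the numbers are stored as text here so the type.convert() step has something to do.

```r
# Hypothetical helper: drop the per-column "comment" attribute and the
# 'AsIs' class, then let type.convert() simplify character columns
# (e.g. to numeric) where it can.
clean_dataverse_df <- function(df) {
  df[] <- lapply(df, function(col) {
    attr(col, "comment") <- NULL
    cls <- setdiff(oldClass(col), "AsIs")
    oldClass(col) <- if (length(cls) > 0) cls else NULL
    if (is.character(col)) col <- utils::type.convert(col, as.is = TRUE)
    col
  })
  df
}

# Toy stand-in for 001_AppendixC.RData (assumption: text-valued columns).
x <- data.frame(Country = I(c("Afghanistan", "Albania")),
                UN9193  = I(c("37.4", "7.7")))
comment(x$Country) <- "Country"
comment(x$UN9193)  <- "UN9193"
rdata <- tempfile(fileext = ".RData")
save(x, file = rdata)

# Load into a fresh environment so nothing in the workspace is overwritten.
env <- new.env()
loaded <- load(rdata, envir = env)
appendix_c <- clean_dataverse_df(get(loaded[1], envir = env))
str(appendix_c)
```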
--
David.
>
> $ file /tmp/001_AppendixC.tab
> /tmp/001_AppendixC.tab: Zip archive data, at least v2.0 to extract
> $ cd /tmp && unzip 001_AppendixC.tab
> $ head -n2 /tmp/xl/workbook.xml | cut -c 1-75
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
>
> Once you figure out the format manually, write an R function that
> figures out the format, and ask again here to find an R function that
> reads the format.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
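Following the suggestion above to write an R function that figures out the format, one approach is to look at a file's leading "magic" bytes instead of trusting its suffix, since the .tab file here is really a zip archive. A minimal sketch: sniff_format() and is_xlsx() are hypothetical names, and only the zip, 7z and gzip signatures are checked; anything else is reported as "unknown" (which may or may not be plain tab-delimited text).

```r
# Hypothetical helper: guess a file's format from its first bytes.
sniff_format <- function(path) {
  bytes <- readBin(path, what = "raw", n = 6L)
  starts_with <- function(sig) {
    length(bytes) >= length(sig) && all(bytes[seq_along(sig)] == sig)
  }
  if (starts_with(as.raw(c(0x50, 0x4b, 0x03, 0x04)))) return("zip")  # "PK\3\4"
  if (starts_with(as.raw(c(0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c)))) return("7z")
  if (starts_with(as.raw(c(0x1f, 0x8b)))) return("gzip")
  "unknown"  # not checked further; could be plain text
}

# An .xlsx workbook is a zip archive containing xl/workbook.xml, which is
# what the unzip listing above turned up. The zip listing is only
# attempted when the magic bytes say "zip".
is_xlsx <- function(path) {
  sniff_format(path) == "zip" &&
    "xl/workbook.xml" %in% utils::unzip(path, list = TRUE)$Name
}

# Usage on the file written by writeBin() in the original question:
# sniff_format("001_AppendixC.tab")  # "zip", going by the `file` output
# is_xlsx("001_AppendixC.tab")
```

If is_xlsx() returns TRUE, a spreadsheet reader such as readxl::read_xlsx() is one option for the actual reading.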
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law