dataone Package Overview

2022-06-10

dataone R Package Overview

The dataone R package enables R scripts to search, download and upload science data and metadata to the DataONE Federation. This package calls DataONE web services that allow client programs to interact with Member Nodes (MN) and DataONE Coordinating Nodes (CN).

Quick Start

See the full manual (help(package = "dataone")) for documentation.

To search the Knowledge Network for Biocomplexity (KNB), a DataONE Federation Member Node, for a dataset:

library(dataone)
cn <- CNode("PROD")
mn <- getMNode(cn, "urn:node:KNB")
mySearchTerms <- list(q="abstract:salmon+AND+keywords:acoustics+AND+keywords:\"Oncorhynchus nerka\"",
                      fl="id,title,dateUploaded,abstract,size",
                      fq="dateUploaded:[2013-01-01T00:00:00.000Z TO 2014-01-01T00:00:00.000Z]",
                      sort="dateUploaded+desc")
result <- query(mn, solrQuery=mySearchTerms, as="data.frame")
result[1,c("id", "title")]
pid <- result[1,'id']
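
The fq entry in the search terms above is a Solr range filter over timestamps written in UTC. As a minimal sketch, such a filter could be assembled from R dates rather than typed by hand. The function solrDateRange is a hypothetical helper written for this illustration and is not part of the dataone package; it assumes both dates are meant as midnight UTC.

```r
# Hypothetical helper (not part of the dataone package): build a Solr range
# filter for a date field from two dates, interpreted as midnight UTC.
solrDateRange <- function(field, from, to) {
  fmt <- function(d) {
    format(as.POSIXct(d, tz = "UTC"), "%Y-%m-%dT%H:%M:%S.000Z", tz = "UTC")
  }
  paste0(field, ":[", fmt(from), " TO ", fmt(to), "]")
}

# Produces the same filter string used in mySearchTerms above.
fq <- solrDateRange("dateUploaded", "2013-01-01", "2014-01-01")
print(fq)
```

The resulting string can be supplied as the fq element of the search-term list.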

The metadata file located by the search above can be downloaded with the commands:

library(XML)
metadata <- rawToChar(getObject(mn, pid))
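
To open the metadata in an XML viewer or text editor, the character string can first be saved to a file. The following is a minimal sketch using base R only. The variable exampleMetadata is a made-up stand-in so that the sketch runs without a network connection, and the temporary file name is an assumption; in practice the metadata value from the commands above would be written instead.

```r
# Stand-in for the string returned by rawToChar(getObject(mn, pid));
# the XML content here is invented for illustration only.
exampleMetadata <- "<?xml version=\"1.0\"?>\n<dataset><title>Example title</title></dataset>"

# Write the XML text to a temporary file.
metadataFile <- tempfile(fileext = ".xml")
writeLines(exampleMetadata, metadataFile)

# Read the file back to confirm it holds the same text.
readBack <- paste(readLines(metadataFile), collapse = "\n")
print(identical(readBack, exampleMetadata))
```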

Once written to a file on disk, the metadata that describes the located research can be viewed in an XML viewer or text editor. The metadata lists the identifier of a data file (CSV), which can be downloaded with the commands:

dataRaw <- getObject(mn, "df35d.443.1")
dataChar <- rawToChar(dataRaw)
theData <- textConnection(dataChar)
df <- read.csv(theData, stringsAsFactors=FALSE)
df[1,]
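
The commands above convert the downloaded bytes to text before parsing them as CSV. The same steps can be tried offline, as sketched below. The raw vector exampleRaw is a stand-in for the value returned by getObject, and its column names and values are invented for illustration. The sketch also uses the text argument of read.csv, a base R alternative to opening a textConnection.

```r
# Stand-in for the raw vector returned by getObject(); contents are made up.
exampleRaw <- charToRaw("site,count\nA,3\nB,5\n")

# Convert the bytes to a single character string.
exampleChar <- rawToChar(exampleRaw)

# Parse the string as CSV without creating an explicit connection.
exampleDf <- read.csv(text = exampleChar, stringsAsFactors = FALSE)
str(exampleDf)
```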

Uploading a CSV file to a DataONE Member Node requires user authentication. DataONE user authentication is described in the vignette DataONE-Federation.

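Once the authentication steps have been completed, a file can be uploaded with the commands below. This example uses the DataONE staging environment and a staging Member Node: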

library(datapack)
library(uuid)
d1c <- D1Client("STAGING", "urn:node:mnStageUCSB2")
id <- paste0("urn:uuid:", UUIDgenerate())
testdf <- data.frame(x=1:10,y=11:20)
csvfile <- tempfile(fileext=".csv")
write.csv(testdf, csvfile, row.names=FALSE)
# Build a DataObject containing the csv, and upload it to the Member Node
d1Object <- new("DataObject", id, format="text/csv", filename=csvfile)
uploadDataObject(d1c, d1Object, public=TRUE)
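
Before uploading, it may be worth confirming that the CSV file on disk reads back as the data frame it was written from. The sketch below is an optional check using base R only. It does not contact a Member Node, and the variable names readBack and roundTripOk are introduced here for illustration.

```r
# Write the same test data frame used in the upload example to a temp file.
testdf <- data.frame(x = 1:10, y = 11:20)
csvfile <- tempfile(fileext = ".csv")
write.csv(testdf, csvfile, row.names = FALSE)

# Read the file back and compare it with the original data frame.
readBack <- read.csv(csvfile)
roundTripOk <- isTRUE(all.equal(testdf, readBack))
print(roundTripOk)
```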

Additional Resources

The dataone R package vignettes can be viewed using the R vignette command, for example vignette("dataone-overview").

The dataone vignettes describe these topics:

Acknowledgements

Work on this package was supported by:

Additional support was provided for working group collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.