[R] Download data from NASA for multiple locations - RCurl

Miluji Sb milujisb at gmail.com
Mon Oct 16 22:43:16 CEST 2017


I have done the following using readLines

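directory <- "~/"
# Only pick up the downloaded .asc files, not everything in the directory
files <- list.files(directory, pattern = "\\.asc$")
data_frames <- vector("list", length(files))
for (i in seq_along(files)) {
  df <- readLines(file.path(directory, files[i]))
  df <- df[-(1:13)]  # drop the 13 metadata/header lines
  # Each remaining line looks like "1970-01-01T00:00:00\t267.769"
  df <- data.frame(year = substr(df, 1, 4),
                   month = substr(df, 6, 7),
                   day = substr(df, 9, 10),
                   hour = substr(df, 12, 13),
                   temp = substr(df, 21, 27))
  data_frames[[i]] <- df
}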
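
As an alternative sketch, the fixed-position substr() calls could be replaced by read.delim() and its skip argument, which the quoted reply below mentions. This assumes every file starts with the same 13 header lines as the sample output further down and that the data rows are tab-separated. parse_rods_text() is a hypothetical helper name, and the sample lines are copied from that output.

```r
# Sample lines copied from the dput(head(x, 15)) output quoted in this thread
x <- c("Metadata for Requested Time Series:", "",
       "prod_name=GLDAS_NOAH025_3H_v2.0",
       "param_short_name=Tair_f_inst",
       "param_name=Near surface air temperature",
       "unit=K", "begin_time=1970-01-01T00", "end_time=1979-12-31T21",
       "lat= 42.36", "lon=-71.06", "Request_time=2017-10-15 22:20:03 GMT",
       "", "Date&Time               Data",
       "1970-01-01T00:00:00\t267.769",
       "1970-01-01T03:00:00\t264.595")

# parse_rods_text() is a hypothetical helper, not part of any package.
# It assumes 13 header lines followed by "timestamp<TAB>value" rows.
parse_rods_text <- function(lines) {
  dat <- read.delim(text = lines, skip = 13, header = FALSE,
                    col.names = c("datetime", "temp"),
                    stringsAsFactors = FALSE)
  # Timestamps look like 1970-01-01T00:00:00; treating them as UTC is an
  # assumption made for this sketch
  when <- as.POSIXct(dat$datetime, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
  data.frame(year  = format(when, "%Y"),
             month = format(when, "%m"),
             day   = format(when, "%d"),
             hour  = format(when, "%H"),
             temp  = dat$temp,          # already numeric
             stringsAsFactors = FALSE)
}

out <- parse_rods_text(x)
```

For files on disk, the same call would take the file path instead of text = lines. Reading temp as a number rather than cutting out characters 21 to 27 also avoids depending on the width of the value.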

What I have been having trouble with is adding the following
information from the cities file (100 cities) for each of the downloaded
data files. I would like to do the following but automatically:

###
mydata$city <- rep(cities[1,1], nrow(mydata))
mydata$state <- rep(cities[1,2], nrow(mydata))
mydata$lon <- rep(cities[1,3], nrow(mydata))
mydata$lat <- rep(cities[1,4], nrow(mydata))
###

The information for cities looks like this:

###
cities <-  dput(droplevels(head(cities, 5)))
structure(list(city = structure(1:5, .Label = c("Boston", "Bridgeport",
"Cambridge", "Fall River", "Hartford"), class = "factor"), state =
structure(c(2L,
1L, 2L, 2L, 1L), .Label = c(" CT ", " MA "), class = "factor"),
    lon = c(-71.06, -73.19, -71.11, -71.16, -72.67), lat = c(42.36,
    41.18, 42.37, 41.7, 41.77)), .Names = c("city", "state",
"lon", "lat"), row.names = c(NA, 5L), class = "data.frame")
###
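
One way this could be automated, as a minimal sketch: parse each file and attach the matching row of cities in the same step. parse_one() and read_all() are hypothetical helper names. The sketch assumes each file was saved as <city><state>.asc, as in the download loop quoted below, and that every file has 13 header lines. The header lines in the sample here are placeholders; the two data rows are copied from the sample output in this thread.

```r
# Stand-in for one downloaded file: 13 placeholder header lines, then
# two data rows copied from the sample output in this thread.
x <- c(rep("(header line)", 13),
       "1970-01-01T00:00:00\t267.769",
       "1970-01-01T03:00:00\t264.595")

# First two rows of the cities data shown above
cities <- data.frame(city  = c("Boston", "Bridgeport"),
                     state = c(" MA ", " CT "),
                     lon   = c(-71.06, -73.19),
                     lat   = c(42.36, 41.18),
                     stringsAsFactors = FALSE)

# Hypothetical helper: parse one file's lines and attach one row of cities.
# data.frame() recycles the length-one city values to every row.
parse_one <- function(lines, city_row) {
  body <- lines[-(1:13)]
  data.frame(year  = substr(body, 1, 4),
             month = substr(body, 6, 7),
             day   = substr(body, 9, 10),
             hour  = substr(body, 12, 13),
             temp  = as.numeric(substr(body, 21, 27)),
             city  = city_row$city,
             state = city_row$state,
             lon   = city_row$lon,
             lat   = city_row$lat,
             stringsAsFactors = FALSE)
}

# Hypothetical helper: loop over the rows of cities, rebuild each file name
# from city and state, and stack the results into one data frame.
read_all <- function(cities, directory = "~") {
  pieces <- lapply(seq_len(nrow(cities)), function(i) {
    path <- file.path(directory,
                      paste0(cities$city[i], cities$state[i], ".asc"))
    parse_one(readLines(path), cities[i, ])
  })
  do.call(rbind, pieces)
}

boston <- parse_one(x, cities[1, ])
```

Looping over the rows of cities rather than over list.files() keeps each file paired with its own city, since the file name is rebuilt from that row.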

Apologies if this seems trivial but I have been having a hard time. Thank
you again.

Sincerely,

Milu

On Mon, Oct 16, 2017 at 7:13 PM, David Winsemius <dwinsemius at comcast.net>
wrote:

>
> > On Oct 15, 2017, at 3:35 PM, Miluji Sb <milujisb at gmail.com> wrote:
> >
> > Dear David,
> >
> > This is amazing, thank you so much. If I may ask another question:
> >
> > The output looks like the following:
> >
> > ###
> > dput(head(x,15))
> > c("Metadata for Requested Time Series:", "", "prod_name=GLDAS_NOAH025_3H_v2.0",
> > "param_short_name=Tair_f_inst", "param_name=Near surface air temperature",
> > "unit=K", "begin_time=1970-01-01T00", "end_time=1979-12-31T21",
> > "lat= 42.36", "lon=-71.06", "Request_time=2017-10-15 22:20:03 GMT",
> > "", "Date&Time               Data", "1970-01-01T00:00:00\t267.769",
> > "1970-01-01T03:00:00\t264.595")
> > ###
> >
> > Thus I need to drop the first 13 rows and do the following to add
> > identifying information:
>
> Are you having difficulty reading in the data from disk? The `read.table`
> function has a "skip" parameter.
> >
> > ###
> > mydata <- data.frame(year = substr(x,1,4),
>
> That would not appear to do anything useful with x. The `x` object is not
> a long string. The items you want are in separate elements of x.
>
> substr(x,1,4)   # now returns
>  [1] "Meta" ""     "prod" "para" "para" "unit" "begi" "end_" "lat=" "lon="
> "Requ" ""     "Date"
> [14] "1970" "1970"
>
> You need to learn basic R indexing. The year might be extracted from the
> 14th element of x (the first data line) via code like this:
>
>     year <- substr( x[14], 1,4)
>
> >                      month = substr(x, 6,7),
> >                      day = substr(x, 9, 10),
> >                      hour = substr(x, 12, 13),
> >                      temp = substr(x, 21, 27))
>
> The time and temp items would naturally be read in with read.table (or in
> the case of tab-delimited data with read.delim) after skipping the first 14
> lines.
>
>
> >
> > mydata$city <- rep(cities[1,1], nrow(mydata))
>
> There's no need to use `rep` with data.frame. If one argument to
> data.frame is length n then all single element arguments will be
> "recycled" to fill in the needed number of rows. Please take the time to
> work through all the pages of "Introduction to R" (shipped with all
> distributions of R) or pick another introductory text. We cannot provide
> tutoring to all students. You need to put in the needed self-study first.
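
The recycling described above can be seen in a small sketch. The values are copied from the sample output in this thread, and the object names are only illustrative.

```r
temps <- c(267.769, 264.595)   # two rows of data

# Length-one arguments are recycled to the number of rows
recycled <- data.frame(temp = temps, city = "Boston",
                       lon = -71.06, lat = 42.36,
                       stringsAsFactors = FALSE)

# The explicit rep() version builds the same data frame
explicit <- data.frame(temp = temps, city = rep("Boston", 2),
                       lon = rep(-71.06, 2), lat = rep(42.36, 2),
                       stringsAsFactors = FALSE)

same <- identical(recycled, explicit)

# Assignment with $ recycles a single value in the same way
recycled$state <- " MA "
```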
>
> --
> David.
>
>
> > mydata$state <- rep(cities[1,2], nrow(mydata))
> > mydata$lon <- rep(cities[1,3], nrow(mydata))
> > mydata$lat <- rep(cities[1,4], nrow(mydata))
> > ###
> >
> > Is it possible to incorporate these into your code so the data looks
> > like this:
> >
> > dput(droplevels(head(mydata)))
> > structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "1970", class = "factor"),
> >     month = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "01", class = "factor"),
> >     day = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "01", class = "factor"),
> >     hour = structure(1:6, .Label = c("00", "03", "06", "09",
> >     "12", "15"), class = "factor"), temp = structure(c(6L, 4L,
> >     2L, 1L, 3L, 5L), .Label = c("261.559", "262.525", "262.648",
> >     "264.595", "265.812", "267.769"), class = "factor"), city = structure(c(1L,
> >     1L, 1L, 1L, 1L, 1L), .Label = "Boston", class = "factor"),
> >     state = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = " MA ", class = "factor"),
> >     lon = c(-71.06, -71.06, -71.06, -71.06, -71.06, -71.06),
> >     lat = c(42.36, 42.36, 42.36, 42.36, 42.36, 42.36)), .Names = c("year",
> > "month", "day", "hour", "temp", "city", "state", "lon", "lat"
> > ), row.names = c(NA, 6L), class = "data.frame")
> >
> > Apologies for asking repeated questions and thank you again!
>
> Of course it's possible. I don't understand where the difficulty lies.
> >
> > Sincerely,
> >
> > Milu
> >
> > On Sun, Oct 15, 2017 at 11:45 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> >
> > > On Oct 15, 2017, at 2:02 PM, Miluji Sb <milujisb at gmail.com> wrote:
> > >
> > > Dear all,
> > >
> > > I am trying to download time-series climatic data from GES DISC (NASA)
> > > Hydrology Data Rods web-service. Unfortunately, no wget method is
> > > available.
> > >
> > > Five parameters are needed for data retrieval: variable, location,
> > > startDate, endDate, and type. For example:
> > >
> > > ###
> > > https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi?variable=GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst&startDate=1970-01-01T00&endDate=1979-12-31T00&location=GEOM:POINT(-71.06,%2042.36)&type=asc2
> > > ###
> > >
> > > In this case, variable: Tair_f_inst (temperature), location: (-71.06,
> > > 42.36), startDate: 01 January 1970; endDate: 31 December 1979; type: asc2
> > > (output 2-column ASCII).
> > >
> > > I am trying to download data for 100 US cities, data for which I have in
> > > the following data.frame:
> > >
> > > ###
> > > cities <-  dput(droplevels(head(cities, 5)))
> > > structure(list(city = structure(1:5, .Label = c("Boston", "Bridgeport",
> > > "Cambridge", "Fall River", "Hartford"), class = "factor"), state =
> > > structure(c(2L,
> > > 1L, 2L, 2L, 1L), .Label = c(" CT ", " MA "), class = "factor"),
> > >    lon = c(-71.06, -73.19, -71.11, -71.16, -72.67), lat = c(42.36,
> > >    41.18, 42.37, 41.7, 41.77)), .Names = c("city", "state",
> > > "lon", "lat"), row.names = c(NA, 5L), class = "data.frame")
> > > ###
> > >
> > > Is it possible to download the data for the multiple locations
> > > automatically (e.g. RCurl) and save them as csv? Essentially, reading
> > > coordinates from the data.frame and entering it in the URL.
> > >
> > > I would also like to add identifying information to each of the data files
> > > from the cities data.frame. I have been doing the following for a single
> > > file:
> >
> > Didn't seem that difficult:
> >
> > library(downloader)  # makes things easier for Macs, perhaps not needed
> > # if not used will need to use download.file
> >
> > for( i in 1:5) {
> >   target1 <- paste0("https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi?variable=GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst&startDate=1970-01-01T00&endDate=1979-12-31T00&location=GEOM:POINT(",
> >                      cities[i, "lon"],
> >                      ",%20", cities[i,"lat"],
> >                      ")&type=asc2")
> >   target2 <- paste0("~/",    # change for whatever destination directory you may prefer.
> >                     cities[i,"city"],
> >                     cities[i,"state"], ".asc")
> >   download(url=target1, destfile=target2)
> >                 }
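
The URL construction in the loop above could also be pulled into a small function so that all request URLs are built in one vectorised call. This is only a sketch: build_url() and fetch_all() are hypothetical helper names, and the base URL and parameter values are the ones from the example request in this thread. fetch_all() uses base R's download.file() and is defined but not run here.

```r
# Hypothetical helper: build one request URL per lon/lat pair.
build_url <- function(lon, lat,
                      variable = "GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst",
                      start = "1970-01-01T00", end = "1979-12-31T00") {
  paste0("https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi",
         "?variable=", variable,
         "&startDate=", start,
         "&endDate=", end,
         "&location=GEOM:POINT(", lon, ",%20", lat, ")",
         "&type=asc2")
}

# First two rows of the cities data from this thread
cities <- data.frame(city  = c("Boston", "Bridgeport"),
                     state = c(" MA ", " CT "),
                     lon   = c(-71.06, -73.19),
                     lat   = c(42.36, 41.18),
                     stringsAsFactors = FALSE)

urls <- build_url(cities$lon, cities$lat)

# Hypothetical helper (not run here): download each URL and record which
# ones succeeded, so one failure on a slow server does not stop the loop.
fetch_all <- function(urls, dest) {
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    ok[i] <- tryCatch({
      status <- download.file(urls[i], dest[i], quiet = TRUE)
      status == 0   # download.file() returns 0 on success
    }, error = function(e) FALSE)
  }
  ok
}
```

Keeping the URL building separate from the download makes it possible to inspect the URLs before sending 100 requests.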
> >
> > Now I have 5 named files with extensions ".asc" in my user directory
> (since I'm on a Mac). It is a slow website so patience is needed.
> >
> > --
> > David
> >
> >
> > >
> > > ###
> > > x <- readLines(con=url("https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi?variable=GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst&startDate=1970-01-01T00&endDate=1979-12-31T00&location=GEOM:POINT(-71.06,%2042.36)&type=asc2"))
> > > x <- x[-(1:13)]
> > >
> > > mydata <- data.frame(year = substr(x,1,4),
> > >                     month = substr(x, 6,7),
> > >                     day = substr(x, 9, 10),
> > >                     hour = substr(x, 12, 13),
> > >                     temp = substr(x, 21, 27))
> > >
> > > mydata$city <- rep(cities[1,1], nrow(mydata))
> > > mydata$state <- rep(cities[1,2], nrow(mydata))
> > > mydata$lon <- rep(cities[1,3], nrow(mydata))
> > > mydata$lat <- rep(cities[1,4], nrow(mydata))
> > > ###
> > >
> > > Help and advice would be greatly appreciated. Thank you!
> > >
> > > Sincerely,
> > >
> > > Milu
> > >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > 'Any technology distinguishable from magic is insufficiently advanced.'
>  -Gehm's Corollary to Clarke's Third Law
