[R] Download data from NASA for multiple locations - RCurl
David Winsemius
dwinsemius at comcast.net
Mon Oct 16 23:30:23 CEST 2017
> On Oct 16, 2017, at 1:43 PM, Miluji Sb <milujisb at gmail.com> wrote:
>
> I have done the following using readLines
>
> directory <- "~/"
> files <- list.files(directory)
> data_frames <- vector("list", length(files))
> for (i in seq_along(files)) {
> df <- readLines(file.path(directory, files[i]))
> df <- df[-(1:13)]
> df <- data.frame(year = substr(df,1,4),
> month = substr(df, 6,7),
> day = substr(df, 9, 10),
> hour = substr(df, 12, 13),
> temp = substr(df, 21, 27))
> data_frames[[i]] <- df
> }
>
> What I have been having trouble with is adding the following information from the cities file (100 cities) to each of the downloaded data files. I would like to do the following, but automatically:
>
> ###
> mydata$city <- rep(cities[1,1], nrow(mydata))
> mydata$state <- rep(cities[1,2], nrow(mydata))
> mydata$lon <- rep(cities[1,3], nrow(mydata))
> mydata$lat <- rep(cities[1,4], nrow(mydata))
> ###
>
Why not store the lat/lon data in the file name and then extract all 4 items from the file name within the loop?
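
A minimal sketch of that idea, under assumptions that are not in the thread: the naming scheme `city_state_lon_lat.asc`, the underscore separator, and the helper names `make_name` and `parse_name` are all hypothetical, and the scheme only works if no city name contains an underscore.

```r
# Two rows taken from the cities data shown in this thread.
cities <- data.frame(city  = c("Boston", "Fall River"),
                     state = c(" MA ", " MA "),
                     lon   = c(-71.06, -71.16),
                     lat   = c(42.36, 41.70),
                     stringsAsFactors = FALSE)

# Hypothetical helper: build one file name per row.
# trimws() drops the padding around the state codes.
make_name <- function(city, state, lon, lat) {
  paste0(paste(city, trimws(state), lon, lat, sep = "_"), ".asc")
}
file_names <- make_name(cities$city, cities$state, cities$lon, cities$lat)

# Hypothetical helper: recover the four items from a file name.
parse_name <- function(fname) {
  stem  <- sub("\\.asc$", "", basename(fname))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  list(city = parts[1], state = parts[2],
       lon = as.numeric(parts[3]), lat = as.numeric(parts[4]))
}

info <- parse_name(file_names[2])
```

Inside the reading loop, `parse_name(files[i])` would then supply the city, state, lon and lat values for the data frame built from that file.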
--
David.
> The information for cities look like this:
>
> ###
> cities <- dput(droplevels(head(cities, 5)))
> structure(list(city = structure(1:5, .Label = c("Boston", "Bridgeport",
> "Cambridge", "Fall River", "Hartford"), class = "factor"), state = structure(c(2L,
> 1L, 2L, 2L, 1L), .Label = c(" CT ", " MA "), class = "factor"),
> lon = c(-71.06, -73.19, -71.11, -71.16, -72.67), lat = c(42.36,
> 41.18, 42.37, 41.7, 41.77)), .Names = c("city", "state",
> "lon", "lat"), row.names = c(NA, 5L), class = "data.frame")
> ###
>
> Apologies if this seems trivial but I have been having a hard time. Thank you again.
>
> Sincerely,
>
> Milu
>
> On Mon, Oct 16, 2017 at 7:13 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> > On Oct 15, 2017, at 3:35 PM, Miluji Sb <milujisb at gmail.com> wrote:
> >
> > Dear David,
> >
> > This is amazing, thank you so much. If I may ask another question:
> >
> > The output looks like the following:
> >
> > ###
> > dput(head(x,15))
> > c("Metadata for Requested Time Series:", "", "prod_name=GLDAS_NOAH025_3H_v2.0",
> > "param_short_name=Tair_f_inst", "param_name=Near surface air temperature",
> > "unit=K", "begin_time=1970-01-01T00", "end_time=1979-12-31T21",
> > "lat= 42.36", "lon=-71.06", "Request_time=2017-10-15 22:20:03 GMT",
> > "", "Date&Time Data", "1970-01-01T00:00:00\t267.769",
> > "1970-01-01T03:00:00\t264.595")
> > ###
> >
> > Thus I need to drop the first 13 rows and do the following to add identifying information:
>
> Are you having difficulty reading in the data from disk? The `read.table` function has a "skip" parameter.
> >
> > ###
> > mydata <- data.frame(year = substr(x,1,4),
>
> That would not appear to do anything useful with x. The `x` object is not a long string. The items you want are in separate elements of x.
>
> substr(x,1,4) # now returns
> [1] "Meta" "" "prod" "para" "para" "unit" "begi" "end_" "lat=" "lon=" "Requ" "" "Date"
> [14] "1970" "1970"
>
> You need to learn basic R indexing. The year might be extracted from the 7th element of x via code like this:
>
> year <- substr(x[7], 12, 15)  # x[7] is "begin_time=1970-01-01T00"
>
> > month = substr(x, 6,7),
> > day = substr(x, 9, 10),
> > hour = substr(x, 12, 13),
> > temp = substr(x, 21, 27))
>
> The time and temp items would naturally be read in with read.table (or in the case of tab-delimited data with read.delim) after skipping the first 13 lines (the metadata block plus the "Date&Time Data" header line).
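
A sketch of the read.delim route, using the 15 lines of output quoted in this thread as stand-in input. In that sample the metadata block plus the "Date&Time Data" header line take up 13 lines, so `skip = 13` is used here; check the count against an actual downloaded file. The column names `datetime` and `temp` are my own choice.

```r
# Sample output as quoted earlier in the thread (two data rows only).
x <- c("Metadata for Requested Time Series:", "", "prod_name=GLDAS_NOAH025_3H_v2.0",
       "param_short_name=Tair_f_inst", "param_name=Near surface air temperature",
       "unit=K", "begin_time=1970-01-01T00", "end_time=1979-12-31T21",
       "lat= 42.36", "lon=-71.06", "Request_time=2017-10-15 22:20:03 GMT",
       "", "Date&Time Data", "1970-01-01T00:00:00\t267.769",
       "1970-01-01T03:00:00\t264.595")

# Skip the header block and read the tab-delimited data rows.
temps <- read.delim(text = x, skip = 13, header = FALSE,
                    col.names = c("datetime", "temp"),
                    stringsAsFactors = FALSE)

# Parse the time stamp once, then derive the date parts from it.
stamp <- as.POSIXct(temps$datetime, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
temps$year  <- format(stamp, "%Y")
temps$month <- format(stamp, "%m")
temps$day   <- format(stamp, "%d")
temps$hour  <- format(stamp, "%H")
```

For a file on disk, replace `text = x` with the file path as the first argument. The temperature column comes back as numeric rather than as the character strings that `substr` produces.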
>
>
> >
> > mydata$city <- rep(cities[1,1], nrow(mydata))
>
> There's no need to use `rep` with data.frame. If one argument to data.frame is length n then all single-element arguments will be "recycled" to fill in the needed number of rows. Please take the time to work through all the pages of "Introduction to R" (shipped with all distributions of R) or pick another introductory text. We cannot provide tutoring to all students. You need to put in the needed self-study first.
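
A small illustration of that recycling, with three temperature rows and the Boston entry taken from the data quoted in this thread. The object names are only for the example.

```r
temps <- data.frame(hour = c("00", "03", "06"),
                    temp = c(267.769, 264.595, 262.525))
cities <- data.frame(city = "Boston", state = " MA ",
                     lon = -71.06, lat = 42.36,
                     stringsAsFactors = FALSE)

# Each length-one argument is recycled to match the three rows of `temps`,
# so no rep() call is needed.
mydata <- data.frame(temps,
                     city  = cities[1, "city"],
                     state = cities[1, "state"],
                     lon   = cities[1, "lon"],
                     lat   = cities[1, "lat"])
```

The result has three rows, each carrying the same city, state, lon and lat values.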
>
> --
> David.
>
>
> > mydata$state <- rep(cities[1,2], nrow(mydata))
> > mydata$lon <- rep(cities[1,3], nrow(mydata))
> > mydata$lat <- rep(cities[1,4], nrow(mydata))
> > ###
> >
> > Is it possible to incorporate these into your code so the data looks like this:
> >
> > dput(droplevels(head(mydata)))
> > structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "1970", class = "factor"),
> > month = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "01", class = "factor"),
> > day = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "01", class = "factor"),
> > hour = structure(1:6, .Label = c("00", "03", "06", "09",
> > "12", "15"), class = "factor"), temp = structure(c(6L, 4L,
> > 2L, 1L, 3L, 5L), .Label = c("261.559", "262.525", "262.648",
> > "264.595", "265.812", "267.769"), class = "factor"), city = structure(c(1L,
> > 1L, 1L, 1L, 1L, 1L), .Label = "Boston", class = "factor"),
> > state = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = " MA ", class = "factor"),
> > lon = c(-71.06, -71.06, -71.06, -71.06, -71.06, -71.06),
> > lat = c(42.36, 42.36, 42.36, 42.36, 42.36, 42.36)), .Names = c("year",
> > "month", "day", "hour", "temp", "city", "state", "lon", "lat"
> > ), row.names = c(NA, 6L), class = "data.frame")
> >
> > Apologies for asking repeated questions and thank you again!
>
> Of course it's possible. I don't understand where the difficulty lies.
> >
> > Sincerely,
> >
> > Milu
> >
> > On Sun, Oct 15, 2017 at 11:45 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> >
> > > On Oct 15, 2017, at 2:02 PM, Miluji Sb <milujisb at gmail.com> wrote:
> > >
> > > Dear all,
> > >
> > > I am trying to download time-series climatic data from GES DISC (NASA)
> > > Hydrology Data Rods web-service. Unfortunately, no wget method is
> > > available.
> > >
> > > Five parameters are needed for data retrieval: variable, location,
> > > startDate, endDate, and type. For example:
> > >
> > > ###
> > > https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi?variable=GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst&startDate=1970-01-01T00&endDate=1979-12-31T00&location=GEOM:POINT(-71.06,%2042.36)&type=asc2
> > > ###
> > >
> > > In this case, variable: Tair_f_inst (temperature), location: (-71.06,
> > > 42.36), startDate: 01 January 1970; endDate: 31 December 1979; type: asc2
> > > (output 2-column ASCII).
> > >
> > > I am trying to download data for 100 US cities, data for which I have in
> > > the following data.frame:
> > >
> > > ###
> > > cities <- dput(droplevels(head(cities, 5)))
> > > structure(list(city = structure(1:5, .Label = c("Boston", "Bridgeport",
> > > "Cambridge", "Fall River", "Hartford"), class = "factor"), state =
> > > structure(c(2L,
> > > 1L, 2L, 2L, 1L), .Label = c(" CT ", " MA "), class = "factor"),
> > > lon = c(-71.06, -73.19, -71.11, -71.16, -72.67), lat = c(42.36,
> > > 41.18, 42.37, 41.7, 41.77)), .Names = c("city", "state",
> > > "lon", "lat"), row.names = c(NA, 5L), class = "data.frame")
> > > ###
> > >
> > > Is it possible to download the data for the multiple locations
> > > automatically (e.g. RCurl) and save them as csv? Essentially, reading
> > > coordinates from the data.frame and entering it in the URL.
> > >
> > > I would also like to add identifying information to each of the data files
> > > from the cities data.frame. I have been doing the following for a single
> > > file:
> >
> > Didn't seem that difficult:
> >
> > library(downloader) # makes things easier for Macs, perhaps not needed
> > # if not used will need to use download.file
> >
> > for( i in 1:5) {
> > target1 <- paste0("https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi?variable=GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst&startDate=1970-01-01T00&endDate=1979-12-31T00&location=GEOM:POINT(",
> > cities[i, "lon"],
> > ",%20", cities[i,"lat"],
> > ")&type=asc2")
> > target2 <- paste0("~/", # change for whatever destination directory you may prefer.
> > cities[i,"city"],
> > cities[i,"state"], ".asc")
> > download(url=target1, destfile=target2)
> > }
> >
> > Now I have 5 named files with extensions ".asc" in my user directory (since I'm on a Mac). It is a slow website so patience is needed.
> >
> > --
> > David
> >
> >
> > >
> > > ###
> > > x <- readLines(con=url("https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi?variable=GLDAS2:GLDAS_NOAH025_3H_v2.0:Tair_f_inst&startDate=1970-01-01T00&endDate=1979-12-31T00&location=GEOM:POINT(-71.06,%2042.36)&type=asc2"))
> > > x <- x[-(1:13)]
> > >
> > > mydata <- data.frame(year = substr(x,1,4),
> > > month = substr(x, 6,7),
> > > day = substr(x, 9, 10),
> > > hour = substr(x, 12, 13),
> > > temp = substr(x, 21, 27))
> > >
> > > mydata$city <- rep(cities[1,1], nrow(mydata))
> > > mydata$state <- rep(cities[1,2], nrow(mydata))
> > > mydata$lon <- rep(cities[1,3], nrow(mydata))
> > > mydata$lat <- rep(cities[1,4], nrow(mydata))
> > > ###
> > >
> > > Help and advice would be greatly appreciated. Thank you!
> > >
> > > Sincerely,
> > >
> > > Milu
> > >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law