[R] Create single vector after looping through multiple data frames with GREP

Michael Bedward michael.bedward at gmail.com
Mon Oct 11 07:19:32 CEST 2010


Hi Simon,

The function below should do it or at least get you started...

getPlotData <- function (datalist, response, times)
{
  qdata <- sapply(datalist[times],
    function(df) {
      irow <- grepl(response, df$Response)
      df[irow, 2:5]
    }
  )

  # qdata is a (list) matrix with rows Q1:Q4 and one col per time;
  # this assumes exactly one row of each data frame matches response.
  # We turn it into a two col matrix with col 1 = time index
  # and col 2 = value
  time.index <- seq(4 * ncol(qdata))
  out <- cbind(time.index, as.numeric(qdata))
  rownames(out) <- paste(time.index, rownames(qdata), sep=".")
  colnames(out) <- c("time", response)
  out
}

#Example, get data for times 10:15 where Response contains "Economy"
x <- getPlotData(r, "Economy", 10:15)
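
For anyone without access to the scraped tables, here is a minimal sketch of the same call on made-up data. The list `toy`, the helper `makeYear` and all the numbers are hypothetical stand-ins for `r`; the function is repeated only so the block runs on its own.

```r
# getPlotData as defined above, repeated so this block is self-contained
getPlotData <- function (datalist, response, times)
{
  qdata <- sapply(datalist[times],
    function(df) {
      irow <- grepl(response, df$Response)
      df[irow, 2:5]
    }
  )
  time.index <- seq(4 * ncol(qdata))
  out <- cbind(time.index, as.numeric(qdata))
  rownames(out) <- paste(time.index, rownames(qdata), sep=".")
  colnames(out) <- c("time", response)
  out
}

# Hypothetical toy data: each "year" has exactly one Economy row
makeYear <- function(offset) {
  data.frame(Response = c("Economy", "Health Care"),
             Q1 = c(1, 50) + offset, Q2 = c(2, 51) + offset,
             Q3 = c(3, 52) + offset, Q4 = c(4, 53) + offset,
             stringsAsFactors = FALSE)
}
toy <- lapply(c(0, 10, 20), makeYear)

# Quarterly Economy values for the 2nd and 3rd toy years, in time order
x <- getPlotData(toy, "Economy", 2:3)
x
```

The second column of `x` is the single vector asked for, so something like plot(x[, 1], x[, 2], type = "l") should draw it. If the search term matches zero rows or more than one row in any data frame, sapply will not return a matrix and the function will fail, so the pattern needs to be specific enough.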


Michael


On 11 October 2010 03:35, Simon Kiss <sjkiss at gmail.com> wrote:
> Hello all,
>
> I changed the subject line of the e-mail because the question I'm posing now is different from the first one. I hope this is proper etiquette. However, the original chain is included below.
>
> I've incorporated bits of both Ethan's and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming with control structures. The reproducible code below creates a list containing 19 data frames, one for each year of the "Most Important Problem" survey data for Canada.
>
> What I'd like at this stage is a loop that searches through all the data frames for rows containing the search term and then binds those rows together in a plottable format.
>
> At the bottom of the code below, you'll find my first attempt to use a search string and put the result into a plottable format. It only partially works: I can only get the numbers for one year, whereas I'd like a string of numbers for several years. On the upside, grep appears to do the trick for selecting rows.
>
> Can anyone suggest a solution?
> Yours truly,
> Simon Kiss
>
> #This is the reproducible code to set-up all the data frames
> library(XML)
> #This gets the data from the web and lists them
> mylist <- paste ("http://www.queensu.ca/cora/_trends/mip_",
> c(1987:2001,2003:2006), ".htm", sep="")
> alltables <- lapply(mylist, readHTMLTable)
>
> #convert to dataframes
> r<-lapply(alltables, function(x) {as.data.frame(x)} )
>
> #This is just some house-cleaning; structuring all the tables so they are uniform
> r[[1]][3]<-r[[1]][2]
> r[[1]][2]<-c(" ")
> r[[2]][4]<-r[[2]][2]
> r[[2]][5]<-r[[2]][3]
> r[[2]][2:3]<-c(" ")
> r[[3]][4:5]<-r[[3]][3:4]
> r[[3]][3]<-c(" ")
>
> #This loop deletes some superfluous columns and rows, turns the first column in to character strings and the data into numeric
> for (i in 1:19) {
> n.rows<-dim(r[[i]])[1]
> # 15:n.rows-3 parses as (15:n.rows)-3, so this keeps rows 12 to n.rows-3
> r[[i]] <- r[[i]][12:(n.rows - 3), 1:5]
> n.rows<-dim(r[[i]])[1]
> row.names(r[[i]]) <-NULL
> names(r[[i]]) <- c("Response", "Q1", "Q2", "Q3", "Q4")
>
> r[[i]][, 1]<-as.character(r[[i]][,1])
> #r[[i]][,2:5]<-as.numeric(as.character(r[[i]][,2:5]))
> r[[i]][, 2:5]<-lapply(r[[i]][, 2:5], function(x) {as.numeric(as.character(x))})
> #n.rows<-dim(r[[i]])[1]
> #r[[i]]<-r[[i]][9
> }
>
> #This code is my first attempt at introducing a search string, getting the rows, binding and plotting;
> economy<-r[[10]][grep('Economy', r[[10]][,1]),]
> economy_2<-r[[11]][grep('Economy', r[[11]][,1]),]
> test<-cbind(economy, economy_2)
> plot(as.numeric(test), type='l')
>
> #here's another attempt I'm trying....
> economy<-data.frame
> for (i in 15:19) {
> economy[i,] <-r[[i]][grep('Economy', r[[i]][,1]), ]
> }
>
> Begin forwarded message:
>
>> From: Simon Kiss <sjkiss at gmail.com>
>> Date: October 7, 2010 4:59:46 PM EDT
>> To: Simon Kiss <simonjkiss at yahoo.ca>
>> Subject: Fwd: [R] Converting scraped data
>>
>>
>>
>> Begin forwarded message:
>>
>>> From: Ethan Brown <ethancbrown at gmail.com>
>>> Date: October 6, 2010 4:22:41 PM GMT-04:00
>>> To: Simon Kiss <sjkiss at gmail.com>
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Converting scraped data
>>>
>>> Hi Simon,
>>>
>>> You'll notice the "test" data.frame has a whole mix of characters in
>>> the columns you're interested in, including a "-" for missing values, and
>>> that those columns are in fact factors.
>>>
>>> as.numeric(factor) returns the level of the factor, not the value of
>>> the level. (See ?levels and ?factor)--that's why it's giving you those
>>> irrelevant integers. I always end up using something like this handy
>>> code snippet to deal with the situation:
>>>
>>> unfactor <- function(factors)
>>> # From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
>>> # Transform a factor back into its factor names
>>> {
>>>  return(levels(factors)[factors])
>>> }
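
A small illustration of the point about factors, as a sketch on a made-up factor `f` (the values are hypothetical, not taken from the scraped tables): as.numeric on the factor gives the internal level codes, while indexing the levels by the factor recovers the printed values.

```r
# Hypothetical factor standing in for one scraped column; "-" marks a missing value
f <- factor(c("31", "7", "-", "12"))

# Integer level codes, not the values shown on screen
codes <- as.numeric(f)

# The values themselves; "-" cannot be parsed, so it becomes NA (with a warning)
vals <- as.numeric(levels(f)[f])

# as.numeric(as.character(f)) is an equivalent way to write the same conversion
vals2 <- as.numeric(as.character(f))

codes
vals
```
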
>>>
>>> Then, to get your data to where you want it, I'd do this:
>>>
>>> require(XML)
>>> theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
>>> tables <- readHTMLTable(theurl)
>>> n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
>>> class(tables)
>>> test<-data.frame(tables, stringsAsFactors=FALSE)
>>>
>>>
>>> result <- test[11:42, 1:5] #Extract the actual data we want
>>> names(result) <- c("Response", "Q1", "Q2","Q3","Q4")
>>> for(i in 2:5) {
>>> # Convert factor columns to numeric
>>> result[,i] <- as.numeric(unfactor(result[,i]))
>>> }
>>> result
>>>
>>> From here you should be able to plot or do whatever else you want.
>>>
>>> Hope this helps,
>>> Ethan Brown
>>>
>>>
>>> On Wed, Oct 6, 2010 at 9:52 AM, Simon Kiss <sjkiss at gmail.com> wrote:
>>>> Dear Colleagues,
>>>> I used this code to scrape data from the URL contained within.  This code
>>>> should be reproducible.
>>>>
>>>> library(XML)
>>>> theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
>>>> tables <- readHTMLTable(theurl)
>>>> n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
>>>> class(tables)
>>>> test<-data.frame(tables, stringsAsFactors=FALSE)
>>>> test[16,c(2:5)]
>>>> as.numeric(test[16,c(2:5)])
>>>> quartz()
>>>> plot(c(1:4), test[15, c(2:5)])
>>>>
>>>> Calling the values from the row of interest using test[16, c(2:5)] brings
>>>> them up as shown on the screen, but plotting them or coercing them to
>>>> numeric changes the values in a way that doesn't make sense to me. My
>>>> intuition is that there is something going on with the way the characters
>>>> are coded or classed when they're scraped into R.  I've looked around the
>>>> help files for converting from character to numeric but can't find a
>>>> solution.
>>>>
>>>> I also tried this:
>>>>
>>>> as.numeric(as.character(test[16,c(2:5)])) and that also changed the values
>>>> from what they originally were.
>>>>
>>>> I'm grateful for any suggestions.
>>>> Yours, Simon Kiss
>>>>
>>>>
>>>>
>>>> *********************************
>>>> Simon J. Kiss, PhD
>>>> Assistant Professor, Wilfrid Laurier University
>>>> 73 George Street
>>>> Brantford, Ontario, Canada
>>>> N3T 2C9
>>>> Cell: +1 519 761 7606
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>


