[R] How to clean up missing values in a list of lists

Aron Lindberg aron.lindberg at case.edu
Tue Feb 10 22:41:08 CET 2015


Thanks Dennis!

In the end, wrapping the call in try() worked:

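get_files <- function(pull_lists){
  # try() stops a failing lookup from aborting the enclosing lapply();
  # the failing element comes back as a "try-error" object instead
  try(sapply(pull_lists$content, "[[", "filename"))
}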
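
For anyone following along, here is a minimal sketch of the tryCatch() variant Dennis suggested, which returns NA for a failing entry so the bad entries can be dropped afterwards. The pull_lists object below is made-up stand-in data, not real API output, and the name get_files_safe is only illustrative. It assumes that a failed request comes back with a content list holding a message string rather than file records, which would explain the "subscript out of bounds" error, but I have not verified that against the API.

```r
# Hypothetical stand-in for the API results (not real GitHub output):
# the third element mimics a failed request whose content is a message.
pull_lists <- list(
  list(content = list(list(filename = "a.rb"), list(filename = "b.rb"))),
  list(content = list(list(filename = "c.rb"))),
  list(content = list(message = "Not Found"))
)

# Return the filenames for one pull request, or NA if extraction fails.
get_files_safe <- function(pull) {
  tryCatch(
    vapply(pull$content, function(f) f[["filename"]], character(1)),
    error = function(e) NA_character_
  )
}

file_lists <- lapply(pull_lists, get_files_safe)

# Flag the entries where extraction failed, then drop them.
failed <- vapply(file_lists,
                 function(x) length(x) == 1 && is.na(x),
                 logical(1))
clean_lists <- file_lists[!failed]
```

Unlike plain try(), this keeps the result a list of character vectors, so the failures are easy to find with is.na() rather than by checking for "try-error" objects.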

Best,
Aron

-- 
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io

On Tue, Feb 10, 2015 at 1:22 PM, Dennis Murphy <djmuser at gmail.com> wrote:

> Hi:
> It sounds like you need a condition handler, so look at ?try and
> ?tryCatch for starters. I'd also suggest looking at ?plyr::failwith if
> all you need is a simple error handler. One of the advantages of these
> functions is that they are designed for situations like yours, where
> an lapply() invocation over a list may occasionally result in an
> error, warning or message. The tryCatch() function is the most
> versatile, in that it allows you to define separate handlers for
> errors, warnings, messages and interrupts; failwith() and try() are
> primarily used for dealing with errors alone.
> More details re these functions can be found here:
> http://adv-r.had.co.nz/Exceptions-Debugging.html
> HTH,
> Dennis
> On Tue, Feb 10, 2015 at 6:46 AM, Aron Lindberg <aron.lindberg at case.edu> wrote:
>> Hi,
>>
>>
>> I’m trying to query the Github API, and I’m running into some data munging issues, so I was hoping someone on the list might advise.
>>
>>
>> Here’s my code. To run it you need to replace client_id and client_secret with your own authorization information for Github.
>>
>>
>> library(github)
>> library(RCurl)
>> library(httpuv)
>> library(jsonlite)
>>
>>
>> # Set up the query
>> ctx = interactive.login("client_id", "client_secret")
>>
>>
>> pull <- function(i){
>>   get.pull.request.files(owner = "rails", repo = "rails", id = i, ctx = get.github.context(), per_page=1000)
>> }
>>
>>
>> data <- read.csv(getURL("https://gist.githubusercontent.com/aronlindberg/a3d135a303664046c94a/raw/e42a0734ec4542eccf5f4d5bdeed5afbdd1720e9/pull_ids"), sep = "\n")
>>
>>
>> list <- read.csv(textConnection(data), header = FALSE)
>>
>>
>> pull_lists <- lapply(list$V1, pull)
>>
>>
>> get_files <- function(pull_lists){
>>   sapply(pull_lists$content, "[[", "filename" )
>> }
>>
>>
>> file_lists <- lapply(pull_lists, get_files)
>>
>>
>> Everything works fine until the last command, which generates:
>>
>>
>> Error in FUN(X[[1L]], ...) : subscript out of bounds
>>
>>
>> I’ve read here: http://stackoverflow.com/questions/18461499/subscript-out-of-bounds-on-character-vector
>>
>>
>> which leads me to believe that the reason for the error is that when I run file_lists <- lapply(pull_lists, get_files) some of the entries are missing. However, I cannot figure out how to clean up the data. I have tried something along the lines of:
>>
>>
>> clean_files <- function(pull_lists){
>>   pull_lists$content[which(nchar(pull_lists$content)==NULL)]<-NA
>> }
>>
>>
>> clean_lists <- lapply(pull_lists, clean_files)
>>
>>
>> But that simply replaces *every* value with NA (similarly if I change ==NULL to <1, or <2).
>>
>>
>> How can I make this code work?
>>
>>
>> Best,
>> Aron
>>
>>
>> --
>> Aron Lindberg
>>
>>
>> Doctoral Candidate, Information Systems
>> Weatherhead School of Management
>> Case Western Reserve University
>> aronlindberg.github.io