[R] File checking problem

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Thu Mar 5 20:19:32 CET 2009


2009/3/5 ling ling <metal_licaling at live.com>:
>
> Dear all,
>
> I am a newcomer to R programming, I met the problem:
>
> I have a lot of .txt files in my directory.
>
> Firstly, I check whether the file satisfies the conditions:
> 1.empty
> 2.the "Rep" column of the file has no "useractivity_idle" or
> "useractivity_act"
> 3.even The "rep" has both of them, numbers of "useractivity_idle"==numbers of "useractivity_act"==1
> If the file has one of those conditions, skip this file, jump to and read the next .txt file:
> I made the programming as:
>
> name<-list.files(path = ".", pattern = NULL, all.files = FALSE,
>           full.names = FALSE, recursive = FALSE,
>           ignore.case = FALSE)
>
> for(k in 1:length(name)){
>
> log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
>
> x<-which(log1$Rep=="useractivity_act")
> y<-which(log1$Rep=="useractivity_idle")
>
> while(all(log1$Rep!="useractivity_act")||all(log1$Rep!="useractivity_idle")||(length(x)==1
> && length(y)==1)||(file.info(name[k])$size== 0)){
> k=k+1
> log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
> }
>
> ........
>
> }
>
> But I always get the following information:
> Error in file(file, "r") : cannot open the connection
> In addition: Warning message:
> In file(file, "r") : cannot open file 'NA': No such file or directory
>
>
> I have been exploring this for long time, any help would be appreciated. Thanks a lot!

You are trying to read one more file than you have! Simplified, your
code looks like this:

name = list.files(...)
for(k in 1:length(name)){
  log1 = read.table(name[k],....)
  while(something){
    k =k + 1
    log1 = read.table(name[k],...)     # 1
  }
}

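What will happen is that when the last file is read at point #1 and
the 'while' condition still holds, the loop goes round again, k becomes
greater than the length of name, name[k] is NA, and read.table fails at
#1 with the "cannot open file 'NA'" message you are seeing.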
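
To see where the 'NA' comes from, here is a small sketch (the file
names are made up for illustration). Indexing a character vector past
its end returns NA rather than raising an error, and reassigning k
inside a 'for' loop does not change which values the loop visits:

```r
## Hypothetical file names, for illustration only
name <- c("a.txt", "b.txt")

k <- length(name) + 1
name[k]            # NA: no error, just a missing value
is.na(name[k])     # TRUE

## Changing k in the loop body has no effect on the next iteration:
## 'for' assigns the next element of 1:3 to k each time round.
visited <- integer(0)
for (k in 1:3) {
  visited <- c(visited, k)
  k <- k + 1
}
visited            # 1 2 3
```

So the k = k + 1 in the inner loop only moves you forward within the
current pass of the 'for' loop, which is why the two loops get out of
step.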

 I think you've overcomplicated it. You just need one loop with an
'if' in it. I'd write it as:

processFiles = function(){

  name<-list.files(path = ".", pattern = NULL, all.files = FALSE,
            full.names = FALSE, recursive = FALSE,
            ignore.case = FALSE)

  ## seq_along() gives an empty sequence if no files are found,
  ## whereas 1:length(name) would give c(1, 0)
  for(k in seq_along(name)){
    log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
    if(testCondition(log1)){
      cat("Processing ",name[k],"\n")
      processLog(log1)
    }else{
      cat("Skipping ",name[k],"\n")
    }
  }
}
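
One thing the loop above does not handle is your first condition, the
empty file: read.table stops with an error on a file with no lines, so
the size has to be checked before reading. A minimal sketch of that,
assuming the files all end in ".txt"; the name processFilesSafe and
its arguments 'dir', 'test' and 'process' are made up for this example
and are not part of the code above:

```r
## Hypothetical variant: processFilesSafe, 'dir', 'test' and 'process'
## are illustrative names only.
processFilesSafe <- function(dir = ".", test, process) {
  ## Assumption: only files ending in .txt are of interest
  files <- list.files(path = dir, pattern = "\\.txt$", full.names = TRUE)
  processed <- character(0)

  ## Looping over the names themselves does nothing if there are no files
  for (f in files) {
    ## Check the size first: read.table() fails on a zero-byte file
    if (file.info(f)$size == 0) {
      cat("Skipping empty file", f, "\n")
      next
    }
    log1 <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
    if (test(log1)) {
      cat("Processing", f, "\n")
      process(log1)
      processed <- c(processed, f)
    } else {
      cat("Skipping", f, "\n")
    }
  }
  ## Return the names of the files that were processed, invisibly
  invisible(processed)
}
```

You would call it as processFilesSafe(".", test = testCondition,
process = processLog). Passing the two functions in as arguments is
just one way to keep the pieces separately testable.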

Then you need two more functions, testCondition and processLog.
testCondition takes a data frame and decides whether you want to
process it or not. I'm not sure I've got the test logic right here,
but you should get the idea:

`testCondition` <-
  function(log1){
    ## test for Rep column:
    if(!any(names(log1)=="Rep"))return(FALSE)
    ## test active/idle count
    nAct = sum(log1$Rep == "useractivity_act")
    nIdle = sum(log1$Rep == "useractivity_idle")
    ## if we have no active or idle, return False
    if(nAct + nIdle == 0)return(FALSE)
    ## if we only have one of either, return False
    if(nAct == 1 || nIdle ==1) return(FALSE)
    ## maybe some other tests here?
    return(TRUE)
  }
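
For comparison, here is a sketch that follows the 'while' condition in
your own code more literally: skip when either label is missing
altogether, or when each occurs exactly once. This is only my reading
of your three conditions, and the name testConditionStrict is made up
for the example:

```r
## Hypothetical alternative, based on one reading of the stated
## skip conditions; testConditionStrict is an illustrative name.
testConditionStrict <- function(log1) {
  ## no Rep column at all: nothing to test
  if (!("Rep" %in% names(log1))) return(FALSE)
  ## count the labels, ignoring missing values
  nAct  <- sum(log1$Rep == "useractivity_act",  na.rm = TRUE)
  nIdle <- sum(log1$Rep == "useractivity_idle", na.rm = TRUE)
  ## skip if either label never occurs
  if (nAct == 0 || nIdle == 0) return(FALSE)
  ## skip if each label occurs exactly once
  if (nAct == 1 && nIdle == 1) return(FALSE)
  TRUE
}

## Quick checks on small hand-made data frames
testConditionStrict(data.frame(Rep = c("useractivity_act",
                                       "useractivity_idle")))
testConditionStrict(data.frame(Rep = c("useractivity_act",
                                       "useractivity_act",
                                       "useractivity_idle")))
```

The first call should give FALSE (one of each) and the second TRUE.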

Here is a simple processLog function that just prints the summary of
the data frame. Put whatever you want in here:

`processLog` <-
  function(log1){
     ## for example:
    print(summary(log1))
  }

How's that? Note the use of comments and breaking the code up into
small independent, testable functions.

Barry



