[R] File checking problem
Barry Rowlingson
b.rowlingson at lancaster.ac.uk
Thu Mar 5 20:19:32 CET 2009
2009/3/5 ling ling <metal_licaling at live.com>:
>
> Dear all,
>
> I am a newcomer to R programming and have run into a problem:
>
> I have a lot of .txt files in my directory.
>
> First, I check whether the file satisfies any of these conditions:
> 1. It is empty.
> 2. The "Rep" column of the file has no "useractivity_idle" or
> "useractivity_act".
> 3. Even if "Rep" has both of them, the number of "useractivity_idle" == the number of "useractivity_act" == 1.
> If the file meets one of those conditions, skip it and read the next .txt file.
> I wrote the program as:
>
> name<-list.files(path = ".", pattern = NULL, all.files = FALSE,
>                  full.names = FALSE, recursive = FALSE,
>                  ignore.case = FALSE)
>
> for(k in 1:length(name)){
>
>   log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
>
>   x<-which(log1$Rep=="useractivity_act")
>   y<-which(log1$Rep=="useractivity_idle")
>
>   while(all(log1$Rep!="useractivity_act")||all(log1$Rep!="useractivity_idle")||
>         (length(x)==1 && length(y)==1)||(file.info(name[k])$size== 0)){
>     k=k+1
>     log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
>   }
>
>   ........
>
> }
>
> But I always get the following information:
> Error in file(file, "r") : cannot open the connection
> In addition: Warning message:
> In file(file, "r") : cannot open file 'NA': No such file or directory
>
>
> I have been exploring this for a long time; any help would be appreciated. Thanks a lot!
You are trying to read one more file than you have! Simplified, your
code looks like this:
  name = list.files(...)
  for(k in 1:length(name)){
    log1 = read.table(name[k],....)
    while(something){
      k = k + 1
      log1 = read.table(name[k],...) # 1
    }
  }
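What will happen is that when the last file is read at point #1 and it
also meets the skip condition, the while loop goes round again, k becomes
greater than length(name), name[k] is NA, and read.table fails at #1 with
the "cannot open file 'NA'" error you are seeing.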
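The failure can be reproduced without any files. A minimal sketch, using a made-up vector of file names (placeholders, not taken from the original post): indexing one past the end of a character vector returns NA rather than an error, and that NA is what read.table is then asked to open. The second half shows that changing k inside a for loop does not skip an iteration.

```r
## Hypothetical file names, for illustration only
name <- c("a.txt", "b.txt", "c.txt")

k <- length(name)   # pretend we are on the last file
k <- k + 1          # the inner while loop steps past the end
beyond <- name[k]   # out-of-range indexing gives NA, not an error
print(beyond)
print(is.na(beyond))

## for() assigns k from the sequence at the start of each pass,
## so incrementing k in the body does not skip the next file.
visited <- integer(0)
for (k in 1:3) {
  visited <- c(visited, k)
  k <- k + 1   # has no effect on the next value of k
}
print(visited)
```

So the manual k = k + 1 cannot be used to "jump" to the next file; it only shifts the index for the rest of the current pass.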
I think you've overcomplicated it. You just need one loop with an
'if' in it. I'd write it as:
  processFiles = function(){
    name<-list.files(path = ".", pattern = NULL, all.files = FALSE,
                     full.names = FALSE, recursive = FALSE,
                     ignore.case = FALSE)
    ## seq_along() gives an empty sequence when there are no files,
    ## whereas 1:length(name) would count 1, 0
    for(k in seq_along(name)){
      log1<-read.table(name[k],header=TRUE,stringsAsFactors=FALSE)
      if(testCondition(log1)){
        cat("Processing ",name[k],"\n")
        processLog(log1)
      }else{
        cat("Skipping ",name[k],"\n")
      }
    }
  }
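One thing the loop above does not cover is the "empty file" condition from the question. read.table stops with an error on a zero-byte file, so the size has to be checked before reading. A minimal sketch of one way to do that; isEmptyFile is a hypothetical helper name, and the temporary files exist only to make the example self-contained:

```r
## Hypothetical helper: TRUE if the file has zero bytes
isEmptyFile <- function(path) {
  file.info(path)$size == 0
}

## Two throwaway files to try it on (not part of the original post)
emptyFile <- tempfile(fileext = ".txt")
fullFile  <- tempfile(fileext = ".txt")
file.create(emptyFile)
writeLines(c("Rep", "useractivity_act"), fullFile)

for (f in c(emptyFile, fullFile)) {
  if (isEmptyFile(f)) {
    cat("Skipping empty file", basename(f), "\n")
    next   # move on to the next file without touching the loop index
  }
  log1 <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
  cat("Read", nrow(log1), "row(s) from", basename(f), "\n")
}
```

Using next keeps the single-loop structure: the for loop itself advances to the following file.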
Then you need two more functions, testCondition and processLog.
testCondition takes a data frame and decides whether you want to
process it or not. I'm not sure I've got the test logic right here,
but you should get the idea:
  testCondition <- function(log1){
    ## test for Rep column:
    if(!any(names(log1)=="Rep")) return(FALSE)
    ## test active/idle count
    nAct = sum(log1$Rep == "useractivity_act")
    nIdle = sum(log1$Rep == "useractivity_idle")
    ## if we have no active or idle, return False
    if(nAct + nIdle == 0) return(FALSE)
    ## if we only have one of either, return False
    if(nAct == 1 || nIdle == 1) return(FALSE)
    ## maybe some other tests here?
    return(TRUE)
  }
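If the intent is to mirror the while() condition in the original question exactly, the test might instead look like the sketch below. This is one reading of the question, not a confirmed specification: a file is skipped when either label is missing, or when each label occurs exactly once. The name testConditionStrict and the small data frames are hypothetical.

```r
## Hypothetical variant mirroring the condition in the question
testConditionStrict <- function(log1) {
  ## no Rep column: nothing to test
  if (!("Rep" %in% names(log1))) return(FALSE)
  nAct  <- sum(log1$Rep == "useractivity_act", na.rm = TRUE)
  nIdle <- sum(log1$Rep == "useractivity_idle", na.rm = TRUE)
  ## skip if either label is missing
  if (nAct == 0 || nIdle == 0) return(FALSE)
  ## skip if each label occurs exactly once
  if (nAct == 1 && nIdle == 1) return(FALSE)
  TRUE
}

## Small hand-made data frames, one per branch
both2   <- data.frame(Rep = rep(c("useractivity_act", "useractivity_idle"), 2),
                      stringsAsFactors = FALSE)
onlyAct <- data.frame(Rep = c("useractivity_act", "useractivity_act"),
                      stringsAsFactors = FALSE)
oneEach <- data.frame(Rep = c("useractivity_act", "useractivity_idle"),
                      stringsAsFactors = FALSE)
noRep   <- data.frame(Other = 1:2)

print(testConditionStrict(both2))    # TRUE: both labels, two of each
print(testConditionStrict(onlyAct))  # FALSE: no idle rows
print(testConditionStrict(oneEach))  # FALSE: exactly one of each
print(testConditionStrict(noRep))    # FALSE: no Rep column
```

Because the function takes a data frame and returns a single logical value, each branch can be checked like this without reading any files.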
Here is a simple processLog function that just prints the summary of
the data frame. Put whatever you want in here:
  processLog <- function(log1){
    ## for example:
    print(summary(log1))
  }
How's that? Note the use of comments and breaking the code up into
small, independent, testable functions.
Barry