[R] how to ignore NA with "NA" or "NULL"
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Wed Jun 6 16:22:42 CEST 2012
Still not clear what solution you would consider a success. On the one hand, you said you needed the NULLs, but you want one big data frame also.
Does
refill <- refill[ -which( sapply( refill, is.null ), arr.ind=TRUE ) ) ]
refill <- as.data.frame( refill )
do what you want? If you need to keep the nulls, perhaps don't overwrite the refill list?
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
jeff6868 <geoffrey_klein at etu.u-bourgogne.fr> wrote:
>Ok Jeff, but then it'll be a big one. I'm working on a list of files
>and my
>problem depends on different functions used previously. So it's very
>hard
>for me to summarize to reproduct my error. But here is the
>reproductible
>example with the error at the last line of the code (just copy and
>paste
>it).
>You'll notice that the data.frame with only NAs is set to NULL in
>"refill",
>and I just want to have it unchanged in output (so the same as input).
>The aim of the function is to fill the NAs of my data.frames. It'll not
>work
>in this example because there're only big NA gaps which are my problem
>for
>the moment. But maybe now you can have an idea where the problem is
>(change
>NULL for "only NA DF" in output to the same DF as in input).
>For the example, we are just testing for "x1".
>Hope you have understood my problem now :)
>Thanks Jeff, Rui or everyone else!
>
># my data for example
>DF1 <- data.frame(x1=rnorm(1:20),x2=c(31:50))
>write.table(DF1,"ST001_2008.csv",sep=";")
>DF2 <-
>data.frame(x1=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,rnorm(1:10)),x2=c(1:20))
>write.table(DF2,"ST002_2008.csv",sep=";")
>DF3 <- data.frame(x1=rnorm(81:100),x2=NA)
>write.table(DF3,"ST003_2008.csv",sep=";")
>DF4 <- data.frame(x1=c(21:40),x2=rnorm(1:20))
>write.table(DF4,"ST004_2008.csv",sep=";")
>
> #list my data
> filenames <- list.files(pattern="\\_2008.csv$")
>
> Sensors <- paste("x", 1:2,sep="")
>
> Stations <-substr(filenames,1,5)
>
> nsensors <- length(Sensors)
> nstations <- length(Stations)
>
> nobs <- nrow(read.table(filenames[1], header=TRUE))
>
> yr2008 <- array(NA, dim=c(nobs, nsensors, nstations))
>
> for(i in seq_len(nstations)){
> tmp <- read.table(filenames[i], header=TRUE, sep=";")
> yr2008[ , , i] <- as.matrix(tmp[, Sensors])
> }
>
> dimnames(yr2008) <- list(seq.int(nobs), Sensors, Stations)
>
> yr2008capt1hiver<-yr2008[1:10,1,]
> yr2008capt1hiver <- as.data.frame(yr2008capt1hiver)
>
> #correlation between my data for x1 (for the example)
> corhiver2008capt1 <- cor(yr2008capt1hiver,use="pairwise.complete.obs")
>
> capt1hiver <- c(1:length(yr2008capt1hiver))
>
> for(i in 1:length(capt1hiver))
> {
>
>if(sum(!is.na(yr2008capt1hiver[,capt1hiver[i]]))<(length(yr2008capt1hiver[[capt1hiver[i]]])/2))
> {
> corhiver2008capt1[i,]=NA
> corhiver2008capt1[,i]=NA
> }
> }
>
>
> lst <- lapply(list.files(pattern="\\_2008.csv$"), read.table,sep=";",
>header=TRUE, stringsAsFactors=FALSE)
> names(lst) <- Stations
>
> # searching the highest correlation for each data.Frame
> get.max.cor <- function(station, mat){
> mat[row(mat) == col(mat)] <- -Inf
> m <- max(mat[station, ],na.rm=TRUE)
> if (is.finite(m)) {return(which( mat[station, ] == m ))}
> else {return(NA)}
> }
>
> # fill the data.frame with the data.frame which has the highest
>correlation coefficient
> na.fill <- function(x, y){
> if(all(!is.finite(y[1:10,1]))) return(y)
> i <- is.na(x[1:10,1])
> xx <- y[1:10,1]
> new <- data.frame(xx=xx)
> x[1:10,1][i] <- predict(lm(x[1:10,1]~xx, na.action=na.exclude),new)[i]
> x
> }
>
> process.all <- function(df.list, mat){
>
> f <- function(station)
> na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
>
> g <- function(station){
> x <- df.list[[station]]
> if(any(!is.finite(x[1:10,1]))){
> mat[row(mat) == col(mat)] <- -Inf
> nas <- which(is.na(x[1:10,1]))
> ord <- order(mat[station, ], decreasing = TRUE)[-c(1,
>ncol(mat))]
> for(y in ord){
> if(all(!is.na(df.list[[y]][1:10,1][nas]))){
> xx <- df.list[[y]][1:10,1]
> new <- data.frame(xx=xx)
> x[1:10,1][nas] <- predict(lm(x[1:10,1]~xx,
>na.action=na.exclude), new)[nas]
> break
> }
> }
> }
> x
> }
>
> n <- length(df.list)
> nms <- names(df.list)
> max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
> df.list <- lapply(seq.int(n), f)
> df.list <- lapply(seq.int(n), g)
> names(df.list) <- nms
> df.list
> }
>
> refill <- process.all(lst, corhiver2008capt1)
>refill <- as.data.frame(refill)
>
>########## HERE IS THE PROBLEM ######
> head(refill)
>
>--
>View this message in context:
>http://r.789695.n4.nabble.com/how-to-ignore-NA-with-NA-or-NULL-tp4632287p4632527.html
>Sent from the R help mailing list archive at Nabble.com.
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list