[R] select part of files from a list.files

jeff6868 geoffrey_klein at etu.u-bourgogne.fr
Mon May 21 17:32:59 CEST 2012


Hi everyone.

I'm working with a set of about 50 files, which I've listed with the
function list.files().
Each file contains 35000 lines of data. The files may also contain
missing values (NA), sometimes up to 10,000 consecutive NAs.
The aim is to compute correlation matrices between these files (I already
have the script). But because missing values are frequent, the script doesn't
yet work for all my files.

In this thread, I would like to select part of the data in these files
before computing the correlations.
From the list of files I've created, I would like to take only the first 9000
lines of each file (myfiles[1:9000,1]), and then keep in my list only the
files that contain at least 1000 non-NA values (i.e. numeric data) within
those 9000 lines.

I would then like to apply my script to this reduced list of files, i.e. the
files with at least 1000 numeric values in their first 9000 lines.
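
To make the request concrete, here is a minimal sketch of the file-based
selection described above. It is only an illustration under assumptions: the
helper name has_enough_data is hypothetical, the files are assumed to be
whitespace-delimited with a header line, and the value of interest is assumed
to be in the first column. The demonstration writes two small temporary files
and uses the reduced thresholds (5 lines, 2 values) instead of 9000 and 1000.

```r
# Hypothetical helper: TRUE when the first column of a file has at least
# `min_valid` non-NA values within its first `n_rows` data rows.
has_enough_data <- function(path, n_rows = 9000, min_valid = 1000) {
  # Assumption: whitespace-delimited file with a header line; adjust the
  # read.table() arguments to match the real file layout.
  x <- read.table(path, header = TRUE, nrows = n_rows, na.strings = "NA")
  sum(!is.na(x[[1]])) >= min_valid
}

# Small self-contained demonstration with temporary files.
demo_dir <- tempfile("stations_")
dir.create(demo_dir)
write.table(data.frame(a = 1:10),
            file.path(demo_dir, "ST1.txt"),
            row.names = FALSE, quote = FALSE)
write.table(data.frame(b = c(rep(NA, 5), 6:10)),
            file.path(demo_dir, "ST2.txt"),
            row.names = FALSE, quote = FALSE)

# List the files, test each one, and keep only those that pass.
myfiles <- list.files(demo_dir, full.names = TRUE)
keep <- vapply(myfiles, has_enough_data, logical(1),
               n_rows = 5, min_valid = 2)
selected_files <- myfiles[keep]
basename(selected_files)
```

With the real data, the same call with n_rows = 9000 and min_valid = 1000
would give the reduced vector of file names to pass on to the correlation
script.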

I've created some simple data.frames as an example. Could someone explain
how I can do this easily? (For these fake data.frames, the rule would be at
least 2 non-NA values in the first 5 lines.)
Thank you very much!

ST1 <- data.frame(a=1:10)
ST2 <- data.frame(b=c(NA,NA,NA,NA,NA,6:10))
ST3 <- data.frame(c=c(1,NA,NA,4:10))
ST4 <- data.frame(d=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
ST5 <- data.frame(e=c(1,2,3,4,NA,NA,7:9,NA))

(In this example, the aim is to keep only ST1, ST3 and ST5 in the list of
files, because each contains at least 2 non-NA values in the first 5 lines,
and to remove ST2 and ST4, because both contain too many NAs in the first
5 lines.) I hope that's clear. Thanks again!
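
For the fake data.frames, the rule can be sketched as follows. This is only
one possible approach, assuming the data.frames are collected in a named list
and that the first column holds the values; the names stations, has_enough
and kept are hypothetical.

```r
# The example data.frames from this message.
ST1 <- data.frame(a = 1:10)
ST2 <- data.frame(b = c(NA, NA, NA, NA, NA, 6:10))
ST3 <- data.frame(c = c(1, NA, NA, 4:10))
ST4 <- data.frame(d = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA))
ST5 <- data.frame(e = c(1, 2, 3, 4, NA, NA, 7:9, NA))

# Collect them in a named list so they can be filtered together.
stations <- list(ST1 = ST1, ST2 = ST2, ST3 = ST3, ST4 = ST4, ST5 = ST5)

# TRUE when the first column has at least `min_valid` non-NA values
# within its first `n_rows` rows.
has_enough <- function(df, n_rows = 5, min_valid = 2) {
  sum(!is.na(head(df[[1]], n_rows))) >= min_valid
}

# Keep only the data.frames that satisfy the rule.
kept <- Filter(has_enough, stations)
names(kept)
# [1] "ST1" "ST3" "ST5"
```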




--
View this message in context: http://r.789695.n4.nabble.com/select-part-of-files-from-a-list-files-tp4630769.html
Sent from the R help mailing list archive at Nabble.com.


