[R] parsing files for plot

baptiste auguie baptiste.auguie at googlemail.com
Sat Jan 30 14:43:49 CET 2010


Hi again,

Below are two versions, depending on whether you want to use scan or read.table:

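## with scan
library(plyr)     # llply()
library(reshape)  # melt()
listOfFiles <- list.files()
d <- llply(listOfFiles, scan)  # one numeric vector per file
names(d) <- basename(listOfFiles)

melt(d)  # long format: one row per value, plus a column identifying the file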
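
If you want to go straight to the boxplot, here is a minimal base-R sketch of the same idea. It swaps in lapply() and stack() for llply() and melt() so that it runs without any packages; the temporary directory and the three file names are made up for the example.

```r
# Hypothetical demo directory and file names, created only for this sketch
demo_dir <- file.path(tempdir(), "scan_demo")
dir.create(demo_dir, showWarnings = FALSE)
set.seed(1)
for (nm in paste("datafile", letters[1:3], ".txt", sep = "")) {
  write.table(rnorm(sample(3:20, 1)), file = file.path(demo_dir, nm),
              row.names = FALSE, col.names = FALSE)
}

# Read every file into a named list of numeric vectors
listOfFiles <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
d <- lapply(listOfFiles, scan, quiet = TRUE)
names(d) <- basename(listOfFiles)

# stack() gives a long data frame with columns `values` and `ind`,
# playing the role of melt(d) above
long <- stack(d)

# One box per file
boxplot(values ~ ind, data = long)
```

Restricting list.files() with a pattern also avoids picking up the R script itself if it sits in the same directory.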

## with read.table

listOfFiles <- list.files()
names(listOfFiles) <- basename(listOfFiles)

library(plyr)
ldply(listOfFiles, read.table)
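
As far as I can tell, the reason ldply fails with scan but works with read.table is that ldply has to bind the per-file results into one data frame: numeric vectors of different lengths cannot be stacked as rows, whereas one-column data frames can. Here is a rough base-R sketch of what the read.table version amounts to. The two in-memory "files" and the column name `file` are invented for illustration (I believe plyr labels that column .id).

```r
# Two hypothetical "files" of different lengths, held in memory for the sketch
files <- list(a.txt = c(0.92, 0.61, 0.06), b.txt = c(-0.31, -0.06))

# One data frame per file: the shape read.table returns (a single column V1)
pieces <- lapply(files, function(x) data.frame(V1 = x))

# Row-bind the pieces, then record which file each row came from
combined <- do.call(rbind, pieces)
combined$file <- rep(names(pieces), sapply(pieces, nrow))
rownames(combined) <- NULL
```

The result can be passed to boxplot(V1 ~ file, data = combined).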


Note: I tested this code with the following files:

library(plyr)

## dir.create() is portable, unlike system("mkdir dummy")
dir.create("dummy")
setwd("dummy")

files <- replicate(5, rnorm(sample(3:20, 1)), simplify = FALSE)
names <- paste("datafile", letters[1:5], ".txt", sep = "")

## write each vector to its own file, one value per line
l_ply(seq_along(files),
      function(ii, ...) write.table(x = files[[ii]], file = names[ii], ...),
      row.names = FALSE, col.names = FALSE)

HTH,

baptiste



On 30 January 2010 14:23, Maxim <deeepersound at googlemail.com> wrote:
> Hi,
>
> my data is really not spectacular, each of the 6 files (later several
> hundred) contains correlation coefficients in plain text format like:
>
> 0.923960073
> 0.923960073
> 0.612571344
> 0.064183275
> 0.007733399
> -0.315444372
> -0.064591277
> -0.268336142
> ...........
>
> with between 1000-13000 rows.
>
> Scanning from the directory works, as this script:
>
> comb<-data.frame()
> count<-0
> files <- list.files()  # all files in the working directory
> for(i in files) {
>                   count<-count+1
>
>        tmp <- scan(i)
>        assign(files[count], tmp)
>
>        if (i ==1)
>        comb<-data.frame(dats=c(tmp), index=c(rep(files[1], length(tmp))))
>        else
>        combadd<-data.frame(dats=c(tmp), index=c(rep(files[count],
> length(tmp))))
>        comb<-rbind(comb,combadd)
>
> }
> boxplot(dats ~ index, data = comb)
>
>
> works just great. There are no additional files in the folder. But look how
> much code it takes for such a simple task. I'd definitely prefer the plyr solution.
>
> Maxim
>
>
> 2010/1/30 baptiste auguie <baptiste.auguie at googlemail.com>
>>
>> Why don't you post an example of what your input files look like? (to
>> the list, not just to me!) A reproducible example is always required
>> if you want a good answer.
>>
>> Note that if you are scanning *all* files in the working directory,
>> you may also be scanning the R file containing your instructions which
>> won't have the correct format, obviously.
>>
>> Best,
>>
>> baptiste
>>
>> On 30 January 2010 13:52, Maxim <deeepersound at googlemail.com> wrote:
>> > Hi,
>> >
>> > thanks, that looks much more elegant than what I managed to accomplish
>> > in
>> > meantime:
>> >
>> > count<-1
>> > files <- list.files()  # all files in the working directory
>> > for(i in files) {
>> >
>> >        tmp <- scan(i)
>> >        assign(files[count], tmp)
>> >
>> >        if (i ==1)
>> >        comb<-data.frame(dats=c(tmp), index=c(rep(files[1],
>> > length(tmp))))
>> >        else
>> >        combadd<-data.frame(dats=c(tmp), index=c(rep(files[count],
>> > length(tmp))))
>> >        comb<-rbind(comb,combadd)
>> >
>> >        count<-count+1
>> > }
>> > boxplot(dats ~ index, data = comb)
>> >
>> >
>> > This code works, unfortunately the plots get plotted in a different
>> > order
>> > than expected (appears to be more or less random to me). Why is this?
>> >
>> >
>> > Concerning your code: I get an error like:
>> >
>> > Read 2652 items
>> > Read 3310 items
>> > Read 1096 items
>> > Read 2177 items
>> > Read 11387 items
>> > Read 12503 items
>> > Error in list_to_dataframe(res, attr(.data, "split_labels")) :
>> >   Results are not equal lengths
>> >
>> > hmmh?
>> >
>> > Maxim
>> >
>> >
>> > 2010/1/30 baptiste auguie <baptiste.auguie at googlemail.com>
>> >>
>> >> Hi,
>> >>
>> >> Hadley recently proposed a strategy using plyr for a very similar
>> >> problem,
>> >>
>> >> listOfFiles <- list.files()
>> >> names(listOfFiles) <- basename(listOfFiles)
>> >>
>> >> library(plyr)
>> >> d <- ldply(listOfFiles, scan)
>> >>
>> >> Even if you don't want to use plyr, it's always better to group things
>> >> in a list rather than clutter your workspace with lots of assign()ed
>> >> variables.
>> >>
>> >> HTH,
>> >>
>> >> baptiste
>> >>
>> >>
>> >> On 30 January 2010 13:19, Maxim <deeepersound at googlemail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I have many files containing one column of data. I like to use the
>> >> > scan
>> >> > function to parse the data. Next I like to bind to a large vector.
>> >> > I try this like:
>> >> >
>> >> > count<-1
>> >> > files <- list.files()  # all files in the working directory
>> >> > for(i in files) {
>> >> >
>> >> >       tmp <- scan(i)
>> >> >       assign(files[count], tmp)
>> >> >      count<-count+1
>> >> > }
>> >> >
>> >> > This part works!
>> >> >
>> >> > Now I like to plot the data in a boxplot.
>> >> >
>> >> > Usually I do this from individual vectors like:
>> >> >
>> >> > comb <- data.frame(dat = c(vector1, vector2 ......), ind =
>> >> > c(rep('vector1',
>> >> > length(vector1)).......))
>> >> > boxplot(dat ~ ind, data = comb)
>> >> >
>> >> > But how do I do this in a loop?
>> >> >
>> >> > I know the vector names (according to the filenames in the working
>> >> > directory), but I do not know how to access them in my R code after
>> >> > having assigned the names.
>> >> >
>> >> > I guess the "lapply" or "dply" from the plyr library can do this, but
>> >> > I
>> >> > seem
>> >> > not to be able to do it.
>> >> >
>> >> > Is there a way to do this?
>> >> >
>> >> > gma
>> >> >
>> >> >        [[alternative HTML version deleted]]
>> >> >
>> >> > ______________________________________________
>> >> > R-help at r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide
>> >> > http://www.R-project.org/posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible code.
>> >> >
>> >
>> >
>
>


