[R] merge a list of data frames

PIKAL Petr petr.pikal at precheza.cz
Thu Sep 6 18:27:22 CEST 2012


Hi

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Sam Steingold
> Sent: Thursday, September 06, 2012 3:43 PM
> To: David Winsemius
> Cc: r-help at r-project.org
> Subject: Re: [R] merge a list of data frames
> 
> > * David Winsemius <qjvafrzvhf at pbzpnfg.arg> [2012-09-05 21:02:16 -
> 0700]:
> >
> > On Sep 5, 2012, at 8:51 PM, Sam Steingold wrote:
> >
> >> I have a list of data frames:
> >>
> >>> str(data)
> >> List of 4
> >> $ :'data.frame':	700773 obs. of  3 variables:
> >>  ..$ V1: chr [1:700773] "200130446465779" "200070050127778"
> >> "200030633708779" "200010587002779" ...
> >>  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
> >>  ..$ V3: num [1:700773] 1 1 1 1 1 ...
> >> $ :'data.frame':	700773 obs. of  3 variables:
> >>  ..$ V1: chr [1:700773] "200130446465779" "200070050127778"
> >> "200030633708779" "200010587002779" ...
> >>  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
> >>  ..$ V3: num [1:700773] 1 1 1 1 1 ...
> >> $ :'data.frame':	700773 obs. of  3 variables:
> >>  ..$ V1: chr [1:700773] "200130446465779" "200070050127778"
> >> "200030633708779" "200010587002779" ...
> >>  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
> >>  ..$ V3: num [1:700773] 1 1 1 1 1 ...
> >> $ :'data.frame':	700773 obs. of  3 variables:
> >>  ..$ V1: chr [1:700773] "200160325893778" "200130647544079"
> >> "200130446465779" "200120186959078" ...
> >>  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
> >>  ..$ V3: num [1:700773] 1 1 1 1 1 1 1 1 1 1 ...
> >>
> >> I want to merge them.
> >
> > Why? What are you expecting?
> 
> these are the results of applying a model to the test data.
> the first column is the ID
> the second column is the actual value
> the third column is the model score
> 
> after I merge the frames, I will
> 1. check that all the V2 columns are identical and drop all but one (I
> guess I could just merge on c("V1","V2") instead, right?)

colSums(apply(do.call(cbind,lapply(data, "[", "V2")),1,diff)!=0)

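This should give you a vector of zeros if the V2 columns do not differ. Note that it compares rows by position, so it assumes the rows are in the same order in every data frame.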
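
As a minimal sketch of the same check on toy data (the names toy, aligned and v2 are made up for illustration and do not come from the original post), assuming every data frame holds the same set of IDs in V1, one could sort each frame by V1 first and then compare the V2 columns:

```r
# Toy data (hypothetical): three score frames with the same IDs,
# the last one stored in a different row order
set.seed(1)
ids <- c("a", "b", "c", "d")
actual <- c(0L, 1L, 0L, 1L)
toy <- lapply(1:3, function(i) {
  data.frame(V1 = ids, V2 = actual, V3 = runif(4), stringsAsFactors = FALSE)
})
toy[[3]] <- toy[[3]][c(3, 1, 4, 2), ]

# Sort every frame by the ID column so that rows line up by position
aligned <- lapply(toy, function(d) d[order(d$V1), ])

# Check that the IDs really match after sorting
ids_identical <- all(sapply(aligned, function(d) identical(d$V1, aligned[[1]]$V1)))

# Matrix with one V2 column per data frame
v2 <- sapply(aligned, "[[", "V2")

# TRUE when every frame agrees with the first one in every row
v2_identical <- all(v2 == v2[, 1])
```

If ids_identical is FALSE, the frames do not cover the same IDs and a real merge on V1 would be the safer route.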

> 
> 2. compute the sum (or the mean, whatever is easier) of all the V3
> columns

sapply(lapply(data, "[[", "V3"), sum)
sapply(lapply(data, "[[", "V3"), mean)

These should give you a vector of sums or means, one value per data frame. Sorting them is straightforward.
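
If what is wanted is one combined score per ID rather than one number per data frame, a rough sketch on toy data could look like the following. The object names (toy, aligned, scores, combined, ranked) are invented for illustration, and the sketch assumes all frames contain the same IDs in V1:

```r
# Toy data (hypothetical): three score frames with the same IDs
set.seed(2)
ids <- c("a", "b", "c", "d")
toy <- lapply(1:3, function(i) {
  data.frame(V1 = ids, V2 = c(0L, 1L, 0L, 1L), V3 = runif(4),
             stringsAsFactors = FALSE)
})

# One sum and one mean of V3 per data frame
frame_sums <- sapply(toy, function(d) sum(d$V3))
frame_means <- sapply(toy, function(d) mean(d$V3))

# Per-ID combined score: sort each frame by V1, then average V3 across frames
aligned <- lapply(toy, function(d) d[order(d$V1), ])
scores <- sapply(aligned, "[[", "V3")  # one column per frame
combined <- data.frame(V1 = aligned[[1]]$V1,
                       V2 = aligned[[1]]$V2,
                       score = rowMeans(scores),
                       stringsAsFactors = FALSE)

# Highest combined score first
ranked <- combined[order(combined$score, decreasing = TRUE), ]
```

Replacing rowMeans with rowSums gives the sum instead of the mean; the ordering of the IDs is the same either way.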

The most tedious part of my response was preparing toy data. So next time, please be so kind as to provide a small sample of your data in a reproducible form, e.g.

dput(lapply(data, head))
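
A small sketch of what that gives, using a made-up list called toy (not the poster's data): the text printed by dput can be pasted into a message, and the reader can turn it back into the same object.

```r
# Toy list of data frames (hypothetical values)
toy <- list(
  data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 1L, 0L), V3 = c(0.5, 0.25, 1),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 1L, 0L), V3 = c(0.75, 0.5, 0),
             stringsAsFactors = FALSE)
)

# Keep only the first two rows of each data frame
small <- lapply(toy, head, n = 2)

# dput() prints R code that recreates the object; capture it as text here
sample_text <- capture.output(dput(small))

# A reader can rebuild the object by evaluating that text
rebuilt <- eval(parse(text = sample_text))
```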

Regards
Petr

> 3. sort by the sum/mean of the V3 columns and evaluate the combined
> model using the lift quality metric
> (http://dl.acm.org/citation.cfm?id=380995.381018)
> 
> I have many more score files (not just 4), so it is not practical for
> me to rename the column to something unique.
> 
> 
> 
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X
> 11.0.11103000 http://www.childpsy.net/ http://www.memritv.org
> http://truepeace.org http://jihadwatch.org http://mideasttruth.com
> http://americancensorship.org To be popular with ladies one has to be
> smart, handsome & rich. Or to be a cat.
> 
