[R] How to join data.frames and vectors of different length, in an intelligent way?

Chuck Cleland ccleland at optonline.net
Tue Jun 10 16:24:43 CEST 2008


   You could put the group averages back into dafSamp using ave():

dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),
                  c(117,73,92,113,80,78,98,106,99)))

dafSamp$Ay <- ave(dafSamp$X2, dafSamp$X1, FUN=mean)

dafSamp$vecAA <- dafSamp$X2 * (dafSamp$Ay / mean(dafSamp$X2))

dafSamp
     X1  X2       Ay     vecAA
1 1972 117 117.0000 143.92640
2 1984  73  89.5000  68.69334
3 1969  92  92.0000  88.99065
4 1976 113 103.3333 122.76869
5 1999  80  80.0000  67.28972
6 1996  78  78.0000  63.96729
7 1976  98 103.3333 106.47196
8 1984 106  89.5000  99.74650
9 1976  99 103.3333 107.55841
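
As a cross-check, the same per-year means can also be attached with an explicit join, which is closer to what the subject line asks for. This is only a sketch: it assumes the column names X1 and X2 shown above, and the names yearMeans, joined and rowid are made up for illustration. Since merge() sorts the result by the key by default, a row id is carried along to restore the original order.

```r
dafSamp <- data.frame(X1 = c(1972,1984,1969,1976,1999,1996,1976,1984,1976),
                      X2 = c(117,73,92,113,80,78,98,106,99))

# One row per year holding that year's mean
yearMeans <- aggregate(X2 ~ X1, data = dafSamp, FUN = mean)
names(yearMeans)[2] <- "Ay"

# Remember the original row order, because merge() sorts by the key
dafSamp$rowid <- seq_len(nrow(dafSamp))
joined <- merge(dafSamp, yearMeans, by = "X1")
joined <- joined[order(joined$rowid), ]

# Same scaling as with ave(): value * (year mean / global mean)
joined$vecAA <- joined$X2 * (joined$Ay / mean(joined$X2))
```

For a single derived column ave() is shorter, but the merge() form may be handier when several per-year summaries have to be attached at once.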

?ave
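
If you would rather keep the aggregate() table from your own code, a lookup with match() is one way to bring it back to the length of the data. The attempt quoted below returns NAs because == compares the two vectors element by element (recycling the shorter one) instead of looking each year up in the table. The following is a minimal sketch, assuming the column names X1 and X2 and the default Group.1 and x names that aggregate() produces; idx is just an illustrative name.

```r
dafSamp <- data.frame(X1 = c(1972,1984,1969,1976,1999,1996,1976,1984,1976),
                      X2 = c(117,73,92,113,80,78,98,106,99))

Ag <- mean(dafSamp$X2)
Ay <- aggregate(x = dafSamp$X2, by = list(dafSamp$X1), FUN = mean)

# For each row of dafSamp, find the row of Ay that holds its year
idx <- match(dafSamp$X1, Ay$Group.1)

# Ay$x[idx] now has one year mean per sample, so the lengths agree
vecAA <- dafSamp$X2 * Ay$x[idx] / Ag
```

The result should agree with the vecAA column computed with ave() above.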

On 6/10/2008 9:05 AM, Hvidberg, Martin wrote:
> I have a data set something like this:
>
> "YYYY", "Value"
> 1972 , 117
> 1984 , 73
> 1969 , 92
> 1976 , 113
> 1999 , 80
> 1996 , 78
> 1976 , 98
> 1984 , 106
> 1976 , 99
>
> it could be created with:
>
>> dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),c(117,73,92,113,80,78,98,106,99)))
>
> The real dataset is of course much larger, approx. 100,000 samples.
>
> I need to adjust each value to remove any tendency of some years generally having higher values and others lower, since this is an unwanted artifact of different measuring traditions.
>
> My plan is to generate an average for each year, Ay, as well as a global average, Ag. Then each value should be multiplied by Ay/Ag.
>
> I can make the averages like this:
>
>> Ag <- mean(dafSamp[,2])
>> Ag
> [1] 95.11111
>
>> Ay <- aggregate(x=dafSamp[,2], by=list(dafSamp[,1]), FUN='mean')
>> Ay
>   Group.1        x
> 1    1969  92.0000
> 2    1972 117.0000
> 3    1976 103.3333
> 4    1984  89.5000
> 5    1996  78.0000
> 6    1999  80.0000
>
> To see how many samples from each year I could write:
>
>> Cy <- aggregate(x=dafSamp[,2], by=list(dafSamp[,1]), FUN='length')
>> Cy
>   Group.1 x
> 1    1969 1
> 2    1972 1
> 3    1976 3
> 4    1984 2
> 5    1996 1
> 6    1999 1
>
> I would like to create a new vector with the adjusted values (dafSamp[,2] * Ay (for the relevant year) / Ag).
>
> I tried to write:
>
> vecAA <- dafSamp[,2] *  Ay[which(Ay[,1]==dafSamp[,1]),2] / Ag
>
> but the result is all NAs :-( I might have seen that coming, since the vectors are not the same length...
>
> Question: How do I go about making such a calculation?
>
> :-) Martin Hvidberg
>
> Here is the code in full, if you want to try it...
>
> dafSamp <- data.frame(cbind(c(1972,1984,1969,1976,1999,1996,1976,1984,1976),c(117,73,92,113,80,78,98,106,99)))
> Ag <- mean(dafSamp[,2])
> Ag
> Ay <- aggregate(x=dafSamp[,2], by=list(dafSamp[,1]), FUN='mean')
> Ay
> Cy <- aggregate(x=dafSamp[,2], by=list(dafSamp[,1]), FUN='length')
> Cy
> vecAA <- dafSamp[,2] *  Ay[which(Ay[,1]==dafSamp[,1]),2] / Ag
>
> University of Aarhus <http://www.au.dk/en>
> Danmarks Miljøundersøgelser <http://www.dmu.dk/>
> Hvidberg, Martin <http://www2.dmu.dk/1_Om_DMU/2_medarbejdere/cv/employee2_NH.asp?PersonID=MHV>
> Senior Geographer (Climatology, Spatial modeling) <http://www.geogr.ku.dk/>
> N 55°41m43.48s E 12°06m05.13s ETRS89
> National Environmental Research Inst. <http://www.dmu.dk/International/>
> P.O. Box 358
> Frederiksborgvej 399
> DK-4000 Roskilde
> Martin.Hvidberg at dmu.dk
> www.dmu.dk/AtmosphericEnvironment/
> tel: +45 46 30 11 55
> fax: +45 46 30 12 14

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


