[R] Calculating mean for a number of columns

David Winsemius dwinsemius at comcast.net
Wed Feb 10 19:28:42 CET 2010


On Feb 10, 2010, at 1:18 PM, Steve Murray wrote:

>
> Dear all,
>
> I am attempting to perform what should be a relatively simple  
> calculation on a number of data frame columns. I am hoping to find  
> the average on a per-row basis for each of the 50 columns. If on a  
> particular row a 'NA' value is encountered, then this should be  
> ignored and the mean formed on the basis of the other rows.
>
> However, I'm finding that the end result for each row is identical  
> (when it shouldn't be). I suspect I'm making a syntax error, but  
> can't seem to spot it...
>
>> PDSI_Jan  <- cbind(pdsi_195101[,1:2], mean(c(pdsi_195101[,3],  
>> pdsi_195201[,3], pdsi_195301[,3], pdsi_195401[,3], pdsi_195501[,3],  
>> pdsi_195601[,3], pdsi_195701[,3], pdsi_195801[,3], pdsi_195801[,3],  
>> pdsi_195901[,3], pdsi_196001[,3], pdsi_196101[,3], pdsi_196201[,3],  
>> pdsi_196301[,3], pdsi_196401[,3], pdsi_196501[,3], pdsi_196601[,3],  
>> pdsi_196701[,3], pdsi_196801[,3], pdsi_196901[,3], pdsi_197001[,3],  
>> pdsi_197101[,3], pdsi_197201[,3], pdsi_197301[,3], pdsi_197401[,3],  
>> pdsi_197501[,3], pdsi_197601[,3], pdsi_197701[,3], pdsi_197801[,3],  
>> pdsi_197901[,3], pdsi_198001[,3], pdsi_198101[,3], pdsi_198201[,3],  
>> pdsi_198301[,3], pdsi_198401[,3], pdsi_198501[,3], pdsi_198601[,3],  
>> pdsi_198701[,3], pdsi_198801[,3], pdsi_198901[,3], pdsi_199001[,3],  
>> pdsi_199101[,3], pdsi_199201[,3], pdsi_199301[,3], pdsi_199401[,3],  
>> pdsi_199501[,3], pdsi_199601[,3], pdsi_199701[,3], pdsi_199801[,3],  
>> pdsi_199901[,3], pdsi_200001[,3])), na.rm=TRUE)
>

It's not a syntax error (the parser would have spotted that). It's a  
semantic error (which requires an inter-aural parser). By wrapping  
everything in c(), you have taken the mean of one extremely long vector,  
and that single value is then recycled to fill in each of the rows. Note  
also that na.rm=TRUE sits outside the closing parenthesis of mean(), so  
it is being passed to cbind() rather than to mean(). You probably want  
to use lapply on a list or sapply on a vector, with na.rm=TRUE supplied  
as a subsidiary argument to mean.
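
One possible way to get a per-row mean is sketched below. This is only an
illustration under stated assumptions, not tested against the real data: the
three small data frames are made-up stand-ins for pdsi_195101 ... pdsi_200001,
the name `yearly` is hypothetical, and every data frame is assumed to list the
same grid cells in the same row order. It uses sapply to pull column 3 from
each data frame into a matrix, then rowMeans with na.rm=TRUE.

```r
# Toy stand-ins for the yearly data frames (made-up values; the real
# objects each have 2756 rows, with PDSI in column 3).
pdsi_195101 <- data.frame(Latitude  = c("-48.75", "-51.25", "-53.75"),
                          Longitude = factor(rep("-178.75", 3)),
                          PDSI      = c(4.70, -1.94, NA))
pdsi_195201 <- data.frame(Latitude  = c("-48.75", "-51.25", "-53.75"),
                          Longitude = factor(rep("-178.75", 3)),
                          PDSI      = c(2.30, NA, NA))
pdsi_195301 <- data.frame(Latitude  = c("-48.75", "-51.25", "-53.75"),
                          Longitude = factor(rep("-178.75", 3)),
                          PDSI      = c(-1.00, 0.06, NA))

# Collect the data frames in a list. With the real data the list could be
# built from the object names, e.g. mget(sprintf("pdsi_%d01", 1951:2000)).
yearly <- list(pdsi_195101, pdsi_195201, pdsi_195301)

# One column per year, one row per grid cell.
pdsi_mat <- sapply(yearly, function(d) d[, 3])

# Row-wise mean; NA values are dropped within each row.
PDSI_Jan <- cbind(pdsi_195101[, 1:2],
                  PDSI = rowMeans(pdsi_mat, na.rm = TRUE))
print(PDSI_Jan)
```

With these toy values the first two row means come out as 2 and -0.94. The
third row has no non-missing values, so rowMeans returns NaN there rather
than NA.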
>
> The object structure for each of the data frames being used is as  
> follows:
>
>> str(pdsi_195101)
> 'data.frame':    2756 obs. of  3 variables:
>  $ Latitude : chr  "-48.75" "-51.25" "-53.75" "-48.75" ...
>  $ Longitude: Factor w/ 144 levels "-178.75","-176.25",..: 1 1 1 2 3 3 4 6 6 6 ...
>  $ PDSI     : num  4.7 -1.94 -1.29 -0.68 -0.66 -0.49 -0.51 2.52 3.68 4.17 ...
>
>
> Many thanks for any help offered,
>
> Steve


David Winsemius, MD
Heritage Laboratories
West Hartford, CT


