[R] Extracting arithmetic mean for specific values from multiple .txt-files

Rui Barradas ruipbarradas at sapo.pt
Mon Jul 9 20:44:05 CEST 2012


Hello,

There must be a difference between the file you are processing and the one 
Excel and I are processing:


 > fun <- function(x){
+ dat <- read.table(x, skip=14)
+ dat[ , 8] <- as.numeric(gsub("\\.", "", dat[, 8]))
+ mean(dat[, 8])
+ }
 >
 > sapply(list.files(pattern="XYZ.*\\.txt"), fun)
XYZ_34.txt
   345210.4

This result is even better than Excel's, because it is more accurate.
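To make the gsub step concrete, here is a minimal sketch on a few made-up 
strings (hypothetical values, not taken from your file). It assumes the dots 
are thousands separators, so removing them gives the intended number:

```r
# Hypothetical values written with "." as a thousands separator
x <- c("1.200.995", "247.102", "98.431")

# "\\." matches a literal dot; gsub removes every occurrence
cleaned <- as.numeric(gsub("\\.", "", x))
cleaned
mean(cleaned)
```

The backslashes are needed because an unescaped "." in a regular expression 
matches any character.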

As for the second question: because of the dots, R reads those values as 
character, and when they are put into the data.frame they are converted to 
factors, the name R gives to categorical variables. You can see this with the 
following instruction, right after the read.table:

str(dat)

 > str(dat)
'data.frame':   151 obs. of  8 variables:
  $ V1: int  2 2 2 2 2 2 2 2 2 2 ...
  $ V2: int  1 2 3 4 5 6 7 8 9 10 ...
  $ V3: int  1 2 3 4 5 6 7 8 9 10 ...
  $ V4: int  3 2 4 3 3 1 3 1 3 2 ...
  $ V5: int  27 16 16 27 27 27 27 27 27 16 ...
  $ V6: int  0 16 16 16 27 27 27 27 16 16 ...
  $ V7: int  6 1 1 2 1 1 1 1 2 1 ...
  $ V8: Factor w/ 151 levels "1.200.995","247.102",..: 1 139 135 39 133 73 142 63 77 67 ...

V8 is the column we want. The values actually stored are the integer codes 
1, 139, 135, 39, etc. The levels are the category labels (the original 
strings, such as "1.200.995"); the categories themselves are the 1-based 
integer codes.
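
The difference between codes and labels can be seen in a small sketch with 
made-up strings (hypothetical, not your data). Note that in R 4.0.0 and later 
read.table no longer creates factors by default, so there you would have to 
pass stringsAsFactors = TRUE to get the str() output shown above.

```r
# Hypothetical strings, turned into a factor by hand
f <- factor(c("98.431", "1.200.995", "247.102"))

levels(f)       # the labels, sorted as character strings
as.integer(f)   # the internal 1-based codes, not the measured values

# Go through the labels to recover the numbers
values <- as.numeric(gsub("\\.", "", as.character(f)))
values
```

gsub coerces its input with as.character, which is why the function above 
works on the labels and not on the codes.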

Anyway, what's important is that the code is working, and if there's an 
error maybe it can be solved with this modification:


fun <- function(x, skip = 14){
	dat <- read.table(x, skip=skip)

And the rest is the same. Inspect the file and see if the data starts at 
line 15.
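
For reference, a sketch of the complete modified function, with a way to 
check where the data start. The temporary file and its contents are 
hypothetical, made up only so the example runs on its own:

```r
fun <- function(x, skip = 14){
  dat <- read.table(x, skip = skip)
  # remove the dots from column 8 and convert it to numeric
  dat[, 8] <- as.numeric(gsub("\\.", "", dat[, 8]))
  mean(dat[, 8])
}

# Hypothetical demo file: 14 header lines followed by two data rows
tmp <- tempfile(fileext = ".txt")
writeLines(c(paste("header line", 1:14),
             "2 1 1 3 27 0 6 1.200.995",
             "2 2 2 2 16 16 1 247.102"), tmp)

# Look at lines 15 and 16 to check that the data really start at line 15
readLines(tmp, n = 16)[15:16]

res <- fun(tmp)   # mean of 1200995 and 247102
res
```

If the data in one of your files start at another line, call fun with a 
different skip value, for instance fun(tmp, skip = 12).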

(And please, Rui is enough, NO 'Mr.')

Hope this helps,

Rui Barradas

Em 09-07-2012 14:54, vimmster escreveu:
> Dear Mr. Barradas,
>
> your solution comes very close to what I want.
>
> But I have two questions left:
>
>
> First question: If "R" computes the mean for the reaction times of test
> subject 34 (the example I provided above), it says "310112.0", but if I use
> the "mean"-function in Excel it says "345.210". Apart from the dots in the
> column of interest (which you mentioned before), the mean is obviously not
> the same. Do you have any idea why?
>
> Second question: Why are the dots in the column of interest problematic?
>
> Kind regards
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Extracting-arithmetic-mean-for-specific-values-from-multiple-txt-files-tp4635809p4635854.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


