[R] how to combine data of several csv-files

Mon Jul 30 17:24:17 CEST 2007

OK, I missed the grouping factor.
Try this.
You can modify "my.factor" to fit your needs.
As for avoiding the list, I cannot help, sorry.
I use lists only when I have to collect objects of different classes.

v1 <- NA
v2 <- rnorm(6)
v3 <- rnorm(6)
v4 <- rnorm(6)
v5 <- rnorm(6)
v6 <- rnorm(6)
v7 <- rnorm(6)
v8 <- NA

df.my <- cbind.data.frame(v1, v2, v3, v4, v5, v6, v7, v8)
(df.my2 <- reshape(df.my,
                  varying=list(c("v1","v2","v3", "v4","v5","v6","v7","v8")),
                  idvar="sequential",
                  timevar="cat",
                  direction="long"
        ))
my.factor <- factor(
                    ifelse(is.na(df.my2$v1), "not.considered",
                           ifelse(df.my2$cat %in% 2:4, "cat1", "cat2")
                           )
                    )
df.my3 <- cbind(df.my2, Correct.Cat =my.factor)
aggregate(df.my3$v1, by=list(category=df.my3$Correct.Cat), mean)
aggregate(df.my3$v1, by=list(category=df.my3$Correct.Cat),
          function(x){sd(x, na.rm = TRUE)})
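
The aggregate() calls above give one overall value per category. If what is
wanted is one mean (and sd) per row within each category, the following is a
minimal sketch that might be closer. It assumes the wide data frame df.my and
a vector "categ" holding the category of each column, as in the test case
quoted below; the names "cols", "row.means" and "row.sds" and the set.seed()
call are only for illustration. Note that in R, mean(a, b, c) treats only "a"
as data (the other arguments are taken as trim and na.rm), so the values have
to be wrapped as mean(c(a, b, c)) or handled with rowMeans().

```r
set.seed(1)  # assumption: only here to make the sketch reproducible
df.my <- data.frame(v1 = NA, v2 = rnorm(6), v3 = rnorm(6), v4 = rnorm(6),
                    v5 = rnorm(6), v6 = rnorm(6), v7 = rnorm(6), v8 = NA)

# category of each column, as given in the test case
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

# column indices belonging to each category; columns with NA category are dropped
cols <- split(seq_along(categ), categ)

# result: one row per element (1..6), one column per category
row.means <- sapply(cols, function(idx) rowMeans(df.my[, idx, drop = FALSE]))
row.sds   <- sapply(cols, function(idx) apply(df.my[, idx, drop = FALSE], 1, sd))

row.means
row.sds
```

Here row.means[1, "cat1"] should equal mean(c(df.my$v2[1], df.my$v3[1],
df.my$v4[1])), and likewise for the other rows and for "cat2".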

Antje wrote:
> Hello,
>
> thank you for your help. But I guess, it's still not what I want... 
> printing df.my gives me
>
> df.my
>   v1         v2          v3          v4         v5          v6           v7 v8
> 1 NA -0.6442149  0.02354036 -1.40362589 -1.1829260  1.17099178 -0.046778203 NA
> 2 NA -0.2047012 -1.36186952  0.13045724  2.1411553  0.49248118 -0.233788840 NA
> 3 NA -1.1986041 -0.42197792 -0.84651458 -0.1327081 -0.18690065  0.443908897 NA
> 4 NA -0.2097442  1.50445971  1.57005071 -0.1053442  1.50050976 -1.649740180 NA
> 5 NA -0.7343465 -1.76763996  0.06961015 -0.8179396 -0.65552410  0.003991354 NA
> 6 NA -1.3888750  0.53722404  0.25269771 -1.2342698 -0.01243247 -0.228020092 NA
>
> now, I have to combine like this:
>
>   v1   v2    v3    v4    v5    v6    v7    v8
>   NA   cat1  cat1  cat1  cat2  cat2  cat2  NA
>
> -->
>
> mean(c(df.my$v2[1], df.my$v3[1], df.my$v4[1]))
> mean(c(df.my$v2[2], df.my$v3[2], df.my$v4[2]))
> mean(c(df.my$v2[3], df.my$v3[3], df.my$v4[3]))
> mean(c(df.my$v2[4], df.my$v3[4], df.my$v4[4]))
> mean(c(df.my$v2[5], df.my$v3[5], df.my$v4[5]))
> mean(c(df.my$v2[6], df.my$v3[6], df.my$v4[6]))
>
> the same for v5, v6 and v7
>
> further, I'm not sure how to avoid the list, because this is the 
> result of the processing I did before...
>
> Ciao,
> Antje
>
>
> Ottorino-Luca Pantani wrote:
>> I hope I see what you mean.
>>
>> Why not try the following and avoid lists, which I'm still not able
>> to manage properly ;-)
>> v1 <- NA
>> v2 <- rnorm(6)
>> v3 <- rnorm(6)
>> v4 <- rnorm(6)
>> v5 <- rnorm(6)
>> v6 <- rnorm(6)
>> v7 <- rnorm(6)
>> v8 <- rnorm(6)
>> v8 <- NA
>> (df.my <- cbind.data.frame(v1, v2, v3, v4, v5, v6, v7, v8))
>> (df.my2 <- reshape(df.my,
>>                  varying=list(c("v1","v2","v3", 
>> "v4","v5","v6","v7","v8")),
>>                  idvar="sequential",
>>                  timevar="cat",
>>                  direction="long"
>>        ))
>> aggregate(df.my2$v1, by=list(category=df.my2$cat), mean)
>> aggregate(df.my2$v1, by=list(category=df.my2$cat), function(x){sd(x, 
>> na.rm = TRUE)})
>>
>>
>> Antje wrote:
>>> okay, I played a bit around and now I have some kind of testcase for 
>>> you:
>>>
>>> v1 <- NA
>>> v2 <- rnorm(6)
>>> v3 <- rnorm(6)
>>> v4 <- rnorm(6)
>>> v5 <- rnorm(6)
>>> v6 <- rnorm(6)
>>> v7 <- rnorm(6)
>>> v8 <- rnorm(6)
>>> v8 <- NA
>>>
>>> list <- list(v1,v2,v3,v4,v5,v6,v7,v8)
>>> categ <- c(NA,"cat1","cat1","cat1","cat2","cat2","cat2",NA)
>>>
>>> > list
>>> [[1]]
>>> [1] NA
>>>
>>> [[2]]
>>> [1] -0.6442149 -0.2047012 -1.1986041 -0.2097442 -0.7343465 -1.3888750
>>>
>>> [[3]]
>>> [1]  0.02354036 -1.36186952 -0.42197792  1.50445971 -1.76763996  0.53722404
>>>
>>> [[4]]
>>> [1] -1.40362589  0.13045724 -0.84651458  1.57005071  0.06961015  0.25269771
>>>
>>> [[5]]
>>> [1] -1.1829260  2.1411553 -0.1327081 -0.1053442 -0.8179396 -1.2342698
>>>
>>> [[6]]
>>> [1]  1.17099178  0.49248118 -0.18690065  1.50050976 -0.65552410 -0.01243247
>>>
>>> [[7]]
>>> [1] -0.046778203 -0.233788840  0.443908897 -1.649740180  0.003991354 -0.228020092
>>>
>>> [[8]]
>>> [1] NA
>>>
>>> now, I need the mean (and sd) of element 1 of
>>> list[[2]], list[[3]], list[[4]] (because they belong to "cat1"):
>>>
>>> = mean(c(-0.6442149, 0.02354036, -1.40362589))
>>>
>>> the same for element 2 up to element 6 (--> I would then get a vector
>>> containing the means for "cat1")
>>> the same for the vectors belonging to "cat2".
>>>
>>> does anybody now understand what I mean?
>>>
>>> Antje
>>>
>>>
>>>
>>
>
>

-- 
Ottorino-Luca Pantani, Università di Firenze
Dip. Scienza del Suolo e Nutrizione della Pianta
P.zle Cascine 28 50144 Firenze Italia
Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273 
OLPantani at unifi.it