[R] Help with data.frames

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Tue Jul 25 16:23:15 CEST 2000


"Heberto Ghezzo" <Heberto at meakins.lan.mcgill.ca> writes:

> Hi  R experts, I have a problem. . .
> I am using R 1.1 in Win98
> if I do:
> 
> age <- c( 20,25,30,50,20,30)
> sex <- c("M","F","F","M","M","F")
> hgt <- c(1.80,1.65,1.70,1.75,1.85,1.68)
> obs <-c("first","second","third","fourth","fifth","sixth")
> base<-data.frame(rbind(age=age,sex=sex,hgt=hgt,obs=obs))
                   ^^^^^
cbind, right?

>  base
>   age sex  hgt    obs
> 1  20   M  1.8  first
> 2  25   F 1.65 second
> 3  30   F  1.7  third
> 4  50   M 1.75 fourth
> 5  20   M 1.85  fifth
> 6  30   F 1.68  sixth
> 
> i.e. I created a data.frame with I think, 2 real columns, 1 factor and 
> 1 character. . . I thought

Nope. When you bind variables together, they form a matrix, and all
elements of a matrix must have the same type. So all the columns become
character variables (and data.frame() then turns each of them into a
factor).
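A minimal sketch of that coercion. It assumes a current R session rather than R 1.1, and the names `m` and `df` are illustrative only. cbind() on a numeric and a character vector gives one character matrix, so data.frame() never sees a numeric column:

```r
age <- c(20, 25, 30, 50, 20, 30)
sex <- c("M", "F", "F", "M", "M", "F")

# Binding mixed vectors produces a single matrix, so every element is coerced to character
m <- cbind(age = age, sex = sex)
print(typeof(m))    # "character"

# The ages survive only as strings
print(m[, "age"])

# stringsAsFactors = FALSE is passed explicitly so the result does not depend on
# the default of the R version in use (older versions would give factors here)
df <- data.frame(m, stringsAsFactors = FALSE)
print(sapply(df, class))   # both columns "character"
```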

A better guess would have been

 base<-data.frame(age, sex, hgt, obs)

except that that turns "obs" into a factor, which you don't seem to
want. This works:

> base<-data.frame(age, sex, hgt, obs=I(obs))
> summary(base)
      age        sex        hgt            obs           
 Min.   :20.00   F:3   Min.   :1.650   Length:6          
 1st Qu.:21.25   M:3   1st Qu.:1.685   Class :AsIs       
 Median :27.50         Median :1.725   Mode  :character  
 Mean   :29.17         Mean   :1.738                     
 3rd Qu.:30.00         3rd Qu.:1.788                     
 Max.   :50.00         Max.   :1.850                     

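To check which type each column actually got, one can ask for the class of every column. This is a sketch assuming a current R; `stringsAsFactors = TRUE` is passed explicitly to mimic the old default, under which character vectors became factors:

```r
age <- c(20, 25, 30, 50, 20, 30)
sex <- c("M", "F", "F", "M", "M", "F")
hgt <- c(1.80, 1.65, 1.70, 1.75, 1.85, 1.68)
obs <- c("first", "second", "third", "fourth", "fifth", "sixth")

# sex is converted to a factor; I() protects obs from that conversion
base <- data.frame(age, sex, hgt, obs = I(obs), stringsAsFactors = TRUE)

# First class of each column: age and hgt numeric, sex factor, obs AsIs
print(sapply(base, function(col) class(col)[1]))

# Underneath the AsIs class, obs is still a plain character vector
print(typeof(base$obs))   # "character"
```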

> lage<-log(age)
> base2<-cbind(base,lage)
> base2
>   age sex  hgt    obs     lage
> 1  20   M  1.8  first 2.995732
> 2  25   F 1.65 second 3.218876
> 3  30   F  1.7  third 3.401197
> 4  50   M 1.75 fourth 3.912023
> 5  20   M 1.85  fifth 2.995732
> 6  30   F 1.68  sixth 3.401197
> 
> I can add a numeric column with no problems. . .I thought
> 
> Now I want to add a new observation. .
> 
>  log(40)
> [1] 3.688879
>  new.guy <- c(40,"M",1.82,"seventh",3.688879)
>  base<-rbind(base,new.guy)

Again, you're c()'ing elements of different types, so they all come
out as character:

> c(40,"M",1.82,"seventh",3.688879)
[1] "40"       "M"        "1.82"     "seventh"  "3.688879"

and rbind()ing that onto base will coerce the columns of base first to
character and then to factor, even if base itself is correctly defined.
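The failure can be reproduced with a sketch like the following, assuming a current R (the name `bad` is illustrative). In current R the numeric columns typically end up as character rather than factor, but either way they stop being numeric:

```r
age <- c(20, 25, 30, 50, 20, 30)
sex <- c("M", "F", "F", "M", "M", "F")
hgt <- c(1.80, 1.65, 1.70, 1.75, 1.85, 1.68)
obs <- c("first", "second", "third", "fourth", "fifth", "sixth")
base <- data.frame(age, sex, hgt, obs = I(obs), stringsAsFactors = TRUE)

# c() of mixed types: every element becomes a string
new.guy <- c(40, "M", 1.82, "seventh")
print(typeof(new.guy))   # "character"

# Appending it as a row drags age and hgt away from numeric
bad <- rbind(base, new.guy)
print(sapply(bad, function(col) class(col)[1]))
print(is.numeric(bad$age))
```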

You need something like

> new.guy <- list(40,"M",1.82,"seventh")
> rbind(base,new.guy)
  age sex  hgt     obs
1  20   M 1.80   first
2  25   F 1.65  second
3  30   F 1.70   third
4  50   M 1.75  fourth
5  20   M 1.85   fifth
6  30   F 1.68   sixth
7  40   M 1.82 seventh

(Notice that you don't want new.guy to be a data frame, because then you
would have to say data.frame(age=40,sex="M",hgt=1.82,obs=I("seventh")) in
order to (a) get the same names as in base and (b) avoid turning
"seventh" into a factor.)
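Both remedies can be checked side by side in a sketch like this, assuming a current R (`fixed1`, `fixed2` and `new.df` are illustrative names). The unnamed list is matched to the columns by position, while the one-row data frame is matched by name, which is why its names have to agree with those of base:

```r
age <- c(20, 25, 30, 50, 20, 30)
sex <- c("M", "F", "F", "M", "M", "F")
hgt <- c(1.80, 1.65, 1.70, 1.75, 1.85, 1.68)
obs <- c("first", "second", "third", "fourth", "fifth", "sixth")
base <- data.frame(age, sex, hgt, obs = I(obs), stringsAsFactors = TRUE)

# Remedy 1: a list keeps each element's own type
new.guy <- list(40, "M", 1.82, "seventh")
fixed1 <- rbind(base, new.guy)

# Remedy 2: a one-row data frame with matching names, obs protected by I()
new.df <- data.frame(age = 40, sex = "M", hgt = 1.82, obs = I("seventh"))
fixed2 <- rbind(base, new.df)

print(fixed1)
print(sapply(fixed1, function(col) class(col)[1]))
print(sapply(fixed2, function(col) class(col)[1]))
```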

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._