[R] Help with data.frames
Peter Dalgaard BSA
p.dalgaard at biostat.ku.dk
Tue Jul 25 16:23:15 CEST 2000
"Heberto Ghezzo" <Heberto at meakins.lan.mcgill.ca> writes:
> Hi R experts, I have a problem. . .
> I am using R 1.1 in Win98
> if I do:
>
> age <- c( 20,25,30,50,20,30)
> sex <- c("M","F","F","M","M","F")
> hgt <- c(1.80,1.65,1.70,1.75,1.85,1.68)
> obs <-c("first","second","third","fourth","fifth","sixth")
> base<-data.frame(rbind(age=age,sex=sex,hgt=hgt,obs=obs))
(You mean cbind rather than rbind in that last line, right? The output
below has the variables as columns.)
> base
> age sex hgt obs
> 1 20 M 1.8 first
> 2 25 F 1.65 second
> 3 30 F 1.7 third
> 4 50 M 1.75 fourth
> 5 20 M 1.85 fifth
> 6 30 F 1.68 sixth
>
> i.e. I created a data.frame with I think, 2 real columns, 1 factor and
> 1 character. . . I thought
Nope. Binding the variables together forms a matrix, and all elements
of a matrix must have the same type, so all columns become character
variables.
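A quick way to see this for yourself, as a minimal sketch (the name m
is just an illustration, and obs is left out for brevity):

```r
age <- c(20, 25, 30, 50, 20, 30)
sex <- c("M", "F", "F", "M", "M", "F")
hgt <- c(1.80, 1.65, 1.70, 1.75, 1.85, 1.68)

# cbind() on plain vectors returns a matrix, and a matrix has one type.
m <- cbind(age = age, sex = sex, hgt = hgt)

print(is.matrix(m))  # is the result a matrix?
print(typeof(m))     # the single storage type shared by all elements
print(m[, "age"])    # the ages, now stored as strings
```

Since sex is character, the numeric vectors are converted to character
before the matrix is built, and data.frame() never sees the numbers.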
A better guess would have been
base<-data.frame(age, sex, hgt, obs)
except that that turns "obs" into a factor, which you don't seem to
want. This works:
> base<-data.frame(age, sex, hgt, obs=I(obs))
> summary(base)
age sex hgt obs
Min. :20.00 F:3 Min. :1.650 Length:6
1st Qu.:21.25 M:3 1st Qu.:1.685 Class :AsIs
Median :27.50 Median :1.725 Mode :character
Mean :29.17 Mean :1.738
3rd Qu.:30.00 3rd Qu.:1.788
Max. :50.00 Max. :1.850
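If you would rather check the column types directly than read them off
summary(), something like the sketch below should do. Two assumptions
of mine: col_classes is only an illustrative name, and
stringsAsFactors = TRUE is spelled out because the default changed to
FALSE in R 4.0.0, so without it a current R would leave "sex" as
character instead of a factor.

```r
age <- c(20, 25, 30, 50, 20, 30)
sex <- c("M", "F", "F", "M", "M", "F")
hgt <- c(1.80, 1.65, 1.70, 1.75, 1.85, 1.68)
obs <- c("first", "second", "third", "fourth", "fifth", "sixth")

# I() protects obs from the character-to-factor conversion.
base <- data.frame(age, sex, hgt, obs = I(obs), stringsAsFactors = TRUE)

# First class of each column: numeric, factor, numeric, AsIs.
col_classes <- sapply(base, function(col) class(col)[1])
print(col_classes)
```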
> lage<-log(age)
> base2<-cbind(base,lage)
> base2
> age sex hgt obs lage
> 1 20 M 1.8 first 2.995732
> 2 25 F 1.65 second 3.218876
> 3 30 F 1.7 third 3.401197
> 4 50 M 1.75 fourth 3.912023
> 5 20 M 1.85 fifth 2.995732
> 6 30 F 1.68 sixth 3.401197
>
> I can add a numeric column with no problems. . .I thought
>
> Now I want to add a new observation. .
>
> log(40)
> [1] 3.688879
> new.guy <- c(40,"M",1.82,"seventh",3.688879)
> base<-rbind(base,new.guy)
Again, you're c()'ing elements of different types, so they all come
out as character:
> c(40,"M",1.82,"seventh",3.688879)
[1] "40" "M" "1.82" "seventh" "3.688879"
and rbind()ing that onto base will coerce the columns of base first to
character and then to factor, even with base itself correctly defined.
You need something like
> new.guy <- list(40,"M",1.82,"seventh")
> rbind(base,new.guy)
age sex hgt obs
1 20 M 1.80 first
2 25 F 1.65 second
3 30 F 1.70 third
4 50 M 1.75 fourth
5 20 M 1.85 fifth
6 30 F 1.68 sixth
7 40 M 1.82 seventh
(Notice that you don't want new.guy as a data frame, because you would
have to say data.frame(age=40,sex="M",hgt=1.82,obs=I("seventh")) in
order to (a) get the same names as in base and (b) avoid turning
"seventh" into a factor.)
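
To confirm that the list version keeps the column types, here is a
minimal sketch on a shortened three-row version of base (base_more is
a name I made up for the result, and sex is built with factor()
explicitly so the example does not depend on the R version):

```r
base <- data.frame(
  age = c(20, 25, 30),
  sex = factor(c("M", "F", "F")),
  hgt = c(1.80, 1.65, 1.70),
  obs = I(c("first", "second", "third"))
)

# A list keeps each element's own type; c() would make them all character.
new.guy <- list(40, "M", 1.82, "fourth")
base_more <- rbind(base, new.guy)

print(base_more)
print(sapply(base_more, function(col) class(col)[1]))
```

The unnamed list elements are matched to the columns of base by
position, so the order in list() has to follow the column order.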
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907