[R] How do I coerce numeric factor columns of data frame to vector?
Murray Jorgensen
maj at stats.waikato.ac.nz
Tue Sep 9 03:25:27 CEST 2003
Hi Thomas et al,
Checking the code that read the frames, I see that the problem was indeed
caused by missing value codes at the read.table() stage. However, I did
not want to revisit the reading stage with these frames. (To
show why not, I include the code that read them, which you may recognise
from an earlier thread in which I got some help from Andy Liaw.)
Murray
nam.vec <- c("min.pkt.sz", "pkt.count", "bytes", "duration", "m1.psz", "m1.count",
  "m2.psz", "m2.count", "m3.psz", "m3.count", "iat.min", "iat.max", "m1.iat",
  "m1.iat.count", "m2.iat", "m2.iat.count", "m3.iat", "m3.iat.count", "port",
  "ip.address", "min.pkt.sz2", "pkt.count2", "bytes2", "m1.psz2", "m1.count2",
  "m2.psz2", "m2.count2", "m3.psz2", "m3.count2", "iat.min2", "iat.max2", "m1.iat2",
  "m1.iat.count2", "m2.iat2", "m2.iat.count2", "m3.iat2", "m3.iat.count2", "port2",
  "ip.address2", "diff.min.psz", "diff.max.psz")
flines <- 107165
slines <- 3000
sel6 <- sample(flines,3000*6)
selected1 <- sort(sel6[1:3000])
selected2 <- sort(sel6[3001:6000])
selected3 <- sort(sel6[6001:9000])
selected4 <- sort(sel6[9001:12000])
selected5 <- sort(sel6[12001:15000])
selected6 <- sort(sel6[15001:18000])
select.frame <- function(selected) {
  ## 'selected' is a sorted vector of line numbers, e.g. selected1
  strvec <- rep("", length(selected))
  ## number of lines to skip before each selected line
  skip <- diff(c(0, selected)) - 1
  fcon <- file("c:/data/perry/data.csv", open = "r")
  for (i in seq_along(skip)) {
    ## skip to the selected line
    readLines(fcon, n = skip[i])
    strvec[i] <- readLines(fcon, n = 1)
  }
  close(fcon)
  tc <- textConnection(strvec)
  sel.flows <- read.table(tc, header = FALSE, sep = ",")
  close(tc)
  names(sel.flows) <- nam.vec
  sel.flows
}
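Since the frames are already in memory, one way to avoid re-reading them is to repair the factor columns in place. The sketch below is only an illustration: the helper name `unfactor.numeric` is hypothetical, and the missing value code "?" is an assumption, because the actual code used in data.csv is not given here. The helper converts a factor column through its level labels, turns the listed codes into NA, and leaves alone any column whose labels are not all numeric (such as an IP address). For comparison, it also shows the read-time alternative, declaring the code through the `na.strings` argument of read.table().

```r
# Hypothetical helper: turn factor columns whose level labels are numbers
# (apart from the given missing value codes) back into numeric columns.
unfactor.numeric <- function(df, na.codes = character(0)) {
  for (nm in names(df)) {
    x <- df[[nm]]
    if (!is.factor(x)) next
    labs <- levels(x)
    labs[labs %in% na.codes] <- NA          # missing value codes become NA
    num <- suppressWarnings(as.numeric(labs))
    # Convert only if every remaining label parsed as a number
    if (all(is.na(labs) | !is.na(num))) {
      df[[nm]] <- num[as.integer(x)]        # map each code to its label's value
    }
  }
  df
}

# Toy frame standing in for sel.flows; "?" is an assumed missing value code
flows <- data.frame(bytes = factor(c("120", "?", "64")),
                    ip.address = factor(c("10.0.0.1", "10.0.0.2", "10.0.0.1")))
fixed <- unfactor.numeric(flows, na.codes = "?")
str(fixed)

# Read-time alternative: declare the code via na.strings so the column
# is never turned into a factor in the first place
raw.lines <- c("120,10.0.0.1", "?,10.0.0.2", "64,10.0.0.1")
ok <- read.table(text = raw.lines, header = FALSE, sep = ",",
                 na.strings = c("NA", "?"))
str(ok)
```

Going through levels(x) rather than calling as.numeric(x) directly matters because as.numeric() on a factor returns its internal integer codes, not the values printed in the data.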
Thomas W Blackwell wrote:
> Michael -
>
> Because these columns are factors to begin with, using as.numeric()
> alone will have unexpected results. See the section "Warning:" in
> help("factor").
>
> However, it is worth Murray asking himself WHY these columns are
> factors to start with, rather than the expected numeric values.
> One frequent source of this is using read.table() on a file
> which contains column headers without setting header=T. Then,
> the character string in the first row of each column prevents
> numeric conversion of all of the other rows. Another possible
> difficulty is an unusual missing value code, or commas in place
> of decimal points, or anything else, somewhere in the file that
> does not convert automatically to numeric. Maybe it's worth
> editing the original data file before Murray reads it in.
>
> Hmmm. I think I ought to have offered these many cents' worth
> with my earlier reply.
>
> - tom blackwell - u michigan medical school - ann arbor -
>
> On Mon, 8 Sep 2003, Michael A. Miller wrote:
>
>
>> >>>>> "Murray" == Murray Jorgensen <maj at stats.waikato.ac.nz> writes:
>>
>> > I have just noticed that quite a few columns of a data
>> > frame that I am working on are numeric factors. For
>> > summary() purposes I want them to be vectors.
>>
>>Do you want them to be vectors or do you want numeric values? If
>>the latter, try as.numeric instead of as.vector:
>>
>> > as.vector(factor(rep(seq(4),3)))
>>  [1] "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4"
>> > as.numeric(factor(rep(seq(4),3)))
>>  [1] 1 2 3 4 1 2 3 4 1 2 3 4
>> > summary(as.vector(factor(rep(seq(4),3))))
>>    Length     Class      Mode
>>        12 character character
>> > summary(as.numeric(factor(rep(seq(4),3))))
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>    1.00    1.75    2.50    2.50    3.25    4.00
>>
>>Mike
>
>
>
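Tom's warning and Mike's transcript fit together: as.numeric() on a factor returns the internal integer codes, and Mike's example only looks right because its levels 1 to 4 happen to coincide with those codes. A small sketch with made-up values whose level order does not match their numeric order shows the difference, along with the conversion recommended in the Warning section of help("factor"). The data frame at the end is a toy example, not one of Murray's frames.

```r
# Made-up values whose level order differs from their numeric order
f <- factor(c("10", "20", "5", "10"))
levels(f)                                 # "10" "20" "5": levels sort as strings

codes <- as.numeric(f)                    # internal codes: 1 2 3 1
vals1 <- as.numeric(as.character(f))      # 10 20 5 10
vals2 <- as.numeric(levels(f))[f]         # 10 20 5 10, idiom from help("factor")
print(codes)
print(vals2)

# Applying the same idiom to every factor column of a data frame
df <- data.frame(a = factor(c("3", "1", "2")), b = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
is.fac <- vapply(df, is.factor, logical(1))
df[is.fac] <- lapply(df[is.fac], function(x) as.numeric(levels(x))[x])
summary(df$a)
```

Indexing `as.numeric(levels(f))` by `f` uses the factor's integer codes as positions, so each level label is converted once instead of once per element.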
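Two of the other causes Tom lists, a header row read as data and commas used as decimal points, can be reproduced on a tiny made-up file. The column names, the ";" separator and the values below are assumptions for illustration only. Declaring the header and the decimal mark lets read.table() convert both columns to numbers.

```r
# Hypothetical file contents: header row, ";" separator, comma decimals
txt <- c("bytes;duration", "120;0,5", "64;1,25")

# Without header = TRUE the header strings end up in the data, and "0,5"
# is not a number under the default dec = ".", so neither column is numeric
bad <- read.table(text = txt, sep = ";")
str(bad)

# Declaring the header and the decimal mark fixes both columns
good <- read.table(text = txt, sep = ";", header = TRUE, dec = ",")
str(good)
```

For files of this shape, read.csv2() already uses sep = ";" and dec = "," as its defaults.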
--
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz Fax 7 838 4155
Phone +64 7 838 4773 wk +64 7 849 6486 home Mobile 021 1395 862
More information about the R-help mailing list