[R] function on factors - how best to proceed
Gustaf Rydevik
gustaf.rydevik at gmail.com
Wed Sep 19 14:43:22 CEST 2007
On 9/19/07, Karin Lagesen <karin.lagesen at medisin.uio.no> wrote:
>
> Sorry about this one being long, and I apologise beforehand if there
> is something obvious here that I have missed. I am new to creating my
> own functions in R, and I am uncertain of how they work.
>
> I have a data set that I have read into a data frame:
>
> > gctable[1:5,]
> refseq geometry X60_origin X60_terminus length kingdom
> 1 NC_009484 cir 1790000 773000 3389227 Bacteria
> 2 NC_009484 cir 1790000 773000 3389227 Bacteria
> 3 NC_009484 cir 1790000 773000 3389227 Bacteria
> 4 NC_009484 cir 1790000 773000 3389227 Bacteria
> 5 NC_009484 cir 1790000 773000 3389227 Bacteria
> grp feature gene begin dir gc_content replicor LEADLAG
> 1 Alphaproteobacteria CDS CDS 261 + 0.654244 RIGHT LEAD
> 2 Alphaproteobacteria CDS CDS 1737 - 0.651408 RIGHT LAG
> 3 Alphaproteobacteria CDS CDS 2902 + 0.607843 RIGHT LEAD
> 4 Alphaproteobacteria CDS CDS 3693 + 0.617647 RIGHT LEAD
> 5 Alphaproteobacteria CDS CDS 4227 + 0.699208 RIGHT LEAD
> >
>
> Most of these columns are factors.
>
> Now, I have a function that I would like to employ on this data
> frame. Right now I cannot get it to work, and that seems to be due to
> the columns in the data frame being factors. I tested it with a data
> frame created from vectors, and it worked fine.
>
> The function:
>
> percentdistance <- function(origin, terminus, length, begin, replicor){
>   print(c(origin, terminus, length, begin, repl))
>   d = 0
>   if (terminus>origin) {
>     if(replicor=="LEFT") {
>       d = -((origin-begin)%%length)
>     }
>     else {
>       d = (begin-origin)
>     }
>   }
>   else {
>     if (replicor=="LEFT") {
>       d=(origin-begin)
>     }
>     else{
>       d = -((begin-origin)%%length)
>     }
>   }
>   d/length*2
> }
>
> The error I get:
> > percentdistance(gctable$X60_origin, gctable$X60_terminus, gctable$length, gctable$begin, gctable$replicor)
> [1] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
> [19] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
> [37] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
> [55] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
> [73] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
> [91] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
> [109] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
> [127] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
> .....[99919] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> [99937] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> [99955] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> [99973] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> [99991] 2 2 2 2 2 2 2 2 2
> [ reached getOption("max.print") -- omitted 8526091 entries ]]
> Error in if (terminus > origin) { : missing value where TRUE/FALSE needed
> In addition: Warning messages:
> 1: > not meaningful for factors in: Ops.factor(terminus, origin)
> 2: the condition has length > 1 and only the first element will be used in: if (terminus > origin) {
> >
>
> This worked nicely when the inputs were columns from a data frame created
> from vectors.
>
> I have also tried the different apply-functions, although I am
> uncertain of which one would be appropriate here.
>
>
...
>
> Karin
> --
> Karin Lagesen, PhD student
> karin.lagesen at medisin.uio.no
> http://folk.uio.no/karinlag
Hi Karin!
A couple of things:
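First, the first warning message tells you:
1: > not meaningful for factors in: Ops.factor(terminus, origin)
That is, terminus and origin are factors, and ">" is not defined for
unordered factors, so the comparison returns NA. That NA is what triggers
the "missing value where TRUE/FALSE needed" error. You have to convert the
columns to numeric first, e.g. as.numeric(as.character(x)) (see R FAQ 7.10,
"How do I convert factors to numeric?").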
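To illustrate the conversion, here is a minimal sketch. The vector below is a made-up stand-in for a column such as X60_origin that was read in as a factor, not your actual data. Note that calling as.numeric() directly on a factor returns the internal level codes, which is probably where the 87s and 2s in your printout come from.

```r
# Hypothetical stand-in for a numeric column that was read in as a factor.
origin <- factor(c("1790000", "1790000", "773000"))

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(origin)

# Going through character recovers the numbers shown in the data frame.
values <- as.numeric(as.character(origin))

# Equivalent form from the R FAQ: convert the levels once, then index.
values2 <- as.numeric(levels(origin))[origin]
```

Either of the last two forms gives a numeric vector that can be compared with ">".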
The second warning message tells you:
2: the condition has length > 1 and only the first element will be
used in: if (terminus > origin)
You are comparing two vectors, which generates a vector of TRUE/FALSE values,
but the "if" statement needs a single TRUE/FALSE value.
Either use a for loop:
for (i in seq_along(terminus)) { if (terminus[i] > origin[i]) ... }
or a nested ifelse() call, which is vectorised and therefore preferable on a
data set this large.
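As a rough sketch of the nested ifelse approach, the function could be rewritten along these lines. The name percentdistance_vec and the small test vectors are made up for illustration, and the code assumes the numeric columns have already been converted from factors.

```r
# Vectorised sketch of percentdistance(); assumes origin, terminus, length
# and begin are numeric, and replicor is a factor or character vector.
percentdistance_vec <- function(origin, terminus, length, begin, replicor) {
  left <- replicor == "LEFT"
  # Outer ifelse picks the terminus/origin case, inner ones the replicor.
  d <- ifelse(terminus > origin,
              ifelse(left, -((origin - begin) %% length), begin - origin),
              ifelse(left, origin - begin, -((begin - origin) %% length)))
  d / length * 2
}

# Made-up inputs covering all four branches (first row mimics the data).
origin   <- c(1790000, 100, 100, 500)
terminus <- c(773000, 500, 500, 100)
len      <- c(3389227, 1000, 1000, 1000)
begin    <- c(261, 50, 300, 200)
replicor <- factor(c("RIGHT", "LEFT", "RIGHT", "LEFT"))

res <- percentdistance_vec(origin, terminus, len, begin, replicor)
```

Because ifelse() works element by element, the whole column is processed in one call and no loop or apply function is needed.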
best,
Gustaf
--
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik