[R] repeated searching of no-missing values
Bert Gunter
gunter.berton at gene.com
Thu Dec 11 00:39:13 CET 2008
Yes. Read the help pages **carefully**!
e.g. ?tapply says that the first argument is an **atomic** vector. A
factor is not a plain atomic vector: it is a vector of integer codes with a
levels attribute. So tapply treats it as one by looking only at its
representation, which is the integer codes.
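A small illustration of that point, as a sketch only: the vectors f and g and the helper last.non.na are made up here to mirror the poster's columns V2 and V1 and his function cl.

```r
# Hypothetical stand-ins for the poster's V2 (factor) and V1 (grouping) columns.
f <- factor(c("blue", NA, "red", "blue", "green", "blue", "red", "blue", NA))
g <- rep(1:3, each = 3)

typeof(f)    # "integer": the storage is integer codes
levels(f)    # "blue" "green" "red": the labels live in an attribute

# Same idea as the poster's cl(): last non-missing element of x.
last.non.na <- function(x) x[max(which(!is.na(x)))]

# The simplified tapply() result carries the codes, not the labels.
res <- tapply(f, g, last.non.na)

# The labels can be recovered by indexing the levels with the codes.
labels <- levels(f)[res]
```

Here labels holds the character values that correspond to the codes in res.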
apply works on **arrays,** which must be of a single type. So it silently
converts the data frame to the simplest common type it "can," which is an
array of characters.
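The coercion can be seen directly with as.matrix, which is what apply uses for a two-dimensional data frame. This is a minimal sketch; the data frame df and its column names are invented for illustration.

```r
# Hypothetical data frame mixing integer, factor and numeric columns.
df <- data.frame(id = c(1L, 2L),
                 colour = factor(c("red", "blue")),
                 x = c(0.4, NA))

# A matrix holds a single type, so the mixed columns all become character.
m <- as.matrix(df)
typeof(m)    # "character"

# Inside apply(), each column therefore reaches the function as character.
col.classes <- apply(df, 2, class)
```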
etc.
I admit that these details are somewhat obscure and even annoying, but
they **are** documented. I think that's all we can expect. Some have
lamented the language's lack of perfect consistency in these matters,
but I cannot understand how that would be possible given its nature,
intended, as it is, to be **easily** used for high-level data manipulation,
graphics, statistical analysis, etc., as well as programming. There are just
too many possible data structures to expect logical consistency in their
handling throughout (if one can even define what that means in specific
instances!). All these little inconveniences can be worked around easily, of
course. For example, if your new vector of numeric factor levels is f.new
and f.old is your original factor, levels(f.old)[f.new] converts f.new to
the appropriate character vector. And so forth. So the key is: pay
**careful** attention to the docs.
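One way the levels(f.old)[f.new] workaround might be applied to the data in the original message is sketched below. The names dat, last.non.na and result are assumptions introduced for this example, and the helper returns NA for an all-missing group rather than relying on max() of an empty set.

```r
# The poster's data, rebuilt under the hypothetical name dat.
dat <- data.frame(
  V1 = rep(1:3, each = 3),
  V2 = factor(c("blue", NA, "red", "blue", "green", "blue", "red", "blue", NA)),
  V3 = c(0.3, 0.4, NA, NA, NA, NA, 0.5, NA, 1.1)
)

# Last non-missing element of x, or NA if every element is missing.
last.non.na <- function(x) {
  ok <- which(!is.na(x))
  if (length(ok)) x[max(ok)] else x[NA_integer_]
}

# Work column by column so that no column is coerced to character.
f.new <- tapply(dat$V2, dat$V1, last.non.na)   # codes for the factor column
v3    <- tapply(dat$V3, dat$V1, last.non.na)   # numeric column

# Map the codes back to labels with levels(f.old)[f.new].
result <- data.frame(
  V1 = as.integer(names(f.new)),
  V2 = factor(levels(dat$V2)[f.new], levels = levels(dat$V2)),
  V3 = as.vector(v3)
)
```

The resulting data frame has one row per value of V1, with V2 kept as a factor and V3 kept numeric.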
 Bert Gunter
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Patrizio Frederic
Sent: Wednesday, December 10, 2008 2:09 PM
To: r-help at r-project.org
Subject: [R] repeated searching of no-missing values
hi all,
I have a data frame such as:
1 blue 0.3
1 NA 0.4
1 red NA
2 blue NA
2 green NA
2 blue NA
3 red 0.5
3 blue NA
3 NA 1.1
I wish to find the last non-missing value in every triple, i.e. I want a 3
by 3 data.frame such as:
1 red 0.4
2 blue NA
3 blue 1.1
I have written a little script
data = structure(list(V1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
), V2 = structure(c(1L, NA, 3L, 1L, 2L, 1L, 3L, 1L, NA), .Label = c("blue",
"green", "red"), class = "factor"), V3 = c(0.3, 0.4, NA, NA,
NA, NA, 0.5, NA, 1.1)), .Names = c("V1", "V2", "V3"), class =
"data.frame", row.names = c(NA,
9L))
cl = function(x) x[max(which(!is.na(x)))]
# group by the first column of data; x is a single column, so x[,1] would fail
choose.last = function(x) tapply(x,data[,1],cl)
# now function choose.last works properly on numeric vectors:
> choose.last(data[,3])
1 2 3
0.4 NA 1.1
# but not on factors (I lose the factor labels):
> choose.last(data[,2])
1 2 3
3 1 1
# moreover, if I apply this function to the whole data.frame
# the output is a character matrix
> apply(data,2,choose.last)
V1 V2 V3
1 "1" "red" "0.4"
2 "2" "blue" NA
3 "3" "blue" "1.1"
# and if I sapply, I lose the factor labels
> sapply(data,choose.last)
V1 V2 V3
1 1 3 0.4
2 2 1 NA
3 3 1 1.1
any hint?
Thanks in advance,
Patrizio
+
 Patrizio Frederic, PhD
 Research associate in Statistics,
 Department of Economics,
 University of Modena and Reggio Emilia,
 Via Berengario 51,
 41100 Modena, Italy

 tel: +39 059 205 6727
 fax: +39 059 205 6947
 mail: patrizio.frederic at unimore.it
+