[R] data.frame transformation
Bill.Venables at csiro.au
Tue Mar 15 00:16:49 CET 2011
It is possible to do this with numeric comparisons as well, but to make life comfortable you need to turn off the warning system temporarily, because converting non-numeric entries such as "check" to numbers produces coercion warnings.
df <- data.frame(q1 = c(0,0,33.33,"check"),
                 q2 = c(0,33.33,"check",9.156),
                 q3 = c("check","check",25,100),
                 q4 = c(7.123,35,100,"check"))
conv <- function(x, cutoff) {
  oldOpt <- options(warn = -1)          # silence coercion warnings ...
  on.exit(options(oldOpt))              # ... and restore the setting on exit
  x <- as.factor(x)
  lev <- as.numeric(levels(x))          # non-numeric levels become NA
  levels(x)[!is.na(lev) & lev < cutoff] <- "."   # merge small values into "."
  x
}
Check:
> (df1 <- data.frame(lapply(df, conv, cutoff = 10)))
q1 q2 q3 q4
1 . . check .
2 . 33.33 check 35
3 33.33 check 25 100
4 check . 100 check
>
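If the columns are character rather than factor, the same numeric comparison can be sketched without touching the global options. This is only an illustration under stated assumptions: mask_small() and its mark argument are hypothetical names, not part of the original post, and suppressWarnings() is used here in place of options(warn = -1).

```r
# Hypothetical helper (not from the original post): same idea as conv(),
# but for character columns, using suppressWarnings() for the coercion.
mask_small <- function(x, cutoff, mark = ".") {
  x <- as.character(x)
  num <- suppressWarnings(as.numeric(x))   # non-numeric strings become NA
  x[!is.na(num) & num < cutoff] <- mark    # replace only genuine small numbers
  x
}

df <- data.frame(q1 = c(0, 0, 33.33, "check"),
                 q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100),
                 q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)

df2 <- df
df2[] <- lapply(df, mask_small, cutoff = 10)   # df2[] keeps row and column names
```

Assigning into df2[] rather than rebuilding the data frame is one way to keep the row names that apply() would drop.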
Bill Venables.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, 15 March 2011 6:29 AM
To: andrija djurovic
Cc: r-help at r-project.org
Subject: Re: [R] data.frame transformation
On Mar 14, 2011, at 3:51 PM, andrija djurovic wrote:
> I would like to hide cells with values less than 10%, so "." or just
> "" makes no difference to me. I also used apply combined with
> as.character:
>
> apply(df, 2, function(x) ifelse(as.character(x) < 10,".",x))
>
> This is probably not a good solution, but it works, except that I
> lose the row names. Because of that I was wondering if there is some
> other way to do this.
>
> Anyway, thank you both. I will try to do this before combining numbers
> and strings.
I saw your later assertion that it didn't work which surprised me. My
version of your data followed my advice not to use factors and your
effort did succeed when the columns were character rather than factor.
I put back the row numbers by coercing back to a data.frame. `apply`
returns a matrix.
> df <- data.frame(q1=c(0,0,33.33,"check"), q2=c(0,33.33," check",9.156),
+                  q3=c("check","check",25,100), q4=c(7.123,35,100,"check"),
+                  stringsAsFactors=FALSE)
> as.data.frame(apply(df, 2, function(x) ifelse(as.character(x) < 10, ".", x)))
q1 q2 q3 q4
1 . . check 7.123
2 . 33.33 check 35
3 33.33 . 25 100
4 check 9.156 100 check
The danger in relying on character collation is that any string whose
leading character sorts below "1", such as a <blank> or most
punctuation, will get "dotted".
> "," < "1"
[1] TRUE
> "." < "1"
[1] TRUE
> "-" < "1"
[1] TRUE
And "1.check" would also get "dotted":
> "1.check" < 10
[1] TRUE
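To make the contrast concrete, here is a small sketch comparing the two kinds of test on a few made-up strings. The vector x and the names by_collation and by_value are illustrative assumptions, not from the thread. The character results depend on the locale's collation order, while the numeric results do not.

```r
# Illustrative strings only: some real numbers, some text that merely
# starts with a character sorting before "1".
x <- c(" check", "1.check", "7.123", "9.156", "35")

# Character comparison: 10 is coerced to "10" and strings are collated,
# so the result reflects sort order, not magnitude.
by_collation <- x < 10

# Numeric comparison: convert first, treat non-numbers as "not small".
num <- suppressWarnings(as.numeric(x))   # " check" and "1.check" become NA
by_value <- !is.na(num) & num < 10
```

With the numeric test, "7.123" and "9.156" are flagged and the text entries are left alone, which is the result the original question asked for.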
>
> Andrija
>
> On Mon, Mar 14, 2011 at 8:11 PM, David Winsemius <dwinsemius at comcast.net
> > wrote:
>
> On Mar 14, 2011, at 2:52 PM, andrija djurovic wrote:
>
> Hi R users,
>
> I have following data frame
>
> df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33,"check",9.156),
> q3=c("check","check",25,100),q4=c(7.123,35,100,"check"))
>
> and I would like to replace every element that is less than 10
> with . (dot)
> in order to obtain this:
>
> q1 q2 q3 q4
> 1 . . check .
> 2 . 33.33 check 35
> 3 33.33 check 25 100
> 4 check . 100 check
>
> I had a lot of difficulties because each variable is a factor.
>
> Right, so comparisons with "<" will give a warning and return NA. I
> would sidestep the factor problem with stringsAsFactors=FALSE in the
> data.frame call. You might want to reconsider the "." as a missing
> value. If you are coming from a SAS background, you should try to
> get comfortable with NA or NA_character_ as a value.
>
>
> df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33,"check",9.156),
> q3=c("check","check",25,100),q4=c(7.123,35,100,"check"),
> stringsAsFactors=FALSE)
>
> is.na(df) <- t(apply(df, 1, function(x) as.numeric(x) < 10))
>
> Warning messages:
> 1: In FUN(newX[, i], ...) : NAs introduced by coercion
> 2: In FUN(newX[, i], ...) : NAs introduced by coercion
> 3: In FUN(newX[, i], ...) : NAs introduced by coercion
> 4: In FUN(newX[, i], ...) : NAs introduced by coercion
> > df
> q1 q2 q3 q4
> 1 <NA> <NA> check <NA>
> 2 <NA> 33.33 check 35
> 3 33.33 check 25 100
> 4 check <NA> 100 check
>
>
> Could someone help me with this?
>
> Thanks in advance for any help.
>
> Andrija
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
>
David Winsemius, MD
West Hartford, CT