[R] < symbols in a data frame

Bert Gunter gunter.berton at gene.com
Wed Jul 9 19:41:07 CEST 2014


Well, ?grep and ?regex are clearly apropos here -- dealing with
character data is an essential skill for handling input from diverse
sources with various formatting conventions. I suggest you go through
one of the many regular expression tutorials on the web to learn more.
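
As a minimal sketch of what ?grep offers for the original question
(keeping only the entries reported as plain numbers), something like the
following may help. The vector x and its values are made up for
illustration in the style of the Cedar.Creek column; they are not the
poster's data.

```r
# Hypothetical values, written the way the lab reports them
x <- c("<100", "516", "<10", "550", "0.13", "<1000")

# grepl() returns TRUE where the pattern matches;
# "^<" matches a "<" at the very start of the string
censored <- grepl("^<", x)

# Keep only the entries that were reported as plain numbers
detected <- as.numeric(x[!censored])
detected
```

If the column is a factor, as in the posted data, it would need
as.character() first, since as.numeric() on a factor returns the
integer level codes rather than the printed values.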

But this may not be the important issue here at all. If "<k" means the
value is left censored at k -- i.e. we know it's less than k but not
how much less -- then Sarah's proposal is not what you want to do.
Exactly what you do want to do depends on context, and as it concerns
statistical methodology, is not something that should be discussed
here. Consult a local statistician if this is a correct guess.
Otherwise ignore.
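
If the censoring guess is right, one way to at least avoid losing that
information is to keep the limit and a flag side by side rather than
overwriting the column. This is only a sketch of the bookkeeping: the
vector x, the data frame name and the column names are hypothetical, and
it says nothing about how censored values should then be analyzed.

```r
# Hypothetical values, written the way the lab reports them
x <- c("<100", "516", "<10", "550")

censored_df <- data.frame(
  raw      = x,
  # the limit k for "<k" entries, the measurement itself otherwise
  value    = as.numeric(sub("^<", "", x)),
  # TRUE where the entry was reported as "<k"
  censored = grepl("^<", x),
  stringsAsFactors = FALSE
)
censored_df
```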

... and please post in plain text in future (as requested) as HTML can
get garbled.

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Hi Sam,
>
> I'd take the similar tack of removing the < instead. Note that if you
> import the data frame using the stringsAsFactors=FALSE argument, you
> don't need the first step.
>
> metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
> metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
> metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>
> R> str(metals)
> 'data.frame':    19 obs. of  2 variables:
>  $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
>  $ Cedar.Creek: num  100 100 500 100 10 1000 100 516 550 10 ...
>
> Sarah
>
>
> On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers <tonightsthenight at gmail.com> wrote:
>> Hello,
>>
>> I have recently received a dataset from a metal analysis company. The
>> dataset is filled with less-than symbols. What I am looking for is an
>> efficient way to subset for any whole numbers from the dataset. The column
>> is automatically formatted as a factor because of the "<" symbols, making it
>> difficult to deal with the numbers in a useful way.
>>
>> So in sum any ideas on how I could subset the example below for only whole
>> numbers?
>>
>> Thanks in advance!
>>
>> Sam
>>
>> #code
>>
>> metals <-
>>
>>
>> structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
>> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
>> = c("Antimony",
>> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
>> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
>> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
>> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
>> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
>> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
>> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
>> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
>> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
>> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
>> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
>> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


