[R] < symbols in a data frame

Wed Jul 9 20:02:59 CEST 2014

Thanks for all the responses. It is sometimes difficult to outline
exactly what you need, and these responses helped me get there.
Speaking to Bert's point a bit, I needed a column to identify where
the < symbol was used. If I knew more about R I think I might be
embarrassed to post my solution to that problem, but here is how I used
Sarah's solution while still keeping the info about detection limits. I'm
sure there is a more elegant way:

metals <-
structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
= c("Antimony",
"Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
"Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
"Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
"Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
"<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
"1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
"22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
"516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
"77", "89", "951"), class = "factor")), .Names = c("Parameter",
"Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")

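# Keep a copy of the original factor so the "<" information is not lost
metals$temp1 <- metals$Cedar.Creek
# Sarah's steps: factor -> character, strip "<", then convert to numeric
metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)

# TRUE where the original label matches the cleaned number (no "<" was present)
metals$temp2 <- metals$temp1 == metals$Cedar.Creek
metals$Detection <- factor(ifelse(metals$temp2, "Measured", "Limit"))
# Show Parameter, the numeric Cedar.Creek, and the Detection flag
metals[, c(1, 2, 5)]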
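
For what it is worth, one possibly tidier variant is to flag the "<" entries
with grepl() before stripping the symbol, which avoids the two temporary
columns. This is only a sketch: metals_small and is_limit are names made up
for illustration, and the four rows are a subset of the data above, entered
as character (the stringsAsFactors=FALSE route Sarah mentions).

```r
# Illustrative subset of the metals data above (hypothetical object name)
metals_small <- data.frame(
  Parameter   = c("Antimony", "Copper", "Lead", "Mercury"),
  Cedar.Creek = c("<100", "516", "550", "<10"),
  stringsAsFactors = FALSE  # keep the column as character, not factor
)

# Flag the detection-limit entries before the "<" is removed
is_limit <- grepl("<", metals_small$Cedar.Creek, fixed = TRUE)

# Strip the symbol and convert the remainder to numeric
metals_small$Cedar.Creek <- as.numeric(
  sub("<", "", metals_small$Cedar.Creek, fixed = TRUE)
)

# Record whether each value was measured or is a detection limit
metals_small$Detection <- factor(ifelse(is_limit, "Limit", "Measured"))

metals_small
```

With a factor column like the one in metals, wrapping the column in
as.character() first should give the same result.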

Thanks again!

Sam

On Wed, Jul 9, 2014 at 10:41 AM, Bert Gunter <gunter.berton at gene.com> wrote:
> Well, ?grep and ?regex are clearly apropos here -- dealing with
> character data is an essential skill for handling input from diverse
> sources with various formatting conventions. I suggest you go through
> one of the many regular expression tutorials on the web to learn more.
>
> But this may not be the important issue here at all. If "<k" means the
> value is left censored at k -- i.e. we know it's less than k but not
> how much less -- then Sarah's proposal is not what you want to do.
> Exactly what you do want to do depends on context, and as it concerns
> statistical methodology, is not something that should be discussed
> here. Consult a local statistician if this is a correct guess.
> Otherwise ignore.
>
> ... and please post in plain text in future (as requested) as HTML can
> get garbled.
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> Clifford Stoll
>
>
>
>
> On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>> Hi Sam,
>>
>> I'd take the similar tack of removing the < instead. Note that if you
>> import the data frame using the stringsAsFactors=FALSE argument, you
>> don't need the first step.
>>
>> metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
>> metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
>> metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>>
>> R> str(metals)
>> 'data.frame':    19 obs. of  2 variables:
>>  $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
>>  $ Cedar.Creek: num  100 100 500 100 10 1000 100 516 550 10 ...
>>
>> Sarah
>>
>>
>> On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers <tonightsthenight at gmail.com> wrote:
>>> Hello,
>>>
>>> I have recently received a dataset from a metal analysis company. The
>>> dataset is filled with less-than symbols. What I am looking for is an
>>> efficient way to subset for any whole numbers from the dataset. The column
>>> is automatically formatted as a factor because of the "<" symbols, making it
>>> difficult to deal with the numbers in a useful way.
>>>
>>> So in sum any ideas on how I could subset the example below for only whole
>>> numbers?
>>>
>>> Thanks in advance!
>>>
>>> Sam
>>>
>>> #code
>>>
>>> metals <-
>>> structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
>>> 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label
>>> = c("Antimony",
>>> "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)",
>>> "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury",
>>> "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium",
>>> "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L,
>>> 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L,
>>> 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200",
>>> "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4",
>>> "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4",
>>> "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
>>> "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72",
>>> "77", "89", "951"), class = "factor")), .Names = c("Parameter",
>>> "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>>>
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org