[R] Plot of a subset of a data.frame()

David Winsemius dwinsemius at comcast.net
Tue Jul 27 00:48:52 CEST 2010


On Jul 26, 2010, at 10:56 AM, Steffen Uhlig wrote:

> Dear David, Petr, and Alain,
>
> thank you very much for your fast responses. It's a typical
> "handbook-not-read error" on my side. I will dig deeper into the
> plot functions and the assignment of data. I was not aware that
> the vector "a" is handled as a factor with 10 levels.
> Thanks for your suggestions and hints!

You can prevent that behavior and get a character vector instead
(at least from functions that return one) by using
stringsAsFactors = FALSE within the data.frame() call. You also have
the option of setting that globally, which at least one well-known
institution has adopted as the default policy for its work.

?data.frame
?options
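
A minimal sketch of the per-call approach, reusing the vectors from
Steffen's second example. Note that since R 4.0.0 the default is already
stringsAsFactors = FALSE, so the sketch passes TRUE explicitly to reproduce
the factor behavior discussed in this thread. The global option is left
out on purpose; check ?options for its status in your R version.

```r
a <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
b <- rep(c("a", "b"), 5)
d <- rnorm(10)

# Explicitly request factors (the default before R 4.0.0)
e.fac <- data.frame(a, b, d, stringsAsFactors = TRUE)
# Keep the string columns as plain character vectors
e.chr <- data.frame(a, b, d, stringsAsFactors = FALSE)

str(e.fac)  # a and b are listed as Factor columns
str(e.chr)  # a and b are listed as chr columns
```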

-- 
David
>
> Best regards,
> /steffen
>
>
> On 26.07.2010 14:30, David Winsemius wrote:
>>
>> On Jul 26, 2010, at 7:38 AM, Steffen Uhlig wrote:
>>
>>> Hello,
>>>
>>> My data.frame is basically a collection of process values, i.e. a
>>> huge run chart. It consists of a timestamp in the first column (date
>>> as a string), factors in the following columns (used for subset
>>> filtering), and some process-data columns.
>>> Below are two examples showing the problems that occur when
>>> plotting:
>>>
>>> First, the example that works fine:
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> a = c(1:10) # create a vector of integers
>>> b = rep(c("a","b"),5) # create a vector of chars, used
>>> # as factor-levels
>>> d = rnorm(10) # some random numbers
>>> e = data.frame(a,b,d) # connect to a data.frame
>>
>> You've gotten several answers, but none have addressed an aspect of R
>> behavior that took me longer to appreciate than it perhaps should have.
>> The "b" column inside the "e" data.frame is now a factor column. I
>> mention that because you later referred to it as a "string", which it
>> is not. It is an integer vector with an associated character vector
>> of level labels that the integers index into. Many of the functions
>> that you might expect to "work" on "strings" will give either errors
>> or unexpected results when applied to factors.
>>
>>
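To make the factor point concrete, here is a small sketch (the names f and
g are illustrative only) showing what a factor actually stores, and one
classic surprise when a factor is treated as if it held its labels:

```r
b <- rep(c("a", "b"), 5)
f <- factor(b)

class(f)       # "factor"
typeof(f)      # "integer": values are stored as integer codes
as.integer(f)  # 1 2 1 2 1 2 1 2 1 2: indices into levels(f)
levels(f)      # "a" "b": the character vector of level labels

# A factor whose labels look like numbers converts to its codes, not its labels
g <- factor(c("10", "20", "10"))
as.numeric(g)                # 1 2 1 (the codes)
as.numeric(as.character(g))  # 10 20 10 (the labels)
```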
>>>
>>> e.1 = subset(e, b=="a") # create two subsets
>>> e.2 = subset(e, b=="b")
>>> plot(d~a, e.1, pch=3, col=2) # plot first data-subset
>>> points(d~a, e.2, pch=4, col=3) # plot the 2nd one
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> All looks fine in these plots.
>>>
>>>
>>> However, when I change the content of vector "a" to a set of strings,
>>> the following happens:
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> a = c("a","b","c","d","e","f","g","h","i","j")
>>> e = data.frame(a,b,d) # re-build data.frame
>>>
>>> e.1 = subset(e, b=="a") # create two subsets
>>> e.2 = subset(e, b=="b")
>>> plot(d~a, e.1, pch=3, col=2)
>>> points(d~a, e.2, pch=4, col=3)
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> The plot command produces horizontal lines instead of dots. This seems
>>> to happen when the x-axis contains strings rather than numbers. Is
>>> there a way out?
>>>
>>> Best regards,
>>> /Steffen
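On the horizontal lines: when the x variable in plot(d ~ a, ...) is a
factor, plot() uses its factor method, which for a numeric y draws a
boxplot of d for each level of a (see ?plot.factor). With one observation
per level, each box collapses to a single horizontal line. One possible
workaround, sketched here under the assumption that a is a factor, is to
plot against the integer codes and relabel the axis. stringsAsFactors = TRUE
is passed so the sketch behaves the same in R 4.0.0 and later, and
set.seed(1) is added only for reproducibility.

```r
a <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
b <- rep(c("a", "b"), 5)
set.seed(1)  # only for reproducible random numbers
d <- rnorm(10)
e <- data.frame(a, b, d, stringsAsFactors = TRUE)

e.1 <- subset(e, b == "a")
e.2 <- subset(e, b == "b")

# subset() keeps all 10 levels of 'a', so as.integer(a) gives each label
# the same x position in both subsets
plot(d ~ as.integer(a), data = e.1, pch = 3, col = 2,
     xlim = c(1, nlevels(e$a)), ylim = range(e$d),
     xaxt = "n", xlab = "a")
points(d ~ as.integer(a), data = e.2, pch = 4, col = 3)
# Label the x axis with the factor levels instead of the integer codes
axis(1, at = seq_len(nlevels(e$a)), labels = levels(e$a))
```

Setting ylim from the full data frame keeps the points of the second subset
from falling outside the range chosen for the first.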
>
>
> -- 
> Steffen Uhlig, PhD
> Mechatronik und Sensortechnik
> HTW des Saarlandes
> Goebenstraße 40
> 66117 Saarbrücken
>
> Tel.: +49 (0) 681 58 67 274

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



More information about the R-help mailing list