[R-sig-eco] Subset by family name?

hadley wickham h.wickham at gmail.com
Sat Nov 29 19:45:56 CET 2008


Or even easier:

options(stringsAsFactors = FALSE)

You'll never look back.
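
A minimal sketch of what this changes, using a few rows from the data quoted below. The inline string `txt` and the explicit `stringsAsFactors = FALSE` argument are assumptions made for illustration: passing the argument to read.delim() has the same effect for that one call as setting the option globally, and from R 4.0.0 onwards it is already the default.

```r
# A few rows from the quoted data, inlined so the example is self-contained
txt <- "X\tY\tMark
0\t28\tSapotaceae
1\t30\tMeliaceae
2.5\t36\tArecaceae
3\t25\tMoraceae"

# Keep Mark as a character vector instead of converting it to a factor
Yut <- read.delim(text = txt, stringsAsFactors = FALSE)

# Note the comparison operator ==, not =
Yut_are <- subset(Yut, Mark == "Arecaceae", select = c(X, Y, Mark))

class(Yut_are$Mark)   # a character column has no levels to go stale
table(Yut_are$Mark)   # counts only the families actually present
```

With a character column, summary() and table() only ever see the values that survived the subset, so there is nothing to purge afterwards.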

Hadley

On Sat, Nov 29, 2008 at 10:38 AM, Peter Solymos <solymos at ualberta.ca> wrote:
> Hi All,
>
> maybe a more transparent solution for the zombie factor problem
> (dropping unused factor levels) for data frames is (note, this applies
> for all factors in the data frame x):
>
> x[] <- lapply(x, function(x) x[drop = TRUE])
>
> As I recall, on the help page of factor(), there is a slight warning
> against character or numeric coercion of factors.
>
> Cheers,
>
> Peter
>
>
> On Sat, Nov 29, 2008 at 9:25 AM,  <byrnes at msi.ucsb.edu> wrote:
>> This can still be a problem after subsetting with zombie factors hanging
>> around.  It's particularly annoying when you're boxplotting from a subset,
>> as you'll have a bunch of empty entries in the plot.  I have a function I
>> call purgef that deals with eliminating levels of a factor that I have
>> subsetted out.
>>
>> purgef<-function(x){
>>  x<-as.character(x)
>>  x<-as.factor(x)
>>  return(x)
>> }
>>
>> Gets rid of those pesky zombie levels.
>>
>> In your case
>>
>> Yut_are$Mark<-purgef(Yut_are$Mark)
>>
>>> Sorry to bother everyone---I realized I should have used "==" instead
>>> of "=" in the subset syntax!
>>>
>>>
>>> Quoting Ophelia Wang <opheliawang at mail.utexas.edu>:
>>>
>>>> Hi all,
>>>>
>>>> I thought this should be very simple, but I'm not sure where the
>>>> problem is. I have a .txt data file that contains X and Y coordinates
>>>> of trees and their family names:
>>>>
>>>> "X"  "Y"     "Mark"
>>>> 0    28      "Sapotaceae"
>>>> 1    30      "Meliaceae"
>>>> 1    40      "Meliaceae"
>>>> 1    60      "Mimosaceae"
>>>> 1    76      "Olacaceae"
>>>> 1.5  73      "Myristicaceae"
>>>> 2    34      "Euphorbiaceae"
>>>> 2    62      "Olacaceae"
>>>> 2    86      "Mimosaceae"
>>>> 2.5  36      "Arecaceae"
>>>> 3    22      "Nyctaginaceae"
>>>> 3    25      "Moraceae"
>>>> 3    38      "Rubiaceae"
>>>> 3    47      "Desconocido "
>>>> 3    99      "Mimosaceae"
>>>> 3.5  24      "Anacardiaceae"
>>>> 3.5  57      "Sapotaceae"
>>>> 4    1       "Lecythidaceae"
>>>>
>>>> Now I just want to work on one family for various spatial analyses in
>>>> ads and spatstat, so I wrote:
>>>>
>>>> Yut <-read.delim(
>>>> "C:/dissertation/data2006/Parcela_1-3/Yutsun_tree.txt", header = TRUE,
>>>> sep = "\t", quote="\"", dec=".", fill = TRUE )
>>>>
>>>> Yut_are <- subset (Yut, Mark="Arecaceae", select=c(X, Y, Mark))
>>>>
>>>> However, the summary of Yut_are still contains trees of other families:
>>>>
>>>>   X                Y                    Mark
>>>>  Min.   :  0.00   Min.   : 0.00   Myristicaceae: 65
>>>>  1st Qu.: 24.00   1st Qu.:24.00   Lecythidaceae: 60
>>>>  Median : 46.00   Median :51.00   Sapotaceae   : 51
>>>>  Mean   : 48.07   Mean   :49.72   Moraceae     : 45
>>>>  3rd Qu.: 72.50   3rd Qu.:75.50   Arecaceae    : 41
>>>>  Max.   :100.00   Max.   :99.00   Mimosaceae   : 34
>>>>                                   (Other)      :313
>>>>
>>>> Please tell me how do I subset a dataset like this to extract trees
>>>> from only one or a few families? Thanks a lot!
>>>>
>>>> Ophelia
>>>>
>>>
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
>>

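For anyone who does keep factors, here is a small sketch of the two level-dropping approaches quoted above, applied after subsetting with `==`. The data frame `d` and the other object names are made up for illustration, and the lapply() call is restricted to factor columns here, which is a slight variation on the one-liner in the thread.

```r
# Toy data frame with Mark stored explicitly as a factor
d <- data.frame(
  X = c(0, 1, 2.5, 3),
  Y = c(28, 30, 36, 25),
  Mark = factor(c("Sapotaceae", "Meliaceae", "Arecaceae", "Moraceae"))
)

# Subsetting keeps every original level, even for families with no rows left
d_are <- subset(d, Mark == "Arecaceae")
n_before <- nlevels(d_are$Mark)

# Approach 1: rebuild the factor via character, as purgef() does
purged <- as.factor(as.character(d_are$Mark))

# Approach 2: drop unused levels in each factor column of the data frame
is_fac <- vapply(d_are, is.factor, logical(1))
d_are[is_fac] <- lapply(d_are[is_fac], function(col) col[drop = TRUE])

n_before
levels(purged)
levels(d_are$Mark)
```

Both approaches leave only the levels that still occur in the subset; the second avoids the round trip through character.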


-- 
http://had.co.nz/
