[R-sig-eco] subsetting data in R

Sarah Goslee sarah.goslee at gmail.com
Sun Apr 24 15:51:11 CEST 2011


By default (in R versions before 4.0.0), read.csv() turns character variables
into factors, using all the unique values as the levels.

subset() retains those levels by default, as they are a vital element of the
data. If you are studying some attribute of men and women, say height,
even if you are only looking at the heights for women it's important to remember
that men still exist.

If you don't want influencia to be a factor, you can change that at import
by passing stringsAsFactors=FALSE to read.csv().
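
As a rough sketch of that option: the inline CSV text and the conteo column below are made up for illustration, since the contents of espec_indic.csv are not shown in this thread. Only the column name influencia and its three values come from the original question.

```r
# Hypothetical stand-in for espec_indic.csv (assumed data, not the real file)
csv_text <- "influencia,conteo\nAP,3\nAID,5\nAII,2\nAID,7"

# Import with strings kept as plain character vectors instead of factors
pa_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Subset as before; a character column carries no levels attribute
pa2_chr <- subset(pa_chr, influencia == "AID")

is.character(pa2_chr$influencia)  # the column is character, not factor
unique(pa2_chr$influencia)        # only the values actually present
```

With a character column there are no leftover levels to worry about, but functions that expect a factor (some modelling and plotting code) may need the column converted later.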

If you do want influencia to be a factor, but want the unused levels to be
removed, you can use factor() to do that.

> testdata <- data.frame(group=c("A", "B", "C", "A", "B", "C"), value=1:6, stringsAsFactors=TRUE)
> testdata
  group value
1     A     1
2     B     2
3     C     3
4     A     4
5     B     5
6     C     6
> str(testdata)
'data.frame':	6 obs. of  2 variables:
 $ group: Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3
 $ value: int  1 2 3 4 5 6
> subset(testdata, group=="A")
  group value
1     A     1
4     A     4
> subset(testdata, group=="A")$group
[1] A A
Levels: A B C
> ?subset
> factor(subset(testdata, group=="A")$group)
[1] A A
Levels: A
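
To get a whole new data frame rather than just the one vector, the result of factor() can be assigned back to the column. Another possibility, not used in the session above, is droplevels(), which should remove unused levels from every factor column at once. A minimal sketch using the same test data (the object names sub_a and sub_b are only for illustration):

```r
# Same test data as above; stringsAsFactors = TRUE makes group a factor
# regardless of the R version's default
testdata <- data.frame(group = c("A", "B", "C", "A", "B", "C"),
                       value = 1:6,
                       stringsAsFactors = TRUE)

# Option 1: subset, then rebuild the factor in place with factor()
sub_a <- subset(testdata, group == "A")
sub_a$group <- factor(sub_a$group)
levels(sub_a$group)

# Option 2: droplevels() on the subset drops unused levels
# from all factor columns of the data frame
sub_b <- droplevels(subset(testdata, group == "A"))
levels(sub_b$group)
```

Both approaches leave the original testdata untouched, so its group column still has all three levels.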

Sarah

On Sun, Apr 24, 2011 at 9:04 AM, Manuel Spínola <mspinola10 at gmail.com> wrote:
> Dear list members,
>
> I have a question regarding subsetting a data set in R.
>
> I created an object for my data:
>
>  >pa = read.csv("espec_indic.csv", header = T, sep=",", check.names = F)
>
>  > levels(pa$influencia)
> [1] "AID" "AII" "AP"
>
> The object has 3 levels for influencia (AP, AID, AII)
>
> Now I subset only observations with influencia = "AID"
>
>  >pa2 = subset(pa, influencia=="AID")
>
> but if I ask for the levels of influencia, it still shows me the 3 levels,
> AP, AID, AII.
>
>  > levels(pa2$influencia)
> [1] "AID" "AII" "AP"
>
> Why is that?
>
> I was thinking that I was creating a new data frame with only AID as a
> level for influencia.
>
> How can I make a completely new object with only the observations for
> "AID", in which the only level for influencia is indeed "AID"?
>
> Best,
>
> Manuel
>
>
-- 
Sarah Goslee
http://www.functionaldiversity.org


