[R] When creating a data frame, data.frame() transforms "integers" into "factors"

Bert Gunter gunter.berton at gene.com
Sun May 26 16:00:23 CEST 2013


1. Please always cc. the list; do not reply just to me.

2.  OK, I see. I ERRED. Had you cc'ed the list, someone might have
pointed this out. The correct example reproduces what you saw.

> z <- sample(1:10, 30, replace = TRUE)
> table(z)
> w <- data.frame(table(z))
> w

     z  Freq
1   1    2
2   2    3
3   3    1
4   4    3
5   5    5
6   6    3
7   7    5
8   8    4
9   9    1
10 10    3

> sapply(w,class)
        z      Freq
 "factor" "integer"

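This is exactly what is expected and documented. See ?table. So the
question is: what do you expect? table() produces an array of counts
whose dimensions are the cross-classifying factors; the observed values
survive only as the (character) dimnames labels. data.frame() converts
this, via as.data.frame.table(), into a data frame with one factor
column per classifying factor plus a Freq column of counts. Perhaps the
following will help clarify: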

> z <- data.frame(fac1 = sample(LETTERS[1:3], 10, replace = TRUE),
+                 fac2 = sample(c("j", "k"), 10, replace = TRUE))
> z
   fac1 fac2
1     A    k
2     B    k
3     C    k
4     C    k
5     B    k
6     C    k
7     C    k
8     A    j
9     A    j
10    C    j

> table(z)

    fac2
fac1 j k
   A 2 1
   B 0 2
   C 1 4

> data.frame(table(z))

  fac1 fac2 Freq
1    A    j    2
2    B    j    0
3    C    j    1
4    A    k    1
5    B    k    2
6    C    k    4

> table(z['fac1'])

A B C
3 2 5

> data.frame(table(z['fac1']))
  Var1 Freq
1    A    3
2    B    2
3    C    5
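
A minimal sketch of what happens to the values themselves, using a small fixed vector (hypothetical, chosen only so the output is reproducible): table() keeps the observed values as character labels, data.frame() turns those labels into a factor, and getting integers back has to go through the labels rather than the factor's internal codes (R FAQ 7.10 covers the same conversion).

```r
# A small vector with fixed values so the result is reproducible
z <- c(1L, 2L, 2L, 3L, 3L, 3L, 10L)
tbl <- table(z)

# table() keeps the observed values only as character labels (its dimnames)
names(tbl)         # "1" "2" "3" "10"

# data.frame() hands the table to as.data.frame.table(), which by default
# (stringsAsFactors = TRUE) turns those labels into a factor column
w <- data.frame(tbl)
sapply(w, class)   # z: "factor", Freq: "integer"

# Asking for character labels instead skips the factor step
w_chr <- as.data.frame(tbl, stringsAsFactors = FALSE)
class(w_chr$z)     # "character"

# Convert back through the labels, not the internal codes:
# as.integer(w$z) would give the codes 1 2 3 4, not the values 1 2 3 10
w$z <- as.integer(as.character(w$z))
w$z                # 1 2 3 10
```

The as.character() step matters because a factor stores integer codes that index its levels; converting the factor directly returns those codes.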

Cheers,
Bert

On Sat, May 25, 2013 at 6:54 PM, António Camacho <toinobc at gmail.com> wrote:
> Hello Bert
> Thanks for your prompt reply.
> I tried your example and it worked without a problem.
>
> But what I want is to create a data frame from the output of the function
> table(), so in your example I tried "sapply(data.frame(tbl), class)" and the
> output was z --> factor and Freq --> integer.
> What is happening in the table() function that is transforming the integers
> in z into values with labels?
> Because when I do "names(tbl)" it returns each value of z as a name...
>
> I read the manual for "[" but I didn't understand it completely. I have to
> read An Introduction to R more carefully.
>
> I also tried using "[,", "[[" and "$" to extract the values from
> the 'posts' column, but the problem persisted.
>
> Like I said, this code was taken from an example on a web page. I contacted
> the author and he confirmed to me that the code worked on his machine, which
> was running R 2.15.1...
> Maybe something changed in data.frame() between versions?
>
> I really don't understand what I am doing wrong.
>
> António
>
> On 2013/05/26, at 01:44, Bert Gunter wrote:
>
>> Huh?
>>
>>> z <- sample(1:10,30,rep=TRUE)
>>> tbl <- table(z)
>>> tbl
>>
>> z
>> 1 2 3 4 5 6 7 8 9 10
>> 4 3 2 6 3 3 2 2 2 3
>>>
>>> data.frame(z)
>>
>>    z
>> 1   5
>> 2   2
>> 3   4
>> 4   1
>> 5   6
>> 6   4
>> 7  10
>> 8   4
>> 9   3
>> 10  8
>> 11 10
>> 12  4
>> 13  3
>> 14  9
>> 15  2
>> 16  2
>> 17  6
>> 18  1
>> 19  4
>> 20  7
>> 21  9
>> 22 10
>> 23  7
>> 24  5
>> 25  5
>> 26  6
>> 27  8
>> 28  1
>> 29  1
>> 30  4
>>>
>>> sapply(data.frame(z),class)
>>
>>        z
>> "integer"
>>
>> Your error: you used df['posts']. You should have used df[,'posts'].
>>
>> The former is a data frame. The latter is a vector. Read the
>> "An Introduction to R" manual or ?"[" if you don't understand why.
>>
>> -- Bert
>>
>> On Sat, May 25, 2013 at 12:36 PM, António Camacho <toinobc at gmail.com>
>> wrote:
>>>
>>> Hello
>>>
>>>
>>> I am a novice to R and I was learning how to make a scatter plot in R using
>>> an example from a website.
>>>
>>> My setup is an iMac with Mac OS X 10.8.3 and R 3.0.1 (default install,
>>> no additional packages loaded).
>>>
>>> I created a .csv file in vim with the following content:
>>> userID,user,posts
>>> 1,user1,581
>>> 2,user2,281
>>> 3,user3,196
>>> 4,user4,150
>>> 5,user5,282
>>> 6,user6,184
>>> 7,user7,90
>>> 8,user8,74
>>> 9,user9,45
>>> 10,user10,20
>>> 11,user11,3
>>> 12,user12,1
>>> 13,user13,345
>>> 14,user14,123
>>>
>>> I imported the file into R using: df <- read.csv('file.csv')
>>> To confirm the data types I did: sapply(df, class)
>>> That returns "userID" --> "integer"; "user" --> "factor"; "posts" -->
>>> "integer".
>>> Then I tried to create another data frame with the number of posts and
>>> their frequencies,
>>> so I did: postFreqCount <- data.frame(table(df['posts']))
>>> This gives me the postFreqCount data frame with two columns, one called
>>> 'Var1' that has the number of posts each user made, and another column
>>> 'Freq' with the frequency of each number of posts.
>>> The problem is that if I do: sapply(postFreqCount['Var1'], class) it
>>> returns "factor".
>>> So the data.frame() function transformed a variable that was "integer"
>>> (posts) into a variable (Var1) that has the same values but is "factor".
>>> I want to know how to prevent this from happening. How do I keep the
>>> values from being transformed from "integer" to "factor"?
>>>
>>> Thank you for your help
>>>
>>> António
>>>
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>>
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
>
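
A minimal sketch reproducing the original question end to end, assuming the CSV from the quoted message is inlined with read.csv(text = ...) instead of being read from file.csv. It shows the df['posts'] versus df[,'posts'] difference from my earlier reply, and that fixing the indexing alone does not change the factor outcome, which is the error I corrected at the top of this message.

```r
# The CSV from the original question, inlined via read.csv(text = ...)
# so the sketch runs without a file on disk
df <- read.csv(text = "userID,user,posts
1,user1,581
2,user2,281
3,user3,196
4,user4,150
5,user5,282
6,user6,184
7,user7,90
8,user8,74
9,user9,45
10,user10,20
11,user11,3
12,user12,1
13,user13,345
14,user14,123")

# A single index inside [ ] selects columns and returns a one-column data frame
class(df["posts"])         # "data.frame"
# Matrix-style indexing (like [[ ]] or $) returns the column as a vector
class(df[, "posts"])       # "integer"

# Either way, table() records the counted values as labels, so the first
# column of the resulting data frame is still a factor
postFreqCount <- data.frame(table(df[, "posts"]))
class(postFreqCount[[1]])  # "factor"
```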


