[R] When creating a data frame with data.frame() transforms "integers" into "factors"

António Brito Camacho toinobc at gmail.com
Sun May 26 17:34:02 CEST 2013


Hello Bert.

I didn't reply to the list because I forgot: I hit reply instead of reply all.

Thanks for your example.
I understand now that I was trying to do something that didn't make sense, and that was why it failed.
I should have used a histogram to graph the frequency of each number of 'posts' instead of going the convoluted way around and trying to do a scatterplot.
I now understand that table() treats each distinct value of the variable as a category and counts how many times it shows up. It makes sense that these values become a "factor" column in the data frame, because they are no longer quantities but labels that represent the numbers.
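
For the archive, here is a minimal sketch of how the tabulated values can be turned back into integers if they are needed as numbers. The 'posts' vector is typed in from the CSV in my first message, and the name 'class_before' is only for illustration.

```r
# Post counts copied from the CSV in the original message
posts <- c(581L, 281L, 196L, 150L, 282L, 184L, 90L,
           74L, 45L, 20L, 3L, 1L, 345L, 123L)

# Converting a table to a data frame stores the tabulated values as a factor
postFreqCount <- as.data.frame(table(posts))
class_before <- class(postFreqCount$posts)
class_before

# Go through character first: as.integer() applied directly to a factor
# returns the internal level codes, not the original numbers
postFreqCount$posts <- as.integer(as.character(postFreqCount$posts))
class(postFreqCount$posts)
```

The detour through as.character() is the important part, as far as I understand it: the factor only remembers the numbers as labels.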

Thanks for the help. Problem solved.
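
In case it helps someone who finds this thread later, the histogram approach mentioned above can be sketched like this. It is only a rough example: the 'posts' values are retyped from my CSV, and the title and axis label are my own choices.

```r
# Post counts copied from the CSV in the original message
posts <- c(581L, 281L, 196L, 150L, 282L, 184L, 90L,
           74L, 45L, 20L, 3L, 1L, 345L, 123L)

# hist() bins the raw integers directly, so no table() or factor step is needed
h <- hist(posts, main = "Posts per user", xlab = "Number of posts")

# The returned object holds the bin boundaries and the count in each bin
h$breaks
h$counts
```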

António Brito Camacho


On 26/05/2013, at 15:00, Bert Gunter <gunter.berton at gene.com> wrote:

> 1. Please always cc. the list; do not reply just to me.
> 
> 2.  OK, I see. I ERRED. Had you cc'ed the list, someone might have
> pointed this out. The correct example reproduces what you saw.
> 
> z <- sample(1:10, 30, replace = TRUE)
> table(z)
> w <- data.frame(table(z))
> w
> 
>     z  Freq
> 1   1    2
> 2   2    3
> 3   3    1
> 4   4    3
> 5   5    5
> 6   6    3
> 7   7    5
> 8   8    4
> 9   9    1
> 10 10    3
> 
>> sapply(w,class)
>        z      Freq
> "factor" "integer"
> 
> This is exactly what is expected and documented.  See ?table. So the
> question is: What do you expect?  table() produces an array whose
> cross-classifying factors are the dimensions. data.frame converts this
> into a data frame. Perhaps the following will help clarify:
> 
>> z <- data.frame(fac1 = sample(LETTERS[1:3], 10, replace = TRUE),
>      fac2 = sample(c("j","k"), 10, replace = TRUE))
>> z
>   fac1 fac2
> 1     A    k
> 2     B    k
> 3     C    k
> 4     C    k
> 5     B    k
> 6     C    k
> 7     C    k
> 8     A    j
> 9     A    j
> 10    C    j
> 
>> table(z)
> 
>    fac2
> fac1 j k
>   A 2 1
>   B 0 2
>   C 1 4
> 
>> data.frame(table(z))
> 
>  fac1 fac2 Freq
> 1    A    j    2
> 2    B    j    0
> 3    C    j    1
> 4    A    k    1
> 5    B    k    2
> 6    C    k    4
> 
>> table(z['fac1'])
> 
> A B C
> 3 2 5
> 
>> data.frame(table(z['fac1']))
>  Var1 Freq
> 1    A    3
> 2    B    2
> 3    C    5
> 
> Cheers,
> Bert
> 
> On Sat, May 25, 2013 at 6:54 PM, António Camacho <toinobc at gmail.com> wrote:
>> Hello Bert
>> Thanks for your prompt reply.
>> I tried your example and it worked without a problem.
>> 
>> But what I want is to create a data frame from the output of the function
>> table(), so in your example I tried "sapply(data.frame(tbl),class)" and the
>> output was z --> factor and Freq --> integer.
>> What is happening in the table() function that is transforming the integers
>> in z into values with labels?
>> Because when I do "names(tbl)" it returns each value of z as a name....
>> 
>> I read the manual for "[" but I didn't understand it completely. I have to
>> read the introduction to R more carefully.
>> 
>> I also tried using "[,", "[[" and "$" for the extraction of the values from
>> the 'posts' column, but the problem persisted.
>> 
>> Like I said, this code was taken from an example on a webpage. I contacted
>> the author and he confirmed to me that the code worked on his machine, which
>> was running R 2.15.1.
>> Maybe something changed between versions in data.frame()?
>> 
>> I really don't understand what I am doing wrong.
>> 
>> António
>> 
>> On 2013/05/26, at 01:44, Bert Gunter wrote:
>> 
>>> Huh?
>>> 
>>>>> z <- sample(1:10, 30, replace = TRUE)
>>>> tbl <- table(z)
>>>> tbl
>>> 
>>> z
>>> 1 2 3 4 5 6 7 8 9 10
>>> 4 3 2 6 3 3 2 2 2 3
>>>> 
>>>> data.frame(z)
>>> 
>>>   z
>>> 1   5
>>> 2   2
>>> 3   4
>>> 4   1
>>> 5   6
>>> 6   4
>>> 7  10
>>> 8   4
>>> 9   3
>>> 10  8
>>> 11 10
>>> 12  4
>>> 13  3
>>> 14  9
>>> 15  2
>>> 16  2
>>> 17  6
>>> 18  1
>>> 19  4
>>> 20  7
>>> 21  9
>>> 22 10
>>> 23  7
>>> 24  5
>>> 25  5
>>> 26  6
>>> 27  8
>>> 28  1
>>> 29  1
>>> 30  4
>>>> 
>>>> sapply(data.frame(z),class)
>>> 
>>>       z
>>> "integer"
>>> 
>>> Your error: you used df['posts']  . You should have used df[,'posts'] .
>>> 
>>> The former is a data frame. The latter is a vector. Read the
>>> "Introduction to R tutorial" or ?"[" if you don't understand why.
>>> 
>>> -- Bert
>>> 
>>> On Sat, May 25, 2013 at 12:36 PM, António Camacho <toinobc at gmail.com>
>>> wrote:
>>>> 
>>>> Hello
>>>> 
>>>> 
>>>> I am a novice in R and I was learning how to do a scatter plot with R using
>>>> an example from a website.
>>>> 
>>>> My setup is iMac with Mac OS X 10.8.3, with R 3.0.1, default install,
>>>> without additional packages loaded
>>>> 
>>>> I created a .csv file in vim with the following content:
>>>> userID,user,posts
>>>> 1,user1,581
>>>> 2,user2,281
>>>> 3,user3,196
>>>> 4,user4,150
>>>> 5,user5,282
>>>> 6,user6,184
>>>> 7,user7,90
>>>> 8,user8,74
>>>> 9,user9,45
>>>> 10,user10,20
>>>> 11,user11,3
>>>> 12,user12,1
>>>> 13,user13,345
>>>> 14,user14,123
>>>> 
>>>> I imported the file into R using: 'df <- read.csv("file.csv")'
>>>> To confirm the data types I did: 'sapply(df, class)'
>>>> which returns "userID" --> "integer"; "user" --> "factor"; "posts" -->
>>>> "integer".
>>>> Then I tried to create another data frame with the number of posts and its
>>>> frequencies,
>>>> so I did: 'postFreqCount <- data.frame(table(df['posts']))'
>>>> This gives me the postFreqCount data frame with two columns, one called
>>>> 'Var1' that has the number of posts each user made, and another column
>>>> 'Freq' with the frequency of each number of posts.
>>>> The problem is that if I do: 'sapply(postFreqCount['Var1'], class)' it
>>>> returns "factor".
>>>> So the data.frame() function transformed a variable that was "integer"
>>>> (posts) into a variable (Var1) that has the same values but is "factor".
>>>> I want to know how to prevent this from happening. How do I keep the
>>>> values from being transformed from "integer" to "factor"?
>>>> 
>>>> Thank you for your help
>>>> 
>>>> António
>>>> 
>>>> 
>>>> 
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


