[R] Creating a simple function
Duncan Murdoch
murdoch.duncan at gmail.com
Sat Sep 21 16:25:40 CEST 2019
On 21/09/2019 9:05 a.m., Jeff Newmiller wrote:
> Your use of subset instead of select does not help, but a corrected example does indeed confirm your point.

Whoops, sorry. Thanks for doing the real check.

Duncan

>
> library(dplyr)
>
> str(data.frame(a=c(1,1,2,2), b=1:4) %>% select(b,a))
> ## 'data.frame': 4 obs. of 2 variables:
> ## $ b: int 1 2 3 4
> ## $ a: num 1 1 2 2
>
> However, the `[` issue is still worth addressing. If that does not fix the problem, then a dput(head(troublesomedata)) from Zachary will be needed to figure out what is actually going on.
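A dput(head(...)) dump helps here because it records the object's class attribute along with its values, so it would show at once whether Zachary's data is a tibble. A minimal sketch, assuming the tibble package is installed; `troublesomedata` and its columns are a hypothetical stand-in for the data that was never posted:

```r
library(tibble)

# Hypothetical stand-in for the unposted data, stored as a tibble
troublesomedata <- tibble(col1 = c("x", "y", "x", "y"),
                          col2 = c("a", "a", "b", "b"))

# dput() prints R code that recreates the object, including its class
# attribute, so a tibble shows up as class = c("tbl_df", "tbl", "data.frame")
dput(head(troublesomedata))
```

Pasting that output into a reply lets others rebuild the exact object, class included, which is what the diagnosis in this thread hinges on.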
>
> On September 21, 2019 5:22:07 AM PDT, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> On 21/09/2019 7:38 a.m., Jeff Newmiller wrote:
>>> The dplyr::select function returns a special variety of data.frame called a tibble.
>>
>> I don't think that's always true. The docs say it returns "An object of the same class as .data.", and that's what I'm seeing:
>>
>>> str(data.frame(a=c(1,1,2,2), b=1:4) %>% subset(a == 1))
>> 'data.frame': 2 obs. of 2 variables:
>> $ a: num 1 1
>> $ b: int 1 2
>>
>> But I believe there are other dplyr functions that take data frames as input and return tibbles; I just don't know which ones.
>>
>> Duncan Murdoch
>>
>>> The tibble has certain features designed to make it behave consistently when indexing is used. Specifically, the `[` operator always returns a tibble regardless of how many columns are indicated by the column index. This is unlike a conventional data frame, which returns a vector when exactly one column is indicated by the column index, or a data.frame if more than one is indicated.
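The difference described above can be checked directly. A minimal sketch, assuming the tibble package is installed; the toy frame reuses the two-column data.frame(a = ..., b = ...) from earlier in the thread:

```r
library(tibble)

df <- data.frame(a = c(1, 1, 2, 2), b = 1:4)
tb <- as_tibble(df)

# A base data.frame drops a single selected column to a plain vector
class(df[, 1])                 # "numeric"

# A tibble does not drop: the result is still a one-column tibble
class(tb[, 1])                 # c("tbl_df", "tbl", "data.frame")

# drop = FALSE makes the base data.frame keep its shape as well
class(df[, 1, drop = FALSE])   # "data.frame"
```

So code written as data[,1] hands a vector to CrossTable() for a plain data.frame but a one-column tibble for a tibble.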
>>>
>>> A syntax that consistently yields a column vector with both tibbles and data.frames is
>>>
>>> dta[[ 1 ]]
>>>
>>> so
>>>
>>> ctab <- function(data) {
>>>   CrossTable(data[[1]], data[[2]], prop.chisq = FALSE, prop.c = FALSE,
>>>              prop.t = FALSE, format = "SPSS")
>>> }
>>>
>>> should work.
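To confirm that `[[` gives the same result for both classes without needing gmodels installed, here is a sketch that swaps in base R's table() for CrossTable(). `ctab_sketch` is a hypothetical name, and the toy data are Zachary's small example from further down the thread; this checks the column extraction only, not CrossTable()'s own output.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for ctab(): base table() replaces gmodels::CrossTable()
# so the sketch runs without gmodels
ctab_sketch <- function(data) {
  # [[ ]] extracts a plain vector whether `data` is a data.frame or a tibble
  table(data[[1]], data[[2]])
}

df <- data.frame(C1 = c("x", "y", "x", "y"), C2 = c("a", "a", "b", "b"))

res_df <- df %>% select(C1, C2) %>% ctab_sketch()
res_tb <- as_tibble(df) %>% select(C1, C2) %>% ctab_sketch()

# The cross-tabulation is the same either way
identical(res_df, res_tb)
```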
>>>
>>> On September 20, 2019 10:59:46 AM PDT, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>> On 20/09/2019 11:30 a.m., Zachary Lim wrote:
>>>>> Hi,
>>>>>
>>>>> I'm trying to create a simple function that takes a dataframe as its only argument. I've been using gmodels::CrossTable, but it requires a lot of arguments, e.g.:
>>>>>
>>>>> #this runs fine
>>>>> CrossTable(data$col1, data$col2, prop.chisq = FALSE, prop.c = FALSE,
>>>>>            prop.t = FALSE, format = "SPSS")
>>>>>
>>>>> Moreover, I wanted to make it compatible with piping, so I decided to create the following function:
>>>>>
>>>>> ctab <- function(data) {
>>>>>   CrossTable(data[,1], data[,2], prop.chisq = FALSE, prop.c = FALSE,
>>>>>              prop.t = FALSE, format = "SPSS")
>>>>> }
>>>>>
>>>>> When I try to use this function, however, I get the following error:
>>>>>
>>>>> #this results in 'Error: Must use a vector in `[`, not an object of class matrix.'
>>>>> data %>% select(col1, col2) %>% ctab()
>>>>>
>>>>> I tried searching online but couldn't find much about that error (except in specific and unrelated cases). Moreover, when I created a very simple dataset, it turns out there's no problem:
>>>>>
>>>>> #this runs fine
>>>>> data.frame(C1 = c('x','y','x','y'), C2 = c('a','a','b','b')) %>% ctab()
>>>>>
>>>>>
>>>>> Is this a problem with my function or the data? If it's the data, why does directly calling CrossTable work?
>>>>
>>>> Presumably data %>% select(col1, col2) isn't giving you a dataframe. However, you haven't given us a reproducible example, so I can't tell you what it's doing. But that's where you should look.
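One way to follow that advice is to inspect the intermediate result before it reaches ctab(). A tibble also inherits from data.frame, so is.data.frame() alone will not reveal the difference; checking for the "tbl_df" class does. A minimal sketch, assuming dplyr and tibble are installed, with a hypothetical tibble standing in for Zachary's data:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for Zachary's data, stored as a tibble
data <- tibble(col1 = c("x", "y", "x", "y"), col2 = c("a", "a", "b", "b"))

selected <- data %>% select(col1, col2)

class(selected)                # c("tbl_df", "tbl", "data.frame")
is.data.frame(selected)        # TRUE: a tibble is still a data.frame
inherits(selected, "tbl_df")   # TRUE: this is what changes how `[` behaves
```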
>>>>
>>>> Duncan Murdoch
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>