[R] Sanity check in loading large dataframe

Luigi Marongiu m@rong|u@|u|g| @end|ng |rom gm@||@com
Fri Aug 6 07:34:05 CEST 2021


Ok, so nothing to worry about. Yet, are there other checks I can implement?
Thank you

On Thu, 5 Aug 2021, 15:40 Duncan Murdoch, <murdoch.duncan using gmail.com> wrote:

> On 05/08/2021 9:16 a.m., Luigi Marongiu wrote:
>  > Hello,
>  > I am using a large spreadsheet (over 600 variables).
>  > I tried `str` to check the dimensions of the spreadsheet and I got
>  > ```
>  >> (str(df))
>  > 'data.frame': 302 obs. of  626 variables:
>  >   $ record_id                 : int  1 1 1 1 1 1 1 1 1 1 ...
>  > ....
>  > $ v1_medicamento___aceta    : int  1 NA NA NA NA NA NA NA NA NA ...
>  >    [list output truncated]
>  > NULL
>  > ```
>  > I understand that `[list output truncated]` means that there are more
>  > variables than `str` displays by default. Thus I raised the limit
>  > with `list.len`:
>  > ```
>  >
>  >> (str(df, list.len=1000))
>  > 'data.frame': 302 obs. of  626 variables:
>  >   $ record_id                 : int  1 1 1 1 1 1 1 1 1 1 ...
>  > ...
>  > NULL
>  > ```
>  >
>  > Does `NULL` mean that some of the variables are not closed? (perhaps a
>  > missing comma somewhere)
>  > Is there a way to check the sanity of the data and avoid that some
>  > separator is not in the right place?
>  > Thank you
>
> The NULL is the value returned by str().  Normally it is not printed,
> but when you wrap str in parens as (str(df, list.len=1000)), that forces
> the value to print.
>
> str() is unusual among R functions in that it prints to the console as it
> runs and returns NULL invisibly.  Many other functions construct a value which
> is only displayed if you print it, but something like
>
> x <- str(df, list.len=1000)
>
> will print the same as if there was no assignment, and then assign NULL
> to x.
>
> Duncan Murdoch
>
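
The behaviour described above can be reproduced on a small stand-in data frame. This is only a sketch: the `df` below is hypothetical example data, not the original spreadsheet, and `withVisible()` is used here just to show whether a result would auto-print at the top level.

```r
# Hypothetical stand-in for the real data frame.
df <- data.frame(record_id = 1:3, value = c(1L, NA, NA))

x <- str(df)   # the structure is printed here, as a side effect of the call
is.null(x)     # the value that was assigned is NULL

# Plain str(df) returns its NULL invisibly, so nothing extra is printed.
plain <- withVisible(str(df))$visible

# Wrapping the call in parentheses makes the returned NULL visible.
wrapped <- withVisible((str(df)))$visible
```

So the trailing `NULL` says nothing about the data; dropping the outer parentheses makes it disappear.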

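On the question of further checks, one possible approach is to count the fields on every line before reading the file, and then compare the dimensions and column types with what is expected. The sketch below assumes a comma-separated text file; the temporary file and its column names are made up for illustration, with one line deliberately missing a separator.

```r
# Hypothetical example file; the third line lacks one separator.
path <- tempfile(fileext = ".csv")
writeLines(c("record_id,age,drug",
             "1,34,aceta",
             "2,41",
             "3,29,ibu"), path)

# Number of fields found on each line of the raw file.
n_fields <- count.fields(path, sep = ",", quote = "\"")
table(n_fields)

# Lines whose field count differs from the header line.
bad_lines <- which(n_fields != n_fields[1])
bad_lines

# After reading, inspect dimensions, column types and missing values.
df <- read.csv(path)
dim(df)
sapply(df, class)
colSums(is.na(df))
```

A single value in `table(n_fields)` suggests that every line has the same number of separators. Note that `count.fields()` returns `NA` for lines that fall inside a multi-line quoted string, so those need a separate look.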