[R] Sanity check in loading large dataframe
Luigi Marongiu
Fri Aug 6 07:34:05 CEST 2021
OK, so nothing to worry about. Still, are there other checks I can implement?
Thank you
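
For reference, one possible check on the raw file, before it even becomes a data frame, is to count the fields on every line. This is only a minimal sketch: the temporary file, its column names and its values are made up for illustration, and it assumes a comma-separated export. `count.fields()` is in the `utils` package; its `quote` and `comment.char` arguments are set here to match the `read.csv()` defaults.

```r
# Write a small CSV to a temporary file; the last row is missing a field
# (hypothetical data standing in for the real export).
path <- tempfile(fileext = ".csv")
writeLines(c("record_id,age,dose",
             "1,34,1",
             "2,41,NA",
             "3,29"),
           path)

# Number of fields found on each line, using the same quoting and
# comment rules as read.csv().
n_fields <- count.fields(path, sep = ",", quote = "\"", comment.char = "")

# Lines whose field count differs from the header line are suspect.
bad_lines <- which(n_fields != n_fields[1])
bad_lines

# read.csv() uses fill = TRUE by default, so the short row is padded
# with NA instead of raising an error.
df <- read.csv(path)
dim(df)
colSums(is.na(df))
```

A misplaced or missing separator shows up in `bad_lines` as a line number in the file, whereas the data frame itself still loads with the expected number of columns.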
On Thu, 5 Aug 2021, 15:40 Duncan Murdoch, <murdoch.duncan using gmail.com> wrote:
> On 05/08/2021 9:16 a.m., Luigi Marongiu wrote:
> > Hello,
> > I am using a large spreadsheet (over 600 variables).
> > I tried `str` to check the dimensions of the spreadsheet and I got
> > ```
> >> (str(df))
> > 'data.frame': 302 obs. of 626 variables:
> > $ record_id : int 1 1 1 1 1 1 1 1 1 1 ...
> > ....
> > $ v1_medicamento___aceta : int 1 NA NA NA NA NA NA NA NA NA ...
> > [list output truncated]
> > NULL
> > ```
> > I understand that `[list output truncated]` means that there are more
> > variables than str displays by default. Thus I increased the number of
> > variables shown with:
> > ```
> >> (str(df, list.len=1000))
> > 'data.frame': 302 obs. of 626 variables:
> > $ record_id : int 1 1 1 1 1 1 1 1 1 1 ...
> > ...
> > NULL
> > ```
> >
> > Does `NULL` mean that some of the variables are not closed? (perhaps a
> > missing comma somewhere)
> > Is there a way to check the sanity of the data and make sure that no
> > separator is out of place?
> > Thank you
>
> The NULL is the value returned by str(). Normally it is not printed,
> but when you wrap str in parens as (str(df, list.len=1000)), that forces
> the value to print.
>
> str() is unusual among R functions in that it prints to the console as it
> runs and returns nothing. Many other functions construct a value which
> is only displayed if you print it, but something like
>
> x <- str(df, list.len=1000)
>
> will print the same as if there were no assignment, and then assign NULL
> to x.
>
> Duncan Murdoch
>
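
The behaviour described in the reply above can be reproduced with a small sketch. The toy data frame is made up for illustration; `withVisible()` and `capture.output()` are standard R functions, used here only to show that the value of `str()` is NULL and is not auto-printed, and to show one way to keep the printed summary.

```r
# Toy data frame standing in for the real one (hypothetical data).
df <- data.frame(record_id = 1:3, dose = c(1L, NA, NA))

# str() prints its summary as a side effect; the value it returns is NULL.
x <- str(df)
is.null(x)

# withVisible() reports whether the value of a call would be auto-printed.
vis <- withVisible(str(df))
vis$visible

# To keep the printed summary, capture it as a character vector of lines.
out <- capture.output(str(df, list.len = 1000))
out[1]
```

Wrapping the call in parentheses, as in `(str(df))`, forces the returned NULL to print, which is why it showed up after the summary.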