[R] Sanity check in loading large dataframe
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Thu Aug 5 15:40:51 CEST 2021
On 05/08/2021 9:16 a.m., Luigi Marongiu wrote:
> Hello,
> I am using a large spreadsheet (over 600 variables).
> I tried `str` to check the dimensions of the spreadsheet and I got
> ```
>> (str(df))
> 'data.frame': 302 obs. of 626 variables:
> $ record_id : int 1 1 1 1 1 1 1 1 1 1 ...
> ....
> $ v1_medicamento___aceta : int 1 NA NA NA NA NA NA NA NA NA ...
> [list output truncated]
> NULL
> ```
> I understand that `[list output truncated]` means that there are more
> variables than those allowed by str to be displayed as rows. Thus I
> increased the row's output with:
> ```
>
>> (str(df, list.len=1000))
> 'data.frame': 302 obs. of 626 variables:
> $ record_id : int 1 1 1 1 1 1 1 1 1 1 ...
> ...
> NULL
> ```
>
> Does `NULL` mean that some of the variables are not closed? (perhaps a
> missing comma somewhere)
> Is there a way to check the sanity of the data and avoid that some
> separator is not in the right place?
> Thank you
The NULL is the value returned by str(). It is returned invisibly, so
normally it is not printed, but wrapping the call in parentheses, as in
(str(df, list.len=1000)), forces the value to print.
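A minimal sketch of that behaviour, assuming a small stand-in data frame (the name df_small and its columns are made up for illustration, not the original data). withVisible() reports whether a value would be auto-printed at the console:

```r
# Hypothetical small data frame standing in for the 626-column original
df_small <- data.frame(id = 1:3, a = c(1, NA, NA), b = c("x", "y", "z"))

# str() prints the structure as a side effect; its return value is invisible
res <- withVisible(str(df_small))
res$value    # NULL
res$visible  # FALSE

# Wrapping the call in parentheses makes the returned NULL visible,
# which is why it shows up after the structure listing
res_paren <- withVisible((str(df_small)))
res_paren$visible  # TRUE
```

So the trailing NULL says nothing about the data; it only reflects how the call was typed.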
str() is unusual among R functions in that it prints to the console as it
runs and returns only an invisible NULL. Many other functions construct a
value which is only displayed if you print it, but something like
    x <- str(df, list.len=1000)
will print the same as if there were no assignment, and then assign NULL
to x.
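As a rough illustration, again assuming a small made-up data frame (df_small is hypothetical): the assignment still prints, x ends up NULL, and if the printed text itself is wanted as a value, capture.output() from utils can collect it. For the dimensions alone, dim() returns them directly.

```r
# Hypothetical small data frame used only for illustration
df_small <- data.frame(id = 1:3, a = c(1, NA, NA), b = c("x", "y", "z"))

# The structure is printed even though the result is assigned
x <- str(df_small, list.len = 1000)
is.null(x)  # TRUE

# capture.output() collects what str() prints, one string per printed line
txt <- capture.output(str(df_small, list.len = 1000))
txt[1]

# dim() returns rows and columns as a value, with no printing side effect
d <- dim(df_small)
d
```

With the original data, dim(df) would be expected to agree with the "302 obs. of 626 variables" header that str() prints.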
Duncan Murdoch