[R] Data import R: some explanatory variables not showing up correctly in summary

David Winsemius dwinsemius at comcast.net
Thu Jun 1 18:17:27 CEST 2017


> On Jun 1, 2017, at 8:57 AM, William Dunlap via R-help <r-help at r-project.org> wrote:
> 
> Check for leading or trailing spaces in the strings in your data.
> dput(dataset) would show them.
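
To illustrate Bill's suggestion, here is a minimal sketch on made-up data (the data frame `d` and its values are hypothetical, not the original poster's data). It shows how a trailing space produces two levels that look identical in summary() output, and one way to list the offending levels:

```r
# Toy data (hypothetical): one "Island" entry carries a trailing space
d <- data.frame(Position = factor(c("Bank", "Island", "Island ", "Water")))

levels(d$Position)         # printed with quotes, so the stray space is visible
dput(levels(d$Position))   # the same information in dput() form
nlevels(d$Position)        # one more level than the labels that were intended

# Levels that change when whitespace is stripped are the problem entries
lv  <- levels(d$Position)
bad <- lv[lv != trimws(lv)]
bad
```

table(d$Position) lists every level, whereas summary() on a data frame groups the less frequent levels of a factor under "(Other)" once there are more levels than its maxsum argument allows.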

This function would strip any leading or trailing spaces from a column:

trim <- function(s) {
    s <- as.character(s)
    s <- sub(pattern = "^[[:blank:]]+", replacement = "", x = s)
    s <- sub(pattern = "[[:blank:]]+$", replacement = "", x = s)
    s
}

You could restrict it to non-numeric columns with:

my_dfrm[ !sapply(my_dfrm, is.numeric) ] <- lapply( my_dfrm[ !sapply(my_dfrm, is.numeric) ], trim)

It would have the side effect (desirable in my opinion, but opinions do vary on this matter) of converting any factor columns to character class.
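
A small worked example of the approach, assuming a made-up data frame (the column names and values below are hypothetical). It repeats the trim function so the snippet runs on its own, and adds an optional last step for anyone who would rather have factors back afterwards:

```r
# Toy data frame (hypothetical names and values)
my_dfrm <- data.frame(
  Behav = factor(c("fast", "slow ", " slow", "stat")),
  Count = c(1, 2, 3, 4)
)

# Strip leading and trailing blanks (spaces and tabs)
trim <- function(s) {
  s <- as.character(s)
  s <- sub("^[[:blank:]]+", "", s)
  sub("[[:blank:]]+$", "", s)
}

# Apply only to the non-numeric columns
non_num <- !sapply(my_dfrm, is.numeric)
my_dfrm[non_num] <- lapply(my_dfrm[non_num], trim)

class(my_dfrm$Behav)    # the factor column is now character
unique(my_dfrm$Behav)   # "slow ", " slow" have collapsed into "slow"

# Optional: convert the cleaned columns back to factors
my_dfrm[non_num] <- lapply(my_dfrm[non_num], factor)
levels(my_dfrm$Behav)
```

Base R also has trimws(), which by default strips a slightly wider set of whitespace (space, tab, newline, carriage return), and read.csv() accepts strip.white = TRUE to drop the blanks from unquoted fields at import time. Either may be enough on its own, depending on how the file was produced.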



> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> On Thu, Jun 1, 2017 at 8:49 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com>
> wrote:
> 
>> Hi Tara,
>> 
>> It seems that you categorise and count for each category. Could it be that
>> the method you use puts everything that doesn't match the predefined
>> categories in Other?
>> 
>> I'm only guessing because without a minimal reproducible example it's
>> difficult to do anything else.
>> 
>> Best wishes
>> Ulrik
>> 
>> Rui Barradas <ruipbarradas at sapo.pt> wrote on Thu, 1 June 2017, 17:30:
>> 
>>> Hello,
>>> 
>>> In order for us to help we need to know how you've imported your data.
>>> What was the file type? What instructions have you used to import it?
>>> Did you use base R or a package?
>>> Give us a minimal but complete code example that can reproduce your
>>> situation.
>>> 
>>> Hope this helps,
>>> 
>>> Rui Barradas
>>> 
>>> On 01-06-2017 11:02, Tara Adcock wrote:
>>>> Hi,
>>>> 
>>>> I have a question regarding data importing into R.
>>>> 
>>>> When I import my data into R and review the summary, some of my
>>>> explanatory variables are being reported as if, instead of being one
>>>> variable, they are two with the same name. See below for an example:
>>>> 
>>>>    Behav person         Behav dog               Position
>>>>   **combination  : 38   combination  :  4**     Bank    :372
>>>>   **combination  :  7   combination  :  4**   **Island  :119**
>>>>     fast         :123   fast         : 15     **Island  : 11**
>>>>     slow         :445   slow         : 95       Land    :  3
>>>>     stat         :111   stat         : 14       Water   :230
>>>> 
>>>> Also, all of the distances I have imported are showing up in the
>>>> summary along with a line entitled "other". However, I haven't used
>>>> any other distances?
>>>> 
>>>>    Distance        Distance.dog
>>>>    2-10m  :184     <50m   : 35
>>>>    <50m   :156     2-10m  : 27
>>>>    10-20m :156     20-30m : 23
>>>>    20-30m : 91     30-40m : 16
>>>>    40-50m : 57     10-20m : 13
>>>>    **(Other): 82   (Other): 18**
>>>> 
>>>> I have checked my data sheet over and over again and I think I have
>>>> standardised the data, but the issue keeps arising. I'm assuming I need
>>>> to clean the data set, but as a nearly complete novice in R I am not
>>>> certain how to do this. Any help at all with this would be much
>>>> appreciated. Thanks so much.
>>>> 
>>>> Kind Regards,
>>>> 
>>>> Tara Adcock.
>>>> 
>>>> 
>>>>      [[alternative HTML version deleted]]
>>>> 
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>> 
>>> 
>> 
> 

David Winsemius
Alameda, CA, USA


