[R] about data problem

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Wed Sep 21 02:06:04 CEST 2016


You can use the latter (overwriting dta$Discharge in place) IF you know there are no problems with the input data. If you need to troubleshoot, then you need separate columns so you can compare them.
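
A minimal sketch of that troubleshooting step, assuming a data frame `dta` whose `Discharge` column was read as character. The sample values, including the letter O typed in place of a zero, are made up for illustration:

```r
# Hypothetical sample data; "7.O8" contains a letter O instead of a zero
dta <- data.frame(
  month = 3, day = 1:4, year = 2010,
  Discharge = c("6.4", "7.58", "7.O8", "8.63"),
  stringsAsFactors = FALSE
)

# Keep the original text and store the converted values in a new column.
# Entries that cannot be parsed become NA (normally with a warning).
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))

# Rows where text was present but the conversion failed
bad <- is.na(dta$DischargeNum) & !is.na(dta$Discharge)
dta[bad, c("month", "day", "year", "Discharge")]
```

Because the original column is still there, the offending strings can be read directly instead of being lost to NA.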
-- 
Sent from my phone. Please excuse my brevity.

On September 20, 2016 4:22:41 PM PDT, lily li <chocold12 at gmail.com> wrote:
>Thanks. The former method works. I confused character with factor.
>
>Besides, I should use: dta$DischargeNum <- as.numeric( dta$Discharge )
>instead of: dta$Discharge <- as.numeric( dta$Discharge )
>
>
>On Tue, Sep 20, 2016 at 5:18 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>
>> Which means it avoided converting to factor... Success!
>>
>> Note that the column apparently has garbage characters in one or more of
>> the rows, which should be evident when you LOOK AT THE CHARACTERS in the
>> column. They should all be numeric symbols, plus or minus, and perhaps
>> decimal points. If they are not, then the conversion to numeric will be
>> incomplete. See my other message. You have the choice of editing the file
>> (may have concerns with traceability), or you can write R code that removes
>> the garbage characters using gsub.
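
As a rough sketch of the gsub route, assuming the garbage is anything other than digits, a sign, or a decimal point. The sample strings are invented; look at what is actually in the file before stripping characters, since removing a character can silently change a value:

```r
# Hypothetical raw strings with stray characters
raw <- c("6.4", " 7.58", "6.82*", "8.63e", "-1.5")

# Drop every character that is not a digit, a decimal point, or a sign
clean <- gsub("[^0-9.+-]", "", raw)

# Convert the cleaned text into a separate vector so it can be compared with raw
num <- as.numeric(clean)
data.frame(raw, clean, num)
```

Keeping `raw`, `clean`, and `num` side by side makes it easy to confirm that only the intended characters were removed.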
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On September 20, 2016 4:09:02 PM PDT, lily li <chocold12 at gmail.com> wrote:
>> >Yes, I tried to add this statement when reading the dataset.
>> >But when I use summary(df), it shows:
>> >Discharge
>> >Length:
>> >Class  :character
>> >Mode  :character
>> >
>> >
>> >On Tue, Sep 20, 2016 at 5:06 PM, Joe Ceradini <joeceradini at gmail.com> wrote:
>> >
>> >> read.csv("your_data.csv", stringsAsFactors=FALSE)
>> >> (I'm just reiterating what Jianling said...)
>> >>
>> >> Joe
>> >>
>> >> On Tue, Sep 20, 2016 at 4:56 PM, lily li <chocold12 at gmail.com> wrote:
>> >>
>> >>> Is there an argument in read.csv that I can use to avoid converting
>> >>> numeric to factor? Thanks a lot.
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Sep 20, 2016 at 4:42 PM, lily li <chocold12 at gmail.com> wrote:
>> >>>
>> >>> > Thanks. Then what should I do to solve the problem?
>> >>> >
>> >>> > On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>> >>> >
>> >>> >> I suppose you can do what works for your data, but I wouldn't recommend
>> >>> >> na.rm=TRUE because it hides problems rather than clarifying them.
>> >>> >>
>> >>> >> If in fact your data includes true NA values (the letters NA or simply
>> >>> >> nothing between the commas are typical ways this information may be
>> >>> >> indicated), then read.csv will NOT change from integer to factor
>> >>> >> (particularly if you have specified which markers represent NA using the
>> >>> >> na.strings argument documented under read.table)... so you probably DO have
>> >>> >> unexpected garbage still in your data which could be obscuring valuable
>> >>> >> information that could affect your conclusions.
>> >>> >> --
>> >>> >> Sent from my phone. Please excuse my brevity.
>> >>> >>
>> >>> >> On September 20, 2016 3:11:42 PM PDT, lily li <chocold12 at gmail.com> wrote:
>> >>> >> >I reread the data and used 'na.rm = T' when reading it. This time
>> >>> >> >there is no such problem. It seems that the existence of NAs converts
>> >>> >> >the integer to factor. Thanks for your help.
>> >>> >> >
>> >>> >> >
>> >>> >> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan <fanjianling at gmail.com> wrote:
>> >>> >> >
>> >>> >> >> Add "stringsAsFactors = F" when you read the data, and then
>> >>> >> >> convert the column to numeric.
>> >>> >> >>
>> >>> >> >> On 20 September 2016 at 16:00, lily li <chocold12 at gmail.com> wrote:
>> >>> >> >> > Yes, it is stored as factor. I can't find any problem in the
>> >>> >> >> > original data. Rereading the data doesn't help either. I use read.csv
>> >>> >> >> > to read in the data; do you think it is better to use read.table?
>> >>> >> >> > Thanks again.
>> >>> >> >> >
>> >>> >> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538280 at gmail.com> wrote:
>> >>> >> >> >
>> >>> >> >> >> This indicates that your Discharge column has been stored/converted as
>> >>> >> >> >> a factor (run str(df) to verify and check other columns). This
>> >>> >> >> >> usually happens when functions like read.table are left to try to
>> >>> >> >> >> figure out what each column is and it finds something in that column
>> >>> >> >> >> that cannot be converted to a number (possibly an oh instead of a
>> >>> >> >> >> zero, an el instead of a one, or just a letter or punctuation mark
>> >>> >> >> >> accidentally in the file).  You can either find the error in your
>> >>> >> >> >> original data, fix it, and reread the data, or specify that the column
>> >>> >> >> >> should be numeric using the colClasses argument to read.table or other
>> >>> >> >> >> function.
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li <chocold12 at gmail.com> wrote:
>> >>> >> >> >> > Hi R users,
>> >>> >> >> >> >
>> >>> >> >> >> > I have a problem in reading data.
>> >>> >> >> >> > For example, part of my dataframe is like this:
>> >>> >> >> >> >
>> >>> >> >> >> > df
>> >>> >> >> >> > month day year          Discharge
>> >>> >> >> >> >    3        1   2010                6.4
>> >>> >> >> >> >    3        2   2010               7.58
>> >>> >> >> >> >    3        3   2010               6.82
>> >>> >> >> >> >    3        4   2010               8.63
>> >>> >> >> >> >    3        5   2010               8.16
>> >>> >> >> >> >    3        6   2010               7.58
>> >>> >> >> >> >
>> >>> >> >> >> > Then if I type summary(df), why does it convert the discharge data
>> >>> >> >> >> > to levels? I also met the same problem when reading some other csv
>> >>> >> >> >> > files. How can I solve this problem? Thanks.
>> >>> >> >> >> >
>> >>> >> >> >> > Discharge
>> >>> >> >> >> > 7.58     :2
>> >>> >> >> >> > 6.4       :1
>> >>> >> >> >> > 6.82     :1
>> >>> >> >> >> > 8.63     :1
>> >>> >> >> >> > 8.16     :1
>> >>> >> >> >> >
>> >>> >> >> >> >         [[alternative HTML version deleted]]
>> >>> >> >> >> >
>> >>> >> >> >> > ______________________________________________
>> >>> >> >> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>> >> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> >> >> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>> >> >> >> > and provide commented, minimal, self-contained, reproducible code.
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >>
>> >>> >> >> >> --
>> >>> >> >> >> Gregory (Greg) L. Snow Ph.D.
>> >>> >> >> >> 538280 at gmail.com
>> >>> >> >> >>
>> >>> >> >> >
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> --
>> >>> >> >> Jianling Fan
>> >>> >> >> 樊建凌
>> >>> >> >>
>> >>> >> >
>> >>> >>
>> >>> >>
>> >>> >
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Cooperative Fish and Wildlife Research Unit
>> >> Zoology and Physiology Dept.
>> >> University of Wyoming
>> >> JoeCeradini at gmail.com / 914.707.8506
>> >> wyocoopunit.org
>> >>
>> >>
>>
>>


