[R] about data problem
lily li
chocold12 at gmail.com
Wed Sep 21 01:22:41 CEST 2016
Thanks. The former method works; I had confused character with factor.
Also, I should store the converted values in a new column:
dta$DischargeNum <- as.numeric( dta$Discharge )
instead of overwriting the original column:
dta$Discharge <- as.numeric( dta$Discharge )
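
For the archive, here is a minimal sketch of the steps discussed in this thread. It is not the original data: the inline CSV text, the stray letter "O" in one row, and the names csv_text, bad and cleaned are made up for illustration, and the gsub pattern is only one guess at what counts as garbage. Only dta, Discharge and DischargeNum come from the messages themselves.

```r
# Hypothetical stand-in for the real file: one Discharge entry contains the
# letter "O" instead of a zero, which is enough to stop the column from
# being read as numeric.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.82
3,4,2010,8.63
3,5,2010,8.1O
3,6,2010,7.58"

# Read text columns as character rather than factor.
dta <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(dta$Discharge)

# Keep the original text and store the converted values in a new column.
# Entries that cannot be parsed become NA (normally with a warning).
dta$DischargeNum <- suppressWarnings(as.numeric(dta$Discharge))

# LOOK AT the rows whose text failed to convert.
bad <- is.na(dta$DischargeNum) & !is.na(dta$Discharge)
dta[bad, c("month", "day", "year", "Discharge")]

# The factor pitfall: as.numeric() on a factor returns the internal level
# codes, not the values, so convert through as.character() first.
f <- factor(c("6.4", "7.58", "6.82"))
as.numeric(f)                # level codes, not the measurements
as.numeric(as.character(f))  # the actual numbers

# One possible gsub cleanup (an assumption, adjust to the real garbage):
# drop everything except digits, sign, decimal point and exponent marker.
cleaned <- gsub("[^0-9eE.+-]", "", dta$Discharge)
dta$DischargeNum <- as.numeric(cleaned)
```

Because Discharge itself is never overwritten, the offending rows can still be inspected after the conversion. Deleting characters can also change a value silently, so it is worth checking the rows flagged in bad before trusting the cleaned column.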
On Tue, Sep 20, 2016 at 5:18 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote:
> Which means it avoided converting to factor... Success!
>
> Note that the column apparently has garbage characters in one or more of
> the rows, which should be evident when you LOOK AT THE CHARACTERS in the
> column. They should all be numeric symbols, plus or minus, and perhaps
> decimal points. If they are not, then the conversion to numeric will be
> incomplete. See my other message. You have the choice of editing the file
> (may have concerns with traceability), or you can write R code that removes
> the garbage characters using gsub.
> --
> Sent from my phone. Please excuse my brevity.
>
> On September 20, 2016 4:09:02 PM PDT, lily li <chocold12 at gmail.com> wrote:
> >Yes, I tried to add this statement when reading the dataset.
> >But when I use summary(df), it shows:
> >Discharge
> >Length:
> >Class :character
> >Mode :character
> >
> >
> >On Tue, Sep 20, 2016 at 5:06 PM, Joe Ceradini <joeceradini at gmail.com>
> >wrote:
> >
> >> read.csv("your_data.csv", stringsAsFactors=FALSE)
> >> (I'm just reiterating what Jianling said...)
> >>
> >> Joe
> >>
> >> On Tue, Sep 20, 2016 at 4:56 PM, lily li <chocold12 at gmail.com> wrote:
> >>
> >>> Is there a function in read.csv that I can use to avoid converting
> >>> numeric to factor? Thanks a lot.
> >>>
> >>>
> >>>
> >>> On Tue, Sep 20, 2016 at 4:42 PM, lily li <chocold12 at gmail.com> wrote:
> >>>
> >>> > Thanks. Then what should I do to solve the problem?
> >>> >
> >>> > On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> >>> >
> >>> >> I suppose you can do what works for your data, but I wouldn't recommend
> >>> >> na.rm=TRUE because it hides problems rather than clarifying them.
> >>> >>
> >>> >> If in fact your data includes true NA values (the letters NA or simply
> >>> >> nothing between the commas are typical ways this information may be
> >>> >> indicated), then read.csv will NOT change from integer to factor
> >>> >> (particularly if you have specified which markers represent NA using the
> >>> >> na.strings argument documented under read.table)... so you probably DO have
> >>> >> unexpected garbage still in your data which could be obscuring valuable
> >>> >> information that could affect your conclusions.
> >>> >> --
> >>> >> Sent from my phone. Please excuse my brevity.
> >>> >>
> >>> >> On September 20, 2016 3:11:42 PM PDT, lily li <chocold12 at gmail.com> wrote:
> >>> >> >I reread the data and used 'na.rm = T' when reading it. This time there
> >>> >> >was no such problem. It seems that the existence of NAs converts the
> >>> >> >integer to factor. Thanks for your help.
> >>> >> >
> >>> >> >
> >>> >> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan <fanjianling at gmail.com> wrote:
> >>> >> >
> >>> >> >> Add the "stringsAsFactors = F" when you read the data, and then
> >>> >> >> convert them to numeric.
> >>> >> >>
> >>> >> >> On 20 September 2016 at 16:00, lily li <chocold12 at gmail.com> wrote:
> >>> >> >> > Yes, it is stored as factor. I can't find any problem in the
> >>> >> >> > original data. Rereading the data doesn't help either. I use read.csv
> >>> >> >> > to read in the data; do you think it is better to use read.table?
> >>> >> >> > Thanks again.
> >>> >> >> >
> >>> >> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538280 at gmail.com> wrote:
> >>> >> >> >
> >>> >> >> >> This indicates that your Discharge column has been stored/converted as
> >>> >> >> >> a factor (run str(df) to verify and check other columns). This
> >>> >> >> >> usually happens when functions like read.table are left to try to
> >>> >> >> >> figure out what each column is and it finds something in that column
> >>> >> >> >> that cannot be converted to a number (possibly an oh instead of a
> >>> >> >> >> zero, an el instead of a one, or just a letter or punctuation mark
> >>> >> >> >> accidentally in the file). You can either find the error in your
> >>> >> >> >> original data, fix it, and reread the data, or specify that the column
> >>> >> >> >> should be numeric using the colClasses argument to read.table or other
> >>> >> >> >> function.
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li <chocold12 at gmail.com> wrote:
> >>> >> >> >> > Hi R users,
> >>> >> >> >> >
> >>> >> >> >> > I have a problem in reading data.
> >>> >> >> >> > For example, part of my dataframe is like this:
> >>> >> >> >> >
> >>> >> >> >> > df
> >>> >> >> >> > month day year Discharge
> >>> >> >> >> > 3 1 2010 6.4
> >>> >> >> >> > 3 2 2010 7.58
> >>> >> >> >> > 3 3 2010 6.82
> >>> >> >> >> > 3 4 2010 8.63
> >>> >> >> >> > 3 5 2010 8.16
> >>> >> >> >> > 3 6 2010 7.58
> >>> >> >> >> >
> >>> >> >> >> > Then if I type summary(df), why does it convert the Discharge data to
> >>> >> >> >> > levels? I also ran into the same problem when reading some other csv
> >>> >> >> >> > files. How can I solve this problem? Thanks.
> >>> >> >> >> >
> >>> >> >> >> > Discharge
> >>> >> >> >> > 7.58 :2
> >>> >> >> >> > 6.4 :1
> >>> >> >> >> > 6.82 :1
> >>> >> >> >> > 8.63 :1
> >>> >> >> >> > 8.16 :1
> >>> >> >> >> >
> >>> >> >> >> > [[alternative HTML version deleted]]
> >>> >> >> >> >
> >>> >> >> >> > ______________________________________________
> >>> >> >> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> >> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >> >> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> >> >> >> > and provide commented, minimal, self-contained, reproducible code.
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> --
> >>> >> >> >> Gregory (Greg) L. Snow Ph.D.
> >>> >> >> >> 538280 at gmail.com
> >>> >> >> >>
> >>> >> >> >
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >> --
> >>> >> >> Jianling Fan
> >>> >> >> 樊建凌
> >>> >> >>
> >>> >> >
> >>> >>
> >>> >>
> >>> >
> >>>
> >>
> >>
> >>
> >>
> >> --
> >> Cooperative Fish and Wildlife Research Unit
> >> Zoology and Physiology Dept.
> >> University of Wyoming
> >> JoeCeradini at gmail.com / 914.707.8506
> >> wyocoopunit.org
> >>
> >>
>
>