[R] about data problem
Martin Maechler
maechler at stat.math.ethz.ch
Wed Sep 21 13:01:20 CEST 2016
>>>>> Joe Ceradini <joeceradini at gmail.com>
>>>>> on Tue, 20 Sep 2016 17:06:17 -0600 writes:
> read.csv("your_data.csv", stringsAsFactors=FALSE)
> (I'm just reiterating what Jianling said...)
If you do not have very many columns, and want to become more
efficient and knowledgeable,
I strongly recommend using the 'colClasses' argument of read.csv
or read.table instead (the two functions are the same apart from
the defaults for their arguments!) and setting "numeric" for the
numeric columns.
This has a similar effect to the *combination* of
1) stringsAsFactors = FALSE
2) foo <- as.numeric(foo) # for respective columns
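
A minimal sketch of the two routes described above, assuming a small
in-memory CSV (the `csv_text` and `dirty` strings are made up for
illustration and stand in for a real file). It also shows one way the
routes can differ: with colClasses, a value that cannot be parsed as a
number stops the read with an error instead of being converted quietly.

```r
# Hypothetical in-memory CSV standing in for a real file.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.82"

# Route A: declare the column types up front with colClasses.
df_a <- read.csv(text = csv_text,
                 colClasses = c("integer", "integer", "integer", "numeric"))

# Route B: keep strings as character, then convert by hand.
df_b <- read.csv(text = csv_text, stringsAsFactors = FALSE)
df_b$Discharge <- as.numeric(df_b$Discharge)

str(df_a)
all.equal(df_a$Discharge, df_b$Discharge)

# Hypothetical dirty input: the last value ends in a capital letter O.
dirty <- "Discharge
6.4
7.5O"

# With colClasses = "numeric" the read fails; capture the error message.
res <- tryCatch(read.csv(text = dirty, colClasses = "numeric"),
                error = function(e) conditionMessage(e))
print(res)
```

Failing early like this is arguably a feature: the bad entry is
reported at read time rather than surfacing later as a factor or an NA.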
Martin
> Joe
> On Tue, Sep 20, 2016 at 4:56 PM, lily li <chocold12 at gmail.com> wrote:
>> Is there an argument to read.csv that I can use to avoid converting numeric
>> columns to factors? Thanks a lot.
>>
>>
>>
>> On Tue, Sep 20, 2016 at 4:42 PM, lily li <chocold12 at gmail.com> wrote:
>>
>> > Thanks. Then what should I do to solve the problem?
>> >
>> > On Tue, Sep 20, 2016 at 4:30 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>> >
>> >> I suppose you can do what works for your data, but I wouldn't recommend
>> >> na.rm=TRUE because it hides problems rather than clarifying them.
>> >>
>> >> If in fact your data includes true NA values (the letters NA or simply
>> >> nothing between the commas are typical ways this information may be
>> >> indicated), then read.csv will NOT change from integer to factor
>> >> (particularly if you have specified which markers represent NA using the
>> >> na.strings argument documented under read.table)... so you probably DO
>> >> have unexpected garbage still in your data which could be obscuring
>> >> valuable information that could affect your conclusions.
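
The point about NA values can be checked with a short sketch. The data
strings and the marker "missing" below are invented for illustration;
only read.csv and its na.strings argument come from the thread.

```r
# Hypothetical data: one literal NA and one empty field, otherwise integers.
with_na <- "id,count
1,10
2,NA
3,
4,7"
df <- read.csv(text = with_na)
class(df$count)        # stays a numeric type; the missing entries become NA
sum(is.na(df$count))

# A non-standard marker (the made-up string "missing") is ordinary text
# to read.csv unless it is listed in na.strings.
with_marker <- "id,count
1,10
2,missing
3,7"
df_text <- read.csv(text = with_marker)
df_num  <- read.csv(text = with_marker, na.strings = c("NA", "missing"))
class(df_text$count)   # character or factor, depending on the R version
class(df_num$count)
```

So a column that comes back as a factor or character vector suggests
something other than a recognised NA marker is present in it.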
>> >> --
>> >> Sent from my phone. Please excuse my brevity.
>> >>
>> >> On September 20, 2016 3:11:42 PM PDT, lily li <chocold12 at gmail.com> wrote:
>> >> >I reread the data, and used 'na.rm = T' when reading the data. This time
>> >> >it has no such problem. It seems that the existence of NAs converts the
>> >> >integer to factor. Thanks for your help.
>> >> >
>> >> >
>> >> >On Tue, Sep 20, 2016 at 4:09 PM, Jianling Fan <fanjianling at gmail.com> wrote:
>> >> >
>> >> >> Add the "stringsAsFactors = F" when you read the data, and then
>> >> >> convert them to numeric.
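
A small sketch of why it helps to read the column as character before
converting, assuming the column has already ended up as a factor. The
values are taken from the example data in this thread; the names `f`,
`codes` and `values` are only illustrative.

```r
# A factor like the one summary(df) revealed.
f <- factor(c("6.4", "7.58", "6.82", "8.63", "8.16", "7.58"))

# as.numeric() on a factor returns the integer level codes,
# not the measurements.
codes <- as.numeric(f)

# Going through character first recovers the actual numbers.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

With stringsAsFactors = FALSE the column is character to begin with, so
a plain as.numeric() call is enough.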
>> >> >>
>> >> >> On 20 September 2016 at 16:00, lily li <chocold12 at gmail.com> wrote:
>> >> >> > Yes, it is stored as a factor. I can't find any problem in the
>> >> >> > original data. Rereading the data doesn't help either. I use read.csv
>> >> >> > to read in the data; do you think it is better to use read.table?
>> >> >> > Thanks again.
>> >> >> >
>> >> >> > On Tue, Sep 20, 2016 at 3:55 PM, Greg Snow <538280 at gmail.com> wrote:
>> >> >> >
>> >> >> >> This indicates that your Discharge column has been stored/converted
>> >> >> >> as a factor (run str(df) to verify and check other columns). This
>> >> >> >> usually happens when functions like read.table are left to try to
>> >> >> >> figure out what each column is and find something in that column
>> >> >> >> that cannot be converted to a number (possibly an oh instead of a
>> >> >> >> zero, an el instead of a one, or just a letter or punctuation mark
>> >> >> >> accidentally in the file). You can either find the error in your
>> >> >> >> original data, fix it, and reread the data, or specify that the
>> >> >> >> column should be numeric using the colClasses argument to read.table
>> >> >> >> or another function.
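
One possible way to hunt for the offending entries, sketched under the
assumption that the file contains a stray letter in an otherwise
numeric column. The `csv_text` contents, including the bad value
"6.8O", are invented for illustration.

```r
# Hypothetical file contents: the last value ends in a capital letter O,
# not a zero.
csv_text <- "month,day,year,Discharge
3,1,2010,6.4
3,2,2010,7.58
3,3,2010,6.8O"

# Read text columns as character so the raw entries stay visible.
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(df)

# Entries that are present but fail numeric conversion are the suspects.
converted <- suppressWarnings(as.numeric(df$Discharge))
bad <- is.na(converted) & !is.na(df$Discharge)
df[bad, ]
```

Once the listed rows are corrected in the source file, rereading it
should give a numeric column without any further arguments.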
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Sep 20, 2016 at 3:46 PM, lily li <chocold12 at gmail.com> wrote:
>> >> >> >> > Hi R users,
>> >> >> >> >
>> >> >> >> > I have a problem in reading data.
>> >> >> >> > For example, part of my dataframe is like this:
>> >> >> >> >
>> >> >> >> > df
>> >> >> >> > month day year Discharge
>> >> >> >> > 3 1 2010 6.4
>> >> >> >> > 3 2 2010 7.58
>> >> >> >> > 3 3 2010 6.82
>> >> >> >> > 3 4 2010 8.63
>> >> >> >> > 3 5 2010 8.16
>> >> >> >> > 3 6 2010 7.58
>> >> >> >> >
>> >> >> >> > Then if I type summary(df), why does it convert the discharge data
>> >> >> >> > to levels? I also met the same problem when reading some other csv
>> >> >> >> > files. How can I solve this problem? Thanks.
>> >> >> >> >
>> >> >> >> > Discharge
>> >> >> >> > 7.58 :2
>> >> >> >> > 6.4 :1
>> >> >> >> > 6.82 :1
>> >> >> >> > 8.63 :1
>> >> >> >> > 8.16 :1
>> >> >> >> >
>> >> >> >> > [[alternative HTML version deleted]]
>> >> >> >> >
>> >> >> >> > ______________________________________________
>> >> >> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> >> >> > and provide commented, minimal, self-contained, reproducible code.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> Gregory (Greg) L. Snow Ph.D.
>> >> >> >> 538280 at gmail.com
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Jianling Fan
>> >> >> 樊建凌
>> >> >>
>> >> >
>> >>
>> >>
>> >
>>
> --
> Cooperative Fish and Wildlife Research Unit
> Zoology and Physiology Dept.
> University of Wyoming
> JoeCeradini at gmail.com / 914.707.8506
> wyocoopunit.org