[R] problems with function read.table
Petr PIKAL
petr.pikal at precheza.cz
Fri Sep 9 14:29:52 CEST 2011
Hi
> Hello,
>
> in the meanwhile i found the problem for the #. The problem was, that #
and
> “none” are comments by default, so I turned comments of with
comment.char=””
> as stated in the help.
Hm. You are not telling the whole story.
I made a sample excel sheet with blank values and #DIV/0! result (simple
two columns with missing values and division by these missing values in
first column)
> x<-read.delim("clipboard")
> x
r a b
1 1.25 5 4
2 1.166666667 7 6
3 0.142857143 1 7
4 #DIV/0! 5 NA
5 4.5 9 2
6 2 6 3
7 #DIV/0! 4 NA
simple read.delim gave me correctly NA in place of blank values and
#DIV/0! in first column, which therefore became factor.
> y<-read.delim("clipboard", na.strings=c("NA", "#DIV/0!"))
> y
r a b
1 1.2500000 5 4
2 1.1666667 7 6
3 0.1428571 1 7
4 NA 5 NA
5 4.5000000 9 2
6 2.0000000 6 3
7 NA 4 NA
When I defined vector of na.strings I correctly got NA values for both
blank and #DIV/0! in Excel sheet. So I wonder where you get problems with
blank values
>
> For the problem with blanks I still didn’t find the solution, even it
seems
> to be a frequent problem. I got the solution with fill=T, but this fills
my
> row at the end with a NA. I want to have the NA exactly in the field
where
> there is a blank in the .txt file.
It seems to me that you do not use correct delimiter. For Excel it is tab
- "\t". For csv it can be either ";" or "," depending on your locale. You
need to show us some small part of a file, preferably together with your
read.* command and the result you got.
Regards
Petr
>
> Date rtn vwretd ewretd
> sprtrn
> 19700102 0.000686 0.00547 0.033450
> 0.010211
> 19700105 0.009596 0.018947
> 0.004946
> 19700106 #DIV0! -0.007233
> -0.006848
> 19700107 0.000678 -0.001272
> 0.003559 -0.002047
> 19700108 0.002034 0.000564 0.11
> 0.000540
> 19700109 -0.002797 0
> -0.003021
> 19700113 0.017335 0.000737 -0.001090
> 0
>
> Can you provide a solution?
>
> Thanks,
> Samir
>
> p.s. I know that my questions seem obvious to you, I’m sorry for that,
but I
> just started to work with R ;)
>
> -----Ursprüngliche Nachricht-----
> Von: Petr PIKAL [mailto:petr.pikal at precheza.cz]
> Gesendet: Freitag, 9. September 2011 09:23
> An: Carlos Ortega
> Cc: Samir Benzerfa; r-help at r-project.org
> Betreff: Re: [R] problems with function read.table
>
> Hi
>
>
> >
> > Hi,
> >
> > If you read carefully the help pages for read.table you get this:
> >
> >
> > na.stringsa character vector of strings which are to be interpreted as
> > NA<../../utils/help/NA> values.
> > Blank fields are also considered to be missing values in logical,
> integer,
> > numeric and complex fields.
> >
> > So, both NAs and blank fields are considered as NAs directly by
> read.table.
> >
> > Once you have imported your data, you can modify with any of the
string
> > manipulation functions (sub() or gsub()) to change your "#DIV/0!" to
the
> > string "NAs". Another option is to manipulate your Excel file and
> consider
> > the division by cero with a "IF" and get back a NA if that happens.
>
> The only problem is that in such case all columns which has "#DIV/0!"
are
> converted to factors and you need to consider changing it back to
numeric.
>
> read.* functions accept as na.string definition not only one value but
> also vector of values and you can get rid of all non numeric and other
> weird Excel values by defining it as a na.strings in read.table call.
>
> > x <- read.delim("clipboard")
>
> > str(x)
> 'data.frame': 6 obs. of 3 variables:
> $ a: int 1 5 9 8 6 3
> $ b: int 3 5 7 0 NA 6
> $ r: Factor w/ 5 levels "#DIV/0!","0.333333333",..: 2 4 5 1 1 3
>
> > y<-read.delim("clipboard", na.strings=c("NA", "#DIV/0!"))
> > str(y)
> 'data.frame': 6 obs. of 3 variables:
> $ a: int 1 5 9 8 6 3
> $ b: int 3 5 7 0 NA 6
> $ r: num 0.333 1 1.286 NA NA ...
> >
>
> Regards
> Petr
>
>
> >
> > And finally, instead of using na.omits use option na.rm=T to get done
> your
> > calculations:
> >
> > > mean(c(12,23,24,45,67,NA), na.rm=T)[1] 34.2
> >
> >
> >
> > Regards,
> > Carlos Ortega
> > www.qualityexcellence.es
> >
> > On Thu, Sep 8, 2011 at 4:23 PM, Samir Benzerfa <benzerfa at gmx.ch>
wrote:
> >
> > > Hello everyone
> > >
> > >
> > >
> > > I have a couple of questions about the usage of the R function
> > > "read.table(.)". My point of departure is that I want to import a
> matrix
> > > (consisting of time and daily stock returns of many stocks) in R.
Most
> of
> > > the data is numeric, however some values are missing (blanks) and in
> other
> > > cases I have the character "#DIV/0!" (from excel). My goal is to do
> some
> > > regression analysis with this matrix. My questions now are the
> following
> > > ones:
> > >
> > >
> > >
> > > 1. How can I in general tell R to automatically replace some
> specific
> > > numbers or characters in tables by others? (for example to replace
all
> > > characters "#DIV/0!" by the number 0 or simply "NA")
> > >
> > > 2. How can I tell R to fill blanks with a number 0 or "NA"?
> > >
> > > 3. How can I tell R to omit the "NA" fields in the
calculations
> but
> > > not the whole row or column? (I realized that the function "na.omit"
> omits
> > > the whole row)
> > >
> > >
> > >
> > > Many thanks for your help!
> > >
> > >
> > >
> > > Sincerely,
> > >
> > > Samir
> > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list