[R] More difficulties in getting data into R
Liaw, Andy
andy_liaw at merck.com
Tue Jul 6 15:06:19 CEST 2004
This is what I'd try:
## fields 2 and 4 of the '|'-delimited file, minus the 2 header lines
col2and4 <- matrix(scan(pipe("cut -d'|' -f2,4 cmie_firm_data.text"),
                        sep="|", skip=2), ncol=2, byrow=TRUE)
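Note that cut keeps `|' as its output delimiter, so each record comes out as `value|value', not as two whitespace-separated numbers, and scan() has to be told sep="|" to split it. A minimal sketch to see this from a POSIX shell; `extract_2_4' and the sample rows are hypothetical (only the numbers are taken from the thread), and the file is assumed to start with two header lines, as `NR > 2' and `skip=2' suggest:

```shell
# Hypothetical helper: print fields 2 and 4 of a '|'-delimited file,
# dropping the first two (header) lines, like awk's NR > 2 or scan(skip=2).
extract_2_4() {
  cut -d'|' -f2,4 "$1" | tail -n +3
}

# Hypothetical sample file; only the numbers come from the thread.
tmp=$(mktemp)
printf '%s\n' 'header 1' 'header 2' \
  'a|-510.45|x|-510.27' 'b|60700|y|101900' > "$tmp"

extract_2_4 "$tmp"   # prints -510.45|-510.27 and 60700|101900
rm -f "$tmp"
```

With GNU cut, --output-delimiter=' ' gives whitespace-separated output that scan() reads with its defaults.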
Andy
> From: Liaw, Andy
>
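> Could it be that you happen to have `#' in `col4'? By default
> read.table() treats `#' as the start of a comment, so a line that
> begins with `#' ends up blank and is skipped. Try either (or both):
>
> 1. read.table(..., comment.char="")
> 2. scan(...), whose default is comment.char=""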
>
> HTH,
> Andy
>
> > From: Ajay Shah
> >
> > In order to get around the problems of my posting a few minutes
> > ago, I thought:
> >
> > $ awk -F\| '(NR > 2) {print $2}' cmie_firm_data.text > col2
> > $ awk -F\| '(NR > 2) {print $4}' cmie_firm_data.text > col4
> > $ paste col2 col4 | head -2
> > -510.45 -510.27
> > 60700 101900
> > $ paste col2 col4 | tail -2
> > 28648.12 31617.02
> > 491014.77 494308.52
> > $ wc -l col2 col4
> > 89323 col2
> > 89323 col4
> > 178646 total
> >
> > So all is well.
> >
> > But R doesn't like it:
> >
> > $ R --vanilla < picture.R
> >
> > R : Copyright 2004, The R Foundation for Statistical Computing
> > Version 1.9.1 (2004-06-21), ISBN 3-900051-00-3
> >
> > > col2 <- read.table(file="col2")
> > > col4 <- read.table(file="col4")
> > > print(nrow(col2))
> > [1] 89323
> > > print(nrow(col4))
> > [1] 88746
> >
> > Why might I be getting 89,323 and 88,746 obs for two files which
> > `wc' believes are each 89,323 lines long?
> >
> > I checked, and there is no single quote or C-m in either file.
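The `#' explanation can be tested from the shell. A minimal sketch, assuming a POSIX shell and read.table()'s defaults comment.char="#" and blank.lines.skip=TRUE; `count_dropped' is a hypothetical helper that counts the lines which are empty once everything from the first `#' is removed, i.e. the lines read.table() would skip:

```shell
# Hypothetical helper: count lines that read.table() would skip under its
# defaults (comment.char="#", blank.lines.skip=TRUE), i.e. lines that are
# blank once everything from the first '#' onwards is deleted.
count_dropped() {
  sed 's/#.*//' "$1" | grep -c '^[[:space:]]*$'
}

# Hypothetical four-line sample: a plain value, a value with a trailing
# comment, a comment-only line and an empty line.
tmp=$(mktemp)
printf '%s\n' '101900' '494308.52 # note' '# 31617.02' '' > "$tmp"
wc -l < "$tmp"         # 4 lines, as wc counts them
count_dropped "$tmp"   # 2 of them would be skipped by read.table()
rm -f "$tmp"
```

Run on col4, a count of 577 (89,323 - 88,746) would account for the whole gap; `grep -n '#' col4' shows where the `#' characters are.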
> >
> > --
> > Ajay Shah                      Consultant
> > ajayshah at mayin.org           Department of Economic Affairs
> > http://www.mayin.org/ajayshah  Ministry of Finance, New Delhi
> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
More information about the R-help
mailing list