[R] text file imported incorrectly

Weiyang Lim Weiyang.Lim at henderson.com
Thu Sep 4 10:06:22 CEST 2008


Thanks, Prof Ripley.

I had read about the quote argument in the 'R Data Import/Export' manual, but I did not really understand what it meant, so I had not changed it. Now that I have set it, the import works.
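Here is a minimal sketch of the adjusted call. It uses a small made-up file in place of the original data, because the thread does not show the real file; the preamble lines and company rows are illustrative only. The substantive change from the original command is quote = "". The argument name stringsAsFactors is also spelled correctly here:

```r
# Hypothetical stand-in for 'textfile' in the thread: five lines to skip
# (matching skip = 5 in the original call), then tab-delimited data.
textfile <- tempfile(fileext = ".txt")
writeLines(c(paste("preamble line", 1:5),
             "1\tANZ BANKING GROUP\t10",
             "2\tBANK INT'L INDONESIA\t20",
             "3\tBEIJING CAP INT'L AIRP H\t30",
             "4\tBELLE INT'L HLDGS(CN)\t40",
             "5\tBEC WORLD\t50"),
           textfile)

# quote = "" turns off quote processing, so the apostrophe in INT'L is an
# ordinary character rather than the start of a quoted field.
x <- read.table(textfile, sep = "\t", skip = 5, quote = "",
                stringsAsFactors = FALSE)

str(x)  # expect 5 rows and 3 columns (V1, V2, V3)
```

Disabling quoting is appropriate here because the file is tab-delimited and the fields are not wrapped in quotes. For a file that does quote some fields, quote = "\"" (double quotes only) would be the narrower choice.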

Thanks.

Best Regards,
wy

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: 04 September 2008 15:43
To: Weiyang Lim
Cc: r-help at r-project.org
Subject: Re: [R] text file imported incorrectly

Please do read the help page (as you were asked to do before posting).
See the 'quote' argument.

This is also covered in the 'R Data Import/Export Manual'.
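The effect of the 'quote' argument can be checked with count.fields(), which takes the same quote argument as read.table(). The sketch below uses a made-up six-line file, and the names tmp, n_default and n_noquote are illustrative. With the default quote = "\"'", an apostrophe inside a name opens a quoted field that runs on into the following lines. With quote = "", every line reports the same count:

```r
# Hypothetical sample file: a header line plus five tab-delimited rows,
# two of which contain an apostrophe inside the company name.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tcompany\tvalue",
             "1\tANZ BANKING GROUP\t10",
             "2\tBANK INT'L INDONESIA\t20",
             "3\tBABCOCK & BROWN\t30",
             "4\tBELLE INT'L HLDGS(CN)\t40",
             "5\tBEC WORLD\t50"),
           tmp)

# Default quote = "\"'": the apostrophe in INT'L opens a quoted field that
# runs across line breaks until the next apostrophe, so the lines caught
# inside it do not report 3 fields (they may be reported as NA).
n_default <- count.fields(tmp, sep = "\t")

# quote = "" disables quoting, so every line should report 3 fields.
n_noquote <- count.fields(tmp, sep = "\t", quote = "")

print(n_default)
print(n_noquote)
```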

On Thu, 4 Sep 2008, Weiyang Lim wrote:

> Dear R-users,
>
> When I tried to import a text file (tab delimited) which has 2000+ rows with the following command (With the importData in S, it works though),
>
> x <- read.table(textfile, sep = "\t", skip = 5, stringsAsFactors = FALSE)
>
> I received the following warning message: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,: number of items read is not a multiple of the number of columns. I checked the resulting data frame and found only about 1500 observations rather than 2000+ observations.
>
> Then I ran count.fields(textfile, sep = "\t"), which showed that each row has either 4 or 294 fields (there are 294 variables altogether). When I inspected the rows that count.fields reported as having only 4 fields, I realized that the problem is most likely caused by one variable, the company name.
>
> The "problematic" rows have names such as:
> BANK INT'L INDONESIA
> BEIJING CAP INT'L AIRP H
> BELLE INT'L HLDGS(CN)
>
> The rows that are read correctly have names like
>
> ANZ BANKING GROUP
> BABCOCK & BROWN
> BEC WORLD
>
> I believe the apostrophe (') is causing this variable to be read incorrectly in these rows. How can I read this field so that names such as
>
> BANK INT'L INDONESIA
> BEIJING CAP INT'L AIRP H
> BELLE INT'L HLDGS(CN)
>
> are each interpreted as a single field, and all my rows are read with the correct 294 fields? What is the correct command to use?
>
> I hope my explanation of the problem is clear.
>
> I would appreciate your kind assistance!
>
> Best Regards,
> wy


--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
