[R] How to import ENSEMBL text data using R
Charles C. Berry
cberry at tajo.ucsd.edu
Tue Jan 1 00:58:58 CET 2008
On Mon, 31 Dec 2007, mohamed nur anisah wrote:
> Dear all,
> I have a data which is in text file and i would like to import the data
> to R. From the manual, i've found the read.table command function is
> the most appropriate but when i wrote the command an error had occur.
> It say 'Error in read.table"C:/Users/user/Documents/cfa-1.txt", header
> = T, sep = "\t",skip=10) :more columns than column names'. Please help
> me with this as i'm a first time user to R.
First, did you read the manual

  R Data Import/Export
    2 Spreadsheet-like data

especially section 2.1, Variations on read.table?

If not, go there and study up - there are many useful hints.
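
To make the error message concrete, here is a minimal sketch. The three
lines in 'txt' are a hypothetical stand-in for the problem file (its real
contents are not shown in this thread): a header with fewer fields than a
data line beneath it. Dropping header = TRUE and adding fill = TRUE is only
one way to get such data into R for inspection, not a proper fix.

```r
# Hypothetical stand-in for the problem file: the header line has fewer
# fields than one of the data lines that follow it.
txt <- c("id\tname", "1\tA\tx\ty", "2\tB")

# With header = TRUE this fails in the same way as in the original post.
msg <- tryCatch(
  {
    read.table(text = txt, header = TRUE, sep = "\t")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Reading without a header and padding short rows with fill = TRUE at
# least gets every line into a data frame for a first look.
raw <- read.table(text = txt, header = FALSE, sep = "\t", fill = TRUE,
                  stringsAsFactors = FALSE)
print(dim(raw))
```

Note that fill = TRUE pads short rows silently, so the result still needs
checking against the raw file.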
Looking at your file I see complications. Consider these questions:

  What is the field separator?

    It looks like '\t', but ...

  Do you have the same number of field separators (and fields)
  in every row?

    Apparently not, and there seem to be unusual variations on
    the record structure - like tabs missing where I would have
    expected to see them (starting at the line with CSO 8.4, following
    the first ']') and text fields in some but not all records
    - and the use of square brackets to enclose some fields and
    reverse square brackets for others is new to me!

  Did some gremlin edit this file in WORD or EXCEL or otherwise
  corrupt it?

    If so, all bets are off. Tell whoever did this to you to
    go memorize

    http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

    and make him/her/them promise never to do it again.

If you do not have the same number of field separators in each row,
you will quickly run aground without resorting to more programmerish
tricks.
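
One way to check whether every row has the same number of fields is
count.fields(). The sketch below runs on a small hypothetical sample
written to a temporary file; for the real file you would pass its path
(and probably skip = 10, as in the original read.table call).

```r
# Hypothetical sample standing in for the real file, written to a temp
# file so that count.fields() is given a path as it would be in practice.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5", "6\t7\t8\t9"), path)

# Number of tab-separated fields on each line. quote = "" and
# comment.char = "" stop quote characters and '#' from being treated
# specially.
n <- count.fields(path, sep = "\t", quote = "", comment.char = "")
print(table(n))

# Line numbers whose field count differs from the most common count.
usual <- as.integer(names(which.max(table(n))))
odd <- which(n != usual)
print(odd)
```

If table(n) shows more than one value, read.table() cannot be expected to
work without extra care, and the lines listed in 'odd' are the ones to
look at first.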
If it were me, I'd swallow all of the data ( or a few thousand lines
for exploratory purposes ) with readLines() and then use string
processing and regular expression trickery to decipher the
records. But if you are not skilled in that art it may take you a
while to catch on.
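
A rough sketch of that readLines() approach follows. The two records are
invented for illustration (the real layout is not shown in this thread),
and the regular expression assumes a field wrapped in ordinary square
brackets, so treat it as a starting point only.

```r
# Hypothetical records; the second one has a tab missing before '['.
path <- tempfile(fileext = ".txt")
writeLines(c("CSO 8.4\t[alpha]\t12", "CSO 8.5 [beta]\t7"), path)

# Read the raw lines; readLines(path, n = 2000) would read only the
# first 2000 lines for exploratory work.
lines <- readLines(path)

# Pull out the text inside the first pair of square brackets on each
# line; lines without a match are left as NA.
m <- regexpr("\\[[^]]*\\]", lines)
bracketed <- rep(NA_character_, length(lines))
bracketed[m > 0] <- gsub("^\\[|\\]$", "", regmatches(lines, m))
print(bracketed)

# Split each line on tabs. The result is a list with one character
# vector per record, so ragged records are not a problem.
fields <- strsplit(lines, "\t", fixed = TRUE)
print(lengths(fields))
```

Working on a list of fields per record lets you decide, record by record,
how to line the pieces up before building a data frame.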
Also, I might see if there is another output format available (XML?)
that R might parse, and/or I'd see if there is an annotation or
package on BioConductor that can give what is needed (consider posting
to that list, but state the problem you want to solve broadly rather
than just posting the same troublesome line of code as here).
HTH,
Chuck
p.s. Did you read the Posting Guide (as requested)? There have been
lots of read.table questions posted to this list and plenty of
guidance on getting past read.table hiccups.
>
> Thanks in advance.
>
> Cheers,
> Anisah
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901