[R] Re: Read.table problems
Leandro Marino
leandro at cesgranrio.org.br
Mon May 18 19:16:37 CEST 2009
I had this same problem with a file. In my case there was an apostrophe (') inside a name, like Ricardo D'avilla: the ' was treated as the start of a quoted string, so every separator up to the end of the file was skipped.
Maybe that is your problem too.
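A minimal sketch of one workaround, assuming the culprit is an apostrophe inside a field as described above: 'quote = ""' disables quote processing in read.table(), so ' is read as an ordinary character. The sample data and the names 'txt' and 'people' are hypothetical.

```r
# Hypothetical comma-separated data; one name contains an apostrophe.
txt <- "id,name,score
1,Ricardo D'avilla,10
2,Maria Silva,12"

# With the default quote = "\"'", the ' in D'avilla can open a quoted
# field that runs past the following separators. quote = "" turns
# quoting off entirely.
people <- read.table(text = txt, header = TRUE, sep = ",", quote = "")
print(people)
```

read.csv() defaults to 'quote = "\""', so a lone apostrophe does not open a quoted field there either. The trade-off of 'quote = ""' is that genuinely quoted fields containing the separator are no longer handled.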
Best regards,
Leandro Lins Marino
Centro de Avaliação
Fundação CESGRANRIO
Rua Santa Alexandrina, 1011 - 2º andar
Rio de Janeiro, RJ - CEP: 20261-903
Tel.: (21) 2103-9600, ext. 236
(21) 8777-7907
leandro at cesgranrio.org.br
"He who bears the weight of society
is precisely the one who obtains
the smallest advantages." (SMITH, Adam)
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Marc Schwartz
Sent: Monday, May 18, 2009 13:58
To: Steve Murray
Cc: r-help at r-project.org
Subject: Re: [R] Read.table problems
On May 18, 2009, at 11:24 AM, Steve Murray wrote:
>
> Dear all,
>
> I have a file which I've converted from NetCDF (.nc) to text (.txt)
> using ncdump in Unix (as I had problems using the ncdf package to do
> this). The first few rows (as copied and pasted from the Unix
> console) of the file appear as follows:
>
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> _, _,
>
>
> As you can see, there are a lot of NA values before the actual
> numeric values start further down the dataset. My problem is that
> I'm having trouble reading this file into R. I think the problem
> lies with the sep= argument, although I may be wrong. I tried the
> following command at first, as the data appear to be comma separated:
>
>> read.table("test86.txt", skip=43, na.strings="-", header=FALSE,
>> sep=",") -> test86
>> # skip=43 due to meta-data information being held in the initial rows
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> line 29 did not have 25 elements
>
> I then tried sep=" ", followed by sep="", but received a similar
> error message (although line 29 doesn't appear to be especially
> different from the rest).
>
> I subsequently tried sep="\t" and then sep="\n". These both result
> in the data being read in without an error message being displayed,
> although the data are formatted as follows:
>
>> head(test86)
> V1
> 1 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> 2 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> 3 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> 4 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> 5 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
> 6 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
>
>
>> dim(test86)
> [1] 179899 1
>
>
> Instead of one column, I'd expect there to be 720.
>
>
> I think I'm getting something wrong relating to the sep= argument
> (or possibly misusing na.strings?). If anyone has any solutions to
> this then I'd be very grateful to hear them.
>
> Many thanks for any advice,
>
> Steve
Two problems:
1. Your first line above has one more column/entry than the subsequent
lines. If that is correct, you need to use the 'fill = TRUE' argument
so that shorter rows are padded with blank fields to the same number
of columns. If it is due to a copy/paste error, then disregard this.
2. You are using a '-' (hyphen) as your 'na.strings' character, whereas
the data use a '_' (underscore).
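To see which lines have a different number of fields (the cause of errors such as 'line 29 did not have 25 elements'), count.fields() reports the field count of each line using the same 'sep' rules. A sketch on a made-up three-line sample; with the real file one would pass the file name and 'skip = 43' instead of the text connection. The names 'lines', 'n' and 'bad' are hypothetical.

```r
# Made-up sample: the first line has one more field than the others,
# like the first row of the data quoted above.
lines <- c("_, _, _, _,",
           "_, _, _,",
           "_, _, _,")

con <- textConnection(lines)
n <- count.fields(con, sep = ",")  # number of fields on each line
close(con)
print(n)

# Lines whose field count differs from that of the first line.
bad <- which(n != n[1])
print(bad)
```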
Additionally, I would use 'strip.white = TRUE' to remove the extraneous
white space around your fields. With sep = ",", each field after the
first begins with a space (" _"), which will not match na.strings = "_"
unless that space is stripped.
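A small sketch of the effect of 'strip.white', on a hypothetical two-line sample in the same "_, _," layout (the names 'txt', 'raw' and 'clean' are made up). Only the 'strip.white' argument differs between the two calls.

```r
# Hypothetical sample in the same layout as the quoted data; the first
# line has one extra field, so fill = TRUE is needed.
txt <- "_, _, _,
_, _,"

# Without strip.white, fields such as " _" keep their leading space
# and so are not matched by na.strings = "_".
raw <- read.table(text = txt, header = FALSE, sep = ",",
                  na.strings = "_", fill = TRUE)

# With strip.white = TRUE the spaces are removed and every "_" is NA.
clean <- read.table(text = txt, header = FALSE, sep = ",",
                    na.strings = "_", fill = TRUE, strip.white = TRUE)

print(raw)
print(clean)
```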
Thus (on OSX) with the above data copied to the clipboard:
> read.table(pipe("pbpaste"), na.strings = "_", sep = ",", fill = TRUE, strip.white = TRUE)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
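pipe("pbpaste") reads the OS X clipboard. A platform-independent sketch of the same arguments applied to a file, combined with the 'skip = 43' from the original question; the temporary file, its 43 placeholder metadata lines and the two data rows are invented for illustration.

```r
# Write a hypothetical file: 43 metadata lines followed by two data rows
# in the same layout ("_" marks a missing value).
path <- tempfile(fileext = ".txt")
writeLines(c(paste("metadata line", 1:43),
             "_, _, _, 1.5,",
             "_, 2.25, _,"),
           path)

# Skip the metadata, treat "_" as NA, pad short rows, and trim spaces.
test86 <- read.table(path, skip = 43, header = FALSE, sep = ",",
                     na.strings = "_", fill = TRUE, strip.white = TRUE)
print(test86)

unlink(path)  # remove the temporary file
```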
HTH,
Marc Schwartz
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.