[R] suggestions regarding reading in a messy file

Juliet Hannah juliet.hannah at gmail.com
Wed Jul 13 16:12:17 CEST 2011


Thanks David. count.fields revealed the problem, and pointed me in a
direction to understand some basics that I had missed.
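
For anyone finding this thread later, here is a minimal sketch of why the
two checks can disagree. The data frame, its column names, and the temporary
file are made up for illustration (they are not the survey data). The point
is only that splitting on the delimiter ignores quote and comment
characters, whereas count.fields applies the same default rules as
read.table.

```r
# Hypothetical stand-in for the survey data: free-text answers that
# happen to contain apostrophes and a '#'.
df <- data.frame(
  id      = c("1", "2", "3"),
  comment = c("fine", "don't know", "see item #4"),
  other   = c("good", "can't say", "ok"),
  stringsAsFactors = FALSE
)

# Write it the way the original export was done: no quoting of fields.
path <- tempfile(fileext = ".txt")
write.table(df, path, sep = ",", quote = FALSE, row.names = FALSE)

# Check 1: split each line on the delimiter only.
# Quote and comment characters are ignored, so every line looks fine.
by_split <- sapply(strsplit(readLines(path), ","), length)

# Check 2: count.fields() honours quote = "\"'" and comment.char = "#"
# by default, just as read.table() does.
by_count <- count.fields(path, sep = ",")

# Lines where the two checks disagree are the ones worth inspecting.
suspect <- which(is.na(by_count) | by_count != by_split)
print(suspect)
print(readLines(path)[suspect])
```

The line numbers in suspect count the header as line 1, so they can be
compared directly with the line number reported in the scan error.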

Either writing the original file with quote=TRUE or reading it in with
quote="" fixed the problem.
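
A minimal sketch of both fixes, again with made-up data and temporary files
rather than the real survey file. It assumes the trouble comes from
apostrophes in free-text fields; other stray characters (for example "#")
might need comment.char adjusted as well.

```r
# Hypothetical data with apostrophes in two fields of the same row.
df <- data.frame(
  id      = c("1", "2"),
  comment = c("fine", "don't know"),
  other   = c("good", "can't say"),
  stringsAsFactors = FALSE
)

# Fix 1: keep the unquoted file, but switch quote handling off on input.
unquoted <- tempfile(fileext = ".txt")
write.table(df, unquoted, sep = ",", quote = FALSE, row.names = FALSE)
fix1 <- read.table(unquoted, header = TRUE, sep = ",", quote = "",
                   colClasses = "character")

# Fix 2: write with quote = TRUE so each character field is wrapped in
# double quotes; the default quote setting then reads it back intact.
quoted <- tempfile(fileext = ".txt")
write.table(df, quoted, sep = ",", quote = TRUE, row.names = FALSE)
fix2 <- read.table(quoted, header = TRUE, sep = ",",
                   colClasses = "character")

# Both routes should return the text fields unchanged.
print(fix1)
print(fix2)
```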


On Tue, Jul 12, 2011 at 4:48 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Jul 12, 2011, at 4:37 PM, Juliet Hannah wrote:
>
>> I have a file in stata format, which I have read in, and I am trying
>> to create a text file. I have exported the data using various
>> delimiters, but I'm unable to read it back in. I originally read in
>> the file with:
>>
>> library(foreign)
>> myData <- read.dta("mydata.dta")
>>
>> I then exported it with write.table, trying comma, tab, and
>> exclamation mark as the delimiter.
>>
>> When I was unable to read it in, I used readLines to check the number
>> of fields in each row. For example, when using a comma, I checked the
>> number of entries in each line using:
>>
>> con <- file("
>> while ( length(oneLine <- readLines(con, 1)) ) {
>>  lineLength <- length(strsplit(oneLine,",")[[1]])
>>  cat(lineLength,"\n")
>>  }
>> close(con)
>>
>> This prints out 57 for each line.
>
> But does not test for unmatched quotes, extraneous "#",  and such.
>
> Try instead:
>
> count.fields("myfile.txt", sep=",")
>
>>
>> But then I try:
>>
>> cc <- rep("character",57)
>> myData <- read.table("myfile.txt",header=TRUE,sep=",",colClasses=cc)
>>
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>   line 10 did not have 57 elements
>>
>> I'm unable to post a sample of the data, so I'm just looking for
>> suggestions. The data is messy, meaning some of the fields contain
>> free-text comments as the survey response. Still, I was able to work
>> with it as long as I read it in from the Stata file.
>>
>> I was trying to avoid using the 'fill' option because that has given
>> me problems before.
>>
>> Thanks for your help.
>>
>> Juliet
>>
>>> sessionInfo()
>>
>> R version 2.13.0 (2011-04-13)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>>     LC_CTYPE=English_United States.1252
>>     LC_MONETARY=English_United States.1252
>>     LC_NUMERIC=C
>>     LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] foreign_0.8-43
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
>


