[R] Problem reading mixed CSV file
Ashish Agarwal
ashish.agarwala at gmail.com
Tue Mar 20 16:43:37 CET 2012
The file is 20 MB and has 2 million rows.
I understand that I have two different formats: 6 columns and 7 columns.
How do I read the chunks into different files by using scan() with modified
skip and nlines parameters?
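A minimal sketch of the approach Petr describes below: count the fields, then read each block of same-width lines with scan() using skip and nlines. It assumes a comma-separated file with no header, no blank lines and no quoted fields. read_runs() is a hypothetical helper name, not part of base R. Because every scan() call re-reads the file from the top to honour skip, this is only cheap when the 6- and 7-field lines come in a few long blocks.

```r
# read_runs() is a hypothetical helper. Assumes: comma-separated, no header,
# no blank lines, no quoted fields.
read_runs <- function(fileName) {
  # Raw field count per line; quote = "" and comment.char = "" stop
  # apostrophes or '#' in the data from changing the count.
  nFields <- count.fields(fileName, sep = ",", quote = "", comment.char = "")
  # Consecutive lines with the same field count form one run (block).
  runs   <- rle(nFields)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  lapply(seq_along(starts), function(k) {
    # Skip every line before this block, then read exactly its lines.
    cols <- scan(fileName, what = as.list(rep("", runs$values[k])),
                 sep = ",", quote = "", skip = starts[k] - 1,
                 nlines = runs$lengths[k], quiet = TRUE)
    names(cols) <- paste0("V", seq_along(cols))
    as.data.frame(cols, stringsAsFactors = FALSE)
  })
}
```

Each data frame in the returned list can then be written to its own file with write.table(..., sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE). If the two formats alternate line by line, reading the file once with readLines() and splitting on the field count avoids the repeated skipping.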
On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
>
> I would follow Jim's suggestion,
> nFields <- count.fields(fileName, sep = ',')
> count the fields and read the chunks into different files by using scan with
> modified skip and nlines parameters. However, if there are only a few lines
> that differ, it would be better to correct those few lines manually in
> a suitable editor.
>
> Writing an all-purpose function for reading any kind of
> corrupted/nonstandard file seems worthwhile to me only if you expect to read
> such files many times.
>
> Regards
> Petr
>
>
>>
>>
>>
>> On Sat, Mar 17, 2012 at 4:54 AM, jim holtman <jholtman at gmail.com> wrote:
>> > Here is a solution that looks for the lines with 7 fields and inserts
>> > the quotes:
>> >
>> >
>> >> fileName <- '/temp/text.txt'
>> >> input <- readLines(fileName)
>> >> # count the fields to find 7
>> >> nFields <- count.fields(fileName, sep = ',')
>> >> # now fix the data
>> >> for (i in which(nFields == 7)){
>> > + # split on comma
>> > + z <- strsplit(input[i], ',')[[1]]
>> > + input[i] <- paste(z[1], z[2]
>> > + , paste('"', z[3], ',', z[4], '"', sep = '') # wrap fields 3 and 4 in quotes
>> > + , z[5], z[6], z[7], sep = ','
>> > + )
>> > + }
>> >>
>> >> # now read in the data
>> >> result <- read.table(textConnection(input), sep = ',')
>> >>
>> >> result
>> > V1 V2 V3 V4 V5 V6
>> > 1 1968 21 0
>> > 2 Boston 1968 13 0
>> > 3 Boston 1968 18 0
>> > 4 Chicago 1967 44 0
>> > 5 Providence 1968 17 0
>> > 6 Providence 1969 48 0
>> > 7 Binky 1968 24 0
>> > 8 Chicago 1968 23 0
>> > 9 Dally 1968 7 0
>> > 10 Raleigh, North Carol 1968 25 0
>> > 11 Addy ABC-Dogs Stars-W8.1 Providence 1968 38 0
>> > 12 DEF_REQPRF/ Dartmouth 1967 31 1
>> > 13 PL 1967 38 1
>> > 14 XY PopatLal 1967 5 1
>> > 15 XY PopatLal 1967 6 8
>> > 16 XY PopatLal 1967 7 7
>> > 17 XY PopatLal 1967 9 1
>> > 18 XY PopatLal 1967 10 1
>> > 19 XY PopatLal 1967 13 1
>> > 20 XY PopatLal Boston 1967 6 1
>> > 21 XY PopatLal Boston 1967 7 11
>> > 22 XY PopatLal Boston 1967 9 2
>> > 23 XY PopatLal Boston 1967 10 3
>> > 24 XY PopatLal Boston 1967 7 2
>> >>
>> >
>> >
>> > On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
>> > <ashish.agarwala at gmail.com> wrote:
>> >> I have a file with 5000 records, and editing that file is not easy.
>> >> Is there any way to read line 10 differently to account for changes in
>> >> the third field?
>> >>
>> >> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
>> >>> On 2012-03-16 10:48, Ashish Agarwal wrote:
>> >>>>
>> >>>> Line 10 has a city and state, and they are separated by a comma. How
>> >>>> can I read line 10 differently from the other lines?
>> >>>
>> >>>
>> >>> Edit the file and put quotes around the city-state combination:
>> >>> "Raleigh, North Carol"
>> >>>
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>> >
>> > --
>> > Jim Holtman
>> > Data Munger Guru
>> >
>> > What is the problem that you are trying to solve?
>> > Tell me what you want to do, not how you want to do it.
>>
>
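For a 2-million-row file, Jim's loop above makes one strsplit()/paste() call per 7-field line. The sketch below swaps that loop for a single vectorised regular-expression substitution with sub(); it applies the same fix but is a different technique from Jim's. It assumes, as in the quoted data, that every 7-field line is a 6-field record whose third field ("City, State") contains exactly one unquoted comma, and that the data contain no double quotes. quote_city_state() is a hypothetical helper name.

```r
# quote_city_state() is a hypothetical helper. Assumes every 7-field line is
# a 6-field record whose 3rd field ("City, State") has one unquoted comma.
quote_city_state <- function(input) {
  # Fields = commas + 1; counting commas directly also counts empty
  # trailing fields, which strsplit() would drop.
  nFields <- nchar(gsub("[^,]", "", input)) + 1L
  seven <- nFields == 7L
  # Keep fields 1-2, wrap fields 3-4 (and the comma between them) in
  # double quotes, and keep the remaining fields unchanged.
  input[seven] <- sub("^([^,]*,[^,]*,)([^,]*,[^,]*)(,.*)$",
                      "\\1\"\\2\"\\3", input[seven])
  input
}

# Example usage on the file from the thread (path from Jim's example):
# input  <- readLines('/temp/text.txt')
# result <- read.table(text = quote_city_state(input), sep = ",",
#                      quote = "\"", comment.char = "")
```

Restricting read.table() to quote = "\"" means an apostrophe inside a name is read as ordinary text rather than as the start of a quoted field.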