[R] Read file
David Winsemius
dwinsemius at comcast.net
Mon Oct 4 04:05:41 CEST 2010
On Oct 3, 2010, at 9:40 PM, Nilza BARROS wrote:
> Hi, Michael
> Thank you for your help. I have already done what you said.
> But I am still having problems dealing with my data.
>
> I need to split the data by station.
>
> I was able to identify where the station information starts using:
>
> my.data<-file("d2010100100.txt",open="rt")
> indata <- readLines(my.data, n=20000)
> i<-grep("^[837]",indata) #station number
That would give you the line numbers for any line that had an 8, _or_
a 3, _or_ a 7 as its first character. Was that your intent? My guess is
that you did not really want to use the square brackets and should have
been using "^837".
?regex # Paragraph starting "A character class .... "
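A quick way to see the difference is to try both patterns on a few lines copied from the sample data. This is a minimal sketch; the short vector `smp` is just test input.

```r
# A few lines copied from the posted sample, used only as test input
smp <- c("2010 10 01 00",
         "83746 -43.25 -22.81 6 51",
         "700.0 3156 290 11.3 280.1 280.1 32",
         "300.0 9670 265 28.8 240.2 218.2 32")

# "^[837]" is a character class: the first character is 8, 3 or 7
cls <- grep("^[837]", smp)
cls  # [1] 2 3 4

# "^837" is a literal prefix: the line starts with the characters 8, 3, 7
lit <- grep("^837", smp)
lit  # [1] 2
```

Note that "^837" would also miss stations such as 82599 and 83649 that appear in the data further down, so neither pattern on its own picks out every station line.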
> my.data2<-read.table("d2010100100.txt",fill=TRUE,nrows=20000)
> stn<- my.data2$V1[i]
That would give you the first column values for the lines you earlier
selected.
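Because `i` indexes the lines returned by readLines(), it may be simpler to take the station number from those same lines instead of from a second read.table() pass, so the two sets of row numbers cannot drift apart. A minimal sketch, assuming that a station header is any line whose first field is a five-digit number (an assumption, not something stated in the thread):

```r
indata <- c("2010 10 01 00",
            "83649 -40.28 -20.26 4 7",
            "1011.0 -9999 0 0.0 298.4 296.1 64",
            "2010 10 01 00",
            "83746 -43.25 -22.81 6 51",
            "1012.0 -9999 320 1.5 299.1 294.4 64")

# Assumption: station header = first field is exactly five digits
i <- grep("^\\s*[0-9]{5}\\s", indata)

# Split the matched lines on whitespace and keep the first field of each
fields <- strsplit(trimws(indata[i]), "\\s+")
stn <- sapply(fields, `[`, 1)
stn  # [1] "83649" "83746"
```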
> ====
This does not look like what I would expect as a value for stn. Is
that what you wanted us to think this was?
--
David.
> 2010 10 01 00
> *82599 -35.25 -5.91 52 1
> * 1008.0 -9999 115 3.1 298.6 294.6 64
> 2010 10 01 00
> *83649 -40.28 -20.26 4 7*
> 1011.0 -9999 0 0.0 298.4 296.1 64
> 1000.0 96 40 5.7 297.9 295.1 32
> 925.0 782 325 3.1 295.4 294.1 32
> 850.0 1520 270 4.1 293.8 289.4 32
> 700.0 3171 240 8.7 284.1 279.1 32
> 500.0 5890 275 8.2 266.2 262.9 32
> 400.0 7600 335 9.8 255.4 242.4 32
> ===========
> As you can see in the data above, the station line shows the number of levels (or
> lines) for each station.
> I need to catch these lines so as to be able to feed my database.
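One way to catch those blocks without relying on the level count is to number the blocks by their date lines and split on that number. This is a sketch under two assumptions: every block starts with a "YYYY MM DD HH" date line followed by the station line, and the asterisks in the posted sample are e-mail formatting rather than part of the file (if they are real, remove them first with gsub("\\*", "", indata)). The helper name parse.block is hypothetical.

```r
indata <- c("2010 10 01 00",
            "82599 -35.25 -5.91 52 1",
            "1008.0 -9999 115 3.1 298.6 294.6 64",
            "2010 10 01 00",
            "83649 -40.28 -20.26 4 7",
            "1011.0 -9999 0 0.0 298.4 296.1 64",
            "1000.0 96 40 5.7 297.9 295.1 32")

# Assumption: each block begins with a "YYYY MM DD HH" date line
is.date <- grepl("^\\s*[0-9]{4} [0-9]{2} [0-9]{2} [0-9]{2}\\s*$", indata)
block <- cumsum(is.date)          # 1,1,1,2,2,2,2
blocks <- split(indata, block)

# Hypothetical helper: line 1 = date, line 2 = station header,
# remaining lines = one row per level (assumes at least one level line)
parse.block <- function(b) {
  date <- gsub("\\s+", "", b[1])                 # "2010100100"
  hdr  <- strsplit(trimws(b[2]), "\\s+")[[1]]    # station header fields
  lev  <- read.table(text = b[-(1:2)])           # columns V1..V7
  data.frame(date = date, station = hdr[1], lev, stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(blocks, parse.block))
result
```

Each row of `result` then holds one level together with its date and station, which is the shape needed to build one INSERT statement per row.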
> By the way, I didn't understand the regular expression you used.
> I tried to run it but it did not work.
>
> Hope you can help me!
> Best Regards,
> Nilza
>
> On Sun, Oct 3, 2010 at 2:18 AM, Michael Bedward <michael.bedward at gmail.com> wrote:
>
>> Hello Nilza,
>>
>> If your file is small you can read it into a character vector like this:
>>
>> indata <- readLines("foo.dat")
>>
>> If your file is very big you can read it in batches like this...
>>
>> MAXRECS <- 1000 # for example
>> fcon <- file("foo.dat", open="r")
>> indata <- readLines(fcon, n=MAXRECS)
>>
>> The number of lines read will be given by length(indata).
>>
>> You can check to see if the end of the file has been read yet with:
>> isIncomplete( fcon )
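For the batched approach, a loop that stops when readLines() returns zero lines is a simple end-of-file test that does not depend on isIncomplete(). A minimal sketch; the temporary file exists only so that the example runs on its own.

```r
# Throw-away input file so the example is self-contained
tmp <- tempfile(fileext = ".dat")
writeLines(sprintf("line %d", 1:2500), tmp)

MAXRECS <- 1000
fcon <- file(tmp, open = "r")
nread <- 0
repeat {
  indata <- readLines(fcon, n = MAXRECS)   # continues where the last call stopped
  if (length(indata) == 0) break           # nothing left: end of file
  nread <- nread + length(indata)
  # ... process this batch of lines here ...
}
close(fcon)
unlink(tmp)
nread  # [1] 2500
```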
>>
>> If a leading "*" character is a flag for the start of a station data
>> block you can find this in the indata vector with grepl...
>>
>> start.pos <- which(grepl("^\\s*\\*", indata))
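Here is what the corrected call returns on lines copied from the sample at the bottom of this thread, with the asterisks kept as they appear in the archive. The pattern reads: optional leading whitespace, then a literal "*".

```r
indata <- c("*2010 10 01 00",
            "83746 -43.25 -22.81 6 51*",
            "1012.0 -9999 320 1.5 299.1 294.4 64",
            "*2010 10 01 00",
            "83768 -51.13 -23.33 569 41",
            "* 1000.0 79 -9999 -9999.0 -9999.0 -9999.0 32")

# TRUE where a line starts with optional whitespace followed by "*"
flag <- grepl("^\\s*\\*", indata)
start.pos <- which(flag)
start.pos  # [1] 1 4 6
```

On this sample the "*" also appears in front of a data line (position 6), so it does not reliably mark the start of a station block.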
>>
>> When you're finished reading the file...
>> close(fcon)
>>
>> Hope this helps,
>>
>> Michael
>>
>>
>> On 3 October 2010 13:31, Nilza BARROS <nilzabarros at gmail.com> wrote:
>>> Dear R-users,
>>>
>>> I would like to know how I could read a file whose lines have different lengths.
>>> I need to read this file and create an output to feed my database.
>>> So after reading it, I'll need to create output like this:
>>>
>>> "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20100910,837460,39,390)"
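Once the values are parsed, sprintf() is vectorised and can build one statement per row. A sketch only: the data frame `obs` is hypothetical, its numbers are taken from the first two level lines of station 83746 purely for illustration, and which file columns become VAR1 and VAR2 is an assumption, since the thread does not say what those fields are.

```r
# Hypothetical parsed values; the VAR1/VAR2 column mapping is an assumption
obs <- data.frame(date    = c("20101001", "20101001"),
                  station = c("83746", "83746"),
                  var1    = c(1012.0, 1000.0),
                  var2    = c(-9999, 114),
                  stringsAsFactors = FALSE)

# %s converts each value with as.character(); one statement per row
sql <- sprintf("INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (%s,%s,%s,%s)",
               obs$date, obs$station, obs$var1, obs$var2)
writeLines(sql)
```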
>>>
>>> I mean, each line should be read, but I don't know how to do this when the
>>> lines have different lengths.
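Lines of different lengths are not a problem when each line is split separately: strsplit() returns a list, and each element keeps its own number of fields. A minimal sketch, assuming (from the sample below) that the field count alone identifies the line type: 4 fields for the date line, 5 for the station line, 7 for a level line.

```r
indata <- c("2010 10 01 00",
            "83746 -43.25 -22.81 6 51",
            "1012.0 -9999 320 1.5 299.1 294.4 64")

# Split every line on runs of whitespace; each element keeps its own length
fields <- strsplit(trimws(indata), "\\s+")
lengths(fields)  # [1] 4 5 7

# Assumption: 4 fields = date, 5 = station header, 7 = level
kind <- c("date", "station", "level")[match(lengths(fields), c(4, 5, 7))]
kind  # [1] "date"    "station" "level"
```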
>>>
>>> I really appreciate any help.
>>>
>>> Thanks.
>>>
>>>
>>>
>>> ==== Below is the file that should be read ====
>>>
>>>
>>> *2010 10 01 00
>>> 83746 -43.25 -22.81 6 51*
>>> 1012.0 -9999 320 1.5 299.1 294.4 64
>>> 1000.0 114 250 4.1 298.4 294.8 32
>>> 925.0 797 0 0.0 293.6 292.9 32
>>> 850.0 1524 195 3.1 289.6 288.9 32
>>> 700.0 3156 290 11.3 280.1 280.1 32
>>> 500.0 5870 280 20.1 266.1 260.1 32
>>> 400.0 7570 265 23.7 256.6 222.7 32
>>> 300.0 9670 265 28.8 240.2 218.2 32
>>> 250.0 10920 280 27.3 230.2 220.2 32
>>> 200.0 12390 260 32.4 218.7 206.7 32
>>> 176.0 -9999 255 37.6 -9999.0 -9999.0 8
>>> 150.0 14180 245 35.5 205.1 196.1 32
>>> 100.0 16560 300 17.0 195.2 186.2 32
>>> *2010 10 01 00
>>> 83768 -51.13 -23.33 569 41
>>> * 1000.0 79 -9999 -9999.0 -9999.0 -9999.0 32
>>> 946.0 -9999 270 1.0 295.8 292.1 64
>>> 925.0 763 15 2.1 296.4 290.4 32
>>> 850.0 1497 175 3.6 290.8 288.4 32
>>> 700.0 3140 295 9.8 282.9 278.6 32
>>> 500.0 5840 285 23.7 267.1 232.1 32
>>> 400.0 7550 255 35.5 255.4 231.4 32
>>> 300.0 9640 265 37.0 242.2 216.2 32
>>>
>>>
>>> Best Regards,
>>>
>>> --
>>> Best wishes,
>>> Nilza Barros
>
David Winsemius, MD
West Hartford, CT
More information about the R-help mailing list