[R] Read file

David Winsemius dwinsemius at comcast.net
Mon Oct 4 04:05:41 CEST 2010


On Oct 3, 2010, at 9:40 PM, Nilza BARROS wrote:

> Hi, Michael
> Thank you for your help. I have already done what you said.
> But I am still having problems dealing with my data.
>
> I need to split the data by station.
>
> I was able to identify where the station information starts using:
>
> my.data<-file("d2010100100.txt",open="rt")
> indata <- readLines(my.data, n=20000)
> i<-grep("^[837]",indata)  #station number

That would give you the line numbers for any line that had an 8, _or_
a 3, _or_ a 7 as its first character. Was that your intent? My guess is
that you did not really want to use the square brackets and should have
been using "^837".

?regex  # Paragraph starting "A character class .... "
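
To make the difference concrete, here is a small sketch using a few lines
modelled on the data quoted later in this message (the sample vector is made
up for illustration, with the asterisks left out):

```r
# Sample lines modelled on the data in this thread (illustrative only).
indata <- c("2010 10 01 00",
            "83746  -43.25 -22.81      6  51",
            "1012.0  -9999    320     1.5   299.1   294.4 64",
            "850.0   1524    195     3.1   289.6   288.9 32",
            "700.0   3156    290    11.3   280.1   280.1 32",
            "300.0   9670    265    28.8   240.2   218.2 32")

# Character class: first character is 8 OR 3 OR 7.
class_hits <- grep("^[837]", indata)
class_hits    # 2 4 5 6

# Literal prefix: line starts with the three characters "837".
literal_hits <- grep("^837", indata)
literal_hits  # 2
```

With the character class, ordinary data lines that begin with 8, 7 or 3
(850.0, 700.0, 300.0) are picked up as well, which is probably not what was
wanted.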

> my.data2<-read.table("d2010100100.txt",fill=TRUE,nrows=20000)
> stn<- my.data2$V1[i]

That would give you the first column values for the lines you earlier  
selected.
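
One caveat: read.table() skips blank lines by default, so its row numbers can
drift out of step with the line numbers that readLines() and grep() gave you
if the file contains any. A possible alternative, sketched here on made-up
sample lines and assuming the station id is the first whitespace-separated
field, is to pull the id straight out of the lines you already selected:

```r
# Sample lines modelled on the data in this thread (illustrative only).
indata <- c("2010 10 01 00",
            "83746  -43.25 -22.81      6  51",
            "1012.0  -9999    320     1.5   299.1   294.4 64")

i <- grep("^837", indata)                        # station lines

# Split each selected line on runs of whitespace and keep the first field.
fields <- strsplit(trimws(indata[i]), "\\s+")
stn <- vapply(fields, `[`, character(1), 1)
stn  # "83746"
```

This way the index i and the values in stn come from the same vector, so
they cannot get out of alignment.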


> ====

This does not look like what I would expect as a value for stn. Is  
that what you wanted us to think this was?

-- 
David.


> 2010 10 01 00
> *82599  -35.25  -5.91     52   1
> * 1008.0  -9999    115     3.1   298.6   294.6 64
> 2010 10 01 00
> *83649  -40.28 -20.26      4  7*
> 1011.0  -9999      0     0.0   298.4   296.1 64
> 1000.0     96     40     5.7   297.9   295.1 32
>  925.0    782    325     3.1   295.4   294.1 32
>  850.0   1520    270     4.1   293.8   289.4 32
>  700.0   3171    240     8.7   284.1   279.1 32
>  500.0   5890    275     8.2   266.2   262.9 32
>  400.0   7600    335     9.8   255.4   242.4 32
> ===========
> As you can see in the data above, the line shows the number of levels (or
> lines) for each station.
> I need to capture these lines so that I can feed my database.
> By the way, I didn't understand the regular expression you used. I
> tried to run it but it did not work.
>
> Hope you can help me!
> Best Regards,
> Nilza
>
>
>
>
>
> On Sun, Oct 3, 2010 at 2:18 AM, Michael Bedward
> <michael.bedward at gmail.com> wrote:
>
>> Hello Nilza,
>>
>> If your file is small you can read it into a character vector like this:
>>
>> indata <- readLines("foo.dat")
>>
>> If your file is very big you can read it in batches like this...
>>
>> MAXRECS <- 1000  # for example
>> fcon <- file("foo.dat", open="r")
>> indata <- readLines(fcon, n=MAXRECS)
>>
>> The number of lines read will be given by length(indata).
>>
>> You can tell that the end of the file has been reached when readLines
>> returns fewer lines than you asked for:
>> length(indata) < MAXRECS
>>
>> If a leading "*" character is a flag for the start of a station data
>> block you can find this in the indata vector with grepl...
>>
>> start.pos <- which(grepl("^\\s*\\*", indata))
>>
>> When you're finished reading the file...
>> close(fcon)
>>
>> Hope this helps,
>>
>> Michael
>>
>>
>> On 3 October 2010 13:31, Nilza BARROS <nilzabarros at gmail.com> wrote:
>>> Dear R-users,
>>>
>>> I would like to know how I can read a file whose lines have different
>>> lengths.
>>> I need to read this file and create output to feed my database.
>>> So after reading it I'll need to create output like this:
>>>
>>> "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20100910,837460, 39,390)"
>>>
>>> I mean, each line should be read, but I don't know how to do this when
>>> these lines have different lengths.
>>>
>>> I would really appreciate any help.
>>>
>>> Thanks.
>>>
>>>
>>>
>>> ====Below the file that should be read ===========
>>>
>>>
>>> *2010 10 01 00
>>> 83746  -43.25 -22.81      6  51*
>>> 1012.0  -9999    320     1.5   299.1   294.4 64
>>> 1000.0    114    250     4.1   298.4   294.8 32
>>> 925.0    797      0     0.0   293.6   292.9 32
>>> 850.0   1524    195     3.1   289.6   288.9 32
>>> 700.0   3156    290    11.3   280.1   280.1 32
>>> 500.0   5870    280    20.1   266.1   260.1 32
>>> 400.0   7570    265    23.7   256.6   222.7 32
>>> 300.0   9670    265    28.8   240.2   218.2 32
>>> 250.0  10920    280    27.3   230.2   220.2 32
>>> 200.0  12390    260    32.4   218.7   206.7 32
>>> 176.0  -9999    255    37.6 -9999.0 -9999.0  8
>>> 150.0  14180    245    35.5   205.1   196.1 32
>>> 100.0  16560    300    17.0   195.2   186.2 32
>>> *2010 10 01 00
>>> 83768  -51.13 -23.33    569  41
>>> * 1000.0     79  -9999 -9999.0 -9999.0 -9999.0 32
>>> 946.0  -9999    270     1.0   295.8   292.1 64
>>> 925.0    763     15     2.1   296.4   290.4 32
>>> 850.0   1497    175     3.6   290.8   288.4 32
>>> 700.0   3140    295     9.8   282.9   278.6 32
>>> 500.0   5840    285    23.7   267.1   232.1 32
>>> 400.0   7550    255    35.5   255.4   231.4 32
>>> 300.0   9640    265    37.0   242.2   216.2 32
>>>
>>>
>>> Best Regards,
>>>
>>> --
>>> Abraço,
>>> Nilza Barros
>
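
Putting the pieces from the thread above together, here is a minimal sketch
of one way to split the lines into station blocks and build the INSERT
statements. It rests on several assumptions that are not confirmed anywhere
in the thread: each block starts with a "YYYY MM DD HH" line, the next line
is the station line with the station id as its first field, every remaining
line up to the next date line is a level, and the asterisks in the posting
are emphasis markers rather than data. The mapping of VAR1 and VAR2 to the
first two level columns is purely hypothetical, as is the helper name
make_inserts.

```r
# Sample lines modelled on the data in this thread (illustrative only).
indata <- c("2010 10 01 00",
            "82599  -35.25  -5.91     52   1",
            " 1008.0  -9999    115     3.1   298.6   294.6 64",
            "2010 10 01 00",
            "83649  -40.28 -20.26      4  7",
            " 1011.0  -9999      0     0.0   298.4   296.1 64",
            " 1000.0     96     40     5.7   297.9   295.1 32")

# Assumption: asterisks are not data, so drop any that are present.
indata <- gsub("*", "", indata, fixed = TRUE)

# Assumption: a block starts with a "YYYY MM DD HH" line.
hdr <- grep("^\\s*[0-9]{4} [0-9]{2} [0-9]{2} [0-9]{2}\\s*$", indata)
end <- c(hdr[-1] - 1, length(indata))   # last line of each block

# Hypothetical helper: build the INSERT statements for one block.
make_inserts <- function(h, e) {
  # "2010 10 01 00" -> "20101001" (year, month, day; the hour is dropped)
  date <- gsub(" ", "", substr(trimws(indata[h]), 1, 10), fixed = TRUE)
  # Station id is the first field of the line after the date line.
  stn <- strsplit(trimws(indata[h + 1]), "\\s+")[[1]][1]
  if (e < h + 2) return(character(0))   # block without level lines
  lev <- read.table(text = indata[(h + 2):e])
  # Hypothetical mapping: VAR1 = first level column, VAR2 = second.
  sprintf("INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (%s,%s,%.1f,%d)",
          date, stn, lev$V1, lev$V2)
}

sql <- unlist(Map(make_inserts, hdr, end))
sql
```

The blocks are delimited by the date lines rather than by the level count on
the station line, because the counts in the posted excerpt do not obviously
match the number of lines shown.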


David Winsemius, MD
West Hartford, CT


