[R] Read file
jim holtman
jholtman at gmail.com
Tue Oct 5 04:16:17 CEST 2010
Is this what you are looking for:
> input <- readLines(textConnection(" 2010 10 01 00
+ *82599 -35.25 -5.91 52 1*
+ 1008.0 -9999 115 3.1 298.6 294.6 64
+ 2010 10 01 00
+ *83649 -40.28 -20.26 4 7
+ *1011.0 -9999 0 0.0 298.4 296.1 64
+ 1000.0 96 40 5.7 297.9 295.1 32
+ 925.0 782 325 3.1 295.4 294.1 32
+ 850.0 1520 270 4.1 293.8 289.4 32
+ 700.0 3171 240 8.7 284.1 279.1 32
+ 500.0 5890 275 8.2 266.2 262.9 32
+ 400.0 7600 335 9.8 255.4 242.4 32"))
> closeAllConnections()
> # remove the "*" since they seem to be inconsistent
> input <- gsub("\\*|^ ", "", input)
>
> date <- NULL # hold the date
> station <- NULL # hold the station ID
> # now parse each line
> # length = 4 => date
> # length = 5 => station id
> # length = 7 => data
> result <- lapply(input, function(.line){
+     x <- as.numeric(strsplit(.line, '[[:space:]]+')[[1]])
+     if (length(x) == 4) date <<- x[1] * 1000000 + x[2] * 10000 +
+                                  x[3] * 100 + x[4]
+     else if (length(x) == 5) station <<- x[1]
+     else if (length(x) == 7) return(data.frame(date = date,
+                                                station = station,
+                                                x[1], x[2], x[3], x[4], x[5], x[6], x[7]))
+     else cat("invalid line:", .line, '\n')
+     return(NULL)
+ })
>
> # combine into single dataframe
> do.call(rbind, result)
        date station x.1.  x.2. x.3. x.4.  x.5.  x.6. x.7.
1 2010100100   82599 1008 -9999  115  3.1 298.6 294.6   64
2 2010100100   83649 1011 -9999    0  0.0 298.4 296.1   64
3 2010100100   83649 1000    96   40  5.7 297.9 295.1   32
4 2010100100   83649  925   782  325  3.1 295.4 294.1   32
5 2010100100   83649  850  1520  270  4.1 293.8 289.4   32
6 2010100100   83649  700  3171  240  8.7 284.1 279.1   32
7 2010100100   83649  500  5890  275  8.2 266.2 262.9   32
8 2010100100   83649  400  7600  335  9.8 255.4 242.4   32
>
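
Since the original goal was a set of INSERT statements for a database, a data
frame like the one above could be turned into SQL strings with sprintf().
This is only a sketch: the table name TEMP and the columns DATA, STATION, VAR1
and VAR2 are taken from the example in the first message, the data frame
"parsed" is a hand-built stand-in for the parsed result, and treating the first
two data columns as VAR1 and VAR2 is my assumption.

```r
# Stand-in for the parsed data frame (hypothetical name `parsed`);
# assumption: VAR1 and VAR2 are the first two data columns.
parsed <- data.frame(date    = c(2010100100, 2010100100),
                     station = c(82599, 83649),
                     var1    = c(1008, 1011),
                     var2    = c(-9999, -9999))

# sprintf() is vectorised, so this builds one statement per row.
# %.0f prints the whole-number date and station without decimals;
# %s formats the remaining numbers the way as.character() would.
sql <- sprintf(
    "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (%.0f,%.0f,%s,%s)",
    parsed$date, parsed$station, parsed$var1, parsed$var2)

writeLines(sql)
```

The resulting character vector can be written to a file with writeLines(sql,
"some_file.sql") or passed to whatever database interface is in use.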
On Mon, Oct 4, 2010 at 9:52 PM, Nilza BARROS <nilzabarros at gmail.com> wrote:
> Sorry, guys
> I couldn't explain what I really wanted.
> I have a file with many stations and a lot of information for each one.
> I need to identify the line where each station's information starts. After that
> I'd like to store that data (related to the station) so that it can be
> worked on separately.
>
> If I were using another language such as Fortran, I would save the data in a
> vector.
> But in R I don't know how to do this :(
>
> ====David's Questions===========
>
> my.data<-file("d2010100100.txt",open="rt")
> indata <- readLines(my.data, n=20000)
> i<-grep("^[837]",indata) #station number
>
> That would give you the line numbers for any line that had an 8, _or_ a 3,
> _or_ a 7 as its first digit. Was that your intent? My guess is that you did
> not really want to use the square braces and should have been using "^837".
> ?regex # Paragraph starting "A character class .... "
>
> ## In fact I am trying to find the stations in the file. As the
> Brazilian stations start with `83`, I intended to pick them up.
>
> my.data2<-read.table("d2010100100.txt",fill=TRUE,nrows=20000)
> stn<- my.data2$V1[i]
>
> - That would give you the first column values for the lines you earlier
> selected.
> ## It gave me all the stations that started with `873`. I did it just because
> I needed to know how many stations there were in the file. But it is not
> helping me to solve the problem.
> Thanks in advance,
> Nilza Barros
> On Sun, Oct 3, 2010 at 11:05 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>>
>> On Oct 3, 2010, at 9:40 PM, Nilza BARROS wrote:
>>
>> Hi, Michael
>>> Thank you for your help. I have already done what you said.
>>> But I am still having problems dealing with my data.
>>>
>>> I need to split the data according to station.
>>>
>>> I was able to identify where the station information starts using:
>>>
>>> my.data<-file("d2010100100.txt",open="rt")
>>> indata <- readLines(my.data, n=20000)
>>> i<-grep("^[837]",indata) #station number
>>>
>>
>> That would give you the line numbers for any line that had an 8 , _or_ a 3,
>> _or_ a 7 as its first digit. Was that your intent? My guess is that you did
>> not really want to use the square braces and should have been using "^837".
>>
>> ?regex # Paragraph starting "A character class .... "
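
A small sketch of the difference between the character class and a literal
prefix, using a few made-up lines in the same layout as the file. The vector
name "recs" and the result names are only for illustration.

```r
# Hypothetical records: three station header lines and one data line
recs <- c("83649 -40.28 -20.26 4 7",
          "82599 -35.25 -5.91 52 1",
          "700.0 3171 240 8.7 284.1 279.1 32",
          "83746 -43.25 -22.81 6 51")

i_class  <- grep("^[837]", recs)  # first character is 8, 3 or 7
i_prefix <- grep("^837", recs)    # line starts with the literal text "837"
i_83     <- grep("^83", recs)     # line starts with the literal text "83"

i_class   # 1 2 3 4  (the data line starting with 7 matches as well)
i_prefix  # 4
i_83      # 1 4
```

So if the aim is to pick up stations whose number begins with 83, the literal
prefix "^83" is the pattern to try, not the bracketed form.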
>>
>>
>> my.data2<-read.table("d2010100100.txt",fill=TRUE,nrows=20000)
>>> stn<- my.data2$V1[i]
>>>
>>
>> That would give you the first column values for the lines you earlier
>> selected.
>>
>>
>> ====
>>>
>>
>> This does not look like what I would expect as a value for stn. Is that
>> what you wanted us to think this was?
>>
>> --
>> David.
>>
>>
>>
>> 2010 10 01 00
>>> *82599 -35.25 -5.91 52 1
>>> * 1008.0 -9999 115 3.1 298.6 294.6 64
>>> 2010 10 01 00
>>> *83649 -40.28 -20.26 4 7*
>>> 1011.0 -9999 0 0.0 298.4 296.1 64
>>> 1000.0 96 40 5.7 297.9 295.1 32
>>> 925.0 782 325 3.1 295.4 294.1 32
>>> 850.0 1520 270 4.1 293.8 289.4 32
>>> 700.0 3171 240 8.7 284.1 279.1 32
>>> 500.0 5890 275 8.2 266.2 262.9 32
>>> 400.0 7600 335 9.8 255.4 242.4 32
>>> ===========
>>> As you can see in the data above, the station line shows the number of levels
>>> (or lines) for each station.
>>> I need to capture these lines so as to be able to feed my database.
>>> By the way, I didn't understand the regular expression you've used. I've
>>> tried to run it but it did not work.
>>>
>>> Hope you can help me!
>>> Best Regards,
>>> Nilza
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Oct 3, 2010 at 2:18 AM, Michael Bedward
>>> <michael.bedward at gmail.com> wrote:
>>>
>>> Hello Nilza,
>>>>
>>>> If your file is small you can read it into a character vector like this:
>>>>
>>>> indata <- readLines("foo.dat")
>>>>
>>>> If your file is very big you can read it in batches like this...
>>>>
>>>> MAXRECS <- 1000 # for example
>>>> fcon <- file("foo.dat", open="r")
>>>> indata <- readLines(fcon, n=MAXRECS)
>>>>
>>>> The number of lines read will be given by length(indata).
>>>>
>>>> You can tell that the end of the file has been reached when a read
>>>> returns fewer than MAXRECS lines:
>>>> length(indata) < MAXRECS
>>>>
>>>> If a leading "*" character is a flag for the start of a station data
>>>> block you can find this in the indata vector with grepl...
>>>>
>>>> start.pos <- which(grepl("^\\s*\\*", indata))
>>>>
>>>> When you're finished reading the file...
>>>> close(fcon)
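
The batch approach above could be wrapped in a loop along these lines. This is
only a sketch: it writes a throwaway file with tempfile() so that it runs on
its own, the batch size is deliberately tiny, and stopping on a zero-length
read is my choice here, not something stated in the message above.

```r
# Write a small throwaway file so the sketch is self-contained
tmp <- tempfile(fileext = ".dat")
writeLines(sprintf("record %d", 1:7), tmp)

MAXRECS <- 3                    # tiny batch size, for demonstration only
fcon <- file(tmp, open = "r")
total <- 0
repeat {
    indata <- readLines(fcon, n = MAXRECS)
    if (length(indata) == 0) break     # nothing left to read
    total <- total + length(indata)    # process the current batch here
}
close(fcon)
unlink(tmp)

total  # 7
```

Because the connection stays open between calls, each readLines() continues
where the previous one stopped.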
>>>>
>>>> Hope this helps,
>>>>
>>>> Michael
>>>>
>>>>
>>>> On 3 October 2010 13:31, Nilza BARROS <nilzabarros at gmail.com> wrote:
>>>>
>>>>> Dear R-users,
>>>>>
>>>>> I would like to know how I could read a file with lines of different
>>>>> lengths.
>>>>> I need to read this file and create output to feed my database.
>>>>> So after reading I'll need to create output like this:
>>>>>
>>>>> "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20100910,837460,39,390)"
>>>>>
>>>>> I mean, each line should be read. But I don't know how to do this when these
>>>>> lines have different lengths.
>>>>>
>>>>> I really appreciate any help.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>>
>>>>> ====Below the file that should be read ===========
>>>>>
>>>>>
>>>>> *2010 10 01 00
>>>>> 83746 -43.25 -22.81 6 51*
>>>>> 1012.0 -9999 320 1.5 299.1 294.4 64
>>>>> 1000.0 114 250 4.1 298.4 294.8 32
>>>>> 925.0 797 0 0.0 293.6 292.9 32
>>>>> 850.0 1524 195 3.1 289.6 288.9 32
>>>>> 700.0 3156 290 11.3 280.1 280.1 32
>>>>> 500.0 5870 280 20.1 266.1 260.1 32
>>>>> 400.0 7570 265 23.7 256.6 222.7 32
>>>>> 300.0 9670 265 28.8 240.2 218.2 32
>>>>> 250.0 10920 280 27.3 230.2 220.2 32
>>>>> 200.0 12390 260 32.4 218.7 206.7 32
>>>>> 176.0 -9999 255 37.6 -9999.0 -9999.0 8
>>>>> 150.0 14180 245 35.5 205.1 196.1 32
>>>>> 100.0 16560 300 17.0 195.2 186.2 32
>>>>> *2010 10 01 00
>>>>> 83768 -51.13 -23.33 569 41
>>>>> * 1000.0 79 -9999 -9999.0 -9999.0 -9999.0 32
>>>>> 946.0 -9999 270 1.0 295.8 292.1 64
>>>>> 925.0 763 15 2.1 296.4 290.4 32
>>>>> 850.0 1497 175 3.6 290.8 288.4 32
>>>>> 700.0 3140 295 9.8 282.9 278.6 32
>>>>> 500.0 5840 285 23.7 267.1 232.1 32
>>>>> 400.0 7550 255 35.5 255.4 231.4 32
>>>>> 300.0 9640 265 37.0 242.2 216.2 32
>>>>>
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> --
>>>>> Abraço,
>>>>> Nilza Barros
>>>>>
>>>>
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
>
> --
> Abraço,
> Nilza Barros
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?