[R] reading a matrix from a file
Zhuanshi He
zhuanshi.he at gmail.com
Tue Jun 27 18:58:25 CEST 2006
Maybe this link is useful:
http://www.bic.mni.mcgill.ca/users/jason/cortex/stats-manuals/mni.read.glim.file.html
Also, see section 2.3 of the R Data Import/Export manual
(http://cran.r-project.org/doc/manuals/R-data.html), quoted here:
2.3 Using scan directly
Both read.table and read.fwf use scan to read the file, and then
process the results of scan. They are very convenient, but sometimes
it is better to use scan directly.
Function scan has many arguments, most of which we have already
covered under read.table. The most crucial argument is what, which
specifies a list of modes of variables to be read from the file. If
the list is named, the names are used for the components of the
returned list. Modes can be numeric, character or complex, and are
usually specified by an example, e.g. 0, "" or 0i. For example
    cat("2 3 5 7", "11 13 17 19", file="ex.dat", sep="\n")
    scan(file="ex.dat", what=list(x=0, y="", z=0), flush=TRUE)
returns a list with three components and discards the fourth column in
the file.
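
To make the result concrete, here is a small sketch of the same call run
against a temporary file. The use of tempfile(), quiet=TRUE and the name
res are my own choices for illustration, not part of the manual text.

```r
# Write the two-line example file to a temporary location
f <- tempfile(fileext = ".dat")
cat("2 3 5 7", "11 13 17 19", file = f, sep = "\n")

# 'what' describes three fields; flush = TRUE skips the rest of each line,
# so the fourth column (7 and 19) is never stored
res <- scan(file = f, what = list(x = 0, y = "", z = 0),
            flush = TRUE, quiet = TRUE)

str(res)   # a list: numeric x, character y, numeric z
unlink(f)
```

Because the middle component was declared as "", the second column comes
back as character strings rather than numbers.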
There is a function readLines which will be more convenient if all you
want is to read whole lines into R for further processing.
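
If you do want to stay with readLines, as in the question, one possible
sketch is to split each line and convert the pieces to numbers before
binding them into rows. The file contents below are made up for
illustration (the first line mirrors the "1, 2, 3" shown in the question,
the second line is an assumption).

```r
# Hypothetical two-line input in the style of the question's mTest.txt
f <- tempfile(fileext = ".txt")
writeLines(c("1, 2, 3", "4, 5, 6"), f)

lines  <- readLines(f)                         # one string per line of the file
fields <- strsplit(lines, ",", fixed = TRUE)   # list of character vectors

# Convert each line to numeric (surrounding spaces are tolerated by
# as.numeric) and stack the resulting vectors as rows of a matrix
m <- do.call(rbind, lapply(fields, as.numeric))
m
unlink(f)
```

The step missing in the question is the conversion: strsplit returns a
list of character vectors, and passing that list straight to matrix()
gives a matrix of list elements rather than numbers.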
One common use of scan is to read in a large matrix. Suppose file
matrix.dat just contains the numbers for a 200 x 2000 matrix. Then we
can use
    A <- matrix(scan("matrix.dat", n = 200*2000), 200, 2000, byrow = TRUE)
On one test this took 1 second (under Linux, 3 seconds under Windows
on the same machine) whereas
    A <- as.matrix(read.table("matrix.dat"))
took 10 seconds (and more memory), and
    A <- as.matrix(read.table("matrix.dat", header = FALSE, nrows = 200,
                              comment.char = "", colClasses = "numeric"))
took 7 seconds. The difference is almost entirely due to the overhead
of reading 2000 separate short columns: were they of length 2000, scan
took 9 seconds whereas read.table took 18 if used efficiently (in
particular, specifying colClasses) and 125 if used naively.
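
If you want to try the comparison yourself, a rough sketch along these
lines should do. It uses a much smaller 20 x 30 test file that I generate
on the fly (the file, its size and the variable names are assumptions for
illustration), so the timings will be tiny and will not match the figures
quoted above.

```r
# Build a small whitespace-separated test file, written row by row
f <- tempfile(fileext = ".dat")
M <- matrix(as.numeric(seq_len(20 * 30)), 20, 30, byrow = TRUE)
write.table(M, f, row.names = FALSE, col.names = FALSE)

# scan returns the numbers in file order (row by row), hence byrow = TRUE
t_scan <- system.time(
  A1 <- matrix(scan(f, n = 20 * 30, quiet = TRUE), 20, 30, byrow = TRUE)
)

# read.table with the hints mentioned above (nrows, comment.char, colClasses)
t_rt <- system.time(
  A2 <- as.matrix(read.table(f, header = FALSE, nrows = 20,
                             comment.char = "", colClasses = "numeric"))
)

# Both routes should hold the same numbers; read.table adds column
# names V1, V2, ..., which unname() drops before comparing
all.equal(A1, unname(A2))
rbind(scan = t_scan, read.table = t_rt)
unlink(f)
```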
Note that timings can depend on the type read and the data. Consider
reading a million distinct integers:
    writeLines(as.character((1+1e6):2e6), "ints.dat")
    xi <- scan("ints.dat", what=integer(0), n=1e6)     # 0.77s
    xn <- scan("ints.dat", what=numeric(0), n=1e6)     # 0.93s
    xc <- scan("ints.dat", what=character(0), n=1e6)   # 0.85s
    xf <- as.factor(xc)                                # 2.2s
    DF <- read.table("ints.dat")                       # 4.5s
and a million examples of a small set of codes:
    code <- c("LMH", "SJC", "CHCH", "SPC", "SOM")
    writeLines(sample(code, 1e6, replace=TRUE), "code.dat")
    y <- scan("code.dat", what=character(0), n=1e6)    # 0.44s
    yf <- as.factor(y)                                 # 0.21s
    DF <- read.table("code.dat")                       # 4.9s
    DF <- read.table("code.dat", nrows=1e6)            # 3.6s
Note that these timings depend heavily on the operating system (the
basic reads in Windows take at least twice as long as these Linux
times) and on the precise state of the garbage collector.
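
For the comma-separated file in your question, the same idea should work
once scan is told about the separator. The sketch below recreates the
three rows you posted in a temporary file (the tempfile and the name ncols
are mine, for illustration); with your real file you would pass its name
instead.

```r
# Recreate the three sample rows from the question
f <- tempfile(fileext = ".txt")
writeLines(c("0,.11,.22,.4", ".11,0,.5,.3", ".22,.5,0,.7"), f)

# Count the fields on the first line to get the number of columns
ncols <- length(scan(f, sep = ",", nlines = 1, quiet = TRUE))

# Read all numbers and fill the matrix row by row
netMatrix <- matrix(scan(f, sep = ",", quiet = TRUE),
                    ncol = ncols, byrow = TRUE)
netMatrix
unlink(f)
```

This gives a numeric matrix directly, so there is no need for readLines
and strsplit. as.matrix(read.table(f, sep = ",")) is an alternative if
speed does not matter.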
Hope this helps.
Z. He
On 6/28/06, Cuau <cuauv at yahoo.com> wrote:
>
>
> Hello everyone,
>
> I'm writing a little script that will read a matrix from a file
>
> i.e.
>
> 0,.11,.22,.4
> .11,0,.5,.3
> .22,.5,0,.7
> and so on
>
> and will then calculate some standard stats for nets (e.g. centralization, degree, etc.).
>
> So far I have opened the file and read the contents; however, I'm using readLines(filename)
> to read the file and it returns it as one big string with no divisions. I tried using
> strsplit(String)
> to split it, but even though that works I'm not able to put the output into a matrix.
>
> Below is an example of what I have done
>
>
> > INfile<-file("mTest.txt", "r")
> > readLines(INfile)->matrix
> > matrix
> [1] "1, 2, 3"
> > strsplit(matrix, ",")->splitLine
> > splitLine
> [[1]]
> [1] "1" " 2" " 3"
>
> > netMatrix <-matrix(c(splitLine), nrow=1,ncol=3)
> > netMatrix
> [,1] [,2] [,3]
> [1,] Character,3 Character,3 Character,3
>
>
> Does anyone have an idea how I can read a matrix from a file and store it as a matrix?
>
> Thanks
>
> -Cuau Vital
--
Zhuanshi He / Z. He (PhD)
ADvanced Environmental Monitoring Research Center (ADEMRC)
Gwangju Institute of Science and Technology
1 Oryong-dong, Buk-gu, Gwangju 500-712, Republic of Korea.
Tel. +82-62-970-3406 Fax. +82-62-970-3404
Email: Zhuanshi.He at gmail.com
Web: http://atm1.gist.ac.kr/~hzs/