[R] Largest allowable matrix
Spencer Graves
spencer.graves at pdf.com
Mon Nov 21 18:08:55 CET 2005
What do you want to do with these large matrices? Both "scan" and
"read.table" allow you to skip a certain number of lines at the
beginning of a file and process however many lines you want from that
point.
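A minimal sketch of reading one window of a file with "read.table" and "scan". The temporary file and its contents are made up for illustration, and the file is assumed to be tab-separated with numeric fields:

```r
# Hypothetical example file: 10 tab-separated records of 3 numeric fields.
path <- tempfile(fileext = ".txt")
writeLines(paste(1:10, (1:10) * 2, (1:10) * 3, sep = "\t"), path)

# read.table: skip the first 4 lines, then read the next 3 as a data frame.
part1 <- read.table(path, sep = "\t", skip = 4, nrows = 3)

# scan: the same window, returned as a flat numeric vector,
# which is then reshaped row by row into a 3-column matrix.
part2 <- scan(path, sep = "\t", skip = 4, nlines = 3, quiet = TRUE)
m2 <- matrix(part2, ncol = 3, byrow = TRUE)

print(part1)
print(m2)
```

Both calls return records 5 through 7; "scan" is usually the leaner choice when every field has the same type.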
I recently had large files that were too big for S-Plus 6. I moved
to R and processed them as submatrices without a problem. I typically
use "readLines" to check the format of the first few records and
"count.fields" to determine whether all records have the same number of
fields. In one case recently, I had a file that was almost but not
quite regular. I processed the file in pieces, carefully examining the
records right before and after each change in the number of fields, and
recovered basically everything without going back to my client (through
several layers of bureaucracy) to ask for their help in parsing that file.
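Processing a file "as submatrices" can be sketched as a loop over an open connection that reads a fixed number of records at a time and keeps only running totals. The file, the chunk size, and the choice of column means as the summary are all assumptions for illustration:

```r
# Hypothetical example file standing in for one too large to read at once.
path <- tempfile(fileext = ".txt")
writeLines(paste(1:25, (1:25) %% 7, sep = "\t"), path)

chunk_size <- 10   # records per submatrix; tune to available memory
col_totals <- c(0, 0)
n_rows <- 0

con <- file(path, open = "r")
repeat {
  lines <- readLines(con, n = chunk_size)  # continues where the last call stopped
  if (length(lines) == 0) break            # end of file reached
  sub <- as.matrix(read.table(text = lines, sep = "\t"))
  col_totals <- col_totals + colSums(sub)  # accumulate one chunk at a time
  n_rows <- n_rows + nrow(sub)
}
close(con)

# Column means, computed without ever holding the whole file in memory.
print(col_totals / n_rows)
```

Only one chunk is in memory at a time, so the chunk size, not the file size, bounds the memory used.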
I frequently use a construct like the following:
File. <- ".....<filename>"
readLines(File., 9)
# to check the format including the "sep" character
quantile(nFlds <- count.fields(File., sep="\t")) # or sep="," for a CSV file
# If the file honestly has a fixed number of fields,
# this will show that.
# If not, either the "sep" character is wrong or the file has problems.
# In either case, this helps me plan what to do next.
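Once "count.fields" shows that a file is not quite regular, the records on either side of each change in field count can be located as below. The small "almost regular" file is invented for illustration; "readLines(path, n = i)" reads only up to the record of interest rather than the whole file:

```r
# Hypothetical example: records 4 and 5 have an extra field.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5\t6",
             "7\t8\t9\t10", "11\t12\t13\t14",
             "15\t16\t17", "18\t19\t20"), path)

nFlds <- count.fields(path, sep = "\t")
print(table(nFlds))   # how many records have each field count

# Record numbers where the field count differs from the previous record.
changes <- which(diff(nFlds) != 0) + 1

# Print the record just before each change and the record at the change.
for (i in changes) {
  cat("--- change at record", i, "\n")
  print(tail(readLines(path, n = i), 2))
}
```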
Hope this helps.
spencer graves
Prof Brian Ripley wrote:
> On Mon, 21 Nov 2005, Uwe Ligges wrote:
>
>>Barry Baker wrote:
>>
>>>Hello,
>>>
>>>I am a new R user and have two datasets that I would like to analyze. The
>>>first is (2409222 x 17) and the other is (21682998 x 17). Is this possible
>>>in R? If not then what is the maximum number of rows and columns or number
>>>of elements that R can handle?
>>
>>The number of columns and rows is not a problem here, but you will need
>>21682998 * 17 * 4 bytes to store the latter matrix (assuming floats) in
>>memory, that is 1406.139 Mb.
>
> R does not use floats internally. So unless these are integers/logicals
> you are going to need twice that,
>
>>In order to do something sensible with the data, you need *at least*
>>twice the amount of RAM, hence at least 3Gb.
>
> Here I think the issue is rather virtual memory and address space. You
> will need a 64-bit OS to do anything with this object.
>
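The memory figures in the quoted thread can be checked with a few lines of R. The comparison with ".Machine$integer.max" reflects the limit of 2^31 - 1 elements per vector in R versions of that time; later versions (R 3.0.0 and up) added "long vectors", although each matrix dimension is still bounded by that value:

```r
n_row <- 21682998
n_col <- 17
n_elem <- n_row * n_col      # 368,610,966 elements

int_mb <- n_elem * 4 / 1024^2   # 4-byte integers/logicals: about 1406 MB
dbl_mb <- n_elem * 8 / 1024^2   # 8-byte doubles (R's numeric type): about 2812 MB
print(c(integer_MB = int_mb, double_MB = dbl_mb))

# Does the element count fit under the largest integer index?
print(n_elem < .Machine$integer.max)
```

Storing the larger matrix as doubles therefore needs roughly 2.75 GB for the object alone, before any copies made during analysis.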
--
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA
spencer.graves at pdf.com
www.pdf.com
Tel: 408-938-4420
Fax: 408-280-7915