[R] porting an access database to sqlite
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Fri Nov 28 08:17:31 CET 2014
This is not really on topic here. At best the R-sig-db mailing list would be on topic, but it seems you may be dealing with issues related to nonstandard sql so you might need to find some additional help in a more Access-specific forum.
One thing you might try is using a specific column name from your table instead of the "*" symbol in the COUNT function... preferably a unique primary key column.
The reason the primary key is important is because sql doesn't assume the date are ordered the way R does... it is only as ordered as the indexes that are used to retrieve it. That may also be a clue for you to use in figuring out how to divide up the data into smaller chunks.
One thing I can say is that allocating chunks of memory and then freeing them can break up your available memory into pieces and make allocating more memory impossible. Unfortunately, troubleshooting such problems is highly dependent on your hardware and software configuration as well as the exact steps you are following. Since you have not attempted to create a reproducible example yet, you might want to do that before posting at R-sig-db to improve your chances of getting help.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On November 27, 2014 9:00:03 AM PST, ravi <rv15i at yahoo.se> wrote:
>Hi,Thanks for the replies. My problem is that the tables (at least, a
>few of them) are too large. I cannot directly export the tables to get
>text files. I tried the following :
>c1<-sqlQuery(ch,"SELECT COUNT(*) FROM table1")
>I got the following error (changed the text to alter table names) :[1]
>"AB000 -2001 [Microsoft][ODBC MicrosoftAccess Driver] ODBC--call
>failed." [2] "[RODBC] ERROR: Could not
>SQLExecDirect'SELECT COUNT(*) FROM Table1'"
>
>But the following works (with about 5 minutes for each step)
>:con<-odbcConnect("TestDB")
>d1<-sqlFetch(ch,"table1",max=1e6,as.is=TRUE)
>d2<-sqlFetchMore(ch,max=1e6,as.is=TRUE)
>
>I could successively retrieve 6 million rows (with 52 columns). After
>that (with the next sqlFetchMore command), I got the following error :
>Error : cannot allocate vector of size 7.6 Mb
>When this happened, I had saved all the data frames obtained with the
>saveRDS (found this to be the easiest method of limiting the size of
>the file on disk) command and then deleted the data frames from
>memory. So, I do not know why the last error turned up.
>Though I am not an expert in databases, I am quite sure that the
>database has not been put together in a sensible way. I am now trying
>to work with my amateurish methods and trying to see what I can.
>
>I would like to know if there is any sql query by which I can continue
>in retrieving data from 6e6 row onwards. That is, instead of starting
>from row 1, can I fetch rows from row number, 6e6+1?How do I know that
>I am near the tail end? There seems to be no simple method for finding
>the size of the table.
>Thanks,Ravi
>>>> ravi <rv15i at yahoo.se> 11/25/14 12:42 PM >>>
>
>
>Hi,All my data is presently locked in a Microsoft access database. This
>has huge data in a number of large tables. Using RODBC and connecting
>to it takes too long a time, sometimes making the system to hang up.
>
>To make things more manageable, I have tried to transfer the data to
>manageable .RData or .csv files. But I am not able to do this with some
>of the larger files. I am currently stuck in the one of the preliminary
>steps. I am not able to find the number of rows in a table. If I know
>this, I can transfer chunks of the tables to a sqlite database.
>I am able to connect to connect to the access data base with :
>library(RODBC)
>con<-odbcConnect("TestDB")d1<-sqlFetch(ch,"table1",max=1e5,as.is=TRUE)d2<-sqlFetchMore(ch,max=1e5,as.is=TRUE)
>d3<-rbind(d1,d2)I wanted to develop this into a loop to get a
>concatenated data frame, which I wanted to save either as a binary
>file, or transfer it to a sqlite data base. I would like to have some
>help on the simplest route to follow. But first, I will proceed in
>describing my immediate problem. In some of the tables, I find that the
>sqlFetchMore returns a value of -1L, meaning that the end has been
>reached. In RODBC, I find no command for getting the row count in a
>table. I have found this in DBI. But I have not been able to figure out
>how I should specify the connection to an acccess database (con in the
>following).
>
>while(!dbHasCompleted(con)){ print(dbGetRowCount(con))}
>I would appreciate help on the following points :1. How can I get the
>row count (and size) of a table in an access data base? With RODBC, DBI
>or any other way.2. I have found that saving the tables as .RData files
>reduces the file size and and reduces the reading time. Is there some
>way of appending to an already saved data frame with this method?3. I
>have come across alternative ways of saving data - using writeBin and
>packages like saves, rhdf5 etc. Would they be useful alternatives?
>4. Is there an advantage in combining binary files and databases like
>sqlite? Or, are files already saved in a binary format in databases
>like sqlite?5. What is the simplest method of porting from the access
>to the sqlite database? With RSQlite and RODBC, can I have connections
>to the access and sqlite databases open at the same time? Or, should I
>close one and then open the other? It would help if I can get a
>detailed bit if code for doing this in a simple way.
>
>I would appreciate all help that I can get.Thanks,Ravi
>
>
>
>
>
> [[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
>
>Confidentiality Statement: This email message, including any
>attachments, is for the sole use of the intended recipient(s) and may
>contain confidential and privileged information. Any unauthorized use,
>disclosure or distribution is prohibited. If you are not the intended
>recipient, please contact the sender by reply email and destroy all
>copies of the original message.
More information about the R-help
mailing list