[R] Query about data manipulation

Marc Schwartz marc_schwartz at comcast.net
Fri Mar 2 00:03:42 CET 2007


On Thu, 2007-03-01 at 12:35 -0800, lalitha viswanath wrote:
> Hi
> Thanks much for that input.It was extremely helpful.
> 
> I am seeking some input about another stumbling block
> using RODBC; SQLQuery et al with large BLOB values.
> 
> Although the following query 
> dataFromDB <- sqlQuery(channel, "select
> uncompress(columnName) from tableName where Id=id ");
> returns just one row , dataFromDB[1,1] actually
> contains 4000+ rows of the form 
> field1 \t field2 \t value\n.... described earlier.
> (4000+ rows compressed as one long string)
> 
> On printing dataFromDB[1,1], it does not print beyond
> 3600 such rows or so (printing in fact "field1 \t
> field2 \t value \n.....field3600 \t field3601"),
> abruptly missing the rest of the result. 
> 
> Hence it throws an error when I try to use read.table
> (after using textConnection as suggested) that row xyz
> does not contain 3 values,etc.
> 
> It seems to be missing 1/4th of the actual result that
> should contain 4000+ such pairs.
> 
> The set of 4000+ rows occupy just 100KB if written out
> to a file directly from MySQL.
> Is there anyway to increase the capacity of the return
> result in R so that it does not get thrown off as
> above and retrieves the ENTIRE result?
> 
> I tried increasing buffsize, but as I understand,
> since SqlQuery itself returns just one row in this
> case, it  is possibly not very relevant here?
> 
> Note that the above mentioned problem does not arise
> when the data returned from SQL query contains less
> than 3500 such concatenated entries.
> 
> Your input is greatly appreciated.
> Thanks
> Lalitha


Lalitha,

Post the results of the following, when reading the blob where there is
4,000+ rows:

  str(dataFromDB)

  nchar(dataFromDB[1,1])


That might help us ascertain what is causing the problem relative to the
size of the objects.

RODBC has a field length limit of 64k, so if the blob is larger than
that, which it seems may be the case, this would be the root problem and
would result in your data frame column being truncated at that point
during the SQL query.

I don't know if the RMySQL package has larger limits, but if so, you may
have to go that route for a solution.

HTH,

Marc Schwartz



More information about the R-help mailing list