[R] splitting very long character string

Arne.Muller at sanofi-aventis.com Arne.Muller at sanofi-aventis.com
Thu Nov 2 11:24:50 CET 2006


Hello,

thanks a lot for your help on splitting the string to get a numeric vector. I'm now writing the string to a tempfile and reading it in via scan - this is fast enough for me:

library(XML);

...
tmp = xmlElementsByTagName(root, 'tofDataSample', recursive=TRUE);
tmp = xmlValue(tmp[[1]]);
cat(paste('splitting', nchar(tmp), 'string ...\n'));
tmp.file = tempfile();
sink(tmp.file);
cat(tmp);
sink();
tmp = scan(tmp.file);
unlink(tmp.file);
cat(paste('splitting done,', length(tmp), 'elements\n'));
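
For comparison, here is a minimal sketch of doing the same conversion without a temporary file. It assumes a current version of R, where scan() accepts a 'text' argument, and it uses a short made-up string in place of the real xmlValue() result. The strsplit() variant with fixed=TRUE is included only as a further option; none of these were timed on the 500k-character data, so the tempfile approach above may still be the right choice.

```r
# Hypothetical stand-in for the xmlValue() result: numbers separated by newlines.
tmp <- paste(c("12345", "564376", "5674", "6356656", "5666"), collapse = "\n")

# Option 1: let scan() parse the string directly via its 'text' argument.
x1 <- scan(text = tmp, quiet = TRUE)

# Option 2: the same idea through an explicit text connection.
con <- textConnection(tmp)
x2 <- scan(con, quiet = TRUE)
close(con)

# Option 3: split on a literal newline; fixed = TRUE bypasses the regex engine.
x3 <- as.numeric(strsplit(tmp, "\n", fixed = TRUE)[[1]])

# All three should yield the same numeric vector.
stopifnot(identical(x1, x2), identical(x1, x3))
```

All three variants return a plain numeric vector, so the rest of the script (for example the length(tmp) report) would not need to change.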

	thanks again
	and kind regards,

	Arne

> -----Original Message-----
> From: john seers (IFR) [mailto:john.seers at bbsrc.ac.uk]
> Sent: Wednesday, November 01, 2006 17:01
> To: Muller, Arne PH/FR; r-help at stat.math.ethz.ch
> Subject: RE: [R] splitting very long character string
> 
> 
> 
> Hi Arne
> 
> If you are reading in from files and they are just one number per line
> it would be more efficient to use scan directly.  ?scan
> 
> For example:
> 
> > filen<-"C:/temp/tt.txt"
> > i<-scan(filen)
> Read 5 items
> > i
> [1]   12345  564376    5674 6356656    5666
> > 
> 
> 
>  
> 
> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> Arne.Muller at sanofi-aventis.com
> Sent: 01 November 2006 15:47
> To: r-help at stat.math.ethz.ch
> Subject: [R] splitting very long character string
> 
> 
> Hello,
> 
> I have a very long character string (>500k characters) that I need to split
> by '\n', resulting in an array of about 60k numbers. The help on strsplit
> says to use perl=TRUE to get better performance, but it still takes several
> minutes to split this string.
> 
> The massive string is the return value of a call to xmlElementsByTagName
> from the XML library and looks like this:
> 
> ....
> 12345
> 564376
> 5674
> 6356656
> 5666
> ....
> 
> I have to read about a hundred of these files and was wondering whether
> there's a more efficient way to turn this string into a numeric vector.
> Any ideas?
> 
> 	thanks a lot for your help
> 	and kind regards,
> 
> 	Arne


