[R] strange strsplit gsub problem - is this a bug or a string length limitation?
Marc Schwartz
marc_schwartz at me.com
Fri Jul 10 14:58:28 CEST 2009
On Jul 10, 2009, at 7:18 AM, tradenet wrote:
>
> I was working with the rmetrics portfolioBacktesting function and dug into
> the code to try to find why my formula with 113 items, i.e. A1 thru A113,
> was being truncated and I only get 85 items, not 113.
>
> Is it due to a string length limitation in R or is it a bug in the strsplit
> or gsub functions, or in my string?
>
> I'd very much appreciate any suggestions
>
>
> ============Input script:
>
> backtestFormula<-
> SPX~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15+A16+A17+A18+A19+A20+A21+A22+A23+A24+A25+A26+A27+A28+A29+A30+A31+A32+A33+A34+A35+A36+A37+A38+A39+A40+A41+A42+A43+A44+A45+A46+A47+A48+A49+A50+A51+A52+A53+A54+A55+A56+A57+A58+A59+A60+A61+A62+A63+A64+A65+A66+A67+A68+A69+A70+A71+A72+A73+A74+A75+A76+A77+A78+A79+A80+A81+A82+A83+A84+A85+A86+A87+A88+A89+A90+A91+A92+A93+A94+A95+A96+A97+A98+A99+A100+A101+A102+A103+A104+A105+A106+A107+A108+A109+A110+A111+A112+A113
> benchmarkName = as.character(backtestFormula)[2]
> print(as.character(backtestFormula)[3])
> print(benchmarkName)
> assetsNames <- strsplit(gsub(" ", "",
> as.character(backtestFormula)[3]),
> "\\+")[[1]]
> nAssets = length(assetsNames)
> print(nAssets)
> list(assetsNames)
>
> ===============output:
>
>
>> backtestFormula<-
>> SPX~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15+A16+A17+A18+A19+A20+A21+A22+A23+A24+A25+A26+A27+A28+A29+A30+A31+A32+A33+A34+A35+A36+A37+A38+A39+A40+A41+A42+A43+A44+A45+A46+A47+A48+A49+A50+A51+A52+A53+A54+A55+A56+A57+A58+A59+A60+A61+A62+A63+A64+A65+A66+A67+A68+A69+A70+A71+A72+A73+A74+A75+A76+A77+A78+A79+A80+A81+A82+A83+A84+A85+A86+A87+A88+A89+A90+A91+A92+A93+A94+A95+A96+A97+A98+A99+A100+A101+A102+A103+A104+A105+A106+A107+A108+A109+A110+A111+A112+A113
>
>> benchmarkName = as.character(backtestFormula)[2]
>
>> print(benchmarkName)
> [1] "SPX"
>
>> print(as.character(backtestFormula)[3])
> [1] "A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A21 + A22 + A23 + A24 + A25 + A26 + A27 + A28 + A29 + A30 + A31 + A32 + A33 + A34 + A35 + A36 + A37 + A38 + A39 + A40 + A41 + A42 + A43 + A44 + A45 + A46 + A47 + A48 + A49 + A50 + A51 + A52 + A53 + A54 + A55 + A56 + A57 + A58 + A59 + A60 + A61 + A62 + A63 + A64 + A65 + A66 + A67 + A68 + A69 + A70 + A71 + A72 + A73 + A74 + A75 + A76 + A77 + A78 + A79 + A80 + A81 + A82 + A83 + A84 + A85 + "
>
>> assetsNames <- strsplit(gsub(" ", "", as.character(backtestFormula)[3]),
>> "\\+")[[1]]
>
>> print(nAssets)
> [1] 85
>
>> nAssets = length(assetsNames)
>
>> print(nAssets)
> [1] 85
>
>> list(assetsNames)
> [[1]]
> [1] "A1" "A2" "A3" "A4" "A5" "A6" "A7" "A8" "A9" "A10" "A11" "A12" "A13" "A14" "A15" "A16" "A17" "A18" "A19" "A20" "A21" "A22" "A23" "A24" "A25" "A26" "A27" "A28" "A29" "A30" "A31" "A32" "A33"
> [34] "A34" "A35" "A36" "A37" "A38" "A39" "A40" "A41" "A42" "A43" "A44" "A45" "A46" "A47" "A48" "A49" "A50" "A51" "A52" "A53" "A54" "A55" "A56" "A57" "A58" "A59" "A60" "A61" "A62" "A63" "A64" "A65" "A66"
> [67] "A67" "A68" "A69" "A70" "A71" "A72" "A73" "A74" "A75" "A76" "A77" "A78" "A79" "A80" "A81" "A82" "A83" "A84" "A85"

You appear to be bumping up against the 500-character length limit of
as.character() when used with R language objects.

Review the Note in ?as.character:

  "as.character truncates components of language objects to 500
  characters (was about 70 before 1.3.1)."

It is not a string length limitation or a bug in strsplit():

> paste("A", 1:113, sep = "", collapse = " + ")
[1] "A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 +
A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A21 + A22 + A23 + A24
+ A25 + A26 + A27 + A28 + A29 + A30 + A31 + A32 + A33 + A34 + A35 +
A36 + A37 + A38 + A39 + A40 + A41 + A42 + A43 + A44 + A45 + A46 + A47
+ A48 + A49 + A50 + A51 + A52 + A53 + A54 + A55 + A56 + A57 + A58 +
A59 + A60 + A61 + A62 + A63 + A64 + A65 + A66 + A67 + A68 + A69 + A70
+ A71 + A72 + A73 + A74 + A75 + A76 + A77 + A78 + A79 + A80 + A81 +
A82 + A83 + A84 + A85 + A86 + A87 + A88 + A89 + A90 + A91 + A92 + A93
+ A94 + A95 + A96 + A97 + A98 + A99 + A100 + A101 + A102 + A103 + A104
+ A105 + A106 + A107 + A108 + A109 + A110 + A111 + A112 + A113"
> nchar(paste("A", 1:113, sep = "", collapse = " + "))
[1] 680
> strsplit(paste("A", 1:113, sep = "", collapse = " + "), " \\+ ")[[1]]
[1] "A1" "A2" "A3" "A4" "A5" "A6" "A7" "A8" "A9"
[10] "A10" "A11" "A12" "A13" "A14" "A15" "A16" "A17" "A18"
[19] "A19" "A20" "A21" "A22" "A23" "A24" "A25" "A26" "A27"
[28] "A28" "A29" "A30" "A31" "A32" "A33" "A34" "A35" "A36"
[37] "A37" "A38" "A39" "A40" "A41" "A42" "A43" "A44" "A45"
[46] "A46" "A47" "A48" "A49" "A50" "A51" "A52" "A53" "A54"
[55] "A55" "A56" "A57" "A58" "A59" "A60" "A61" "A62" "A63"
[64] "A64" "A65" "A66" "A67" "A68" "A69" "A70" "A71" "A72"
[73] "A73" "A74" "A75" "A76" "A77" "A78" "A79" "A80" "A81"
[82] "A82" "A83" "A84" "A85" "A86" "A87" "A88" "A89" "A90"
[91] "A91" "A92" "A93" "A94" "A95" "A96" "A97" "A98" "A99"
[100] "A100" "A101" "A102" "A103" "A104" "A105" "A106" "A107" "A108"
[109] "A109" "A110" "A111" "A112" "A113"
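
One possible way around the limit, offered as a sketch and not as what portfolioBacktesting itself does: skip as.character() on the formula and either read the names from the parsed formula with all.vars(), or keep the gsub()/strsplit() logic but feed it the output of deparse(), which returns the expression in several pieces that can be pasted back together. The formula is rebuilt with paste() here only to keep the example self-contained; the variable names vars, rhsText and assetsNames2 are illustrative.

```r
# Rebuild the 113-term formula from the original post (built with paste()
# here only so the example is self-contained).
rhs <- paste("A", 1:113, sep = "", collapse = " + ")
backtestFormula <- as.formula(paste("SPX ~", rhs))

# Variant 1: all.vars() reads the names from the parsed formula, so no
# deparsed string is involved.
vars <- all.vars(backtestFormula)
benchmarkName <- vars[1]      # left-hand side
assetsNames <- vars[-1]       # right-hand side terms
nAssets <- length(assetsNames)

# Variant 2: keep the gsub()/strsplit() logic, but deparse the right-hand
# side and glue the pieces back together before splitting.
rhsText <- paste(deparse(backtestFormula[[3]]), collapse = "")
assetsNames2 <- strsplit(gsub("[[:space:]]", "", rhsText), "\\+")[[1]]

print(nAssets)
print(identical(assetsNames, assetsNames2))
```

Note that all.vars() returns each name only once and assumes the terms are plain variable names; if the formula contains function calls or repeated terms, the deparse() variant stays closer to the original code.
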
HTH,
Marc Schwartz