[R] bug & paste (continues...)
Ott Toomet
siim at obs.ee
Tue Oct 9 18:04:32 CEST 2001
Hello again,
There are some new facts:
1) If you save the image and restart R, paste behaves normally:
> paste( ce0, m, sep="", collapse="")
[1] "1985<1>9<2>2<3>2<4>1<5>A<6>1<7><8>NA<9>5<0>1999END"
> paste( ce0a, m, sep="", collapse="")
[1] "1985<1>9<2>2<3>2<4>1<5>A<6>1<7><8>NA<9>5<0>1999END"
But when I make a new subset (a bit shorter this time, and not text):
> e2000 <- read.dta( "/home/siim/tyy/andmebaasid/etu0012.dta")
> e0 <- e2000[1,231:239]
> e0
  c01b c02 c03 cx c05a01 c05ak01 c05b01 c05bk01 c0601
1    9   2   2  1      A       1             NA     5
(c05b01 should be a string variable, an empty string in this case.
c05bk01 should be numerical, NA when empty.)
The problem arises again (m is correspondingly shorter now):
> paste( e0, m, sep="", collapse="")
[1] "9<1>2<2>2<3>1<4>A1<6>NA<8>5END"
> m
[1] "<1>" "<2>" "<3>" "<4>" "<5>" "<6>" "<7>" "<8>" "END"
paste without collapse gives a slightly different picture, but that is
not correct either:
> paste( e0, m, sep="")
[1] "9<1>" "2<2>" "2<3>" "1<4>" "A" "1<6>" "" "NA<8>" "5END"
nchar() does not show any hidden characters:
> nchar( e0)
[1] 1 1 1 1 1 1 0 2 1
So it seems that R somehow remembers that e0 was taken from the big
data frame, but I do not know how that is possible. This "memory" is
carried over by assignment:
> e1 <- e0
> paste( e1, m, sep="")
[1] "9<1>" "2<2>" "2<3>" "1<4>" "A" "1<6>" "" "NA<8>" "5END"
but it vanishes when you save and load the data:
> save( e0, file="jama.rd")
> load( "jama.rd")
> paste( e0, m, sep="")
[1] "9<1>" "2<2>" "2<3>" "1<4>" "A<5>" "1<6>" "<7>" "NA<8>" "5END"
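If the save/load round trip is what clears the problem (as it does in the session above), it can be wrapped in a small function for use in a script. This is only a minimal sketch: the name roundtrip() is hypothetical, not part of R, and it simply does the same save() and load() steps using a temporary file instead of "jama.rd":

```r
# Hypothetical helper, not part of R: re-creates an object by writing it
# to a temporary file with save() and reading it back with load().
# This is the same round trip that made paste() behave in the session above.
roundtrip <- function(x) {
  f <- tempfile(fileext = ".rda")
  on.exit(unlink(f))          # remove the temporary file afterwards
  save(x, file = f)           # stored in the file under the name "x"
  env <- new.env()
  load(f, envir = env)        # restores "x" into a fresh environment
  get("x", envir = env)
}

# Usage, with e0 and m as in the session above:
# e0 <- roundtrip(e0)
# paste(e0, m, sep = "")
```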
If you have any more ideas...
Best wishes,
Ott
P.S. I do not know if this is related to the previous problem, but
when I remove the database:
> rm(e2000)
and then look at the memory usage:
> gc()
         used (Mb) gc trigger (Mb)
Ncells 220608  5.9     741108 19.8
Vcells  88222  0.7   11163343 85.2
It shows memory usage of less than 10 MB. However, the operating
system shows that R is still using more than 80 MB.
On Tue, 9 Oct 2001, Ott Toomet wrote:
> Hi,
>
> dput( ce0) gives a correct answer:
> > dput( ce0)
> c("1985", "9", "2", "2", "1", "A", "1", "", "NA", "5", "1999" )
>
> So does a plain print( ce0):
> > print( ce0)
> [1] "1985" "9" "2" "2" "1" "A" "1" "" "NA" "5"
> [11] "1999"
>
> However, if I make a new similar vector ce0a:
> > ce0a <- c( 1985,9,2,2,1,"A",1,"",NA,5,1999)
>
> Then the paste works correctly:
> > paste( ce0a, m, sep="", collapse="")
> [1] "1985<1>9<2>2<3>2<4>1<5>A<6>1<7><8>NA<9>5<0>1999END"
>
> I had m as
> > m
> [1] "<1>" "<2>" "<3>" "<4>" "<5>" "<6>" "<7>" "<8>" "<9>" "<0>" "END"
>
> So I have two apparently similar vectors which behave differently with
> paste:
> > paste( ce0a, m, sep="", collapse="")
> [1] "1985<1>9<2>2<3>2<4>1<5>A<6>1<7><8>NA<9>5<0>1999END"
> > paste( ce0, m, sep="", collapse="")
> [1] "1985<1>9<2>2<3>2<4>1<5>A1<7>NA<9>5<0>1999END"
> > ce0a
> [1] "1985" "9" "2" "2" "1" "A" "1" "" "NA" "5"
> [11] "1999"
> > ce0
> [1] "1985" "9" "2" "2" "1" "A" "1" "" "NA" "5"
> [11] "1999"
>
> I suspect there may be some hidden attributes somewhere in ce0 which I have
> not noticed (there do not seem to be any factors). The problem seems to arise
> with the non-numerical columns (ce0 is just part of one row of the big
> data frame). Is it possible to find out what is going on, and possibly change
> it? At least attributes() shows nothing:
> > attributes(ce0)
> NULL
> > attributes(ce0a)
> NULL
>
> The actual problem is that I cannot convert a Stata 7 dataset to ASCII. R
> seems to be the only program here which is able to open it, but I still have
> problems with saving it.
>
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._