[R] Sorting Text Frames
Uwe Ligges
ligges at statistik.uni-dortmund.de
Wed Sep 7 08:48:54 CEST 2005
Murray Jorgensen wrote:
> [Using 2.0.1 under Windows XP]
> There are a few pages on the internet that list equivalents of
> "thank you" in many languages. I downloaded one from a Google search
> and I thought that it would be interesting and a good R exercise to
> sort the file into the order of the expressions, rather than the languages.
>
> I tidied up the web page and got it into the format that it was nearly
> in: Language Name in columns 1-43, the expression in the remaining
> columns.
>
> Then I read it in:
>
> > thanks <- read.fwf("C:\\Files\\Reading\\thankyou.txt", c(43,37))
> > thanks[1:4,]
> V1 V2
> 1 Abenaki (Maine USA, Montreal Canada) Wliwni ni
> 2 Abenaki (Maine USA, Montreal Canada) Wliwni
> 3 Abenaki (Maine USA, Montreal Canada) Oliwni
> 4 Achí (Baja Verapaz Guatemala) Mantiox chawe
>
> > dim(thanks)
> [1] 1254 2
>
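read.fwf() cuts each line at fixed character positions, so the blanks that pad the language names out to 43 characters stay inside V1, and any trailing blanks stay inside V2. The sketch below strips that padding while reading. It assumes a recent R version and uses a small hypothetical file (`f`) in place of thankyou.txt. read.fwf() passes extra arguments on to read.table(), so `strip.white = TRUE` trims the padding and `as.is = TRUE` keeps both columns as character vectors rather than factors.

```r
# Hypothetical sample file standing in for C:\Files\Reading\thankyou.txt:
# language name padded to 12 characters, expression in the remaining columns
f <- tempfile(fileext = ".txt")
writeLines(c(sprintf("%-12s%s", "Abenaki", "Wliwni ni"),
             sprintf("%-12s%s", "Eyak", "Awa ahdah  ")), f)

# strip.white and as.is are passed through to read.table()
thanks <- read.fwf(f, widths = c(12, 20), strip.white = TRUE, as.is = TRUE)

# Both columns should now be character, without leading/trailing blanks
str(thanks)
```

With real data, you would use the original path and `c(43, 37)` as the widths.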
> Now I tried sorting the frame into the order of the second column:
>
> tord <- order(thanks$V2)
> sink("C:\\Files\\Reading\\thanks.txt")
> thanks[tord[1:74],]
> sink()
>
> This gives more or less the expected output, with the file thanks.txt beginning:
>
> V1
> V2
> 145 Cahuila (United States) '\301cha-ma
> 862 Paipai (Mexico, USA) 'Ara'ya:ikm
> 863 Paipai (Mexico, USA) 'Ara'yai:km
> 864 Paipai (Mexico, USA) 'Ara'ye:km
> 311 Eyak (Alaska) 'Awa'ahdah
>
> [you may get a bit of wrapping there!]
>
> However, I don't really want just 74 lines; I would like the whole file. But
> if I get rid of the [1:74] or replace 74 with any larger number, I get
> output like this, with no second column:
>
> V1
> 145 Cahuila (United States)
> 862 Paipai (Mexico, USA)
> 863 Paipai (Mexico, USA)
> 864 Paipai (Mexico, USA)
> 311 Eyak (Alaska)
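print() fits its output to getOption("width"). If one entry is unusually wide, print() may move a column into a separate block further down the output. You can get every row into the file without depending on that layout by writing the sorted rows directly with write.table(). The sketch below uses a small hypothetical stand-in for `thanks` and a temporary file (`out`, also hypothetical) in place of thanks.txt.

```r
# Hypothetical stand-in for the 1254-row `thanks` data frame
thanks <- data.frame(V1 = c("Abenaki", "Achi", "Eyak"),
                     V2 = c("Wliwni", "Mantiox chawe", "Awa ahdah"),
                     stringsAsFactors = FALSE)

# Sort the rows by the expression column, as in the original code
tord <- order(thanks$V2)

# Write all sorted rows, one per line with a tab between the columns;
# write.table() does not wrap columns to fit the console width
out <- tempfile(fileext = ".txt")   # hypothetical; stands in for thanks.txt
write.table(thanks[tord, ], file = out, sep = "\t",
            quote = FALSE, row.names = FALSE)

readLines(out)
```

Because each row becomes exactly one line of text, a long entry just makes that line longer. It does not push V2 into a later block.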
I guess there is just too much white space, or there are some special characters,
in your variables that cause problems when printing ...
Hence you have to "debug" your data yourself.
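Checks of that kind could be sketched as follows. The sketch assumes a recent R version (for enc2utf8()) and a small hypothetical data frame in place of `thanks`. For each row it reports the number of characters in the expression, flags expressions containing non-ASCII characters, and flags entries with leading or trailing white space, which it then trims. The names `check` and `has_pad` are illustrative only.

```r
# Hypothetical stand-in for `thanks`: one non-ASCII entry, one padded entry
thanks <- data.frame(V1 = c("Cahuila (United States)", "Eyak (Alaska)   "),
                     V2 = c("\u00c1cha-ma", "Awa ahdah"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where a string starts or ends with white space
has_pad <- function(x) grepl("^[[:space:]]|[[:space:]]$", x)

check <- data.frame(
  row       = seq_len(nrow(thanks)),
  # Number of characters in each expression
  width     = nchar(thanks$V2, type = "chars"),
  # TRUE where the expression cannot be converted to plain ASCII
  non_ascii = is.na(iconv(enc2utf8(thanks$V2), "UTF-8", "ASCII")),
  # TRUE where either column starts or ends with white space
  padded    = has_pad(thanks$V1) | has_pad(thanks$V2)
)

# Widest entries first, as a starting point for inspection
check[order(check$width, decreasing = TRUE), ]

# Trim leading and trailing white space in every column
thanks[] <- lapply(thanks, function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x))
```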
Uwe Ligges
> Does anyone know what is going on?
> Tusen tak ("a thousand thanks") in advance, in fact 1254 tak in advance!
>
> Murray Jorgensen
More information about the R-help mailing list