[R] Sorting Text Frames
Murray Jorgensen
maj at waikato.ac.nz
Wed Sep 7 05:45:55 CEST 2005
[Using R 2.0.1 under Windows XP]
There are a few pages on the internet that list equivalents of
"thank you" in many languages. I downloaded one from a Google search
and I thought that it would be interesting and a good R exercise to
sort the file into the order of the expressions, rather than the languages.
I tidied up the web page into the fixed-width format that it was already
nearly in: language name in columns 1-43, the expression in the remaining
columns.
Then I read it in:
> thanks <- read.fwf("C:\\Files\\Reading\\thankyou.txt", c(43,37))
> thanks[1:4,]
V1 V2
1 Abenaki (Maine USA, Montreal Canada) Wliwni ni
2 Abenaki (Maine USA, Montreal Canada) Wliwni
3 Abenaki (Maine USA, Montreal Canada) Oliwni
4 Achí (Baja Verapaz Guatemala) Mantiox chawe
> dim(thanks)
[1] 1254 2
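For anyone wanting to try the reading step without the original file, here is a minimal sketch. It assumes a small stand-in file built from three of the rows shown above; the object names (lang, expr, demo_file, demo) are made up for the illustration, and strip.white = TRUE is an optional extra that was not in the original call.

```r
# Three rows copied from the listing above, as a stand-in for thankyou.txt
lang <- c("Abenaki (Maine USA, Montreal Canada)",
          "Abenaki (Maine USA, Montreal Canada)",
          "Eyak (Alaska)")
expr <- c("Wliwni ni", "Oliwni", "Awa-ahdah")

# Pad the language name to 43 characters so the expression starts in column 44
demo_file <- tempfile()
writeLines(paste(format(lang, width = 43), expr, sep = ""), demo_file)

# Same widths as in the post; strip.white drops the padding after each name
demo <- read.fwf(demo_file, widths = c(43, 37), strip.white = TRUE)
dim(demo)
```

The widths vector only has to cover the longest line; shorter lines are read as far as they go.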
Now I tried sorting the frame into the order of the second column:
tord <- order(thanks$V2)
sink("C:\\Files\\Reading\\thanks.txt")
thanks[tord[1:74],]
sink()
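A possible alternative to sink() plus printing, sketched here on a toy data frame rather than the real file: order() gives the row permutation, and write.table() writes every row to a file without going through the console print method. The names toy, sorted and out are made up for the illustration, and this is only one way of getting the whole sorted frame onto disk.

```r
# Toy stand-in for the real data frame, using rows shown earlier
toy <- data.frame(
  V1 = c("Abenaki (Maine USA, Montreal Canada)",
         "Abenaki (Maine USA, Montreal Canada)",
         "Abenaki (Maine USA, Montreal Canada)"),
  V2 = c("Wliwni ni", "Wliwni", "Oliwni"),
  stringsAsFactors = FALSE
)

# order() returns the row indices that put V2 into sorted order
tord <- order(toy$V2)
sorted <- toy[tord, ]

# Write all rows, tab-separated, with no quoting and no row names
out <- tempfile()
write.table(sorted, file = out, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Because nothing is printed, the result does not depend on the console width or on how many rows are requested.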
This gives more or less the expected output, with the file thanks.txt beginning:
V1
V2
145 Cahuila (United States) '\301cha-ma
862 Paipai (Mexico, USA) 'Ara'ya:ikm
863 Paipai (Mexico, USA) 'Ara'yai:km
864 Paipai (Mexico, USA) 'Ara'ye:km
311 Eyak (Alaska) 'Awa'ahdah
[you may get a bit of wrapping there!]
However, I don't really want just 74 lines; I would like the whole file. But
if I get rid of the [1:74], or replace 74 with any larger number, I get
output like this, with no second column:
V1
145 Cahuila (United States)
862 Paipai (Mexico, USA)
863 Paipai (Mexico, USA)
864 Paipai (Mexico, USA)
311 Eyak (Alaska)
Does anyone know what is going on?
Tusen tak in advance, in fact 1254 tak in advance!
Murray Jorgensen
--
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz Fax 7 838 4155
Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 1395 862