[R] Reading Chinese Language (GB2312) Input
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sat Oct 27 09:12:34 CEST 2012
On 26/10/2012 18:25, jgreenb1 wrote:
> I am trying to read a csv file with Chinese language text in it. The file
> should look like this:
>
> userid,jobid,Title,companyid,industryids1
> 82497,1160,互联网产品经理,12
> 96429,658,企划经理(商业公司),24
> 14471,95,产品运营经理,25,6
> 14471,1708,产品营销高级经理,727,2
> 14471,1558,产品总监,611,4
> 14471,1777,产品总监,743,1
> 14471,1697,产品经理,725,234
> 14471,1716,度假产品总监 ,730,234
> 14471,1717,产品经理,730,5
> but when I read the data in using read.csv() it looks like this in the R
> console:
How exactly? Did you use the fileEncoding or encoding argument (see the
help page)?
>
> userid jobid Title companyid industryids1
> 1 82497 1160 »¥ÁªÍø²úÆ·¾Àí 12 NA
> 2 96429 658 Æó»®¾Àí£¨ÉÌÒµ¹«Ë¾£© 24 NA
> 3 14471 95 ²úÆ·ÔËÓª¾Àí 25 6
> 4 14471 1708 ²úÆ·ÓªÏú¸ß¼¶¾Àí 727 2
> 5 14471 1558 ²úÆ·×ܼà 611 4
> 6 14471 1777 ²úÆ·×ܼà 743 1
> 7 14471 1697 ²úÆ·¾Àí 725 234
> 8 14471 1716 ¶È¼Ù²úÆ·×ܼà 730 234
> 9 14471 1717 ²úÆ·¾Àí 730 5
> How can I read this in properly?
By using the fileEncoding and encoding arguments of read.csv().
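As a minimal sketch of that advice (not taken from the original post): the
snippet below writes a two-line GB2312 file and reads it back. The sample
row, the temporary file and the variable names are all made up for
illustration. fileEncoding tells R which encoding the file is in, so the
text is re-encoded on input; encoding, by contrast, only marks strings as
Latin-1 or UTF-8 and does not re-encode anything. The read.csv() call
assumes a session locale that can represent Chinese text; the iconv() route
at the end does not depend on the locale.

```r
# Hypothetical two-line sample modelled on the file in the question. The title
# is written with \u escapes (U+4EA7 U+54C1 U+603B U+76D1) to keep this ASCII.
sample_utf8 <- c("userid,jobid,Title,companyid,industryids1",
                 "14471,1558,\u4ea7\u54c1\u603b\u76d1,611,4")

# Write the sample to a temporary file as GB2312 bytes.
path <- tempfile(fileext = ".csv")
gb <- iconv(sample_utf8, from = "UTF-8", to = "GB2312", toRaw = TRUE)
con <- file(path, open = "wb")
for (bytes in gb) writeBin(c(bytes, charToRaw("\n")), con)
close(con)

# fileEncoding names the encoding of the file; R re-encodes the text to the
# session's native encoding while reading, so this assumes a locale that can
# represent Chinese (for example a UTF-8 or a Chinese locale).
jobs <- read.csv(path, fileEncoding = "GB2312", stringsAsFactors = FALSE)

# Locale-independent alternative: read the bytes unchanged, then convert the
# strings to UTF-8 explicitly with iconv().
raw_lines <- readLines(path)
utf8_lines <- iconv(raw_lines, from = "GB2312", to = "UTF-8")
```

In a locale that cannot represent the characters (such as the
English_United States.1252 one shown in the session info), the first route
is likely to give warnings about invalid input, which is why the second
route is included.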
>
> Session info:
>
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
However, you will most likely not be able to display it in that locale
unless you select non-default fonts: see the rw-FAQ.
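A rough way to see what the session is working with (a sketch only; the
variable names and the sample string are illustrative, not from the original
post) is to inspect the character-type locale and how a Chinese string is
stored and printed:

```r
# Report the character-type locale and what R knows about it.
ctype <- Sys.getlocale("LC_CTYPE")
info <- l10n_info()  # list with MBCS, UTF-8 and Latin-1 flags
cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 locale:", info[["UTF-8"]], "\n")

# A string built from \u escapes is stored as UTF-8 inside R.
# Whether it prints as Chinese characters depends on the locale and the font.
title <- "\u4ea7\u54c1\u7ecf\u7406"
print(Encoding(title))
print(nchar(title, type = "chars"))
print(title)
```

If the last line prints escapes or garbled characters instead of Chinese
text, the data may still be stored correctly; only the display is affected.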
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> loaded via a namespace (and not attached):
> [1] tools_2.14.1
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595