[R] iconv question: SQL Server 2005 to R

Milan Bouchet-Valat nalimilan at club.fr
Fri Oct 11 11:43:35 CEST 2013


On Thursday 10 October 2013 at 21:45 -0700, Ira Sharenow wrote:
> Thanks for the suggestion. From R version 3.0.2, I tried
>
> > testDF7 = iconv(x = test07, from = "UCS-2", to = "")
> > Encoding(testDF7)
> [1] "unknown"
>
> > testDF7[1:6]
> [1] NA NA NA NA NA NA
>
> So using "UCS-2" produced the same results as before.
>
> I do not think there are any NA values. I cleaned up the csv file from
> within Excel. Then read it into R
>
> > sum(is.na(workingDF))
> [1] 0
>
> Also the Excel COUNTBLANK function gave me zero.
In a previous message, Brian told you to use the 'fileEncoding' argument
to read.table(). Please do that.
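
For illustration, here is a minimal sketch of what that could look like.
It assumes the export is little-endian UTF-16/UCS-2 with a byte order
mark, which is what Windows usually calls "Unicode"; I have not seen the
actual file, so the encoding name may need adjusting. The temporary file
and its contents are made up so that the example is self-contained. The
point is that the conversion happens while the file is read, instead of
calling iconv() on a data frame that has already been misread.

```r
# Build a small stand-in for the SQL Server export (hypothetical data):
# UTF-16LE text preceded by the byte order mark FF FE.
tmp <- tempfile(fileext = ".csv")
txt <- "Date,StockID,Price\n2006-01-03,Stock1,2.53\n2006-01-03,Stock2,1.3275\n"
bytes <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xFF, 0xFE)), bytes), tmp)

# Declare the encoding at read time; R converts to the native encoding
# before parsing the fields.
testDF <- read.table(tmp, header = TRUE, sep = ",",
                     fileEncoding = "UTF-16LE")
str(testDF)
```

If "UTF-16LE" is not accepted on your system, "UCS-2LE" is the other
name worth trying; iconvlist() shows what your platform supports.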


Regards

> On 10/9/2013 11:33 PM, Prof Brian Ripley wrote:
> 
> > On 09/10/2013 10:37, Milan Bouchet-Valat wrote: 
> > > On Tuesday 08 October 2013 at 16:02 -0700, Ira Sharenow wrote:
> > > > A colleague is sending me quite a few files that have been saved
> > > > with MS SQL Server 2005. I am using R 2.15.1 on Windows 7.
> > > >
> > > > I am trying to read in the files using standard techniques.
> > > > Although the file has a csv extension when I go to Excel or
> > > > WordPad and do SAVE AS I see that it is Unicode Text. Notepad
> > > > indicates that the encoding is Unicode. Right now I have to do a
> > > > few things from within Excel (such as Text to Columns) and
> > > > eventually save as a true csv file before I can read it into R
> > > > and then use it.
> > > >
> > > > Is there an easy way to solve this from within R? I am also open
> > > > to easy SQL Server 2005 solutions.
> > > > 
> > > > I tried the following from within R. 
> > > > 
> > > > testDF = read.table("Info06.csv", header = TRUE, sep = ",") 
> > > > 
> > > > > testDF2 =  iconv(x = testDF, from = "Unicode", to = "") 
> > > > 
> > > > Error in iconv(x = testDF, from = "Unicode", to = "") : 
> > > > 
> > > > unsupported conversion from 'Unicode' to '' in codepage 1252 
> > > > 
> > > > # The next line did not produce an error message 
> > > > 
> > > > > testDF3 =  iconv(x = testDF, from = "UTF-8" , to = "") 
> > > > 
> > > > > testDF3[1:6,  1:3] 
> > > > 
> > > > Error in testDF3[1:6, 1:3] : incorrect number of dimensions 
> > > > 
> > > > # The next line did not produce an error message 
> > > > 
> > > > > testDF4 =  iconv(x = testDF, from = "macroman" , to = "") 
> > > > 
> > > > > testDF4[1:6,  1:3] 
> > > > 
> > > > Error in testDF4[1:6, 1:3] : incorrect number of dimensions 
> > > > 
> > > > >   Encoding(testDF3) 
> > > > 
> > > > [1] "unknown" 
> > > > 
> > > > >   Encoding(testDF4) 
> > > > 
> > > > [1] "unknown" 
> > > > 
> > > > This is the first few lines from WordPad 
> > > > 
> > > > Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
> > > >
> > > > 2006-01-03 00:00:00.000, at Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
> > > >
> > > > 2006-01-03 00:00:00.000, at Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929
> > > What's the actual problem? You did not state any. Do you get
> > > accented characters that are not printed correctly after importing
> > > the file? In the two lines above it does not look like there would
> > > be any non-ASCII characters in this file, so encoding would not
> > > matter.
> > 
> > It is most likely UCS-2.  That has embedded NULs, so the encoding
> > does matter.  All 8-bit encodings extend ASCII: others do not, in
> > general. 
> > 
> > 
>


