[R] How to read this data correctly

Marc Schwartz marc_schwartz at me.com
Fri Jan 24 22:47:30 CET 2014


Hi,

I don't know that it is a problem in R reading the file per se. It is more of an issue, as far as I can see, that read.xls() is not written to deal with some aspects of cell formatting of certain types. In this case, the cell is formatted using a financial format with Japanese Yen. I did not take the time to look through the Perl script included.

The intermediate CSV file that is created by the Perl script that opens and reads the Excel file contains:

"-0.419547704894512"
"-[$¥-411]0.42"

I captured this while running read.xls() under debug(), since the CSV is created as a temp file that is deleted upon function exit. It would seem that the financial cell data is not simply read as a numeric value.

The CSV file will then be directly converted to a data frame in R as is using read.csv() to result in:

> read.xls("Book1.xlsx", 1, header = FALSE)
                  V1
1 -0.419547704894512
2      -[$¥-411]0.42



You may need to use alternative Excel file importing functions, such as XLConnect or similar, that provide more robust functionality. Of course, R itself does not have financial data types, thus there may yet need to be some form of post import data clean up, even with the other options depending upon how they function.

Regards,

Marc Schwartz



On Jan 24, 2014, at 2:49 PM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:

> Hi Rui,
> 
> Thanks for your reply.
> 
> However why you said, 'shouldn't read properly in R'?
> 
> Basically I was looking for some way so that I would get -0.419547704894512
> value in R against cell F4 & F7. Because F7 is linked with F4.
> 
> Ofcourse I can open Excel file then format that cell accordingly. However I
> am looking for some way in R so to avoid any manual process.
> 
> Thanks and regards,
> 
> 
> On Sat, Jan 25, 2014 at 1:21 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> 
>> Hello,
>> 
>> Cell F7 has a formula, =F4, and when I open the file in excel, I get
>> -¥0.42, which shouldn't read properly in R.
>> 
>> The problem seems to be in the file, not in read.xls.
>> 
>> Hope this helps,
>> 
>> Rui Barradas
>> 
>> Em 24-01-2014 19:22, Christofer Bogaso escreveu:
>> 
>>> Hi again,
>>> 
>>> I need to read below xlsx file correctly (available here:
>>> http://snk.to/f-ch3exae5), and used following code (say, file is saved in
>>> F: drive)
>>> 
>>> 
>>> library(gdata)
>>>> read.xls("f:/Book1.xlsx", 1, header = F)
>>>> 
>>>                   V1
>>> 1 -0.419547704894512
>>> 2     -[$¥-411]0.42
>>> 
>>> 
>>> 
>>> However please notice that, in my original excel file the cells F4 and F7
>>> have essentially the same values. Therefore I should get
>>> -0.419547704894512, for either cases above.
>>> 
>>> Any idea on how to achieve that, without opening the xlsx file manually
>>> and
>>> then formatting the cell before reading it in R?
>>> 
>>> Thanks for your help




More information about the R-help mailing list