[R] Reading XML files masquerading as XL files

Dennis Fisher fisher at plessthan.com
Wed Aug 10 16:26:35 CEST 2011


R version 2.13.1
OS X (or Windows)

Colleagues,

I received a number of files with a .xls extension.  These files open in Excel (XL) and, by all appearances, are Excel files.  However, their contents suggest that the files are actually XML:

> readLines(dir()[16])[1:10]
 [1] "<?xml version=\"1.0\"?>"                                                    
 [2] "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\""           
 [3] " xmlns:o=\"urn:schemas-microsoft-com:office:office\""                       
 [4] " xmlns:x=\"urn:schemas-microsoft-com:office:excel\""                        
 [5] " xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\""                 
 [6] " xmlns:html=\"http://www.w3.org/TR/REC-html40\">"                           
 [7] " <DocumentProperties xmlns=\"urn:schemas-microsoft-com:office:office\">"    
 [8] "  <Version>12.0</Version>"                                                  
 [9] " </DocumentProperties>"                                                     
[10] " <OfficeDocumentSettings xmlns=\"urn:schemas-microsoft-com:office:office\">"
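
A check along these lines might separate the disguised files from genuine binary ones.  This is only a sketch: looks_like_xml is a hypothetical helper name, and it assumes that each XML file begins with the exact bytes "<?xml" (no byte-order mark or leading whitespace), as the first line of the output above suggests.

```r
# Hypothetical helper: TRUE if the file starts with the bytes "<?xml".
# Assumption: no byte-order mark or whitespace precedes the declaration.
looks_like_xml <- function(path) {
  head_bytes <- readBin(path, what = "raw", n = 5)
  identical(head_bytes, charToRaw("<?xml"))
}

# Classify every .xls file in the working directory.
xls_files <- list.files(pattern = "\\.xls$", ignore.case = TRUE)
is_xml <- vapply(xls_files, looks_like_xml, logical(1))

# Named list of file names, grouped under "xml" and/or "binary".
split(xls_files, ifelse(is_xml, "xml", "binary"))
```

Reading raw bytes rather than text avoids encoding warnings when a file turns out to be a true binary workbook.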

I initially tried to read the files using read.xls (gdata), but that failed (not surprisingly).  I could open each file in Excel, "save as" CSV, then use read.csv.  However, there are many files, so I would love to have a solution that does not require this brute-force approach.

Are there any packages that would allow me to read these files without the additional steps?

Dennis


Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com


