[R] Writing Excel (.xls) files on non-Windows OSs using Perl
Marc Schwartz
marc_schwartz at comcast.net
Sun Jul 8 22:20:37 CEST 2007
Hi all,
There have been quite a few threads in the recent months pertaining to
the ability to directly write native Excel (.xls) files from R. For
example, exporting R matrices and/or data frames to an Excel file, with
perhaps the ability to create multiple tabs (worksheets) within a single
file, with one tab/sheet per R object.
There exists the xlsReadWrite package on CRAN by Hans-Peter Suter, which
is restricted to Windows, since it utilizes the non-FOSS MS Office API
to write the Excel formats.
I recently had the need, under Linux (FC6/F7) to create an Excel file
containing multiple worksheets, each worksheet containing an 'exported'
data frame from R. While one could export the data frames to delimited
files (ie. using write.table() ) and then open those files from Excel
(or OO.org's Calc), it was rather tedious to do so with a larger number
of R objects. Since I would now have the need to engage in this process
with some level of frequency, the preceding approach would not be time
efficient.
I thus embarked on a mini-project to create a Perl script utilizing
openly available functions from CPAN and then facilitate the calling of
the script directly from R.
I am posting the Perl code here for the benefit of others who may have
similar requirements. Please note that I am providing this 'as is' and
don't have any plans to substantively modify or enhance the code. It
does what I need it to do. Feel free to modify for other needs as may be
required.
The basic calling schema is:
WriteXLS.pl [--CSVpath] [--CSVfiles] ExcelFileName
Where:
CSVpath = Path to the csv files created in R, typically done
using write.table()
CSVfiles = globbed file name specification (ie. *.csv)
ExcelFileName = FULL name of Excel .xls file to create
When the Excel file is created, a new worksheet (tab) will be created
for each CSV file imported. The worksheet name will be the basename (no
path or extension) of the CSV file, up to the first 31 characters, which
is a limitation for Excel worksheet names.
Note of course that Excel has certain (version specific) limitations
with respect to file formats. I list the MS link below for Excel 2007.
Similar specs are available for earlier versions:
http://office.microsoft.com/en-us/excel/HP100738491033.aspx
Finally, note that I use 'Spreadsheet::WriteExcel::Big', as the regular
version of the Perl package has a constraint where the ENTIRE Excel file
cannot be larger than 7 Mb, which was a problem for my application.
Here is the Perl code:
#!/usr/bin/perl -w
use strict;
use Spreadsheet::WriteExcel::Big;
use Getopt::Long;
use File::Glob;
use File::Basename;
use Text::CSV_XS;
# Initialize and get command line arguments
my $CSVPath = '.';
my $CSVFiles = "*.csv";
GetOptions ('CSVpath=s' => \$CSVPath,
'CSVfiles=s' => \$CSVFiles);
my $ExcelFileName = $ARGV[0];
# Create Excel XLS File
print "Creating Excel File: $ExcelFileName\n\n";
my $XLSFile = Spreadsheet::WriteExcel::Big->new($ExcelFileName);
# Glob file path and names
my @FileNames = <$CSVPath/$CSVFiles>;
foreach my $FileName (@FileNames) {
print "Reading: $FileName\n";
# Open CSV File
my $csv = Text::CSV_XS->new();
open (CSVFILE, "$FileName") || die "ERROR: cannot open $FileName. $!\n";
# Create new sheet with filename prefix
# ($base, $dir, $ext) = fileparse ($FileName, '..*');
my $FName = (fileparse ($FileName, '\..*'))[0];
# Only take the first 31 chars, which is the
# limit for a worksheet name
my $SheetName = substr($FName, 0, 31);
print "Creating New WorkSheet: $SheetName\n\n";
my $WorkSheet = $XLSFile->add_worksheet($SheetName);
# Rows and columns are zero indexed
my $Row = 0;
# Write to Sheet
while (<CSVFILE>) {
if ($csv->parse($_)) {
my @Fields = $csv->fields();
my $Col = 0;
foreach my $Fld (@Fields) {
$WorkSheet->write($Row, $Col, $Fld);
$Col++;
}
$Row++;
}
}
close CSVFILE;
}
A 'typical' sequence for the use of the code from within R might be:
# Create a character vector of R objects to be exported
RObjects <- c(VectorOfRObjectNames, ...)
# Now loop through the vector, creating CSV files
# In this case, export to a 'CSVFILES' sub-directory
for (i in RObjects)
{
write.table(get(i), file = paste("CSVFILES/", i, ".csv", sep = ""),
sep = ",", quote = TRUE, na = "", row.names = FALSE)
}
# Now call the Perl script from within R, presuming
# that the script is in the current default directory
system("./WriteXLS.pl --CSVPath CSVFILES RExport.xls")
This process has worked for me, given the current functional
requirements for my project. I hope that this is of some help to others.
Regards,
Marc Schwartz
More information about the R-help
mailing list