[R] Treatment for Unequal Column Lengths?

Greg Snow Greg.Snow at intermountainmail.org
Thu Jan 11 19:50:27 CET 2007


One of the ways that R (and S-plus) differs from most other stats
packages (all that I can think of) is that it forces you to think about
your data up front.  This is a good thing.  It sounds like you really
have multiple datasets in one file.  It is best to read them into R as
separate datasets rather than trying to force them into one dataset
(as other packages do).  If you need to keep the multiple datasets
grouped together, you can combine them in a list.
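
For instance, here is a minimal sketch of keeping two datasets of
different lengths together in a list.  The object names and values are
made up for illustration; they are not from the original file.

```r
# Hypothetical datasets of different lengths (names and values are made up).
prices <- data.frame(price = c(10.5, 11.2, 9.8, 10.1))
events <- data.frame(date = as.Date(c("2006-01-03", "2006-02-14")))

# A list holds them together without padding the shorter one with NAs.
datasets <- list(prices = prices, events = events)

# Each component keeps its own number of rows.
sapply(datasets, nrow)

# Work on one piece at a time; no na.omit is needed.
mean(datasets$prices$price)
```

Each analysis then pulls out just the component it needs, e.g.
datasets$events$date for the dates.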

To read one dataset out of the file you can use read.table (or read.csv)
with the colClasses argument (set the unwanted columns to "NULL") and
the nrows argument to grab just the columns and rows that belong to a
given dataset.  I would recommend writing a script to read each of the
separate pieces.
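
A rough sketch of such a script follows.  It assumes a hypothetical
file with a header row and two numeric columns, "x" with 5 values and
"y" with only 3; the file, column names and row counts are invented
here, so adjust them to the real layout.

```r
# Hypothetical example file: column "x" has 5 values, column "y" only 3,
# so the export pads "y" with empty cells.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("x,y",
             "1,10",
             "2,20",
             "3,30",
             "4,",
             "5,"), csv_file)

# Dataset 1: keep column 1, drop column 2 ("NULL" skips it), read 5 rows.
d1 <- read.csv(csv_file, colClasses = c("numeric", "NULL"), nrows = 5)

# Dataset 2: drop column 1, keep column 2, read only the first 3 data rows,
# so the padded empty cells are never read.
d2 <- read.csv(csv_file, colClasses = c("NULL", "numeric"), nrows = 3)

nrow(d1)
nrow(d2)
```

The row counts have to be known (or worked out) for each piece, which
is part of thinking about the data up front.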

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gerald Gamoric
> Sent: Thursday, January 11, 2007 9:52 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Treatment for Unequal Column Lengths?
> 
> Fellow R Users:
> 
> I have a .csv dataset that I have brought into R via 
> read.table (and also via read.csv). The dataset has columns 
> that are not equal in length.
> Essentially, this data file has vectors/columns on which I 
> plan to use different analyses, hence they are unequal in 
> length. Also, the columns are either numeric or calendar 
> dates. Is there a way to prevent R from appending "NA"s to 
> the numeric columns that are not the longest? Is there a way 
> to prevent R from appending blank cells to the columns of 
> dates in the dataset that are not the longest? In other 
> words, I'd like to have R maintain each column's length.
> 
> I am aware that I can use "na.omit" before calling each 
> numeric column in my analysis in order to work with the 
> subset of that column that does not contain the "NA" values. 
> However, the na.omit command does not work when R appends 
> blank cells to my date column lengths. Is there something 
> analogous to "na.omit" that I might be able to use when I am 
> working with a column of dates to ignore the blank cells?
> 
> Further, I am curious as to whether there is an option that 
> one might use when the dataset is read in to R in order to 
> keep all the column lengths as they are. Any ideas/hints 
> would be very much appreciated.
> 
> platform  i386-pc-mingw32
> arch      i386
> os        mingw32
> system    i386, mingw32
> status
> major     2
> minor     3.1
> year      2006
> month     06
> day       01
> svn rev   38247
> language  R
> 
> Thank you,
> 
> Dave H


