[R] RE : Create sequence for dataset

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Nov 22 02:38:54 CET 2004


ssim at lic.co.nz (Stella) asked
	I want to create a sequence of numbers for the multiple records of
	individual animal in my dataset. The SAS code below will do the trick, but
	I want to learn to do it in R. Can anyone help ?
	
	data ht&ssn;
	set ht&ssn;
	by anml_key;
	if first.anml_key then do;
	seq_ht_rslt=0;
	end;
	seq_ht_rslt+1;
	
Someone was saying how readable SAS data steps were.
I must say that as someone who has written code in more than 160
programming languages I find this _completely_ unreadable.
(Is the initial value for seq_ht_rslt 0 or 1?)
So I'm going to have to guess what was intended.

Suppose you have a data.frame ht_ssn and want to add a sequence number
column for it.  That's easy:

	ht_ssn$seqno <- seq_len(nrow(ht_ssn))

Now suppose that there is an ht_ssn$anml_key column which says which
individual animal each row corresponds to, and many rows may correspond
to the same animal.

	data_sequence_number <- function (data, column = "anml_key") {
	    # Extract the key column.
	    # If it is not already a factor, make it one.
	    # From this factor, extract the level numbers.
	    as.numeric(as.factor(data[[column]]))
	}

	ht_ssn$seq_ht_rslt <- data_sequence_number(ht_ssn)
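
To make the result concrete, here is a small sketch run on made-up data. Only the column name anml_key comes from the question; the key values and the height column are invented for illustration.

```r
# Hypothetical toy data: five records for two animals.
ht_ssn <- data.frame(anml_key = c("b7", "b7", "a3", "b7", "a3"),
                     height   = c(101, 104, 98, 107, 99))

data_sequence_number <- function (data, column = "anml_key") {
    # Factor level numbers: one number per distinct key.
    as.numeric(as.factor(data[[column]]))
}

ht_ssn$seq_ht_rslt <- data_sequence_number(ht_ssn)
print(ht_ssn$seq_ht_rslt)   # 2 2 1 2 1
```

Every row for the same animal gets the same number, so this numbers the animals rather than the records within an animal.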

Probably I have completely misunderstood the question.
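
In case the SAS step was instead meant as a counter that restarts for each animal (set to 0 at the first record of an anml_key, then incremented on every record), a minimal sketch of that reading in R could use ave(). This is an assumption about the intent, not something the question confirms, and the toy data are invented.

```r
# Hypothetical toy data; only the column name anml_key comes from the question.
ht_ssn <- data.frame(anml_key = c("a3", "a3", "a3", "b7", "b7"))

# ave() splits its first argument by the key, applies FUN to each piece,
# and returns the results in the original row order.
ht_ssn$seq_ht_rslt <- ave(seq_along(ht_ssn$anml_key), ht_ssn$anml_key,
                          FUN = seq_along)
print(ht_ssn$seq_ht_rslt)   # 1 2 3 1 2
```

ave() does not need the rows to be sorted by key. The rest of this message is about data_sequence_number as defined earlier, not about this sketch.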

One thing which will be different is the actual numeric values.
If I've understood the SAS version, it will assign numbers to keys
in the order in which the keys are encountered, while the R code
above will assign numbers to keys in increasing order of key.  So
if the input contains just "Sammy" then "Jumbo" the SAS version
might assign numbers 1, 2 while the R version would assign 2, 1.

If this really matters, use this as the function body instead
(factor() accepts a levels argument; as.factor() does not):
	x <- data[[column]]
	as.numeric(factor(x, levels = unique(x)))
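
The effect of the two orderings can be seen on the keys from the example above. This is only a sketch: the helper name seq_by_first_seen is made up for illustration, and it calls factor() with explicit levels, since as.factor() has no levels argument.

```r
# Hypothetical helper: number keys in order of first appearance.
seq_by_first_seen <- function (x) {
    as.numeric(factor(x, levels = unique(x)))
}

x <- c("Sammy", "Jumbo", "Sammy")

sorted_order <- as.numeric(as.factor(x))   # levels sorted: Jumbo, Sammy
seen_order   <- seq_by_first_seen(x)       # levels as encountered: Sammy, Jumbo

print(sorted_order)   # 2 1 2
print(seen_order)     # 1 2 1
```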



