[R] newbie: new_data_frame <- selected set of rows

Philipp Pagel philipp.pagel.lists at t-online.de
Sat Dec 2 14:16:06 CET 2006


	Hi!

> distances <- order(distancevector(scaled_DB, scaled_DB['query',],
> d="euclid"))

Just compute the distances here, WITHOUT ordering them. Then:

> 1) create a small top_five frame

top = scaled_DB[rank(distances)<=5, ]

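rank() is better for this than order() when there are ties: tied rows share a rank instead of being split by their position in the data. Note that rank() averages ties by default, so two rows tied for 5th place both get rank 5.5 and are both dropped; use rank(distances, ties.method="min") if you want to keep every row tied at the cutoff.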
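A small illustration of how the two approaches behave when a tie falls right at the cutoff. The distances vector below is made up for the example (it is not from the original question):

```r
# Hypothetical distances; rows d and e are tied right at the cutoff
distances <- c(a = 0, b = 1.2, c = 0.5, d = 2.0, e = 2.0, f = 1.5)

# order(): always exactly five rows, but the d/e tie is broken by
# position in the vector, so d is kept and e is silently dropped
by_order <- names(distances)[order(distances)[1:5]]

# rank() with the default ties.method = "average": d and e both get
# rank 5.5, so neither passes the <= 5 test and only four rows remain
by_rank_avg <- names(distances)[rank(distances) <= 5]

# rank() with ties.method = "min": d and e both get rank 5, so both
# are kept and six rows remain
by_rank_min <- names(distances)[rank(distances, ties.method = "min") <= 5]

print(by_order)     # "a" "c" "b" "f" "d"
print(by_rank_avg)  # "a" "b" "c" "f"
print(by_rank_min)  # "a" "b" "c" "d" "e" "f"
```

Which behaviour you want depends on whether you need exactly five rows or all rows at least as close as the 5th one.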

> 2) after I got top_five I would like to get the index
> of my query entry, something along Python's
> top_five.index('query_string')

You mean by row name?

which(row.names(top)=='query_string')

But why would you need the index? If you just want the corresponding row,
index by row name:

my_dataframe['query_string', ]
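A self-contained sketch of both options, using a small made-up data frame named top (the column names and values are hypothetical). match() is included as another base-R way to get a position; unlike Python's list.index(), it returns NA instead of raising an error when the name is missing:

```r
# Hypothetical "top five" data frame whose row names identify entries
top <- data.frame(x = c(0.1, 0.4, 0.2), y = c(1, 3, 2),
                  row.names = c("query_string", "row_7", "row_2"))

# Position of the row named "query_string" (closest to Python's .index())
idx <- which(row.names(top) == "query_string")

# match() gives the position of the first match, or NA if absent
idx2 <- match("row_2", row.names(top))

# Usually the position isn't needed: index the row by its name directly
query_row <- top["query_string", ]

print(idx)        # 1
print(idx2)       # 3
print(query_row)
```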

> 3) possibly combine values in distances with row names
> from my_dataframe:
> row_1 distance_from_query1
> row_2 distance_from_query2

The easiest way to store the distances along with the original names and
data would be to simply make distances a column in your data frame,
which is what I would have done to begin with. The entire procedure
would then look like this:

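library(hopach)   # distancevector() comes from the hopach package
my_dataframe = read.table( ... )
scaled_DB <- scale(my_dataframe, center=FALSE)   # note: returns a matrix, not a data frame
# Add the distances to the original data frame rather than to scaled_DB,
# so later distance computations don't include earlier distance columns
my_dataframe$dist1 = distancevector(scaled_DB, scaled_DB['query1',], ...)
my_dataframe$dist2 = distancevector(scaled_DB, scaled_DB['query2',], ...)
my_dataframe$dist3 = distancevector(scaled_DB, scaled_DB['query3',], ...)
...
top1 = my_dataframe[rank(my_dataframe$dist1)<=5, ]
...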
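For trying the procedure without the original data or the hopach package, here is a base-R sketch. It assumes a toy data set in place of read.table( ... ), and euclid_to() is a hypothetical helper standing in for the distancevector(..., d="euclid") call; it computes plain Euclidean distance from every row to the query row:

```r
# Toy data standing in for read.table( ... ); row names identify entries
my_dataframe <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, 7),
  b = c(2, 1, 4, 3, 6, 5, 8),
  row.names = c("query1", paste0("row_", 1:6))
)

# scale() returns a numeric matrix (row names are kept)
scaled_DB <- scale(my_dataframe, center = FALSE)

# Hypothetical stand-in for distancevector(..., d = "euclid"):
# Euclidean distance from each row of X to the vector y
euclid_to <- function(X, y) sqrt(colSums((t(X) - y)^2))

# Store the distances as a column of the original data frame
my_dataframe$dist1 <- euclid_to(scaled_DB, scaled_DB["query1", ])

# Five nearest rows (the query itself is included, at distance 0)
top1 <- my_dataframe[rank(my_dataframe$dist1) <= 5, ]

# Row names paired with distances, nearest first
print(top1[order(top1$dist1), "dist1", drop = FALSE])
```

Because the distances live in a column, the row names and distances stay together through any subsetting or sorting.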

cu
	Philipp

-- 
Dr. Philipp Pagel                            Tel.  +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics      Fax.  +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany

 and

Institute for Bioinformatics / MIPS          Tel.  +49-89-3187 3675
GSF - National Research Center               Fax.  +49-89-3187 3585
      for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel
