[R] newbie: new_data_frame <- selected set of rows
Philipp Pagel
philipp.pagel.lists at t-online.de
Sat Dec 2 14:16:06 CET 2006
Hi!
> distances <- order(distancevector(scaled_DB, scaled_DB['query',],
> d="euclid"))
Just compute the distances here, WITHOUT ordering them. Then:
> 1) create a small top_five frame
top = scaled_DB[rank(distances)<=5, ]
rank() is better for this than order() in case there are ties.
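A small base-R illustration of why rank() fits here and order() (as used in the original code) does not. It assumes hypothetical toy data and computes Euclidean distances with sweep()/rowSums() instead of distancevector(), so it runs without extra packages. order() returns row numbers in sorted order rather than each row's position, so treating its result as a rank picks the wrong rows:

```r
# Hypothetical toy data standing in for scaled_DB: one row per entry
scaled_DB <- matrix(c(0, 5, 1, 7, 2, 6, 3, 4,   # column x
                      0, 0, 0, 0, 0, 0, 0, 0),  # column y
                    ncol = 2,
                    dimnames = list(c("query", paste0("r", 1:7)), c("x", "y")))

# Euclidean distance of every row to the query row
# (base-R stand-in for distancevector(..., d = "euclid"))
distances <- sqrt(rowSums(sweep(scaled_DB, 2, scaled_DB["query", ])^2))

# rank() gives each row's position in the sorted order, aligned with the rows
top <- scaled_DB[rank(distances) <= 5, ]
rownames(top)         # "query" "r2" "r4" "r6" "r7"

# order() gives row numbers in sorted order; using it as a rank is wrong
wrong <- scaled_DB[order(distances) <= 5, ]
rownames(wrong)       # "query" "r1" "r2" "r5" "r7"

# order() is fine when its result is used as a row index instead
top_sorted <- scaled_DB[order(distances)[1:5], ]
rownames(top_sorted)  # "query" "r2" "r4" "r6" "r7"
```

With ties, rank() uses ties.method = "average" by default, so equally distant rows are kept or dropped together and the result may contain more or fewer than five rows.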
> 2) after I got top_five I would like to get the index
> of my query entry, something along the lines of Python's
> top_five.index('query_string')
You mean by row name?
which(row.names(scaled_DB)=='query_string')
But why would you need the index? If you want the respective row, just
index by row name:
my_dataframe['query_string', ]
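A minimal sketch of both approaches on a hypothetical three-row data frame: which() returns every matching position, match() returns the first one (or NA if the name is absent, where Python's list.index() would raise an error), and indexing by name returns the row itself:

```r
# Hypothetical three-row data frame with named rows
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6),
                 row.names = c("a", "query_string", "c"))

which(row.names(df) == "query_string")  # 2: all matching positions
match("query_string", row.names(df))    # 2: first match, NA if absent
df["query_string", ]                    # the row itself, as a one-row data frame
```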
> 3) possibly combine values in distances with row names
> from my_dataframe:
> row_1 distance_from_query1
> row_2 distance_from_query2
The easiest way to store the distances along with the original names and
data would be to simply make distances a column in your data frame,
which is what I would have done to begin with. The entire procedure
would then look like this:
my_dataframe <- read.table( ... )
scaled_DB <- scale(my_dataframe, center=FALSE)   # scale() returns a matrix
result <- as.data.frame(scaled_DB)   # distances go here, not into scaled_DB,
                                     # so dist1 is not used when computing dist2
result$dist1 <- distancevector(scaled_DB, scaled_DB['query1',], ...)
result$dist2 <- distancevector(scaled_DB, scaled_DB['query2',], ...)
result$dist3 <- distancevector(scaled_DB, scaled_DB['query3',], ...)
...
top1 <- result[rank(result$dist1)<=5, ]
...
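A runnable base-R sketch of the whole procedure, assuming a small hypothetical data frame in place of the read.table() input and a hypothetical helper euclid_to() in place of distancevector(). It keeps the scaled values in a matrix (scale() returns one) and collects the distances in a separate data frame, so that dist1 does not enter the computation of dist2:

```r
# Hypothetical stand-in for my_dataframe (normally read with read.table)
my_dataframe <- data.frame(a = c(1, 2, 4, 8, 3, 5, 7),
                           b = c(2, 1, 3, 6, 2, 4, 5),
                           row.names = c("query1", "query2",
                                         "r1", "r2", "r3", "r4", "r5"))

# Scaled values stay in a matrix used only for distance computations
scaled <- scale(my_dataframe, center = FALSE)

# Hypothetical helper: Euclidean distance from every row of m to the row named q
# (a base-R stand-in for distancevector(..., d = "euclid"))
euclid_to <- function(m, q) sqrt(rowSums(sweep(m, 2, m[q, ])^2))

# Distances become columns of a separate data frame
result <- as.data.frame(scaled)
result$dist1 <- euclid_to(scaled, "query1")
result$dist2 <- euclid_to(scaled, "query2")

# Five rows closest to query1 (query1 itself included, at distance 0)
top1 <- result[rank(result$dist1) <= 5, ]

# Row names next to distances, the layout asked for in question 3
dist_table <- data.frame(distance = result$dist1, row.names = rownames(result))
```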
cu
Philipp
--
Dr. Philipp Pagel Tel. +49-8161-71 2131
Dept. of Genome Oriented Bioinformatics Fax. +49-8161-71 2186
Technical University of Munich
Science Center Weihenstephan
85350 Freising, Germany
and
Institute for Bioinformatics / MIPS Tel. +49-89-3187 3675
GSF - National Research Center Fax. +49-89-3187 3585
for Environment and Health
Ingolstädter Landstrasse 1
85764 Neuherberg, Germany
http://mips.gsf.de/staff/pagel