[R] newbie: new_data_frame <- selected set of rows
Darek Kedra
darked90 at yahoo.com
Thu Nov 30 23:23:38 CET 2006
Hello,
this is probably trivial but I failed to find this
particular snippet of code.
What I have:
my_dataframe (contains, say, 40k rows and 4 columns)
distances (vector with Euclidean distances between a
query vector and each of the rows of my_dataframe)
What I do:
After scaling my_dataframe I calculate the distances,
order them, then extract the top five hits:
my_dataframe <- read.table("myDB.csv", header=FALSE, dec=".", sep=";", row.names=1)
# reads the whole file
scaled_DB <- scale(my_dataframe, center=FALSE)
# scales the values
require(hopach)
# loads the required R package
distances <- order(distancevector(scaled_DB, scaled_DB['query',], d="euclid"))
# calculates distances and returns the row indices ordered from lowest distance
for(i in distances[1:5]) print(my_dataframe[i,])
# prints top five hits just for debugging
What I want to do:
1) create a small top_five frame
sadly this does not work:
for(i in distances[1:5]) top_five[i,] <- my_dataframe[i,]
2) after I get top_five I would like to get the index
of my query entry, something along the lines of Python's
top_five.index('query_string')
3) possibly combine values in distances with row names
from my_dataframe:
row_1 distance_from_query1
row_2 distance_from_query2
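For illustration, the three goals above might be sketched as follows. This is only a sketch under stated assumptions: a small made-up data frame stands in for myDB.csv, the Euclidean distances are computed in base R rather than with hopach's distancevector so that the example runs on its own, and the names dist_values, ord, query_pos and ranked are hypothetical. The key idea is that a data frame can be indexed with a whole vector of row positions at once, so no loop is needed.

```r
# Toy stand-in for my_dataframe (assumption: the real one comes from myDB.csv).
my_dataframe <- data.frame(
  V2 = c(1, 2, 9, 1.1, 5, 3, 8),
  V3 = c(1, 2, 9, 0.9, 5, 3, 8),
  row.names = c("query", "a", "b", "c", "d", "e", "f")
)
scaled_DB <- scale(my_dataframe, center = FALSE)

# Euclidean distance from the query row to every row, in base R
# (assumption: used here instead of hopach::distancevector).
q <- scaled_DB["query", ]
dist_values <- sqrt(rowSums(sweep(scaled_DB, 2, q)^2))

# order() returns row positions sorted by distance, not the distances themselves.
ord <- order(dist_values)

# 1) Index with the whole vector of positions instead of looping.
top_five <- my_dataframe[ord[1:5], ]

# 2) Position of the query row within top_five (NA if it is not there).
query_pos <- match("query", rownames(top_five))

# 3) Row names paired with their distances, nearest first.
ranked <- data.frame(distance = dist_values[ord],
                     row.names = rownames(my_dataframe)[ord])

print(top_five)
print(query_pos)
print(head(ranked, 5))
```

Keeping the raw distances (dist_values) separate from their ordering (ord) is what makes step 3 possible; if only the result of order() is stored, the distance values themselves are lost.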
Thank you very much for your help
Darek Kedra