[R] Clustering problem
Abhishek Pratap
abhishek.vit at gmail.com
Mon Mar 21 18:48:11 CET 2011
Hi Guys
I want to apply a clustering algorithm to my dataset in order to find
regions of points (X, Y) that have similar values of percent_GC and
mean_phred_quality. Details below.
I have sampled 1% of the points from my main data set of 85 million
points. The result is still somewhat large, about 800K points, and
looks like the following.
      X   Y percent_GC mean_phred_quality
1  4286 930       0.50               0.13
2  4825 947       0.50              20.33
3  8207 932       0.32              26.50
4  8451 940       0.48              24.81
5  9331 931       0.38              16.93
6 11501 949       0.49              31.28
What I want to do is find local regions in which there are associations
between these 4 values, i.e. regions where the position (X, Y) is closely
related to percent_GC and mean_phred_quality.
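To make the goal concrete, here is a minimal sketch of one possible starting point, not a settled method: k-means on all four columns after scaling. The data frame pts is simulated stand-in data with the same column names as the sample above, and the choice of k = 5 is an arbitrary assumption.

```r
# Simulated stand-in for the sampled data (hypothetical values);
# column names follow the sample shown above.
set.seed(1)
n <- 1000
pts <- data.frame(
  X = runif(n, 0, 20000),
  Y = runif(n, 0, 2000),
  percent_GC = runif(n, 0.2, 0.7),
  mean_phred_quality = runif(n, 0, 40)
)

# Scale the columns so that no single variable dominates the distances.
scaled <- scale(pts[, c("X", "Y", "percent_GC", "mean_phred_quality")])

# k = 5 is an arbitrary choice for illustration only.
km <- kmeans(scaled, centers = 5, nstart = 10)

# Per-cluster means in the original units.
cluster_means <- aggregate(pts, by = list(cluster = km$cluster), FUN = mean)
print(cluster_means)
```

Because X and Y enter the distance together with the two measured values, clusters found this way tend to be spatially compact as well as similar in percent_GC and mean_phred_quality. Whether that weighting is appropriate is an open question.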
PS: I did calculate the overall Pearson correlation coefficient between
percent_GC and mean_phred_quality, and it is not statistically
significant, which got me interested in finding local regions where
it may be.
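For illustration, the overall test and a simple local version of it could be sketched as follows. This assumes simulated stand-in data in pts and a regular 10 x 5 grid of tiles over (X, Y). The grid size is an arbitrary assumption, and scanning many tiles will produce some large correlations by chance alone.

```r
# Simulated stand-in for the sampled data (hypothetical values).
set.seed(1)
n <- 5000
pts <- data.frame(
  X = runif(n, 0, 20000),
  Y = runif(n, 0, 2000),
  percent_GC = runif(n, 0.2, 0.7),
  mean_phred_quality = runif(n, 0, 40)
)

# Overall Pearson correlation with its p-value.
overall <- cor.test(pts$percent_GC, pts$mean_phred_quality,
                    method = "pearson")
print(overall$estimate)
print(overall$p.value)

# Assign each point to a tile of a regular grid over (X, Y).
tile <- interaction(cut(pts$X, breaks = 10),
                    cut(pts$Y, breaks = 5),
                    drop = TRUE)

# Pearson correlation within each tile; tiles with fewer than
# 3 points are skipped.
local_r <- sapply(split(pts, tile), function(d) {
  if (nrow(d) < 3) return(NA_real_)
  cor(d$percent_GC, d$mean_phred_quality)
})

# Tiles with the strongest local correlation, by absolute value.
print(head(sort(abs(local_r), decreasing = TRUE)))
```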
I would really appreciate your help as I am still a rookie in applying
clustering algorithms.
Thanks!
-Abhi
More information about the R-help mailing list