[R] Changing the label name in the plot
Subhamitra Patra
subhamitra.patra at gmail.com
Wed Jun 12 10:01:59 CEST 2019
Dear Sir,
Thank you very much. The code you suggested worked like magic and gave
exactly the result I wanted. I shall always be grateful to you for this help.
Last time, too, your help and encouragement solved my problem. I really
appreciate your kindness, patience and willingness to help R beginners.
Some crowding of the points will remain because the data points are very
close together, but the labels can be read clearly when the plot is zoomed in.
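One way to keep the labels legible when zooming is to draw the plot on a vector device such as `pdf()`, whose output can be magnified without losing sharpness. This is a minimal sketch, assuming a hypothetical output file `clusters.pdf` in R's temporary directory and using only the first five DM values from this thread; the page size, label size and `pos = 3` placement (label above each point) are illustrative choices, not the method used in the thread.

```r
# Sketch: write the clustered plot to a large vector PDF that stays sharp when zoomed.
# Data: the first five DM values from this thread; "clusters.pdf" is a hypothetical file name.
dm <- c(JP = 2.071, CH = 2.0548, AT = 2.0544, CL = 2.047, ES = 2.033)
set.seed(42)  # kmeans() uses random starting centres; fix the seed for reproducibility
k <- kmeans(dm, centers = 2, nstart = 25)
outfile <- file.path(tempdir(), "clusters.pdf")
pdf(outfile, width = 14, height = 7)  # large page (inches); vector output
plot(dm, col = k$cluster, pch = 19, xlab = "Index", ylab = "DMs")
# pos = 3 places each label just above its point
text(seq_along(dm), dm, labels = names(dm), pos = 3, cex = 0.6, col = k$cluster)
invisible(dev.off())  # close the PDF device so the file is written
```

A larger `width`/`height` spreads the points further apart on the page, which reduces overlap before any zooming is needed.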
Thank you very much, Sir.
On Wed, Jun 12, 2019 at 12:24 PM Jim Lemon <drjimlemon using gmail.com> wrote:
> Hi Subhamitra,
> I think I understand better now. I have made up the second set of
> top-level domains, as I have no idea what they might be. I hope that this
> is an improvement.
>
> spdf<-read.table(text="States1 DMs States2 EMs
> JP 2.071 YA 2.038
> CH 2.0548 YB 2.017
> AT 2.0544 YC 2.007
> CL 2.047 YD 1.963
> ES 2.033 YE 1.947
> PT 2.0327 YF 1.942
> PL 2.0321 YG 1.932
> FR 2.031 YH 1.924
> SE 2.0293 YI 1.913
> DE 2.0291 YJ 1.906
> DK 2.027 YK 1.892
> UK 2.022 YL 1.877
> TW 1.9934 YM 1.869
> NL 1.993 YN 1.849
> HK 1.989 YO 1.848
> LU 1.988 YP 1.836
> CA 1.987 YQ 1.835
> NZ 1.9849 YR 1.819
> US 1.9842 YS 1.798
> AU 1.981 YT 1.771
> MY 1.978 YU 1.762
> HU 1.968 YV 1.717
> LT 1.96 YW 1.707
> SG 1.958 YX 1.688
> FI 1.955 YZ 1.683
> CR 1.953 ZA 1.671
> BY 1.952 ZB 1.664
> IL 1.95 ZC 1.646
> EE 1.948 ZD 1.633
> NO 1.945 ZE 1.624
> IE 1.937 ZF 1.621
> SI 1.913 ZG 1.584
> LV 1.901 ZH 1.487
> SK 1.871 ZI 1.482
> BH 1.801 ZJ 1.23
> SK 1.761 ZK 1.129
> AE 1.751 ZL 1.168
> IS 1.699 ZM 0.941
> BM 1.687 ZN 0.591
> KW 1.668 ZO 0.387
> CY 1.633 ZP 0.16
> AP 1.56 ZQ 0.0002",
> header = TRUE,stringsAsFactors=FALSE)
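> # kmeans() is in the base stats package; library(cluster) is not needed
> set.seed(1)  # fix kmeans()'s random starts so the clusters are reproducible
> k1<-kmeans(spdf[,2],centers=2,nstart=25)  # cluster the DM values
> k2<-kmeans(spdf[,4],centers=2,nstart=25)  # cluster the EM values separately
> idx<-seq_len(nrow(spdf))
> plot(spdf[,2],col=k1$cluster,pch=19,ylim=c(0,2.1))
> points(spdf[,4],col=k2$cluster+2,pch=19)  # colours 3 and 4 for the EM clusters
> # each point gets its own state name, nudged right/up (DM) or left/down (EM)
> text(idx+0.5,spdf[,2]+0.05,spdf[,1],col=k1$cluster)
> text(idx-0.5,spdf[,4]-0.05,spdf[,3],col=k2$cluster+2)
> legend(10,1,c("cluster 1 DM","cluster 2 DM","cluster 1 EM","cluster 2 EM"),
> col=1:4,pch=19)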
>
> The crowding of points and labels remains.
>
> Jim
>
> On Wed, Jun 12, 2019 at 2:37 PM Subhamitra Patra <subhamitra.patra using gmail.com> wrote:
>
>> Hello Sir,
>>
>> Thank you very much for your help for which I shall be always grateful to
>> you.
>>
>> Concerning your question:
>> "1) Are the states in column 3 the same as those in column 1? As
>> you initially named the data frame "ts", perhaps the values in columns 2
>> and 4 are taken at different times. If not, perhaps they are measured in
>> another set of countries, as yet unknown. Perhaps "DMs" and "EMs" are codes
>> that will resolve this."
>> The states in column 3 are not the same as those in column 1. The values
>> in columns 2 and 4 were measured for different sets of countries, so each
>> point needs its own label; a single label cannot serve for both columns 2
>> and 4 (the data columns). Column 2 contains 45 values measured for 45
>> countries (state names in column 1), whereas column 4 contains 42 values
>> measured for a different set of 42 countries (state names in column 3).
>> I tried clustering columns 2 and 4 separately together with their name
>> columns, but this failed because k-means clustering works only on numeric
>> data and cannot use non-numeric columns such as the state names. I
>> therefore clustered both data columns together; after removing NAs from
>> the data table, each column has 42 data points. As a result, the clustered
>> plot shows observation numbers rather than state names. I am stuck on how
>> to give the data points in columns 2 and 4 their own labels (from columns
>> 1 and 3, respectively).
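Because `kmeans()` returns its cluster assignments (the `cluster` component) in the same order as the rows it was given, a numeric column can be clustered on its own and the state names re-attached afterwards. This is a minimal sketch, assuming the column names `States1` and `DMs` from the thread's data frame and using only a handful of its rows; the same steps would be repeated for `States2` and `EMs`. Dropping NAs from one column at a time keeps each name on the same row as its value, so the two sets of countries need not be cut to the same length.

```r
# Sketch: cluster one numeric column, then re-attach the state names.
# A few rows from this thread; column names States1/DMs as in the thread's data frame.
dm <- data.frame(States1 = c("JP", "CH", "AT", "SI", "LV", "SK", "CY", "AP"),
                 DMs = c(2.071, 2.0548, 2.0544, 1.913, 1.901, 1.871, 1.633, 1.56),
                 stringsAsFactors = FALSE)
# kmeans() cannot handle NAs, so drop them from this column only,
# keeping each name on the same row as its value
dm <- dm[!is.na(dm$DMs), ]
set.seed(1)  # reproducible random starts
k1 <- kmeans(dm$DMs, centers = 2, nstart = 25)  # cluster the numeric values only
# k1$cluster has one entry per row of dm, in the same order
dm$cluster <- k1$cluster
# state names grouped by cluster
print(split(dm$States1, dm$cluster))
```

The names can then be passed as the `labels` argument of `text()` next to the clustered points, as in the thread's later code.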
>>
>> I hope this answers your question.
>>
>> Thank you.
>>
>>
>>
>>
>> On Wed, Jun 12, 2019 at 6:04 AM Jim Lemon <drjimlemon using gmail.com> wrote:
>>
>>> Hi Subhamitra,
>>> It is time to admit that I had the wrong idea about what you wanted to
>>> do, due to the combination of trying to solve two problems at once
>>> while I was very tired. I appreciate your patience.
>>>
>>> From your last email, you have a data frame with four columns. The
>>> first and third are cryptic names for political states and the second
>>> and fourth are values that I assume are measured in those states.
>>> 1) Are the states in column 3 the same as those in column 1? As you
>>> initially named the data frame "ts", perhaps the values in columns 2
>>> and 4 are taken at different times. If not, perhaps they are measured
>>> in another set of countries, as yet unknown. Perhaps "DMs" and "EMs"
>>> are codes that will resolve this.
>>>
>>> I assumed that "DMs" and "EMs" should be used as the X and Y values on
>>> a scatterplot, as your initial example seemed to indicate. If so, they
>>> are different values for the same country and only require one label
>>> for each point. Proceeding from this, you can do something like this:
>>>
>>> spdf<-read.table(text="State DMs EMs
>>> JP 2.071 2.038
>>> CH 2.0548 2.017
>>> AT 2.0544 2.007
>>> CL 2.047 1.963
>>> ES 2.033 1.947
>>> PT 2.0327 1.942
>>> PL 2.0321 1.932
>>> FR 2.031 1.924
>>> SE 2.0293 1.913
>>> DE 2.0291 1.906
>>> DK 2.027 1.892
>>> UK 2.022 1.877
>>> TW 1.9934 1.869
>>> NL 1.993 1.849
>>> HK 1.989 1.848
>>> LU 1.988 1.836
>>> CA 1.987 1.835
>>> NZ 1.9849 1.819
>>> US 1.9842 1.798
>>> AU 1.981 1.771
>>> MY 1.978 1.762
>>> HU 1.968 1.717
>>> LT 1.96 1.707
>>> SG 1.958 1.688
>>> FI 1.955 1.683
>>> CR 1.953 1.671
>>> BY 1.952 1.664
>>> IL 1.95 1.646
>>> EE 1.948 1.633
>>> NO 1.945 1.624
>>> IE 1.937 1.621
>>> SI 1.913 1.584
>>> LV 1.901 1.487
>>> SK 1.871 1.482
>>> BH 1.801 1.23
>>> SK 1.761 1.129
>>> AE 1.751 1.168
>>> IS 1.699 0.941
>>> BM 1.687 0.591
>>> KW 1.668 0.387
>>> CY 1.633 0.16
>>> AP 1.56 0.0002",
>>> header = TRUE,stringsAsFactors=FALSE)
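>>> # kmeans() is in the base stats package; library(cluster) is not needed
>>> set.seed(1)  # fix kmeans()'s random starts so the clusters are reproducible
>>> k2 <- kmeans(spdf[,c(2,3)], centers = 2, nstart = 25)
>>> plot(spdf[,c(2,3)],col=k2$cluster,pch=19,xlim=c(1.55,2.1))
>>> # exactly one alternating label offset per row
>>> xoff<-rep_len(c(0.02,-0.02),nrow(spdf))
>>> yoff<-rep_len(c(-0.05,0.05),nrow(spdf))
>>> text(spdf[,2]+xoff,spdf[,3]+yoff,spdf[,1],col=k2$cluster)
>>> # a short line joins each point to its label
>>> segments(spdf[,2],spdf[,3],spdf[,2]+xoff,spdf[,3]+yoff,col=k2$cluster)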
>>>
>>> I took the liberty of replacing your abbreviations with internet top-level
>>> domains. As I hope you can see, you have a problem with crowded points
>>> and labels, even with the trick of spreading the labels out.
>>> You could modify the X and Y offsets by hand and get a much more
>>> readable plot.
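Adjusting the offsets by hand is easier if each label has its own named offset that can be changed individually. This is a minimal sketch, assuming a few rows of the State/DMs/EMs data from this thread; the offset sizes, axis limits and the particular nudges applied to CH and AT are arbitrary illustrations, not values from the thread.

```r
# Sketch: per-label offsets that can be edited by hand.
# A few rows of the State/DMs/EMs data from this thread.
spdf <- data.frame(State = c("JP", "CH", "AT", "CL", "ES"),
                   DMs = c(2.071, 2.0548, 2.0544, 2.047, 2.033),
                   EMs = c(2.038, 2.017, 2.007, 1.963, 1.947),
                   stringsAsFactors = FALSE)
# Start with alternating offsets, exactly one per row
xoff <- rep_len(c(0.002, -0.002), nrow(spdf))
yoff <- rep_len(c(-0.005, 0.005), nrow(spdf))
names(xoff) <- spdf$State
names(yoff) <- spdf$State
# Then nudge individual labels by name after looking at the plot
# (these particular values are arbitrary examples)
yoff["CH"] <- 0.01
xoff["AT"] <- 0.004
plot(spdf$DMs, spdf$EMs, pch = 19, xlim = c(2.02, 2.08), ylim = c(1.93, 2.06))
text(spdf$DMs + xoff, spdf$EMs + yoff, spdf$State)
segments(spdf$DMs, spdf$EMs, spdf$DMs + xoff, spdf$EMs + yoff)
```

Naming the offsets by state means a label can be moved without counting rows, and redrawing the plot after each change shows whether the overlap is resolved.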
>>>
>>> If this is not what you want, a bit more explanation of what you do
>>> want may get you there.
>>>
>>> Jim
>>>
>>
>>
>> --
>> *Best Regards,*
>> *Subhamitra Patra*
>> *PhD Research Scholar*
>> *Department of Humanities and Social Sciences*
>> *Indian Institute of Technology, Kharagpur*
>> *INDIA*
>>
>
More information about the R-help mailing list