[R] Changing the label name in the plot

Jim Lemon drjimlemon at gmail.com
Wed Jun 12 10:40:35 CEST 2019


It is good to see someone learning about how the code works, beyond just
getting the answer.

1) The header argument means, "is the first line of the file a header?". In
this case it should be TRUE. The stringsAsFactors argument means "should
character strings be converted to factors?" (a factor stores each distinct
string as a level with an underlying integer code). Unless you really want
factors, it is better not to convert.
2.1) pch is the Point CHaracter to use in plotting. Number 19 is a filled
circle.
2.2) Yes, I set the value to the range of both sets of values. You could
also do it like this - ylim=range(spdf[,c(2,4)])
2.3) The values of "cluster" in a kmeans object are sequential integers
from 1 to the number of clusters. That means that both k1 and k2 will have
cluster values of 1 and 2. By adding 2 to the cluster values in k2, they
become 3 and 4. Specifying colors by integers uses the values as indices
into the current color palette. The default palette is:
 1 = black
 2 = red
 3 = green
 4 = blue
and so on. If I had not added 2, both k1 and k2 would have the same colors.
3.1) You could have different numbers of points in the two cluster
analyses. Then you would have to add xlim=c(1,45) to the plot command so
that there was room for the second set of observations.
3.2) These values offset the text from the points so that the points
reflect the real values of the data, while the labels are positioned
adjacent to the points.
4.1) 10 and 1 are the X and Y positions of the upper left corner of the
legend in user units.
4.2) The legend explains the relationship of cluster memberships to colors.
4.3) pch=19 makes the legend show the same symbols as the points in the plot.
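
The points above can be tried out on a small scale. This is only a minimal sketch: the six-row data frame and the names toy, yl, col1, col2 and n are made up for illustration and are not part of the code in this thread. It also places the legend with the "bottomleft" keyword rather than with user coordinates, because the toy data has a much shorter X range.

```r
# Made-up six-row data set with the same layout as spdf (illustrative only)
toy <- read.table(text = "States1 DMs States2 EMs
AA 2.07 YA 2.04
AB 2.05 YB 2.01
AC 2.03 YC 1.95
AD 1.70 YD 0.94
AE 1.66 YE 0.39
AF 1.56 YF 0.16",
  header = TRUE, stringsAsFactors = FALSE)

# 1) The name columns stay character vectors; with stringsAsFactors = TRUE
#    they would be factors. (From R 4.0.0 on, FALSE is the default.)
is.character(toy$States1)

# 2.2) y limits computed from both value columns instead of typed by hand
yl <- range(toy[, c(2, 4)])

# 2.3) cluster ids are 1 and 2 in both fits; adding 2 to the second set
#      selects palette entries 3 and 4 instead of 1 and 2
k1 <- kmeans(toy[, 2], centers = 2, nstart = 25)
k2 <- kmeans(toy[, 4], centers = 2, nstart = 25)
col1 <- k1$cluster
col2 <- k2$cluster + 2

# 3.1) use the number of rows rather than a hard-coded 1:42
n <- nrow(toy)

plot(toy[, 2], col = col1, pch = 19, ylim = yl, xlim = c(0.5, n + 0.5))
points(toy[, 4], col = col2, pch = 19)

# 3.2) small offsets keep the labels beside the points, not on top of them
text(seq_len(n) + 0.3, toy[, 2] + 0.05, toy[, 1], col = col1)
text(seq_len(n) - 0.3, toy[, 4] - 0.05, toy[, 3], col = col2)

# 4) legend placed by keyword; colors 1:4 match col1 and col2 above
legend("bottomleft",
  c("cluster 1 DM", "cluster 2 DM", "cluster 1 EM", "cluster 2 EM"),
  col = 1:4, pch = 19)
```

If the two value columns had different lengths, each would need its own index (for example seq_along() on each column) and an xlim wide enough for the longer one.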

Jim

On Wed, Jun 12, 2019 at 6:08 PM Subhamitra Patra <subhamitra.patra using gmail.com>
wrote:

> Sir, I have one more request. As a beginner, I would like to learn the
> logic behind the code you suggested. Kindly help me with that as well.
>
> 1. "header = TRUE,stringsAsFactors=FALSE)": what is the logic behind
> setting header to TRUE and stringsAsFactors to FALSE?
>
> 2. "plot(spdf[,2],col=k1$cluster,pch=19,ylim=c(0,2.1))
>     points(spdf[,4],col=k2$cluster+2,pch=19)":
>    2.1) What does pch indicate? Why did you set pch to 19 in both calls?
>    2.2) In ylim=c(0,2.1), you set the ylim values on the basis of the
>    limits of the data, right?
>    2.3) In the 2nd line, "col=k2$cluster+2", what is the logic behind
>    adding 2? I think it indicates the adding of the 2nd figure to the
>    same plot, right?
>
> 3. "text(1:42+0.5,spdf[,2]+0.05,spdf[,1],col=k1$cluster)
>     text(1:42-0.5,spdf[,4]-0.05,spdf[,3],col=k2$cluster+2)":
>    3.1) 1:42 means data points 1 to 42, right? So I think there is no
>    need to remove the extra 3 data points in column 2 (i.e. the DMs
>    column), because k2 is calculated separately. So I can use 1:45
>    for column 2, right?
>    3.2) What is the logic behind adding and subtracting 0.5 and 0.05
>    in the 1st and 2nd lines respectively?
>
> 4. "legend(10,1,c("cluster 1 DM","cluster 2 DM","cluster 1 EM","cluster 2 EM"),
>     col=1:4,pch=19)":
>    4.1) legend(10, 1) sets the distance of the values along the X-axis,
>    right?
>    4.2) The entire legend function is for the setting of the overall
>    plot, right?
>    4.3) What does pch=19 indicate in the legend function?
>
> Like before, kindly help me to learn this coding, Sir.
>
> Thank you very much for your kind and encouraging support to
> R beginners.
>
>
>
> On Wed, Jun 12, 2019 at 1:31 PM Subhamitra Patra <
> subhamitra.patra using gmail.com> wrote:
>
>> Dear Sir,
>>
>> Thank you very much. The code you suggested worked like magic and gave
>> exactly the result I wanted. For this help, I shall always be grateful
>> to you. Last time too, your intelligent help and encouraging support
>> solved my problem. I really appreciate your kindness, patience and
>> helping hand to R beginners.
>>
>> About the crowding of the points, some of it will remain because of the
>> small distance between data points, but the points corresponding to the
>> labels can be seen clearly when zoomed in.
>>
>> Thank you very much Sir.
>>
>>
>>
>>
>>
>>
>> On Wed, Jun 12, 2019 at 12:24 PM Jim Lemon <drjimlemon using gmail.com> wrote:
>>
>>> Hi Subhamitra,
>>> I may have a better understanding. I have made up the second set of top
>>> level domains as I have no idea what they may be. I hope that this is an
>>> improvement.
>>>
>>> spdf<-read.table(text="States1 DMs States2 EMs
>>> JP 2.071 YA 2.038
>>> CH 2.0548 YB 2.017
>>> AT 2.0544 YC 2.007
>>> CL 2.047 YD 1.963
>>> ES 2.033 YE 1.947
>>> PT 2.0327 YF 1.942
>>> PL 2.0321 YG 1.932
>>> FR 2.031 YH 1.924
>>> SE 2.0293 YI 1.913
>>> DE 2.0291 YJ 1.906
>>> DK 2.027 YK 1.892
>>> UK 2.022 YL 1.877
>>> TW 1.9934 YM 1.869
>>> NL 1.993 YN 1.849
>>> HK 1.989 YO 1.848
>>> LU 1.988 YP 1.836
>>> CA 1.987 YQ 1.835
>>> NZ 1.9849 YR 1.819
>>> US 1.9842 YS 1.798
>>> AU 1.981 YT 1.771
>>> MY 1.978 YU 1.762
>>> HU 1.968 YV 1.717
>>> LT 1.96 YW 1.707
>>> SG 1.958 YX 1.688
>>> FI 1.955 YZ 1.683
>>> CR 1.953 ZA 1.671
>>> BY 1.952 ZB 1.664
>>> IL 1.95 ZC 1.646
>>> EE 1.948 ZD 1.633
>>> NO 1.945 ZE 1.624
>>> IE 1.937 ZF 1.621
>>> SI 1.913 ZG 1.584
>>> LV 1.901 ZH 1.487
>>> SK 1.871 ZI 1.482
>>> BH 1.801 ZJ 1.23
>>> SK 1.761 ZK 1.129
>>> AE 1.751 ZL 1.168
>>> IS 1.699 ZM 0.941
>>> BM 1.687 ZN 0.591
>>> KW 1.668 ZO 0.387
>>> CY 1.633 ZP 0.16
>>> AP 1.56 ZQ 0.0002",
>>> header = TRUE,stringsAsFactors=FALSE)
>>> library(cluster) # not needed here: kmeans() is in the stats package
>>> k1<-kmeans(spdf[,2],centers=2,nstart=25)
>>> k2<-kmeans(spdf[,4],centers=2,nstart=25)
>>> plot(spdf[,2],col=k1$cluster,pch=19,ylim=c(0,2.1))
>>> points(spdf[,4],col=k2$cluster+2,pch=19)
>>> text(1:42+0.5,spdf[,2]+0.05,spdf[,1],col=k1$cluster)
>>> text(1:42-0.5,spdf[,4]-0.05,spdf[,3],col=k2$cluster+2)
>>> legend(10,1,c("cluster 1 DM","cluster 2 DM","cluster 1 EM","cluster 2 EM"),
>>>  col=1:4,pch=19)
>>>
>>> The crowding of points and labels remains.
>>>
>>> Jim
>>>
>>> On Wed, Jun 12, 2019 at 2:37 PM Subhamitra Patra <
>>> subhamitra.patra using gmail.com> wrote:
>>>
>>>> Hello Sir,
>>>>
>>>> Thank you very much for your help, for which I shall always be grateful
>>>> to you.
>>>>
>>>> Concerning your question,
>>>> "1) Are the states in column 3 the same as those in column 1? As
>>>> you initially named the data frame "ts", perhaps the values in columns 2
>>>> and 4 are taken at different times. If not, perhaps they are measured in
>>>> another set of countries, as yet unknown. Perhaps "DMs" and "EMs" are codes
>>>> that will resolve this.",
>>>> I would like to answer that the states in column 3 are not the same as
>>>> those in column 1. The data points in columns 2 and 4 are values
>>>> measured for different sets of countries, so the two data columns do
>>>> not share one label per point. In particular, column 2 consists of 45
>>>> data points measured for 45 countries (state names in column 1),
>>>> whereas column 4 contains 42 data points measured for another set of 42
>>>> countries (state names in column 3). I tried clustering columns 2 and 4
>>>> separately along with their respective name columns, but was unable to,
>>>> because k-means clusters only numeric data and does not consider
>>>> non-numeric columns (i.e. state names). Thus I considered both data
>>>> columns simultaneously, and after removing NAs from the data table both
>>>> columns consist of 42 data points. Hence the observation numbers,
>>>> rather than the state names, appear in the clustered plot. I am stuck
>>>> on setting different labels (from columns 1 and 3) for the data points
>>>> of columns 2 and 4.
>>>>
>>>> Hope I successfully answered your question.
>>>>
>>>> Thank you.
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jun 12, 2019 at 6:04 AM Jim Lemon <drjimlemon using gmail.com> wrote:
>>>>
>>>>> Hi Subhamitra,
>>>>> It is time to admit that I had the wrong idea about what you wanted to
>>>>> do, due to the combination of trying to solve two problems at once
>>>>> while I was very tired. I appreciate your patience.
>>>>>
>>>>> From your last email, you have a data frame with four columns. The
>>>>> first and third are cryptic names for political states and the second
>>>>> and fourth are values that I assume are measured in those states.
>>>>> 1) Are the states in column 3 the same as those in column 1? As you
>>>>> initially named the data frame "ts", perhaps the values in columns 2
>>>>> and 4 are taken at different times. If not, perhaps they are measured
>>>>> in another set of countries, as yet unknown. Perhaps "DMs" and "EMs"
>>>>> are codes that will resolve this.
>>>>>
>>>>> I assumed that "DMs" and "EMs" should be used as the X and Y values on
>>>>> a scatterplot, as your initial example seemed to indicate. If so, they
>>>>> are different values for the same country and only require one label
>>>>> for each point. Proceeding from this, you can do something like this:
>>>>>
>>>>> spdf<-read.table(text="State DMs EMs
>>>>> JP 2.071 2.038
>>>>> CH 2.0548 2.017
>>>>> AT 2.0544 2.007
>>>>> CL 2.047 1.963
>>>>> ES 2.033 1.947
>>>>> PT 2.0327 1.942
>>>>> PL 2.0321 1.932
>>>>> FR 2.031 1.924
>>>>> SE 2.0293 1.913
>>>>> DE 2.0291 1.906
>>>>> DK 2.027 1.892
>>>>> UK 2.022 1.877
>>>>> TW 1.9934 1.869
>>>>> NL 1.993 1.849
>>>>> HK 1.989 1.848
>>>>> LU 1.988 1.836
>>>>> CA 1.987 1.835
>>>>> NZ 1.9849 1.819
>>>>> US 1.9842 1.798
>>>>> AU 1.981 1.771
>>>>> MY 1.978 1.762
>>>>> HU 1.968 1.717
>>>>> LT 1.96 1.707
>>>>> SG 1.958 1.688
>>>>> FI 1.955 1.683
>>>>> CR 1.953 1.671
>>>>> BY 1.952 1.664
>>>>> IL 1.95 1.646
>>>>> EE 1.948 1.633
>>>>> NO 1.945 1.624
>>>>> IE 1.937 1.621
>>>>> SI 1.913 1.584
>>>>> LV 1.901 1.487
>>>>> SK 1.871 1.482
>>>>> BH 1.801 1.23
>>>>> SK 1.761 1.129
>>>>> AE 1.751 1.168
>>>>> IS 1.699 0.941
>>>>> BM 1.687 0.591
>>>>> KW 1.668 0.387
>>>>> CY 1.633 0.16
>>>>> AP 1.56 0.0002",
>>>>> header = TRUE,stringsAsFactors=FALSE)
>>>>> library(cluster) # not needed here: kmeans() is in the stats package
>>>>> k2 <- kmeans(spdf[,c(2,3)], centers = 2, nstart = 25)
>>>>> plot(spdf[,c(2,3)],col=k2$cluster,pch=19,xlim=c(1.55,2.1))
>>>>> text(spdf[,2]+rep(c(0.02,-0.02),21),
>>>>>  spdf[,3]+rep(c(-0.05,0.05),21),spdf[,1],col=k2$cluster)
>>>>> segments(spdf[,2],spdf[,3],spdf[,2]+rep(c(0.02,-0.02),21),
>>>>>  spdf[,3]+rep(c(-0.05,0.05),21),col=k2$cluster)
>>>>>
>>>>> I took the liberty of replacing your abbreviations with internet top
>>>>> level domains. As I hope you can see, you have a problem with crowded
>>>>> points and labels, even with the trick of spreading the labels out.
>>>>> You could modify the X and Y offsets by hand and get a much more
>>>>> readable plot.
>>>>>
>>>>> If this is not what you want, a bit more explanation of what you do
>>>>> want may get you there.
>>>>>
>>>>> Jim
>>>>>
>>>>
>>>>
>>>> --
>>>> *Best Regards,*
>>>> *Subhamitra Patra*
>>>> *Phd. Research Scholar*
>>>> *Department of Humanities and Social Sciences*
>>>> *Indian Institute of Technology, Kharagpur*
>>>> *INDIA*
>>>>
>>>
>>
>>
>
>
>
