[R] graphically representing frequency of words in a speech?

Mike Lawrence Mike.Lawrence at dal.ca
Mon Jun 8 02:00:16 CEST 2009


Below are various attempts using ggplot2
(http://had.co.nz/ggplot2/). First I try random positioning, then
random positioning with alpha, then a quasi-random positioning scheme
in polar coordinates:

#this demo uses random number generation,
# so set a seed to make it
# reproducible.
set.seed(1)

#generate some fake data
a = data.frame(
	word = month.name
	, freq = sample(1:10,12,replace=TRUE)
)

#add arbitrary location information
a$x = sample(1:12,12)
a$y = sample(1:12,12)
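The month names and sampled counts above are placeholders. For an actual speech, a word/frequency table with the same two columns could be built in base R roughly as follows. This is a minimal sketch: the `speech` string is a made-up placeholder, and the tokenizing rule (lower-case everything, split on runs of characters that are not letters or apostrophes) is an assumption you would probably refine, e.g. by dropping stop words.

```r
# placeholder text: replace with the text of the actual speech
speech = "We choose to go to the moon. We choose to go to the moon in this decade and do the other things."

# lower-case, then split on runs of non-letter, non-apostrophe
# characters (an assumed tokenizing rule; adjust as needed)
tokens = unlist(strsplit(tolower(speech), "[^a-z']+"))
# drop any empty strings left by a leading separator
tokens = tokens[tokens != ""]

# count each distinct word, most frequent first
counts = sort(table(tokens), decreasing = TRUE)

# same layout as the fake data: one row per word
speech_freq = data.frame(
	word = names(counts)
	, freq = as.integer(counts)
	, stringsAsFactors = FALSE
)

print(head(speech_freq))
```

With this placeholder text the most frequent word is "to" (4 occurrences). The resulting data frame can take the place of `a` once x/y positions are added to it.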

#load ggplot2
library(ggplot2)

#initialize a ggplot object
my_plot = ggplot()

#create an object for the text layer
my_text = geom_text(
	data = a
	, aes(
		x = x
		, y = y
		, label = word
		, size = freq
	)
)

#create an object for the text size limits
my_size_scale = scale_size(
	range = c(3,20)
)
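To see roughly which text sizes the frequencies end up with: current ggplot2 documents `scale_size()` as scaling area rather than radius, i.e. it rescales the data to [0, 1], takes the square root, and stretches the result onto the requested size range. The function below is a hand-written sketch of that arithmetic, assuming the behavior of current ggplot2 releases; it does not call ggplot2, and `size_from_freq` is a hypothetical name.

```r
# hypothetical helper mirroring the area-based mapping that current
# ggplot2 uses for scale_size(); assumes at least two distinct
# frequencies (otherwise the rescaling divides by zero)
size_from_freq = function(freq, range = c(3, 20)) {
	# position of each frequency within the observed min..max
	unit = (freq - min(freq)) / (max(freq) - min(freq))
	# square root, as ggplot2's area palette does, then stretch
	# onto the output range
	range[1] + (range[2] - range[1]) * sqrt(unit)
}

freqs = c(1, 2, 5, 10)
round(size_from_freq(freqs), 2)
# gives 3.00 8.67 14.33 20.00
```

Because of the square root, a word that is only a third of the way up the frequency range still gets two thirds of the way up the size range, which keeps mid-frequency words readable.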

#create an object to expand the x-axis limits
# (ensures that text isn't cropped)
my_x_scale = scale_x_continuous(
	expand = c(.5, 0)
)

#ditto for the y axis
my_y_scale = scale_y_continuous(
	expand = c(.5, 0)
)

#create a theme object that removes
# plot elements unnecessary in a tag cloud
my_opts = theme(
	legend.position = 'none'
	, panel.grid.minor = element_blank()
	, panel.grid.major = element_blank()
	, panel.background = element_blank()
	, axis.line = element_blank()
	, axis.text.x = element_blank()
	, axis.text.y = element_blank()
	, axis.ticks = element_blank()
	, axis.title.x = element_blank()
	, axis.title.y = element_blank()
)

#show the plot
print(
	my_plot+
	my_text+
	my_size_scale+
	my_x_scale+
	my_y_scale+
	my_opts
)

#to aid readability amidst overlap, set alpha in
# the call to geom_text
my_text_with_alpha = geom_text(
	data = a
	, aes(
		x = x
		, y = y
		, label = word
		, size = freq
	)
	, alpha = .5
)

#show the version with alpha
print(
	my_plot+
	my_text_with_alpha+
	my_size_scale+
	my_x_scale+
	my_y_scale+
	my_opts
)

#alternatively, in polar coordinates,
# which maps x to angle and y to radius,
# making a nice circle
print(
	my_plot+
	my_text_with_alpha+
	my_size_scale+
	my_opts+
	coord_polar()
)
#(note the omission of my_y_scale &
# my_x_scale, which seem to be ignored
# when coord_polar() is called. I'll
# report this possible bug to the ggplot2
# maintainer)
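Since the x and y scale settings seemed to be ignored once coord_polar() was added, one workaround is to do the polar-to-Cartesian conversion by hand and draw the labels with ordinary Cartesian scales, where the expansion settings still apply. The sketch below assumes the same angle convention as coord_polar()'s defaults (start = 0, direction = 1: measured clockwise from 12 o'clock). `polar_layout` is a hypothetical helper, and giving the n words n equal angular slots is my own choice, not something ggplot2 does.

```r
# hypothetical helper: place n labels on a circle by hand
# slot:   integer position 1..n around the circle (like a$x above)
# radius: distance from the centre (like a$y, or a noisy frequency)
polar_layout = function(slot, radius) {
	n = length(slot)
	# n equal angular slots; slot 1 sits at 12 o'clock
	theta = 2 * pi * (slot - 1) / n
	# clockwise from 12 o'clock: x uses sin, y uses cos
	data.frame(
		x = radius * sin(theta)
		, y = radius * cos(theta)
	)
}

pos = polar_layout(slot = 1:4, radius = c(1, 2, 3, 4))
round(pos, 6)
# x: 0, 2, 0, -4   y: 1, 0, -3, 0
```

To plot it, bind the returned columns to the words (renamed so they don't clash with the existing `x`/`y` columns in `a`) and draw them with geom_text() as in the first plot, leaving out coord_polar().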

#a possible way to avoid overlap is to
# map radius (y) to frequency so that
# larger text is in the periphery
# where there is more room. This
# necessitates adding some random
# noise to the frequency so that
# the low frequency words don't
# jumble in the center too badly
a$freq2 = a$freq+rnorm(12)

#now map radius (y) to freq2
my_text_with_alpha_and_freq2 = geom_text(
	data = a
	, aes(
		x = x
		, y = freq2
		, label = word
		, size = freq
	)
	, alpha = .5
)

#show the version with alpha & radius mapped to freq2
print(
	my_plot+
	my_text_with_alpha_and_freq2+
	my_size_scale+
	my_opts+
	coord_polar()
)
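To compare these layouts with something firmer than eyeballing, you could count how many pairs of labels overlap. The sketch below approximates each label as a box centred on its position, with width proportional to the number of characters times the text size and height proportional to the size. The scale factors `char_w` and `char_h` are hypothetical guesses that depend on the graphics device and on the data units, so the counts are only meaningful relative to each other. `count_overlaps` is a hypothetical helper, not part of ggplot2, and it works in Cartesian data units, so polar layouts need their positions converted to x/y first.

```r
# hypothetical helper: count label pairs whose approximate boxes overlap
# x, y:   label centres in data units
# label:  the words
# size:   one text size per label
# char_w, char_h: assumed data units per character width / line height
#                 per unit of size (device-dependent guesses)
count_overlaps = function(x, y, label, size, char_w = 0.05, char_h = 0.1) {
	n = length(x)
	if (n < 2) return(0)
	# half-width grows with the number of characters and the text size
	half_w = nchar(as.character(label)) * size * char_w / 2
	# half-height grows with the text size only
	half_h = size * char_h / 2
	overlaps = 0
	for (i in 1:(n - 1)) {
		for (j in (i + 1):n) {
			# axis-aligned boxes overlap only if they overlap on both axes
			if (abs(x[i] - x[j]) < half_w[i] + half_w[j] &&
				abs(y[i] - y[j]) < half_h[i] + half_h[j]) {
				overlaps = overlaps + 1
			}
		}
	}
	overlaps
}

# "May" and "June" nearly coincide; "July" is far away
count_overlaps(
	x = c(0, 0.2, 5)
	, y = c(0, 0, 0)
	, label = c("May", "June", "July")
	, size = c(10, 10, 10)
)
# gives 1
```

Tuning `char_w` and `char_h` until the count roughly matches what the rendered plot shows makes the comparison between layouts more trustworthy.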

-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~