[R] navel-gazing
Joshua Wiley
jwiley.psych at gmail.com
Fri Sep 17 23:05:49 CEST 2010
I have been tinkering around with this for a bit, and I am proud to
share navel gazer 1.0.
If no arguments are passed, it will look up the top 50 authors on the
r-help list for the current month of the current year. You can also
specify one or more months as a character vector (e.g., "August" or
c("August", "September")). The same goes for years. Thanks to some
help from Henrique (although I promise this was not the only reason I
wanted it), it will not try to pull a month beyond the current one.
You can also choose a different list (such as r-devel) via the list
argument. If you have the time, you can set the argument entire = TRUE,
which will look up every month of whatever year(s) you specified (or
the current year if you did not). It will return a named list with one
element per month. Also, by default it will create a dotplot in lattice
(though this may be turned off via plot = FALSE). Finally, you can
specify how many authors you want with n. It defaults to 50.
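The month and year handling described above boils down to three steps: build every year-month combination, sort them, and drop any month after the current one. navel.gazer() does this with zoo's yearmon class; below is a base-R sketch of the same logic under stated assumptions. archive_months() and its today argument are hypothetical names introduced only for illustration (today just makes the cutoff testable), and month names are assumed to be the English ones in month.name, matching archive directories such as "2010-August".

```r
# Hypothetical helper (not part of navel.gazer): expand month x year into
# "YYYY-Month" labels, in date order, keeping only months up to `today`.
# Month names are assumed to be English, as in base R's month.name.
archive_months <- function(month = NULL, year = NULL, entire = FALSE,
                           today = Sys.Date()) {
  this_year <- as.integer(format(today, "%Y"))
  this_month <- as.integer(format(today, "%m"))
  if (isTRUE(entire)) {
    month <- month.name                  # every month of the chosen year(s)
  } else if (is.null(month)) {
    month <- month.name[this_month]      # default: the current month
  }
  if (is.null(year)) year <- this_year   # default: the current year
  # One row per month/year combination, months stored as 1..12
  grid <- expand.grid(m = match(month, month.name), y = as.integer(year))
  if (anyNA(grid$m)) stop("unrecognized month name")
  # Drop anything after the current month, then sort chronologically
  grid <- grid[grid$y * 12 + grid$m <= this_year * 12 + this_month, ,
               drop = FALSE]
  grid <- grid[order(grid$y, grid$m), , drop = FALSE]
  paste(grid$y, month.name[grid$m], sep = "-")
}

archive_months(c("August", "September"), 2010, today = as.Date("2010-09-17"))
# "2010-August" "2010-September"
archive_months("October", c(2009, 2010), today = as.Date("2010-09-17"))
# "2009-October"  (October 2010 is still in the future)
```

Comparing months as year * 12 + month keeps the "not beyond the current month" rule a single integer comparison, which is what the yearmon comparison in navel.gazer() achieves as well.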
It is also available here:
http://gist.github.com/584910
#######################################
navel.gazer <- function(month = NULL, year = NULL, entire = FALSE,
                        list = "r-help", n = 50, plot = TRUE) {
  # Ben Bolker came up with most of the code
  # Henrique Dallazuanna provided an edit to the z <- line of code
  # Brian Diggs provided capwords() to properly count Peter Dalgaard
  # Joshua Wiley adapted all of it to one function
  if(is.null(month)) {
    month <- format(Sys.Date(), format = "%B")
  }
  if(isTRUE(entire)) {
    # All twelve (English) month names, as used in the archive URLs
    month <- month.name
  }
  if(is.null(year)) {
    year <- format(Sys.Date(), format = "%Y")
  }
  # Every year-month combination, e.g. "2010-August"
  times <- paste(rep(year, each = length(month)), month, sep = "-")
  require(zoo)
  # Sort chronologically and drop any month after the current one
  times <- sort(as.yearmon(times, "%Y-%B"))
  current <- as.yearmon(Sys.Date())
  times <- format(times[times <= current], "%Y-%B")
  # Function to extract the names
  # Originally by Ben Bolker
  namefun <- function(x) {
    gsub("\\n", "", gsub("^.+<I>", "", gsub("</I>.+$", "", x)))
  }
  # Based on a suggestion by Brian Diggs
  # Capitalizes the first letter of each word
  capwords <- function(s, strict = FALSE) {
    cap <- function(s) paste(toupper(substring(s, 1, 1)),
                             {s <- substring(s, 2); if(strict) tolower(s) else s},
                             sep = "", collapse = " ")
    sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
  }
  # Collects the author names for the relevant month and list
  # from the R archives
  # Originally by Ben Bolker
  grabber <- function(month, list, n) {
    baseurl <- "https://stat.ethz.ch/pipermail/"
    require(RCurl)
    # ssl.verifypeer = FALSE (Henrique's edit) skips certificate checking
    z <- getURL(paste(baseurl, list, "/", month, "/author.html", sep = ""),
                ssl.verifypeer = FALSE)
    zz <- strsplit(z, "<LI>")[[1]]
    cnames <- capwords(sapply(zz[3:(length(zz) - 1)], namefun))
    # Posts per author, most prolific first
    rr <- rev(sort(table(cnames)))
    # Top n authors; rr[1:n] would pad with NAs if fewer than n posted
    rr[seq_len(min(n, length(rr)))]
  }
  # Create dot plots of the number of posts
  # lattice dotplot() code primarily by Ben Bolker
  plotter <- function(dat) {
    require(lattice)
    if(length(dat) > 1) {
      # Prompt before each new page (also honored by lattice/grid output)
      old.ask <- devAskNewPage(TRUE)
      on.exit(devAskNewPage(old.ask))
    }
    for(i in seq_along(dat)) {
      print(dotplot(~rev(dat[[i]]), xlab = "Number of posts",
                    main = names(dat)[i]))
    }
    invisible()
  }
  numbers <- lapply(times, function(x) grabber(month = x, list = list, n = n))
  names(numbers) <- times
  if(plot) {
    plotter(dat = numbers)
  }
  return(numbers)
}
#######################################
Two examples:
navel.gazer(year = 2009, entire = TRUE)
navel.gazer(month = "September", year = 2007)
This basically works out to be a tribute to David Winsemius:
navel.gazer(n = 1, entire = TRUE)
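The calls above need network access. To see what grabber() does with an archive page, the same parsing steps can be run offline on a made-up page. This is a sketch under assumptions: the HTML below is a hypothetical, simplified stand-in for the pipermail author.html page, built so that the first two "<LI>"-delimited pieces and the last one are non-author text (the pieces that zz[3:(length(zz) - 1)] skips), and the author names are invented.

```r
# Hypothetical, simplified archive page: each author entry follows "<LI>"
# and holds the author's name between <I> and </I>.
entry <- function(subject, author) {
  paste0("<A HREF=\"x.html\">", subject, "</A> <I>", author, "\n</I> ")
}
z <- paste(c("page header", "navigation",
             entry("[R] q1", "Jane Doe"),
             entry("[R] q2", "jane doe"),
             entry("[R] q3", "John Smith"),
             entry("[R] q4", "John Smith"),
             entry("[R] q5", "John Smith"),
             "page footer"),
           collapse = "<LI>")

# Same helpers as in navel.gazer()
namefun <- function(x) {
  gsub("\\n", "", gsub("^.+<I>", "", gsub("</I>.+$", "", x)))
}
capwords <- function(s, strict = FALSE) {
  cap <- function(s) paste(toupper(substring(s, 1, 1)),
                           {s <- substring(s, 2); if(strict) tolower(s) else s},
                           sep = "", collapse = " ")
  sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}

zz <- strsplit(z, "<LI>")[[1]]
raw <- sapply(zz[3:(length(zz) - 1)], namefun, USE.NAMES = FALSE)
table(raw)                          # "Jane Doe" and "jane doe" counted apart
counts <- rev(sort(table(capwords(raw))))
counts                              # John Smith 3, Jane Doe 2
```

Without capwords() the two spellings of one name land in separate table cells, which is exactly the Peter Dalgaard problem Brian describes further down the thread.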
Hope you get a bit of fun out of this. I certainly enjoyed writing it!
Josh
On Tue, Aug 17, 2010 at 1:10 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
> I think the gsub example on the help page is clearer:
>
> library(XML)
>
> # the XML package could be used to get the names
> cnames <- gsub('\n', '', head(tail(sapply(getNodeSet(htmlParse(z,
>   asText = TRUE), "//i"), xmlValue), -3), -3))
> gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", cnames, perl=TRUE)
>
>
>
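For comparison, Henrique's one-line gsub() and Brian's capwords() both merge the two spellings of a name, but they differ on names with internal capitals: the Perl \\L escape lowercases the rest of each word, while capwords(strict = FALSE) leaves it untouched. A small check of that difference follows; perlcaps() is a hypothetical wrapper name, and "john McDonald" is a made-up input chosen only to show the difference.

```r
# capwords() as posted by Brian Diggs (from the base::chartr help page)
capwords <- function(s, strict = FALSE) {
  cap <- function(s) paste(toupper(substring(s, 1, 1)),
                           {s <- substring(s, 2); if(strict) tolower(s) else s},
                           sep = "", collapse = " ")
  sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}
# Henrique's Perl-regex version: upper-case the first letter of each word and
# lower-case the rest (\\U and \\L are only honored with perl = TRUE)
perlcaps <- function(x) gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", x, perl = TRUE)

x <- c("peter dalgaard", "Peter Dalgaard", "john McDonald")
capwords(x)   # "Peter Dalgaard" "Peter Dalgaard" "John McDonald"
perlcaps(x)   # "Peter Dalgaard" "Peter Dalgaard" "John Mcdonald"
```

For counting posts either works, as long as the same function is applied to every name.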
> On Tue, Aug 17, 2010 at 4:44 PM, Brian Diggs <diggsb at ohsu.edu> wrote:
>
>> Since Peter Dalgaard is splitting his considerable contributions between
>> "Peter Dalgaard" and "peter dalgaard", I made the following changes (which
>> shouldn't be a problem unless e e cummings becomes a regular poster):
>>
>> # from base::chartr documentation
>> capwords <- function(s, strict = FALSE) {
>> cap <- function(s) paste(toupper(substring(s,1,1)),
>> {s <- substring(s,2); if(strict) tolower(s) else s},
>> sep = "", collapse = " " )
>> sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
>> }
>>
>> cnames <- capwords(sapply(zz[3:(length(zz)-1)],namefun))
>>
>>
>>
>>
>> On 8/17/2010 10:00 AM, Henrique Dallazuanna wrote:
>>
>>> Ben,
>>>
>>> I changed the line:
>>>
>>> z<- getURL(paste(baseurl,list,"/", month,"/author.html",sep=""))
>>>
>>> to
>>>
>>> z<- getURL(paste(baseurl,list,"/", month,"/author.html",sep=""),
>>> ssl.verifypeer = FALSE)
>>>
>>> because it didn't work for me.
>>>
>>> Nice!
>>>
>>> On Tue, Aug 17, 2010 at 1:47 PM, Ben Bolker<bbolker at gmail.com> wrote:
>>>
>>>> month<- "2010-August"
>>>> list<- "r-help"
>>>> ##list<- "r-sig-ecology"
>>>> ##list<- "r-sig-mixed-models"
>>>> ## month<- "2010q3"
>>>> n<- 50
>>>> baseurl<- "https://stat.ethz.ch/pipermail/"
>>>> library(RCurl)
>>>> z<- getURL(paste(baseurl,list,"/",month,"/author.html",sep=""))
>>>> zz<- strsplit(z,"<LI>")[[1]]
>>>> namefun<- function(x) {
>>>> gsub("\\n","",gsub("^.+<I>","",gsub("</I>.+$","",x)))
>>>> }
>>>>
>>>> cnames<- sapply(zz[3:(length(zz)-1)],namefun)
>>>> rr<- rev(sort(table(cnames)))
>>>>
>>>>
>>>> library(lattice)
>>>> dotplot(~rev(rr[1:n]),xlab="Number of posts")
>>>>
>>>> dotplot(~rev(rr[1:n]),xlab="Number of posts",
>>>> scales=list(x=list(log=10)))
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>>
>>>
>> --
>> Brian Diggs
>> Senior Research Associate, Department of Surgery, Oregon Health & Science University
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/