[R] analyzing results from Tuesday's US elections

Spencer Graves
Sun Nov 8 09:24:52 CET 2020



On 2020-11-07 23:39, Abby Spurdle wrote:
>> What can you tell me about plans to analyze data from this year's
>> general election, especially to detect possible fraud?
> 
> I was wondering if there's any R packages with out-of-the-box
> functions for this sort of thing.
> Can you please let us know, if you find any.
> 
>> I might be able to help with such an effort.  I have NOT done
>> much with election data, but I have developed tools for data analysis,
>> including web scraping, and included them in R packages available on the
>> Comprehensive R Archive Network (CRAN) and GitHub.[1]
> 
> Do you have a URL for detailed election results?
> Or even better, a nice R-friendly CSV file...
> 
> I recognize that the results aren't complete.
> And that such a file may need to be updated later.
> But that doesn't necessarily prevent modelling now.


	  I asked because I don't know of any.  With the increasingly 
vicious, widespread and systematic attacks on the integrity of elections 
in the US, I think it would be good to have a central database of 
election results with tools regularly scraping websites of local and 
state election authorities.  Whenever new data were posted, the software 
would update the central repository and send emails to anyone 
interested.  That could simplify data acquisition, because historical 
data could already be available there.  And it would be one standard 
format for the entire US and maybe the world.
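

	  As a rough illustration of the update step described above, here is a 
minimal base-R sketch.  Everything in it is an assumption made for the 
example: the column names (state, county, candidate, votes), the function 
name update_central, and the made-up counts.  The scraping and the email 
notification are not shown; the "changed" table is simply what a notifier 
could be driven from.

```r
# Central table in one hypothetical standard format (invented data).
central <- data.frame(
  state     = c("MO", "MO"),
  county    = c("Boone", "Cole"),
  candidate = c("A", "A"),
  votes     = c(100L, 200L),
  stringsAsFactors = FALSE
)

# Freshly scraped rows in the same format (invented data).
scraped <- data.frame(
  state     = c("MO", "MO"),
  county    = c("Cole", "Greene"),
  candidate = c("A", "A"),
  votes     = c(250L, 300L),
  stringsAsFactors = FALSE
)

# Merge scraped rows into the central table; scraped values win on conflict.
update_central <- function(central, scraped,
                           keys = c("state", "county", "candidate")) {
  # Put scraped rows first so that !duplicated() keeps them over old rows.
  combined <- rbind(scraped, central)
  updated  <- combined[!duplicated(combined[keys]), ]
  updated  <- updated[do.call(order, unname(as.list(updated[keys]))), ]
  rownames(updated) <- NULL

  # Rows that are new (no old count) or whose count differs from the old one.
  old     <- merge(scraped, central, by = keys, all.x = TRUE,
                   suffixes = c("", "_old"))
  changed <- old[is.na(old$votes_old) | old$votes != old$votes_old, ]

  list(central = updated, changed = changed)
}

res <- update_central(central, scraped)
print(res$central)  # updated repository
print(res$changed)  # rows that would trigger a notification
```

Keying on (state, county, candidate) is only one possible choice; a real 
repository would also need election date, office and a source URL.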


	  This could be extremely valuable in exposing electoral fraud, thereby 
reducing its magnitude and effectiveness.  This is a global problem, but 
it seems to have gotten dramatically worse in the US in recent years.[2]


	  I'd like to join -- or organize -- a team of people working on this. 
If we can create the database and data analysis tools in a package like 
Ecfun on CRAN, I think we can interest college profs, especially those 
teaching statistics to political science students, who would love to 
involve their students in something like this.  They could access data 
in real time in classes, analyze it using standard tools that we could 
develop, and involve their students in discussing what it means and what 
it doesn't.  They could discuss Bayesian sequential updating and quality 
control concepts using data that are real and relevant to the lives of 
their students.  It could help get students excited about both 
statistics and elections.
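

	  For a sense of what a classroom exercise on Bayesian sequential 
updating might look like, here is a minimal sketch using a conjugate 
Beta-Binomial model.  The batch counts are invented, the uniform Beta(1, 1) 
prior is an assumption, and treating ballots as independent draws with a 
common probability is a simplification that real precinct data would 
violate.

```r
# Hypothetical batches of counted ballots as they are reported:
# votes for candidate A and total ballots in each batch (invented numbers).
batches <- data.frame(
  votes_a = c(520, 480, 610),
  total   = c(1000, 1000, 1200)
)

# Beta(alpha, beta) prior on candidate A's vote share; Beta(1, 1) is uniform.
alpha <- 1
beta  <- 1

n         <- nrow(batches)
post_mean <- numeric(n)
lower     <- numeric(n)
upper     <- numeric(n)

for (i in seq_len(n)) {
  # Conjugate update: add successes to alpha and failures to beta.
  alpha <- alpha + batches$votes_a[i]
  beta  <- beta + (batches$total[i] - batches$votes_a[i])

  post_mean[i] <- alpha / (alpha + beta)
  lower[i]     <- qbeta(0.025, alpha, beta)  # 95% credible interval
  upper[i]     <- qbeta(0.975, alpha, beta)
}

posterior <- data.frame(batch = seq_len(n), mean = post_mean,
                        lower = lower, upper = upper)
print(posterior)
```

Each pass through the loop uses the previous posterior as the new prior, 
so students can watch the credible interval narrow as batches arrive.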


	  Such a project may already exist.  I know there are projects at some 
major universities that sound like they might support this.  However, 
with the limited time I've invested in this so far, I didn't find any 
that seemed to provide easy access to such data and an easy way to join 
such a project.  Ballotpedia has such data but doesn't want help in 
analyzing it and asked for a few hundred dollars for data for one 
election cycle in Missouri, which is what I requested.  I can get that 
for free from the web site of the Missouri Secretary of State.


	  I thought I might next ask the Carter Center about this.  However, 
I'm totally consumed with other priorities right now.  I don't plan 
to do anything on this in the short term -- unless I can find 
collaborators.


	  If such a central database doesn't exist -- and maybe even if it does 
-- I thought it might be good to make all the data available in a 
standard format in Wikidata, which is a project of the Wikimedia 
Foundation, which is also the parent organization of Wikipedia.  Then I 
could help create software and documentation on how to scrape data from 
the web sites of different election organizations that have it and 
automatically update Wikidata while also sending emails to people who 
express interest in those election results.  Then we could create 
software for analyzing such data and make that available, e.g., on 
Wikiversity, which is another project of the Wikimedia Foundation -- 
with the R code in Ecfun or some other CRAN package.


	  If we start now, I think we could have something mediocre in time for 
various local elections that occur next year with improvements for the 
2022 US Congressional elections and something even better for the 2024 
US presidential elections.


	  Thanks for asking.
	  Spencer Graves


[1]
https://github.com/sbgraves237


[2]
https://en.wikiversity.org/wiki/Electoral_integrity_in_the_United_States


