[Statlist] Next talk: July 2, 2013 with Rajen Shah, University of Cambridge, UK

Susanne Kaiser-Heinzmann
Mon Jun 24 10:33:19 CEST 2013


_________________________________________________
ETH Zurich and University of Zurich

Organisers:
Proff. P. Bühlmann - L. Held - H.R. Kuensch - M. Maathuis - S. van de Geer - M. Wolf

*************************************************************************************

We are glad to announce the following talk

Tuesday, July 2, 2013 - 15.15h, ETH Zurich, HG G 19.1
with Rajen Shah, University of Cambridge, UK

**********************************************************************

Title:
Large-scale regression with sparse data

Abstract:

The "Big Data" era in which we are living has brought with it a combination of statistical and computational challenges that must often be met with approaches drawing on developments from both statistics and computer science. In this talk I will present a method for performing regression where the n by p design matrix may have both n and p in the millions, but where the design matrix is sparse, that is, most of its entries are zero; such sparsity is common in many large-scale applications such as text analysis.
In this setting, performing regression using the original data can be computationally infeasible. Instead, we first map the design matrix to an n by L matrix with L << p, using a modified version of a scheme known as b-bit min-wise hashing in computer science. From a statistical perspective, we study the performance of regression using this compressed data, and give finite sample bounds on the prediction error. Interestingly, despite the loss of information through the compression scheme, we will see that ordinary least squares or ridge regression applied to the reduced data can actually allow us to fit a model containing interactions in the original data. 
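
As a rough illustration of the compression step described above, the following is a minimal sketch of plain b-bit min-wise hashing for a binary design matrix. It is not the modified scheme presented in the talk. The function name bbit_minwise_hash, its parameters, the toy data, and the convention of mapping all-zero rows to 0 are assumptions made for this example. Explicit permutations are stored only because p is tiny here; for p in the millions one would presumably use hash functions instead.

```python
import random


def bbit_minwise_hash(rows, p, L, b=1, seed=0):
    """Compress a sparse binary n-by-p design matrix to an n-by-L matrix.

    rows: list of sets; rows[i] holds the column indices of the nonzero
          entries of row i (hypothetical input format for this sketch).
    Returns a list of n lists of length L with entries in {0, ..., 2**b - 1}.
    """
    rng = random.Random(seed)
    mask = (1 << b) - 1

    # One independent random permutation of the p columns per output column.
    perms = []
    for _ in range(L):
        perm = list(range(p))
        rng.shuffle(perm)
        perms.append(perm)

    compressed = []
    for nonzero in rows:
        out_row = []
        for perm in perms:
            if nonzero:
                # Min-wise hash: smallest permuted index among the nonzeros.
                m = min(perm[j] for j in nonzero)
            else:
                # Assumption of this sketch: all-zero rows map to 0.
                m = 0
            # Keep only the lowest b bits of the min-wise hash.
            out_row.append(m & mask)
        compressed.append(out_row)
    return compressed


# Toy data: 4 observations, p = 10 columns, compressed to L = 5 columns.
rows = [{0, 3, 7}, {0, 3, 7}, {1, 2}, set()]
H = bbit_minwise_hash(rows, p=10, L=5, b=1)
```

Rows with identical sparsity patterns always receive identical compressed rows, and more generally two rows agree in a given compressed column with a probability that increases with the overlap of their nonzero sets. The resulting n by L matrix H could then be passed to an ordinary least squares or ridge regression routine in place of the original design matrix.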

This is joint (and ongoing) work with Nicolai Meinshausen.

*******************************************************************************************************
This abstract is also available at the following link: http://stat.ethz.ch/events/research_seminar




