[Statlist] Next talk: Friday, March 18, 2016 with Jonathan Rosenblatt, Ben Gurion University of the Negev, Israel

Susanne Kaiser-Heinzmann kaiser sending from stat.math.ethz.ch
Mon Mar 14 11:55:15 CET 2016


Profs. P. Bühlmann - L. Held - T. Hothorn - M. Maathuis -
N. Meinshausen - S. van de Geer - M. Wolf
***********************************************************************************
We are pleased to invite you to the following talk:

Friday, March 18, 2016 at 15.15h  ETH Zurich HG G 19.1

with Jonathan Rosenblatt, Ben Gurion University of the Negev, Israel

***********************************************************************************

Title:

On the Optimality of Averaging in Distributed Statistical Learning

Abstract:

A common approach to statistical learning on big data is to randomly split it among m machines and calculate the parameter of interest by averaging their m individual estimates. Focusing on empirical risk minimization, or equivalently M-estimation, we study the statistical error incurred by this strategy. We consider two asymptotic settings: one where the number of samples per machine n -> inf while the number of parameters p is fixed, and a second high-dimensional regime where both p, n -> inf with p/n -> kappa. Most previous works provided only moment bounds on the error incurred by splitting the data in the fixed-p setting. In contrast, we present for both regimes asymptotically exact distributions for this estimation error. In the fixed-p setting, under suitable assumptions, we thus prove that, to leading order, averaging is as accurate as the centralized solution. In the high-dimensional setting, we show a qualitatively different behavior: data splitting does incur a first-order accuracy loss, which we quantify precisely. In addition, our asymptotic distributions allow the construction of confidence intervals and hypothesis tests on the estimated parameters. Our main conclusion is that in both regimes, averaging parallelized estimates is an attractive way to speed up computations and save on memory, while incurring a quantifiable and typically moderate excess error.
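The split-and-average strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes ordinary least squares as the M-estimator and an even split of simulated data across m hypothetical machines.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated linear model: N total samples, p parameters, m machines.
N, p, m = 10_000, 5, 10
beta_true = np.arange(1.0, p + 1.0)
X = rng.standard_normal((N, p))
y = X @ beta_true + rng.standard_normal(N)

# Centralized M-estimate: least squares on the full data set.
beta_central, *_ = np.linalg.lstsq(X, y, rcond=None)

# Distributed strategy: each "machine" solves on its own split of
# n = N/m samples, and the m local estimates are averaged.
local_estimates = [
    np.linalg.lstsq(X_k, y_k, rcond=None)[0]
    for X_k, y_k in zip(np.array_split(X, m), np.array_split(y, m))
]
beta_avg = np.mean(local_estimates, axis=0)

# In the fixed-p regime (n large, p fixed), averaging is, to leading
# order, as accurate as the centralized solution: the two estimates
# should nearly coincide here.
print(np.linalg.norm(beta_avg - beta_central))
```

In the high-dimensional regime the abstract describes (p growing with n, p/n -> kappa), the same averaging scheme would instead show a first-order accuracy loss relative to the centralized estimate.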


This abstract can also be found at the following link: http://stat.ethz.ch/events/research_seminar

*********************************************************************************************************

Statlist mailing list
Statlist at stat.ch
https://stat.ethz.ch/mailman/listinfo/statlist



