Seminar: "Subset selection for big data regression: an improved approach"


Vasilis Chasiotis (Department of Statistics, AUEB, GR)

Subset selection for big data regression: an improved approach


In the big data era, researchers face a series of problems, and such data sets arise in many application areas. Even standard methods such as linear regression can become difficult or problematic with huge volumes of data. For example, traditional approaches to regression may suffer from the large sample size, either because they involve inverting huge data matrices or because the data cannot fit in memory. One simple remedy is to select subdata and run the regression on them. Some approaches to big data regression in the existing literature select data points using information criteria and provide corresponding algorithms; some of these rely on the combinatorial properties of an orthogonal array. In the present paper we aim to improve the algorithms proposed in these approaches. We describe a new algorithm whose gain is shown through simulation experiments and the analysis of real data. We also discuss the parameters of the proposed algorithm in order to clarify the trade-offs between execution time and information gain.
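To make the idea of running a regression on selected subdata concrete, here is a minimal Python sketch. It is not the algorithm presented in the talk, and it does not use orthogonal arrays. It assumes a simple heuristic (keep the rows with the most extreme values of each covariate) and uses the log-determinant of the information matrix as a stand-in for "information gain". The function names `select_subdata`, `fit_ols`, and `log_det_information`, the simulated data, and the sizes `n`, `p`, and `k` are all hypothetical.

```python
import numpy as np

def select_subdata(X, k):
    """Pick about k rows: for each covariate, take the rows with the smallest
    and largest values among those not yet chosen (illustrative heuristic)."""
    n, p = X.shape
    r = max(1, k // (2 * p))              # rows taken from each tail of each covariate
    chosen = np.zeros(n, dtype=bool)
    for j in range(p):
        available = np.flatnonzero(~chosen)
        order = available[np.argsort(X[available, j])]
        chosen[order[:r]] = True          # smallest values of covariate j
        chosen[order[-r:]] = True         # largest values of covariate j
    return np.flatnonzero(chosen)

def fit_ols(X, y):
    """Ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def log_det_information(X):
    """log det of the information matrix Z'Z, where Z = [1, X]."""
    design = np.column_stack([np.ones(len(X)), X])
    _, logdet = np.linalg.slogdet(design.T @ design)
    return logdet

# Simulated data (assumption: Gaussian covariates, unit-variance noise).
rng = np.random.default_rng(0)
n, p, k = 100_000, 5, 1_000
X = rng.normal(size=(n, p))
beta_true = np.arange(1, p + 2, dtype=float)   # intercept followed by p slopes
y = beta_true[0] + X @ beta_true[1:] + rng.normal(size=n)

idx = select_subdata(X, k)
random_idx = rng.choice(n, size=len(idx), replace=False)
beta_sub = fit_ols(X[idx], y[idx])

print("subdata size:", len(idx))
print("log det, selected subdata:", log_det_information(X[idx]))
print("log det, random subdata:  ", log_det_information(X[random_idx]))
```

Comparing the two log-determinants shows why the choice of rows matters: for the same subdata size, the selected rows carry more information about the regression coefficients than a uniform random sample. Only the small subdata matrix is passed to the least-squares solver. Raising `k` increases both the information retained and the running time, which is the kind of trade-off the abstract refers to.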

Room T102, AUEB New Building, 2 Troias str.

(A pdf of the presentation can be found here)

Event Date:
Friday, February 25, 2022 - 13:00