Large scale genomic and electronic commerce data sets often have a crossed random effects structure, arising from genotypes x environments or customers x products. Regression models with crossed random effect error models can be very expensive to compute. The cost of both generalized least squares and Gibbs sampling can easily grow as N^(3/2) (or worse) for N observations. Papaspiliopoulos, Roberts and Zanella (2020) present a collapsed Gibbs sampler that costs O(N), but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs O(N) under greatly relaxed though still strict sampling assumptions. Empirically, the backfitting algorithm costs O(N) under further relaxed assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.
Irene Felicetti: ilf@umich.eduBackfitting for large scale crossed random effects regressions
Art Owen, Ph.D. - Stanford University
September 17, 2020
3:30 pm - 5:00 pm
Online
Contact Information: Irene Felicetti: ilf@umich.edu
Large scale genomic and electronic commerce data sets often have a crossed random effects structure, arising from genotypes x environments or customers x products. Regression models with crossed random effect error models can be very expensive to compute. The cost of both generalized least squares and Gibbs sampling can easily grow as N^(3/2) (or worse) for N observations. Papaspiliopoulos, Roberts and Zanella (2020) present a collapsed Gibbs sampler that costs O(N), but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs O(N) under greatly relaxed though still strict sampling assumptions. Empirically, the backfitting algorithm costs O(N) under further relaxed assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.