Alex's Anthology of Algorithms Common Code for Contests in Concise C++
Elementary Algorithms / Streaming and Randomized Algorithms

Maintains the mean and variance of a stream of numbers in one pass using Welford's algorithm. This is more numerically stable than maintaining the sum of values and the sum of squares separately. Each new value shifts the running mean by its deviation divided by the new count, and accumulates the product of its deviations from the old and new means into a running sum of squared deviations, from which both population and sample variance follow directly.

  • OnlineStatistics() constructs an empty summary.
  • add(x) incorporates one more value x into the summary.
  • count() returns the number of values seen so far.
  • mean() returns the current arithmetic mean.
  • variance_population() returns population variance, dividing by $n$.
  • variance_sample() returns sample variance, dividing by $n - 1$.

Implementation

class OnlineStatistics {
  int n;
  double avg, m2;

 public:
  OnlineStatistics() : n(0), avg(0), m2(0) {}

  void add(double x) {
    n++;
    double delta = x - avg;
    avg += delta / n;
    double delta2 = x - avg;
    m2 += delta * delta2;
  }

  int count() const { return n; }
  double mean() const { return avg; }
  double variance_population() const { return n == 0 ? 0 : m2 / n; }
  double variance_sample() const { return n <= 1 ? 0 : m2 / (n - 1); }
};

Example Usage

#include <cassert>
#include <cmath>

bool close(double a, double b) {
  return fabs(a - b) < 1e-9;
}

int main() {
  OnlineStatistics stats;
  for (int x = 1; x <= 5; x++) {
    stats.add(x);
  }
  assert(stats.count() == 5);
  assert(close(stats.mean(), 3.0));
  assert(close(stats.variance_population(), 2.0));
  assert(close(stats.variance_sample(), 2.5));
  return 0;
}