We have profiling code that collects the durations of methods along with a bunch of other data points, and we store those numbers in a SummaryStatistics object from Commons Math to provide the min, max, mean, count, etc. However, we need to flush this object to disk every hour or so and start collecting again for the next hour.

My question is: how can we reliably combine these values, so that if we have 24 SummaryStatistics objects, we can display the summary for the entire day without skewing the data? The objects themselves hold the running averages as well as how many items were counted, so is there a utility class that allows two weighted averages to be combined?

Solution

You can also do this directly, using AggregateSummaryStatistics. See the section titled "Compute statistics for multiple samples and overall statistics concurrently" in the statistics section of the Commons Math User Guide.
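A minimal sketch of the static `AggregateSummaryStatistics.aggregate(...)` method, assuming Commons Math 3.x (package `org.apache.commons.math3.stat.descriptive`) and assuming the 24 hourly `SummaryStatistics` objects have already been read back from disk into a `List`. The `DailySummary` class and its `combine` method are hypothetical names used only for illustration.

```java
import java.util.List;
import org.apache.commons.math3.stat.descriptive.AggregateSummaryStatistics;
import org.apache.commons.math3.stat.descriptive.StatisticalSummaryValues;
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

// Hypothetical helper (not part of Commons Math): combines one day's worth of
// hourly SummaryStatistics objects into a single daily summary.
final class DailySummary {
    private DailySummary() {}

    static StatisticalSummaryValues combine(List<SummaryStatistics> hourly) {
        // aggregate() pools count, mean, variance, min, max and sum across
        // all of the hourly summaries in one call.
        return AggregateSummaryStatistics.aggregate(hourly);
    }
}
```

The returned `StatisticalSummaryValues` exposes `getN()`, `getMean()`, `getMin()`, `getMax()`, `getVariance()` and `getSum()` for the whole day. `aggregate` returns `null` for an empty collection, so check for that if a day can have no data.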

Other answers

Since you have both the mean and the count, the general formula is: multiply each mean by its count, sum those products, and divide the result by the sum of the counts.
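Written out, with $n_i$ and $\bar{x}_i$ the count and mean of hour $i$, the daily mean is the count-weighted average below. The worked numbers are made up purely for illustration: an hour with 100 calls averaging 10 ms and an hour with 50 calls averaging 40 ms give a combined mean of 20 ms, whereas naively averaging the two means would give 25 ms.

```latex
\bar{x}_{\text{day}} = \frac{\sum_{i=1}^{24} n_i \, \bar{x}_i}{\sum_{i=1}^{24} n_i},
\qquad
\frac{100 \cdot 10 + 50 \cdot 40}{100 + 50} = \frac{3000}{150} = 20
```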

E.g., for two SummaryStatistics objects A and B, you would use:

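```java
// Weighted mean of two SummaryStatistics objects A and B.
// getMean() returns NaN for an object with no values, so both must be non-empty.
double weightedMean = (A.getMean() * A.getN() + B.getMean() * B.getN()) /
                      (A.getN() + B.getN());
```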

For many of them (e.g., a `List<SummaryStatistics>` called `manyStats`), you might do something like:

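```java
double accum = 0.0;  // running sum of (mean * count), i.e. the total of all values
long n = 0;          // running total count
for (SummaryStatistics s : manyStats) {
  if (s.getN() == 0) {
    continue;  // skip empty hours: getMean() is NaN and would poison accum
  }
  accum += s.getMean() * s.getN();
  n += s.getN();
}
double weightedMean = accum / n;  // NaN if every hour was empty
```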
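The weighted mean recovers the daily mean exactly, but the variance (and so the standard deviation) cannot be combined by weighting alone. A dependency-free sketch, using a hypothetical `MergeableSummary` class that is not part of Commons Math: each hour keeps n, mean, M2 (the sum of squared deviations from the mean), min and max; M2 is built with Welford's online update, and two summaries are merged with the pairwise formula of Chan et al.

```java
// Hypothetical, dependency-free sketch (NOT Commons Math code): a per-hour
// summary that can be merged without access to the raw values.
final class MergeableSummary {
    final long n;
    final double mean;
    final double m2;   // sum of squared deviations from the mean
    final double min;
    final double max;

    MergeableSummary(long n, double mean, double m2, double min, double max) {
        this.n = n;
        this.mean = mean;
        this.m2 = m2;
        this.min = min;
        this.max = max;
    }

    // Builds a summary from raw values using Welford's online update.
    static MergeableSummary of(double... values) {
        long n = 0;
        double mean = 0.0, m2 = 0.0;
        double min = Double.NaN, max = Double.NaN;
        for (double x : values) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
            min = (n == 1) ? x : Math.min(min, x);
            max = (n == 1) ? x : Math.max(max, x);
        }
        return new MergeableSummary(n, mean, m2, min, max);
    }

    // Pairwise merge (Chan et al.): exact up to floating-point rounding.
    MergeableSummary merge(MergeableSummary other) {
        if (other.n == 0) return this;
        if (this.n == 0) return other;
        long total = n + other.n;
        double delta = other.mean - mean;
        double combinedMean = mean + delta * other.n / total;
        double combinedM2 = m2 + other.m2 + delta * delta * ((double) n * other.n / total);
        return new MergeableSummary(total, combinedMean, combinedM2,
                Math.min(min, other.min), Math.max(max, other.max));
    }

    // Bias-corrected sample variance (n - 1 in the denominator).
    double variance() {
        if (n == 0) return Double.NaN;
        return n == 1 ? 0.0 : m2 / (n - 1);
    }
}
```

Folding the 24 hourly summaries together with `merge` gives the same count, mean, min, max and (up to rounding) variance as summarizing the whole day at once; this is roughly the bookkeeping AggregateSummaryStatistics does for you.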
Licensed under: CC-BY-SA with attribution