how to deal with floating point error seen in an aggregation

https://stackoverflow.com/questions/17928238

04-06-2022
|

Pergunta

I'm working with a system that operates in two modes over a large set of numbers. In the first mode, the system aggregates the numbers by simply iterating over them and adding them to a running total. In the second mode, the system parallel processes this work, the result of which is that the same total is calculated by first creating a number of subtotals, and then adding those subtotals together.

This is a simplification of the system, but it's the heart of my problem. Because of the subtotal aggregation, the end result can sometimes exhibit very small differences. I can only guess that this is because of an accumulation of floating point errors.

These two modes exist because it is not possible to run all "calculations" in this system using the parallel mode, and thus the linear mode must continue to exist for backwards compatibility. Using the second mode when possible is supposed to simply be faster, but users could potentially see differences because of the aforementioned problem.

My question is whether this issue is simply inherent and must be either accepted or the entire solution rejected, or if this problem can be mitigated by approaching the subtotal aggregation issue in a different way.

Please let me know if additional details are needed! Thank you for your time.

Solução

There exists an algorithm for exact summation of a set of floating-point values, famously available in Python. If you use this algorithm for computing both linear and parallel mode, the two of them will show the exact same totals: they will both compute exact sums of your set of values.

This algorithm requires dynamic allocation and may not be appropriate depending on your context, but it is the only solution I see with the information you have provided. Other approaches may be possible if reduced precision is acceptable (one obvious one being to convert all values to fixed-point before including them into the sum. Fixed-point addition, unlike floating-point addition, is associative).

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow