Question

I have a caching application that uses a CRC64 value to ensure data integrity. I'm thinking about adding an extra field, a timestamp, to be passed around with the data between the various cache servers and compared to see whether the data has changed.

However, this requires protocol changes. While that's not a huge deal, I already have a CRC64 that could be used as an indicator that something has changed.

Does anyone know the statistics for two blocks of data producing the same CRC64? If not, how could I compute or estimate its likelihood?

Solution

If you assume that crc64 is 'perfect', then the numbers are pretty reasonable:

For a 1% probability of collision, you need 6.1 × 10^8 entries. For a 50% probability of collision, you need 5.1 × 10^9 entries.
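These figures can be reproduced with the usual birthday-problem approximation, n ≈ sqrt(2 · 2^64 · ln(1 / (1 − p))). The following is a minimal sketch, assuming the CRC64 values behave like uniformly random 64-bit numbers; the function name and the `bits` parameter are only illustrative.

```python
import math

def entries_for_collision_probability(p, bits=64):
    """Approximate number of uniformly random hash values needed before the
    chance of at least one collision reaches p (birthday approximation)."""
    space = 2.0 ** bits
    # n ~ sqrt(2 * space * ln(1 / (1 - p))); log1p keeps small p accurate.
    return math.sqrt(2.0 * space * -math.log1p(-p))

# Roughly 6.1e8 entries for a 1% chance, roughly 5.1e9 for a 50% chance.
print(f"{entries_for_collision_probability(0.01):.2e}")
print(f"{entries_for_collision_probability(0.50):.2e}")
```

Passing a smaller `bits` value gives a feel for how quickly the safe population shrinks if the checksum is less than ideal.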

Of course, if the data is potentially supplied by malicious sources, then collisions in a hash as simple as crc64 can be generated easily, and collisions could be rampant. So whether or not you go this route depends on the source of input data and the potential ramifications of collisions.

OTHER TIPS

The probability of any two given blocks colliding is 1/2^64, or 1 in about 1.8 × 10^19.

However, a collision becomes far more likely if you are interested in whether any two blocks collide within a population of size N.

For more information, see Birthday Problem on Wikipedia, which has formulas and approximations.
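As a rough illustration of how quickly the probability grows with N, here is a small sketch using the standard approximation p ≈ 1 − exp(−N(N − 1) / (2 · 2^64)). It assumes uniformly distributed 64-bit values, which a CRC only approximates; the function name is hypothetical.

```python
import math

def collision_probability(n, bits=64):
    """Approximate chance that at least two of n uniformly random hash
    values coincide: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    exponent = -n * (n - 1) / (2.0 * 2.0 ** bits)
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny.
    return -math.expm1(exponent)

for n in (2, 10**6, 10**9, 10**10):
    print(n, collision_probability(n))
```

For N = 2 this reduces to the pairwise figure of 1/2^64, while for populations in the billions the probability is no longer negligible.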

The probability of two CRC64s over different random data being identical would be something close to 1 chance in 2**64. But since CRCs are somewhat sensitive to data patterns, there could be degenerate cases where you'd lose several binary orders of protection. It's probably not possible to come up with a hard number, but you'd likely be safe in assuming the worst case chance of collision would be less than 1 chance in 2**50 or so.

You'd be assured of getting closer to the theoretical limit if you used a cryptographic hash instead of a CRC64, but the crypto hash is generally much more expensive to compute.
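If a 64-bit field is all the protocol has room for, one option is to truncate a cryptographic hash to 8 bytes. The sketch below uses BLAKE2b from Python's standard `hashlib`; that choice and the `digest64` helper are assumptions for illustration, not something the original application uses.

```python
import hashlib

def digest64(data: bytes) -> int:
    """Return a 64-bit integer fingerprint: BLAKE2b truncated to 8 bytes."""
    raw = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(raw, "big")

old = digest64(b"cached value, version 1")
new = digest64(b"cached value, version 2")
print(old != new)  # differing fingerprints indicate the data changed
```

Any 64-bit value is still subject to the same birthday bound, so this only brings the behavior closer to the ideal case; it does not reduce the collision probability below it.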

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow