Anonymizing IPv6 addresses
Question
As required by law in several countries, we anonymize the IP addresses of our users in our log files. With IPv4 we simply anonymize the last two bytes, e.g. instead of 255.255.255.255 we log 255.255.\*.\*
What algorithm would you recommend to anonymize IPv6 addresses?
Solution
At the very least you want to strip off the EUI-64, i.e. the last 64 bits of the address. More realistically, you want to strip quite a lot more to really be private, since the remaining part still identifies a single subnet (possibly a single household).
IPv6 global addressing is very hierarchical. From RFC 2374:
    | 3|  13 | 8 |   24   |   16   |          64 bits               |
    +--+-----+---+--------+--------+--------------------------------+
    |FP| TLA |RES|  NLA   |  SLA   |         Interface ID           |
    |  | ID  |   |  ID    |  ID    |                                |
    +--+-----+---+--------+--------+--------------------------------+
    <--Public Topology--->   Site
                           <-------->
                            Topology
                                     <------Interface Identifier----->
The question becomes: how private is private enough? Strip 64 bits (leaving a /64) and you've identified a LAN subnet, not a user. Strip another 16 on top of that (leaving a /48) and you've identified a small organisation, i.e. a customer of an ISP, such as a company or branch office with several subnets. Strip the next 24 off (leaving a /24) and you've basically identified only an ISP or a really big organisation.
You can implement this with a bitmask, exactly as you would for an IPv4 address. At that point, though, the question is a legal one rather than a technical one: "how much do I need to strip to comply with the specific legislation?"
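As a rough illustration of the bitmask approach, here is a minimal Python sketch using the standard `ipaddress` module. The function name `anonymize_ipv6` and the `keep_bits` parameter are made up for this example, and the default of 48 bits is only a placeholder, not a recommendation for any particular legislation.

```python
import ipaddress


def anonymize_ipv6(address: str, keep_bits: int = 48) -> str:
    """Zero every bit after the first `keep_bits` bits of an IPv6 address.

    `keep_bits` is a hypothetical parameter: 64 keeps the subnet,
    48 keeps the site prefix, 24 keeps roughly the ISP-level prefix.
    """
    if not 0 <= keep_bits <= 128:
        raise ValueError("keep_bits must be between 0 and 128")
    addr = ipaddress.IPv6Address(address)
    # Build a 128-bit mask with the top `keep_bits` bits set to 1.
    mask = ((1 << keep_bits) - 1) << (128 - keep_bits)
    # AND the address with the mask, exactly as with an IPv4 netmask.
    return str(ipaddress.IPv6Address(int(addr) & mask))


full = "2001:0db8:85a3:0000:0000:8a2e:0370:7334"
print(anonymize_ipv6(full, 64))  # 2001:db8:85a3::
print(anonymize_ipv6(full, 24))  # 2001:d00::
```

The output is in the compressed notation that `ipaddress` produces by default; which value of `keep_bits` is acceptable remains the legal question discussed above.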
Other tips
To anonymize public IPv6 addresses you could keep the first 2 groups and replace the remaining part with its CRC-16. Some examples (where abc1 and abc2 stand for the CRC-16 values):
- 2001:0db8:85a3:0000:0000:8a2e:0370:7334 -> 2001:0db8-abc1
- 2a02:200:7::123 -> 2a02:200-abc2
Such shortening makes it easy to match the shortened form (with some probability, of course) against the non-anonymized IPv6 addresses in full logs that have a shorter retention time, which is useful when investigating problems or security incidents.
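A minimal Python sketch of this shortening might look as follows. The answer does not say which CRC-16 variant to use, so this assumes the CRC-CCITT implementation in the standard library (`binascii.crc_hqx`) with an initial value of 0, computed over the last 96 bits of the address. The function name `shorten_ipv6` is hypothetical, and the groups are zero-padded to four hex digits (so `200` becomes `0200`) to keep matching consistent.

```python
import binascii
import ipaddress


def shorten_ipv6(address: str) -> str:
    """Keep the first two groups and append a CRC-16 of the rest."""
    addr = ipaddress.IPv6Address(address)
    # exploded gives all 8 groups, each padded to 4 hex digits.
    groups = addr.exploded.split(":")
    # packed is the 16-byte form; bytes 4..15 are the last six groups.
    remainder = addr.packed[4:]
    # Assumption: CRC-CCITT with initial value 0 stands in for "CRC-16".
    crc = binascii.crc_hqx(remainder, 0)
    return f"{groups[0]}:{groups[1]}-{crc:04x}"


print(shorten_ipv6("2001:0db8:85a3:0000:0000:8a2e:0370:7334"))
print(shorten_ipv6("2a02:200:7::123"))
```

Because the address is parsed before the checksum is computed, different spellings of the same address (compressed or fully written out) produce the same shortened form.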