I'm planning to implement a non-fixed-size Fenwick tree. That is, a Fenwick tree that allows interleaving range queries with adding/removing elements.

All implementations and samples I've seen so far are for fixed-size Fenwick trees, where the number of elements is known before preprocessing the frequencies. They use a fixed size array, so it's not possible to add or remove elements after preprocessing is done. (Well, it is possible, but I'd need to re-build the structure).

I thought of extending TreeMap or maybe AbstractMap, but as TreeMap is actually a red-black tree, I don't know how to implement the red-black mechanics (so that the tree remains balanced) without losing the cumulative sums of the nodes involved in the rebalancing process.

So I thought that maybe I should take another approach: why not extend or adapt a simple random-access ArrayList and re-calculate all cumulative sums when the underlying array is resized? This would of course have an impact but, hey, that's exactly what a HashMap does.

This is why I wanted to ask here first, in case someone has already done it, and to check which approach you think is the best.

Solution

The Fenwick Tree is a space-efficient data structure because it forgoes the original data and instead stores only an "encoded" version of it.

As you have noticed, however, this encoding means that you cannot add or remove an element (except at the tail) without a lot of churn.
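To illustrate the tail exception, here is a minimal sketch, assuming a 1-based Fenwick layout stored in an ArrayList. The class name `AppendableFenwick` and its methods are hypothetical, chosen only for this example. Appending works because the new slot covers a range whose sums are already known, and dropping the last slot works because no other slot includes it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a Fenwick tree that can grow and shrink at the tail only.
class AppendableFenwick {
    // tree.get(i) holds the sum of the 1-based range (i - lowbit(i), i]; slot 0 is unused.
    private final List<Long> tree = new ArrayList<>(List.of(0L));

    int size() {
        return tree.size() - 1;
    }

    // Sum of the first `count` elements.
    long prefixSum(int count) {
        long sum = 0;
        for (int i = count; i > 0; i -= i & -i) {
            sum += tree.get(i);
        }
        return sum;
    }

    // Add `delta` to the element at 0-based position `index`.
    void add(int index, long delta) {
        for (int i = index + 1; i < tree.size(); i += i & -i) {
            tree.set(i, tree.get(i) + delta);
        }
    }

    // Append: the new slot i covers (i - lowbit(i), i], which only needs existing prefix sums.
    void append(long value) {
        int i = tree.size();
        int low = i & -i;
        tree.add(value + prefixSum(i - 1) - prefixSum(i - low));
    }

    // Drop the last element: slots before it never cover it, so nothing else changes.
    void removeLast() {
        if (size() == 0) {
            throw new IllegalStateException("empty");
        }
        tree.remove(tree.size() - 1);
    }
}
```

Inserting or removing anywhere else shifts every later element into a different slot, which is where the churn comes from.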

If your add/remove operations are infrequent enough that rebuilding the whole underlying array is feasible in your case, just do so. This will mean a latency spike on each add/remove, but should not affect throughput significantly.
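A rough sketch of the rebuild approach could look like this. It assumes you are willing to keep the raw values next to the Fenwick array (which gives up some of the space saving) so that a rebuild is a single O(N) pass. `RebuildingFenwick` and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rebuild the whole Fenwick array after every insert/remove.
class RebuildingFenwick {
    private final List<Long> values = new ArrayList<>(); // raw data, kept only so we can rebuild
    private long[] tree = new long[1];                    // 1-based Fenwick array, slot 0 unused

    int size() {
        return values.size();
    }

    // Insert `value` at 0-based position `index`: O(N) because of the rebuild.
    void insert(int index, long value) {
        values.add(index, value);
        rebuild();
    }

    // Remove the element at 0-based position `index`: O(N) because of the rebuild.
    void remove(int index) {
        values.remove(index);
        rebuild();
    }

    // Sum of the first `count` elements: O(log N).
    long prefixSum(int count) {
        long sum = 0;
        for (int i = count; i > 0; i -= i & -i) {
            sum += tree[i];
        }
        return sum;
    }

    // Sum of the elements at 0-based positions [from, to).
    long rangeSum(int from, int to) {
        return prefixSum(to) - prefixSum(from);
    }

    // Linear-time construction: push each finished slot into the next slot that covers it.
    private void rebuild() {
        int n = values.size();
        tree = new long[n + 1];
        for (int i = 1; i <= n; i++) {
            tree[i] += values.get(i - 1);
            int parent = i + (i & -i);
            if (parent <= n) {
                tree[parent] += tree[i];
            }
        }
    }
}
```

Queries stay O(log N); only the structural changes pay the linear cost.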

If, on the other hand, this overhead is not acceptable, then the Fenwick Tree is just not an appropriate data-structure for your issue. In this case, I would recommend an Augmented Binary Search Tree.

Take the Binary Search Tree implementation that you prefer (Red-Black Tree, AVL Tree, Splay Tree, ...) or even a B-Tree if you'd like, and augment the information each node/element carries with:

  • the number of nodes in its left and right sub-trees
  • the sum of the elements in its left and right sub-trees

When adding/removing elements you will need to keep the numbers and sums up-to-date on the path leading to the root. Adding and removing is O(log N) already, so since you'll be updating O(log N) nodes/elements you will not change the complexity significantly.
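As a sketch of what the augmentation might look like, the example below uses a treap (a randomized balanced BST) instead of a red-black tree, purely because its split/merge rebalancing fits in a few lines. It also stores one size and one sum per node for the whole sub-tree, which carries the same information as separate left/right counters. All names (`IndexedSumTreap`, `update`, and so on) are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch: a treap whose nodes carry sub-tree size and sub-tree sum.
class IndexedSumTreap {
    private static final class Node {
        final long value;
        final int priority;
        Node left, right;
        int size = 1; // number of nodes in this sub-tree
        long sum;     // sum of the values in this sub-tree

        Node(long value, int priority) {
            this.value = value;
            this.priority = priority;
            this.sum = value;
        }
    }

    private final Random random = new Random();
    private Node root;

    private static int size(Node n) { return n == null ? 0 : n.size; }
    private static long sum(Node n) { return n == null ? 0 : n.sum; }

    // Recompute the augmented fields from the children; called on every node a change touches.
    private static Node update(Node n) {
        n.size = 1 + size(n.left) + size(n.right);
        n.sum = n.value + sum(n.left) + sum(n.right);
        return n;
    }

    // Split into { first `count` elements, the rest }.
    private static Node[] split(Node n, int count) {
        if (n == null) return new Node[] {null, null};
        if (size(n.left) >= count) {
            Node[] parts = split(n.left, count);
            n.left = parts[1];
            return new Node[] {parts[0], update(n)};
        }
        Node[] parts = split(n.right, count - size(n.left) - 1);
        n.right = parts[0];
        return new Node[] {update(n), parts[1]};
    }

    // Concatenate two treaps, keeping the higher priority on top.
    private static Node merge(Node a, Node b) {
        if (a == null) return b;
        if (b == null) return a;
        if (a.priority > b.priority) {
            a.right = merge(a.right, b);
            return update(a);
        }
        b.left = merge(a, b.left);
        return update(b);
    }

    int size() { return size(root); }

    // Insert `value` so that it ends up at 0-based position `index`.
    void insert(int index, long value) {
        Node[] parts = split(root, index);
        root = merge(merge(parts[0], new Node(value, random.nextInt())), parts[1]);
    }

    // Remove the element at 0-based position `index` (no-op if out of range).
    void remove(int index) {
        Node[] head = split(root, index);
        Node[] tail = split(head[1], 1);
        root = merge(head[0], tail[1]);
    }

    // Sum of the first `count` elements, found by a single descent from the root.
    long prefixSum(int count) {
        long total = 0;
        Node n = root;
        while (n != null && count > 0) {
            int leftSize = size(n.left);
            if (count <= leftSize) {
                n = n.left;
            } else {
                total += sum(n.left) + n.value;
                count -= leftSize + 1;
                n = n.right;
            }
        }
        return total;
    }
}
```

Every structural change calls `update` only on the nodes along the affected path, so insert, remove and prefix sums all run in expected O(log N).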

Then, implement new methods to access an element by index and to retrieve a sum. Note that a node's index is not stored anywhere, but it can be derived during the descent. Counting from 1, if you know the left sub-tree of the current node has 4 nodes and you wish to access:

  • index 2, then you should go to the left sub-tree
  • index 5, then you are here
  • index 7, then you should go to the right sub-tree (and access index 2 in the right sub-tree: 7 - sizeof(left sub-tree) - 1)
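The three cases can be sketched as a short loop. This example reads the indices above as 1-based (an assumption, but the one under which all three cases agree) and builds a small fixed tree of nine values so that the root's left sub-tree has exactly 4 nodes. `IndexDescent`, `build` and `get` are hypothetical names.

```java
// Hypothetical sketch: find the k-th element (1-based) using only sub-tree sizes.
class IndexDescent {
    static final class Node {
        final int value;
        final Node left, right;
        final int size; // number of nodes in this sub-tree

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
            this.size = 1 + size(left) + size(right);
        }
    }

    static int size(Node n) {
        return n == null ? 0 : n.size;
    }

    // Build a balanced tree over values[from, to); only here to have something to query.
    static Node build(int[] values, int from, int to) {
        if (from >= to) return null;
        int mid = (from + to) >>> 1;
        return new Node(values[mid], build(values, from, mid), build(values, mid + 1, to));
    }

    // Return the element at 1-based position `index`.
    static int get(Node node, int index) {
        while (node != null) {
            int leftSize = size(node.left);
            if (index <= leftSize) {
                node = node.left;              // e.g. index 2 with leftSize 4
            } else if (index == leftSize + 1) {
                return node.value;             // e.g. index 5 with leftSize 4
            } else {
                index -= leftSize + 1;         // e.g. index 7 becomes 7 - 4 - 1 = 2
                node = node.right;
            }
        }
        throw new IndexOutOfBoundsException();
    }

    public static void main(String[] args) {
        Node root = build(new int[] {10, 20, 30, 40, 50, 60, 70, 80, 90}, 0, 9);
        System.out.println(get(root, 2)); // left sub-tree
        System.out.println(get(root, 5)); // the root itself
        System.out.println(get(root, 7)); // index 2 of the right sub-tree
    }
}
```

A prefix-sum query follows the same descent, adding the left sub-tree's sum and the node's own value each time it steps to the right.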

Other tips

This is the first time I've heard of this data structure. What size of tree are you aiming for? If it's only thousands of elements, it's probably not going to matter much which underlying data structure you start with.

From Wikipedia, I understand that index-based access to the structure is important. For that reason, I would go with an implementation based on plain ArrayList to start with.

Remember that the point of an ADT is that you can swap out the implementation details while keeping the interface intact.

Licensed under: CC-BY-SA with attribution