Why is a Dictionary “not ordered”?

https://stackoverflow.com/questions/6384710

29-10-2019
|

سؤال

I have read this in answer to many questions on here. But what exactly does it mean?

var test = new Dictionary<int, string>();
test.Add(0, "zero");
test.Add(1, "one");
test.Add(2, "two");
test.Add(3, "three");

Assert(test.ElementAt(2).Value == "two");

The above code seems to work as expected. So in what manner is a dictionary considered unordered? Under what circumstances could the above code fail?

المحلول

Well, for one thing it's not clear whether you expect this to be insertion-order or key-order. For example, what would you expect the result to be if you wrote:

var test = new Dictionary<int, string>();
test.Add(3, "three");
test.Add(2, "two");
test.Add(1, "one");
test.Add(0, "zero");

Console.WriteLine(test.ElementAt(0).Value);

Would you expect "three" or "zero"?

As it happens, I think the current implementation preserves insertion ordering so long as you never delete anything - but you must not rely on this. It's an implementation detail, and that could change in the future.

Deletions also affect this. For example, what would you expect the result of this program to be?

using System;
using System.Collections.Generic;

class Test
{ 
    static void Main() 
    {
        var test = new Dictionary<int, string>();
        test.Add(3, "three");
        test.Add(2, "two");
        test.Add(1, "one");
        test.Add(0, "zero");

        test.Remove(2);
        test.Add(5, "five");

        foreach (var pair in test)
        {
            Console.WriteLine(pair.Key);
        }
    }     
}

It's actually (on my box) 3, 5, 1, 0. The new entry for 5 has used the vacated entry previously used by 2. That's not going to be guaranteed either though.

Rehashing (when the dictionary's underlying storage needs to be expanded) could affect things... all kinds of things do.

Just don't treat it as an ordered collection. It's not designed for that. Even if it happens to work now, you're relying on undocumented behaviour which goes against the purpose of the class.

نصائح أخرى

A Dictionary<TKey, TValue> represents a Hash Table and in a hashtable there is no notion of order.

The documentation explains it pretty well:

For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair structure representing a value and its key. The order in which the items are returned is undefined.

There's a lot of good ideas here, but scattered, so I'm going to try to create an answer that lays it out better, even though the problem has been answered.

First, a Dictionary has no guaranteed order, so you use it only to quickly look up a key and find a corresponding value, or you enumerate through all the key-value pairs without caring what the order is.

If you want order, you use an OrderedDictionary but the tradeoff is that lookup is slower, so if you don't need order, don't ask for it.

Dictionaries (and HashMap in Java) use hashing. That is O(1) time regardless of the size of your table. Ordered dictionaries typically use some sort of balanced tree which is O(log2(n)) so as your data grows, access gets slower. To compare, for 1 million elements, that's on the order of 2^20, so you'd have to do on the order of 20 lookups for a tree, but 1 for a hash map. That's a LOT faster.

Hashing is deterministic. Non-determinism means when you hash(5) the first time, and you hash(5) the next time, you get a different place. That would be completely useless.

What people meant to say is that if you add things to a dictionary, the order is complicated, and subject to change any time you add (or potentially remove) an element. For example, imagine the hash table has 500k elements into it, and you have 400k values. When you add one more, you reach the critical threshhold because it needs about 20% empty space to be efficient, so it allocates a bigger table (say, 1 million entries) and re-hashes all the values. Now they are all in different locations than they were before.

If you build the same Dictionary twice (read my statement carefully, THE SAME), you will get the same order. But as Jon correctly says, don't count on it. Too many things can make it not the same, even the initially allocated size.

This brings up an excellent point. It is really, really expensive to have to resize a hashmap. That means you have to allocate a bigger table, and re-insert every key-value pair. So it is well worth allocating 10x the memory it needs rather than have even a single grow have to happen. Know your size of hashmap, and preallocate enough if at all possible, it's a huge performance win. And if you have a bad implementation that doesn't resize, it can be a disaster if you pick too small of a size.

Now what Jon argued with me about in my comment in his answer was that if you add objects to a Dictionary in two different runs, you will get two different orderings. True, but that's not the dictionary's fault.

When you say:

new Foo();

you are creating a new object at a new location in memory.

If you use the value Foo as the key in a dictionary, with no other information, the only thing they can do is use the address of the object as the key.

That means that

var f1 = new Foo(1);
var f2 = new Foo(1);

f1 and f2 are not the same object, even if they have the same values.

So if you were to put them into Dictionaries:

var test = new Dictionary<Foo, string>();
test.Add(f1, "zero");

don't expect it to be the same as:

var test = new Dictionary<Foo, string>();
test.Add(f2, "zero");

even if both f1 and f2 have the same values. That has nothing to do with the deterministic behavior of the Dictionary.

Hashing is an awesome topic in computer science, my favorite to teach in data structures.

Check out Cormen and Leiserson for a high end book on red-black trees vs. hashing This guy named Bob has a great site about hashing, and optimal hashes: http://burtleburtle.net/bob

The order is non-deterministic.

From here

For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair structure representing a value and its key. The order in which the items are returned is undefined.

Maybe for your needs OrderedDictionary is the required.

I don't know C# or any of .NET, but the general concept of a Dictionary is that it's a collection of key-value pairs.
You don't access sequentially to a dictionary as you would when, for example, iterating a list or array.
You access by having a key, then finding whether there's a value for that key on the dictionary and what is it.
In your example you posted a dictionary with numerical keys which happen to be sequential, without gaps and in ascending order of insertion.
But no matter in which order you insert a value for key '2', you will always get the same value when querying for key '2'.
I don't know if C# permits, I guess yes, to have key types other than numbers, but in that case, it's the same, there's no explicit order on the keys.
The analogy with a real life dictionary could be confusing, as the keys which are the words, are alphabetically ordered so we can find them faster, but if they weren't, the dictionary would work anyway, because the definition of the word "Aardvark" would have the same meaning, even if it came after "Zebra". Think of a novel, in the other hand, changing the order of the pages wouldn't make any sense, as they are an ordered collection in essence.

The class Dictionary<TKey,TValue> is implemented using an array-backed index-linked list. If no items are ever removed, the backing store will hold items in order. When an item is removed, however, the space will be marked for reuse before the array is expanded. As a consequence, if e.g. ten items are added to a new dictionary, the fourth item is deleted, a new item is added, and the dictionary is enumerated, the new item will likely appear fourth rather than tenth, but there is no guarantee that different versions of Dictionary will handle things the same way.

IMHO, it would have been helpful for Microsoft to document that a dictionary from which no items are ever deleted will enumerate items in the original order, but that once any items are deleted, any future changes to the dictionary may arbitrary permute the items therein. Upholding such a guarantee as long as no items are deleted would be relatively cheap for most reasonable dictionary implementations; continuing to uphold the guarantee after items are deleted would be much more expensive.

Alternatively, it might have been helpful to have an AddOnlyDictionary which would be thread-safe for a single writer simultaneous with any number of readers, and guarantee to retain items in sequence (note that if items are only added--never deleted or otherwise modified--one may take a "snapshot" merely by noting how many items it presently contains). Making a general-purpose dictionary thread-safe is expensive, but adding the above level of thread-safety would be cheap. Note that efficient multi-writer multi-reader usage would not require use of a reader-writer lock, but could simply be handled by having writers lock and having readers not bother to.

Microsoft didn't implement an AddOnlyDictionary as described above, of course, but it's interesting to note that the thread-safe ConditionalWeakTable has add-only semantics, probably because--as noted--it's much easier to add concurrency to add-only collections than to collections which allow deletion.

Dictionary< string, Obj>, not SortedDictionary< string, Obj >, is default to sequence by the insertion order. Strange enough you need to specifically declare a SortedDictionary to have a dictionary that sorted by key string order:

public SortedDictionary<string, Row> forecastMTX = new SortedDictionary<string, Row>();

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow