Question

I'm adding items to a StringDictionary and it's possible that a duplicate key will come up. This will of course throw an exception.

If the chance of duplicates is very low (ie it will rarely happen), am I better off using a Try Catch block and leaving it unhandled, or should I always do a .ContainsKey check before adding each entry?

I'm assuming that if the likelihood of duplicate keys was high, then allowing exceptions would be a poor decision as they are expensive.

Thoughts?

Edit

I used reflector on the generic Dictionary and found the following for ContainsKey and TryGetValue, as both were mentioned below.

public bool TryGetValue(TKey key, out TValue value)
{
    int index = this.FindEntry(key);
    if (index >= 0)
    {
        value = this.entries[index].value;
        return true;
    }
    value = default(TValue);
    return false;
}

And

public bool ContainsKey(TKey key)
{
    return (this.FindEntry(key) >= 0);
}

Am I missing something, or is TryGetValue doing more work than ContainsKey ?


I appreciate the responses, and for my current purpose I'm going to go with doing a ContainsKey call as the collection will be small, and the code more readable.

Was it helpful?

Solution

How to approach this depends on what you want to do if a collision happens. If you want to keep the first inserted value, you should use ContainsKey to check before inserting. If, on the other hand, you want to use the last value for that key, you can do like so:

// c# sample:
myDictionary[key] = value;

As a side note: I would probably, if possible, use Dictionary<string, string> instead of StringDictionary. If nothing else that will give you access to some more Linq extension methods.

OTHER TIPS

I would do the Contains check.

My reasoning is exceptions should be saved for those things that just shouldn't happen. If they do then alarm bells should be rang and calvery brought in. Just seems odd to me to use exceptions for known issue case handling especially when you can test for it.

If at all possible, replace StringDictionary with Dictionary<string, string>, and use TryGetValue. This avoids both exception handling overhead, and double lookup.

I did some benchmarking regarding this. But I have to reiterate Kelsey's point:

exceptions should be saved for those things that just shouldn't happen. If they do then alarm bells should be rang and calvary brought in. Just seems odd to me to use exceptions for known issue case handling especially when you can test for it.

It makes sense because the performance gain you gain by going for try-catch (if at all) is trivial but the "catch" can be more penalizing. Here's the test:

public static void Benchmark(Action method, int iterations = 10000)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < iterations; i++)
        method();

    sw.Stop();
    MessageBox.Show(sw.Elapsed.TotalMilliseconds.ToString());
}

public static string GetRandomAlphaNumeric()
{
    return Path.GetRandomFileName().Replace(".", "").Substring(0, 8);
}

var dict = new Dictionary<string, int>();

No duplicates:

Benchmark(() =>
{
    // approach 1
    var key = GetRandomAlphaNumeric();
    if (!dict.ContainsKey(key))
        dict.Add(item, 0);

    // approach 2
    try
    {
        dict.Add(GetRandomAlphaNumeric(), 0);
    }
    catch (ArgumentException)
    {

    }
}, 100000);  

50% duplicates:

for (int i = 0; i < 50000; i++)
{
    dict.Add(GetRandomAlphaNumeric(), 0);  
}

var lst = new List<string>();
for (int i = 0; i < 50000; i++)
{
    lst.Add(GetRandomAlphaNumeric());
}
lst.AddRange(dict.Keys);
Benchmark(() =>
{
    foreach (var key in lst)
    {
        // approach 1
        if (!dict.ContainsKey(key))
            dict.Add(key, 0);

        // approach 2
        try
        {
            dict.Add(key, 0);
        }
        catch (ArgumentException)
        {

        }
    }
}, 1);  

100% duplicates

var key = GetRandomAlphaNumeric();
dict.Add(key, 0);
Benchmark(() =>
{
    // approach 1
    if (!dict.ContainsKey(key))
        dict.Add(item, 0);

    // approach 2
    try
    {
        dict.Add(key, 0);
    }
    catch (ArgumentException)
    {

    }
}, 100000);

Results:

No duplicates

approach 1: debug -> 630 ms - 680 ms; release -> 620 ms - 640 ms

approach 2: debug -> 640 ms - 690 ms; release -> 640 ms - 670 ms

50% duplicates

approach 1: debug -> 26 ms - 39 ms; release -> 25 ms - 33 ms

approach 2: debug -> 1340 ms; release -> 1260 ms

100% duplicates

approach 1: debug -> 7 ms; release -> 7 ms

approach 2: debug -> 2600 ms; release -> 2400 ms

You can see that as duplicates increase, try-catch performs poorly. Even in the worst case where you have no duplicates at all, the performance gain of try-catch isn't anything substantial.

Unless this is a very large dictionary or in a critical inner loop of code, you will probably not see a difference.

The .ContainsKey check will cost you a little performance every time, while the thrown exception will cost you a bit more performance rarely. If the chances of a duplicate are indeed low, go with allowing the Exception.

If you actually do want to be able to manage duplicate keys, you might look at MultiDictionary in PowerCollections

There are 2 ways to look at this...

Performance

If you're looking at it from a performance perspective, you have to consider:

  • how expensive it is to calculate the Hash of the type for the Key and
  • how expensive it is to create and throw exceptions

I can't think of any hash calculation that would be more expensive than throwing an exception. Remember, when exceptions are thrown, they have be marshaled across interop to Win32 API, be created, crawl the entire stack trace, and stop and process any catch block encountered. Exception throwing in the CLR is still treated as HRESULTs handled by Win32 API under the covers. From Chris Brumme's blog:

Of course, we can’t talk about managed exceptions without first considering Windows Structured Exception Handling (SEH). And we also need to look at the C++ exception model. That’s because both managed exceptions and C++ exceptions are implemented on top of the underlying SEH mechanism, and because managed exceptions must interoperate with both SEH and C++ exceptions.

Performance Conslusion: Avoid exceptions


Best Practice

The .NET Framework Design Guidelines are a good ruleset to follow (with rare exception - pun intended). Design Guidelines Update: Exception Throwing. There is something called the "Try/Doer" pattern mention in the guidelines which in this case the recommendation is to avoid the exceptions:

Consider the Tester-Doer pattern for members which may throw exceptions in common scenarios to avoid performance problems related to exceptions.

Exceptions should also never be used as a control-flow mechanism - again written in the CLR Design Guidelines.

Best Practice Conclusion: Avoid Exceptions

This issue is a lot simpler as of now for .NET Standard 2.1+ and .NET Core 2.0+. Just use the TryAdd method. It handles all the issues you have the most graceful way.

I try to avoid Exceptions everywhere I can - they're expensive to handle, and they can complicate the code. Since you know that a collision is possible, and it's trivial to do the .Contains check, I'd do that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top