Question

I have a function that receives three different "people" objects and generates a new "compatibility" object based on the combined values in the "people" objects.

However, about a third of the time the three "people" objects it receives are the same three it has received before, though possibly in a different order. In these cases I do NOT want to create a new "score" object; I simply want to return a value contained in the existing one.

Originally, the program looped through the List<> of "compatibility" objects, searching for the one belonging to these three "people" (each "compatibility" object contains an array of people objects). This was really slow, considering that there are thousands of "compatibility" objects and over a million "people" objects.

I had the idea of using a dictionary whose key is a number generated by combining the three people objects' ID values into a single UInt64 with XOR, and storing the score objects as dictionary values rather than in a list. This cuts the time roughly in half, which is acceptable performance-wise, but there are far too many collisions, and it returns the wrong score too often.
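
To illustrate why the XOR key collides, here is a minimal C# sketch. The IDs are arbitrary example values and XorKeyDemo/XorKey are hypothetical names. XOR ignores order, which is what made it attractive, but many different trios reduce to the same number:

```csharp
using System;

// Hypothetical helper reproducing the XOR-combined key described in the question.
public static class XorKeyDemo
{
    // Commutative (argument order doesn't matter) but NOT unique:
    // distinct trios can produce the same value.
    public static ulong XorKey(ulong a, ulong b, ulong c) => a ^ b ^ c;
}
```

Both XorKey(1, 2, 3) and XorKey(1, 4, 5) evaluate to 0, so a dictionary keyed directly on the XOR value cannot tell those two trios apart.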

Any suggestions or pointers would be much appreciated.

Edit: To add to the original question, each "people" object has a bunch of other fields that I could use, but the problem is making a key that is UNIQUE and COMMUTATIVE.

Solution

I think you're looking at things in a much too complex manner. Take the three PersonID values and sort them, so that they're always in the same order no matter which order they were passed in. Then store the value in a hashtable under a key made of the three sorted PersonIDs, separated by a hyphen or some other character that can't occur in a PersonID. Later, check whether the hashtable already contains a value for that key.

So if the three PersonIDs are 10, 5 and 22, the hash key could be something like "5-10-22".
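
A minimal C# sketch of the sorted, hyphen-separated key, assuming the PersonIDs are non-negative integers (a negative ID would itself contain a hyphen). TripleKey and Make are hypothetical names:

```csharp
using System;

// Hypothetical helper: builds an order-independent, collision-free string key.
public static class TripleKey
{
    public static string Make(long id1, long id2, long id3)
    {
        long[] ids = { id1, id2, id3 };
        Array.Sort(ids);              // canonical order, so argument order no longer matters
        return string.Join("-", ids); // "-" cannot appear inside a non-negative integer ID
    }
}
```

With this, TripleKey.Make(10, 5, 22) and TripleKey.Make(22, 10, 5) both produce "5-10-22", so a Dictionary<string, TValue> lookup finds the stored value regardless of argument order, and two different trios never share a key.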

OTHER TIPS

Create the key by concatenating the object IDs after sorting the trio in a predetermined order.

Your best option would be a custom IEqualityComparer class. Declare your Dictionary like this:

Dictionary<List<People>, Compatibility> people = 
    new Dictionary<List<People>, Compatibility>(new PersonListComparer());

You'll need to create a PersonListComparer class that implements IEqualityComparer<List<People>>. There are two methods to implement: one that computes a hash code and one that tests equality. The Dictionary uses GetHashCode to determine whether two lists are POSSIBLY equal, and Equals to determine whether they actually are (in other words, the hash code is fast and can give a false positive, but never a false negative). Use your existing hashing algorithm (the XOR) for GetHashCode, then compare the two lists explicitly in the Equals method, ignoring the order of the people, since the same three people may arrive in a different order.
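
A sketch of what PersonListComparer might look like. People here is a hypothetical stand-in with a long Id property; substitute your own class and ID field. GetHashCode reuses the XOR idea, while Equals does the real, order-insensitive comparison, so an XOR collision only costs an extra Equals call instead of returning the wrong value:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's "people" class; only Id is used.
public sealed class People
{
    public long Id { get; }
    public People(long id) => Id = id;
}

public sealed class PersonListComparer : IEqualityComparer<List<People>>
{
    // XOR is order-independent, so equal trios always get equal hash codes.
    // Different trios may collide; the Dictionary resolves that with Equals.
    public int GetHashCode(List<People> list)
    {
        long h = 0;
        foreach (var p in list) h ^= p.Id;
        return h.GetHashCode();
    }

    // The actual equality test: same IDs, in any order.
    public bool Equals(List<People>? x, List<People>? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null || x.Count != y.Count) return false;
        return x.Select(p => p.Id).OrderBy(id => id)
                .SequenceEqual(y.Select(p => p.Id).OrderBy(id => id));
    }
}
```

For example, the lists for IDs (1, 2, 3) and (1, 4, 5) get the same hash code (both XOR to 0), but Equals returns false, so each trio keeps its own entry.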

This should do the trick!

Why not use the names of the people as the dictionary key? (Sort the names first, so that the order in which they're passed doesn't matter.) I.e., John, Alice, and Bob become something like my_dictionary["Alice_Bob_John"]: if that key exists, you've already computed the score; otherwise, you need to compute it. As an alternative to the string hacking above, you could use a struct:

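NameTriple n = new NameTriple("John", "Alice", "Bob");
// NameTriple internally sorts the names, so argument order doesn't matter.
if (!my_dictionary.TryGetValue(n, out var score))
{
    // Not cached yet: compute the score and store it as my_dictionary[n].
}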
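
One possible NameTriple, sketched in C#. The answer only names the type, so the property names A, B, C and the use of ordinal string comparison are assumptions. Sorting once in the constructor means equality and hashing never depend on argument order:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical implementation of the NameTriple idea.
public readonly struct NameTriple : IEquatable<NameTriple>
{
    public string A { get; }
    public string B { get; }
    public string C { get; }

    public NameTriple(string x, string y, string z)
    {
        string[] names = { x, y, z };
        Array.Sort(names, StringComparer.Ordinal); // fixed, culture-independent order
        A = names[0];
        B = names[1];
        C = names[2];
    }

    public bool Equals(NameTriple other) =>
        string.Equals(A, other.A, StringComparison.Ordinal) &&
        string.Equals(B, other.B, StringComparison.Ordinal) &&
        string.Equals(C, other.C, StringComparison.Ordinal);

    public override bool Equals(object? obj) => obj is NameTriple t && Equals(t);

    // string.GetHashCode is ordinal, consistent with the ordinal Equals above.
    public override int GetHashCode() => HashCode.Combine(A, B, C);
}
```

With this, new NameTriple("Bob", "John", "Alice") compares equal to, and hashes the same as, new NameTriple("John", "Alice", "Bob"), so either one finds the same Dictionary<NameTriple, TValue> entry.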

If you want to keep everything in memory and not use a database, I'd recommend something akin to a tree structure. Assuming your object IDs are sortable and order doesn't matter, you can accomplish this with nested dictionaries.

Namely, a Dictionary<Key, Dictionary<Key, Dictionary<Key, Compatibility>>> should do the trick. Sort the IDs, then use the lowest as the key into the outer dictionary, the middle one as the key into the next level, and the highest to find the compatibility object. This way there are no collisions, and lookups should be quite fast.
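
A minimal C# sketch of the nested-dictionary approach, assuming long IDs. TripleTree, TryGet and Add are hypothetical names, and TValue stands in for the Compatibility type:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical three-level lookup: lowest ID -> middle ID -> highest ID -> value.
public sealed class TripleTree<TValue>
{
    private readonly Dictionary<long, Dictionary<long, Dictionary<long, TValue>>> _root =
        new Dictionary<long, Dictionary<long, Dictionary<long, TValue>>>();

    // Sorting gives every ordering of the same trio one canonical path through the tree.
    private static (long Lo, long Mid, long Hi) Sorted(long a, long b, long c)
    {
        long[] ids = { a, b, c };
        Array.Sort(ids);
        return (ids[0], ids[1], ids[2]);
    }

    public bool TryGet(long a, long b, long c, out TValue value)
    {
        var (lo, mid, hi) = Sorted(a, b, c);
        value = default!;
        return _root.TryGetValue(lo, out var level2)
            && level2.TryGetValue(mid, out var level3)
            && level3.TryGetValue(hi, out value);
    }

    public void Add(long a, long b, long c, TValue value)
    {
        var (lo, mid, hi) = Sorted(a, b, c);
        if (!_root.TryGetValue(lo, out var level2))
        {
            level2 = new Dictionary<long, Dictionary<long, TValue>>();
            _root[lo] = level2;
        }
        if (!level2.TryGetValue(mid, out var level3))
        {
            level3 = new Dictionary<long, TValue>();
            level2[mid] = level3;
        }
        level3[hi] = value; // replaces a value only if this exact trio was stored before
    }
}
```

Each ordering of the same trio follows the same sorted path, so t.TryGet(22, 10, 5, out var c) finds a value stored with t.Add(10, 5, 22, value), and trios that differ in any ID end up at different leaves.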

Or, now that I think again, this doesn't have to be that complicated. Just use a string as a key and concatenate the IDs together in sorted order with a "!" or something else in between that doesn't occur naturally in the IDs.

Assuming all "Person" objects are unique, store a UUID in each object.

In your function, statically store the quad (P1,P2,P3,V), where P1,P2,P3 are the UUIDs of the three Person objects, sorted (to avoid the ordering problem), and V is the result of the previous calculation.

Your function then checks whether there is already an entry for this triplet of Persons; if not, it does the work and stores the result.

You can store the (P1,P2,P3,V) values in a dictionary keyed on the sorted P values; the dictionary hashes that key internally but still compares the full key, so different triplets never share an entry.
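
A sketch of this approach in C#, using System.Guid as the UUID and a ValueTuple of the three sorted Guids as the dictionary key. Person, Uuid, CompatibilityCache, GetOrCompute and the compute delegate are hypothetical placeholders, and the double result stands in for V:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Person: each instance gets its own UUID when created.
public sealed class Person
{
    public Guid Uuid { get; } = Guid.NewGuid();
}

public static class CompatibilityCache
{
    // Static store of (P1, P2, P3) -> V; ValueTuple compares all three elements.
    private static readonly Dictionary<(Guid, Guid, Guid), double> Cache =
        new Dictionary<(Guid, Guid, Guid), double>();

    public static double GetOrCompute(Person a, Person b, Person c,
                                      Func<Person, Person, Person, double> compute)
    {
        Guid[] ids = { a.Uuid, b.Uuid, c.Uuid };
        Array.Sort(ids); // Guid implements IComparable<Guid>, giving a fixed order
        var key = (ids[0], ids[1], ids[2]);

        if (!Cache.TryGetValue(key, out double v))
        {
            v = compute(a, b, c); // runs only the first time this trio is seen
            Cache[key] = v;
        }
        return v;
    }
}
```

Because the key is the full sorted tuple rather than a hash of it, hashing only speeds up the lookup and equality is still checked on all three Guids. As written, the static cache is not thread-safe; concurrent callers would need a lock or a ConcurrentDictionary.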

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow