finding n largest differences between two lists

https://stackoverflow.com/questions/9323486

30-04-2021
|

Frage

I have two lists old and new, with the same number of elements.

I'm trying to write an efficient function that takes n as a parameter, compares the elements of two lists at the same locations (by index), finds n largest differences, and returns the indices of those n elements.

I was thinking this would be best solved by a value-sorted dictionary, but one isn't available in Python (and I'm not aware of any libraries that offer it). Perhaps there's a better solution?

Lösung

Whenever you think "n largest", think heapq.

>>> import heapq
>>> import random
>>> l1 = [random.randrange(100) for _ in range(100)]
>>> l2 = [random.randrange(100) for _ in range(100)]
>>> heapq.nlargest(10, (((a - b), a, b) for a, b in zip(l1, l2)))
[(78, 99, 21), (75, 86, 11), (69, 90, 21), (69, 70, 1), (60, 86, 26), (55, 95, 40), (52, 56, 4), (48, 98, 50), (46, 80, 34), (44, 81, 37)]

This will find the x largest items in O(n log x) time, where n is the total number of items in the list; sorting does it in O(n log n) time.

It just occurred to me that the above doesn't actually do what you asked for. You want an index! Still very easy. I'll also use abs here in case you want the absolute value of the difference:

>>> heapq.nlargest(10, xrange(len(l1)), key=lambda i: abs(l1[i] - l2[i]))
[91, 3, 14, 27, 46, 67, 59, 39, 65, 36]

Andere Tipps

Assuming the number of elements in the lists aren't huge, you could just difference all of them, sort, and pick the first n:

print sorted((abs(x-y) for x,y in zip(old, new)), reverse=True)[:n]

This would be O(k log k) where k is the length of your original lists.

If n is significantly smaller than k, the best idea would be to use the nlargest function provided by the heapq module:

import heapq
print heapq.nlargest(n, (abs(x-y) for x,y in zip(old, new))

This will be O(k log n) instead of O(k log k) which can be significant for k >> n. Also, if your lists are really big, you'd probably be better off using itertools.izip instead of the regular zip function.

From your question i think this is what you want:

In difference.py

l1 = [15,2,123,4,50]
l2 = [9,8,7,6,5]


l3 = zip(l1, l2)

def f(n):
    diff_val = 0
    index_val = 0
    l4 = l3[:n]

    for x,y in l4:
        if diff_val < abs(x-y):
            diff_val = abs(x-y)
            elem = (x, y)
            index_val = l3.index(elem)

    print "largest diff: ", diff_val
    print "index of values:", index_val


n = input("Enter value of n:") 

f(n)

Execution:

[avasal@avasal ]# python difference.py 
Enter value of n:4
largest diff:  116
index of values: 2
[avasal@avasal]#

if this is not what you want, consider elaborating the question little more..

>>> l = []
... for i in itertools.starmap(lambda x, y: abs(x-y), itertools.izip([1,2,3],   [100,102,330])):
...     l.append(i)
>>> l
5: [99, 100, 327]

itertools comes handy for repetitive tasks. From starmap converts tuples to *args. For reference. With max function you will be able to get the desired result. index function will help to find the position.

l.index(max(l)

>>> l.index(max(l))
6: 2

Here's a solution hacked together in numpy (disclaimer, I'm a novice in numpy so there may be even slicker ways to do this). I didn't combine any of the steps so it is very clear what each step was doing. The final value is a list of the indexes of the original lists in order of the highest delta. Picking the top n is simply sorted_inds[:n] and retrieving the values from each list or from the delta list is trivial.

I don't know how it compares in performance to the other solutions and it's obviously not going to show up with such a small data set, but it might be worth testing with your real data set as my understanding is that numpy is very very fast for numerical array operations.

Code

import numpy

list1 = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
list2 = numpy.array([9, 8, 7, 6, 5, 4, 3, 2, 1])

#Caculate the delta between the two lists
delta = numpy.abs(numpy.subtract(list1, list2))
print('Delta: '.ljust(20) + str(delta))

#Get a list of the indexes of the sorted order delta
sorted_ind = numpy.argsort(delta)
print('Sorted indexes: '.ljust(20) + str(sorted_ind))

#reverse sort
sorted_ind = sorted_ind[::-1]
print('Reverse sort: '.ljust(20) + str(sorted_ind))

Output

Delta:              [8 6 4 2 0 2 4 6 8]
Sorted indexes:     [4 3 5 2 6 1 7 0 8]
Reverse sort:       [8 0 7 1 6 2 5 3 4]

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow