finding n largest differences between two lists
-
30-04-2021 - |
Frage
I have two lists old
and new
, with the same number of elements.
I'm trying to write an efficient function that takes n
as a parameter, compares the elements of two lists at the same locations (by index), finds n
largest differences, and returns the indices of those n
elements.
I was thinking this would be best solved by a value-sorted dictionary, but one isn't available in Python (and I'm not aware of any libraries that offer it). Perhaps there's a better solution?
Lösung
Whenever you think "n largest", think heapq
.
>>> import heapq
>>> import random
>>> l1 = [random.randrange(100) for _ in range(100)]
>>> l2 = [random.randrange(100) for _ in range(100)]
>>> heapq.nlargest(10, (((a - b), a, b) for a, b in zip(l1, l2)))
[(78, 99, 21), (75, 86, 11), (69, 90, 21), (69, 70, 1), (60, 86, 26), (55, 95, 40), (52, 56, 4), (48, 98, 50), (46, 80, 34), (44, 81, 37)]
This will find the x largest items in O(n log x) time, where n is the total number of items in the list; sorting does it in O(n log n) time.
It just occurred to me that the above doesn't actually do what you asked for. You want an index! Still very easy. I'll also use abs
here in case you want the absolute value of the difference:
>>> heapq.nlargest(10, xrange(len(l1)), key=lambda i: abs(l1[i] - l2[i]))
[91, 3, 14, 27, 46, 67, 59, 39, 65, 36]
Andere Tipps
Assuming the number of elements in the lists aren't huge, you could just difference all of them, sort, and pick the first n
:
print sorted((abs(x-y) for x,y in zip(old, new)), reverse=True)[:n]
This would be O(k log k)
where k
is the length of your original lists.
If n
is significantly smaller than k
, the best idea would be to use the nlargest
function provided by the heapq
module:
import heapq
print heapq.nlargest(n, (abs(x-y) for x,y in zip(old, new))
This will be O(k log n)
instead of O(k log k)
which can be significant for k >> n
.
Also, if your lists are really big, you'd probably be better off using itertools.izip
instead of the regular zip
function.
From your question i think this is what you want:
In difference.py
l1 = [15,2,123,4,50]
l2 = [9,8,7,6,5]
l3 = zip(l1, l2)
def f(n):
diff_val = 0
index_val = 0
l4 = l3[:n]
for x,y in l4:
if diff_val < abs(x-y):
diff_val = abs(x-y)
elem = (x, y)
index_val = l3.index(elem)
print "largest diff: ", diff_val
print "index of values:", index_val
n = input("Enter value of n:")
f(n)
Execution:
[avasal@avasal ]# python difference.py
Enter value of n:4
largest diff: 116
index of values: 2
[avasal@avasal]#
if this is not what you want, consider elaborating the question little more..
>>> l = []
... for i in itertools.starmap(lambda x, y: abs(x-y), itertools.izip([1,2,3], [100,102,330])):
... l.append(i)
>>> l
5: [99, 100, 327]
itertools
comes handy for repetitive tasks. From starmap
converts tuples
to *args
. For reference. With max
function you will be able to get the desired result. index
function will help to find the position.
l.index(max(l)
>>> l.index(max(l))
6: 2
Here's a solution hacked together in numpy (disclaimer, I'm a novice in numpy so there may be even slicker ways to do this). I didn't combine any of the steps so it is very clear what each step was doing. The final value is a list of the indexes of the original lists in order of the highest delta. Picking the top n is simply sorted_inds[:n]
and retrieving the values from each list or from the delta list is trivial.
I don't know how it compares in performance to the other solutions and it's obviously not going to show up with such a small data set, but it might be worth testing with your real data set as my understanding is that numpy is very very fast for numerical array operations.
Code
import numpy
list1 = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
list2 = numpy.array([9, 8, 7, 6, 5, 4, 3, 2, 1])
#Caculate the delta between the two lists
delta = numpy.abs(numpy.subtract(list1, list2))
print('Delta: '.ljust(20) + str(delta))
#Get a list of the indexes of the sorted order delta
sorted_ind = numpy.argsort(delta)
print('Sorted indexes: '.ljust(20) + str(sorted_ind))
#reverse sort
sorted_ind = sorted_ind[::-1]
print('Reverse sort: '.ljust(20) + str(sorted_ind))
Output
Delta: [8 6 4 2 0 2 4 6 8]
Sorted indexes: [4 3 5 2 6 1 7 0 8]
Reverse sort: [8 0 7 1 6 2 5 3 4]