TAGS :Viewed: 3 - Published at: a few seconds ago

[ How do I compare 2 lists and order 1 based on the number of matches ]

Say you had a list, for example:

first_list = ['a', 'b', 'c']

And you had the following list:

second_list = ['a', 'a b c', 'abc zyx', 'ab cc ac']

How would you create a function that simply re-orders the second list based on the total number of times an element from the entire first list matches any part of an individual string from the second list?


For further clarity:

  • in the second list the 'a' string would contain 1 match
  • the 'a b c' string would contain 3 matches
  • the second example list would essentially end up in reverse order once the function has finished

My attempt:

first_list = ['a', 'b', 'c']
second_list = ['a', 'a b c', 'abc zyx', 'ab cc ac']

print second_list

i = 0
for keyword in first_list:
    matches = 0
    for s in second_list:
        matches += s.count(keyword)
        if matches > second_list[0].count(keyword):
            popped = second_list.pop(i)
            second_list.insert(0, popped)

print second_list

Answer 1


The most straightforward approach would be to use the key parameter of the sorted built-in function:

>>> sorted(second_list, key = lambda s: sum(s.count(x) for x in first_list), reverse=True)
['ab cc ac', 'a b c', 'abc zyx', 'a']

The key function is called once per item in the list to be sorted. Still this is inefficient since count takes linear time.

Answer 2


Similar answer:

first_list = ['a', 'b', 'c']    
second_list = ['a', 'a b c', 'abc zyx', 'ab cc ac']

#Find occurrences
list_for_sorting = []
for string in second_list:
    occurrences = 0
    for item in first_list:
        occurrences += string.count(item)

    list_for_sorting.append((occurrences, string))

#Sort list
sorted_by_occurrence = sorted(list_for_sorting, key=lambda tup: tup[0], reverse=True)
final_list = [i[1] for i in sorted_by_occurrence]
print(final_list)

['ab cc ac', 'a b c', 'abc zyx', 'a']

Answer 3


Here's one unstable way to do it:

>>> l1 = ['a', 'b', 'c']
>>> l2 = ['a', 'a b c', 'abc zyx', 'ab cc ac']
>>> [s for _, s in sorted(((sum(s2.count(s1) for s1 in l1), s2) for s2 in l2), reverse=True)]
['ab cc ac', 'abc zyx', 'a b c', 'a']

If stable sort is required you could leverage enumerate:

>>> l1 = ['a', 'b', 'c']
>>> l2 = ['a', 'a b c', 'ccc ccc', 'bb bb bb', 'aa aa aa']
>>> [x[-1] for x in sorted(((sum(s2.count(s1) for s1 in l1), -i, s2) for i, s2 in enumerate(l2)), reverse=True)]
['ccc ccc', 'bb bb bb', 'aa aa aa', 'a b c', 'a']

Above generates tuples where second item is a string from l2 and first item is the count of matches from l1:

>>> tuples = [(sum(s2.count(s1) for s1 in l1), s2) for s2 in l2]
>>> tuples
[(1, 'a'), (3, 'a b c'), (3, 'abc zyx'), (6, 'ab cc ac')]

Then those tuples are sorted to descending order:

>>> tuples = sorted(tuples, reverse=True)
>>> tuples
[(6, 'ab cc ac'), (3, 'abc zyx'), (3, 'a b c'), (1, 'a')]

And finally only the strings are taken:

>>> [s for _, s in tuples]
['ab cc ac', 'abc zyx', 'a b c', 'a']

In the second version tuples have reverse index to ensure stability:

>>> [(sum(s2.count(s1) for s1 in l1), -i, s2) for i, s2 in enumerate(l2)]
[(1, 0, 'a'), (3, -1, 'a b c'), (3, -2, 'abc zyx'), (6, -3, 'ab cc ac')]