The Occasional Occurence
A sorting problem
April 12, 2007 at 11:11 AM | categories: Python, GeneralHere's the situation.
I have a list of known field names.
specific_order = ['id', 'age', 'zzyzx', 'bar', 'spam_eggs']
That list is in the order that I want the resulting data displayed in. The dataset is a simple dictionary where the items in the specific_order list are the keys.
However, in each dataset there is the possibility that derived fields/keys will be present, like 'bar_foo' or 'id_new'.
In the final dataset, I would like the fields to be ordered first by the known field names, and then by any derived field names. So, something like:
final = ['id', 'id_new', 'age', 'zzyzx', 'bar', 'bar_foo', 'spam_eggs']
So, with the specific_order list and a copy of that list sorted by descending length (spo_by_length) I have created the following comparison function (to be passed to list.sort()):
def cmp_headers(a, b): ca, cb = None, None if (a not in specific_order) or (b not in specific_order): for candidate in spo_by_length: if (ca and cb): break if not ca and a.startswith(candidate): ca = candidate if not cb and b.startswith(candidate): cb = candidate a = ca or a b = cb or b if a in specific_order and b in specific_order: return cmp(specific_order.index(a), specific_order.index(b)) if a in specific_order: return -1 if b in specific_order: return 1 return cmp(a, b)
Here is some sample sorted output:
In [2]: keys = specific_order + ['id_old', 'id_new', 'bar_foo', 'spam_eggs_ham'] In [3]: sorted(keys, cmp_headers) Out[3]: ['id', 'id_old', 'id_new', 'age', 'zzyzx', 'bar', 'bar_foo', 'spam_eggs', 'spam_eggs_ham']
The part that concerns me in cmp_headers is the for loop. Is there any way that I can get the desired sort order without looping over the length-sorted list of fields? Is there a better sort algorithm in general for this sort of task?
cw
PS - this is somewhat academic, as this sort operation will be performed once per dataset, so not too expensive in the long run. However, I would like to know if there is a better way.