NaN values break sorting in Python

Some time ago, I gave a talk at the Boston Python meetup. The bottom line: NaN values in a list silently break sorting. Details below.

NaN stands for "not a number". It is a special floating-point value (not a separate data type) used for undefined or unrepresentable results. For example:

In [1]:
a = float('inf')
b = float('inf')
In [2]:
a / b
Out[2]:
nan
In [3]:
a - b
Out[3]:
nan

It’s very easy for NaN to enter your data:

In [4]:
from scipy.stats import pearsonr

a = [0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0]

pearsonr(a, b)[0]
Out[4]:
nan

It’s also very easy for NaN to quietly propagate:

In [5]:
c = float('nan')
c + 4022
Out[5]:
nan
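Because any arithmetic involving NaN yields NaN, a single bad value can contaminate an aggregate such as a sum or a mean. The following is a minimal sketch, not taken from the talk; the list `readings` and the helper `has_nan` are hypothetical names used only for illustration.

```python
import math

# Hypothetical data: one NaN hidden among ordinary floats.
readings = [1.5, float('nan'), 2.5]

# NaN propagates through the sum, and therefore through the mean.
total = sum(readings)
mean = total / len(readings)

def has_nan(values):
    """Return True if any value in the iterable is NaN."""
    return any(math.isnan(v) for v in values)

print(total, mean, has_nan(readings))
```

Checking with something like `has_nan` before aggregating is one way to catch the problem early instead of discovering a NaN result later.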

NaN silently breaks sorting

In [6]:
d = [4, float('nan'), 2, 1]

NaN values break sorting because they are neither smaller than, larger than, nor equal to any number. They don’t even compare as equal to themselves.

In [7]:
e = float('nan')
In [8]:
e < 15234
Out[8]:
False
In [9]:
e > 15234
Out[9]:
False
In [10]:
e == 15234
Out[10]:
False
In [11]:
e == e
Out[11]:
False
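These comparison results are why sorting goes wrong: the sort relies on `<`, and every comparison against NaN answers False. Here is a small sketch, assuming the list `d` from cell 6, that checks the result instead of relying on one specific output, since the exact arrangement can depend on the input order and possibly on the Python version. The name `finite` is introduced here for illustration.

```python
import math

d = [4, float('nan'), 2, 1]
result = sorted(d)  # runs without any exception or warning

# Drop the NaN and look at the order of the remaining numbers.
finite = [x for x in result if not math.isnan(x)]

# If sorting had worked, the remaining numbers would be in ascending order.
print(finite == sorted(finite))
```

The call to `sorted` succeeds silently, which is what makes the problem easy to miss.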

Ways to get around sorting with NaN values

In [12]:
import math
f = [4, float('nan'), 2, 1]
In [13]:
sorted([x for x in f if not math.isnan(x)])
Out[13]:
[1, 2, 4]
In [14]:
sorted([x for x in f if x == x])
Out[14]:
[1, 2, 4]
In [16]:
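# Map NaN to -inf so it sorts last in descending order, even if f contains zero or negative numbers
sorted(f, key=lambda x: x if not math.isnan(x) else float('-inf'), reverse=True)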
Out[16]:
[4, 2, 1, nan]
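Another option keeps the NaN values but pushes them to the end of an ascending sort. This is a sketch of a common idiom, not something from the talk, and `nan_last_key` is a hypothetical helper name. It sorts on a tuple whose first element flags NaN, so NaN itself never takes part in a comparison.

```python
import math

def nan_last_key(x):
    """Sort key that orders ordinary numbers normally and places NaN last."""
    is_nan = math.isnan(x)
    # False sorts before True; NaN is replaced by 0.0 so it is never compared.
    return (is_nan, 0.0 if is_nan else x)

f = [4, float('nan'), 2, 1]
result = sorted(f, key=nan_last_key)
print(result)
```

Because the flag is compared first, this approach does not depend on the values being positive, and it leaves the original NaN entries in the output so they can still be counted or inspected.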
