I've noticed some peculiar behavior with Python sets that contain NaN values. Consider the following scenarios:
When I create a set with duplicate NaN values using
{np.nan, np.nan}
only one value is retained in the set.
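A minimal, runnable version of the first case, assuming NumPy is installed. The `is` and `==` checks are extra lines I added to inspect the values themselves:

```python
import numpy as np

# Both entries in the literal refer to the same module-level float object np.nan
s = {np.nan, np.nan}
print(len(s))             # 1

print(np.nan is np.nan)   # True: one shared object
print(np.nan == np.nan)   # False: NaN never compares equal to itself
```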
Similarly, when I construct a set from duplicate tuples containing NaN, such as
{(np.nan,), (np.nan,)}
only one value is stored in the set.
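The tuple case as a runnable snippet, again assuming NumPy. The direct tuple comparison is my own addition; it suggests that two tuples holding the same NaN object compare equal, even though the NaN itself does not equal itself:

```python
import numpy as np

t1 = (np.nan,)
t2 = (np.nan,)
s = {t1, t2}
print(len(s))     # 1

# Tuple comparison checks each pair of elements for identity before calling ==,
# and both tuples hold the very same np.nan object.
print(t1 == t2)   # True
```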
However, when I create a set from tuples containing NaN values extracted from a pandas Series, such as
{(pd.Series([np.nan])[0],), (pd.Series([np.nan])[0],)}
the set contains both values.
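A runnable version of the pandas case, assuming both NumPy and pandas are installed. The type and identity checks are additions of mine to look at the scalars that pandas returns:

```python
import numpy as np
import pandas as pd

# Each expression builds a new Series and returns a new NumPy scalar from it
a = pd.Series([np.nan])[0]
b = pd.Series([np.nan])[0]

print(type(a))            # <class 'numpy.float64'>
print(a is b)             # False: two distinct objects, both NaN
print((a,) == (b,))       # False: identity check fails, then nan == nan is False
print(len({(a,), (b,)}))  # 2
```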
Why does this discrepancy occur? I suspect it is related to how NaN values compare for equality (NaN is never equal to itself), but it is not clear to me why using pandas should change the result.
I expected the behavior to be consistent across these different ways of creating sets, with duplicate NaN values treated the same way in each case.
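One way to probe the NaN-equality suspicion without pandas at all is a sketch with plain Python floats. It relies on the documented rule that container membership tests behave like `x is e or x == e`. If this behaves like the pandas case, that would point at object identity (each `float("nan")` call creates a new object, whereas `np.nan` is always the same one) rather than at pandas itself:

```python
# Two distinct NaN objects; no NumPy or pandas involved
x = float("nan")
y = float("nan")

print(x is y)              # False
print(len({x, x}))         # 1: the same object twice
print(len({x, y}))         # 2: different objects, and x == y is False
print(len({(x,), (y,)}))   # 2: mirrors the pandas tuple case
print(x in [y], x in [x])  # False True: membership checks identity first
```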