I have a label column in a pandas DataFrame that has too many distinct values. I want to narrow it down by mapping some of the labels onto a smaller set of labels I chose. The data is supposed to look like this (both columns have string dtype):
old_label | new_label |
---|---|
health | health |
healthy_tips | health |
rejuvenation | health |
government | government |
senate | government |
governor | government |
So I wrote this function, which is meant to check each input for certain substrings:
    def relabel(x):
        for i in x:
            if ("health" or "rejuvenation") in i:
                return "health"
            elif ("gover" or "senate") in i:
                return "government"
            else:
                return i
Then I apply it with:
data['new_label'] = data['old_label'].apply(relabel)
But it does not relabel anything: the result is just a new column with the exact same data as the input.
How can I fix this?
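A quick check in plain Python hints at two things that may be going wrong here. This is only a diagnostic sketch, not part of the original code, and the variable names are made up for illustration: `("health" or "rejuvenation")` evaluates to a single string before `in` is applied, and looping over one string visits it character by character.

```python
# `or` returns its first truthy operand, so each parenthesised
# expression collapses to one keyword before `in` is evaluated.
first = ("health" or "rejuvenation")
second = ("gover" or "senate")
print(first)   # health
print(second)  # gover

# `apply` passes one label (a string) at a time, and iterating over
# a string yields single characters, not whole labels.
chars = [c for c in "senate"]
print(chars)   # ['s', 'e', 'n', 'a', 't', 'e']

# A multi-character keyword can never be found inside one character.
print(any("senate" in c for c in chars))  # False
```

So the second keyword in each pair is never tested, and no keyword can match a single character.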
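One possible fix, as a minimal sketch rather than a definitive answer: test each keyword separately with `any(...)` and check the whole label instead of looping over its characters. The `KEYWORDS` mapping is a hypothetical helper introduced for illustration, and the sample DataFrame just reproduces the labels from the table above.

```python
import pandas as pd

# Hypothetical mapping: target label -> substrings that should trigger it.
KEYWORDS = {
    "health": ("health", "rejuvenation"),
    "government": ("gover", "senate"),
}

def relabel(label):
    """Return the first target label whose keywords occur in `label`."""
    for new_label, keywords in KEYWORDS.items():
        # Check every keyword on its own against the whole string.
        if any(keyword in label for keyword in keywords):
            return new_label
    # No keyword matched: keep the original label.
    return label

# Sample data copied from the table in the question.
data = pd.DataFrame({
    "old_label": ["health", "healthy_tips", "rejuvenation",
                  "government", "senate", "governor"],
})
data["new_label"] = data["old_label"].apply(relabel)
print(data)
```

Keeping the keywords in a dictionary means new groups can be added without touching the function; the order of the dictionary decides which group wins if a label matches more than one.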