In this dataframe I want to create a column 'desired_output' that holds the first 'lumpsum' value within each combination of 'ID' and 'index'.
import pandas as pd
data = [[1, 12334, 1, 12334], [1, 12334, 1, 12334], [1, 12334, 1, 12334], [1, 12334, 1, 12334], [1, 34567, 1, 12334], [1, 34567, 1, 12334], [2, 45788, 1, 45788], [2, 45788, 2, 45788], [2, 23467, 2, 45788], [2, 5678, 3, 5678], [2, 4567, 3, 5678], [3, 56832, 1, 56832], [3, 43456, 1, 56832], [3, 2378, 2, 2378], [4, 6754, 1, 6754], [4, 3456, 2, 3456]]
columns = ['ID', 'lumpsum', 'index', 'desired_output']
df = pd.DataFrame(data, columns=columns)
print(df)
I tried to create the 'desired_output' column with the following code, calling the new column 'test':
df['test']=df.groupby('ID', 'index')['lumpsum'].transform('first')
The output completely ignored my grouping by 'index' and only returned the first 'lumpsum' value of each 'ID'. How should I fix this?
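A minimal sketch of a likely fix, assuming pandas 1.x or 2.x. In those versions the second positional parameter of `DataFrame.groupby` is `axis`, not a second grouping key, so `'index'` is read as `axis='index'` (group along the rows) and only `'ID'` is used as a key. Passing both column names together in one list makes pandas group by each `(ID, index)` pair, and `transform('first')` then broadcasts each group's first `lumpsum` back onto every row of that group.

```python
import pandas as pd

data = [[1, 12334, 1, 12334], [1, 12334, 1, 12334], [1, 12334, 1, 12334], [1, 12334, 1, 12334],
        [1, 34567, 1, 12334], [1, 34567, 1, 12334], [2, 45788, 1, 45788], [2, 45788, 2, 45788],
        [2, 23467, 2, 45788], [2, 5678, 3, 5678], [2, 4567, 3, 5678], [3, 56832, 1, 56832],
        [3, 43456, 1, 56832], [3, 2378, 2, 2378], [4, 6754, 1, 6754], [4, 3456, 2, 3456]]
columns = ['ID', 'lumpsum', 'index', 'desired_output']
df = pd.DataFrame(data, columns=columns)

# Both keys go in ONE list; a second positional string is not treated as a key
df['test'] = df.groupby(['ID', 'index'])['lumpsum'].transform('first')

# Compare the computed column against the hand-written target
print(df['test'].equals(df['desired_output']))  # prints True
```

The check on the last line compares values and dtypes only (not the Series names), so it should print `True` when 'test' reproduces 'desired_output' row for row.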