The question I'm asking is similar to the one I posted here a while ago: Comparing 2 Pandas dataframes row by row and performing a calculation on each row
I got a very helpful answer to that question and I'm trying to use that information to help me answer my current question.
Task: Group a dataframe by columns trial, RECORDING_SESSION_LABEL, and IP_INDEX. Within each group, for every row from Row 2 to Row n, I need to calculate the Euclidean distance between that row and each row above it, using the values in columns CURRENT_FIX_X and CURRENT_FIX_Y. If a distance is less than 58.93, I need to add the CURRENT_FIX_INDEX value of the earlier row (the one being compared, not the current row) to a list. I then want to join that list into a comma-separated string and store it in a new column (refix_list) on the current row.
Example: I'm on Row 7, so I'm comparing Row 7 against Rows 6, 5, 4, 3, 2, and 1 of that group. If the distances between Row 7 and Rows 5, 3, and 1 are each less than 58.93, I want the refix_list column at Row 7 to hold a comma-separated string containing the CURRENT_FIX_INDEX values of those 3 rows.
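To pin down what I'm after, here is a minimal sketch of the intended logic on made-up data. The helper names (refix_strings, add_refix_list), the THRESHOLD constant, and the toy values are hypothetical and only for illustration. The sketch also assumes the dataframe has a unique index and that rows are already in the order I want to compare them in. It uses a strict "less than" comparison, as in the task description, and lists matches from the top of the group downward.

```python
import numpy as np
import pandas as pd

THRESHOLD = 58.93  # distance cutoff from the task description
GROUP_KEYS = ["trial", "RECORDING_SESSION_LABEL", "IP_INDEX"]


def refix_strings(group: pd.DataFrame) -> pd.Series:
    """For each row, join the CURRENT_FIX_INDEX values of earlier rows
    in the same group that lie within THRESHOLD of it."""
    xy = group[["CURRENT_FIX_X", "CURRENT_FIX_Y"]].to_numpy(dtype=float)
    fix_index = group["CURRENT_FIX_INDEX"].to_numpy()
    out = []
    for i in range(len(group)):
        # Distances from row i to every row above it (empty for the first row)
        dists = np.hypot(xy[:i, 0] - xy[i, 0], xy[:i, 1] - xy[i, 1])
        matches = fix_index[:i][dists < THRESHOLD]
        out.append(",".join(str(v) for v in matches))
    return pd.Series(out, index=group.index)


def add_refix_list(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a refix_list column (assumes a unique index)."""
    result = df.copy()
    result["refix_list"] = ""
    for _, group in result.groupby(GROUP_KEYS, sort=False):
        # Assignment aligns on the group's index labels
        result.loc[group.index, "refix_list"] = refix_strings(group)
    return result


# Made-up data: one group with five fixations
toy = pd.DataFrame({
    "trial": [1, 1, 1, 1, 1],
    "RECORDING_SESSION_LABEL": ["s1"] * 5,
    "IP_INDEX": [1, 1, 1, 1, 1],
    "CURRENT_FIX_INDEX": [1, 2, 3, 4, 5],
    "CURRENT_FIX_X": [100.0, 300.0, 110.0, 305.0, 105.0],
    "CURRENT_FIX_Y": [100.0, 100.0, 105.0, 100.0, 102.0],
})
print(add_refix_list(toy)["refix_list"].tolist())
# ['', '', '1', '2', '1,3']
```

In this toy group, the fifth row is close to the first and third rows only, so its refix_list is "1,3"; the first row has nothing above it and gets an empty string.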
Problem: I have code that I'm working with, but I'm not sure whether it's working because I get a 'ValueError: Length of values (0) does not match length of index (297)' when I try to print the df. So I know there's an issue either in creating the list or, more likely, in concatenating it into a string and assigning it to the specific row.
Here's the code I'm working with:
    # Define a function to calculate Euclidean distance
    def euclidean_distance(x1, y1, x2, y2):
        return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)

    # Grouping the DataFrame by RECORDING_SESSION_LABEL, trial, and IP_INDEX
    grouped = df.groupby(['RECORDING_SESSION_LABEL', 'trial', 'IP_INDEX'])

    # List to store CURRENT_FIX_INDEX for each row
    index_list = []
    refix_values = []

    # Iterate over each group
    for group_name, group_df in grouped:
        # Sort the group_df by some unique column
        group_df = group_df.sort_values(by='trial')

        # Calculate Euclidean distance for each row
        for i, row in group_df.iterrows():
            current_x = row['CURRENT_FIX_X']
            current_y = row['CURRENT_FIX_Y']

            # Calculate distance with every row above it
            for j, prev_row in group_df.iloc[:i].iterrows():
                current_index = prev_row['CURRENT_FIX_INDEX']
                prev_x = prev_row['CURRENT_FIX_X']
                prev_y = prev_row['CURRENT_FIX_Y']
                distance = euclidean_distance(current_x, current_y, prev_x, prev_y)

                # If distance is less than or equal to 58.93, store CURRENT_FIX_INDEX
                if distance <= 58.93:
                    index_list.append(current_index)

            refix_values.append(','.join(map(str, index_list)))  # Add list of matching INDEX values to list of lists

    df['refix_list'] = []

    # Iterate over the DataFrame to access each row and its index
    for index, row in df.iterrows():
        # Assign the list to the current row in the specified column
        df.at[index, refix_list] = refix_values

    print(df)
From my limited knowledge, I'm guessing the issue is in the last block of code, but I'm not positive. Any help is appreciated!
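For reference, the same kind of error can be reproduced in isolation with a small sketch on a made-up three-row frame (not my real data). As far as I can tell, assigning an empty list to a new column is enough to trigger it, whereas assigning a scalar such as an empty string broadcasts to every row.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the real data
toy = pd.DataFrame({"CURRENT_FIX_INDEX": [1, 2, 3]})

try:
    # An empty list has length 0, but the frame has 3 rows
    toy["refix_list"] = []
except ValueError as err:
    message = str(err)
    print(message)

# A scalar is broadcast to all rows, so this creates the column
toy["refix_list"] = ""
print(toy)
```

This would suggest the exception comes from the column-creation line rather than from print(df) itself.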