Calculating distance between rows of Pandas dataframe and adding to list

The question I'm asking is similar to the one I posted here a while ago: Comparing 2 Pandas dataframes row by row and performing a calculation on each row

I got a very helpful answer to that question and I'm trying to use that information to help me answer my current question.

Task: Group a dataframe by columns trial, RECORDING_SESSION_LABEL, and IP_INDEX. Within each group, for every row from Row 2 to Row n, I need to calculate the Euclidean distance between that row and all rows above it, using the values in columns CURRENT_FIX_X and CURRENT_FIX_Y. If the distance is less than 58.93, I need to add the CURRENT_FIX_INDEX value of the earlier row (the one I'm comparing to) to a list. I then need to join that list into a string and store it in a new column (refix_list) on the current row (the one being compared against the rows above it).

Example: I'm on Row 7, so I'm comparing the distance of Row 7 to Rows 6, 5, 4, 3, 2, and 1 of that group. If the distances between Row 7 and Rows 5, 3, and 1 are each less than 58.93, I want the refix_list column at Row 7 to hold a comma-separated string containing the CURRENT_FIX_INDEX values of those 3 rows.
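To make the task concrete, here is a minimal sketch of the intended behavior on made-up data. The sample values, the helper name refix_strings, and the THRESHOLD constant are assumptions for illustration only. It uses a strict "less than" comparison, as in the task description, and treats "rows above" as rows earlier in the group's existing order.

```python
import numpy as np
import pandas as pd

THRESHOLD = 58.93  # distance cutoff taken from the task description


def refix_strings(group: pd.DataFrame) -> pd.Series:
    """For each row, join the CURRENT_FIX_INDEX values of earlier rows within THRESHOLD."""
    xs = group["CURRENT_FIX_X"].to_numpy(dtype=float)
    ys = group["CURRENT_FIX_Y"].to_numpy(dtype=float)
    fix_idx = group["CURRENT_FIX_INDEX"].to_numpy()
    out = []
    for pos in range(len(group)):
        # Distances from the row at position `pos` to every earlier row in the group
        dist = np.hypot(xs[:pos] - xs[pos], ys[:pos] - ys[pos])
        out.append(",".join(str(v) for v in fix_idx[:pos][dist < THRESHOLD]))
    # Keep the group's original index so the result aligns with df
    return pd.Series(out, index=group.index)


# Hypothetical sample data: two groups (trial 1 and trial 2)
df = pd.DataFrame({
    "RECORDING_SESSION_LABEL": ["s1"] * 6,
    "trial": [1, 1, 1, 1, 2, 2],
    "IP_INDEX": [1] * 6,
    "CURRENT_FIX_INDEX": [1, 2, 3, 4, 1, 2],
    "CURRENT_FIX_X": [100.0, 300.0, 110.0, 305.0, 100.0, 120.0],
    "CURRENT_FIX_Y": [100.0, 300.0, 105.0, 295.0, 100.0, 100.0],
})

keys = ["RECORDING_SESSION_LABEL", "trial", "IP_INDEX"]
df["refix_list"] = ""  # a scalar is broadcast to every row
for _, group in df.groupby(keys, sort=False):
    # Assign by index label so each string lands on its own row
    df.loc[group.index, "refix_list"] = refix_strings(group)

print(df[["trial", "CURRENT_FIX_INDEX", "refix_list"]])
```

In this sample, the third row of trial 1 is close to the first row and the fourth row is close to the second, so their refix_list values are "1" and "2". The first row of trial 2 stays empty even though it shares coordinates with a trial 1 row, because comparisons never cross group boundaries.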

Problem: I have code that I'm working with, but I can't tell whether it's working because I get a 'ValueError: Length of values (0) does not match length of index (297)' when I try to print the df. So I know there's an issue either in creating the list or, more likely, in concatenating it into a string and assigning it to the specific row.
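For reference, an error of this form can be reproduced in isolation by assigning an empty list as a new column of a non-empty dataframe. The sketch below uses a hypothetical three-row frame in place of the real 297-row one. It only shows where a message like this can come from and is not a confirmed diagnosis of the code in question.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the real 297-row one
df = pd.DataFrame({"CURRENT_FIX_INDEX": [1, 2, 3]})

try:
    # A list assigned as a column must have one element per row;
    # an empty list has length 0, while the index has length 3
    df["refix_list"] = []
    message = ""
except ValueError as exc:
    message = str(exc)

print(message)

# Assigning a scalar instead is broadcast to every row, so this succeeds
df["refix_list"] = ""
print(df)
```

The column is only created by the second assignment. A scalar placeholder such as an empty string gives every row a value that can be overwritten later.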

Here's the code I'm working with:

    # Define a function to calculate Euclidean distance
    def euclidean_distance(x1, y1, x2, y2):
        return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)

    # Grouping the DataFrame by RECORDING_SESSION_LABEL, trial, and IP_INDEX
    grouped = df.groupby(['RECORDING_SESSION_LABEL', 'trial', 'IP_INDEX'])

    # List to store CURRENT_FIX_INDEX for each row
    index_list = []
    refix_values = []

    # Iterate over each group
    for group_name, group_df in grouped:
        # Sort the group_df by some unique column
        group_df = group_df.sort_values(by='trial')
        # Calculate Euclidean distance for each row
        for i, row in group_df.iterrows():
            current_x = row['CURRENT_FIX_X']
            current_y = row['CURRENT_FIX_Y']
            # Calculate distance with every row above it
            for j, prev_row in group_df.iloc[:i].iterrows():
                current_index = prev_row['CURRENT_FIX_INDEX']
                prev_x = prev_row['CURRENT_FIX_X']
                prev_y = prev_row['CURRENT_FIX_Y']
                distance = euclidean_distance(current_x, current_y, prev_x, prev_y)
                # If distance is less than or equal to 58.93, store CURRENT_FIX_INDEX
                if distance <= 58.93:
                    index_list.append(current_index)
        refix_values.append(','.join(map(str, index_list))) #Add list of matching INDEX values to list of lists

    df['refix_list'] = []

    # Iterate over the DataFrame to access each row and its index
    for index, row in df.iterrows():
        # Assign the list to the current row in the specified column
        df.at[index, refix_list] = refix_values

    print(df)

From my limited knowledge, I'm guessing the issue is in the last block of code, but I'm not positive. Any help is appreciated!
