Channel: Recent Questions - Stack Overflow

Calculating distance between rows of Pandas dataframe and adding to list


The question I'm asking is similar to the one I posted here a while ago: Comparing 2 Pandas dataframes row by row and performing a calculation on each row

I got a very helpful answer to that question and I'm trying to use that information to help me answer my current question.

Task: Group a dataframe by columns trial, RECORDING_SESSION_LABEL, and IP_INDEX. Within each group, for every row from Row 2 to Row n, I need to calculate the Euclidean distance between that row and each row above it, using the values in columns CURRENT_FIX_X and CURRENT_FIX_Y. Whenever the distance is less than 58.93, I need to add the CURRENT_FIX_INDEX value of the earlier row to a list. That list should then be joined into a comma-separated string and stored in a new column (refix_list) on the current row, i.e. the row being compared against the rows above it.

Example: I'm on Row 7, so I'm comparing Row 7 to Rows 6, 5, 4, 3, 2, and 1 of that group. If the distances between Row 7 and Rows 5, 3, and 1 are each less than 58.93, I want the refix_list column at Row 7 to hold a comma-separated string containing the CURRENT_FIX_INDEX values of those 3 rows.
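To make the target output concrete, here is a minimal sketch of the computation described above, run on a small made-up group. The helper name `refix_strings` and the toy values are invented for illustration. The sketch assumes the rows of each group are already in fixation order and follows the Row 7 example, where each row receives the CURRENT_FIX_INDEX values of the earlier rows that lie within the threshold.

```python
import numpy as np
import pandas as pd

THRESHOLD = 58.93  # distance cutoff from the task description


def refix_strings(group: pd.DataFrame) -> pd.Series:
    """For each row, join the CURRENT_FIX_INDEX of earlier rows closer than THRESHOLD."""
    xy = group[["CURRENT_FIX_X", "CURRENT_FIX_Y"]].to_numpy(dtype=float)
    fix_index = group["CURRENT_FIX_INDEX"].to_numpy()
    # Pairwise Euclidean distances: dist[i, j] is the distance between rows i and j.
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    out = []
    for i in range(len(group)):
        close = dist[i, :i] < THRESHOLD  # compare only with the rows above row i
        out.append(",".join(str(v) for v in fix_index[:i][close]))
    return pd.Series(out, index=group.index)


# Toy data, invented for illustration (a single group of four fixations).
df = pd.DataFrame({
    "RECORDING_SESSION_LABEL": ["s1"] * 4,
    "trial": [1] * 4,
    "IP_INDEX": [1] * 4,
    "CURRENT_FIX_INDEX": [1, 2, 3, 4],
    "CURRENT_FIX_X": [100.0, 400.0, 110.0, 405.0],
    "CURRENT_FIX_Y": [100.0, 400.0, 105.0, 395.0],
})

keys = ["RECORDING_SESSION_LABEL", "trial", "IP_INDEX"]
df["refix_list"] = ""  # a scalar is broadcast to every row
for _, group in df.groupby(keys, sort=False):
    # The returned Series carries the group's row labels, so it aligns on assignment.
    df.loc[group.index, "refix_list"] = refix_strings(group)

print(df[["CURRENT_FIX_INDEX", "refix_list"]])
```

In this toy group, the third fixation lands close to the first and the fourth lands close to the second, so their refix_list entries come out as "1" and "2", while the first two rows stay empty.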

Problem: I have code that I'm working with, but I'm not sure if it's working, because I get 'ValueError: Length of values (0) does not match length of index (297)' when I try to print the df. So I know there's an issue either with creating the list or, more likely, with concatenating it into a string and assigning it to the specific row.

Here's the code I'm working with:

# Define a function to calculate Euclidean distance
def euclidean_distance(x1, y1, x2, y2):
    return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)

# Grouping the DataFrame by RECORDING_SESSION_LABEL, trial, and IP_INDEX
grouped = df.groupby(['RECORDING_SESSION_LABEL', 'trial', 'IP_INDEX'])

# List to store CURRENT_FIX_INDEX for each row
index_list = []
refix_values = []

# Iterate over each group
for group_name, group_df in grouped:
    # Sort the group_df by some unique column
    group_df = group_df.sort_values(by='trial')
    # Calculate Euclidean distance for each row
    for i, row in group_df.iterrows():
        current_x = row['CURRENT_FIX_X']
        current_y = row['CURRENT_FIX_Y']
        # Calculate distance with every row above it
        for j, prev_row in group_df.iloc[:i].iterrows():
            current_index = prev_row['CURRENT_FIX_INDEX']
            prev_x = prev_row['CURRENT_FIX_X']
            prev_y = prev_row['CURRENT_FIX_Y']
            distance = euclidean_distance(current_x, current_y, prev_x, prev_y)
            # If distance is less than or equal to 58.93, store CURRENT_FIX_INDEX
            if distance <= 58.93:
                index_list.append(current_index)
    refix_values.append(','.join(map(str, index_list)))  # Add list of matching INDEX values to list of lists

df['refix_list'] = []

# Iterate over the DataFrame to access each row and its index
for index, row in df.iterrows():
    # Assign the list to the current row in the specified column
    df.at[index, refix_list] = refix_values

print(df)

From my limited knowledge, I'm guessing the issue is in the last block of code, but I'm not positive. Any help is appreciated!
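As a hedged pointer on the last block: the reported ValueError most likely comes from the line `df['refix_list'] = []`, not from `print(df)`, since pandas rejects a list whose length differs from the number of rows. The sketch below reproduces that on a three-row toy frame (invented for illustration) and shows one way to create the column first and then fill individual cells.

```python
import pandas as pd

# Toy frame, invented for illustration.
df = pd.DataFrame({"CURRENT_FIX_INDEX": [1, 2, 3]})

# Assigning an empty list as a new column fails: the list has 0 values
# but the frame has 3 rows.
try:
    df["refix_list"] = []
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Creating the column from a scalar broadcasts it to every row,
# after which single cells can be set with df.at and a quoted column name.
df["refix_list"] = ""
df.at[1, "refix_list"] = "1"
df.at[2, "refix_list"] = "1,2"
print(df)
```

Note that `df.at` takes the column label as a string ('refix_list'), and each cell should receive that row's own string rather than the whole `refix_values` list.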

