Channel: Recent Questions - Stack Overflow

Speeding up a rolling sum calculation?


I'm working with a fairly large amount of (horse racing!) data for a project, calculating rolling sums of values for many different combinations of the data, so I need the calculation to be as fast as possible.

Essentially I am:

  • calculating a rolling statistic of a Points field over time
  • calculating this for various grouped combinations of data [in this case the combination of horse and trainer]
  • taking the average of the value per group over the previous 180 days of data, at each point in time
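To make the three steps above concrete, here is a minimal sketch on made-up data. The rows, the `PointsMean180d` column name and the explicit per-group loop are illustrative assumptions, not the real dataset or the code being timed.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Horse":   ["A", "A", "B", "B", "A", "A"],
    "Trainer": ["T1", "T1", "T2", "T2", "T1", "T2"],
    "RaceDate": pd.to_datetime([
        "2023-01-01", "2023-03-01", "2023-03-15",
        "2023-04-01", "2023-08-01", "2023-08-10",
    ]),
    "Points": [10, 20, 5, 7, 30, 40],
})

# Time-based windows need the date column in ascending order.
toy = toy.sort_values("RaceDate")

# For each (Horse, Trainer) pair, average Points over the trailing 180 days.
parts = []
for _, group in toy.groupby(["Horse", "Trainer"]):
    parts.append(
        group.rolling(window="180D", on="RaceDate", min_periods=1)["Points"].mean()
    )

# Each piece keeps the original row index, so assignment aligns row by row.
toy["PointsMean180d"] = pd.concat(parts)
print(toy)
```

For horse A with trainer T1, the race on 2023-08-01 averages only the 2023-03-01 and 2023-08-01 rows (20 and 30), because the 2023-01-01 race is more than 180 days earlier.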

The rolling window calculation below works fine, but it takes 8.2s on a sample that is about 1/8 of the total dataset, so the full dataset would take roughly 1m 5s. I am looking for ideas on how to speed this calculation up, since I want to run it for a number of different combinations of data. Thanks.

    import pandas as pd
    import time

    url = 'https://raw.githubusercontent.com/richsdixon/testdata/main/testdata.csv'
    df = pd.read_csv(url, parse_dates=True)
    df['RaceDate'] = pd.to_datetime(df['RaceDate'], format='mixed')

    # Time-based rolling windows need RaceDate in ascending order
    df.sort_values(by='RaceDate', inplace=True)

    # 180-day rolling mean of Points within each (Horse, Trainer) pair
    # (the column name says 90d, but the window used is 180D)
    df['HorseRaceCount90d'] = (df.groupby(['Horse','Trainer'], group_keys=False)
                                 .apply(lambda x: x.rolling(window='180D', on='RaceDate', min_periods=1)['Points'].mean()))
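For comparing alternatives, a small timing helper along these lines could be used. It is only a sketch: `timed` and `extrapolate_full_runtime` are hypothetical names, the `sum` call is a stand-in workload rather than the rolling calculation, and the extrapolation assumes runtime scales linearly with the number of rows.

```python
import time


def timed(fn, *args, repeats=3, **kwargs):
    """Call fn `repeats` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


def extrapolate_full_runtime(sample_seconds, sample_fraction):
    """Scale a sample timing to the full dataset, assuming linear scaling."""
    return sample_seconds / sample_fraction


# Stand-in workload; replace with the function under test.
best_seconds, total = timed(sum, range(1_000_000))
print(f"best of 3: {best_seconds:.4f}s")

# The figures quoted in the question: 8.2s on about 1/8 of the data.
print(f"estimated full run: {extrapolate_full_runtime(8.2, 1 / 8):.1f}s")
```

Taking the best of a few repeats reduces noise from other processes, which matters when two candidate implementations differ by less than a second.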
