I want to create a column that holds a cumulative sum within each group, but where a row's amount is only added when its 'days' value meets a certain condition (here, `days > 2`). I have come up with what I regard as a "duct tape" solution; there must be a more elegant way.
```python
import polars as pl

# Create a DataFrame with literal values
df = pl.DataFrame({
    "days": [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 6, 7, 1],
    "amount": [100, 200, 150, 300, 250, 180, 220, 280, 210, 320, 21, 456, 111],
    "group": ["A", "B", "A", "C", "B", "A", "C", "B", "C", "A", "C", "B", "B"],
})

# Display the DataFrame
print(df)

# My duct tape solution
df = (
    df
    .with_columns(
        pl.when(pl.col("days") > 2)
        .then(pl.col("amount"))
        .otherwise(0)
        .alias("3+days_amount")
    )
    .with_columns(
        pl.col("3+days_amount").cum_sum().over("group").alias("group_cumsum")
    )
)
print(df)
```

```
shape: (13, 3)
┌──────┬────────┬───────┐
│ days ┆ amount ┆ group │
│ ---  ┆ ---    ┆ ---   │
│ i64  ┆ i64    ┆ str   │
╞══════╪════════╪═══════╡
│ 0    ┆ 100    ┆ A     │
│ 1    ┆ 200    ┆ B     │
│ 2    ┆ 150    ┆ A     │
│ 3    ┆ 300    ┆ C     │
│ 4    ┆ 250    ┆ B     │
│ …    ┆ …      ┆ …     │
│ 3    ┆ 210    ┆ C     │
│ 4    ┆ 320    ┆ A     │
│ 6    ┆ 21     ┆ C     │
│ 7    ┆ 456    ┆ B     │
│ 1    ┆ 111    ┆ B     │
└──────┴────────┴───────┘
shape: (13, 5)
┌──────┬────────┬───────┬───────────────┬──────────────┐
│ days ┆ amount ┆ group ┆ 3+days_amount ┆ group_cumsum │
│ ---  ┆ ---    ┆ ---   ┆ ---           ┆ ---          │
│ i64  ┆ i64    ┆ str   ┆ i64           ┆ i64          │
╞══════╪════════╪═══════╪═══════════════╪══════════════╡
│ 0    ┆ 100    ┆ A     ┆ 0             ┆ 0            │
│ 1    ┆ 200    ┆ B     ┆ 0             ┆ 0            │
│ 2    ┆ 150    ┆ A     ┆ 0             ┆ 0            │
│ 3    ┆ 300    ┆ C     ┆ 300           ┆ 300          │
│ 4    ┆ 250    ┆ B     ┆ 250           ┆ 250          │
│ …    ┆ …      ┆ …     ┆ …             ┆ …            │
│ 3    ┆ 210    ┆ C     ┆ 210           ┆ 510          │
│ 4    ┆ 320    ┆ A     ┆ 320           ┆ 320          │
│ 6    ┆ 21     ┆ C     ┆ 21            ┆ 531          │
│ 7    ┆ 456    ┆ B     ┆ 456           ┆ 706          │
│ 1    ┆ 111    ┆ B     ┆ 0             ┆ 706          │
└──────┴────────┴───────┴───────────────┴──────────────┘
```
Polars expressions generally seem very elegant, so I am hoping there's something I am missing.