Polars: how to partition a big dataframe and save each one in parallel

I have a big Polars dataframe with a lot of groups. Now, I want to partition the dataframe by group and save all sub-dataframes. I can easily do this as follows:

for d in df.partition_by(["group1", "group2"]):
    d.write_csv(f"~/{d[0, 'group1']}_{d[0, 'group2']}.csv")

However, this approach writes the files one at a time, which is slow when the dataframe is very large and has many partitions.

Is there a Polars-native way to parallelize the loop above?

If not, how can I do it in a Python native way instead?

