Channel: Recent Questions - Stack Overflow

How to Efficiently Assign Date Ranges to Prioritized Items in Python?

I need to calculate the errors of some forecasts that are stored as time series in InfluxDB. Each forecast is identified in InfluxDB by a tag holding its guid. For some date ranges there is more than one forecast, because a user can run several, so I have to decide which forecast to use for each part of the range when calculating the errors. I have some parameters that let me rank them: forecasts the user has "officialized" take priority, followed by the most recently created ones.

The forecast information is also stored in my relational database, so I fetch the forecasts that have data for the dates I care about (from the last date with calculated errors up to the last date with real data) and order them by official status, then creation date.

So my service receives a list of dictionaries ordered by priority, something like this:

    executions_forecast = [
        {
            "guid": "foo123",
            "init_date": "2024-02-20T00:00:00Z",
            "final_date": "2024-02-24T00:00:00Z",
            "is_officialized": True,
            "created_at": "2024-02-16T00:00:00Z",
        },
        {
            "guid": "foo456",
            "init_date": "2024-02-18T00:00:00Z",
            "final_date": "2024-02-22T00:00:00Z",
            "is_officialized": True,
            "created_at": "2024-02-14T00:00:00Z",
        },
        {
            "guid": "foo789",
            "init_date": "2024-02-16T00:00:00Z",
            "final_date": "2024-02-21T00:00:00Z",
            "is_officialized": False,
            "created_at": "2024-02-15T00:00:00Z",
        },
        # ...
    ]
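The priority rule described earlier (officialized forecasts first, then the most recently created) can be illustrated with a sort key. This is only a sketch: in the question the ordering is done by the relational database, and `parse_utc` and `priority_key` are hypothetical helper names introduced here. The sample list is deliberately shuffled and trimmed to the fields the key needs.

```python
from datetime import datetime


def parse_utc(value: str) -> datetime:
    """Parse timestamps such as '2024-02-16T00:00:00Z' (trailing 'Z' means UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


def priority_key(execution: dict) -> tuple:
    # False sorts before True, so officialized forecasts come first;
    # the negated timestamp puts the most recent creation date first.
    created = parse_utc(execution["created_at"]).timestamp()
    return (not execution["is_officialized"], -created)


# Hypothetical, shuffled sample with only the fields used by the key.
sample = [
    {"guid": "foo789", "is_officialized": False, "created_at": "2024-02-15T00:00:00Z"},
    {"guid": "foo456", "is_officialized": True, "created_at": "2024-02-14T00:00:00Z"},
    {"guid": "foo123", "is_officialized": True, "created_at": "2024-02-16T00:00:00Z"},
]

ordered_guids = [e["guid"] for e in sorted(sample, key=priority_key)]
print(ordered_guids)  # ['foo123', 'foo456', 'foo789']
```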

(The priority order is foo123, then foo456, then foo789.)

I also have the date range in which I can calculate errors: next_datetime_error to last_datetime_real.

Finally, what I need is each execution paired with the date range to query in InfluxDB, something like:

    {
        "foo123": ("2024-02-20T00:00:00Z", "2024-02-24T12:00:00Z"),
        "foo456": ("2024-02-18T00:00:00Z", "2024-02-22T12:00:00Z"),
        "foo789": ("2024-02-16T00:00:00Z", "2024-02-21T12:00:00Z"),
    }

I'm undecided on the best approach. pandas is available in the microservice, so I could use it, although I don't know whether a plain loop would be better. In my favor, executions_forecast is already ordered by priority. I was thinking of creating a DataFrame with all the dates, then going through executions_forecast and filling it in, but I'm not sure how to do that efficiently. Alternatively, I could go through the list in reverse priority order and overwrite. I'm not sure which is the most efficient way.

This is the code I have:

    from datetime import datetime

    from pandas import DataFrame, date_range

    executions_ranges = {}
    # Hourly grid over the window in which errors can be calculated
    range_datetimes = date_range(start=next_datetime_error, end=last_datetime_real, freq="H")
    # 'bool' marks the hours already covered by some forecast
    df = DataFrame({'fecha': range_datetimes, 'bool': False})
    upper_datetime = last_datetime_real
    for execution in executions_forecast:
        init_date = datetime.strptime(execution.get("init_date"), "%Y-%m-%dT%H:%M:%S%z")
        final_date = datetime.strptime(execution.get("final_date"), "%Y-%m-%dT%H:%M:%S%z")
        # Drop the timezone so the values compare against the naive grid
        init_date, final_date = init_date.replace(tzinfo=None), final_date.replace(tzinfo=None)
        # Here I must check that the execution has data on dates less than upper_datetime
        # and fill the corresponding data in range_datetimes
        upper_datetime = init_date
        df.loc[(df['fecha'] >= init_date) & (df['fecha'] <= final_date), 'bool'] = True
        executions_ranges[execution.get("guid")] = (init_date, final_date)
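One possible way to do the priority-ordered walk without an hourly DataFrame is to track only the intervals of the window that are still uncovered, and give each forecast whatever part of its own range is still free. The following is a minimal sketch under stated assumptions, not a definitive answer: `assign_ranges` is a hypothetical helper name, the window bounds are assumed to be timezone-aware datetimes, ranges are treated as half-open `[start, end)` so a shared boundary is not counted twice, and the example window values are made up for illustration.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%S%z"


def assign_ranges(executions, window_start, window_end):
    """Give each forecast the parts of the window not already taken by a
    higher-priority forecast. `executions` must be ordered by priority."""
    free = [(window_start, window_end)]  # uncovered intervals, kept sorted
    assigned = {}
    for execution in executions:
        start = datetime.strptime(execution["init_date"], FMT)
        end = datetime.strptime(execution["final_date"], FMT)
        taken, still_free = [], []
        for lo, hi in free:
            a, b = max(lo, start), min(hi, end)
            if a >= b:  # this free interval does not overlap the forecast
                still_free.append((lo, hi))
                continue
            taken.append((a, b))
            if lo < a:  # part of the free interval left before the forecast
                still_free.append((lo, a))
            if b < hi:  # part of the free interval left after the forecast
                still_free.append((b, hi))
        free = still_free
        if taken:
            assigned[execution["guid"]] = taken
        if not free:  # whole window covered, lower priorities are not needed
            break
    return assigned


# Sample data in priority order (only the fields the sketch reads).
executions_forecast = [
    {"guid": "foo123", "init_date": "2024-02-20T00:00:00Z", "final_date": "2024-02-24T00:00:00Z"},
    {"guid": "foo456", "init_date": "2024-02-18T00:00:00Z", "final_date": "2024-02-22T00:00:00Z"},
    {"guid": "foo789", "init_date": "2024-02-16T00:00:00Z", "final_date": "2024-02-21T00:00:00Z"},
]
# Assumed window bounds, chosen only for this example.
next_datetime_error = datetime(2024, 2, 16, tzinfo=timezone.utc)
last_datetime_real = datetime(2024, 2, 24, tzinfo=timezone.utc)

result = assign_ranges(executions_forecast, next_datetime_error, last_datetime_real)
for guid, ranges in result.items():
    print(guid, [(s.isoformat(), e.isoformat()) for s, e in ranges])
```

With this sample, foo123 keeps its full range, foo456 is trimmed to the 18th through the 20th, and foo789 to the 16th through the 18th. Note that the sketch returns a list of ranges per guid rather than a single tuple, because a lower-priority forecast can be split in two when a higher-priority one covers the middle of its range. The work grows with the number of forecasts and free intervals rather than with the number of hours in the window.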
