Quantcast
Channel: Recent Questions - Stack Overflow
Viewing all articles
Browse latest Browse all 18095

Creating an array of quarterly dates in azure synapse pipeline

$
0
0

I am working in the Azure Synapse Pipeline environment and I am trying to initialize an array that will contain the dates "03-31", "06-30", "09-30" and "12-31" for dates within the range of 2018 to the current year. So currently, it would be all quarterly dates through "2024-03-31". I am building this out to do a rest API call where I will pass each of the dates in the array to pull reports to my ADLS. I already have the ADLS piece figured out, its just a matter of actually getting this array to work.

I have tried several solutions to get this working, but my experience in synapse is quite limited. I started by trying to create a pyspark notebook that would be ran at the start of the pipeline and generate a .json with all the necessary quarterly dates in the ADLS. I was able to get this to work, but then, when I pulled the .json into the pipeline, I couldn't convert it to an array variable.

I have settled with generating the quarterly dates array manually for the time being. But I can't help to expect there is a simple approach to solve this issue. It doesn't seem like it should be that complicated.

def getDateRange(start_date):# Define start date and current datecurrent_date_str = spark.sql("SELECT CAST(current_date() AS STRING)").collect()[0][0]print("Current Date:")print(current_date_str)end_year = int(current_date_str[:4]) + 1# Generate a sequence of dates within the range up to the current datedate_range_df = spark.range(0, (365 * (end_year - 2018) + 1)).selectExpr("date_add('{}', CAST(id AS INT)) as date".format(start_date))# Filter out dates beyond the current datedate_range_df = date_range_df.filter(col("date") <= current_date())# Extract month and daydate_range_df = date_range_df.withColumn("month", expr("month(date)")).withColumn("day", expr("day(date)"))# Filter dates to keep only the end of each quarterquarterly_dates_df = date_range_df.filter(    (date_range_df.month == 3) & (date_range_df.day == 31) |    (date_range_df.month == 6) & (date_range_df.day == 30) |    (date_range_df.month == 9) & (date_range_df.day == 30) |    (date_range_df.month == 12) & (date_range_df.day == 31))# Format dates as strings in 'MM-dd-yyyy' formatquarterly_dates_df = quarterly_dates_df.withColumn("date_string", date_format("date", "MM-dd-yyyy"))return quarterly_dates_df

There is an additional piece of code to get this data into the ADLS system, but it contains sensitive info. I don't have the pipeline piece I had built out to share, but I was trying to do a lookup, then store the output to an array, but the array was storing as a single element instead of as a list.


Viewing all articles
Browse latest Browse all 18095

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>