I have a query like this.
WITH all_products AS (
  SELECT gtin, category, product_name, product_image, brand, manufacturer,
         client_id, is_client_product  -- needed by the filters below
  FROM `products`
),
client_products AS (
  SELECT * FROM all_products
  WHERE client_id = "usdemoaccount" AND is_client_product = true
),
competitor_products AS (
  SELECT * FROM all_products
  WHERE client_id = "usdemoaccount" AND is_client_product = false
)
-- ... the rest of the query reads from client_products and competitor_products
Here the computation for all_products happens twice, because it is referenced twice later in the query. It should not need to run twice: reusing the first result would save compute.
According to the BigQuery documentation, the reason is that non-recursive CTEs are not materialized:
BigQuery only materializes the results of recursive CTEs, but does not materialize the results of non-recursive CTEs inside the WITH clause. If a non-recursive CTE is referenced in multiple places in a query, then the CTE is executed once for each reference.
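To make the documented behavior concrete, here is a rough sketch of what my query conceptually turns into when all_products is not materialized: each reference is expanded into its own read of `products`. This is only an illustration of the semantics described above, not the actual execution plan (the optimizer may push filters down or otherwise rewrite it). The final SELECT is a hypothetical placeholder for the rest of my query, and it assumes `products` has client_id and is_client_product columns.

```sql
-- Sketch only: the CTE body is repeated once per reference,
-- so `products` is scanned twice.
WITH client_products AS (
  SELECT *
  FROM (
    -- first expansion of all_products
    SELECT gtin, category, product_name, product_image, brand, manufacturer,
           client_id, is_client_product  -- assumed columns
    FROM `products`
  )
  WHERE client_id = "usdemoaccount" AND is_client_product = true
),
competitor_products AS (
  SELECT *
  FROM (
    -- second expansion of all_products
    SELECT gtin, category, product_name, product_image, brand, manufacturer,
           client_id, is_client_product  -- assumed columns
    FROM `products`
  )
  WHERE client_id = "usdemoaccount" AND is_client_product = false
)
-- Hypothetical final query, standing in for the real one.
SELECT "client" AS source, COUNT(*) AS product_count FROM client_products
UNION ALL
SELECT "competitor" AS source, COUNT(*) AS product_count FROM competitor_products;
```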
I'm exploring alternatives to address this issue. While I understand that using temporary tables is one option, I'm concerned about potential drawbacks such as increased storage costs and concurrency issues, especially when the same API is used by multiple users with different parameters.
What are some effective strategies or best practices for optimizing the performance of CTEs in BigQuery? Specifically, I'm interested in approaches that can help materialize non-recursive CTEs or improve query performance without resorting to temporary tables.
Even with temporary tables, if there is an option to clean them up automatically (to avoid storage costs) and to handle concurrency out of the box, that would also be acceptable.