Channel: Recent Questions - Stack Overflow

When using Apache Beam locally, how can I use persistent caching for queries in BigQuery?


For development purposes, I'd like to cache the results of queries that the beam.io.ReadFromBigQuery connector makes in BigQuery, so that the next time I run the exact same query I can load the results quickly from the local file system.

The problem is that I cannot run any PTransform before beam.io.ReadFromBigQuery to check for an existing cache and skip the BigQuery read accordingly.

So far I have come up with two possible solutions:

  1. Creating a custom beam.DoFn for reading from BigQuery. It would include the caching mechanism, but might underperform compared to the existing connector. One variation would be inheriting from the existing connector, but that requires knowledge of Beam "under the hood", which might be overwhelming.
  2. Implementing the caching when building the pipeline, so that the resulting step is chosen according to the presence or absence of the cache (apache_beam.io.textio.ReadAllFromText or beam.io.ReadFromBigQuery, respectively).
