I am currently querying data from a Parquet data lake stored in Azure Data Lake Gen2 via Azure Synapse (Serverless SQL pools) and through Grafana, with Microsoft SQL Server as my Grafana data source.
My query looks like this:
SELECT
    DATEADD(SECOND, DATEDIFF(SECOND, '2020', t) / (200/1000.0) * (200/1000.0), '2020') AS time,
    AVG(accelerationx) AS AVG_accelerationx
FROM OPENROWSET(
    BULK 'https://cssdatalakestoragegen2.dfs.core.windows.net/cssdatalakestoragegen2filesystem/3BA199E2/CAN2_gnssimu/*/*/*/*',
    FORMAT = 'PARQUET'
) AS r
WHERE t BETWEEN '2020-10-28T14:35:31Z' AND '2020-10-28T14:38:10Z'
GROUP BY DATEDIFF(SECOND, '2020', t) / (200/1000.0)
ORDER BY time
OFFSET 0 ROWS;
This results in data grouped at a 1-second resolution, even though I expected the data to be grouped at a 200 ms resolution. The original file is stored at a 100 ms resolution.
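My understanding of why this happens: DATEDIFF(SECOND, '2020', t) already discards the sub-second part of t, so dividing the whole-second count by 0.2 and multiplying it back simply reproduces the same one-second boundary. A minimal illustration (the two timestamps are made-up samples from within my query window):

SELECT
    DATEDIFF(SECOND, '2020', CAST('2020-10-28T14:35:31.1' AS datetime2)) / (200/1000.0) AS bucket_a,
    DATEDIFF(SECOND, '2020', CAST('2020-10-28T14:35:31.9' AS datetime2)) / (200/1000.0) AS bucket_b;
-- bucket_a and bucket_b are identical even though the samples are 800 ms apart,
-- so the grouping key in my query can only change once per second.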
The Parquet file I am trying to query is stored as follows in Azure:
https://cssdatalakestoragegen2.dfs.core.windows.net/cssdatalakestoragegen2filesystem/3BA199E2/CAN2_gnssimu/2020/10/28/00000014_00000001.parquet
I have attached the Parquet file in question: https://canlogger1000.csselectronics.com/files/temp/00000014_00000001.parquet
I assume my query is causing the results to be aggregated on a SECOND basis instead of a MILLISECOND basis - but replacing SECOND with MILLISECOND in the query (and using 200 ms directly instead of 0.2 seconds) causes the overflow error below:
convert frame from rows error: mssql: The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.
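For what it's worth, my reading of the overflow is that DATEDIFF returns an int, and the number of milliseconds between '2020-01-01' and late October 2020 is roughly 26 billion, well beyond the int limit of about 2.1 billion. The sketch below is the direction I have been considering but have not yet verified against the data: it moves the origin from '2020' to the start of the query window so the millisecond count stays within int range, and buckets with integer division by 200:

SELECT
    DATEADD(MILLISECOND,
            (DATEDIFF(MILLISECOND, '2020-10-28T14:35:31', t) / 200) * 200,
            CAST('2020-10-28T14:35:31' AS datetime2)) AS time,   -- start of the 200 ms bucket
    AVG(accelerationx) AS AVG_accelerationx
FROM OPENROWSET(
    BULK 'https://cssdatalakestoragegen2.dfs.core.windows.net/cssdatalakestoragegen2filesystem/3BA199E2/CAN2_gnssimu/*/*/*/*',
    FORMAT = 'PARQUET'
) AS r
WHERE t BETWEEN '2020-10-28T14:35:31Z' AND '2020-10-28T14:38:10Z'
GROUP BY DATEDIFF(MILLISECOND, '2020-10-28T14:35:31', t) / 200   -- integer division gives 200 ms buckets
ORDER BY time
OFFSET 0 ROWS;

Is this a reasonable way to get 200 ms buckets here, or is there a better approach for Synapse serverless SQL pools?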