From a Google Data Analytics Certificate subqueries exercise:
SELECT
    starttime,
    start_station_id,
    tripduration,
    (
        SELECT ROUND(AVG(tripduration), 2)
        FROM bigquery-public-data.new_york_citibike.citibike_trips
        WHERE start_station_id = outer_trips.start_station_id
    ) AS avg_duration_for_station,
    ROUND(tripduration - (
        SELECT AVG(tripduration)
        FROM bigquery-public-data.new_york_citibike.citibike_trips
        WHERE start_station_id = outer_trips.start_station_id
    ), 2) AS difference_from_avg
FROM bigquery-public-data.new_york_citibike.citibike_trips AS outer_trips
ORDER BY difference_from_avg DESC
LIMIT 25;
The query was run on BigQuery against the public new_york_citibike dataset, which you can access from within the BigQuery workspace.
Both subqueries read from the same table, citibike_trips, as does the outer query. I don't understand what purpose the WHERE clauses in the subqueries serve. According to the explanation,
WHERE tells the query to link the start_station_id with the output of the query: a new column labeled avg_duration_for_station
What do they mean?
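To make the question concrete, here is a minimal sketch on toy data. It assumes Python's built-in sqlite3 in place of BigQuery, and the three rows are made up, not taken from the real dataset. It compares the first subquery with and without its WHERE clause: with it, the inner query appears to be evaluated per outer row and averages only that row's station; without it, every row gets the same overall average.

```python
import sqlite3

# Toy stand-in for citibike_trips; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE citibike_trips (start_station_id INTEGER, tripduration INTEGER)"
)
conn.executemany(
    "INSERT INTO citibike_trips VALUES (?, ?)",
    [(1, 100), (1, 300), (2, 800)],
)

# With WHERE: the inner start_station_id is compared to the outer row's
# station, so the average is computed per station.
with_where = conn.execute("""
    SELECT start_station_id, tripduration,
           (SELECT ROUND(AVG(tripduration), 2)
            FROM citibike_trips
            WHERE start_station_id = outer_trips.start_station_id
           ) AS avg_duration_for_station
    FROM citibike_trips AS outer_trips
    ORDER BY start_station_id, tripduration
""").fetchall()

# Without WHERE: the inner query ignores the outer row and averages
# every trip in the table.
without_where = conn.execute("""
    SELECT start_station_id, tripduration,
           (SELECT ROUND(AVG(tripduration), 2)
            FROM citibike_trips
           ) AS avg_duration_for_station
    FROM citibike_trips AS outer_trips
    ORDER BY start_station_id, tripduration
""").fetchall()

print(with_where)     # [(1, 100, 200.0), (1, 300, 200.0), (2, 800, 800.0)]
print(without_where)  # [(1, 100, 400.0), (1, 300, 400.0), (2, 800, 400.0)]
```

In the toy table, station 1 averages 200 and station 2 averages 800, while the average over all trips is 400.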
If I run the query without the WHERE clauses, I get slightly different results in the "avg_duration_for_station" column: