Channel: Recent Questions - Stack Overflow

KNN across categories in PostGIS using indexing


I have a dataset of points of different types. For every point in the dataset I want to find the closest point in every category. I can achieve this, but the compute time is very long, and I'm struggling to get the query to use a spatial index for the KNN search in tandem with the type information in an efficient way.

Sample data generation

CREATE TYPE point_type AS ENUM ('1', '2', '3', '4', '5');

CREATE TABLE points AS
SELECT ST_MakePoint(
           1000 * random(),
           1000 * random()
       )::geometry(Point) AS geom,
       ((random() * 3)::int + 1)::text::point_type point_type,
       pk
FROM generate_series(1, 6000) pk;

update points
set point_type = '5'
where pk = 999;

Index creation

create index points_geom_idx
    on points using gist (geom);

CREATE INDEX points_dual ON points (point_type, geom);
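One thing worth noting about the indexes above: `points_dual` is a plain B-tree index, and B-tree indexes cannot serve the `<->` distance ordering, so only `points_geom_idx` can be used for the KNN part. A possible alternative, sketched here under assumptions, is a two-column GiST index built with the `btree_gist` extension. This assumes the extension can be installed on the server and that the PostgreSQL version is recent enough for `btree_gist` to support enum columns; the index name `points_type_geom_gist` is made up for illustration.

```sql
-- Assumption: btree_gist is installable here and supports enum columns
-- on this PostgreSQL version.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical index name. point_type comes first so an equality condition
-- on it can be an index condition; geom comes second so ORDER BY ... <-> ...
-- can be answered from the same index.
CREATE INDEX points_type_geom_gist
    ON points USING gist (point_type, geom);

-- Refresh planner statistics after building the index.
ANALYZE points;
```

Whether the planner actually chooses this index is not guaranteed. Re-running `EXPLAIN ANALYZE` on the type-filtered lateral query and looking for an `Index Cond` on `point_type` next to the `Order By` line would show whether it helps.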

Query that works but is very slow

Is this because the nearest neighbours are pulled from the spatial index by distance first and only then filtered by the type constraint?

explain analyse
with types as (
    select column1::point_type point_type
    from (values ('1'), ('2'), ('3'), ('4'), ('5'))
)
SELECT c1.point_type,
       c1.pk AS main_id,
       b.pk  AS secondary_id,
       c1.secondary_point_type,
       b.secondary_point_type,
       b.distance
FROM (SELECT c.point_type,
             c.pk,
             c.geom,
             types.point_type secondary_point_type
      FROM points c
               join types on true) c1
         LEFT JOIN LATERAL (SELECT c2.point_type,
                                   c2.geom,
                                   c2.pk,
                                   c2.point_type secondary_point_type,
                                   c1.geom <-> c2.geom AS distance
                            FROM points c2
                            where c1.pk <> c2.pk
                              and c1.secondary_point_type = c2.point_type
                            ORDER BY distance
                            LIMIT 1) b on true;

Query that is very fast but doesn't provide correct results

I believe this is because it only fetches the single closest point, and if that point isn't of the requested type, the join condition fails, so nothing is joined and most results are left as nulls.

explain analyse
with types as (
    select column1::point_type point_type
    from (values ('1'), ('2'), ('3'), ('4'), ('5'))
)
SELECT c1.point_type,
       c1.pk AS main_id,
       b.pk  AS secondary_id,
       c1.secondary_point_type,
       b.secondary_point_type,
       b.distance
FROM (SELECT c.point_type,
             c.pk,
             c.geom,
             types.point_type secondary_point_type
      FROM points c
               join types on true) c1
         LEFT JOIN LATERAL (SELECT c2.point_type,
                                   c2.geom,
                                   c2.pk,
                                   c2.point_type secondary_point_type,
                                   c1.geom <-> c2.geom AS distance
                            FROM points c2
                            where c1.pk <> c2.pk
                            ORDER BY distance
                            LIMIT 1) b on c1.secondary_point_type = b.secondary_point_type;
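The nulls produced by the fast query can be checked directly with a small diagnostic, given here only as a sketch against the sample `points` table; the alias `nn` and the final `LIMIT 10` are arbitrary choices for illustration. It lists, for a handful of points, the single nearest neighbour and its type, with no type filter applied.

```sql
-- For a few points, show the one nearest neighbour regardless of type.
-- Each point gets exactly one neighbour, and that neighbour has exactly
-- one type, so a join on "requested type = neighbour type" can match at
-- most one of the five requested types per point.
SELECT c1.pk         AS main_id,
       nn.pk         AS nearest_id,
       nn.point_type AS nearest_type
FROM points c1
         CROSS JOIN LATERAL (SELECT c2.pk,
                                    c2.point_type
                             FROM points c2
                             WHERE c2.pk <> c1.pk
                             ORDER BY c1.geom <-> c2.geom
                             LIMIT 1) nn
LIMIT 10;
```

If every row shows one `nearest_type`, that is consistent with the explanation above: the type condition in the `ON` clause is applied after `LIMIT 1`, so the rows for all other requested types come back with nulls.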

I'm trying to make this query run quickly, using the spatial index for the KNN lookups across all types. Thanks!

Outputs of explain analyse

First query:

Sort  (cost=29155.39..29230.39 rows=30000 width=28) (actual time=24533.167..24543.539 rows=30000 loops=1)
  Output: c.point_type, c.pk, c2.pk, (("*VALUES*".column1)::point_type), c2.point_type, ((c.geom <-> c2.geom))
  Sort Key: c2.point_type DESC
  Sort Method: quicksort  Memory: 2409kB
  Buffers: shared hit=180999
  ->  Nested Loop Left Join  (cost=0.15..26924.49 rows=30000 width=28) (actual time=5.024..24430.122 rows=30000 loops=1)
        Output: c.point_type, c.pk, c2.pk, ("*VALUES*".column1)::point_type, c2.point_type, ((c.geom <-> c2.geom))
        Buffers: shared hit=180999
        ->  Nested Loop  (cost=0.00..499.07 rows=30000 width=72) (actual time=0.546..105.076 rows=30000 loops=1)
              Output: c.point_type, c.pk, c.geom, "*VALUES*".column1
              Buffers: shared hit=64
              ->  Seq Scan on public.points c  (cost=0.00..124.00 rows=6000 width=40) (actual time=0.341..12.850 rows=6000 loops=1)
                    Output: c.geom, c.point_type, c.pk
                    Buffers: shared hit=64
              ->  Materialize  (cost=0.00..0.09 rows=5 width=32) (actual time=0.001..0.006 rows=5 loops=6000)
                    Output: "*VALUES*".column1
                    ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=32) (actual time=0.034..0.141 rows=5 loops=1)
                          Output: "*VALUES*".column1
        ->  Limit  (cost=0.15..0.86 rows=1 width=52) (actual time=0.802..0.803 rows=1 loops=30000)
              Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, ((c.geom <-> c2.geom))
              Buffers: shared hit=180935
              ->  Result  (cost=0.15..4249.52 rows=5999 width=52) (actual time=0.800..0.800 rows=1 loops=30000)
                    Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, (c.geom <-> c2.geom)
                    One-Time Filter: (("*VALUES*".column1)::point_type = ("*VALUES*".column1)::point_type)
                    Buffers: shared hit=180935
                    ->  Index Scan using points_geom_idx on public.points c2  (cost=0.15..500.15 rows=5999 width=40) (actual time=0.787..0.787 rows=1 loops=30000)
                          Output: c2.geom, c2.point_type, c2.pk
                          Order By: (c2.geom <-> c.geom)
                          Filter: (c.pk <> c2.pk)
                          Rows Removed by Filter: 1
                          Buffers: shared hit=180935
Settings: search_path = 'public, topology, tiger'
Planning Time: 4.964 ms
Execution Time: 24553.107 ms

Second query:

QUERY PLAN
Nested Loop  (cost=0.88..1197.38 rows=30000 width=28) (actual time=3.535..4538.832 rows=30000 loops=1)
  Output: c.point_type, c.pk, b.pk, ("*VALUES*".column1)::point_type, b.secondary_point_type, b.distance
  Buffers: shared hit=36251
  ->  Seq Scan on public.points c  (cost=0.00..124.00 rows=6000 width=40) (actual time=0.095..4.897 rows=6000 loops=1)
        Output: c.geom, c.point_type, c.pk
        Buffers: shared hit=64
  ->  Hash Left Join  (cost=0.88..0.98 rows=5 width=48) (actual time=0.726..0.743 rows=5 loops=6000)
        Output: "*VALUES*".column1, b.pk, b.secondary_point_type, b.distance
        Hash Cond: (("*VALUES*".column1)::point_type = b.secondary_point_type)
        Buffers: shared hit=36187
        ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=32) (actual time=0.001..0.008 rows=5 loops=6000)
              Output: "*VALUES*".column1
        ->  Hash  (cost=0.87..0.87 rows=1 width=16) (actual time=0.707..0.707 rows=1 loops=6000)
              Output: b.pk, b.secondary_point_type, b.distance
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              Buffers: shared hit=36187
              ->  Subquery Scan on b  (cost=0.15..0.87 rows=1 width=16) (actual time=0.701..0.703 rows=1 loops=6000)
                    Output: b.pk, b.secondary_point_type, b.distance
                    Buffers: shared hit=36187
                    ->  Limit  (cost=0.15..0.86 rows=1 width=52) (actual time=0.700..0.700 rows=1 loops=6000)
                          Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, ((c.geom <-> c2.geom))
                          Buffers: shared hit=36187
                          ->  Index Scan using points_geom_idx on public.points c2  (cost=0.15..4249.52 rows=5999 width=52) (actual time=0.695..0.695 rows=1 loops=6000)
                                Output: NULL::point_type, NULL::geometry(Point), c2.pk, c2.point_type, (c.geom <-> c2.geom)
                                Order By: (c2.geom <-> c.geom)
                                Filter: (c.pk <> c2.pk)
                                Rows Removed by Filter: 1
                                Buffers: shared hit=36187
Settings: search_path = 'public, topology, tiger'
Planning Time: 3.206 ms
Execution Time: 4549.481 ms
