python + SQLAlchemy: deleting taking 250,000x longer than querying the same data

I am accessing a PostgreSQL database using Python and SQLAlchemy. I can't figure out how to delete rows in a timely manner. The query is fast, but the delete takes 250,000x longer.

I have a table, 'RP', that has 92M rows. I am trying to delete some of them.

The code that finds the objects I want to delete works and runs fast. It is basically:

    import sqlalchemy as sa
    ...
    with Session(engine, autobegin=True) as session:
        ...
        r_count = 0
        for image in images:
            sub_r = sa.select(RP).filter_by(d_id=image.id)
            count = session.execute(sa.select(sa.func.count()).select_from(sub_r)).scalar_one()
            r_count += count
        #print timing and count info here

This loop executes ~500 times in ~0.2 seconds, because len(images)~500, each time counting ~10-200 individual rows, for a total of ~70,000 rows that I want to delete.

When I add a delete command below, it takes much longer. Each of the 500 passes through the loop, which originally took <0.01 second to execute, now is taking 250 seconds for each iteration, meaning that it will take 36+ hours to delete these 70,000 rows (which were found by a query in <0.5 seconds).
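As a quick check of that estimate, here is a small sketch using only the approximate figures quoted above (500 iterations, 250 to 260 seconds each). The function name `total_hours` is just for illustration.

```python
def total_hours(iterations, seconds_per_iteration):
    """Convert a per-iteration time into a total run time in hours."""
    return iterations * seconds_per_iteration / 3600


# Approximate figures from the question: ~500 images, ~250-260 s per delete pass.
hours_at_250 = total_hours(500, 250)
hours_at_260 = total_hours(500, 260)

print(f"at 250 s/iteration: {hours_at_250:.1f} h")
print(f"at 260 s/iteration: {hours_at_260:.1f} h")
```

So the 36+ hour figure corresponds to the ~260 second iterations actually observed, and 250 seconds per iteration would give a little under 35 hours.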

    import sqlalchemy as sa
    ...
    with Session(engine, autobegin=True) as session:
        ...
        r_count = 0
        for image in images:
            sub_r = sa.select(RP).filter_by(d_id=image.id)
            count = session.execute(sa.select(sa.func.count()).select_from(sub_r)).scalar_one()
            r_count += count
        #print timing and count info here

        #New Delete Code Below
        for image in images:
            session.query(RP).filter_by(d_id=image.id).delete()
            #session.commit()  #tried with this inserted and removed, doesn't seem to matter

            # Deleting using the session does not go faster
            # del_rp = sa.delete(RP).where(RP.d_id == image.id)
            # session.execute(del_rp)

            # Deleting one at a time also doesn't seem to go faster
            # rois = session.query(RP).filter_by(d_id=image.id).all()
            # for r in rois:
            #     session.delete(r)

            #print timing for each loop here, ~260+ seconds for each loop

To summarize: I tried 3 main strategies. I tried doing session.query().filter_by().delete(), I tried doing session.execute(sa.delete().where()), and I tried doing a for loop on session.query().filter_by().all(), and then doing a session.delete(one). I also tried including session.commit() at points in the middle.

I am expecting it to execute much faster. I see no reason why the query should go fast and the delete take many orders of magnitude longer.

I am the only user on the server, so there should be no other bottleneck besides this code. The commented-out methods also seem to take many hundreds of seconds for a few dozen deletes (it would take me longer to gather precise timing info).

I am using the pgAdmin dashboard, and in the 'Tuples Out' view, I see ~200-300 seconds of flatline, then 4,000 fetched, 12 billion returned. If I'm deleting ~140 objects in that pass through the loop, then that corresponds to each individual delete causing a fetch/return of the whole 92M table. Is there some way to tell the delete that I don't care for any return value (assuming that's what is happening?)
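As a rough sanity check on that reading of the dashboard, the arithmetic can be sketched in a few lines. The numbers are the approximate ones quoted above, and the variable names are only for illustration.

```python
# Approximate figures read off the pgAdmin dashboard and quoted in the question.
tuples_returned = 12_000_000_000   # "12 billion returned" in one loop iteration
table_rows = 92_000_000            # size of the RP table
rows_deleted = 140                 # objects deleted in that iteration

# How many complete passes over the table would produce that many tuples?
full_scans = tuples_returned / table_rows
scans_per_deleted_row = full_scans / rows_deleted

print(f"equivalent full-table scans: {full_scans:.0f}")
print(f"scans per deleted row: {scans_per_deleted_row:.2f}")
```

That works out to about 130 table-sized scans for roughly 140 deleted rows, which is why it looks like close to one full pass over the table per deleted row.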

I suppose I could try to be more clever and accumulate all 70,000 rows and issue a single delete command, but based on the pgAdmin dashboard it seems like this will not help, because the session.query().filter_by().delete() appears to get broken up into individual row deletes anyway.

What am I supposed to be doing differently? I have tried .delete(synchronize_session='fetch'), it doesn't seem to help, but I don't remember if I tried it in every permutation.
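For concreteness, here is a minimal sketch of the single-statement approach mentioned above, with session synchronization switched off. It is an illustration under assumptions, not a confirmed fix: the `RP` model is a two-column stand-in for the real table, `delete_for_images` is a hypothetical helper, and it runs against an in-memory SQLite database rather than the real PostgreSQL server. `synchronize_session=False` is the documented SQLAlchemy option for skipping the reconciliation of deleted rows with objects already loaded in the session; whether it changes the timing on the real table is untested.

```python
import sqlalchemy as sa
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class RP(Base):
    """Stand-in for the real 92M-row table; only the columns used here."""
    __tablename__ = "rp"
    id = sa.Column(sa.Integer, primary_key=True)
    d_id = sa.Column(sa.Integer, index=True)


def delete_for_images(session, image_ids):
    """Hypothetical helper: delete all RP rows for the given ids in one statement."""
    stmt = (
        sa.delete(RP)
        .where(RP.d_id.in_(image_ids))
        # Do not try to keep already-loaded ORM objects in sync with the delete.
        .execution_options(synchronize_session=False)
    )
    result = session.execute(stmt)
    return result.rowcount


# In-memory SQLite is an assumption for the demo; the question uses PostgreSQL.
engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # 100 demo rows spread evenly over d_id values 0..9.
    session.add_all([RP(d_id=i % 10) for i in range(100)])
    session.commit()

    deleted = delete_for_images(session, [0, 1, 2])
    session.commit()

    remaining = session.execute(
        sa.select(sa.func.count()).select_from(RP)
    ).scalar_one()

print(f"deleted={deleted}, remaining={remaining}")
```

Against the real table, the id list would be built from the loop variable, for example `[image.id for image in images]`.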

Maybe this should be a second question, but I also can't figure out a way to interrupt the code. If I send a Ctrl+C, it seems to wait until the ~250 second loop iteration is done. If I use the pgAdmin tool, I don't have permission to kill the session. I don't want a bunch of idle threads clogging up the server, so at this point I'm just being patient and waiting for the loop.

**Edit:** I checked the server as suggested. I see that the actual code being executed is:

DELETE FROM rp WHERE rp.d_id = 5254591 RETURNING rp.id

That should just be returning a single .id, correct?

