Need advice small sql delete query

Saahib · July 2016

Hi,
I have two tables with lots of records.
Table A:

| accessID | accessDate|

|  28392    | 2016-07-13
|  23422    | 2016-07-15
|  54645    | 2016-07-15
|  34534    | 2016-07-08

Now there another TABLE B:

| accessID | 

|  28592    |   
|  13422    |
|  33645    |
|  44534    |

Now wants to delete all those record from TABLE where TableB.accessID = TableA.accessID .

Currently using query :

 DELETE from TableA  where accessID  in (SELECT accessID FROM  TableB)

Now problem is that it is taking ages as TableB has around 25K records and TableA has around a 0.5 million records. I had worked on SQL briefly 5-6 years ago and now don't feel like reading at all again as forgot almost everything, now I need suggestion to it quickly as possible and also without affecting current DB design.

Falzo · July 2016

you probably would want to avoid that inner SELECT as I suppose mysql runs this for every single row of table A it is checking.

I'd suggest using JOIN to connect both tables and DELETE only from one table. if you join the table A as right table on table B, it probably should get only the entries to the accessID s which are in table B.

if you do it the other way round, you probably have to do some check, if there are empty fields joined, those would be the rows you want to keep.

DELETE tableA FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

TEST on a duplicate or something like that before, it's just a quick thought not a guaranteed work ;-)

EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

raindog308 · July 2016

Every time I read something by @Falzo here, it's good advice.

Subselects are notoriously slow...the right solution is to use the set operation that Falzo describes.

Falzo said: EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

I'm often guilty of this myself on a regular basis, but really SQL is designed for set operations, not line-by-line cursor processing. You can do the latter, but you lose out on the true power of SQL.

elgs · July 2016

If this is still too slow:

DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);

try:

CREATE TABLEAA LIKE TABLEA;
INSERT INTO TABLEAA SELECT * FROM TABLEA WHERE NOT EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
DROP TABLE TABLEA;
RENAME TABLE TABLEAA TO TABLEA;

Removing index before the operation and rebuild them will make the process faster.

Saahib · July 2016

@Falzo said:
you probably would want to avoid that inner SELECT as I suppose mysql runs this for every single row of table A it is checking.

I'd suggest using JOIN to connect both tables and DELETE only from one table. if you join the table A as right table on table B, it probably should get only the entries to the accessID s which are in table B.

if you do it the other way round, you probably have to do some check, if there are empty fields joined, those would be the rows you want to keep.

TEST on a duplicate or something like that before, it's just a quick thought not a guaranteed work ;-)

EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

Thanks for the advice, I am more inclined towards SQL only solution. What If only use INNER JOIN on your example and get accessID common in both as thats what I want ie. delete data from TableA which exist in tableB

DELETE FROM
tableA
JOIN
tableB ON tableA.accessID = tableB.accessID

And it is still taking lots of time (which will cause table lock), I can break it pieces and run using PHP in parts like (we can also delete data from tableB once it is deleted in tableA after comparison)

DELETE FROM 
   tableA, tableB
 JOIN 
    tableB ON tableA.accessID = tableB.accessID
LIMIT 1000

This way can run in batch of 1000 unless we get 0 query effected result. Hopefully LIMIT is effective here in joins. As said I am not sure about my Mysql knowledge anymore, JOINS are making me dizzy.

PS: How we specify which table goes for right side in RIGHT JOIN and which goes to LEFT in left joins ?

Saahib · July 2016

@elgs said:
If this is still too slow:

DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);

Do you think it will have some performance benefit although its again sub-select?

@elgs said:
try:
CREATE TABLEAA LIKE TABLEA;
INSERT INTO TABLEAA SELECT * FROM TABLEA WHERE NOT EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
DROP TABLE TABLEA;
RENAME TABLE TABLEAA TO TABLEA;
Removing index before the operation and rebuild them will make the process faster.

Creating copy of table is also in horizon however want to avoid though.

Falzo · July 2016

PS: How we specify which table goes for right side in RIGHT JOIN and which goes to LEFT in left joins ?

From the sql declaration of JOIN:

If there is no matching row for the right table in the ON or USING part in a LEFT JOIN, a row with all columns set to NULL is used for the right table. You can use this fact to find rows in a table that have no counterpart in another table:

SELECT left_tbl.*
  FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id
  WHERE right_tbl.id IS NULL;
This example finds all rows in left_tbl with an id value that is not present in right_tbl (that is, all rows in left_tbl with no corresponding row in right_tbl). This assumes that right_tbl.id is declared NOT NULL.

Thats what I meant with the other way round. Yet the examples gives a clue which is right and which is left ;-)

Go for the php script. Not more than ten lines which makes you sleep well and may come in handy if you maintain some garbage collection scripts at later time...

elgs · July 2016

@Saahib said:
@elgs said:
If this is still too slow:
DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
Do you think it will have some performance benefit although its again sub-select?

Yes, EXISTS will be faster than IN, but I'm afraid it may not be significantly faster.

@elgs said:
try:
CREATE TABLEAA LIKE TABLEA;
INSERT INTO TABLEAA SELECT * FROM TABLEA WHERE NOT EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
DROP TABLE TABLEA;
RENAME TABLE TABLEAA TO TABLEA;
Removing index before the operation and rebuild them will make the process faster.
Creating copy of table is also in horizon however want to avoid though.

This will be the fastest. If you remove the index and add them back after the execution of these statements, it will be even faster. But there is a tradeoff, these statements do not work in a transaction.

Saahib · July 2016

@elgs
I understand that creating duplicate table and working on it will be better, but can't use that for some reason here.

I tried
DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);

It took about again ages, more than 50 minutes so not an option.

Saahib · July 2016

@elgs
I understand that creating duplicate table and working on it will be better, but can't use that for some reason here.

I tried
DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);

It took about again ages, more than 50 minutes so not an option.

Also, I tried joins, what I needed was INNER JOIN to desired result but it also took almost an hour.

So, I think I am out of options for mysql only solution.

Now going to PHP way as suggested by @Falzo ,

So I fetched all accessID from tableB (generally 20K to 50K records).

Now how I am supposed to check it against tableA records and delete , will this be fine ...

Saahib · July 2016

_Sorry for multiple post but it seems whenever I include PHP snippet, there is captchatriggered which is misconfigured, hence all this mess posts. _

What I want to ask is that after I fetch all records from tableB ie. around 20K to 50K average, then what I am supposed to do, Iterate through each record and run following query one by one ?

while($row = $result->fetch_assoc()){
    $db->query("DELETE FROM tableA WHERE accessID = $row['accessID']);

That means almsot 50k new queries ie. one for each record..or there is some better way ?

Falzo · July 2016

@Saahib said:

>

while($row = $result->fetch_assoc()){
  $db->query("DELETE FROM tableA WHERE accessID = $row['accessID']);
That means almsot 50k new queries ie. one for each record..or there is some better way ?

+1 for this.

Are you able to share more infos about those tables involved? Like how much others fields per row etc.

I may find some spare time from monday to have a further look. Hours seems unreasonable. How about checking on the performance of the sql server itself? Like deleting all roes of table a with a simple condition likely matching all accessids ( e.g. > 1) just for the sake of using a condition... And of course checking on the time that will take.

Saahib · July 2016

@Falzo thanks for the comment.

tableA has just 2 fields while tableB has 1 only.

I will do test and will let you know.

PS: I did join test for DELETE with LIMIT , and did same with IN statement, both took almost similar time.

Saahib · July 2016

Alright, done a simple delete test based on condition as you asked :

mysql> DELETE  FROM tableA WHERE accessDate =  '2016-07-24';
Query OK, 14691 rows affected (0.18 sec)

Server seems to be fine to me..

Also, when I tried with PHP:
while($row = $result->fetch_assoc()){
$db->query("DELETE FROM tableA WHERE accessID = $row['accessID']);

There was 100% mysql usage consistently but I guess will work as its doesn't blocked other queries since other quiries those tables can easily slip between each of these query. But script ended before completion , we can't set any predefined time as records increases and decreases heavily.

Need to do it batches with a way to mark processed data.

teamacc · July 2016

WHERE IN with a subquery will run that subquery for every main item.

Saahib · July 2016

Well, I am out of ideas as can't change table structure ..

Falzo · July 2016

@teamacc said:
WHERE IN with a subquery will run that subquery for every main item.

thanks for confirming ;-)

Saahib said: Alright, done a simple delete test based on condition as you asked :
mysql> DELETE FROM tableA WHERE accessDate = '2016-07-24';

Query OK, 14691 rows affected (0.18 sec)
Server seems to be fine to me..

yes, and thanks for that follow up! I agree, that this seems reasonable as a result regarding decent server performance...

Saahib said: Need to do it batches with a way to mark processed data.

somewhere in this thread you mentioned that you won't need the entry in table b after deleting from table a. so you also should delete that entry too after deleting from table a.

this way you could simply limit the first query on table b to a batch of something like 1k entries which you run through your while loop deleting both entries from A and B and therefore won't need a special mark, as size of table B should decrease automatically with every batch until there were no more entries.

depending on your use case and the way those accessID are generated you would also be able to select which accessid to grab first and prune, e.g. lowest IDs = oldest (if that is the case)

you could simply fit the size of that limit for the initial query to your needs and time you want to give the process to run. and in addition may adjust via a timed interval how often you need to run this, to keep the number of entries low...

yet it bothers me that a single mysql request by simply joining the tables takes that long going for a test case if I can get my hands on a comparable table to play with ^^

elgs · July 2016

If I were you, and if you don't care about the ACID in this case. I don't see any reason that I would not use the following scripts:

CREATE TABLEA_TMP LIKE TABLEA;
INSERT INTO TABLEA_TMP SELECT * FROM TABLEA WHERE NOT EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
RENAME TABLE TABLEA TO TABLEA_BACKUP;
RENAME TABLE TABLEA_TMP TO TABLEA;

-- if everything is ok, manually drop TABLEA_BACKUP

This actually has a good side effect of rebuilding and shrinking TABLE_A.

Falzo · July 2016

I did some tests on a table A with nearly one million entries (log from a shop) and with even more columns in it and a related table of comparable IDs for which I reduced the number of rows to 50k to reach a comparable layout.

as stated in my first comment exactly this query

DELETE tableA FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

worked as intended and gave a result like:

50004 Datensätze gelöscht. ( Die Abfrage dauerte 0.2357 Sekunden )

as supposed there were only rows deleted from table A.
you could also multi delete from both tables by adding table B like:

DELETE tableA, tableB FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

this has been run on a somewhat optimized mysql-server, all tables were innodb.

to be comparable I also tested a simple delete from tablea where id >1 which resulted in something like:

812576 Datensätze gelöscht. ( Die Abfrage dauerte 1.6256 Sekunden )

so I can't tell why joined table-operations take so long on your system. maybe you run into some limits memory-wise and a lot of disk get involved. are your tables innodb too?

Saahib · July 2016

Thanks for taking time into it and performing those tests. Your results made me worried now.

I have MyISAM however, I also tried with InnoDB, same results. This is an OpenVZ container with 8GB ram and regular sata disk on host node, there are other 2 CT on this node and almost dormant. So all disk IO is available.

With following is the config :

max_connections=440
query_cache_type=0 
thread_cache_size = 8K  
query_cache_size = 128M
query_cache_limit = 148M
 key_buffer_size = 128M
 innodb_file_per_table=1
 table_open_cache=5000
 open_files_limit=10000
 max_allowed_packet=268435456
 open_files_limit=10000
 interactive_timeout=300
 wait_timeout=300
 default-storage-engine=MyISAM

innodb_buffer_pool_size=500M

Any idea where I can make improvement without actually changing hardware atm?

Also, I don't see any limit hit in /proc/user_beancounters for this container as they mostly set to unlimited.

Saahib · July 2016

Btw, I see no DISK activity with iotop while CPU usage by mysql is 100% , where is problem ?

netomx · July 2016

On the php script ending without completion, try php cli

Saahib · July 2016

@netomx said:
On the php script ending without completion, try php cli

Well that can do and I can also consider @elgs approach however, as @Falzo recently posted result of his finding that it took him less than a second to do similar query on his server, I am baffled. Probably my setup has fault, need to first look into that now.

Falzo · July 2016

I just did further tests and have to admit, my first test setup probably wasn't as comparable to your table layout as I thought.

I now created a table with around 30k rows of different numbers like your table B and a table with around 700k entries including a column with those numbers but multiple occurences of them. (my first setup probably only had a unique set of numbers on table B where each of those only occured once in table A ...)

now that makes a difference even with the LEFT JOIN. it takes about 40 seconds to do just a join of about the first 50k rows (tested with a select and limit) so the join itself also eats up time when there are multiple hits for that ON clause

at least I was able to reproduce your problem now and confirm the issues you are experiencing... I need to think about another approach to do this with mysql directly ,-)

the strategy @elgs mentioned may help if one could choose a performant WHERE clause to select the values which should be kept in table A.

this kind of operation is referenced as an alternative to delete itself in the mysql documentation too: http://dev.mysql.com/doc/refman/5.7/en/delete.html (read paragraph 'InnoDB Tables')

Falzo · July 2016

PS: sorry to have you scared for a moment, that definitely wasn't my intention... was just probing around to find the right setup and reproduce the problem after all ;-)

Saahib · July 2016

@Falzo said:
PS: sorry to have you scared for a moment, that definitely wasn't my intention... was just probing around to find the right setup and reproduce the problem after all ;-)

Still its an issue, I think problem is somewhere else.. I am also using SELECT to see JOINS for test, taking ages.. even selecting with limit 500 is taking about 1 minute 20 seconds ... something is not good.

Btw, just to clarify, all numbers in tableA are unique, they are only 10 digit long , same with tableB, the DB was designed by someone else so can't change its layout, they are using varchar to store those numbers though, if that makes any difference.

mtoledoce · July 2016

@Saahib said:
_Sorry for multiple post but it seems whenever I include PHP snippet, there is captchatriggered which is misconfigured, hence all this mess posts. _

What I want to ask is that after I fetch all records from tableB ie. around 20K to 50K average, then what I am supposed to do, Iterate through each record and run following query one by one ?
while($row = $result->fetch_assoc()){
  $db->query("DELETE FROM tableA WHERE accessID = $row['accessID']);
That means almsot 50k new queries ie. one for each record..or there is some better way ?

I would like to know the process to get TableB

after perform this batch delete you may need purge the TableB right?
how many times you need perform this delete process?
can you filter and get a subset of rows (tableC) before delete in tableA?
why accessID row persist in tableB when was deleted in tableA?

Eobble · July 2016

@Falzo said:
you probably would want to avoid that inner SELECT as I suppose mysql runs this for every single row of table A it is checking.

I'd suggest using JOIN to connect both tables and DELETE only from one table. if you join the table A as right table on table B, it probably should get only the entries to the accessID s which are in table B.

if you do it the other way round, you probably have to do some check, if there are empty fields joined, those would be the rows you want to keep.

DELETE tableA FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

TEST on a duplicate or something like that before, it's just a quick thought not a guaranteed work ;-)

EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

Wouldn't he need a inner join instead of left join for his purpose?

Saahib · July 2016

@mtoledoce

TableB has to be kept until new data arives.

Need it once a day

Can't really make out from 3rd one, filter subset ..ie. ?

For fourth one, not sure what it has to do but, sometimes even if Deleted in tableA, they again get same accessID in tableA later, so could be again in tableB ...

Saahib · July 2016

@Eobble said:

@Falzo said:
you probably would want to avoid that inner SELECT as I suppose mysql runs this for every single row of table A it is checking.

I'd suggest using JOIN to connect both tables and DELETE only from one table. if you join the table A as right table on table B, it probably should get only the entries to the accessID s which are in table B.

if you do it the other way round, you probably have to do some check, if there are empty fields joined, those would be the rows you want to keep.

DELETE tableA FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

TEST on a duplicate or something like that before, it's just a quick thought not a guaranteed work ;-)

EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

Wouldn't he need a inner join instead of left join for his purpose?

Inner Join will do exactly I need, even though Left join also does what I need because when using tableB as left on tableA, will only get what is common in both. Still same result because if its in tableB then must be in tableA.

Tried both INNER and LEFT join, here issue is about super bad performance.

mtoledoce · July 2016

wow! how many people you manage ahh ..if accessID is a code and not a unique identifier for the record.
Well, you are doing something really bad, you can't question tableA with the same data over and over again.

Howdy, Stranger!

Categories

In this Discussion

Need advice small sql delete query

Comments

Howdy, Stranger!

Quick Links

Categories

In this Discussion

Need advice small sql delete query

Comments