Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!


Need advice small sql delete query
New on LowEndTalk? Please Register and read our Community Rules.

All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.

Need advice small sql delete query

SaahibSaahib Host Rep, Veteran
edited July 2016 in Help

Hi,
I have two tables with lots of records.
Table A:

| accessID | accessDate|

|  28392    | 2016-07-13
|  23422    | 2016-07-15
|  54645    | 2016-07-15
|  34534    | 2016-07-08

Now there another TABLE B:

| accessID | 

|  28592    |   
|  13422    |
|  33645    |
|  44534    |

Now wants to delete all those record from TABLE where TableB.accessID = TableA.accessID .

Currently using query :

 DELETE from TableA  where accessID  in (SELECT accessID FROM  TableB)

Now problem is that it is taking ages as TableB has around 25K records and TableA has around a 0.5 million records. I had worked on SQL briefly 5-6 years ago and now don't feel like reading at all again as forgot almost everything, now I need suggestion to it quickly as possible and also without affecting current DB design.

«1

Comments

  • FalzoFalzo Member
    edited July 2016

    you probably would want to avoid that inner SELECT as I suppose mysql runs this for every single row of table A it is checking.

    I'd suggest using JOIN to connect both tables and DELETE only from one table. if you join the table A as right table on table B, it probably should get only the entries to the accessID s which are in table B.

    if you do it the other way round, you probably have to do some check, if there are empty fields joined, those would be the rows you want to keep.

    DELETE tableA FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

    TEST on a duplicate or something like that before, it's just a quick thought not a guaranteed work ;-)

    EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

    Thanked by 3raindog308 ehab netomx
  • raindog308raindog308 Administrator, Veteran

    Every time I read something by @Falzo here, it's good advice.

    Subselects are notoriously slow...the right solution is to use the set operation that Falzo describes.

    Falzo said: EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

    I'm often guilty of this myself on a regular basis, but really SQL is designed for set operations, not line-by-line cursor processing. You can do the latter, but you lose out on the true power of SQL.

    Thanked by 1Falzo
  • elgselgs Member
    edited July 2016

    If this is still too slow:

    DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
    

    try:

    CREATE TABLEAA LIKE TABLEA;
    INSERT INTO TABLEAA SELECT * FROM TABLEA WHERE NOT EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
    DROP TABLE TABLEA;
    RENAME TABLE TABLEAA TO TABLEA;
    

    Removing index before the operation and rebuild them will make the process faster.

  • SaahibSaahib Host Rep, Veteran
    edited July 2016

    @Falzo said:
    you probably would want to avoid that inner SELECT as I suppose mysql runs this for every single row of table A it is checking.

    I'd suggest using JOIN to connect both tables and DELETE only from one table. if you join the table A as right table on table B, it probably should get only the entries to the accessID s which are in table B.

    if you do it the other way round, you probably have to do some check, if there are empty fields joined, those would be the rows you want to keep.

    TEST on a duplicate or something like that before, it's just a quick thought not a guaranteed work ;-)

    EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

    Thanks for the advice, I am more inclined towards SQL only solution. What If only use INNER JOIN on your example and get accessID common in both as thats what I want ie. delete data from TableA which exist in tableB

    DELETE FROM
    tableA
    JOIN
    tableB ON tableA.accessID = tableB.accessID

    And it is still taking lots of time (which will cause table lock), I can break it pieces and run using PHP in parts like (we can also delete data from tableB once it is deleted in tableA after comparison)

    DELETE FROM 
       tableA, tableB
     JOIN 
        tableB ON tableA.accessID = tableB.accessID
    LIMIT 1000
    

    This way can run in batch of 1000 unless we get 0 query effected result. Hopefully LIMIT is effective here in joins. As said I am not sure about my Mysql knowledge anymore, JOINS are making me dizzy.

    PS: How we specify which table goes for right side in RIGHT JOIN and which goes to LEFT in left joins ?

  • SaahibSaahib Host Rep, Veteran
    edited July 2016

    @elgs said:
    If this is still too slow:

    DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
    

    Do you think it will have some performance benefit although its again sub-select?

    @elgs said:
    try:

    CREATE TABLEAA LIKE TABLEA;
    INSERT INTO TABLEAA SELECT * FROM TABLEA WHERE NOT EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
    DROP TABLE TABLEA;
    RENAME TABLE TABLEAA TO TABLEA;
    

    Removing index before the operation and rebuild them will make the process faster.

    Creating copy of table is also in horizon however want to avoid though.

  • FalzoFalzo Member

    PS: How we specify which table goes for right side in RIGHT JOIN and which goes to LEFT in left joins ?

    From the sql declaration of JOIN:

    If there is no matching row for the right table in the ON or USING part in a LEFT JOIN, a row with all columns set to NULL is used for the right table. You can use this fact to find rows in a table that have no counterpart in another table:
    
    SELECT left_tbl.*
      FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id
      WHERE right_tbl.id IS NULL;
    This example finds all rows in left_tbl with an id value that is not present in right_tbl (that is, all rows in left_tbl with no corresponding row in right_tbl). This assumes that right_tbl.id is declared NOT NULL. 
    

    Thats what I meant with the other way round. Yet the examples gives a clue which is right and which is left ;-)

    Go for the php script. Not more than ten lines which makes you sleep well and may come in handy if you maintain some garbage collection scripts at later time...

  • elgselgs Member
    edited July 2016

    @Saahib said:

    @elgs said:
    If this is still too slow:

    DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
    

    Do you think it will have some performance benefit although its again sub-select?

    Yes, EXISTS will be faster than IN, but I'm afraid it may not be significantly faster.

    @elgs said:
    try:

    CREATE TABLEAA LIKE TABLEA;
    INSERT INTO TABLEAA SELECT * FROM TABLEA WHERE NOT EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
    DROP TABLE TABLEA;
    RENAME TABLE TABLEAA TO TABLEA;
    

    Removing index before the operation and rebuild them will make the process faster.

    Creating copy of table is also in horizon however want to avoid though.

    This will be the fastest. If you remove the index and add them back after the execution of these statements, it will be even faster. But there is a tradeoff, these statements do not work in a transaction.

  • SaahibSaahib Host Rep, Veteran
    edited July 2016

    @elgs
    I understand that creating duplicate table and working on it will be better, but can't use that for some reason here.

    I tried
    DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);

    It took about again ages, more than 50 minutes so not an option.

  • SaahibSaahib Host Rep, Veteran

    @elgs
    I understand that creating duplicate table and working on it will be better, but can't use that for some reason here.

    I tried
    DELETE FROM TABLEA WHERE EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);

    It took about again ages, more than 50 minutes so not an option.

    Also, I tried joins, what I needed was INNER JOIN to desired result but it also took almost an hour.

    So, I think I am out of options for mysql only solution.

    Now going to PHP way as suggested by @Falzo ,

    So I fetched all accessID from tableB (generally 20K to 50K records).

    Now how I am supposed to check it against tableA records and delete , will this be fine ...

  • SaahibSaahib Host Rep, Veteran

    _Sorry for multiple post but it seems whenever I include PHP snippet, there is captchatriggered which is misconfigured, hence all this mess posts. _

    What I want to ask is that after I fetch all records from tableB ie. around 20K to 50K average, then what I am supposed to do, Iterate through each record and run following query one by one ?

    while($row = $result->fetch_assoc()){
        $db->query("DELETE FROM tableA WHERE accessID = $row['accessID']);
    

    That means almsot 50k new queries ie. one for each record..or there is some better way ?

  • FalzoFalzo Member

    @Saahib said:

    >

    while($row = $result->fetch_assoc()){
      $db->query("DELETE FROM tableA WHERE accessID = $row['accessID']);
    

    That means almsot 50k new queries ie. one for each record..or there is some better way ?

    +1 for this.

    Are you able to share more infos about those tables involved? Like how much others fields per row etc.

    I may find some spare time from monday to have a further look. Hours seems unreasonable. How about checking on the performance of the sql server itself? Like deleting all roes of table a with a simple condition likely matching all accessids ( e.g. > 1) just for the sake of using a condition... And of course checking on the time that will take.

  • SaahibSaahib Host Rep, Veteran

    @Falzo thanks for the comment.

    tableA has just 2 fields while tableB has 1 only.

    I will do test and will let you know.

    PS: I did join test for DELETE with LIMIT , and did same with IN statement, both took almost similar time.

  • SaahibSaahib Host Rep, Veteran
    edited July 2016

    Alright, done a simple delete test based on condition as you asked :

    mysql> DELETE  FROM tableA WHERE accessDate =  '2016-07-24';
    Query OK, 14691 rows affected (0.18 sec)
    

    Server seems to be fine to me..

    Also, when I tried with PHP:
    while($row = $result->fetch_assoc()){
    $db->query("DELETE FROM tableA WHERE accessID = $row['accessID']);

    There was 100% mysql usage consistently but I guess will work as its doesn't blocked other queries since other quiries those tables can easily slip between each of these query. But script ended before completion , we can't set any predefined time as records increases and decreases heavily.

    Need to do it batches with a way to mark processed data.

  • WHERE IN with a subquery will run that subquery for every main item.

    Thanked by 1Falzo
  • SaahibSaahib Host Rep, Veteran

    Well, I am out of ideas as can't change table structure ..

  • FalzoFalzo Member
    edited July 2016

    @teamacc said:
    WHERE IN with a subquery will run that subquery for every main item.

    thanks for confirming ;-)

    Saahib said: Alright, done a simple delete test based on condition as you asked :
    mysql> DELETE FROM tableA WHERE accessDate = '2016-07-24';

    Query OK, 14691 rows affected (0.18 sec)
    Server seems to be fine to me..

    yes, and thanks for that follow up! I agree, that this seems reasonable as a result regarding decent server performance...

    Saahib said: Need to do it batches with a way to mark processed data.

    somewhere in this thread you mentioned that you won't need the entry in table b after deleting from table a. so you also should delete that entry too after deleting from table a.

    this way you could simply limit the first query on table b to a batch of something like 1k entries which you run through your while loop deleting both entries from A and B and therefore won't need a special mark, as size of table B should decrease automatically with every batch until there were no more entries.

    depending on your use case and the way those accessID are generated you would also be able to select which accessid to grab first and prune, e.g. lowest IDs = oldest (if that is the case)

    you could simply fit the size of that limit for the initial query to your needs and time you want to give the process to run. and in addition may adjust via a timed interval how often you need to run this, to keep the number of entries low...

    yet it bothers me that a single mysql request by simply joining the tables takes that long :/ going for a test case if I can get my hands on a comparable table to play with ^^

  • elgselgs Member
    edited July 2016

    If I were you, and if you don't care about the ACID in this case. I don't see any reason that I would not use the following scripts:

    CREATE TABLEA_TMP LIKE TABLEA;
    INSERT INTO TABLEA_TMP SELECT * FROM TABLEA WHERE NOT EXISTS (SELECT 1 FROM TABLEB WHERE TABLEA.accessID=TABLEB.accessID);
    RENAME TABLE TABLEA TO TABLEA_BACKUP;
    RENAME TABLE TABLEA_TMP TO TABLEA;
    

    -- if everything is ok, manually drop TABLEA_BACKUP

    This actually has a good side effect of rebuilding and shrinking TABLE_A.

  • FalzoFalzo Member
    edited July 2016

    I did some tests on a table A with nearly one million entries (log from a shop) and with even more columns in it and a related table of comparable IDs for which I reduced the number of rows to 50k to reach a comparable layout.

    as stated in my first comment exactly this query

    DELETE tableA FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

    worked as intended and gave a result like:

    50004 Datensätze gelöscht. ( Die Abfrage dauerte 0.2357 Sekunden )

    as supposed there were only rows deleted from table A.
    you could also multi delete from both tables by adding table B like:

    DELETE tableA, tableB FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

    this has been run on a somewhat optimized mysql-server, all tables were innodb.

    to be comparable I also tested a simple delete from tablea where id >1 which resulted in something like:

    812576 Datensätze gelöscht. ( Die Abfrage dauerte 1.6256 Sekunden )

    so I can't tell why joined table-operations take so long on your system. maybe you run into some limits memory-wise and a lot of disk get involved. are your tables innodb too?

  • SaahibSaahib Host Rep, Veteran
    edited July 2016

    Thanks for taking time into it and performing those tests. Your results made me worried now.

    I have MyISAM however, I also tried with InnoDB, same results. This is an OpenVZ container with 8GB ram and regular sata disk on host node, there are other 2 CT on this node and almost dormant. So all disk IO is available.

    With following is the config :

    max_connections=440
    query_cache_type=0 
    thread_cache_size = 8K  
    query_cache_size = 128M
    query_cache_limit = 148M
     key_buffer_size = 128M
     innodb_file_per_table=1
     table_open_cache=5000
     open_files_limit=10000
     max_allowed_packet=268435456
     open_files_limit=10000
     interactive_timeout=300
     wait_timeout=300
     default-storage-engine=MyISAM
    
    innodb_buffer_pool_size=500M
    

    Any idea where I can make improvement without actually changing hardware atm?

    Also, I don't see any limit hit in /proc/user_beancounters for this container as they mostly set to unlimited.

  • SaahibSaahib Host Rep, Veteran

    Btw, I see no DISK activity with iotop while CPU usage by mysql is 100% , where is problem ?

  • netomxnetomx Moderator, Veteran

    On the php script ending without completion, try php cli

  • SaahibSaahib Host Rep, Veteran

    @netomx said:
    On the php script ending without completion, try php cli

    Well that can do and I can also consider @elgs approach however, as @Falzo recently posted result of his finding that it took him less than a second to do similar query on his server, I am baffled. Probably my setup has fault, need to first look into that now.

    Thanked by 2Falzo netomx
  • FalzoFalzo Member

    I just did further tests and have to admit, my first test setup probably wasn't as comparable to your table layout as I thought.

    I now created a table with around 30k rows of different numbers like your table B and a table with around 700k entries including a column with those numbers but multiple occurences of them. (my first setup probably only had a unique set of numbers on table B where each of those only occured once in table A ...)

    now that makes a difference even with the LEFT JOIN. it takes about 40 seconds to do just a join of about the first 50k rows (tested with a select and limit) so the join itself also eats up time when there are multiple hits for that ON clause :(

    at least I was able to reproduce your problem now and confirm the issues you are experiencing... I need to think about another approach to do this with mysql directly ,-)

    the strategy @elgs mentioned may help if one could choose a performant WHERE clause to select the values which should be kept in table A.

    this kind of operation is referenced as an alternative to delete itself in the mysql documentation too: http://dev.mysql.com/doc/refman/5.7/en/delete.html (read paragraph 'InnoDB Tables')

  • FalzoFalzo Member

    PS: sorry to have you scared for a moment, that definitely wasn't my intention... was just probing around to find the right setup and reproduce the problem after all ;-)

  • SaahibSaahib Host Rep, Veteran
    edited July 2016

    @Falzo said:
    PS: sorry to have you scared for a moment, that definitely wasn't my intention... was just probing around to find the right setup and reproduce the problem after all ;-)

    Still its an issue, I think problem is somewhere else.. I am also using SELECT to see JOINS for test, taking ages.. even selecting with limit 500 is taking about 1 minute 20 seconds ... something is not good.

    Btw, just to clarify, all numbers in tableA are unique, they are only 10 digit long , same with tableB, the DB was designed by someone else so can't change its layout, they are using varchar to store those numbers though, if that makes any difference.

  • mtoledocemtoledoce Member
    edited July 2016

    @Saahib said:
    _Sorry for multiple post but it seems whenever I include PHP snippet, there is captchatriggered which is misconfigured, hence all this mess posts. _

    What I want to ask is that after I fetch all records from tableB ie. around 20K to 50K average, then what I am supposed to do, Iterate through each record and run following query one by one ?

    while($row = $result->fetch_assoc()){
      $db->query("DELETE FROM tableA WHERE accessID = $row['accessID']);
    

    That means almsot 50k new queries ie. one for each record..or there is some better way ?

    I would like to know the process to get TableB

    • after perform this batch delete you may need purge the TableB right?

    • how many times you need perform this delete process?

    • can you filter and get a subset of rows (tableC) before delete in tableA?

    • why accessID row persist in tableB when was deleted in tableA?

  • EobbleEobble Member

    @Falzo said:
    you probably would want to avoid that inner SELECT as I suppose mysql runs this for every single row of table A it is checking.

    I'd suggest using JOIN to connect both tables and DELETE only from one table. if you join the table A as right table on table B, it probably should get only the entries to the accessID s which are in table B.

    if you do it the other way round, you probably have to do some check, if there are empty fields joined, those would be the rows you want to keep.

    DELETE tableA FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

    TEST on a duplicate or something like that before, it's just a quick thought not a guaranteed work ;-)

    EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

    Wouldn't he need a inner join instead of left join for his purpose?

  • SaahibSaahib Host Rep, Veteran

    @mtoledoce

    TableB has to be kept until new data arives.

    Need it once a day

    Can't really make out from 3rd one, filter subset ..ie. ?

    For fourth one, not sure what it has to do but, sometimes even if Deleted in tableA, they again get same accessID in tableA later, so could be again in tableB ...

  • SaahibSaahib Host Rep, Veteran

    @Eobble said:

    @Falzo said:
    you probably would want to avoid that inner SELECT as I suppose mysql runs this for every single row of table A it is checking.

    I'd suggest using JOIN to connect both tables and DELETE only from one table. if you join the table A as right table on table B, it probably should get only the entries to the accessID s which are in table B.

    if you do it the other way round, you probably have to do some check, if there are empty fields joined, those would be the rows you want to keep.

    DELETE tableA FROM tableB LEFT JOIN tableA ON tableB.accessID = tableA.accessID

    TEST on a duplicate or something like that before, it's just a quick thought not a guaranteed work ;-)

    EDIT: if you can access the db via php, as alternative write a small script which does the select of accessIDs from tableB once and than runs a delete command against table A for every entry, I'd guess that also would be a lot faster...

    Wouldn't he need a inner join instead of left join for his purpose?

    Inner Join will do exactly I need, even though Left join also does what I need because when using tableB as left on tableA, will only get what is common in both. Still same result because if its in tableB then must be in tableA.

    Tried both INNER and LEFT join, here issue is about super bad performance.

  • wow! how many people you manage ahh ..if accessID is a code and not a unique identifier for the record.
    Well, you are doing something really bad, you can't question tableA with the same data over and over again.

Sign In or Register to comment.