@dutow dutow commented Mar 22, 2022

Issue: locking iterator is several times slower than the default
iterator in queries based on secondary keys.

For example:

select COUNT(*) from sbtest1 where k % 10 = 1 for update;

This query ends up iterating over the entire table and eventually locks
all of it, but it does so through lots of small locks, with a seek for
each one.

This diff shows that by simply locking the entire table instead, we can
improve the performance of the iterator by 50%. It's still slower than
the default implementation, but now it's much closer to it.

This modification of course results in overlocking in queries such as

select COUNT(*) from sbtest1 where k % 10 = 1 for update limit 3000;

where, instead of locking only a few thousand records, this change
locks the entire table.
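
To make the trade-off concrete, here is a minimal sketch of the threshold behaviour described above: row locks are counted, and once the count reaches a limit a single lock over the whole range replaces them. `EscalatingLocker`, `LockAction` and `OnNextRow` are hypothetical names for illustration only and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the proof of concept. Nothing here is the patch's
// real API; it only mirrors the "count, then lock everything" idea.
enum class LockAction { kLockRow, kLockWholeRange, kAlreadyCovered };

class EscalatingLocker {
 public:
  explicit EscalatingLocker(std::uint64_t max_row_locks)
      : max_row_locks_(max_row_locks) {
    assert(max_row_locks > 0);
  }

  // Decide what the iterator should lock before it reads the next row.
  LockAction OnNextRow() {
    if (escalated_) return LockAction::kAlreadyCovered;
    if (row_locks_ >= max_row_locks_) {
      // Threshold reached: take one lock over the whole range and stop
      // issuing per-row locks (and the seeks that come with them).
      escalated_ = true;
      return LockAction::kLockWholeRange;
    }
    ++row_locks_;
    return LockAction::kLockRow;
  }

  std::uint64_t row_locks() const { return row_locks_; }
  bool escalated() const { return escalated_; }

 private:
  std::uint64_t max_row_locks_;
  std::uint64_t row_locks_ = 0;
  bool escalated_ = false;
};
```

In this model a query with `limit 3000` and a threshold below 3000 escalates before the limit is reached, which is the overlocking case mentioned above.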

In its current form this change is a proof of concept. For a real patch I
think we could go in two directions:

  • instead of a hard-coded value, make this an adjustable session
    variable
  • or instead of simply stopping after a threshold, continue locking in
    linear or exponential batches
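
As a rough illustration of the second direction, the sketch below compares linear and exponential batch growth by counting how many lock requests are needed to cover a given number of rows. It is an assumption-laden model: `Growth`, `NextBatchSize` and `LockRequestsFor` are hypothetical, and it counts rows rather than the key ranges a real iterator would lock.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical growth policies for the batch size.
enum class Growth { kLinear, kExponential };

// Size of the next batch to lock, given the previous one. The result never
// exceeds `cap`, and the comparisons are written to avoid integer overflow.
std::uint64_t NextBatchSize(std::uint64_t prev, std::uint64_t step,
                            Growth growth, std::uint64_t cap) {
  assert(prev > 0 && prev <= cap);
  if (growth == Growth::kLinear) {
    return step >= cap - prev ? cap : prev + step;
  }
  return prev > cap / 2 ? cap : prev * 2;
}

// Number of lock requests needed to cover `rows` rows when the first batch
// has `first` rows and later batches grow according to `growth`.
std::uint64_t LockRequestsFor(std::uint64_t rows, std::uint64_t first,
                              std::uint64_t step, Growth growth,
                              std::uint64_t cap) {
  assert(first > 0 && cap > 0);
  std::uint64_t requests = 0;
  std::uint64_t covered = 0;
  std::uint64_t batch = std::min(first, cap);
  while (covered < rows) {
    covered += batch;
    ++requests;
    batch = NextBatchSize(batch, step, growth, cap);
  }
  return requests;
}
```

With a first batch of 100 rows, doubling covers 3000 rows in 5 requests and locks 3100 rows in total, while linear growth with a step of 100 needs 8 requests and locks 3600. Either way the overlocking stays bounded by the last batch instead of extending to the whole table.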

@dutow dutow marked this pull request as draft March 22, 2022 08:56
bool m_valid;

ulonglong *m_lock_count;
ulonglong m_self_lock_count;

This variable needs to be initialized to 0. Otherwise, it causes random failures when m_self_lock_count > max_lock_count.
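
One way to address this, sketched under assumptions, is a default member initializer so the counter starts at 0 regardless of which constructor runs. Only the member names `m_valid`, `m_lock_count` and `m_self_lock_count` and the `max_lock_count` comparison come from the diff and the comment above; the struct and `OverThreshold` are hypothetical, and `std::uint64_t` stands in for `ulonglong`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the iterator's state, not the real class.
struct LockingIteratorState {
  bool m_valid = false;

  std::uint64_t *m_lock_count = nullptr;
  // Initialized explicitly: reading an uninitialized counter is undefined
  // behaviour and can make the threshold check fire at random.
  std::uint64_t m_self_lock_count = 0;

  // The comparison from the review comment, with the limit passed in.
  bool OverThreshold(std::uint64_t max_lock_count) const {
    return m_self_lock_count > max_lock_count;
  }
};
```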

Author

You are right. I missed that when I extracted this patch from a larger changeset, and afterwards I only tested it in a debug build. I'll fix it.

dutow commented Apr 7, 2022

@hermanlee what are your thoughts on the two possible improvements I mentioned earlier?

(I also tested it with TokuDB, and that seems to work just like this patch: if there's some more complex query based on a secondary key, it simply locks the entire table)

@dutow dutow force-pushed the improve_locking_iter branch from 25ea94a to 2368e93 Compare April 18, 2022 06:44