Improve locking iterator performance #1

dutow · 2022-03-22T08:56:28Z

Issue: the locking iterator is several times slower than the default
iterator in queries based on secondary keys.

For example:

select COUNT(*) from sbtest1 where k % 10 = 1 for update;

This query ends up iterating over the entire table and eventually locks
all of it, but it does so by taking many small locks and doing a seek
for each one.

This diff shows that simply locking the entire table instead improves
the performance of the iterator by 50%. It is still slower than the
default implementation, but now it's much closer to it.
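To make the trade-off concrete, here is a minimal sketch of the idea as described above: take per-row locks up to a threshold, then replace all further row locks with one coarse lock. It is not the code from the diff. `LockStats`, `EscalatingScan` and `simulate_scan` are hypothetical names, and the "lock manager" only counts requests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the lock manager: it only counts requests.
struct LockStats {
  std::uint64_t row_locks = 0;    // small per-row lock requests
  std::uint64_t table_locks = 0;  // single "lock everything" requests
};

// Hypothetical scan state. max_row_locks plays the role of the hard-coded
// threshold mentioned in the description.
class EscalatingScan {
 public:
  explicit EscalatingScan(std::uint64_t max_row_locks)
      : m_max_row_locks(max_row_locks) {}

  // Called once per row the iterator visits.
  void visit_row(LockStats &stats) {
    if (m_escalated) return;  // everything is already locked
    if (m_self_lock_count >= m_max_row_locks) {
      ++stats.table_locks;  // one coarse lock replaces all further row locks
      m_escalated = true;
      return;
    }
    ++stats.row_locks;
    ++m_self_lock_count;
  }

 private:
  std::uint64_t m_max_row_locks;
  std::uint64_t m_self_lock_count = 0;
  bool m_escalated = false;
};

// Simulates a scan over `rows` rows and returns the lock requests it made.
LockStats simulate_scan(std::uint64_t rows, std::uint64_t max_row_locks) {
  LockStats stats;
  EscalatingScan scan(max_row_locks);
  for (std::uint64_t i = 0; i < rows; ++i) scan.visit_row(stats);
  return stats;
}
```

In this model a scan shorter than the threshold takes only row locks, while a longer scan takes `max_row_locks` row locks plus a single coarse lock, which is where the saved seeks would come from.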

This modification of course results in overlocking in queries such as

select COUNT(*) from sbtest1 where k % 10 = 1 for update limit 3000;

where instead of locking only a few thousand records, this change locks
the entire table.

In its current form this change is a proof of concept. For a real patch
I think we could go in two directions:

* instead of a hard-coded value, make this an adjustable session
  variable
* or instead of simply stopping after a threshold, continue locking in
  linear or exponential batches
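The second direction could look roughly like the sketch below, which only plans batch sizes and does not touch any locking code. `BatchPolicy` and `plan_batches` are hypothetical names, the knobs are assumptions, and "linear" is read here as batches that grow by a fixed increment. There is no overflow handling, so it is only meant for modest values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical policy knobs; none of these names exist in the patch.
struct BatchPolicy {
  std::uint64_t first_batch;  // rows covered by the first range lock (> 0)
  std::uint64_t factor;       // multiplicative growth (>= 1), 2 means doubling
  std::uint64_t increment;    // additive growth, used for linear batches
  std::uint64_t max_batch;    // upper bound on a single batch (> 0)
};

// Returns the sizes of the successive range locks needed to cover at least
// `rows_needed` rows under the given policy.
std::vector<std::uint64_t> plan_batches(std::uint64_t rows_needed,
                                        const BatchPolicy &p) {
  assert(p.first_batch > 0 && p.factor >= 1 && p.max_batch > 0);
  std::vector<std::uint64_t> batches;
  std::uint64_t batch = std::min(p.first_batch, p.max_batch);
  std::uint64_t covered = 0;
  while (covered < rows_needed) {
    batches.push_back(batch);
    covered += batch;
    // Grow the next batch, but never beyond the configured maximum.
    batch = std::min(batch * p.factor + p.increment, p.max_batch);
  }
  return batches;
}
```

With a first batch of 100 rows and doubling, a scan that has to cover 3000 rows would take five range locks (100, 200, 400, 800 and 1600 rows) and overlock by 100 rows rather than by the rest of the table.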

hermanlee · 2022-04-06T22:57:24Z

storage/rocksdb/rdb_locking_iter.h

  bool  m_valid;
  ulonglong *m_lock_count;
+  ulonglong m_self_lock_count;

This variable needs to be initialized to 0. Otherwise, it causes random failures when m_self_lock_count > max_lock_count.

You are right. I missed that when I extracted this patch from a larger changeset, and after that I only tested it in a debug build. I'll fix it.
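A minimal sketch of what the fix could look like, assuming default member initializers are acceptable in this header. `LockingCounters` is a hypothetical, reduced stand-in for the real iterator class, and the `ulonglong` alias replaces the MySQL typedef so the snippet compiles on its own.

```cpp
#include <cassert>

using ulonglong = unsigned long long;  // stand-in for the MySQL typedef

// Hypothetical, reduced model of the counters in rdb_locking_iter.h.
class LockingCounters {
 public:
  explicit LockingCounters(ulonglong *shared_count)
      : m_lock_count(shared_count) {}

  // Records one lock in both the per-iterator and the shared counter.
  void note_lock() {
    ++m_self_lock_count;
    if (m_lock_count != nullptr) ++*m_lock_count;
  }

  // The comparison from the review comment. It is only meaningful if
  // m_self_lock_count starts at a known value.
  bool over_limit(ulonglong max_lock_count) const {
    return m_self_lock_count > max_lock_count;
  }

 private:
  bool m_valid = false;
  ulonglong *m_lock_count = nullptr;
  ulonglong m_self_lock_count = 0;  // explicit zero, as requested in review
};
```

Without the initializer the member has an indeterminate value, so the comparison against `max_lock_count` could be true before a single lock was taken.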

dutow · 2022-04-07T13:55:55Z

@hermanlee what are your thoughts on the two possible improvements I mentioned earlier?

(I also tested it with TokuDB, and that seems to work just like this patch: if there's some more complex query based on a secondary key, it simply locks the entire table)

dutow marked this pull request as draft March 22, 2022 08:56

hermanlee reviewed Apr 6, 2022

dutow force-pushed the improve_locking_iter branch from 25ea94a to 2368e93 Compare April 18, 2022 06:44
