Skip to content

Conversation

@yumkam
Copy link
Collaborator

@yumkam yumkam commented Nov 7, 2025

There can be some spurious/double wakeup, but this should not matter much

Changelog entry

...

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

...

There can be some spurious/double wakeup, but this should not matter much
@github-actions
Copy link

github-actions bot commented Nov 7, 2025

🟢 2025-11-07 12:28:49 UTC The validation of the Pull Request description is successful.

@yumkam yumkam marked this pull request as ready for review November 7, 2025 12:28
@yumkam yumkam requested a review from a team as a code owner November 7, 2025 12:28
Copilot AI review requested due to automatic review settings November 7, 2025 12:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the HTTP gateway implementation to improve resource management and implement a wake-up mechanism for efficient buffer handling. The key change is converting the CURL multi-handle from a raw pointer to a std::shared_ptr to enable proper lifetime tracking.

Key Changes:

  • Convert CURLM* handle to std::shared_ptr<CURLM> with custom deleter for automatic cleanup
  • Add handle and threshold parameters to TCountedContent to trigger wake-ups when buffered data falls below threshold
  • Implement automatic wake-up logic in TCountedContent destructor and Extract() method when buffer size crosses threshold

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
ydb/library/yql/providers/common/http_gateway/yql_http_gateway.h Updated TCountedContent constructor signature and added Handle and Threshold member variables
ydb/library/yql/providers/common/http_gateway/yql_http_gateway.cpp Converted CURL multi-handle to shared_ptr, updated all handle usage sites to call .get(), implemented wake-up logic in TCountedContent, and added busy-wait synchronization in cleanup

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
auto weakHandle = std::weak_ptr<CURLM>(Handle);
Handle.reset();
while (!weakHandle.expired()) {} // short busy-wait in case of TCountedContent
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This busy-wait loop is problematic. It will consume 100% CPU while waiting for all TCountedContent instances to be destroyed. Consider using a more efficient synchronization mechanism like a condition variable, or at least add a small sleep/yield inside the loop to reduce CPU usage.

auto weakHandle = std::weak_ptr<CURLM>(Handle);
Handle.reset();
std::this_thread::yield();
while (!weakHandle.expired()) {
    std::this_thread::sleep_for(std::chrono::microseconds(100));
}
Suggested change
while (!weakHandle.expired()) {} // short busy-wait in case of TCountedContent
while (!weakHandle.expired()) {
std::this_thread::sleep_for(std::chrono::microseconds(100));
}

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, yes, busy-loops are bad; however, in this case it is expected to happen only on destruction (already rare!), only in cases when there are concurrently executed move-out or destruction from TCountedContent, and only for very short period (when curl_multi_wakeup is executed)
Hence, comparing required expenses and complications from better synchronization primitives, the choice is pretty obvious.

Comment on lines +1072 to +1076
if (oldSize >= Threshold && oldSize - size() < Threshold) {
if (auto handle = Handle.lock()) {
curl_multi_wakeup(handle.get());
}
}
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The threshold check logic has a potential race condition. Between checking oldSize >= Threshold and evaluating oldSize - size() < Threshold, another thread could modify the Counter, making the condition incorrect. This is compounded by the fact that oldSize is the value after the subtraction (fetch_sub returns the previous value), so oldSize - size() is actually the new value after this operation.

However, there's a more fundamental issue: if Threshold is 0 (the default), the condition oldSize >= Threshold && oldSize - size() < Threshold would trigger on every destruction where size() > 0, potentially causing excessive wake-up calls. Consider adding a check: if (Threshold > 0 && oldSize >= Threshold && oldSize - size() < Threshold)

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counter is sum of all size(), hence both should not be issue.
Anyway, rare spurious (or even missed) wake-up should not be issue.

Comment on lines 1084 to 1088
if (oldSize >= Threshold && oldSize - size() < Threshold) {
if (auto handle = Handle.lock()) {
curl_multi_wakeup(handle.get());
}
}
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as in the destructor: if Threshold is 0 (the default), the condition oldSize >= Threshold && oldSize - size() < Threshold would trigger on every extraction where size() > 0, potentially causing excessive wake-up calls. Consider adding a check: if (Threshold > 0 && oldSize >= Threshold && oldSize - size() < Threshold)

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

github-actions bot commented Nov 7, 2025

2025-11-07 12:32:03 UTC Pre-commit check linux-x86_64-relwithdebinfo for 07a90f5 has started.
2025-11-07 12:32:19 UTC Artifacts will be uploaded here
2025-11-07 12:34:36 UTC ya make is running...
2025-11-07 13:25:44 UTC Check cancelled

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

2025-11-07 12:32:10 UTC Pre-commit check linux-x86_64-release-asan for 07a90f5 has started.
2025-11-07 12:32:26 UTC Artifacts will be uploaded here
2025-11-07 12:34:40 UTC ya make is running...
2025-11-07 13:25:43 UTC Check cancelled

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

2025-11-07 13:29:32 UTC Pre-commit check linux-x86_64-relwithdebinfo for 2792ab0 has started.
2025-11-07 13:29:50 UTC Artifacts will be uploaded here
2025-11-07 13:32:07 UTC ya make is running...
🔴 2025-11-07 13:34:07 UTC Build failed, see the logs. Also see fail summary

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

2025-11-07 13:29:54 UTC Pre-commit check linux-x86_64-release-asan for 2792ab0 has started.
2025-11-07 13:30:12 UTC Artifacts will be uploaded here
2025-11-07 13:32:32 UTC ya make is running...
🔴 2025-11-07 13:34:41 UTC Build failed, see the logs. Also see fail summary

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

2025-11-07 13:45:10 UTC Pre-commit check linux-x86_64-release-asan for f055b54 has started.
2025-11-07 13:47:19 UTC Artifacts will be uploaded here
2025-11-07 13:49:31 UTC ya make is running...
🟡 2025-11-07 16:01:35 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
17596 17072 0 211 281 32

🟢 2025-11-07 16:01:49 UTC Build successful.
🟢 2025-11-07 16:02:16 UTC ydbd size 3.8 GiB changed* by +11.5 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 4579632 merge: f055b54 diff diff %
ydbd size 4 072 955 080 Bytes 4 072 966 832 Bytes +11.5 KiB +0.000%
ydbd stripped size 1 511 824 008 Bytes 1 511 829 928 Bytes +5.8 KiB +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

2025-11-07 13:45:22 UTC Pre-commit check linux-x86_64-relwithdebinfo for f055b54 has started.
2025-11-07 13:46:57 UTC Artifacts will be uploaded here
2025-11-07 13:49:09 UTC ya make is running...
🟡 2025-11-07 15:31:39 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
41208 38377 0 3 2801 27

2025-11-07 15:32:00 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-11-07 15:41:55 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
73 (only retried tests) 58 0 0 0 15

🟢 2025-11-07 15:42:05 UTC Build successful.
🟢 2025-11-07 15:42:26 UTC ydbd size 2.3 GiB changed* by +6.3 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 4579632 merge: f055b54 diff diff %
ydbd size 2 431 787 616 Bytes 2 431 794 040 Bytes +6.3 KiB +0.000%
ydbd stripped size 516 526 256 Bytes 516 527 344 Bytes +1.1 KiB +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@yumkam yumkam requested a review from GrigoriyPA November 7, 2025 15:27
auto weakHandle = std::weak_ptr<CURLM>(Handle);
Handle.reset();
while (!weakHandle.expired()) { // short busy-wait in unlikely case of collision with TCountedContent
Sleep(TDuration::MicroSeconds(1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

путь к зависаниям. давай таймаут сделаем

Copy link
Collaborator Author

@yumkam yumkam Nov 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Последствия досрочного выхода из этого цикла хуже, чем бесконечное ожидание. И тут никогда не может быть таймаут, обработка таймаута создаст ложное впечатление, что он возможен

@yumkam yumkam requested a review from uzhastik November 8, 2025 10:29
@github-actions
Copy link

github-actions bot commented Nov 10, 2025

2025-11-10 18:24:48 UTC Pre-commit check linux-x86_64-relwithdebinfo for df68b7b has started.
2025-11-10 18:25:07 UTC Artifacts will be uploaded here
2025-11-10 18:27:13 UTC ya make is running...
2025-11-10 18:29:50 UTC Check cancelled

@github-actions
Copy link

github-actions bot commented Nov 10, 2025

2025-11-10 18:25:00 UTC Pre-commit check linux-x86_64-release-asan for df68b7b has started.
2025-11-10 18:25:16 UTC Artifacts will be uploaded here
2025-11-10 18:27:21 UTC ya make is running...
2025-11-10 18:29:51 UTC Check cancelled

@github-actions
Copy link

github-actions bot commented Nov 10, 2025

2025-11-10 18:31:46 UTC Pre-commit check linux-x86_64-release-asan for 9db575c has started.
2025-11-10 18:32:12 UTC Artifacts will be uploaded here
2025-11-10 18:35:11 UTC ya make is running...
2025-11-10 19:09:48 UTC Check cancelled

@yumkam yumkam requested a review from Copilot November 10, 2025 18:32
@github-actions
Copy link

github-actions bot commented Nov 10, 2025

2025-11-10 18:32:29 UTC Pre-commit check linux-x86_64-relwithdebinfo for 9db575c has started.
2025-11-10 18:32:47 UTC Artifacts will be uploaded here
2025-11-10 18:34:52 UTC ya make is running...
2025-11-10 19:09:48 UTC Check cancelled

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +704 to 705
// FIXME: NOT SAFE (see man libcurl(3))
const CURLcode globalInitResult = curl_global_init(CURL_GLOBAL_ALL);
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FIXME comment indicates a known safety issue with curl_global_init but doesn't explain what the problem is or how it should be fixed. Consider documenting that curl_global_init is not thread-safe and should be called only once before any threads are started, or reference the specific safety concerns from the libcurl documentation.

Copilot uses AI. Check for mistakes.
if (multiCleanupResult != CURLM_OK) {
Cerr << "curl_multi_cleanup error " << int(multiCleanupResult) << ": " << curl_multi_strerror(multiCleanupResult) << Endl;
}
curl_global_cleanup(); // FIXME: NOT SAFE (see man libcurl(3))
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FIXME comment indicates a known safety issue with curl_global_cleanup but doesn't explain what the problem is. Consider documenting that curl_global_cleanup must be called only after all libcurl operations have completed and all handles are destroyed, and it's not thread-safe. This is particularly important since the cleanup is now in a shared_ptr deleter.

Suggested change
curl_global_cleanup(); // FIXME: NOT SAFE (see man libcurl(3))
// WARNING: curl_global_cleanup() must only be called after all libcurl operations have completed
// and all easy and multi handles have been destroyed. This function is not thread-safe and must not
// be called concurrently with any other libcurl function. Since this cleanup is performed in the
// shared_ptr deleter, it is critical to ensure that no other libcurl handles are in use and no other
// threads are using libcurl when this is called. See man libcurl(3) and the libcurl documentation for details.
curl_global_cleanup();

Copilot uses AI. Check for mistakes.
Comment on lines +1070 to +1071
auto oldSize = Counter->fetch_sub(size());
if (oldSize >= Threshold && oldSize - size() < Threshold) {
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a potential race condition here. Between fetch_sub and checking the threshold, another thread could also call fetch_sub, causing multiple wakeup calls even though only one thread actually crossed the threshold. While extra wakeups are generally harmless, consider if this could cause performance issues under high concurrency. If so, consider using compare-and-exchange to ensure only one thread triggers the wakeup when crossing the threshold.

Suggested change
auto oldSize = Counter->fetch_sub(size());
if (oldSize >= Threshold && oldSize - size() < Threshold) {
size_t expected = Counter->load();
bool triggered = false;
while (true) {
if (expected < Threshold || expected - size() >= Threshold) {
// No threshold crossing, just subtract
if (Counter->compare_exchange_weak(expected, expected - size())) {
break;
}
// compare_exchange_weak updates expected, so loop
} else {
// Threshold crossing, only one thread should trigger
if (Counter->compare_exchange_weak(expected, expected - size())) {
triggered = true;
break;
}
// compare_exchange_weak updates expected, so loop
}
}
if (triggered) {

Copilot uses AI. Check for mistakes.
@iddqdex iddqdex added the rebase-and-check Rebase PR with the current base branch and check label Nov 10, 2025
@github-actions github-actions bot removed the rebase-and-check Rebase PR with the current base branch and check label Nov 10, 2025
@github-actions
Copy link

github-actions bot commented Nov 10, 2025

2025-11-10 19:11:39 UTC Pre-commit check linux-x86_64-relwithdebinfo for a5150b1 has started.
2025-11-10 19:11:57 UTC Artifacts will be uploaded here
2025-11-10 19:14:12 UTC ya make is running...
🟡 2025-11-10 20:49:15 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
41277 38455 0 3 2792 27

2025-11-10 20:49:30 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-11-10 20:58:04 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
54 (only retried tests) 41 0 0 0 13

🟢 2025-11-10 20:58:12 UTC Build successful.
🟢 2025-11-10 20:58:33 UTC ydbd size 2.3 GiB changed* by +9.9 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 3b1cf4e merge: a5150b1 diff diff %
ydbd size 2 437 459 680 Bytes 2 437 469 824 Bytes +9.9 KiB +0.000%
ydbd stripped size 518 602 832 Bytes 518 607 888 Bytes +4.9 KiB +0.001%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions
Copy link

github-actions bot commented Nov 10, 2025

2025-11-10 19:11:45 UTC Pre-commit check linux-x86_64-release-asan for a5150b1 has started.
2025-11-10 19:12:02 UTC Artifacts will be uploaded here
2025-11-10 19:14:15 UTC ya make is running...
🟡 2025-11-10 21:19:03 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
17651 17168 0 138 315 30

🟢 2025-11-10 21:19:12 UTC Build successful.
🟢 2025-11-10 21:19:38 UTC ydbd size 3.8 GiB changed* by +10.7 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 3b1cf4e merge: a5150b1 diff diff %
ydbd size 4 080 530 776 Bytes 4 080 541 744 Bytes +10.7 KiB +0.000%
ydbd stripped size 1 514 729 960 Bytes 1 514 735 496 Bytes +5.4 KiB +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@yumkam yumkam merged commit c734e91 into ydb-platform:main Nov 11, 2025
18 of 21 checks passed
yumkam added a commit to yumkam/ydb that referenced this pull request Nov 11, 2025
yumkam added a commit to yumkam/ydb that referenced this pull request Nov 11, 2025
yumkam added a commit that referenced this pull request Nov 12, 2025
…l when we released inflight buffers (backport q-25-1 #28390) (#28660)
yumkam added a commit that referenced this pull request Nov 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants