-
Notifications
You must be signed in to change notification settings - Fork 5
Remove (most) ifdefs that disable features for AdaptiveCpp #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
nilsfriess
wants to merge
2
commits into
uxlfoundation:main
Choose a base branch
from
nilsfriess:fix-acpp
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,5 @@ | ||
| #ifndef TEMP_MEMORY_POOL_HPP | ||
| #define TEMP_MEMORY_POOL_HPP | ||
| #ifndef __ADAPTIVECPP__ | ||
| #include "helper.h" | ||
|
|
||
| namespace blas { | ||
|
|
@@ -54,7 +53,11 @@ typename Temp_Mem_Pool::event_t Temp_Mem_Pool::release_buff_mem( | |
| const container_t& mem) { | ||
| return {q_.submit([&](sycl::handler& cgh) { | ||
| cgh.depends_on(dependencies); | ||
| #ifndef __ADAPTIVECPP__ | ||
| cgh.host_task([&, mem]() { release_buff_mem_(mem); }); | ||
| #else | ||
| cgh.AdaptiveCpp_enqueue_custom_operation([&, mem](sycl::interop_handle &) { release_buff_mem_(mem); }); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps same issue here? |
||
| #endif | ||
| })}; | ||
| } | ||
|
|
||
|
|
@@ -110,10 +113,13 @@ typename Temp_Mem_Pool::event_t Temp_Mem_Pool::release_usm_mem( | |
| const container_t& mem) { | ||
| return {q_.submit([&](sycl::handler& cgh) { | ||
| cgh.depends_on(dependencies); | ||
| #ifndef __ADAPTIVECPP__ | ||
| cgh.host_task([&, mem]() { release_usm_mem_(mem); }); | ||
| #else | ||
| cgh.AdaptiveCpp_enqueue_custom_operation([&, mem](sycl::interop_handle &) { release_usm_mem_(mem); }); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And here? |
||
| #endif | ||
| })}; | ||
| } | ||
| } | ||
| #endif // SB_ENABLE_USM | ||
| #endif // __ADAPTIVECPP__ | ||
| #endif | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct to do a free here inside a custom operation? Custom operations will be evaluated at submission time, which is a fundamental difference from host task (See https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/enqueue-custom-operation.md). So this looks to me like there might be a hazard of deleting memory while the device might still be using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you're correct. I am not sure how to fix this though, do you have a suggestion?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pseudo code, maybe something like this could work (and, if the C++ standard library uses a thread pool, should not be slower than
host_task):Not sure if we need the in-order branch. If we reliably get dependency lists even for in-order queues we might not need it.
The real issue here is that SYCL does not have the asynchronous free mechanisms that e.g. CUDA has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is a workaround for an asynchronous free. The suggested solutions makes sense to me. I suggest to make a helper function for it since the pattern is repeated a few times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be some complexity here since it's possible we need to wait on the returned
std::futurethat is returned bystd::asyncat some point to get things working.@Rbiessy I would recommend to also reconsider the current workaround for the DPC++ side. According to the current SYCL 2020 specification,
(https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:interfaces.hosttasks.overview)
The
sycl::contextclass, similarly tosycl::queueorsycl::device, follows common reference semantics, and thus cannot be used inside thehost_task. So, the current code is undefined behavior and might break at any time.The background here is that many backends impose the restriction that enqueued callbacks may not make calls into the backend runtime API or risk deadlocks. For example, in CUDA,
cudaStreamAddCallbackrequires that the registered callback may not perform any calls into the CUDA runtime, otherwise, deadlocks or other forms of UB might appear.The SYCL restriction that common-reference-semantics objects are not allowed in host tasks enforces this limitation.
Unfortunately, because of limitations like this, the current
host_taskis far less useful than it might appear at first glance, which is why AdaptiveCpp never implemented it. It's full of potential performance or correctness issues unfortunately :(I'm not sure what the best approach here is. I suspect that designing a new approach might require a better understanding of what the goal here is with the async free emulation than I currently have.
One solution that I've considered would be to just add asynchronous free to SYCL implementations as an extension, but that is also not trivial, since the OpenCL USM extensions don't support this functionality. Also, it would mean that the code would rely on specific extensions, and even if both AdaptiveCpp and DPC++ implemented them, the code would no longer be based on standard SYCL and it wouldn't work anymore with other compilers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I am aware it is technically UB according to SYCL spec. In practice we agreed that it should work fine with DPC++ and likely other SYCL implementations that would support
host_task.I was just reminded that there is already an extension for async alloc and free, see https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memory_alloc.asciidoc.
DPC++ already supports that so if we want to change the DPC++ path I would strongly suggest to use this. We don't have enough bandwidth to work on this at Codeplay but we could help with the review. It may be a good opportunity to update it if AdaptiveCpp were to support that extension?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me why that should be the case. I can see that for specific implementation choices of
host_task(e.g. using a SYCL-managed worker thread instead of a backend callback) it might be fine. But that is an implementation detail and not guaranteed.hmm... We already have our own memory pool interface which is different from the one in that extension. The real new feature I suppose would be a mechanism to extend lifetime of an object until a specific kernel has finished executing. Then we could use RAII to return a memory object to the pool.
I wonder how DPC++ implements the async free on OpenCL, given that the Intel OpenCL USM extension doesn't have it.