This repository was archived by the owner on Apr 28, 2023. It is now read-only.
register promotion: insert syncs if scoped above thread mapping
Promotion to registers is performed in a particular scope. Within this
scope, if a promotion is deemed valid, tensor elements are only accessed
by the same thread, making thread synchronization unnecessary. However,
outside this scope, promoted elements may be accessed by different
threads which would require synchronization. If different threads (or,
in practice, different iterations of thread-mapped loops) access the
same element in a way that requires synchronization, this is reflected
in the dependence relation. The OuterBlockInnerThread mapping strategy
detects it and introduces synchronization statements above thread
mapping. It previously only promoted to registers below thread mapping.
Therefore, synchronization was unnecessary around copies to or from
registers.
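The argument above can be illustrated with a small model. This is only a
sketch under stated assumptions, not code from this repository: a Python
list stands in for global memory, a local variable stands in for a
register, and the names thread_work and run_in_order are hypothetical.
Each thread touches only its own element, so every execution order of the
threads produces the same result and no synchronization is needed.

```python
from itertools import permutations

def thread_work(tid, mem, n_iters):
    # Hypothetical per-thread body with promotion below thread mapping.
    reg = mem[tid]                    # copy to register
    for t in range(n_iters):
        reg = 10 * reg + (t + 1)      # computation touches only the register
    mem[tid] = reg                    # write-back by the same thread

def run_in_order(order, n_iters=3):
    # Model one possible schedule: each thread runs to completion in turn.
    mem = [0] * len(order)
    for tid in order:
        thread_work(tid, mem, n_iters)
    return mem

# Collect the final memory contents over all thread execution orders.
results = {tuple(run_in_order(order)) for order in permutations(range(3))}
```

The set results contains a single entry because the order of the threads
does not matter when each element is private to one thread.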
PR #489 introduced functionality to promote to registers at any scope,
including above thread mapping. In this case, synchronizations inserted
below may not suffice. For example, in the tree of the shape
  band(                        // contains a sequential loop
    // <- promotion scope
    extension(
      sequence(
        filter(                // main computation
          mapping(...)),       // to threads
        filter(...)            // synchronization
different iterations of the outer sequential loop may lead to different
threads accessing the same tensor element (but in one iteration, only
one thread accesses it). Copies from global memory to registers will be
inserted at the promotion scope, i.e. after the synchronization
statement. A write to global memory by one thread will not be
synchronized with a read from the same address by a potentially
different thread in the following iteration of the loop above the
scoping point. A synchronization must be inserted either before the
read from global memory or after the write-back. In this particular
case, one may want to insert the write-back before the existing
synchronization, but this is not always possible in the general case, where
the scoping point may not be immediately above the thread mapping.
Furthermore, it may also be necessary to synchronize due to dependences
with sibling subtrees that have a different mapping.
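The access pattern described above can be made concrete with a small
enumeration. The access function element(tid, t) = (tid + t) % N_THREADS
is an assumption chosen for illustration, as are all names below; it is
not taken from the mapper. Within one iteration each element has exactly
one accessing thread, but that thread changes from one iteration of the
sequential loop to the next.

```python
N_THREADS = 3   # hypothetical block size
N_ITERS = 3     # hypothetical trip count of the outer sequential loop

def element(tid, t):
    # Assumed access function: element touched by thread `tid` in iteration `t`.
    return (tid + t) % N_THREADS

def accessors(elem):
    # For each iteration, find the single thread that accesses `elem`.
    return [next(tid for tid in range(N_THREADS) if element(tid, t) == elem)
            for t in range(N_ITERS)]

# Maps each element to the list of accessing threads, indexed by iteration.
access_table = {e: accessors(e) for e in range(N_THREADS)}
```

For element 0 the table lists threads 0, 2 and 1 for iterations 0, 1 and
2, so a value written back by thread 0 is read by thread 2 in the next
iteration.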
When register promotion copies are inserted above thread mapping,
introduce thread synchronizations before the copy to register and after
the copy from register. This is a conservative approximation. An exact
analysis would require examining dependences between an instance of the
scope and the rest of the elements and is left for future work.
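The effect of the two synchronizations can be sketched with a toy
scheduler. Everything here is an assumption made for illustration:
threads are modeled as Python generators, a yield stands in for a block
barrier, the access function (tid + t) % n_threads is hypothetical, and
the scheduler picks one legal but adversarial schedule in which a thread
runs until its next barrier before the next thread starts.

```python
def thread_body(tid, n_threads, n_iters, mem, sync):
    for t in range(n_iters):             # outer sequential loop
        idx = (tid + t) % n_threads      # assumed access function
        if sync:
            yield                        # barrier before the copy to register
        reg = mem[idx]                   # copy from global memory to register
        reg = 10 * reg + (t + 1)         # main computation, order-dependent
        mem[idx] = reg                   # copy from register to global memory
        if sync:
            yield                        # barrier after the copy from register

def run(n_threads=3, n_iters=3, sync=True):
    mem = [0] * n_threads
    live = [thread_body(tid, n_threads, n_iters, mem, sync)
            for tid in range(n_threads)]
    while live:
        still_running = []
        for th in live:                  # advance each thread to its next barrier
            try:
                next(th)
                still_running.append(th)
            except StopIteration:        # thread finished
                pass
        live = still_running
    return mem

with_sync = run(sync=True)
without_sync = run(sync=False)
```

With the barriers, every element ends up as 123, which is the value
obtained by applying the three iterations in order. Without them, this
schedule yields [132, 213, 321], because a thread reads an element before
the thread responsible for the previous iteration has written it back.
The two back-to-back barriers between iterations reflect the
conservative nature of the approach.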