
Commit 080e564

Alfie-Edwards authored and Frederik Mellbye committed
Making InterIpuCopy non-allocating
Summary:
Making InterIpuCopy allocating can make global exchange lowering take significantly longer, because it can produce global exchanges whose source and destination layouts differ. For the related regression ticket, the difference observed locally was 108 seconds vs 12 seconds.

Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh

Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh

Maniphest Tasks: T65905

Differential Revision: https://phabricator.sourcevertex.net/D74105
1 parent 8e24843 commit 080e564

2 files changed, 3 additions and 3 deletions


tensorflow/compiler/plugin/poplar/driver/passes/allocation_analysis.cc

Lines changed: 2 additions & 2 deletions
@@ -195,8 +195,8 @@ Status FindStartPoints(AllocationLoopState& state, HloModule* module) {
 }

 // Only some inputs affect tile mapping of output, for this function
-// only add the user if next.instruction's tile mapping is related to
-// it's output
+// only add the user if the next instruction's tile mapping is related to
+// its output.
 Status AddIfMappingDependsOnOperand(std::vector<IndexedLocation>& to_visit,
                                     HloInstruction* user,
                                     const IndexedLocation& operand,

tensorflow/compiler/plugin/poplar/driver/tools/custom_ops/inter_ipu_copy.cc

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ absl::flat_hash_set<int64_t> HloInterIpuCopy::AllocatingIndices() const {
   return {};
 }

-bool HloInterIpuCopy::AllocatingOutput() const { return true; }
+bool HloInterIpuCopy::AllocatingOutput() const { return false; }

 absl::flat_hash_map<int64_t, int64_t> HloInterIpuCopy::LayoutDependencies()
     const {
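
For readers unfamiliar with the flag, the toy model below sketches why flipping `AllocatingOutput()` to `false` could avoid mismatched layouts across a copy. It is not the plugin's allocation analysis: `ToyInstruction`, `AssignLayouts` and `CopyLayoutsMatch` are hypothetical names, and the rule that a non-allocating instruction reuses its operand's layout is an assumption made for illustration only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an HLO instruction; NOT the real HloInstruction API.
struct ToyInstruction {
  std::string name;
  bool allocating_output;  // loosely mirrors the idea of AllocatingOutput()
  int operand;             // index of the single operand, or -1 for none
};

// Assigns a layout id to every instruction, in order. Assumed rule: an
// allocating instruction (or one with no operand) gets a fresh layout id,
// and any other instruction reuses its operand's layout id.
std::vector<int> AssignLayouts(const std::vector<ToyInstruction>& insts) {
  std::vector<int> layout(insts.size(), -1);
  int next_layout = 0;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    const ToyInstruction& inst = insts[i];
    if (inst.allocating_output || inst.operand < 0) {
      layout[i] = next_layout++;
    } else {
      layout[i] = layout[inst.operand];
    }
  }
  return layout;
}

// Returns true when the copy's source and destination share a layout id.
bool CopyLayoutsMatch(bool copy_allocates) {
  std::vector<ToyInstruction> insts = {
      {"parameter", true, -1},
      {"inter_ipu_copy", copy_allocates, 0},
  };
  std::vector<int> layout = AssignLayouts(insts);
  return layout[0] == layout[1];
}
```

In this model, `CopyLayoutsMatch(false)` reports matching source and destination layouts, while `CopyLayoutsMatch(true)` reports a mismatch, which is the situation the summary associates with slower global exchange lowering.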
