From c3a5823fa604f9f47bb55871a2820e307d1f6d09 Mon Sep 17 00:00:00 2001 From: Qiang Zhang Date: Tue, 11 Nov 2025 06:44:19 -0800 Subject: [PATCH] Fix NaN handling in AUPRC metric calculation (#3523) Summary: Improved the NaN handling logic in the AUPRC (Area Under Precision-Recall Curve) metric calculation to correctly handle edge cases where division by zero occurs. The changes address NaN values that arise from 0.0/0.0 divisions in both recall and precision calculations: **Recall NaN handling:** - NaNs occur on the right side of the recall tensor due to cumsum starting from the left (num_fp) but being flipped - Changed NaN replacement value from 1.0 to 0.0 to match the 0.0 value appended on the right side - Removed the conditional check since we should always handle NaNs consistently **Precision NaN handling:** - Added explicit NaN handling for precision tensor (previously missing) - NaNs in precision occur on the right side similar to recall - Replace NaNs with 1.0, matching the 1.0 value appended on the right side, to prevent NaN propagation in _riemann_integral - This prevents the entire AUPRC result from becoming NaN when any precision element is NaN Added detailed comments explaining the root cause of NaN values and the rationale for the chosen replacement values. Reviewed By: iamzainhuda Differential Revision: D86464670 --- torchrec/metrics/auprc.py | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/torchrec/metrics/auprc.py b/torchrec/metrics/auprc.py index ed99417d2..8c90217da 100644 --- a/torchrec/metrics/auprc.py +++ b/torchrec/metrics/auprc.py @@ -65,9 +65,16 @@ def _compute_auprc_helper( precision = torch.cat([precision, precision.new_ones(1)]) recall = torch.cat([recall, recall.new_zeros(1)]) - # If recalls are NaNs, set NaNs to 1.0s. - if torch.isnan(recall[0]): - recall = torch.nan_to_num(recall, 1.0) + # nan happens with 0.0 / 0.0. 
For recall's case, this could happen from its right side: + # num_fp is a cumsum and thus 0.0 starts from its left side. But given recall has a flip, + # then those 0.0 goes to right side and thus nan. + # If recalls are NaNs, set NaNs to 0.0s, as append a 0.0 on its right side above. + recall = torch.nan_to_num(recall, 0.0) + + # similar as recall, precision's nan would happen from its right side. + # since we append 1.0 on its right side above, we replace nan by 1.0. + # If any element in precision is Nan, _riemann_integral will return NaN. + precision = torch.nan_to_num(precision, 1.0) auprc = _riemann_integral(recall, precision) return auprc