@Lhongpei Lhongpei commented Nov 17, 2025

Description

This PR introduces a significant performance optimization for PDLPx, particularly for problems with highly sparse constraint matrices.

When the rows of the constraint matrix (or of its transpose) are highly sparse, i.e., have very few non-zeros, launching a full cuSPARSE SpMV kernel for the primal or dual update can be inefficient because of kernel launch overhead and low computational density.

This change introduces two new, "fused" CUDA kernels:

  • fused_compute_next_pdhg_primal_solution_kernel
  • fused_compute_next_pdhg_dual_solution_kernel

These kernels perform the sparse matrix-vector multiplication (SpMV) using a simple for-loop (which is more efficient for highly sparse rows) and fuse it with the subsequent PDHG update logic (e.g., projection onto bounds, reflection). This approach avoids the overhead of separate kernel launches and improves data locality.
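
As a rough illustration of the idea, here is a CPU reference sketch in Python of a fused primal step: the inner loop plays the role of the SpMV over one row of A^T, and the update, projection, and reflection follow immediately in the same pass. The function name, the CSR argument names, the step size `tau`, and the objective vector `c` are assumptions made for this sketch, and the update formula is the textbook PDHG primal step; the actual CUDA kernel in this PR may differ in its scaling and step-size handling.

```python
def fused_primal_step_reference(at_row_ptr, at_col_idx, at_vals,
                                dual_solution, primal_solution,
                                c, var_lb, var_ub, tau):
    """Hypothetical CPU reference for a fused primal update.

    A^T is assumed to be stored in CSR form (at_row_ptr, at_col_idx, at_vals),
    so row j of A^T holds the non-zeros of column j of A.
    """
    n = len(primal_solution)
    next_primal = [0.0] * n
    reflected_primal = [0.0] * n
    for j in range(n):  # in a CUDA version, one thread would handle one j
        # SpMV part: (A^T @ dual_solution)[j] via a plain loop over the row
        dual_product = 0.0
        for k in range(at_row_ptr[j], at_row_ptr[j + 1]):
            dual_product += at_vals[k] * dual_solution[at_col_idx[k]]
        # Fused part: gradient step, projection onto [var_lb, var_ub], reflection
        stepped = primal_solution[j] - tau * (c[j] - dual_product)
        projected = min(max(stepped, var_lb[j]), var_ub[j])
        next_primal[j] = projected
        reflected_primal[j] = 2.0 * projected - primal_solution[j]
    return next_primal, reflected_primal
```

Because the dot product never leaves the loop body, no intermediate A^T y vector has to be written out and read back, which is the data-locality benefit described above.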

Implementation Details

  • Fused Primal Kernel: Computes the dual product (A^T @ dual_solution) and fuses it with the primal variable update, projection (against var_lb, var_ub), and reflection.
  • Fused Dual Kernel: Computes the primal product (A @ primal_solution) and fuses it with the dual variable update, projection (against const_lb, const_ub), and reflection.
  • Auto-Algorithm Selection: The fused kernel path is selected automatically for a primal or dual update when every row (or column) has fewer than 100 non-zeros and the density is below 0.01; both thresholds can be tuned further. For denser matrices, the existing cuSPARSE-based update is retained.
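
The selection rule can be sketched as a small predicate over a CSR row-pointer array. This is only an illustration: the function name and parameters are hypothetical, and it assumes that "density" means the total number of non-zeros divided by rows times columns, which the description does not state explicitly.

```python
def should_use_fused_update(row_ptr, num_cols,
                            max_row_nnz=100, max_density=0.01):
    """Hypothetical check mirroring the thresholds quoted above.

    row_ptr is the CSR row-pointer array of the matrix used for the update
    (A for the dual update, A^T for the primal update).
    """
    num_rows = len(row_ptr) - 1
    if num_rows <= 0 or num_cols <= 0:
        return False
    total_nnz = row_ptr[-1]
    # Largest number of non-zeros found in any single row
    widest_row = max(row_ptr[i + 1] - row_ptr[i] for i in range(num_rows))
    # Assumed definition: fraction of entries that are non-zero
    density = total_nnz / (num_rows * num_cols)
    return widest_row < max_row_nnz and density < max_density
```

Calling it once with the row pointers of A and once with those of A^T would allow the primal and dual updates to pick their paths independently.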

Performance Improvements

This fusion results in substantial performance gains, as demonstrated on Hans' Benchmark and the MIPLIB dataset.

Hans' Benchmark Examples

| Model | Iterations | Previous (cuSPARSE) | Fused Kernel | Speedup |
| --- | --- | --- | --- | --- |
| cont11 | 799200 | 31.23s | 7.68s | 4.07x |
| thk48 | 18000 | 18.79s | 13.21s | 1.42x |

MIPLIB Dataset Summary

Both methods were run for the same number of iterations on the MIPLIB dataset.
The auto-selection chose the fused update for 169 instances, and the fused update performed better on 166 of them.

| Metric | cuSPARSE-Based Update | Fused Update |
| --- | --- | --- |
| GEOMEAN | 0.369009528 | 0.200248621 |
| SGM10 | 2.34091312 | 1.69636033 |
| Better Count | 3 / 169 | 166 / 169 |
| Mean Relative Improvement | - | 32.47% |
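
For readers unfamiliar with the aggregate metrics, the sketch below shows how a geometric mean and a shifted geometric mean are commonly computed in LP benchmarking. It assumes that GEOMEAN is the plain geometric mean of per-instance solve times and that SGM10 is the shifted geometric mean with a shift of 10; the description does not define either metric, so the exact script behind the table may differ. The function names are hypothetical.

```python
import math


def geometric_mean(times):
    """Plain geometric mean of strictly positive values."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


def shifted_geometric_mean(times, shift=10.0):
    """Shifted geometric mean: add the shift, average in log space, subtract it."""
    mean_log = sum(math.log(t + shift) for t in times) / len(times)
    return math.exp(mean_log) - shift


# Sample input only: the two Hans' Benchmark timings quoted earlier,
# not the MIPLIB data behind the summary table.
cusparse_times = [31.23, 18.79]
fused_times = [7.68, 13.21]
print(geometric_mean(cusparse_times), geometric_mean(fused_times))
print(shifted_geometric_mean(cusparse_times), shifted_geometric_mean(fused_times))
```

The shift damps the influence of very short runs, so instances that finish in milliseconds do not dominate the aggregate.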
