Libraries/oneMKL/matrix_mul_mkl/README.md: 36 additions & 21 deletions
@@ -1,6 +1,6 @@
 # `Matrix Multiplication with oneMKL` Sample
 
-Matrix Multiplication with Intel® oneAPI Math Kernel Library (oneMKL) shows how to use the oneMKL optimized matrix multiplication routines.
+Matrix Multiplication with Intel® oneAPI Math Kernel Library (oneMKL) shows how to use the oneMKL optimized matrix multiplication routines, and provides a simple benchmark.
 
 | Optimized for | Description
 |:--- |:---
@@ -14,14 +14,17 @@ For more information on oneMKL and complete documentation of all oneMKL routines
 
 ## Purpose
 
-Matrix Multiplication uses oneMKL to multiply two large matrices.
-
-This sample performs its computations on the default SYCL* device. You can set the `SYCL_DEVICE_TYPE` environment variable to `cpu` or `gpu` to select the device to use.
+Matrix Multiplication uses oneMKL to multiply two large matrices and measure device performance.
 
+This sample performs its computations on the default SYCL* device. You can set the `SYCL_DEVICE_FILTER` environment variable to `cpu` or `gpu` to select the device to use.
 
 ## Key Implementation Details
 
-The oneMKL `blas::gemm` routine performs a generalized matrix multiplication operation. OneMKL BLAS routines support both row-major and column-major matrix layouts; this sample uses row-major layouts, the traditional choice for C++.
+The oneMKL `blas::gemm` routine performs a matrix multiplication operation with optional scaling and updating behavior. oneMKL BLAS routines support both row-major and column-major matrix layouts; this sample uses the default column-major layout, the traditional choice for BLAS.
+
+This sample provides a simple benchmark to test `gemm` performance on a SYCL device, and illustrates several best practices:
+- Perform a warmup run before timing, to allow oneMKL to initialize and prepare GEMM kernels for execution.
+- Pad matrix dimensions if needed to ensure data is well-aligned.
 
 ## Using Visual Studio Code* (Optional)
 
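Reviewer note on the "Key Implementation Details" text added in the hunk above: the three ideas it names (a column-major GEMM with scaling and updating, padded leading dimensions, and a warmup run before timing) can be sketched in plain C++ without oneMKL. This is only an illustration, not the sample's code. `padded_ld`, `reference_gemm`, and `time_with_warmup` are hypothetical names, the 64-byte alignment target is an assumption, and the naive triple loop stands in for the real `blas::gemm` call.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the sample): round a leading dimension up so
// that every column starts on a 64-byte boundary, given an aligned base
// pointer. Assumes elem_size divides 64 (true for float and double).
std::size_t padded_ld(std::size_t rows, std::size_t elem_size) {
    const std::size_t elems_per_line = 64 / elem_size;
    return (rows + elems_per_line - 1) / elems_per_line * elems_per_line;
}

// Naive column-major reference for C = alpha * A * B + beta * C.
// A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
// Element (i, j) of a column-major matrix lives at index i + j * ld.
template <typename T>
void reference_gemm(std::size_t m, std::size_t n, std::size_t k, T alpha,
                    const std::vector<T>& A, std::size_t lda,
                    const std::vector<T>& B, std::size_t ldb, T beta,
                    std::vector<T>& C, std::size_t ldc) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            T acc = T(0);
            for (std::size_t p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

// Call `fn` once untimed (the warmup), then return the average number of
// seconds per call over `reps` timed calls.
template <typename F>
double time_with_warmup(F&& fn, int reps) {
    fn();  // warmup: one-time setup cost is kept out of the measurement
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / reps;
}
```

In a benchmark along these lines, the matrices would be allocated with `padded_ld(rows, sizeof(T)) * cols` elements, and the GEMM call would be wrapped in a lambda passed to `time_with_warmup`. For an m x k by k x n product, the usual throughput estimate is 2·m·n·k floating-point operations divided by the average time.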
@@ -63,27 +66,39 @@ You can remove all generated files with `make clean`.
 ### On a Windows* System
 Run `nmake` to build and run the sample. `nmake clean` removes temporary files.
 
-> **Warning**: On Windows, static linking with oneMKL currently takes a very long time due to a known compiler issue. This will be addressed in an upcoming release.
-
 ## Running the Matrix Multiplication with oneMKL Sample
 
 ### Example of Output
-If everything is working correctly, the program will generate two input matrices and call oneMKL to multiply them. It will also compute the product matrix itself to verify the results from oneMKL.
+Example output from this sample:
 
 ```
-./sgemm.mkl
-Problem size: A (8192x8192) * B (8192x8192) --> C (8192x8192)
-Benchmark interations: 100
-Device: Intel(R) Iris(R) Xe Graphics
-Launching oneMKL GEMM calculation...
-SGEMM performance : GFLOPS
-
-./dgemm.mkl
-Problem size: A (8192x8192) * B (8192x8192) --> C (8192x8192)
-Benchmark interations: 100
-Device: Intel(R) Data Center GPU Max 1100
-Launching oneMKL GEMM calculation...
-DGEMM performance : GFLOPS
+./matrix_mul_mkl single
+oneMKL DPC++ GEMM benchmark
+---------------------------
+Device: Intel(R) Iris(R) Pro Graphics 580
+Core/EU count: 72
+Maximum clock frequency: 950 MHz
+
+Benchmarking (4096 x 4096) x (4096 x 4096) matrix multiplication, single precision
+ -> Initializing data...
+ -> Warmup...
+ -> Timing...
+
+Average performance: ...
+
+./matrix_mul_mkl double
+oneMKL DPC++ GEMM benchmark
+---------------------------
+Device: Intel(R) Iris(R) Pro Graphics 580
+Core/EU count: 72
+Maximum clock frequency: 950 MHz
+
+Benchmarking (4096 x 4096) x (4096 x 4096) matrix multiplication, double precision