1 parent c5d3454 commit 14df270
README.md
@@ -72,6 +72,7 @@ Benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh
 
 ### Tensor Parallelism + Quantization
 | Model | Technique | Tokens/Second | Memory Bandwidth (GB/s) |
+| -------- | ------- | ------ | ------ |
 | Llama-2-70B | Base | 62.50 | 1135.29 |
 | | 8-bit | 80.44 | 752.04 |
 | | 4-bit (G=32) | 90.77 | 548.10 |