Commit d2c5d82

Updated readme
1 parent 9ac48bc commit d2c5d82

1 file changed: +12 -1 lines changed

README.md

Lines changed: 12 additions & 1 deletion
@@ -12,6 +12,8 @@ Featuring:
 
 This is *NOT* intended to be a "framework" or "library" - it is intended to show off what kind of performance you can get with native PyTorch :) Please copy-paste and fork as you desire.
 
+For an in-depth walkthrough of what's in this codebase, see this [blog post](https://pytorch.org/blog/accelerating-generative-ai-2/).
+
 ## Installation
 [Download PyTorch nightly](https://pytorch.org/get-started/locally/)
 Install sentencepiece and huggingface_hub
@@ -57,7 +59,16 @@ Benchmarks run on an A100-80GB, power limited to 330W.
 [Verifier: Llama-70B (int4), Draft: Llama-7B (int4)](./scripts/speculate_70B_int4.sh): 48.4 tok/s
 
 ### Tensor Parallelism
-Benchmark numbers to be added...
+| Model | Number of GPUs | Tokens/Second | Memory Bandwidth (GB/s) |
+| -------- | ------- | ------ | ------ |
+| Llama-2-7B | 1 | 104.9 | 1397.31 |
+| | 2 | 136.27 | 954.01 |
+| | 4 | 168.78 | 635.09 |
+| | 8 | 179.27 | 395.85 |
+| Llama-2-70B | 1 | OOM | |
+| | 2 | 20.53 | 1426.41 |
+| | 4 | 34.15 | 1204.62 |
+| | 8 | 47.25 | 858.28 |
 
 ### AMD
 Benchmarks run on one GCD of a MI-250x.
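
As a rough reading aid for the Tensor Parallelism table added in this commit, the sketch below computes speedup and parallel efficiency from the reported tokens/second. The `scaling` helper and the choice of baseline (the smallest GPU count that did not run out of memory) are illustrative assumptions and are not part of the repository.

```python
# Tokens/second values copied from the Tensor Parallelism table in this diff.
# The 1-GPU Llama-2-70B entry is omitted because the table reports OOM.
TOKENS_PER_SEC = {
    "Llama-2-7B": {1: 104.9, 2: 136.27, 4: 168.78, 8: 179.27},
    "Llama-2-70B": {2: 20.53, 4: 34.15, 8: 47.25},
}


def scaling(rates):
    """Return {gpus: (speedup, efficiency)} relative to the smallest GPU count.

    Hypothetical helper: speedup is rate / baseline rate, and efficiency is
    speedup divided by the factor by which the GPU count grew.
    """
    base_gpus = min(rates)
    base_rate = rates[base_gpus]
    result = {}
    for gpus, rate in sorted(rates.items()):
        speedup = rate / base_rate
        efficiency = speedup / (gpus / base_gpus)
        result[gpus] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    for model, rates in TOKENS_PER_SEC.items():
        for gpus, (speedup, efficiency) in scaling(rates).items():
            print(f"{model:12s} {gpus} GPUs: "
                  f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Because the 70B model has no 1-GPU result, its figures are relative to the 2-GPU run, so they are not directly comparable to the 7B figures.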