
Commit 652da2f

Add hardware requirements section
1 parent 7a9f9db commit 652da2f

File tree

1 file changed: +20 -1 lines changed

README.md

Lines changed: 20 additions & 1 deletion
@@ -26,6 +26,8 @@ huggingface-cli login
 - [Datasets](#datasets)
 - [Stack Exchange](#stack-exchange-se)
 - [Merging PEFT adapter layers](#merging-peft-adapter-layers)
+3. [Evaluation](#evaluation)
+4. [Inference hardware requirements](#inference-hardware-requirements)
 
 # Quickstart
 StarCoder was trained on GitHub code, thus it can be used to perform code generation. More precisely, the model can complete the implementation of a function or infer the following characters in a line of code. This can be done with the help of the 🤗's [transformers](https://github.com/huggingface/transformers) library.
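The quickstart paragraph above describes code generation with 🤗 transformers, but the diff hunks on this page only show fragments of the README's quickstart snippet. A minimal sketch of what the complete snippet presumably looks like, with the imports and model-loading lines filled in as assumptions (only the `pipe`/`print` lines and the `checkpoint` tokenizer call appear in the diff):

```python
# Sketch of the full quickstart snippet referenced by the hunks on this page.
# The import and model-loading lines are assumptions; only the last three lines
# are visible in the diff context.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

checkpoint = "bigcode/starcoder"  # model id used elsewhere in this commit

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print( pipe("def hello():") )
```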
@@ -63,6 +65,7 @@ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
 pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
 print( pipe("def hello():") )
 ```
+For hardware requirements, check the section [Inference hardware requirements](#inference-hardware-requirements).
 
 ## Text-generation-inference
 
@@ -189,5 +192,21 @@ For example
 python finetune/merge_peft_adapters.py --model_name_or_path bigcode/starcoder --peft_model_path checkpoints/checkpoint-1000 --push_to_hub
 ```
 
-## Evaluation
+# Evaluation
 To evaluate StarCoder and its derivatives, you can use the [BigCode-Evaluation-Harness](https://github.com/bigcode-project/bigcode-evaluation-harness) for evaluating Code LLMs.
+
+# Inference hardware requirements
+In FP32 the model requires more than 60GB of RAM; you can load it in FP16 or BF16 with ~30GB, or in 8-bit with under 20GB of RAM:
+```python
+# make sure you have accelerate and bitsandbytes installed
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
+# for fp16, replace `load_in_8bit=True` with `torch_dtype=torch.float16`
+model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder", device_map="auto", load_in_8bit=True)
+print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
+```
+```
+Memory footprint: 15939.61 MB
+```
+You can also try [starcoder.cpp](https://github.com/bigcode-project/starcoder.cpp), a C++ implementation with the [ggml](https://github.com/ggerganov/ggml) library.
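The comment added in this hunk says to swap `load_in_8bit=True` for `torch_dtype=torch.float16` to load in FP16, but does not show the resulting code. A minimal sketch of that FP16 variant, under the assumption that `torch` is installed alongside `transformers` and `accelerate` (the ~30GB figure above is the commit's own estimate):

```python
# FP16 variant of the snippet added above: load_in_8bit=True is replaced by
# torch_dtype=torch.float16, which additionally requires importing torch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder", device_map="auto", torch_dtype=torch.float16
)
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
```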
