
Commit 7ce12e0: Update README.md
Parent: 554254e


README.md

Lines changed: 2 additions & 2 deletions
@@ -120,15 +120,15 @@ from codetf.models import load_model_pipeline
 
 code_generation_model = load_model_pipeline(model_name="codet5", task="pretrained",
                 model_type="plus-220M", is_eval=True,
-                load_in_8bit=True, weight_sharding=False)
+                load_in_8bit=True, load_in_4bit=False, weight_sharding=False)
 
 result = code_generation_model.predict(["def print_hello_world():"])
 print(result)
 ```
 There are a few notable arguments that need to be considered:
 - ``model_name``: the name of the model; currently ``codet5`` and ``causal-lm`` are supported.
 - ``model_type``: the type of model for each model name, e.g. ``base``, ``codegen-350M-mono``, ``j-6B``, etc.
-- ``load_in_8bit``: inherits the ``load_in_8bit`` feature from [Huggingface Quantization](https://huggingface.co/docs/transformers/main/main_classes/quantization).
+- ``load_in_8bit`` and ``load_in_4bit``: inherit the dynamic quantization feature from [Huggingface Quantization](https://huggingface.co/docs/transformers/main/main_classes/quantization).
 - ``weight_sharding``: our advanced feature that leverages [HuggingFace Sharded Checkpoint](https://huggingface.co/docs/accelerate/v0.19.0/en/package_reference/big_modeling#accelerate.load_checkpoint_and_dispatch) to split a large model into several smaller shards on different GPUs. Please consider using this if you are dealing with large models.
 
 ### Model Zoo
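
For readers trying out the changed call, here is a minimal sketch of how the new flag composes with the arguments described in the bullets above. It assumes the same ``load_model_pipeline`` API shown in the diff; the ``causal-lm`` / ``codegen-350M-mono`` pair is taken from the README's own examples, and enabling 4-bit here is purely illustrative, not the commit's recommended setting:

```python
from codetf.models import load_model_pipeline

# Sketch: load a causal-lm checkpoint with 4-bit quantization instead of 8-bit.
# load_in_8bit and load_in_4bit pass through to Hugging Face's quantization
# support, whose 8-bit and 4-bit modes are mutually exclusive, so only one
# of the two flags is enabled here.
model = load_model_pipeline(model_name="causal-lm", task="pretrained",
                            model_type="codegen-350M-mono", is_eval=True,
                            load_in_8bit=False, load_in_4bit=True,
                            weight_sharding=False)

# predict takes a list of prompts, as in the README example above.
print(model.predict(["def fibonacci(n):"]))
```

Per the ``weight_sharding`` note above, setting ``weight_sharding=True`` instead would split a checkpoint too large for one device into shards across GPUs.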
