* Initial Mixtral enablement.
* Adds the mistral tokenizer model.
* Updates the convert checkpoint file to handle the Mistral model.
* Fixes a typo in the model name.
* Fixes checkpoint loading. Still has some issues; pushing to debug.
* Running on CPU works; temporarily disables the generate JIT to confirm progress. The outputs don't make sense yet because the weights are not loaded.
* Fixes the checkpoint loading issue. Currently loads from the gpt-fast converter with QKV fusion.
* Fixes the checkpoint conversion script for the Mistral model. Fixes `freqs_cis` when loading a `.pth` file.
* Adds a quantized layer for MoE quantization.
* Adds the Hugging Face download script. Improves the convert-checkpoints logging.
* Cleans up and fixes lint errors.
* Missing cleanups.
* Add instructions for Mixtral.
* Renames everything from `mistral` to `mixtral`.
* Fixes more lint errors.
* Removes the unnecessary checkpoint name mapping from the original Mixtral checkpoints.
* Fixes the model call argument order; fixes the checkpoint convert script.
---------
Co-authored-by: Han Qi <hanq@google.com>
You need to manually modify the `config.json` in the checkpoint folder to make it a valid JSON file: replace `'` with `"`, and remove the trailing `,` after the last item in the JSON object.
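One way to script those fixes, as a rough sketch: it assumes GNU `sed` (for the `-z` flag) and that the only problems are single quotes and trailing commas before a closing brace. `python -m json.tool` then verifies that the result parses.

```bash
cp config.json config.json.bak                     # keep a backup of the original
sed -i "s/'/\"/g" config.json                      # single quotes -> double quotes
sed -i -z 's/,\([[:space:]]*}\)/\1/g' config.json  # drop trailing commas before '}'
python -m json.tool config.json > /dev/null \
  && echo "config.json is valid JSON"
```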
## Mixtral

### Get Mixtral Checkpoint from HuggingFace
Please sign the agreement on the Hugging Face website to access the Mixtral checkpoints, then download the Mixtral PyTorch checkpoint using `huggingface-cli`. The Mixtral tokenizer is included in the checkpoint.
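For example (a sketch: the repo id `mistralai/Mixtral-8x7B-v0.1` and the target directory are assumptions, not taken from this README):

```bash
# Log in with a token from an account that has accepted the Mixtral license.
huggingface-cli login
# Repo id and --local-dir are placeholders; adjust to your setup.
huggingface-cli download mistralai/Mixtral-8x7B-v0.1 --local-dir checkpoints/mixtral
```

The environment variables below then select the model and quantization options for checkpoint conversion: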
```bash
export model_name="llama-3" # or "llama-2", "gemma", "mixtral"
export quantize_weights=True # Whether to quantize weights
export quantize_type="int8_per_channel" # "quantize_weights" needs to be turned on. Available quantize types: {"int8", "int4"} x {"per_channel", "blockwise"}; "int8_per_channel" is the default if not specified.
```