Commit a2e184d

FL33TW00D, pcuenca, Vaibhavs10, joshnewnham, and 1duo authored
feat: tokenizers and hub are the big sellers! (#270)
* feat: tokenizers and hub are the big sellers!
* Update README.md
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update README.md
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* fix: add Mistral 7B example
* Update README.md
  Co-authored-by: vb <vaibhavs10@gmail.com>
* Update README.md
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Link to Core ML PR
  Co-authored-by: Joshua Newman <joshnewnham@users.noreply.github.com>
  Co-authored-by: Yuduo Wu <6426433+1duo@users.noreply.github.com>
  Co-authored-by: Alejandro Isaza <alejandro-isaza@users.noreply.github.com>
  Co-authored-by: Aseem Wadhwa <aseemw@users.noreply.github.com>
  Reference: #257

---------

Co-authored-by: FL33TW00D <FL33TW00D@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Joshua Newman <joshnewnham@users.noreply.github.com>
Co-authored-by: Yuduo Wu <6426433+1duo@users.noreply.github.com>
Co-authored-by: Alejandro Isaza <alejandro-isaza@users.noreply.github.com>
Co-authored-by: Aseem Wadhwa <aseemw@users.noreply.github.com>
1 parent 620eef1 commit a2e184d

1 file changed

+59
-27
lines changed

README.md

Lines changed: 59 additions & 27 deletions
@@ -18,43 +18,75 @@ Those familiar with the [`transformers`](https://github.com/huggingface/transfor
1818

1919
## Rationale & Overview
2020

21-
Check out [our announcement post](https://huggingface.co/blog/swift-coreml-llm).
21+
Check out [our v1.0 release post](https://huggingface.co/blog/swift-transformers) and our [original announcement](https://huggingface.co/blog/swift-coreml-llm) for more context on why we built this library.
2222

23-
## Modules
23+
## Examples
2424

25-
- `Tokenizers`: Utilities to convert text to tokens and back, with support for Chat Templates and Tools. Follows the abstractions in [`tokenizers`](https://github.com/huggingface/tokenizers).
25+
The most commonly used modules from `swift-transformers` are `Tokenizers` and `Hub`, which allow fast tokenization and
26+
model downloads from the Hugging Face Hub.
27+
28+
### Tokenizing text + chat templating
29+
30+
Tokenizing text should feel very familiar to those who have used the Python `transformers` library:
2631

27-
Usage example:
2832
```swift
29-
import Tokenizers
30-
func testTokenizer() async throws {
31-
let tokenizer = try await AutoTokenizer.from(pretrained: "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
32-
let messages = [["role": "user", "content": "Describe the Swift programming language."]]
33-
let encoded = try tokenizer.applyChatTemplate(messages: messages)
34-
let decoded = tokenizer.decode(tokens: encoded)
35-
}
33+
let tokenizer = try await AutoTokenizer.from(pretrained: "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
34+
let messages = [["role": "user", "content": "Describe the Swift programming language."]]
35+
let encoded = try tokenizer.applyChatTemplate(messages: messages)
36+
let decoded = tokenizer.decode(tokens: encoded)
3637
```
37-
- `Hub`: Utilities for interacting with the Hugging Face Hub. Download models, tokenizers and other config files.
3838

39-
Usage example:
39+
40+
### Tool calling
41+
42+
`swift-transformers` natively supports formatting inputs for tool calling, allowing for complex interactions with language models:
4043

4144
```swift
42-
import Hub
43-
func testHub() async throws {
44-
let repo = Hub.Repo(id: "mlx-community/Qwen2.5-0.5B-Instruct-2bit-mlx")
45-
let modelDirectory: URL = try await Hub.snapshot(
46-
from: repo,
47-
matching: ["config.json", "*.safetensors"],
48-
progressHandler: { progress in
49-
print("Download progress: \(progress.fractionCompleted * 100)%")
50-
}
51-
)
52-
print("Files downloaded to: \(modelDirectory.path)")
53-
}
45+
let tokenizer = try await AutoTokenizer.from(pretrained: "mlx-community/Qwen2.5-7B-Instruct-4bit")
46+
47+
let weatherTool = [
48+
"type": "function",
49+
"function": [
50+
"name": "get_current_weather",
51+
"description": "Get the current weather in a given location",
52+
"parameters": [
53+
"type": "object",
54+
"properties": ["location": ["type": "string", "description": "City and state"]],
55+
"required": ["location"]
56+
]
57+
]
58+
]
59+
60+
let tokens = try tokenizer.applyChatTemplate(
61+
messages: [["role": "user", "content": "What's the weather in Paris?"]],
62+
tools: [weatherTool]
63+
)
5464
```
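
As a quick sanity check on a tool definition like the one above, the following sketch rebuilds the same schema and confirms that it serializes to JSON. It uses only Foundation and does not call `swift-transformers`. The explicit `[String: Any]` annotations are an assumption about what the Swift compiler needs for heterogeneous literals, not a statement about the type `applyChatTemplate` expects.

```swift
import Foundation

// Same schema as the weather tool above, with explicit type annotations
// (assumed here; heterogeneous dictionary literals generally need them).
let weatherTool: [String: Any] = [
    "type": "function",
    "function": [
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": [
            "type": "object",
            "properties": ["location": ["type": "string", "description": "City and state"]],
            "required": ["location"],
        ] as [String: Any],
    ] as [String: Any],
]

// Check that the structure contains only JSON-compatible values.
let isValid = JSONSerialization.isValidJSONObject(weatherTool)
print("Valid JSON object: \(isValid)")

// Pretty-print the schema to inspect what a chat template would receive.
let data = try JSONSerialization.data(
    withJSONObject: weatherTool,
    options: [.prettyPrinted, .sortedKeys]
)
print(String(decoding: data, as: UTF8.self))
```

Printing the schema this way makes a misplaced key or a wrongly nested `parameters` block easy to spot before any tokens are generated.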
5565

56-
- `Generation`: Utilities for text generation, handling tokenization for you. Currently supported sampling methods: greedy search, top-k sampling, and top-p sampling.
57-
- `Models`: Language model abstraction over a Core ML package.
66+
67+
### Hub downloads
68+
69+
Downloading models to a user device _fast_ and _reliably_ is a core requirement of on-device ML. `swift-transformers` provides a simple API to
70+
download models from the Hugging Face Hub, with progress reporting, flaky connection handling, and more:
71+
72+
```swift
73+
let repo = Hub.Repo(id: "mlx-community/Qwen2.5-0.5B-Instruct-2bit-mlx")
74+
let modelDirectory: URL = try await Hub.snapshot(
75+
from: repo,
76+
matching: ["config.json", "*.safetensors"],
77+
progressHandler: { progress in
78+
print("Download progress: \(progress.fractionCompleted * 100)%")
79+
}
80+
)
81+
print("Files downloaded to: \(modelDirectory.path)")
82+
```
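
The progress handler above receives a Foundation `Progress` value. As a minimal sketch of how such a value could be turned into a rounded status line, the example below uses a hypothetical helper named `progressLine(for:)`, which is not part of `swift-transformers`. It drives a `Progress` by hand, and what the unit counts represent during a real `Hub.snapshot` download is not assumed here.

```swift
import Foundation

// Hypothetical helper: format a Progress value as a short status line.
func progressLine(for progress: Progress) -> String {
    // fractionCompleted is in 0...1; round to a whole percentage for display.
    let percent = Int((progress.fractionCompleted * 100).rounded())
    return "Download progress: \(percent)% (\(progress.completedUnitCount)/\(progress.totalUnitCount))"
}

// Simulate a download with four units of work, one of them finished.
let progress = Progress(totalUnitCount: 4)
progress.completedUnitCount = 1
print(progressLine(for: progress)) // prints "Download progress: 25% (1/4)"
```

A helper like this could be called from inside the `progressHandler` closure in place of the raw `fractionCompleted * 100` interpolation, which prints unrounded floating-point values.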
83+
84+
### CoreML Integration
85+
86+
The `Models` and `Generation` modules provide handy utilities when working with language models in CoreML. Check out our
87+
example converting and running Mistral 7B using CoreML [here](https://github.com/huggingface/swift-transformers/tree/main/Examples).
88+
89+
The [modernization of Core ML](https://github.com/huggingface/swift-transformers/pull/257) and corresponding examples were primarily contributed by @joshnewnham, @1duo, @alejandro-isaza, @aseemw. Thank you 🙏
5890

5991
## Usage via SwiftPM
6092
