Conversation

@ochafik ochafik commented Nov 17, 2024

This should "fix" #7

Not sure this is doing the right thing, to be honest; feedback welcome:

  • The input is currently truncated silently. Making truncation opt-in (and erroring out when the option isn't set) risks most people never enabling the option and then failing unexpectedly the first time they feed larger inputs (in prod).
  • Aligned n_batch = n_ubatch = n_ctx to avoid crashes in llama.cpp (possibly very inefficient?). Also, n_ctx now defaults to the model's n_ctx_train.
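A minimal sketch of the two behaviors described above, with hypothetical helper and struct names (not the PR's actual code, which goes through the llama.cpp bindings):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the context parameters passed to llama.cpp.
struct CtxParams {
    uint32_t n_ctx;
    uint32_t n_batch;
    uint32_t n_ubatch;
};

// Default n_ctx to the model's training context (n_ctx_train) when the
// caller passes 0, then size both batch parameters to match so llama.cpp
// never receives a batch larger than the context.
CtxParams make_ctx_params(uint32_t requested_n_ctx, uint32_t n_ctx_train) {
    const uint32_t n_ctx = requested_n_ctx != 0 ? requested_n_ctx : n_ctx_train;
    return CtxParams{n_ctx, /*n_batch=*/n_ctx, /*n_ubatch=*/n_ctx};
}

// Silently drop tokens past n_ctx instead of erroring out.
std::vector<int32_t> truncate_to_ctx(std::vector<int32_t> tokens, uint32_t n_ctx) {
    if (tokens.size() > n_ctx) {
        tokens.resize(n_ctx);
    }
    return tokens;
}
```

Whether silent truncation is the right default (versus failing loudly) is exactly the open question raised in the first bullet.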

Tested with all-MiniLM-L6-v2.e4ce9877.q8_0.gguf & nomic-embed-text-v1.5.Q8_0.gguf.
