Commit 2763f81

Fix dtype (#517)
* try float16
* update
* update
1 parent f1f2304 commit 2763f81

1 file changed, +1 -1 lines changed

src/compressed_tensors/quantization/quant_args.py

Lines changed: 1 addition & 1 deletion
@@ -427,7 +427,7 @@ def round_to_quantized_type_dtype(
         rounded = torch.clamp(tensor, finfo.min, finfo.max).to(dtype)
     else:
         iinfo = torch.iinfo(dtype)
-        rounded = torch.round(torch.clamp(tensor, iinfo.min, iinfo.max))
+        rounded = torch.round(torch.clamp(tensor, iinfo.min, iinfo.max)).to(dtype)
 
     if cast_to_original_dtype:
         return rounded.to(original_dtype)
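
The effect of the added `.to(dtype)` can be illustrated with a small standalone sketch. It is not the library's code: `round_to_int_dtype` and its `cast_back` flag are hypothetical stand-ins for the integer branch shown in the diff, and returning `rounded` unchanged when `cast_back` is false is an assumption, since that path is outside the hunk. The sketch shows that `torch.round(torch.clamp(...))` alone keeps the input's floating-point dtype, whereas the patched expression produces a tensor of the requested integer dtype.

```python
import torch


def round_to_int_dtype(
    tensor: torch.Tensor, dtype: torch.dtype, cast_back: bool = True
) -> torch.Tensor:
    """Hypothetical stand-in for the integer branch after the fix."""
    original_dtype = tensor.dtype
    iinfo = torch.iinfo(dtype)
    # Clamp to the representable range, round, then cast to the target dtype.
    rounded = torch.round(torch.clamp(tensor, iinfo.min, iinfo.max)).to(dtype)
    if cast_back:
        return rounded.to(original_dtype)
    # Assumption: without the cast back, the integer-typed tensor is returned.
    return rounded


x = torch.tensor([-200.7, -1.5, 0.4, 2.5, 300.0], dtype=torch.float32)

# Pre-fix expression: values are clamped and rounded, but the dtype stays float32.
iinfo = torch.iinfo(torch.int8)
before = torch.round(torch.clamp(x, iinfo.min, iinfo.max))

# Post-fix expression: same values, now stored as int8.
after = round_to_int_dtype(x, torch.int8, cast_back=False)

print(before.dtype, after.dtype)
print(after.tolist())
```

With the cast back to the original dtype enabled, both versions return a float32 tensor here, so the difference is visible only to callers that skip that final cast.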
