@gabe-l-hart gabe-l-hart commented Nov 5, 2025

Description

This PR is extracted from #16982 since it's an isolated change that's not strictly related to implementing the SSD algorithm.

This PR adds the ability to print tensors more verbosely with llama-eval-callback. Two higher-verbosity levels are available:

  • -lv 1: Print all tensors with 8 digits of precision instead of 5
  • -lv >1: Print all tensors with 8 digits of precision instead of 5 AND print every tensor value rather than the abbreviated begin/end view (this is VERY verbose!)
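The two levels can be pictured with a small, self-contained sketch. This is not the code from this PR: `digits_for_verbosity` and `format_row` are hypothetical names, the three-values-per-side elision and the field width of 12 are read off the sample output below, and the digit counts (four fractional digits by default, eight with -lv 1) are likewise taken from how the samples print rather than from the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: fractional digits to print for a given -lv level.
// Assumption: 4 by default and 8 for -lv >= 1, matching the sample output.
static int digits_for_verbosity(int verbosity) {
    return verbosity >= 1 ? 8 : 4;
}

// Hypothetical helper: format one row of tensor values.
// For verbosity <= 1 only the first and last `edge` values are shown, with
// "..." in between; for verbosity > 1 every value is printed.
static std::string format_row(const std::vector<float> & values, int verbosity, size_t edge = 3) {
    assert(edge > 0);
    const int  digits    = digits_for_verbosity(verbosity);
    const bool print_all = verbosity > 1 || values.size() <= 2 * edge;

    std::string out = "[";
    for (size_t i = 0; i < values.size(); ++i) {
        if (!print_all && i == edge) {
            // Skip the middle of the row and resume at the last `edge` values.
            out += "..., ";
            i = values.size() - edge;
        }
        char buf[64];
        // "%12.*f": field width 12, precision supplied at runtime.
        std::snprintf(buf, sizeof(buf), "%12.*f", digits, values[i]);
        out += buf;
        if (i + 1 < values.size()) {
            out += ", ";
        }
    }
    out += "]";
    return out;
}
```

Calling `format_row(row, 0)` or `format_row(row, 1)` on a long row yields the abbreviated begin/end view at the respective precision, while `format_row(row, 2)` prints every element, which is why redirecting -lv 2 output to a file is advisable.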

Sample Output

Current (no -lv)

./bin/llama-eval-callback -m ~/models/ibm-granite/granite-4.0-300m/ggml-model-Q8_0.gguf --temp 0 -p "Hello world"
...
ggml_debug:              result_norm = (f32)        MUL(norm{1024, 1, 1, 1}, output_norm.weight{1024, 1, 1, 1}}) = {1024, 1, 1, 1}
                                     [
                                      [
                                       [      6.0798,      -0.2350,      -9.4697, ...,      -6.9825,      -0.5371,      -4.7651],
                                      ],
                                     ]
                                     sum = 145.378464
ggml_debug:                 node_931 = (f32)    MUL_MAT(token_embd.weight{1024, 100352, 1, 1}, result_norm{1024, 1, 1, 1}}) = {100352, 1, 1, 1}
                                     [
                                      [
                                       [     60.6613,      56.7787,      32.9983, ...,     -30.6528,     -30.6563,     -30.6577],
                                      ],
                                     ]
                                     sum = -72923.351562
ggml_debug:            result_output = (f32)      SCALE(node_931{100352, 1, 1, 1}, }) = {100352, 1, 1, 1}
                                     [
                                      [
                                       [     15.1653,      14.1947,       8.2496, ...,      -7.6632,      -7.6641,      -7.6644],
                                      ],
                                     ]
                                     sum = -18230.837891
...

-lv 1 / --verbose

./bin/llama-eval-callback -m ~/models/ibm-granite/granite-4.0-300m/ggml-model-Q8_0.gguf --temp 0 -p "Hello world" -lv 1
...
ggml_debug:              result_norm = (f32)        MUL(norm{1024, 1, 1, 1}, output_norm.weight{1024, 1, 1, 1}}) = {1024, 1, 1, 1}
                                     [
                                      [
                                       [  6.07975531,  -0.23501977,  -9.46970558, ...,  -6.98248482,  -0.53711814,  -4.76507378],
                                      ],
                                     ]
                                     sum = 145.378464
ggml_debug:                 node_931 = (f32)    MUL_MAT(token_embd.weight{1024, 100352, 1, 1}, result_norm{1024, 1, 1, 1}}) = {100352, 1, 1, 1}
                                     [
                                      [
                                       [ 60.66127014,  56.77870941,  32.99830627, ..., -30.65284157, -30.65631485, -30.65769768],
                                      ],
                                     ]
                                     sum = -72923.351562
ggml_debug:            result_output = (f32)      SCALE(node_931{100352, 1, 1, 1}, }) = {100352, 1, 1, 1}
                                     [
                                      [
                                       [ 15.16531754,  14.19467735,   8.24957657, ...,  -7.66321039,  -7.66407871,  -7.66442442],
                                      ],
                                     ]
                                     sum = -18230.837891

...

-lv 2

WARNING: It's best to redirect this to a file!

./bin/llama-eval-callback -m ~/models/ibm-granite/granite-4.0-300m/ggml-model-Q8_0.gguf --temp 0 -p "Hello world" -lv 2 > tmp.log

…elements

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@gabe-l-hart

While putting this PR together, I realized that -lv 2 is too verbose for almost all circumstances, so --verbose should map to -lv 1, since that's probably what most people will try first. I'll see whether it's possible to redirect it that way, given that those flags come from common.

…idth

Branch: EvalCallbackVerbosity

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@gabe-l-hart gabe-l-hart mentioned this pull request Nov 5, 2025