```cpp
struct llama_context {
    // decode output (2-dimensional array: [n_outputs][n_vocab])
    size_t  logits_size = 0; // capacity (of floats) for logits
    float * logits = nullptr;
    // ...
};
```

I've been trying to understand the `logits` buffer in `llama_context`. Also, what is the relationship between `llama_batch.logits` and `llama_context.logits`?
`llama_context.logits` is allocated in `llama_output_reserve`: https://github.com/ggerganov/llama.cpp/blob/daa9623ab051a8162ae750b150b9522571b55f21/src/llama.cpp#L15976

Yes, it's just a buffer of floats. Each "output" has `n_vocab` logits, and they are stored contiguously. `llama_batch.logits` is a user-facing API which allows choosing which outputs to calculate. It's an array of `bool`. `llama_context.logits` will only contain the logits for the tokens corresponding to each truthy value in `llama_batch.logits`.

Some PRs which affected how the logits are handled are #6122 (which added some indirection with …). Let me know if I should clarify further.
Incidentally, how does one retrieve the number of outputs? Something like:

```cpp
LLAMA_API int32_t llama_n_outputs(llama_context * ctx) {
    return ctx->n_outputs;
}
```