Commit eeee367

server: fix correct time_ms calculation in prompt_progress (#17093)
* fix: correct time_ms calculation in send_partial_response

  The time_ms field was incorrectly calculated. The division was happening before the subtraction, leading to incorrect values.

  Before: (ggml_time_us() - slot.t_start_process_prompt / 1000)
  After: (ggml_time_us() - slot.t_start_process_prompt) / 1000

* docs : document time_ms field in prompt_progress

1 parent 64fe17f commit eeee367

2 files changed: +2 -2 lines changed

tools/server/README.md

Lines changed: 1 addition & 1 deletion
@@ -512,7 +512,7 @@ These words will not be included in the completion, so make sure to add them to
 
 `timings_per_token`: Include prompt processing and text generation speed information in each response. Default: `false`
 
-`return_progress`: Include prompt processing progress in `stream` mode. The progress will be contained inside `prompt_progress` with 3 values: `total`, `cache` and `processed`. The overall progress is `processed/total`, while the actual timed progress is `(processed-cache)/(total-cache)`. Default: `false`
+`return_progress`: Include prompt processing progress in `stream` mode. The progress will be contained inside `prompt_progress` with 4 values: `total`, `cache`, `processed`, and `time_ms`. The overall progress is `processed/total`, while the actual timed progress is `(processed-cache)/(total-cache)`. The `time_ms` field contains the elapsed time in milliseconds since prompt processing started. Default: `false`
 
 `post_sampling_probs`: Returns the probabilities of top `n_probs` tokens after applying sampling chain.
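
The two progress formulas in the updated README text can be sketched on the client side as below. This is only an illustration: the `PromptProgress` struct and the helper names are hypothetical and not part of the server code, and the throughput helper is an assumed use of `time_ms` rather than anything the commit defines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical client-side mirror of the four `prompt_progress` fields.
struct PromptProgress {
    int32_t total;      // total prompt tokens
    int32_t cache;      // tokens reused from the cache
    int32_t processed;  // tokens processed so far (cached tokens included)
    int64_t time_ms;    // elapsed milliseconds since prompt processing started
};

// Overall progress: processed / total.
double overall_progress(const PromptProgress & p) {
    return p.total > 0 ? static_cast<double>(p.processed) / p.total : 0.0;
}

// Timed progress: (processed - cache) / (total - cache).
// Assumption: a fully cached prompt is reported as complete.
double timed_progress(const PromptProgress & p) {
    assert(p.cache <= p.total);
    const int32_t remaining = p.total - p.cache;
    return remaining > 0 ? static_cast<double>(p.processed - p.cache) / remaining : 1.0;
}

// Assumed use of time_ms: throughput over the tokens that were actually computed.
double tokens_per_second(const PromptProgress & p) {
    return p.time_ms > 0 ? 1000.0 * (p.processed - p.cache) / p.time_ms : 0.0;
}
```

For a made-up sample of `total = 1000`, `cache = 200`, `processed = 500`, `time_ms = 500`, the overall progress is 0.5, the timed progress is 0.375, and the throughput works out to 600 tokens per second.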

tools/server/server.cpp

Lines changed: 1 addition & 1 deletion
@@ -3078,7 +3078,7 @@ struct server_context {
             res->progress.total = slot.task->n_tokens();
             res->progress.cache = slot.n_prompt_tokens_cache;
             res->progress.processed = slot.prompt.tokens.size();
-            res->progress.time_ms = (ggml_time_us() - slot.t_start_process_prompt / 1000);
+            res->progress.time_ms = (ggml_time_us() - slot.t_start_process_prompt) / 1000;
         } else {
             res->content = tkn.text_to_send;
             res->tokens = { tkn.tok };
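
The effect of the misplaced parenthesis can be reproduced in isolation. The sketch below is not the server code: the function names are hypothetical, and plain integer arguments stand in for `ggml_time_us()` and `slot.t_start_process_prompt`, both assumed to be microsecond timestamps.

```cpp
#include <cassert>
#include <cstdint>

// Buggy form: `/` binds tighter than `-`, so only the start timestamp is
// divided by 1000. The result mixes microseconds and milliseconds.
int64_t elapsed_ms_buggy(int64_t now_us, int64_t start_us) {
    return now_us - start_us / 1000;
}

// Fixed form: subtract first to get elapsed microseconds, then convert
// the difference to milliseconds.
int64_t elapsed_ms_fixed(int64_t now_us, int64_t start_us) {
    assert(now_us >= start_us);
    return (now_us - start_us) / 1000;
}
```

With a start of 5000000 µs and a current time of 5250000 µs, the fixed form returns 250 (ms), while the buggy form returns 5245000, a value close to the raw current timestamp rather than an elapsed duration.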
