Commit 24462a7

Merge branch 'main' into enhance/golangci-lint-configuration

2 parents: 0b021fc + 0210106

4 files changed: +39, -21 lines

README.md

Lines changed: 20 additions & 9 deletions

@@ -12,11 +12,30 @@ the llm-d inference framework.
 
 This provides an "Endpoint Picker (EPP)" component to the llm-d inference
 framework which schedules incoming inference requests to the platform via a
-[Kubernetes] Gateway according to scheduler plugins. For more details on the llm-d inference scheduler architecture, routing logic, and different plugins (filters and scorers), including plugin configuration, see the [Architecture Documentation]).
+[Kubernetes] Gateway according to scheduler plugins. For more details on the
+llm-d inference scheduler architecture, routing logic, and different plugins
+(filters and scorers), including plugin configuration, see the [Architecture Documentation]).
+
+### Relation to GIE (IGW)
 
 The EPP extends the [Gateway API Inference Extension (GIE)] project,
 which provides the API resources and machinery for scheduling. We add some
 custom features that are specific to llm-d here, such as [P/D Disaggregation].
+The two projects collaborate closely as often a feature in llm-d might require
+enablement and extensions in the GIE code base.
+Unique and experimental features may start in llm-d and migrate, over time, to
+GIE. As a project goal, we prefer to upstream functionality to GIE when
+- it has matured sufficiently and has proven wide applicability and usefulness; and
+- it can be implemented in EPP alone (i.e., llm-d provides a full inference framework,
+  beyond scheduling).
+
+Note that in general features should go to the upstream [Gateway API Inference
+Extension (GIE)] project _first_ if applicable. The GIE is a major dependency of
+ours, and where most _general purpose_ inference features live. If you have
+something that you feel is general purpose or use, it probably should go to the
+GIE. If you have something that's _llm-d specific_ then it should go here. If
+you're not sure whether your feature belongs here or in the GIE, feel free to
+create a [discussion] or ask on [Slack].
 
 A compatible [Gateway API] implementation is used as the Gateway. The Gateway
 API implementation must utilize [Envoy] and support [ext-proc], as this is the
@@ -41,14 +60,6 @@ For large changes please [create an issue] first describing the change so the
 maintainers can do an assessment, and work on the details with you. See
 [DEVELOPMENT.md](DEVELOPMENT.md) for details on how to work with the codebase.
 
-Note that in general features should go to the upstream [Gateway API Inference
-Extension (GIE)] project _first_ if applicable. The GIE is a major dependency of
-ours, and where most _general purpose_ inference features live. If you have
-something that you feel is general purpose or use, it probably should go to the
-GIE. If you have something that's _llm-d specific_ then it should go here. If
-you're not sure whether your feature belongs here or in the GIE, feel free to
-create a [discussion] or ask on [Slack].
-
 Contributions are welcome!
 
 [create an issue]:https://github.com/llm-d/llm-d-inference-scheduler/issues/new

pkg/sidecar/proxy/connector_lmcache.go

Lines changed: 2 additions & 2 deletions

@@ -49,8 +49,8 @@ func (s *Server) runLMCacheProtocol(w http.ResponseWriter, r *http.Request, pref
 	ctx := r.Context()
 	preq := r.Clone(ctx)
 
-	completionRequest["max_tokens"] = 1
-	completionRequest["max_completion_tokens"] = 1
+	completionRequest[requestFieldMaxTokens] = 1
+	completionRequest[requestFieldMaxCompletionTokens] = 1
 
 	pbody, err := json.Marshal(completionRequest)
 	if err != nil {
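The change above swaps bare string literals for the shared `requestField*` constants; the behavior itself is that the prefill-side copy of the request is capped to generate a single token by setting both the legacy and the newer OpenAI-style limit fields. A minimal sketch of that mutation, assuming a JSON request body decoded into a map (`capForPrefill` is a hypothetical helper; the constant names mirror those in pkg/sidecar/proxy/proxy.go):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Mirrors of the proxy's request-field constants (the real definitions
// live in pkg/sidecar/proxy/proxy.go).
const (
	requestFieldMaxTokens           = "max_tokens"
	requestFieldMaxCompletionTokens = "max_completion_tokens"
)

// capForPrefill forces a prefill-side request to generate a single token
// by overriding both token-limit fields, then re-encodes the body.
func capForPrefill(body []byte) ([]byte, error) {
	var completionRequest map[string]any
	if err := json.Unmarshal(body, &completionRequest); err != nil {
		return nil, err
	}
	completionRequest[requestFieldMaxTokens] = 1
	completionRequest[requestFieldMaxCompletionTokens] = 1
	return json.Marshal(completionRequest)
}

func main() {
	out, err := capForPrefill([]byte(`{"model":"m","prompt":"hi","max_tokens":256}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Setting both fields matters because OpenAI-compatible servers may honor `max_completion_tokens` over `max_tokens`; capping only one would let a caller-supplied value leak into the prefill pass.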

pkg/sidecar/proxy/connector_nixlv2.go

Lines changed: 6 additions & 0 deletions

@@ -67,6 +67,7 @@ func (s *Server) runNIXLProtocolV2(w http.ResponseWriter, r *http.Request, prefi
 	streamValue, streamOk := completionRequest[requestFieldStream]
 	streamOptionsValue, streamOptionsOk := completionRequest[requestFieldStreamOptions]
 	maxTokensValue, maxTokensOk := completionRequest[requestFieldMaxTokens]
+	maxCompletionTokensValue, maxCompletionTokensOk := completionRequest[requestFieldMaxCompletionTokens]
 
 	completionRequest[requestFieldKVTransferParams] = map[string]any{
 		requestFieldDoRemoteDecode: true,
@@ -80,6 +81,7 @@ func (s *Server) runNIXLProtocolV2(w http.ResponseWriter, r *http.Request, prefi
 	completionRequest[requestFieldStream] = false
 	delete(completionRequest, requestFieldStreamOptions)
 	completionRequest[requestFieldMaxTokens] = 1
+	completionRequest[requestFieldMaxCompletionTokens] = 1
 
 	pbody, err := json.Marshal(completionRequest)
 	if err != nil {
@@ -146,6 +148,10 @@ func (s *Server) runNIXLProtocolV2(w http.ResponseWriter, r *http.Request, prefi
 	if maxTokensOk {
 		completionRequest[requestFieldMaxTokens] = maxTokensValue
 	}
+	delete(completionRequest, requestFieldMaxCompletionTokens)
+	if maxCompletionTokensOk {
+		completionRequest[requestFieldMaxCompletionTokens] = maxCompletionTokensValue
+	}
 	completionRequest[requestFieldKVTransferParams] = pKVTransferParams
 
 	dbody, err := json.Marshal(completionRequest)
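The NIXL v2 change follows a save/override/restore pattern: the caller's `max_completion_tokens` is captured via the comma-ok idiom before the prefill pass overrides it to 1, and on the decode pass it is either restored or removed entirely when the caller never set it. A small illustrative sketch of that pattern (`restoreField` is a hypothetical helper, not part of the codebase):

```go
package main

import "fmt"

// restoreField mirrors the save/override/restore pattern added to
// runNIXLProtocolV2: the override is dropped unconditionally, and the
// original value is re-applied only if it was originally present.
func restoreField(req map[string]any, key string, value any, ok bool) {
	delete(req, key)
	if ok {
		req[key] = value
	}
}

func main() {
	req := map[string]any{"max_completion_tokens": 42}

	// Save the caller's limit, then cap the prefill pass at one token.
	saved, ok := req["max_completion_tokens"]
	req["max_completion_tokens"] = 1

	// Decode pass: put the original limit back (or remove the field
	// entirely when the caller never set one).
	restoreField(req, "max_completion_tokens", saved, ok)
	fmt.Println(req["max_completion_tokens"]) // prints 42
}
```

Tracking the `ok` flag separately is what distinguishes "field absent" from "field set to the zero value", so the decode-side request ends up byte-for-byte faithful to the caller's intent.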

pkg/sidecar/proxy/proxy.go

Lines changed: 11 additions & 10 deletions

@@ -38,16 +38,17 @@ const (
 	requestHeaderPrefillURL = "x-prefiller-url"
 	requestHeaderRequestID  = "x-request-id"
 
-	requestFieldKVTransferParams = "kv_transfer_params"
-	requestFieldMaxTokens        = "max_tokens"
-	requestFieldDoRemotePrefill  = "do_remote_prefill"
-	requestFieldDoRemoteDecode   = "do_remote_decode"
-	requestFieldRemoteBlockIDs   = "remote_block_ids"
-	requestFieldRemoteEngineID   = "remote_engine_id"
-	requestFieldRemoteHost       = "remote_host"
-	requestFieldRemotePort       = "remote_port"
-	requestFieldStream           = "stream"
-	requestFieldStreamOptions    = "stream_options"
+	requestFieldKVTransferParams    = "kv_transfer_params"
+	requestFieldMaxTokens           = "max_tokens"
+	requestFieldMaxCompletionTokens = "max_completion_tokens"
+	requestFieldDoRemotePrefill     = "do_remote_prefill"
+	requestFieldDoRemoteDecode      = "do_remote_decode"
+	requestFieldRemoteBlockIDs      = "remote_block_ids"
+	requestFieldRemoteEngineID      = "remote_engine_id"
+	requestFieldRemoteHost          = "remote_host"
+	requestFieldRemotePort          = "remote_port"
+	requestFieldStream              = "stream"
+	requestFieldStreamOptions       = "stream_options"
 
 	// ConnectorNIXLV2 enables the P/D NIXL v2 protocol
 	ConnectorNIXLV2 = "nixlv2"
