README.md
…the llm-d inference framework.

This provides an "Endpoint Picker (EPP)" component to the llm-d inference
framework which schedules incoming inference requests to the platform via a
[Kubernetes] Gateway according to scheduler plugins. For more details on the
llm-d inference scheduler architecture, routing logic, and different plugins
(filters and scorers), including plugin configuration, see the
[Architecture Documentation].

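To make the plugin configuration mentioned above concrete, an EPP configuration sketch might look like the following. This is a hypothetical illustration only: the plugin names (`queue-scorer`, `kv-cache-scorer`, `prefix-cache-scorer`), their parameters, and the profile layout are assumptions, and the [Architecture Documentation] is the authoritative reference for the actual schema.

```yaml
# Hypothetical EPP plugin configuration (names and fields are illustrative;
# see the Architecture Documentation for the real schema).
plugins:
  - type: queue-scorer          # favors endpoints with short request queues
  - type: kv-cache-scorer       # favors endpoints with free KV-cache capacity
  - type: prefix-cache-scorer   # favors endpoints likely to hold the prompt prefix
schedulingProfiles:
  - name: default
    plugins:
      - pluginRef: queue-scorer
        weight: 1
      - pluginRef: kv-cache-scorer
        weight: 1
      - pluginRef: prefix-cache-scorer
        weight: 2               # weights combine scorer outputs into a final ranking
```

In a layout like this, filters would first narrow the candidate endpoints, then each scorer ranks the survivors and the weighted sum decides where the request is scheduled.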
### Relation to GIE (IGW)

The EPP extends the [Gateway API Inference Extension (GIE)] project,
which provides the API resources and machinery for scheduling. We add some
custom features that are specific to llm-d here, such as [P/D Disaggregation].
The two projects collaborate closely, since a feature in llm-d often requires
enablement and extensions in the GIE code base.
Unique and experimental features may start in llm-d and migrate, over time, to
GIE. As a project goal, we prefer to upstream functionality to GIE when
- it has matured sufficiently and has proven wide applicability and usefulness; and
- it can be implemented in EPP alone (i.e., llm-d provides a full inference
  framework, beyond scheduling).

Note that in general features should go to the upstream [Gateway API Inference
Extension (GIE)] project _first_ if applicable. The GIE is a major dependency of
ours, and where most _general purpose_ inference features live. If you have
something that you feel is general purpose, it probably should go to the
GIE. If you have something that's _llm-d specific_ then it should go here. If
you're not sure whether your feature belongs here or in the GIE, feel free to
create a [discussion] or ask on [Slack].

A compatible [Gateway API] implementation is used as the Gateway. The Gateway
API implementation must utilize [Envoy] and support [ext-proc], as this is the
[…]

For large changes please [create an issue] first describing the change so the
maintainers can do an assessment, and work on the details with you. See
[DEVELOPMENT.md](DEVELOPMENT.md) for details on how to work with the codebase.

Contributions are welcome!

[create an issue]: https://github.com/llm-d/llm-d-inference-scheduler/issues/new