98 changes: 94 additions & 4 deletions docs/book/how-to/deployment/deployment.md
@@ -148,6 +148,84 @@ curl -X POST http://localhost:8000/invoke \
-d '{"parameters": {"city": "London", "temperature": 20}}'
```

### Session-aware Invocations

Deployments support session-aware execution so that multiple `/invoke` calls can share context, which is critical for:
- LLM chat and tool-using agents
- Multi-step decision flows
- User-specific short-term state

#### Session Basics

- Each request may include a `session_id`.
- If no `session_id` is provided, the deployment can generate one and echo it back in `metadata.session_id`.
- Reusing the same `session_id` lets your steps read/write session-scoped state across invocations.
- Session state is stored separately from the LLM context window: it's your authoritative server-side memory.

**Key separation**: LLMs have a model context window (tokens in a single prompt). Session state is your durable, structured memory you choose to feed into prompts. Keeping this state compact and curated is standard best practice across frameworks like LangChain/LangGraph.

#### Configuring Sessions

Sessions are controlled via the `deployment_settings.sessions` block:

```yaml
deployment_settings:
  sessions:
    enabled: true          # enable/disable session support
    ttl_seconds: 3600      # inactivity timeout before a session expires
    max_state_bytes: 65536 # soft limit for serialized session state
```

**Fields:**

- `enabled`
  - `true` (the default; recommended for LLM/agent-style deployments)
  - `false` for fully stateless APIs
- `ttl_seconds`
  - Per-session inactivity timeout. After this, the session expires and its state can be garbage-collected.
  - Choose based on your use case (e.g. 15-30 min for chats, hours for workflows)
- `max_state_bytes`
  - A guardrail on how large the serialized `session_state` can get
  - Prevents unbounded growth, DoS-style misuse, and huge payloads when saving/reloading state

#### Using Session State in Steps

Within a deployed pipeline, steps access session state via `get_step_context()`:

```python
from zenml import step, get_step_context


@step
def chat_step(message: str) -> str:
    ctx = get_step_context()
    session_state = ctx.session_state  # session dict; persisted if sessions enabled

    history = session_state.setdefault("history", [])
    history.append({"role": "user", "content": message})

    reply = f"Echo {len(history)}: {message}"
    history.append({"role": "assistant", "content": reply})

    session_state["last_reply"] = reply
    return reply
```

If sessions are disabled, `session_state` behaves as an empty, non-persisted dict so the same code is safe to run.

**Example flow:**

```bash
# 1st call – server may generate a session_id
zenml deployment invoke my_deployment --city="London"

# 2nd call – reusing the session
zenml deployment invoke my_deployment \
  --city="Berlin" \
  --session-id="session-12345"
```

The [weather agent example](https://github.com/zenml-io/zenml/tree/develop/examples/weather_agent) shows how a real pipeline uses `session_state` to keep a running history of weather analyses across turns.

## Deployment Lifecycle

Once a Deployment is created, it is tied to the specific **Deployer** stack component that was used to provision it and can be managed independently of the active stack as a standalone entity with its own lifecycle.
@@ -415,7 +493,7 @@ curl -X POST http://localhost:8000/invoke \

## Deployment Initialization, Cleanup and State

It often happens that the HTTP requests made to the same deployment share some type of initialization or cleanup or need to share the same global state. For example:

* a machine learning model needs to be loaded in memory, initialized and then shared between all the HTTP requests made to the deployment in order to be used by the deployed pipeline to make predictions

@@ -461,6 +539,17 @@ The following happens when the pipeline is deployed and then later invoked:

This mechanism can be used to initialize and share global state between all the HTTP requests made to the deployment or to execute long-running initialization or cleanup operations when the deployment is started or stopped rather than on each HTTP request.
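The load-once-reuse-everywhere lifecycle described above can be sketched framework-agnostically. This toy class is only an illustration of the pattern (the method names mirror the `on_init`/`on_cleanup` hooks mentioned above); it is not ZenML's actual hook machinery:

```python
class ToyDeployment:
    """Illustrates init-once / cleanup-once around many request handlers."""

    def __init__(self) -> None:
        self.pipeline_state = None
        self.load_count = 0

    def on_init(self) -> None:
        # Expensive work (e.g. loading a model) happens exactly once at startup.
        self.load_count += 1
        self.pipeline_state = {"model": f"loaded-model-v{self.load_count}"}

    def handle_request(self, payload: str) -> str:
        # Every request reuses the shared state instead of reloading it.
        return f"{self.pipeline_state['model']} processed {payload}"

    def on_cleanup(self) -> None:
        # Release the shared resources once, at shutdown.
        self.pipeline_state = None


deployment = ToyDeployment()
deployment.on_init()
replies = [deployment.handle_request(p) for p in ("a", "b", "c")]
deployment.on_cleanup()
```

However many requests arrive between `on_init` and `on_cleanup`, the expensive initialization runs only once.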

{% hint style="info" %}
**Deployment State vs Session State**

ZenML deployments support two types of state:

- **`pipeline_state`**: Deployment-global state shared across all invocations (e.g., loaded models, DB clients, caches). Set via `on_init` hook, accessed via `get_step_context().pipeline_state`.
- **`session_state`**: Per-session state that persists across multiple invocations with the same `session_id` (e.g., conversation history, user context). Accessed via `get_step_context().session_state`.

Use `pipeline_state` for expensive resources you want to load once and reuse, and `session_state` for conversational or multi-turn workflows where each session needs its own memory.
{% endhint %}

## Deployment Configuration

The deployer settings cover the pipeline deployment process and the specific back-end infrastructure used to provision and manage the resources required to run the deployment servers. Independently of that, `DeploymentSettings` can be used to fully customize the deployment ASGI application itself, including:
@@ -510,13 +599,14 @@ For more detailed information on deployment options, see the [deployment setting
3. **Return Useful Data**: Design pipeline outputs to provide meaningful responses
4. **Use Type Annotations**: Leverage Pydantic models for complex parameter types
5. **Use Global Initialization and State**: Use the `on_init` and `on_cleanup` hooks along with the `pipeline_state` step context property to initialize and share global state between all the HTTP requests made to the deployment. Also use these hooks to execute long-running initialization or cleanup operations when the deployment is started or stopped rather than on each HTTP request.
6. **Keep Session State Small**: For session-aware deployments, store only compact summaries, IDs, and essential context in `session_state`. Move large artifacts (documents, embeddings, full histories) to external storage (vector stores, databases, object storage) and keep only references in session state. This matches best practices from frameworks like LangChain/LangGraph.
7. **Handle Errors Gracefully**: Implement proper error handling in your steps
8. **Test Locally First**: Validate your deployable pipeline locally before deploying to production

## Conclusion

Pipeline deployment transforms ZenML pipelines from batch processing workflows into real-time services. By following the guidelines for deployable pipelines and understanding the deployment lifecycle, you can create robust, scalable ML services that integrate seamlessly with web applications and real-time systems.

See also:
- [Steps & Pipelines](../steps-pipelines/steps_and_pipelines.md) - Core building blocks
- [Deployer Stack Component](../../component-guide/deployers/README.md) - The stack component that manages the deployment of pipelines as long-running HTTP servers
43 changes: 43 additions & 0 deletions docs/book/how-to/deployment/deployment_settings.md
@@ -127,6 +127,7 @@ Check out [this page](https://docs.zenml.io/concepts/steps_and_pipelines/configu
`DeploymentSettings` expose the following basic customization options. The sections below provide
short examples and guidance.

- sessions (session-aware invocations)
- application metadata and paths
- built-in endpoints and middleware toggles
- static files (SPAs) and dashboards
@@ -135,6 +136,48 @@ short examples and guidance.
- startup and shutdown hooks
- uvicorn server options, logging level, and thread pool size

### Sessions

Configure session support for deployments that need to maintain state across multiple invocations (e.g., LLM agents, chatbots, multi-turn workflows):

```python
from zenml.config import DeploymentSettings

settings = DeploymentSettings(
    sessions={
        "enabled": True,          # Enable session support
        "ttl_seconds": 1800,      # 30 minute session timeout
        "max_state_bytes": 32768, # 32 KB state size limit
    }
)
```

Or in YAML:

```yaml
settings:
  deployment:
    sessions:
      enabled: true
      ttl_seconds: 1800
      max_state_bytes: 32768
```

**Session Configuration Fields:**

- `enabled` (default: `True`): Enable or disable session support. When enabled, each invocation can include a `session_id` to maintain state across calls.
- `ttl_seconds` (default: `86400` / 24 hours): Inactivity timeout before a session expires. Choose based on your use case (e.g., 15-30 min for chats, hours for workflows).
- `max_state_bytes` (default: `65536` / 64 KB): Maximum size for serialized session state. This prevents unbounded growth and potential abuse.

**When to Use Sessions:**

- ✅ LLM chat and conversational agents
- ✅ Multi-step workflows requiring context
- ✅ User-specific short-term state
- ❌ Fully stateless REST APIs

For more details on using session state in your pipeline steps, see [Managing conversational session state](../steps-pipelines/advanced_features.md#managing-conversational-session-state).

### Application metadata

You can set `app_title`, `app_description`, and `app_version` to be reflected in the ASGI application's metadata:
46 changes: 46 additions & 0 deletions docs/book/how-to/steps-pipelines/advanced_features.md
@@ -674,6 +674,52 @@ def my_step(some_parameter: int = 1):
raise ValueError("My exception")
```

### Managing conversational session state

When a deployment is invoked with sessions enabled, each step can access a per-session dictionary through the step context. This is useful for LLM workflows, agents, or any pipeline that needs to remember information across `/invoke` calls.

#### Understanding Deployment State

ZenML deployments support two types of state:

- **`pipeline_state`**: Deployment-global state shared across all invocations (e.g., loaded models, DB clients, caches). Set via `on_init` hook, accessed via `get_step_context().pipeline_state`.
- **`session_state`**: Per-session state that persists across multiple invocations with the same `session_id` (e.g., conversation history, user context). Accessed via `get_step_context().session_state`.

This mirrors common LLM/agent designs: small short-term memory (session state) + external long-term memory (vector stores, databases).

#### Using Session State

```python
from zenml import step, get_step_context


@step
def agent_turn(message: str) -> str:
    ctx = get_step_context()
    session_state = ctx.session_state  # Live dict persisted after the run

    history = session_state.setdefault("history", [])
    history.append({"role": "user", "content": message})

    # Use external tools/vector DB for heavy context; keep session state light.
    # `plan_and_call_llm` is an illustrative placeholder for your own LLM call.
    reply = plan_and_call_llm(history=history[-10:], message=message)

    history.append({"role": "assistant", "content": reply})
    session_state["last_reply"] = reply
    return reply
```

#### Best Practices for Session State

- **Keep it compact**: Store summaries, pointers, IDs, and essential context only
- **Push large artifacts elsewhere**: Documents, embeddings, and full histories belong in databases, vector stores, or object storage
- **Use size guardrails**: The `deployment_settings.sessions.max_state_bytes` setting (default 64 KB) prevents unbounded growth
- **Configure TTL appropriately**: Set `ttl_seconds` based on your use case (e.g., 15-30 min for chats, hours for workflows)
- **Store references, not content**: Keep file paths, document IDs, and embedding keys in session state rather than the actual data

This approach matches best practices from frameworks like LangChain and LangGraph, where short-term working memory is kept small and structured.

If sessions are disabled for a deployment, `ctx.session_state` simply returns an empty dict, so the same code works without extra guards.
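The store-references-not-content guideline above can be sketched as follows — here `doc_store` is a plain dict standing in for whatever external storage you use (database, vector store, object storage), and all names are illustrative:

```python
def remember_document(session_state: dict, doc_store: dict, doc_id: str, content: str) -> None:
    """Keep the heavy payload outside the session; store only the reference."""
    doc_store[doc_id] = content  # external storage (DB, vector store, object store, ...)
    session_state.setdefault("doc_ids", []).append(doc_id)  # compact reference only


def recall_documents(session_state: dict, doc_store: dict) -> list[str]:
    """Resolve stored references back to full content on demand."""
    return [doc_store[d] for d in session_state.get("doc_ids", [])]


session_state: dict = {}
doc_store: dict = {}
remember_document(session_state, doc_store, "doc-1", "a very large document body ...")
assert session_state == {"doc_ids": ["doc-1"]}  # the session itself stays tiny
assert recall_documents(session_state, doc_store) == ["a very large document body ..."]
```

Whatever the size of the stored documents, the serialized session state grows only by the length of the IDs, which keeps it well under `max_state_bytes`.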

### Using Alerter in Hooks

You can use the [Alerter stack component](https://docs.zenml.io/component-guide/alerters) to send notifications when steps fail or succeed:
7 changes: 4 additions & 3 deletions examples/weather_agent/pipelines/weather_agent.py
@@ -19,7 +19,7 @@
    init_hook,
)
from starlette.middleware.gzip import GZipMiddleware
from steps import analyze_weather_with_llm, compare_city_trends, get_weather

from zenml import pipeline
from zenml.config import (
@@ -266,7 +266,7 @@ def on_shutdown(
)
def weather_agent(
    city: str = "London",
) -> tuple[Dict[str, float], str, str]:
    """Weather agent pipeline.

    Args:
Expand All @@ -277,4 +277,5 @@ def weather_agent(
"""
weather_data = get_weather(city=city)
result = analyze_weather_with_llm(weather_data=weather_data, city=city)
return weather_data, result
comparison = compare_city_trends(analysis=result)
return weather_data, result, comparison
2 changes: 2 additions & 0 deletions examples/weather_agent/run.py
@@ -14,4 +14,6 @@
# run = client.get_pipeline_run(run.id)
if run:
    result = run.steps["analyze_weather_with_llm"].output.load()
    comparison = run.steps["compare_city_trends"].output.load()
    print(result)
    print("\n" + comparison)
4 changes: 3 additions & 1 deletion examples/weather_agent/steps/__init__.py
@@ -4,9 +4,11 @@
pipeline.
"""

from .comparison import compare_city_trends
from .weather_agent import analyze_weather_with_llm, get_weather

__all__ = [
    "analyze_weather_with_llm",
    "compare_city_trends",
    "get_weather",
]
35 changes: 35 additions & 0 deletions examples/weather_agent/steps/comparison.py
@@ -0,0 +1,35 @@
"""Comparison steps for the weather agent pipeline."""

from typing import Annotated

from zenml import step
from zenml.steps import get_step_context


@step
def compare_city_trends(analysis: str) -> Annotated[str, "city_comparison"]:
"""Return how the current city compares to the previous turn.

Args:
analysis: The analysis of the current city.

Returns:
A string comparing the current city to the previous city.
"""
session_state = get_step_context().session_state
history = session_state.get("history", [])
if len(history) < 2:
return "Not enough history to compare cities yet."

current = history[-1]
previous = history[-2]
delta_temp = current["temperature"] - previous["temperature"]
delta_humidity = current["humidity"] - previous["humidity"]
delta_wind = current["wind_speed"] - previous["wind_speed"]

return (
f"Comparing {current['city']} to {previous['city']}:\n"
f"• Temperature change: {delta_temp:+.1f}°C\n"
f"• Humidity change: {delta_humidity:+.0f}%\n"
f"• Wind speed change: {delta_wind:+.1f} km/h"
)
15 changes: 13 additions & 2 deletions examples/weather_agent/steps/weather_agent.py
@@ -40,6 +40,17 @@ def analyze_weather_with_llm(
    wind = weather_data["wind_speed"]

    step_context = get_step_context()
    session_state = step_context.session_state
    history = session_state.setdefault("history", [])
    history.append(
        {
            "city": city,
            "temperature": round(temp, 2),
            "humidity": humidity,
            "wind_speed": round(wind, 2),
        }
    )
    session_state["turn_count"] = len(history)
    pipeline_state = step_context.pipeline_state

    client = None
@@ -83,7 +94,7 @@ def analyze_weather_with_llm(

    llm_analysis = response.choices[0].message.content

return f"""🤖 LLM Weather Analysis for {city}:
return f"""🤖 LLM Weather Analysis for {city} (turn {len(history)}):

{llm_analysis}

@@ -138,7 +149,7 @@ def analyze_weather_with_llm(
    if wind > 20:
        warning += " Strong winds - secure loose items."

return f"""🤖 Weather Analysis for {city}:
return f"""🤖 Weather Analysis for {city} (turn {len(history)}):

Assessment: {temp_desc.title()} weather with {humidity}% humidity
Comfort Level: {comfort}/10