# Best Practices

## Rule Categories

### Priority A: Essential

The essential rules are the most important ones.
Use them to ensure that your SpiceDB cluster is performant, your schema is sane, and your authorization logic is sound.
Exceptions to these rules should be rare and well justified.

### Priority B: Strongly Recommended

The strongly recommended rules will improve the schema design, developer experience, and performance of your SpiceDB cluster.
In most cases, these rules should be followed.

### Priority C: Recommended

The recommended rules reflect how we would run our own systems, but they may not apply to every use case or make sense in every situation.
Follow them if you can and ignore them if you can’t.

## Priority A Rules: Essential

### Make Sure Your Schema Fails Closed

Tags: **schema**

This is related to the idea of using negation sparingly and of phrasing your schema additively.
Give thought to what happens if your application fails to write a relationship: should the user have access in that case?
The answer is almost always `no`.

This example is very simple, but it illustrates the basic point:

#### Avoid

This schema starts with everyone having access and reduces it as you add users to the deny list.
If you fail to write a user to the deny list, they'll have access when they shouldn't:

```zed
definition user {}

definition resource {
  relation public: user:*
  relation deny: user

  permission view = public - deny
}
```

#### Prefer

By contrast, this schema defaults to nobody having access, and therefore fails closed:

```zed
definition user {}

definition resource {
  relation user: user

  permission view = user
}
```

This is an admittedly simple example, but the concept holds in more complex schemas.
Deciding what should happen when a relationship is missing will also sometimes require a conversation about the business logic of your application.

### Tune Connections to Datastores

Tags: **operations**

To size your SpiceDB connection pools, start by determining the maximum number of connections allowed by your selected datastore (see its documentation), divide that number by the number of SpiceDB pods you’ve deployed, and then split the result between the read and write pools.

Use these values to set the `--datastore-conn-pool-read-max-open` and `--datastore-conn-pool-write-max-open` flags, and set the corresponding min values to half of each, adjusting as needed based on whether your workload leans more heavily on reads or writes.

#### Example

Let's say you have a database instance that supports 200 connections, and you know that you read more than you write.
You have 4 SpiceDB instances in your cluster.
A starting point for tuning this might be:

```sh
# other flags omitted for brevity
spicedb serve \
  --datastore-conn-pool-read-max-open 30 \
  --datastore-conn-pool-read-min-open 15 \
  --datastore-conn-pool-write-max-open 20 \
  --datastore-conn-pool-write-min-open 10
```

This reserves 50 connections per SpiceDB instance and splits them in favor of reads.

If you're using Postgres or CockroachDB, the `pgxpool_empty_acquire` metric can help you understand whether your SpiceDB pods are starved for connections.

### Test Your Schema

Tags: **schema**

You should be testing the logic of your schema to ensure that it behaves the way you expect.

- For unit testing and TDD, use test relationships + assertions and [zed validate](https://authzed.com/docs/spicedb/modeling/validation-testing-debugging#zed-validate) (a minimal validation file is sketched after this list).
- For snapshot testing, use test relationships + expected relations and [zed validate](https://authzed.com/docs/spicedb/modeling/validation-testing-debugging#zed-validate).
- For integration testing, use the SpiceDB test server with SpiceDB [serve-testing](https://authzed.com/docs/spicedb/modeling/validation-testing-debugging#integration-test-server).
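
As a rough sketch, a validation file combines a schema, test relationships, and assertions; the schema, IDs, and file name here are illustrative, and the exact keys are documented on the `zed validate` page linked above:

```yaml
schema: |-
  definition user {}

  definition document {
    relation viewer: user
    permission view = viewer
  }
relationships: |-
  document:readme#viewer@user:alice
assertions:
  assertTrue:
    - "document:readme#view@user:alice"
  assertFalse:
    - "document:readme#view@user:bob"
```

Running `zed validate validation.yaml` fails if any assertion does not hold, which makes it easy to wire into CI.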

### Prefer Relations to Caveats

Tags: **schema**

If an authorization concept can be expressed using relations, it should be.
We provide caveats as an escape hatch; they should only be used for context that’s only available at request time, or for ABAC-style logic that cannot be expressed in terms of relationships.

This is because caveats come with a performance penalty.
A caveated relationship is harder to cache, and it slows down the graph walk required to compute a permission.

Some examples:

- A banlist - this could be expressed as a list in caveat context, but it can also be expressed as a relation with negation.
- A notion of public vs internal - boolean flags seem like an obvious caveat use case, but they can also be expressed using self-relations (see the sketch after this list).
- Dynamic roles - these could be expressed as a list in caveats, and it’s not immediately obvious how to build them into a SpiceDB schema, but our [Google Cloud IAM example](https://authzed.com/blog/google-cloud-iam-modeling) shows how it’s possible.
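
For instance, here is one way a public/internal flag could be modeled with a self-relation instead of a caveat; the `document` and `published` names are illustrative, not part of any standard schema:

```zed
definition user {}

definition document {
  relation viewer: user

  // to mark a document as published, write a relationship from the
  // document to itself: document:plan#published@document:plan
  relation published: document

  // the arrow walks through the self-relationship, so viewers are
  // only granted access once the published flag has been written
  permission view = published->viewer
}
```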

### Make Your Writes Idempotent

Tags: **application**

Relationships in SpiceDB are binary (a relationship either exists or it doesn't), and `WriteRelationships` calls are atomic.
As much as possible, we recommend that you use the [`TOUCH`](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.RelationshipUpdate) semantic for your write calls, because it means that you can easily retry writes and recover from failures.
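
For example, with the `zed` CLI (the resource and subject here are placeholders), a `TOUCH`-semantics write succeeds whether or not the relationship already exists:

```sh
# creates the relationship if it's missing, and succeeds quietly if it already exists
zed relationship touch document:readme viewer user:alice

# by contrast, a CREATE-semantics write fails if the relationship is already present
zed relationship create document:readme viewer user:alice
```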

If you’re concerned about sequencing your writes, or your writes have dependencies on one another, we recommend using [preconditions](https://buf.build/authzed/api/docs/main:authzed.api.v1#authzed.api.v1.Precondition).

### Don’t Truncate Your Tables When Running Postgres

Tags: **operations**

If you truncate your Postgres tables, your SpiceDB pods will become unresponsive until you run `spicedb datastore repair`.
We recommend either dropping the tables entirely and recreating them with `spicedb datastore migrate head`, or deleting the data with a `DeleteRelationships` call instead.
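
As a sketch (the connection URI is a placeholder), the drop-and-migrate path looks like this:

```sh
# after dropping the SpiceDB tables, recreate them from scratch
spicedb datastore migrate head \
  --datastore-engine=postgres \
  --datastore-conn-uri="postgres://spicedb:password@localhost:5432/spicedb?sslmode=disable"
```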

To ensure that every request, whether cached or not, gets a consistent point-in-time view of the underlying data, SpiceDB uses Multi-Version Concurrency Control (MVCC).
Some datastores provide this natively; in others, we’ve implemented it on top of the datastore.
In Postgres, SpiceDB's MVCC implementation relies on the transaction counter being stored as data alongside the relationships, so truncating the relationships table desynchronizes the transaction counter from the stored relationships.

## Priority B Rules: Strongly Recommended

### Understand Your Consistency Needs

Tags: **operations**

SpiceDB gives the user the ability to make tradeoffs between cache performance and up-to-date visibility using [its consistency options](https://authzed.com/docs/spicedb/concepts/consistency).
In addition to these call-time options, there are also some flags that can provide better cache performance if additional staleness is acceptable.
For example, by default, SpiceDB sets the quantization interval to 5s; check operations are cached within this window when using `minimize_latency` or `at_least_as_fresh` calls.
Widening this window lets SpiceDB serve more requests from cache, with the tradeoff that results stay in the cache (and so can remain stale) for longer.
More details about how these flags work together can be found in our [Hotspot Caching blog post](https://authzed.com/blog/hotspot-caching-in-google-zanzibar-and-spicedb).
To change this value, set the `--datastore-revision-quantization-interval` flag.
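
For instance, a deployment that can tolerate up to 30 seconds of staleness might widen the interval like this (a sketch; keep your other flags as they are):

```sh
spicedb serve \
  --datastore-revision-quantization-interval=30s
  # ...plus your other flags
```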

When it comes to write consistency, SpiceDB defaults to the safest behavior, guaranteeing a visibility ordering even when writes span distributed database nodes.
Individual datastores may allow you to relax this guarantee when your scenario permits it;
for example, [setting CockroachDB’s overlap strategy](https://authzed.com/docs/spicedb/concepts/datastores#overlap-strategy)
can let you trade some ordering and consistency guarantees across domains for greatly increased write throughput.

### Use gRPC When Possible

Tags: **application**

SpiceDB can be configured to expose both an [HTTP API](https://authzed.com/docs/spicedb/getting-started/client-libraries#http-clients) and its associated Swagger documentation.
While this can be helpful for initial exploration, we strongly recommend using one of our gRPC-based official client libraries if your network and calling language support it.
gRPC is significantly more performant and lower-latency than HTTP, and client-streaming services such as `ImportBulk` can’t be used with the HTTP API.
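
The `zed` CLI is itself a gRPC client, so it's a quick way to confirm gRPC connectivity before wiring up a client library; the endpoint, token, and IDs below are placeholders:

```sh
# point zed at your SpiceDB cluster over gRPC (--insecure skips TLS for local testing)
zed context set dev localhost:50051 "my-preshared-key" --insecure

# issue a permission check over that gRPC connection
zed permission check document:readme view user:alice
```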

### Keep Permission Logic in SpiceDB

Tags: **schema**

One of the big benefits of using a centralized authorization system like SpiceDB is that there's a single place to look for your authorization logic, and that logic isn't duplicated across services.
It can be tempting to define the authorization logic for an endpoint as the `AND` or `OR` of checks on other permissions, especially when the alternative is modifying the schema.
However, this increases the likelihood of drift across your system, hides a system's authorization logic inside that system's codebase, and increases the load on SpiceDB.
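
As an illustration (the `document` definition and `archive` permission are hypothetical), the combined intent can live in the schema instead of in application code:

```zed
definition user {}

definition document {
  relation viewer: user
  relation editor: user

  permission view = viewer + editor
  permission edit = editor

  // instead of the application computing "edit AND view" itself,
  // express that intent as its own permission and check it directly
  permission archive = edit & view
}
```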

### Avoid Cycles in Your Schema

Tags: **schema**

Recursive schemas can be very powerful, but they can also lead to serious performance issues when used incorrectly.
A good rule of thumb: if you need a schema definition to recur, have it refer to itself (e.g., groups can have subgroups).
Avoid situations where a definition points to a separate definition that, further down the permission chain, accidentally points back to the original definition.

Avoid:

```zed
// user -> organization -> group -> user forms an accidental cycle
definition user {
  relation org: organization
}

definition group {
  relation member: user
}

definition organization {
  relation subgroup: group
}
```

Preferred:

```zed
definition user {}

// the recursion stays within a single definition
definition group {
  relation member: user | group
}
```

### Phrase Permissions Additively/Positively

Tags: **schema**

A more comprehensible permission system is a more secure permission system.
One of the easiest ways to keep your authorization logic maintainable is to treat permissions as positive, or additive: a user gains permissions when relationships are written.
This reduces the number of ways that pieces of permission logic can interact, and it prevents permissions from being granted accidentally.

In concrete terms, that means using wildcards and negations sparingly.
Start with no access and build up; don’t start with full access and pare down.

### Use Unique Identifiers for Object Identifiers

Tags: **application**

Because you typically want to centralize your permissions in SpiceDB, most object IDs in SpiceDB are references to entities that live in external systems.
Those external entities shouldn't overlap.
To that end, we recommend using either UUIDs or another upstream identifier that you can be sure is unique, such as the unique `sub` claim your identity provider assigns in a user's token.
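
For example (the IDs shown are made up), a relationship that references upstream entities by their stable identifiers might be written as:

```sh
# a UUID for the document, and the IdP-assigned subject identifier for the user
zed relationship touch document:6b1f4d2e-9c7a-4b1e-8f3a-2d5c9e7a1b0c viewer user:auth0_64bd1e7a9c
```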

### Avoid ReadRelationships API

Tags: **application**

The `ReadRelationships` API should be treated as an escape hatch, used mostly for data introspection.
Using it for permission logic is a code smell.
All checks and listing of IDs should use `Check`, `CheckBulk`, `LookupResources`, and `LookupSubjects`.
If you find yourself reaching for the `ReadRelationships` API for permission logic, there's probably a way to modify your schema to use one of the check APIs instead.

### Prefer CheckBulk To LookupResources

Tags: **application**

Both `CheckBulk` and `LookupResources` can be used to determine whether a subject has access to a list of objects.
Where possible, we recommend `CheckBulk`, because its work is bounded by the list of requested checks, whereas the wrong `LookupResources` call can return the entire world and therefore be slow.

`LookupResources` generally requires a lot of work, generates higher load, and consequently has some of the highest latencies.
If you need its semantics but its performance is insufficient, we recommend checking out our [Materialize](https://authzed.com/products/authzed-materialize) offering.

## Priority C Rules: Recommended

### Treat Writing Schema like Writing DB Migrations

Tags: **operations**

We recommend treating an update to your SpiceDB schema as though it were a database migration.
Keep it in your codebase, test it before deployment, and write it to your SpiceDB cluster as a part of your continuous integration process.
This ensures that updates to your schema are properly controlled.
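
A minimal CI step might look like the following sketch (the file names are placeholders, and it assumes `zed` is already authenticated against the target cluster):

```sh
# validate the schema and its assertions before deploying
zed validate schema-validation.yaml

# then write the schema to the cluster as part of the deploy job
zed schema write schema.zed
```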

### Load Test

Tags: **operations**

To evaluate the performance and capabilities of your SpiceDB cluster and its underlying datastore, AuthZed provides [Thumper](https://github.com/authzed/thumper), a load-testing tool.
You can use Thumper to simulate workloads and validate schema updates before deploying them to a production environment.

### Use ZedTokens and “At Least As Fresh” for Best Caching

Tags: **application**

SpiceDB’s fully consistent mode (`fully_consistent`) forces the use of the most recent datastore revision, which largely bypasses the cache: hit rates drop, and both latency and load on the datastore increase.

If possible, we recommend using `at_least_as_fresh` with `ZedTokens` instead.
Capture the `ZedToken` returned by your initial request, then include it in subsequent calls.
SpiceDB will guarantee that you see a state at least as fresh as that token while still leveraging in-memory and datastore caches to deliver low-latency responses.
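
With the `zed` CLI, the flow looks roughly like this; the resource, subject, and token value are placeholders, and the consistency flag name is an assumption to verify against `zed permission check --help` for your version:

```sh
# a write returns a ZedToken describing the revision it was applied at
zed relationship touch document:readme viewer user:alice

# pass that token back on later checks so SpiceDB can serve them from cache
# while still reflecting the write above
zed permission check document:readme view user:alice \
  --consistency-at-least "<zedtoken-from-the-write>"
```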

### Prefer Checking Permissions Instead of Relationships

Tags: **application**

It's possible to make a `Check` call with a relation as the permission.
Even in a simple schema, we recommend instead defining a permission that references the relation and checking that permission.
This is because if the check's logic ever needs to change, it's easy to change the definition of a permission, whereas changing the definition of a relation is difficult (it often requires a data migration).
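
A minimal sketch of the pattern (the `document` definition is illustrative):

```zed
definition user {}

definition document {
  relation viewer: user

  // callers check `view`, never `viewer` directly: if the logic changes
  // later, only this expression has to be updated, not the stored data
  permission view = viewer
}
```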

### Enable Schema Watch Cache

Tags: **operations**

To minimize load on the database, you can enable the schema watch cache using the `--enable-experimental-watchable-schema-cache` flag.
The schema watch cache improves performance and responsiveness by caching the currently loaded schema and watching for changes in real time.

While we recommend enabling this, it isn't enabled by default because it requires additional configuration and knowledge of your datastore.
For Postgres, [`track_commit_timestamp`](https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-TRACK-COMMIT-TIMESTAMP) must be set to `on` for the Watch API to be available.
For Spanner, a table can have a maximum of 5 changefeeds globally, and this feature consumes one of them.
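
On Postgres, that means enabling commit timestamps before turning the cache on; a sketch (the `ALTER SYSTEM` change requires a Postgres restart to take effect):

```sh
# on the Postgres side (then restart Postgres):
#   ALTER SYSTEM SET track_commit_timestamp = on;

spicedb serve \
  --enable-experimental-watchable-schema-cache
  # ...plus your other flags
```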

### Use the Operator

Tags: **operations**

To ensure seamless rollouts, upgrades, and schema migrations, we recommend using the SpiceDB Kubernetes Operator if you’re running on Kubernetes.
The Operator automates many operational tasks and helps maintain consistency across environments.
You can find the official documentation for the SpiceDB Operator [here](https://authzed.com/docs/spicedb/concepts/operator).
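
For orientation, a `SpiceDBCluster` resource looks roughly like this (the values are illustrative; see the Operator docs linked above for the full set of options):

```yaml
apiVersion: authzed.com/v1alpha1
kind: SpiceDBCluster
metadata:
  name: dev
spec:
  config:
    datastoreEngine: postgres
  # the referenced Secret holds the preshared key and datastore URI
  secretName: dev-spicedb-config
```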

### Ensure that SpiceDB Can Talk To Itself

Tags: **operations**

In SpiceDB, dispatching refers to the internal process of breaking a permission check or relationship evaluation down into smaller subproblems.
These subproblems are dispatched horizontally between SpiceDB nodes, which spreads the workload and increases the cache hit rate; this is the basis of [SpiceDB’s horizontal scalability](https://authzed.com/blog/consistent-hash-load-balancing-grpc).
For this to work, the SpiceDB nodes must be configured to be aware of each other.

In our experience, running SpiceDB on Kubernetes with our [Operator](https://authzed.com/docs/spicedb/concepts/operator) is the easiest and best way to achieve this.
It’s possible to configure dispatch using DNS as well, but DNS-based dispatch can serve stale membership information while DNS records are changing.
This is not recommended unless DNS updates are rare.
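
If you do configure dispatch by hand, it's driven by the dispatch flags; a rough sketch (the flag names and service address are assumptions to verify against `spicedb serve --help` for your version):

```sh
spicedb serve \
  --dispatch-cluster-enabled=true \
  --dispatch-upstream-addr=dns:///spicedb.spicedb.svc.cluster.local:50053
  # ...plus your other flags
```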

### Choose the Right Load Balancer

Tags: **operations**

In our experience, TCP-level L4 load balancers play more nicely with gRPC clients than HTTP-level L7 load balancers.
For example, we’ve found that even though AWS Application Load Balancers purport to support gRPC, they have a tendency to drop connections and otherwise misbehave; AWS Network Load Balancers seem to work better.

### Use the Provided Metrics, Traces, and Profiles

Tags: **operations**

To give you deeper insight into the performance of your SpiceDB cluster, the pods expose both Prometheus metrics and `pprof` profiling endpoints (a quick sketch of querying them follows the list below).
You can also configure tracing to export data to compatible OpenTelemetry backends.

- Refer to the [SpiceDB Prometheus documentation](https://authzed.com/docs/spicedb/ops/observability#prometheus) for details on collecting metrics.
  - AuthZed Cloud supports exporting metrics to Datadog via the official [AuthZed Cloud Datadog integration](https://docs.datadoghq.com/integrations/authzed_cloud/).
  - To gain a complete picture of your SpiceDB cluster’s performance, it’s important to also export metrics from the underlying datastore.
    These metrics help identify potential bottlenecks and performance issues.
    AuthZed Cloud provides access to both CockroachDB and PostgreSQL metrics via its cloud telemetry endpoints, enabling deeper visibility into database behavior.
- The [profiling documentation](https://authzed.com/docs/spicedb/ops/observability#profiling) explains how to use the pprof endpoints.
- The [tracing documentation](https://authzed.com/docs/spicedb/ops/observability#opentelemetry-tracing) walks you through sending trace data to a Jaeger endpoint.
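
As a quick sketch, assuming the default metrics listener on `:9090` (adjust if you've changed the metrics address):

```sh
# Prometheus metrics are served on the metrics listener
curl -s http://localhost:9090/metrics | grep spicedb_

# pprof profiles are exposed on the same listener
go tool pprof http://localhost:9090/debug/pprof/profile
```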

### Use Partials + Composable Schema to Organize your Schema

Tags: **schema**

As a schema grows in size and complexity, it can become difficult to navigate and grok.
We implemented [Composable Schemas](https://authzed.com/docs/spicedb/modeling/composable-schemas) to solve this problem: they allow you to break a schema into multiple files and to split definitions into multiple partials.

### Don't Re-Use Permissions Across Use Cases

Tags: **schema**

When adding a new feature or service, it can be tempting to re-use an existing permission that happens to match the semantics you’re looking for, rather than doing the work of modifying the schema to introduce a new permission.
However, if the authorization business logic later diverges between the use cases, you’ll have to modify not only the permission but also every call site, so we recommend frontloading that work.
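
A small illustration (the `document` definition and permission names are hypothetical): give each use case its own permission even when the expressions start out identical, so they can diverge later without touching call sites.

```zed
definition user {}

definition document {
  relation editor: user

  // identical today, but each use case checks its own permission,
  // so either expression can change independently later
  permission edit = editor
  permission export = editor
}
```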

### Use Expiration Feature for Expiration Logic

Tags: **schema**

Expiration is a common use case: at some future time, access should be revoked automatically.
It’s so common that it’s now [a built-in feature](https://authzed.com/docs/spicedb/concepts/expiring-relationships), and it’s far more efficient for SpiceDB to handle natively than to emulate with caveats!
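
A minimal sketch of the feature (syntax per the expiring-relationships docs linked above; the `document` definition is illustrative):

```zed
use expiration

definition user {}

definition document {
  relation viewer: user with expiration

  permission view = viewer
}
```

A relationship written to `viewer` then carries an expiration timestamp and stops granting `view` once that time passes, with no application-side cleanup required.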