BitFaster.Caching uses modern techniques to optimize hit rate, concurrent throughput and latency.
9
8
10
-
# Quick Start
9
+
# Features
11
10
12
-
Please refer to the [wiki](https://github.com/bitfaster/BitFaster.Caching/wiki) for more detailed documentation.
13
-
14
-
## ConcurrentLru
15
-
16
-
`ConcurrentLru` is a lightweight drop-in replacement for `ConcurrentDictionary`, but with bounded size enforced by a pseudo LRU eviction policy. There are no background threads, no lock contention, lookups are fast and hit rate outperforms a pure LRU in all tested scenarios.
17
-
18
-
Choose a capacity and use just like ConcurrentDictionary, but with bounded size:
11
+
- [ConcurrentLru](https://github.com/bitfaster/BitFaster.Caching/wiki/ConcurrentLru), a lightweight pseudo LRU based on the [2Q](https://www.vldb.org/conf/1994/P439.PDF) eviction policy. Also with [time based eviction](https://github.com/bitfaster/BitFaster.Caching/wiki/ConcurrentTLru).
12
+
- [ConcurrentLfu](https://github.com/bitfaster/BitFaster.Caching/wiki/ConcurrentLfu), an approximate LFU based on the [W-TinyLFU](https://arxiv.org/pdf/1512.00727.pdf) eviction policy.
13
+
- Configurable [atomic valueFactory](https://github.com/bitfaster/BitFaster.Caching/wiki/Atomic-GetOrAdd) to mitigate [cache stampede](https://en.wikipedia.org/wiki/Cache_stampede).
14
+
- Configurable [thread-safe wrappers for IDisposable](https://github.com/bitfaster/BitFaster.Caching/wiki/IDisposable-and-Scoped-values) cache values.
15
+
- A [builder API](https://github.com/bitfaster/BitFaster.Caching/wiki/Cache-Builders) to easily configure cache features.
16
+
- [SingletonCache](https://github.com/bitfaster/BitFaster.Caching/wiki/SingletonCache) for caching single instance values, such as lock objects.
17
+
- High performance [concurrent counters](https://github.com/bitfaster/BitFaster.Caching/wiki/Metrics).
Please refer to the [wiki](https://github.com/bitfaster/BitFaster.Caching/wiki) for full API documentation, and a complete analysis of hit rate, latency and throughput.
29
22
30
-
ConcurrentDictionary GetOrAdd is not performed atomically. In other words, concurrent requests for the same key can invoke the valueFactory delegate multiple times, and the last write wins.
31
-
32
-
ConcurrentLru can be configured to use atomic GetOrAdd using the ConcurrentLruBuilder:
23
+
# Getting started
24
+
25
+
BitFaster.Caching is installed from NuGet:
33
26
34
-
```csharp
35
-
var lru = new ConcurrentLruBuilder<int, SomeItem>()
36
-
    .WithCapacity(666)
37
-
    .WithAtomicGetOrAdd()
38
-
    .Build();
27
+
`dotnet add package BitFaster.Caching`
39
28
40
-
var value = lru.GetOrAdd(1, (k) => new SomeItem(k));
41
-
```
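To illustrate why an atomic `GetOrAdd` matters, here is a minimal sketch that uses only `ConcurrentDictionary` and `Lazy<T>` from the base class library. It does not use or reproduce BitFaster.Caching internals; the names `GetOrAddDemo`, `CountFactoryCalls` and `Create` are hypothetical. It counts how often the value factory runs when several threads request the same key at once, first with a plain `GetOrAdd`, then with the common `Lazy<T>` wrapper that limits factory execution to one call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class GetOrAddDemo
{
    // Starts `threads` threads that all call GetOrAdd for key 1 at the same
    // moment, and returns how many times the value factory was invoked.
    public static int CountFactoryCalls(int threads, bool useLazy)
    {
        int calls = 0;
        var plain = new ConcurrentDictionary<int, string>();
        var lazy = new ConcurrentDictionary<int, Lazy<string>>();

        // Hypothetical "expensive" factory: counts invocations, then sleeps
        // briefly to widen the race window.
        string Create(int k)
        {
            Interlocked.Increment(ref calls);
            Thread.Sleep(10);
            return "value" + k;
        }

        using var barrier = new Barrier(threads);
        var workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() =>
            {
                barrier.SignalAndWait(); // release all threads together
                if (useLazy)
                {
                    // Several Lazy wrappers may be created, but only the one
                    // stored in the dictionary is ever evaluated.
                    _ = lazy.GetOrAdd(1, k => new Lazy<string>(() => Create(k))).Value;
                }
                else
                {
                    // The factory may run on more than one thread.
                    plain.GetOrAdd(1, k => Create(k));
                }
            });
            workers[i].Start();
        }
        foreach (var w in workers) w.Join();
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine("plain GetOrAdd factory calls: " + CountFactoryCalls(8, useLazy: false));
        Console.WriteLine("Lazy<T> GetOrAdd factory calls: " + CountFactoryCalls(8, useLazy: true));
    }
}
```

The plain count depends on thread timing and can be anywhere from 1 to the number of threads, while the `Lazy<T>` variant runs the factory once.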
29
+
## ConcurrentLru
42
30
43
-
## Time based eviction
31
+
`ConcurrentLru` is a lightweight drop-in replacement for `ConcurrentDictionary`, but with bounded size enforced by the TU-Q eviction policy (similar to [2Q](https://www.vldb.org/conf/1994/P439.PDF)). There are no background threads, no global locks, concurrent throughput is high, lookups are fast and hit rate outperforms a pure LRU in all tested scenarios.
44
32
45
-
`ConcurrentTLru` functions the same as `ConcurrentLru`, but entries also expire after a fixed duration since creation or most recent replacement. This can be used to remove stale items. If the values generated for each key can change over time, `ConcurrentTLru` is eventually consistent where the inconsistency window = time to live (TTL).
33
+
Choose a capacity and use just like `ConcurrentDictionary`, but with bounded size:
46
34
47
35
```csharp
48
-
var lru = new ConcurrentLruBuilder<int, SomeItem>()
49
-
    .WithCapacity(666)
50
-
    .WithExpireAfterWrite(TimeSpan.FromMinutes(5))
51
-
    .Build();
52
-
53
-
var value = lru.GetOrAdd(1, (k) => new SomeItem(k));
54
-
```
55
-
56
-
## Caching IDisposable values
57
-
58
-
It can be useful to combine object pooling and caching to reduce allocations, using IDisposable to return objects to the pool. All cache classes in BitFaster.Caching own the lifetime of cached values, and will automatically dispose values when they are evicted.
59
-
60
-
To avoid races using objects after they have been disposed by the cache, use `IScopedCache` which wraps values in `Scoped<T>`. The call to `ScopedGetOrAdd` creates a `Lifetime` that guarantees the scoped object will not be disposed until the lifetime is disposed. Scoped cache is thread safe, and guarantees correct disposal for concurrent lifetimes.
`SingletonCache` enables mapping every key to a single instance of a value, and keeping the value alive only while it is in use. This is useful when the total number of keys is large, but few will be in use at any moment and removing an item while in use would result in an invalid program state.
44
+
`ConcurrentLfu` is a drop in replacement for `ConcurrentDictionary`, but with bounded size enforced by the [W-TinyLFU eviction policy](https://arxiv.org/pdf/1512.00727.pdf). `ConcurrentLfu` has near optimal hit rate and high scalability. Reads and writes are buffered then replayed asynchronously to mitigate lock contention.
87
45
88
-
The example below shows how to implement exclusive Url access using a lock object per Url.
46
+
Choose a capacity and use just like `ConcurrentDictionary`, but with bounded size:
*DISCLAIMER: Always measure performance in the context of your application. The results provided here are intended as a guide.*
110
-
111
-
The cache replacement policy must maximize the cache hit rate, and minimize the computational and space overhead involved in implementing the policy. An analysis of hit rate vs cache size, latency and throughput is provided below.
112
-
113
-
## ConcurrentLru Hit rate
114
-
115
-
The charts below show the relative hit rate of classic LRU vs Concurrent LRU on a [Zipfian distribution](https://en.wikipedia.org/wiki/Zipf%27s_law) of input keys, with parameter *s* = 0.5 and *s* = 0.86 respectively. If there are *N* items, the probability of accessing an item numbered *i* or less is (*i* / *N*)^*s*.
116
-
117
-
Here *N* = 50000, and we take 1 million sample keys. The hit rate is the number of times we get a cache hit divided by 1 million.
118
-
This test was repeated with the cache configured to different sizes expressed as a percentage of *N* (e.g. 10% would be a cache with a capacity of 5000). ConcurrentLru is configured with the default `FavorFrequencyPartition` to allocate internal queue capacity (see [here](https://github.com/bitfaster/BitFaster.Caching/wiki/ConcurrentLru#how-it-works) for details).
As above, but interleaving a sequential scan of every key (aka sequential flooding). In this case, ConcurrentLru performs significantly better across the board, and is therefore more resistant to scanning.
These charts summarize the percentage increase in hit rate for ConcurrentLru vs LRU. The increase in hit rate is significant at lower cache sizes, outperforming the classic LRU by over 150% when *s* = 0.5 in the best case for both Zipf and scan access patterns.
In these benchmarks, a cache miss is essentially free. These tests exist purely to compare the raw execution speed of the cache bookkeeping code. In a real setting, where a cache miss is presumably quite expensive, the relative overhead of the cache will be very small.
160
-
161
-
Benchmarks are based on BenchmarkDotNet, so are single threaded. The ConcurrentLru family of classes are composed internally of ConcurrentDictionary.GetOrAdd and ConcurrentQueue.Enqueue/Dequeue method calls, and scale well to concurrent workloads.
162
-
163
-
Benchmark results below are from a workstation with the following config:
164
-
165
-
~~~
166
-
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
167
-
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
The relative ranking of each cache implementation is stable across .NET Framework/Core/5/6 and on the CPU architectures available in Azure (e.g. Intel Skylake, AMD Zen). Absolute performance can vary.
174
-
175
-
### What are FastConcurrentLru/FastConcurrentTLru?
176
-
177
-
These are classes that execute with the hit counting logic eliminated (via JIT). If hit counts are not required, this makes the code around 10% faster.
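One general way to have the JIT remove bookkeeping code is to pass the counting policy as a struct type argument. The runtime compiles a separate specialization for each value type argument, so calls to an empty method can be inlined away. The sketch below illustrates that technique only; `IHitCounter`, `CountingPolicy`, `NoCountPolicy` and `TinyCache` are hypothetical names, and this is not presented as the library's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical policy interface: what the cache calls on every hit.
public interface IHitCounter
{
    void Hit();
    long Count { get; }
}

// Policy that really counts hits.
public struct CountingPolicy : IHitCounter
{
    private long hits;
    public void Hit() => Interlocked.Increment(ref hits);
    public long Count => Interlocked.Read(ref hits);
}

// Policy with an empty body; in the specialization for this struct the
// call to Hit() has nothing to do and is a candidate for removal by the JIT.
public struct NoCountPolicy : IHitCounter
{
    public void Hit() { }
    public long Count => 0;
}

// A toy (non thread-safe) cache parameterized by the counting policy.
public sealed class TinyCache<TCounter> where TCounter : struct, IHitCounter
{
    private readonly Dictionary<int, string> map = new Dictionary<int, string>();
    private TCounter counter; // not readonly: the struct is mutated in place

    public void Add(int key, string value) => map[key] = value;

    public bool TryGet(int key, out string value)
    {
        if (map.TryGetValue(key, out value))
        {
            counter.Hit(); // no boxing: constrained call on a struct field
            return true;
        }
        return false;
    }

    public long Hits => counter.Count;
}

public static class PolicyDemo
{
    public static void Main()
    {
        var counted = new TinyCache<CountingPolicy>();
        counted.Add(1, "a");
        counted.TryGet(1, out _);
        Console.WriteLine("counted hits: " + counted.Hits);

        var fast = new TinyCache<NoCountPolicy>();
        fast.Add(1, "a");
        fast.TryGet(1, out _);
        Console.WriteLine("uncounted hits: " + fast.Hits);
    }
}
```

Both variants share one source implementation, and the choice of policy is made at compile time through the type argument rather than through a runtime flag.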
178
-
179
-
### Lookup keys with a Zipf distribution
180
-
181
-
Take 1000 samples of a [Zipfian distribution](https://en.wikipedia.org/wiki/Zipf%27s_law) over a set of keys of size *N* and use the keys to lookup values in the cache. If there are *N* items, the probability of accessing an item numbered *i* or less is (*i* / *N*)^*s*.
182
-
183
-
*s* = 0.86 (yields approx 80/20 distribution)<br>
184
-
*N* = 500
185
-
186
-
Cache size = *N* / 10 (so we can cache 10% of the total set). ConcurrentLru has approximately the same computational overhead as a standard LRU in this single threaded test.
187
-
188
-
| Method | Mean | Error | StdDev | Ratio | RatioSD |
In this test the same items are fetched repeatedly and no items are evicted. This is representative of a high hit rate scenario with a small number of hot items.
199
-
200
-
- The ConcurrentLru family does not move items in the queues on a pure cache hit; it only marks the item as accessed.
201
-
- Classic Lru must maintain item order, and internally splices the fetched item to the head of the linked list.
202
-
- MemoryCache and ConcurrentDictionary represent a pure lookup. This is the best case scenario for MemoryCache, since the lookup key is a string (if the key were a Guid, using MemoryCache adds string conversion overhead).
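For reference, the classic LRU behaviour described in the list above (splice the fetched item to the head of a linked list on every hit, guarded by an exclusive lock) can be sketched as follows. This is a textbook illustration and is not claimed to be the implementation used in the benchmarks; `ClassicLru` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Minimal classic LRU: a dictionary for lookup plus a linked list for
// recency order, with one exclusive lock around every operation.
public sealed class ClassicLru<K, V>
{
    private readonly int capacity;
    private readonly Dictionary<K, LinkedListNode<KeyValuePair<K, V>>> map;
    private readonly LinkedList<KeyValuePair<K, V>> order = new LinkedList<KeyValuePair<K, V>>();
    private readonly object sync = new object();
    private long hits;
    private long misses;

    public ClassicLru(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
        this.map = new Dictionary<K, LinkedListNode<KeyValuePair<K, V>>>(capacity);
    }

    public long Hits { get { lock (sync) return hits; } }
    public long Misses { get { lock (sync) return misses; } }

    // Hit rate = hits / total lookups.
    public double HitRate
    {
        get { lock (sync) return hits + misses == 0 ? 0.0 : (double)hits / (hits + misses); }
    }

    public V GetOrAdd(K key, Func<K, V> valueFactory)
    {
        lock (sync)
        {
            if (map.TryGetValue(key, out var node))
            {
                // Hit: splice the node to the head (most recently used).
                order.Remove(node);
                order.AddFirst(node);
                hits++;
                return node.Value.Value;
            }

            misses++;
            if (map.Count >= capacity)
            {
                // Evict the tail (least recently used).
                var last = order.Last;
                order.RemoveLast();
                map.Remove(last.Value.Key);
            }

            var created = new LinkedListNode<KeyValuePair<K, V>>(
                new KeyValuePair<K, V>(key, valueFactory(key)));
            order.AddFirst(created);
            map[key] = created;
            return created.Value.Value;
        }
    }
}

public static class ClassicLruDemo
{
    public static void Main()
    {
        var lru = new ClassicLru<int, string>(2);
        foreach (var key in new[] { 1, 2, 1, 3, 1 })
        {
            lru.GetOrAdd(key, k => "value" + k);
        }
        Console.WriteLine("hit rate: " + lru.HitRate);
    }
}
```

Every hit mutates the shared list, so every reader must take the lock. That write-on-read cost is what the bullets above contrast with simply marking an item as accessed.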
203
-
204
-
FastConcurrentLru does not allocate and is approximately 5-10x faster than System.Runtime.Caching.MemoryCache or the newer Microsoft.Extensions.Caching.Memory.MemoryCache.
205
-
206
-
| Method | Runtime | Mean | StdDev | Ratio |Allocated |
In this test, we generate 2000 samples of 500 keys with a Zipfian distribution (s = 0.86). Caches have size 50. From N concurrent threads, fetch the sample keys in sequence (each thread is using the same input keys). The principal scalability limit in concurrent applications is the exclusive resource lock. As the number of threads increases, ConcurrentLru significantly outperforms an LRU implemented with a short lived exclusive lock used to synchronize the linked list data structure.
229
-
230
-
This test was run on a Standard D16s v3 Azure VM (16 cpus), with .NET Core 3.1.