Caching
Caching is implemented with these features in mind:
- Survive app reboots
- Shared across multiple app instances
- Partial cache busting
- Warming up data on boot that is heavily relied on
- Memoizing derivative data into the app memory instead of Redis
We don't use any of NestJS's caching. NestJS allows caching per GET request, but that's a bad strategy, at least for our use case, since it doesn't allow partial caching or cache busting. We also don't use the cache-manager that NestJS provides, because we need more fine-grained control of the cache: the store client needs to be able to delete by a pattern (more on this later). Instead of caching GET requests, caching is done on the service level, usually leveraging the generic solutions provided by "lower level services" (store.service and rest-client.service at the time of writing this doc). This makes service-level communication fast, because each service optimizes the caching of the data it handles.
We use Redis for in-memory caching. This way the cache survives reboots, can be flushed outside the app and can be shared by multiple instances of the app. Use the RedisCacheService for caching - it's in a global module.
Glossary for this chapter:
- writing operation is used for any operations that mutate a resource: "PUT/POST/DELETE", or "update", "create" and "delete".
- query or a clause is used in the context of the store query model, used in `getPage(query)` of `store.service.ts`.
Remote REST sources use the rest-client.service, which can be configured to cache the results with the cache option. All GET responses are then cached, until a writing operation to the same resource is done.
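The behaviour described above can be sketched with a small in-memory stand-in. This is only an illustration, not the actual rest-client.service: the class name `CachingRestClient`, the injected `fetcher` and `writer` functions and the synchronous API are all assumptions made so that the sketch runs without a network, and it treats a single path as "the resource".

```typescript
// Hypothetical stand-in for a caching REST client (not the real rest-client.service).
class CachingRestClient<T> {
  private cache = new Map<string, T>();
  private fetcher: (path: string) => T;
  private writer: (path: string, body: T) => void;

  constructor(fetcher: (path: string) => T, writer: (path: string, body: T) => void) {
    this.fetcher = fetcher;
    this.writer = writer;
  }

  // GET: serve from the cache when possible, otherwise fetch and remember the result.
  get(path: string): T {
    const hit = this.cache.get(path);
    if (hit !== undefined) return hit;
    const fresh = this.fetcher(path);
    this.cache.set(path, fresh);
    return fresh;
  }

  // Writing operation: forward the write, then bust the cached GET of the same resource.
  put(path: string, body: T): void {
    this.writer(path, body);
    this.cache.delete(path);
  }
}
```

Two consecutive `get()` calls for the same path hit the remote only once; after a `put()` to that path, the next `get()` fetches again.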
Store service caching works similarly to rest-client.service's caching for the simple method get(id): the result is cached until a writing operation touches the same id.
Things get more complicated for getPage() and the methods relying on it (getAll(), findOne()), since we have only a subset (a "page") of the whole dataset and the query clauses can be quite expressive. To be able to automatically bust the cache, you need to configure the following for the store service:
- `keys` must be an array of all possible keys in all the possible queries.
- `primaryKeys` must be an array of keys that are always defined for a query. The search values for the primary keys in the query can't be of array type.
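As a rough illustration of these two rules, the sketch below checks a query against a config. The `StoreCacheConfig` shape and the `validateQuery` helper are hypothetical names invented for this example, and only literal and literal-array values are modelled.

```typescript
type Literal = string | number | boolean;
type Query = Record<string, Literal | Literal[]>;

// Hypothetical config shape; only the two properties described above are modelled.
interface StoreCacheConfig {
  keys: string[];
  primaryKeys: string[];
}

// Returns a list of rule violations; an empty list means the query is cacheable.
function validateQuery(config: StoreCacheConfig, query: Query): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(query)) {
    // Every key used in a query must be listed in `keys`.
    if (config.keys.indexOf(key) === -1) errors.push(`unknown key: ${key}`);
  }
  for (const pk of config.primaryKeys) {
    const value = query[pk];
    // Primary keys must always be defined and must not be arrays.
    if (value === undefined) errors.push(`missing primary key: ${pk}`);
    else if (Array.isArray(value)) errors.push(`primary key must not be an array: ${pk}`);
  }
  return errors;
}
```

With `keys: ["collectionID", "public", "owner"]` and `primaryKeys: ["collectionID"]`, the query `{ collectionID: "HR.61", public: true }` passes, while `{ public: true }` is rejected because the primary key is missing.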
Also, you can do queries only with the following clauses:
- literals (e.g. `1`, `"foo"`, `true`)
- literal arrays (e.g. `["foo", "bar"]`, which means "foo or bar")
- `exists`
- `not(exists)`
- `range(from: string, to: string)`
- `and()` and `not()` composed of the above clauses

This should cover most use cases. For more complicated queries like and(not(or({ a: 1 }, { b: 2 })), or(and(not({ c: 3 }), { a: 1 }))) you have to write the cache logic in the service that uses the store.
Take a look at store-service.spec.ts for examples.
We use Redis for the cache. We can create arbitrary cache keys and bust them using an asterisk pattern. For example, a key a:1;b:2 could be busted with a:1;b:*. So, we can create cache keys for queries that denote sets (in terms of mathematical set theory) as strings.
To be able to say which sets are subsets of other sets, we need to first be able to define what is the "whole set space". For this, we need to know all the possible keys of a query. That's why we need the keys in the cache config.
Let there be a config with keys ["collectionID", "public", "owner"], and the following queries with their corresponding cache keys:

| Query | Corresponding cache key |
|---|---|
| `{ collectionID: "HR.61" }` | `key_collectionID:;HR.61;key_owner:;*;key_public:;*;` |
| `{ collectionID: ["HR.61", "HR.21"] }` | `key_collectionID:;HR.21;HR.61;key_owner:;*;key_public:;*;` |

Since the queries can be array literals (which are a union of the literals), we use ; as a separator in the cache key string. Note that the asterisk here is just notation for "any value". It doesn't have anything to do with pattern matching yet: the keys themselves don't have the property of pattern matching. Array values are sorted in the cache key so that pattern matching works regardless of the order.
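The key format in the table can be reproduced with a few lines. The following is a minimal sketch inferred from the examples in this document, not the store's actual code; the function name `queryCacheKey` is hypothetical and only literal and literal-array values are handled.

```typescript
type Literal = string | number | boolean;
type Query = Record<string, Literal | Literal[]>;

// Builds a cache key for a query: config keys in sorted order, each followed by its
// sorted values, with ";" as the separator and "*" standing for "any value".
function queryCacheKey(keys: string[], query: Query): string {
  return keys.slice().sort().map((key) => {
    const value = query[key];
    const values = value === undefined
      ? ["*"]
      : (Array.isArray(value) ? value : [value]).map(String).sort();
    return `key_${key}:;${values.join(";")};`;
  }).join("");
}
```

For the config keys above, `queryCacheKey(["collectionID", "public", "owner"], { collectionID: ["HR.61", "HR.21"] })` returns `key_collectionID:;HR.21;HR.61;key_owner:;*;key_public:;*;`, the same key as in the second table row.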
When we do a writing operation, the resource of the operation can be used to bust the cache like so:

| Resource | Corresponding cache key to bust with |
|---|---|
| `{ collectionID: "HR.61", public: true }` | `key_collectionID:*;HR.61;*key_owner:*key_public:*;true;*` |

Here * acts as a pattern-matching wildcard: this key would bust both of the aforementioned query keys:
key_collectionID:;HR.61;key_owner:;*;key_public:;*;
matches: *;HR.61;* *** ***;true;*
key_collectionID:;HR.21;HR.61;key_owner:;*;key_public:;*;
matches *******;HR.61;* *** ***;true;*
The cache busting by pattern matching is handled by the RedisCacheService's patternDel() method, which leverages Redis' ability to scan keys by a pattern and then removes the found keys. Also, for removing the keys, we use Redis' unlink operation instead of delete, as it should be more performant for large cache values.
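A scan-then-unlink loop of this kind could look roughly like the sketch below. It is not the RedisCacheService's actual code: the `ScanClient` interface is a hypothetical minimal shape loosely modelled on Redis' SCAN (with MATCH and COUNT) and UNLINK commands, and `FakeRedis` is an in-memory stand-in whose matcher supports only `*`, so that the example runs without a server.

```typescript
// Hypothetical minimal client shape; a real Redis client has a different API.
interface ScanClient {
  // Returns [nextCursor, matchingKeys]; a next cursor of "0" means the scan is complete.
  scan(cursor: string, pattern: string, count: number): Promise<[string, string[]]>;
  unlink(keys: string[]): Promise<number>;
}

// Collects all keys matching the pattern, then removes them in one unlink call.
async function patternDel(client: ScanClient, pattern: string): Promise<number> {
  const found = new Set<string>(); // a scan may return the same key more than once
  let cursor = "0";
  do {
    const [next, keys] = await client.scan(cursor, pattern, 100);
    keys.forEach((key) => found.add(key));
    cursor = next;
  } while (cursor !== "0");
  return found.size > 0 ? client.unlink(Array.from(found)) : 0;
}

// Simplified glob: only "*" is treated as a wildcard, everything else is literal.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, "[\\s\\S]*");
  return new RegExp(`^${escaped}$`);
}

// In-memory stand-in used only to make the sketch runnable.
class FakeRedis implements ScanClient {
  store = new Map<string, string>();

  async scan(cursor: string, pattern: string, count: number): Promise<[string, string[]]> {
    const all = Array.from(this.store.keys());
    const start = Number(cursor);
    const next = start + count >= all.length ? "0" : String(start + count);
    const matcher = globToRegExp(pattern);
    return [next, all.slice(start, start + count).filter((key) => matcher.test(key))];
  }

  async unlink(keys: string[]): Promise<number> {
    let removed = 0;
    for (const key of keys) if (this.store.delete(key)) removed++;
    return removed;
  }
}
```

Collecting the keys first and unlinking afterwards keeps the scan cursor independent of the deletions. With keys `a:1;b:2`, `a:1;b:3` and `a:2;b:2` in the fake store, `patternDel(fake, "a:1;b:*")` removes the first two and leaves the third.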
Let's look at a situation where the following queries have been made:

| Query | Corresponding cache key |
|---|---|
| `{ collectionID: "HR.61" }` | `key_collectionID:;HR.61;key_owner:;*;key_public:;*;` |
| `{ collectionID: ["HR.61", "HR.21"] }` | `key_collectionID:;HR.21;HR.61;key_owner:;*;key_public:;*;` |
| `{ public: true }` | `key_collectionID:;*;key_owner:;*;key_public:;true;` |

Then, let's perform a writing operation with the following resource:
{ collectionID: "HR.61", public: true }
Which would bust the cache with this pattern:
key_collectionID:*;HR.61;*key_owner:*key_public:*;true;*
As you can see, this doesn't work. It busts all queries with collectionID: HR.61 and public: true, but it should bust all queries that have collectionID: HR.61 or public: true. For example, a response for a query with public: true can contain values with collectionID: HR.61, and vice versa. So, we'd have to bust the cache separately for each of the resource's properties (all cache keys with collectionID: HR.61, and all with public: true). The cache would have to be busted n times, where n is the length of keys, which is too much and doesn't scale. Also, the cache would be busted in such large chunks that it's just a bad strategy.
This is why we need primaryKeys. Let's look at the definition again:
`primaryKeys` must be an array of keys that are always defined for a query. The search values for the primary keys in the query can't be of array type.
If we treat collectionID as a primary key, we can be sure that a query with the key public also queries collectionID, so it's a subset of all queries made against collectionID. For all non-primary keys, we just bust all values. Hence, no matter how complicated the query is, we can always bust it by the primary key.
For example, let's look at these queries and their corresponding cache keys:

| Idx | Query | Corresponding cache key |
|---|---|---|
| 1. | `{ collectionID: "HR.61", owner: "bilbo" }` | `key_collectionID:;HR.61;key_owner:;bilbo;key_public:;*;` |
| 2. | `{ collectionID: "HR.61", owner: "frodo" }` | `key_collectionID:;HR.61;key_owner:;frodo;key_public:;*;` |
| 3. | `{ collectionID: "HR.61", owner: "frodo", public: true }` | `key_collectionID:;HR.61;key_owner:;frodo;key_public:;true;` |
| 4. | `{ collectionID: ["HR.61", "HR.21"], owner: "frodo" }` | `key_collectionID:;HR.21;HR.61;key_owner:;frodo;key_public:;*;` |
| 5. | `{ collectionID: "HR.61", public: true }` | `key_collectionID:;*;key_owner:;*;key_public:;true;` |

Now let's make writing operations with these resources:

| Resource | Cache pattern to bust | Idxs to bust |
|---|---|---|
| `{ collectionID: "HR.61", owner: "bilbo" }` | `key_collectionID:*;HR.61;*key_owner:;bilbo;key_public:;*;` | 1, 2, 3, 4 |
| `{ collectionID: "HR.21", owner: "gollum", public: false }` | `key_collectionID:*;HR.21;*key_owner:*key_public:*;public;*` | 4 |
| `{ collectionID: "HR.1", owner: "gollum", public: false }` | `key_collectionID:*;HR.1;*key_owner:*key_public:*;public;*` | - |

All of these bust the cache as they should: all queries are subspaces of a collectionID query, so if a resource with collectionID is created, we bust all sets that have that collection ID.
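The rule "bust by the primary key, wildcard everything else" can be sketched as follows. This is an illustration under assumptions, not the store's real implementation: `bustPattern` and `matchesPattern` are hypothetical helpers, the matcher supports only `*`, and the patterns produced by the real code may differ in detail.

```typescript
type Literal = string | number | boolean;

// Builds a bust pattern from a written resource: primary keys are matched by value,
// all non-primary keys are wildcarded.
function bustPattern(keys: string[], primaryKeys: string[], resource: Record<string, Literal>): string {
  return keys.slice().sort().map((key) => {
    const value = resource[key];
    const isPrimary = primaryKeys.indexOf(key) !== -1;
    return isPrimary && value !== undefined
      ? `key_${key}:*;${String(value)};*`
      : `key_${key}:*`;
  }).join("");
}

// Simplified glob matching: only "*" is treated as a wildcard.
function matchesPattern(pattern: string, cacheKey: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, "[\\s\\S]*");
  return new RegExp(`^${escaped}$`).test(cacheKey);
}
```

With keys `["collectionID", "public", "owner"]` and primaryKeys `["collectionID"]`, the resource `{ collectionID: "HR.61", owner: "bilbo" }` yields `key_collectionID:*;HR.61;*key_owner:*key_public:*`, which matches the cache keys of queries 1 to 4 above. A resource with `collectionID: "HR.21"` matches only query 4, and one with `collectionID: "HR.1"` matches none of them.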
Take a look at store-cache.spec.ts for more elaborate examples.
Additional highlights:
- The keys and primary keys can be configured for the whole module, or per each `getPage()` (or its derivative methods `findOne()` and `findAll()`) usage. This allows more fine-grained caching per method.
- A primary key value can also be `exists` or `not(exists)`, since we can deduce from both what the query space is for those clauses ("must exist" or "must not exist"). It can't be any more complicated clause containing `exists` though.