Caching

Caching is implemented with these features in mind:

Survive app reboots
Shared across multiple app instances
Partial cache busting
Warming up heavily used data on boot
Memoizing derivative data in app memory instead of Redis

We don't use any of NestJS's built-in caching. NestJS can cache per GET request, but that is a bad strategy for our use case, since it doesn't allow partial caching or cache busting. We don't use the cache-manager that NestJS provides either, because we need more fine-grained control of the cache: the store client needs to be able to delete by a pattern (more on this later). Instead of caching GET requests, caching is done at the service level, usually leveraging the generic solutions provided by "lower level services" (store.service and rest-client.service at the time of writing). This makes service-level communication fast, because each service optimizes the caching of the data it handles.

Technically

We use Redis for in-memory caching. This way the cache survives reboots, can be flushed from outside the app and can be shared by multiple instances of the app. Use the RedisCacheService for caching; it's in a global module.

Lower level service solutions

Glossary for this chapter:

A writing operation is any operation that mutates a resource: "PUT/POST/DELETE", or "update", "create" and "delete".
A query or a clause refers to the store query model, as used in getPage(query) of store.service.ts.

Rest client service

Remote REST sources use the rest-client.service, which can be configured to cache the results with the cache option. All GET responses are then cached, until a writing operation to the same resource is done.

Store service

Store service caching works similarly to rest-client.service's caching for the simple method get(id): the result is cached until a writing operation touches the same id.
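As a rough illustration of the get(id) behaviour described above, the sketch below caches reads by id and busts only the touched id on a write. CachedStore, Backend and the in-process Map are hypothetical stand-ins; the real store.service and RedisCacheService APIs are not shown in this document and may look different.

```typescript
// Hypothetical backend interface; stands in for whatever the store talks to.
interface Backend<T> {
  get(id: string): Promise<T>;
  update(id: string, item: T): Promise<T>;
}

class CachedStore<T> {
  // Assumption: a Map replaces Redis here to keep the sketch self-contained.
  private readonly cache = new Map<string, T>();
  private readonly backend: Backend<T>;

  constructor(backend: Backend<T>) {
    this.backend = backend;
  }

  async get(id: string): Promise<T> {
    const hit = this.cache.get(id);
    if (hit !== undefined) return hit;
    const item = await this.backend.get(id);
    this.cache.set(id, item);
    return item;
  }

  async update(id: string, item: T): Promise<T> {
    const updated = await this.backend.update(id, item);
    // Bust only the entry that the writing operation touched.
    this.cache.delete(id);
    return updated;
  }
}
```

Reads for other ids stay cached after the write, which is the point of busting by id instead of flushing everything.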

Things get more complicated for getPage() and the methods relying on it (getAll(), findOne()), since we have only a subset (a "page") of the whole dataset and the query clauses can be quite expressive. To be able to bust the cache automatically, you need to configure the following for the store service:

keys must be an array of all possible keys in all the possible queries.
primaryKeys must be an array of keys that are always defined for a query. The search values for the primary keys in the query can't be of array type.
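The two rules above can be expressed as a small check. This is only a sketch: StoreCacheConfig and cacheConfigViolations are hypothetical names, not the store service's actual API, and the check covers literal and literal-array values only.

```typescript
type Literal = string | number | boolean;
type QueryValue = Literal | Literal[];

// Hypothetical config shape, mirroring the `keys` and `primaryKeys` options.
interface StoreCacheConfig {
  keys: string[];
  primaryKeys: string[];
}

// Returns a list of human-readable problems; an empty list means the query fits the config.
function cacheConfigViolations(
  config: StoreCacheConfig,
  query: Record<string, QueryValue>
): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(query)) {
    if (!config.keys.includes(key)) problems.push(`"${key}" is missing from keys`);
  }
  for (const key of config.primaryKeys) {
    const value = query[key];
    if (value === undefined) problems.push(`primary key "${key}" is not defined in the query`);
    else if (Array.isArray(value)) problems.push(`primary key "${key}" must not be an array`);
  }
  return problems;
}

const exampleConfig: StoreCacheConfig = {
  keys: ["collectionID", "public", "owner"],
  primaryKeys: ["collectionID"],
};
console.log(cacheConfigViolations(exampleConfig, { public: true }));
```

The last call reports one problem, because the primary key collectionID is not defined in the query.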

Also, you can do queries only with the following clauses:

literals (e.g. 1, "foo", true)
literal arrays (e.g. ["foo", "bar"], which means "foo or bar")
exists
not(exists)
range(from: string, to: string)
and() and not() composed of the above clauses

These should cover most use cases. For more complicated queries like and(not(or({ a: 1 }, { b: 2 })), or(and(not({ c: 3 }), { a: 1 }))) you have to write the cache logic in the service that uses the store.

Take a look at store-service.spec.ts for examples.

Technical explanation of the store cache

We use Redis for the cache. We can create arbitrary cache keys and bust them using an asterisk pattern. For example, a key a:1;b:2 could be busted with a:1;b:*. So we can create cache keys for queries that denote sets (in the sense of mathematical set theory) as strings.

To be able to say which sets are subsets of other sets, we first need to define what the "whole set space" is. For this, we need to know all the possible keys of a query. That's why we need the keys in the cache config.

Example

Let there be a config with keys ["collectionID", "public", "owner"], and the following queries with their corresponding cache keys:

Query	Corresponding cache key
`{ collectionID: "HR.61" }`	`key_collectionID:;HR.61;key_owner:;*;key_public:;*;`
`{ collectionID: ["HR.61", "HR.21"] }`	`key_collectionID:;HR.21;HR.61;key_owner:;*;key_public:;*;`

Since the queries can be array literals (which are unions of the literals), we use ; as a separator in the cache key string. Note that the asterisk here is just notation for "any value". It doesn't have anything to do with pattern matching yet, because the keys themselves don't have the property of pattern matching. Array values are sorted in the cache key so that pattern matching works regardless of the order.
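The key format in the table can be reproduced with a few lines of code. The following is a minimal sketch inferred from the examples on this page (configured keys in alphabetical order, array values sorted, * for keys the query leaves undefined); queryCacheKey is a hypothetical name and the real implementation may differ.

```typescript
type Literal = string | number | boolean;
type Query = Record<string, Literal | Literal[] | undefined>;

// Builds a cache key string for a query, given all keys configured for the store.
function queryCacheKey(keys: string[], query: Query): string {
  return [...keys]
    .sort() // assumption: keys appear alphabetically, as in the examples on this page
    .map((key) => {
      const value = query[key];
      if (value === undefined) return `key_${key}:;*;`; // "any value" notation
      const values = Array.isArray(value)
        ? [...value].map(String).sort() // sorted, so the order in the query doesn't matter
        : [String(value)];
      return `key_${key}:;${values.join(";")};`;
    })
    .join("");
}

const configuredKeys = ["collectionID", "public", "owner"];
console.log(queryCacheKey(configuredKeys, { collectionID: ["HR.61", "HR.21"] }));
// key_collectionID:;HR.21;HR.61;key_owner:;*;key_public:;*;
```

Wrapping every value in semicolons means a pattern such as *;HR.1;* cannot accidentally match HR.11 or HR.61.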

When we do a writing operation, the resource being written can be used to bust the cache like so:

Resource	Corresponding cache key to bust with
`{ collectionID: "HR.61", public: true }`	`key_collectionID:*;HR.61;*key_owner:*key_public:*;true;*`

Here * acts as a pattern-matching wildcard: this key would bust both of the aforementioned query keys:

key_collectionID:;HR.61;key_owner:;*;key_public:;*;
matches:        *;HR.61;*         ***           ***;true;*

key_collectionID:;HR.21;HR.61;key_owner:;*;key_public:;*;
matches         *******;HR.61;*         ***           ***;true;*

The cache busting by pattern matching is handled by the RedisCacheService's patternDel() method, which leverages Redis' ability to scan keys by a pattern and then removes the found keys. Also, for removing the keys, we use Redis' unlink operation instead of delete, as it should be more performant for large cache values.
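A minimal sketch of what a pattern delete along these lines could look like. The ScanClient interface is an assumption modelled on the scan and unlink methods of the ioredis client; the actual patternDel() in RedisCacheService is not shown here and may differ.

```typescript
// Assumed minimal subset of a Redis client (shape modelled on ioredis).
interface ScanClient {
  scan(
    cursor: string,
    matchToken: "MATCH",
    pattern: string,
    countToken: "COUNT",
    count: number
  ): Promise<[string, string[]]>;
  unlink(...keys: string[]): Promise<number>;
}

// Iterates the keyspace with SCAN and removes every key matching the pattern.
// Returns the number of keys removed.
async function patternDel(client: ScanClient, pattern: string): Promise<number> {
  let cursor = "0";
  let removed = 0;
  do {
    // SCAN returns the next cursor and one batch of matching keys (possibly empty).
    const [next, keys] = await client.scan(cursor, "MATCH", pattern, "COUNT", 100);
    if (keys.length > 0) {
      // UNLINK detaches the keys right away and reclaims their memory asynchronously.
      removed += await client.unlink(...keys);
    }
    cursor = next;
  } while (cursor !== "0"); // a returned cursor of "0" means the iteration is complete
  return removed;
}
```

SCAN is used instead of KEYS so that a large keyspace is walked in batches and does not block Redis for the whole lookup.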

Why do we need `primaryKeys`?

Let's look at a situation where the following queries have been made:

Query	Corresponding cache key
`{ collectionID: "HR.61" }`	`key_collectionID:;HR.61;key_owner:;*;key_public:;*;`
`{ collectionID: ["HR.61", "HR.21"] }`	`key_collectionID:;HR.21;HR.61;key_owner:;*;key_public:;*;`
`{ public: true }`	`key_collectionID:;*;key_owner:;*;key_public:;true;`

Then, let's perform a writing operation with the following resource:

{ collectionID: "HR.61", public: true }

Which would bust the cache with this pattern:

key_collectionID:*;HR.61;*key_owner:*key_public:*;true;*

As you can see, this doesn't work. It busts all queries with collectionID: HR.61 and public: true, but it should bust all queries that have collectionID: HR.61 or public: true. For example, a response for a query with public: true can contain values with collectionID: HR.61, and vice versa. So we'd have to bust all combinations of the resource properties (bust all cache keys with collectionID: HR.61 and all with public: true). The cache would have to be busted n times, where n is the number of keys, which is too much and doesn't scale. Also, the cache would be busted in such large chunks that it's just a bad strategy.

This is why we need primaryKeys. Let's look at the definition again:

primaryKeys must be an array of keys that are always defined for a query. The search values for the primary keys in the query can't be of array type.

If we treat collectionID as a primary key, we can be sure that a query on public also queries collectionID, so it's a subset of the queries made against collectionID. For all non-primary keys, we just bust all values. Hence, no matter how complicated the query is, we can always bust it by the primary key.

For example, let's look at these queries and their corresponding cache keys:

Idx	Query	Corresponding cache key
1.	`{ collectionID: "HR.61", owner: "bilbo" }`	`key_collectionID:;HR.61;key_owner:;bilbo;key_public:;*;`
2.	`{ collectionID: "HR.61", owner: "frodo" }`	`key_collectionID:;HR.61;key_owner:;frodo;key_public:;*;`
3.	`{ collectionID: "HR.61", owner: "frodo", public: true }`	`key_collectionID:;HR.61;key_owner:;frodo;key_public:;true;`
4.	`{ collectionID: ["HR.61", "HR.21"], owner: "frodo" }`	`key_collectionID:;HR.21;HR.61;key_owner:;frodo;key_public:;*;`
5.	`{ collectionID: "HR.61", public: true }`	`key_collectionID:;HR.61;key_owner:;*;key_public:;true;`

Now let's make writing operations with these resources:

Resource	Cache pattern to bust	Idxs to bust
`{ collectionID: "HR.61", owner: "bilbo" }`	`key_collectionID:*;HR.61;*key_owner:*key_public:*`	1, 2, 3, 4, 5
`{ collectionID: "HR.21", owner: "gollum", public: false }`	`key_collectionID:*;HR.21;*key_owner:*key_public:*`	4
`{ collectionID: "HR.1", owner: "gollum", public: false }`	`key_collectionID:*;HR.1;*key_owner:*key_public:*`	-

All of these bust the cache as they should: all queries are subspaces of a collectionID query, so if a resource with collectionID is created, we bust all sets that have that collection ID.
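The primary key busting can be sketched as follows. This assumes the rule stated earlier: primary keys are pinned to the written resource's value and every non-primary key is wildcarded. bustPattern and matchesGlob are hypothetical helpers, and matchesGlob only understands *, which is a simplification of Redis' glob syntax.

```typescript
type Literal = string | number | boolean;

// Builds the pattern used to bust cached queries after writing `resource`.
function bustPattern(
  keys: string[],
  primaryKeys: string[],
  resource: Record<string, Literal | undefined>
): string {
  return [...keys]
    .sort() // same alphabetical key order as in the cache keys
    .map((key) => {
      const value = resource[key];
      return primaryKeys.includes(key) && value !== undefined
        ? `key_${key}:*;${String(value)};*` // pin the primary key's value
        : `key_${key}:*`; // bust every value of a non-primary key
    })
    .join("");
}

// Simplified glob matcher (assumption: `*` is the only special character).
function matchesGlob(pattern: string, key: string): boolean {
  const escape = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${pattern.split("*").map(escape).join(".*")}$`).test(key);
}

const cachedKeys = [
  "key_collectionID:;HR.61;key_owner:;bilbo;key_public:;*;",
  "key_collectionID:;HR.61;key_owner:;frodo;key_public:;true;",
  "key_collectionID:;HR.21;HR.61;key_owner:;frodo;key_public:;*;",
];
const pattern = bustPattern(["collectionID", "public", "owner"], ["collectionID"], {
  collectionID: "HR.21",
  owner: "gollum",
  public: false,
});
console.log(pattern);
console.log(cachedKeys.filter((key) => matchesGlob(pattern, key)));
```

With the HR.21 resource only the third cached key matches, since it is the only one whose collectionID values include HR.21.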

Take a look at store-cache.spec.ts for more elaborate examples.

Additional highlights:

The keys and primary keys can be configured for the whole module, or per each getPage() (or its derivative methods findOne() and findAll()) usage. This allows more fine-grained caching per method.
A primary key value can also be exists or not(exists), since we can deduce from both what the query space is for those clauses ("must exist" or "must not exist"). It can't be any more complicated clause containing exists, though.
