diff --git a/docs/architecture/comparison.md b/docs/architecture/comparison.md
index c8a5297..75e1b06 100644
--- a/docs/architecture/comparison.md
+++ b/docs/architecture/comparison.md
@@ -31,7 +31,7 @@ PgDog aims to be the de facto PostgreSQL proxy and pooler. Below is a feature co
 |-|-|-|
 | [Manual routing](../features/sharding/manual-routing.md) | Only using comments (regex), doesn't work with prepared statements | :material-check-circle-outline: |
 | [Automatic routing](../features/sharding/query-routing.md) | No | :material-check-circle-outline: |
-| [Primary key generation](../features/sharding/schema_management/primary_keys.md) | No | :material-check-circle-outline: |
+| [Primary key generation](../features/sharding/sequences.md) | No | :material-check-circle-outline: |
 | [Cross-shard queries](../features/sharding/cross-shard-queries/index.md) | No | Partial support |
 | [COPY](../features/sharding/cross-shard-queries/copy.md) | No | :material-check-circle-outline: |
 | [Postgres-compatible sharding functions](../features/sharding/sharding-functions.md) | No | Same functions as declarative partitioning |
diff --git a/docs/enterprise_edition/index.md b/docs/enterprise_edition/index.md
index aa3bbec..a8c85e4 100644
--- a/docs/enterprise_edition/index.md
+++ b/docs/enterprise_edition/index.md
@@ -86,4 +86,3 @@ PgDog Enterprise is new and in active development. A lot of the features we want
 | QoS | Quality of service guarantees, incl. throttling on a per-user/database/query level. |
 | AWS RDS integration | Deploy PgDog on top of AWS RDS, without the hassle of Kubernetes or manual configuration. |
 | Automatic resharding | Detect hot shards and re-shard data without operator intervention. |
-| [Durable two-phase](cross-shard-writes.md) | Rollback / commit abandoned two-phase transactions. |
diff --git a/docs/enterprise_edition/insights/query_plans.md b/docs/enterprise_edition/insights/query_plans.md
index e343854..d26f979 100644
--- a/docs/enterprise_edition/insights/query_plans.md
+++ b/docs/enterprise_edition/insights/query_plans.md
@@ -6,7 +6,7 @@ icon: material/chart-timeline
 For any [running query](active_queries.md) exceeding a configurable time threshold, PgDog will ask Postgres for a query plan. The query plans are stored in their own view, accessible via two methods:
 
 1. [`SHOW QUERY_PLANS`](#admin-database) admin command
-2. [Activity](active_queries.md#dashboard) view in the dashboard
+2. [Activity](active_queries.md#web-ui) view in the dashboard
 
 ## How it works
 
@@ -63,4 +63,4 @@ query_plan_max_age = 15_000
 
 ### Dashboard
 
-The query plans are automatically attached to running queries and sent to the Dashboard via a dedicated connection. They can be viewed in real-time in the [Activity](active_queries.md#dashboard) tab.
+The query plans are automatically attached to running queries and sent to the Dashboard via a dedicated connection. They can be viewed in real-time in the [Activity](active_queries.md#web-ui) tab.
diff --git a/docs/features/sharding/cross-shard-queries/insert.md b/docs/features/sharding/cross-shard-queries/insert.md
index 80bc242..120cf29 100644
--- a/docs/features/sharding/cross-shard-queries/insert.md
+++ b/docs/features/sharding/cross-shard-queries/insert.md
@@ -94,9 +94,9 @@ VALUES
 RETURNING *;
 ```
 
-However, if you prefer to use sequences instead, you can rely on [database-generated](../schema_management/primary_keys.md) primary keys.
+However, if you prefer to use sequences instead, you can rely on [database-generated](../sequences.md) primary keys.
 
-Statements that don't include the primary key in the `INSERT` tuple will be sent to one of the shards, using the same round robin algorithm used for [omnisharded](#omnisharded-tables) tables. The shard will then generate the primary key value using PgDog's [sharded sequences](../schema_management/primary_keys.md#pgdognext_id_seq).
+Statements that don't include the primary key in the `INSERT` tuple will be sent to one of the shards, using the same round robin algorithm used for [omnisharded](#omnisharded-tables) tables. The shard will then generate the primary key value using PgDog's [sharded sequences](../schema_management/functions.md#pgdognext_id_seq).
 
 For example, assuming the table `users` is sharded on the primary key `id`, omitting it from the `INSERT` statement will send it to only one of the shards:
 
diff --git a/docs/features/sharding/schema_management/.pages b/docs/features/sharding/schema_management/.pages
new file mode 100644
index 0000000..7336dee
--- /dev/null
+++ b/docs/features/sharding/schema_management/.pages
@@ -0,0 +1,6 @@
+nav:
+  - 'manager.md'
+  - 'functions.md'
+  - 'index.md'
+  - 'cache.md'
+  - 'migrations.md'
diff --git a/docs/features/sharding/schema_management/functions.md b/docs/features/sharding/schema_management/functions.md
new file mode 100644
index 0000000..0620a6f
--- /dev/null
+++ b/docs/features/sharding/schema_management/functions.md
@@ -0,0 +1,69 @@
+---
+icon: material/function-variant
+---
+
+# Schema manager functions
+
+The [schema manager](index.md) uses PL/pgSQL functions to generate shard-aware identifiers and perform other actions inside the database to make sharding work. These functions are documented below.
+
+## Functions
+
+### `pgdog.next_id_seq`
+
+The `pgdog.next_id_seq` function generates a unique, shard-aware `BIGINT` number that can be used as a primary key. It accepts the following arguments:
+
+| Argument | Data type | Description |
+|-|-|-|
+| `sequence_name` | `regclass` | The sequence used as basis for generating integers. |
+| `table_name` | `regclass` | The partitioned by hash table which is required for `satisfies_hash_partition` to extract the hash data type. |
+
+If not specified, `table_name` will default to `'pgdog.validator_bigint'::regclass`, so this function can be used with any Postgres sequence. For [sharded sequences](../sequences.md), a special table is created in the `pgdog_internal` schema for each sharded sequence, to avoid lock contention on a single Postgres catalog entity.
+
+#### Sequence cache
+
+When looking for the next valid number, `next_id_seq` will consume several values from the sequence in a row. By default, each call to `nextval` requires a write to the WAL, which could be a bit slower than optimal. To mitigate this, we automatically increase the sequence's cache size to 100, which is usually enough to generate the next value entirely in memory.
+
+### `pgdog.next_uuid_auto`
+
+The `pgdog.next_uuid_auto` function generates a unique, shard-aware `UUID` value which can be used as a primary key. It accepts no arguments and uses `pgdog.validator_uuid` as basis for calling `satisfies_hash_partition`.
+
+##### Example
+
+=== "Function"
+    ```postgresql
+    SELECT pgdog.next_uuid_auto();
+    ```
+=== "Output"
+    ```
+                next_uuid_auto
+    --------------------------------------
+     f54c49c1-47f6-4ca1-a108-782286e447c3
+    ```
+
+UUID generation is not a big problem for sharded databases, since clients can generate and provide UUIDs as part of a query. PgDog still supports generating shard-aware UUIDs in the database, so this function can be configured as a default instead of `gen_random_uuid()`, for example:
+
+```postgresql
+CREATE TABLE measurements (
+    id UUID PRIMARY KEY DEFAULT pgdog.next_uuid_auto(),
+    value REAL NOT NULL,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+```
+
+### `pgdog.install_sharded_sequence`
+
+The `pgdog.install_sharded_sequence` function replaces the `DEFAULT` value for a table column with a call to [`pgdog.next_id_seq`](#pgdognext_id_seq). It accepts the following arguments:
+
+| Argument | Data type | Description |
+|-|-|-|
+| `schema_name` | `text` | The name of the schema where the table resides. This is commonly `public` but can be any other schema. |
+| `table_name` | `text` | The name of the table that contains the primary key column. |
+| `column_name` | `text` | The name of the primary key column. |
+| `lock_timeout` | `text` | Maximum amount of time this function call will be allowed to block other queries from accessing this table while it mutates the schema definition. This is set to `1s` by default. |
+
+Under the hood, this function will create two entities:
+
+1. A regular Postgres sequence (using `CREATE SEQUENCE`)
+2. A copy of the table specified in `table_name` in the `pgdog_internal` schema and set that as the `table_name` argument for `pgdog.next_id_seq`
+
+One table and sequence is created per column, so it's possible to install multiple sharded sequences into the same table. Creating separate tables for each sharded sequence prevents lock contention on Postgres catalog entities while generating values.
diff --git a/docs/features/sharding/schema_management/index.md b/docs/features/sharding/schema_management/index.md
index 328a4cc..bb18f9d 100644
--- a/docs/features/sharding/schema_management/index.md
+++ b/docs/features/sharding/schema_management/index.md
@@ -29,4 +29,4 @@ DDL statements to all shards concurrently, ensuring table and index definitions
 
 Primary keys are typically generated automatically by Postgres. We provide pl/PgSQL functions to make this work in sharded databases as well.
 
-[**→ Primary keys**](primary_keys.md)
+[**→ Sharded sequences**](../sequences.md)
diff --git a/docs/features/sharding/schema_management/manager.md b/docs/features/sharding/schema_management/manager.md
index 68f65d6..4b97c20 100644
--- a/docs/features/sharding/schema_management/manager.md
+++ b/docs/features/sharding/schema_management/manager.md
@@ -19,9 +19,11 @@ The `--database` parameter expects the name of the sharded database in [`pgdog.t
 | Entity type | Name | Description |
 |-|-|-|
 | Schema | `pgdog` | The schema which contains functions and tables used for sharding and synchronization. |
+| Schema | `pgdog_internal` | The schema which contains copies of sharded tables and their [sharded sequences](../sequences.md). |
 | Table | `pgdog.config` | Table with configuration options specific to each shard, e.g., shard number, total number of shards, etc. |
-| Function | [`pgdog.next_id_seq`](primary_keys.md#pgdognext_id_seq) | Globally unique ID generation for `BIGINT` columns. |
-| Function | [`pgdog.next_uuid_auto`](primary_keys.md#uuids) | Globally unique UUID generation for `UUID` columns. |
+| Function | [`pgdog.next_id_seq`](functions.md#pgdognext_id_seq) | Globally unique, shard-aware, ID generation for `BIGINT` columns. Uses [sharded sequences](../sequences.md). |
+| Function | [`pgdog.next_uuid_auto`](functions.md#pgdognext_uuid_auto) | Globally unique, shard-aware UUID generation for `UUID` columns. |
+| Function | [`pgdog.install_sharded_sequence`](functions.md#pgdoginstall_sharded_sequence) | PL/pgSQL function used to setup a [sharded sequence](../sequences.md) for a primary key. |
 
 !!! note
     PgDog gets all necessary information about shards from its configuration. Unless the configuration files are in `$PWD`, you should pass them as arguments, for example:
diff --git a/docs/features/sharding/schema_management/primary_keys.md b/docs/features/sharding/schema_management/primary_keys.md
deleted file mode 100644
index e674235..0000000
--- a/docs/features/sharding/schema_management/primary_keys.md
+++ /dev/null
@@ -1,92 +0,0 @@
----
-icon: material/key-variant
----
-# Primary keys
-
-Primary keys are columns with a unique index and a not null constraint. Theoretically, any data type can be used as the primary key, but the common ones are `BIGINT` (specified as `BIGSERIAL` for automatic generation) and `UUID`.
-
-In sharded databases, primary keys generated on each shard have to be _globally_ unique: no two shards can contain a row with the same value. To make this easy and avoid using external ID generation services, PgDog provides a few pl/PgSQL functions that can do this automatically from inside Postgres.
-
-!!! note
-    Make sure to install and enable the [schema manager](manager.md) before using this functionality.
-
-## How it works
-
-Take the following table as an example:
-
-```postgresql
-CREATE TABLE users (
-    id BIGSERIAL PRIMARY KEY,
-    email VARCHAR NOT NULL,
-    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
-);
-```
-
-If you run this command through PgDog, this table will be created on all shards. Underneath, Postgres expands `BIGSERIAL` to the following code:
-
-```postgresql
-CREATE TABLE users (
-    id BIGINT UNIQUE NOT NULL DEFAULT nextval('users_id_seq'::regclass),
-    email VARCHAR NOT NULL,
-    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
-);
-```
-
-The `users_id_seq` is a sequence, automatically created by Postgres, that will be used to generate unique values for inserted rows that don't provide one for the `id` column.
-
-Since each shard has its own sequence, values pulled from it would repeat on all shards, creating duplicate references to different objects in your database. To avoid this, we've written a pl/PgSQL function to replace `nextval` that can generate globally unique values.
-
-### `pgdog.next_id_seq`
-
-The function `pgdog.next_id_seq`, installed by default if you're using our [schema manager](manager.md), accepts a sequence and returns a unique and valid value for each shard it's executed on. For example:
-
-=== "Function"
-    ```postgresql
-    SELECT pgdog.next_id_seq('users_id_seq'::regclass)
-    ```
-
-=== "Output"
-    ```
-     next_id_seq
-    -------------
-              13
-    ```
-
-The function consumes values from the sequence until it finds one that satisfies the [sharding function](../sharding-functions.md) and the shard number of the current database. To make use of it, set it as the default value for your table's primary key, like so:
-
-```postgresql
-ALTER TABLE users
-ALTER COLUMN id SET DEFAULT pgdog.next_id_seq('users_id_seq'::regclass);
-```
-
-#### Sequence cache
-
-When looking for the next valid number, `next_id_seq` will consume several values from the sequence in a row. By default, each call to `nextval` requires a write to the WAL, which could be a bit slower than optimal. To mitigate this, increase the sequence's cache:
-
-```postgresql
-ALTER SEQUENCE users_id_seq CACHE 250;
-```
-
-This will keep 250 values of the sequence in memory instead of on disk. If you're deploying a large number of shards, increase the cache size accordingly.
-
-### UUIDs
-
-Since UUIDs are randomly generated, they don't need a sequence to guarantee uniqueness. If you're using UUIDs as sharding keys and don't want to generate them in your application, you can use one of our pl/PgSQL functions to create valid values on each shard in the cluster:
-
-=== "Function"
-    ```postgresql
-    SELECT pgdog.next_uuid_auto();
-    ```
-=== "Output"
-    ```
-                next_uuid_auto
-    --------------------------------------
-     f54c49c1-47f6-4ca1-a108-782286e447c3
-    ```
-
-Just like with `BIGSERIAL`, you can set this function as the default on a column:
-
-```postgresql
-ALTER TABLE users
-ALTER COLUMN uuid SET DEFAULT pgdog.next_uuid_auto();
-```
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 91d20ea..c2b8a94 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -99,7 +99,7 @@ Manage [table schema(s)](features/sharding/schema_management/index.md) and ensur
 
 | Feature | Status | Notes |
 |-|-|-|
-| [Primary keys](features/sharding/schema_management/primary_keys.md) | :material-calendar-check: | `BIGINT` and `UUID` partially supported for hash-based sharding only. [#386](https://github.com/pgdogdev/pgdog/issues/386). Other data types require cross-shard unique index support. |
+| [Primary keys](features/sharding/sequences.md) | :material-calendar-check: | `BIGINT` and `UUID` supported for hash-based sharding only. [#386](https://github.com/pgdogdev/pgdog/issues/386). Other data types require cross-shard unique index support. |
 | Unique indexes | :material-calendar-check: | Enforce uniqueness constraints across an unsharded column(s). [#439](https://github.com/pgdogdev/pgdog/issues/439). |
 | `CHECK` constraints | :material-close: | They are generally arbitrary SQL checks and need to be executed prior to row updates. |
 | Schema validator | :material-calendar-check: | Check that all shards have identical tables, indexes, etc. |
diff --git a/mkdocs.yml b/mkdocs.yml
index 4808097..8506e95 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -83,7 +83,8 @@ plugins:
       redirect_maps:
         'features/healthchecks.md': 'features/load-balancer/healthchecks.md'
         'features/sharding/migrations.md': 'features/sharding/schema_management/migrations.md'
-        'features/sharding/primary-keys.md': 'features/sharding/schema_management/primary_keys.md'
+        'features/sharding/primary-keys.md': 'features/sharding/sequences.md'
+        'features/sharding/schema_management/primary_keys.md': 'features/sharding/sequences.md'
         'features/sharding/cross-shard/index.md': 'features/sharding/cross-shard-queries/index.md'
 
 extra: