PostgreSQL is one of the most advanced open-source databases, offering a plethora of features, one of which is data replication. Data replication is crucial for ensuring data availability and reliability. In PostgreSQL, there are two main types of replication mechanisms:
1. Physical Replication Mechanism
In physical replication, data from a cluster is copied from the primary server to a standby system. This method involves transferring the write-ahead log (WAL) to the standby server.
Types of Standby Systems:
- Hot Standby: Keeps the standby database updated with the primary and is used to execute read data queries.
- Warm Standby: Keeps the standby database updated, but does not allow client connections.
- Cold Standby: Keeps the standby database updated and only starts when the primary server has failed.
2. Logical Replication Mechanism
Logical replication replicates data objects and their changes based on their replication identity or primary key. Unlike physical replication, logical replication does not physically transfer write-ahead logs to the standby server. Instead, it allows replication at the level of tables, columns, or rows, offering fine-grained control over both data replication and security.
How Logical Replication Works:
Logical replication uses a publish-and-subscribe model. It can consolidate multiple databases into a single database, allowing multiple publishers to be subscribed to by a single subscriber or multiple subscribers to create several standby databases.
- Snapshots: Used to transfer data from the publisher to the subscriber, enabling real-time replication.
- Publication: Created in the primary database (publisher), specifying the data to be replicated.
- Subscription: Created in the standby database (subscriber), defining the connection to the publisher and the set of publications it subscribes to.
Publications in Logical Replication
By default, all operation types are replicated, but publications can limit the changes to be replicated. Any combination of INSERT
, UPDATE
, DELETE
, and TRUNCATE
operations can be published.
- Replica Identity: A key for a table used during updates and deletes in the subscriber’s table. By default, it is the primary key. If the table lacks a unique or primary key, the entire row serves as the replica identity.
- Commands:
- Create Publication:
CREATE PUBLICATION
command is used to create a publication. - Drop Publication:
DROP PUBLICATION
command is used to remove a publication. - Alter Publication:
ALTER PUBLICATION
command is used to modify an existing publication.
- Create Publication:
Subscriptions in Logical Replication
Subscriptions define the connection to another database and a set of publications to subscribe to.
- Subscriber Behavior: The subscriber database functions like any other PostgreSQL instance and can act as a publisher for other databases by defining its publications.
- Commands:
- Create Subscription:
CREATE SUBSCRIPTION
command adds a subscription. - Alter Subscription:
ALTER SUBSCRIPTION
command stops/resumes a subscription. - Drop Subscription:
DROP SUBSCRIPTION
command removes a subscription.
- Create Subscription:
- Schema Definitions: Schema definitions executed in the publisher are not replicated, and the published tables must exist on the subscriber. Tables and columns are matched by name, not by order.
- Conflicts: If incoming data violates constraints, replication stops, requiring manual conflict resolution.
When is Logical Replication Useful?
Logical replication is beneficial in specific scenarios. Here are some use cases:
- Consolidating multiple databases into a single database for analytical purposes.
- Replicating data between different major versions of PostgreSQL.
- Sending incremental changes in a single database or a subset to other databases.
- Providing access to replicated data to different user groups.
- Sharing a subset of the database between multiple databases.
Conclusion
Logical replication in PostgreSQL is straightforward to implement and enables the creation of multiple standby database backups with real-time data. It enhances application performance by facilitating continuous reading and writing to the database.
Written by: Nagesh Kshirsagar
Edited by: Thailambal PN
Table of Contents