Skip to content

Table Partitioning

Overview

Table partitioning is a technique that divides a single large table into smaller, more manageable parts. In a partitioned table, data is divided into different partitions based on the values of specified columns (partition keys). Each partition is independent and can be managed individually, including backup and recovery, index maintenance, queries, etc.

Here are some advantages of using table partitioning:

  • Improved query performance: Queries can only access partitions relevant to the query conditions, reducing the amount of data scanned. When query conditions include partition keys, query performance can be greatly improved.
  • Better data management: Table partitioning makes data more manageable. For example, partitions can split data by time, geographic location, or other relevant columns, making data backup and recovery easier and more efficient.
  • Better data availability: When a partition fails, only that partition is affected, while data in other partitions remains available. This improves system availability and fault tolerance.
  • Better maintenance: Using table partitioning allows database administrators to more easily maintain data. For example, specific partitions can be backed up or indexes rebuilt without affecting the entire table.

In summary, table partitioning is a very useful technique that makes data more manageable, maintainable, and can improve query performance and data availability.

Informat supports enabling table partitioning for data tables. Informat will automatically split the source data table into partitioned tables according to the table partitioning strategy. When writing data, it will be written to different partitioned tables according to the partitioning strategy.

Table Partitioning Strategies

  • Range Partitioning

    Data is divided into multiple partitions according to continuous ranges of the partition key, with each partition containing a range of partition key values. Range partitioning is typically used for data partitioned by time or numeric range. For example, a table partitioned by time can store each month's data in a separate partition.

  • List Partitioning

    Data is divided into multiple partitions according to discrete value lists of the partition key, with each partition containing a value list of the partition key. List partitioning is typically used for data with well-defined value sets. For example, a table partitioned by region can divide data into different partitions, each representing a region.

  • Hash Partitioning

    Data is divided into multiple partitions according to the hash value of the partition key, with each partition having similar row counts and data sizes. Hash partitioning is typically used for random access workloads because it can evenly distribute data across multiple partitions.

When using multi-column partitioning, it is necessary to ensure that data in child tables can only conform to one possible partitioning scenario. Therefore, for multi-column partitioning, the conditions for each child table need to cover all columns and must not have overlapping conditions.

Notes

Using table partitioning can bring many benefits to database applications, but the following points should be noted during use:

  • Selection of partition keys

    Choosing the right partition key is important because it directly affects data distribution and query performance. The partition key should be one of the commonly used query conditions and should be evenly distributed across different partitions.

  • Selection of partition quantity

    The selection of partition quantity depends on factors such as data volume, hardware configuration, and application requirements. In general, the partition quantity should be dynamically adjusted based on data growth rate and query requirements.

  • Selection of indexes

    When using table partitioning, indexes also need to be partitioned. In general, indexes need to be created separately for each partition in the partitioned table to improve query performance.

  • Backup and recovery

    Backing up and recovering partitioned tables requires additional steps. Each partition's data needs to be backed up, and data needs to be recovered partition by partition during recovery.

  • Data migration

    In the scenario of table partitioning, special attention needs to be paid to data migration. When migrating data, the partitioning situation needs to be considered to avoid placing data in the wrong partition or causing data duplication or loss.

In summary, the above points need to be noted when using table partitioning to avoid problems and maximize the advantages of table partitioning.

Converting Between Regular Tables and Partitioned Tables

After enabling table partitioning, Informat will automatically create a partition table and generate partitioned tables according to the set partitioning strategy. After publishing the application, Informat will automatically insert data from the regular table into the partitioned tables, and then delete the regular table. The process of disabling partitioning is the opposite of creation. Generally, partitioning needs to be enabled only when the data volume of the data table is large, and the initialization process of automatically inserting into the partitioned table will be relatively slow.

Table Partitioning Examples

 +------------+
   |  mytable   |
   +------------+
   | id         |
   | created_at |
   | value      |
   +------------+
          |
          | PARTITION BY RANGE (created_at)
          v
+-----------------+    +-----------------+    +-----------------+
| mytable_2019_p1 |    | mytable_2020_p1 |    | mytable_2021_p1 |
+-----------------+    +-----------------+    +-----------------+
| id              |    | id              |    | id              |
| created_at      |    | created_at      |    | created_at      |
| value           |    | value           |    | value           |
+-----------------+    +-----------------+    +-----------------+

In the above example, the mytable table is partitioned by the created_at field and divided into 3 partitions: mytable_2019_p1, mytable_2020_p1, and mytable_2021_p1. Each partition is divided according to time range. When data is inserted into the partitioned table, it will be automatically routed to the corresponding partition based on the value of the created_at field. This can improve query performance and make data management more flexible and efficient.

Partitioning by Range

Date type values are in the format 'yyyy-MM-dd', string type values are in the format 'value', and numeric type values are in the format 1.

Date partitioning example:

Partitioning by List

String type values are in the format 'value', numeric type values are in the format 1, and multiple values are separated by commas.

Partitioning by Hash

Hash partitioning calculates the partition where data is located based on the hash value of the partition key.

Example of creating 10 partitions with a modulus of 10:

text
Hash remainder 0
Hash remainder 1
Hash remainder 2
Hash remainder 3
Hash remainder 4
Hash remainder 5
Hash remainder 6
Hash remainder 7
Hash remainder 8
Hash remainder 9