Here's a classic scenario:
You work on a project that stores transactional data in a database. The application is deployed to production, and early on the performance is great: selecting data from the database is snappy, and insert latency goes unnoticed. Over a period of days, weeks, or months the database grows and queries slow down.
There are various approaches that can help make your application and database run faster. A Database Administrator (DBA) will take a look and see that the database is tuned. They may suggest adding certain indexes, moving logging to a separate disk partition, adjusting database engine parameters, and verifying that the database is healthy. You can also add Provisioned IOPS on EBS volumes or obtain faster (not just separate) disk partitions. This buys you more time and may resolve the issues to a degree.
At a certain point you realize the data in the database is the bottleneck.
In many application databases, some of the data is historical information that becomes less important after a certain amount of time. If you can get rid of this data, your queries will run faster, backups will complete more quickly, and you'll use less disk space. You could delete it, but then the data is gone forever. You could run a slew of DELETE statements, causing a ton of logging and consuming database engine resources. So how do we get rid of old data efficiently without losing it forever?
Table partitioning is a good solution to this very problem. You take one massive table and split it into many smaller tables - these smaller tables are called partitions or child tables. Operations such as backups, SELECTs, and DELETEs can be performed against individual partitions or against all of the partitions. Partitions can also be dropped or exported as a single transaction requiring minimal logging.
Let's start with some terminology that you will see in the remainder of this blog.
Master Table
Also referred to as a Master Partition Table, this table is the template from which child tables are created. It is a normal table, but it doesn't contain any data and requires a trigger (more on this later). There is a one-to-many relationship between a master table and its child tables: one master table, many child tables.
Child Tables
These tables inherit their structure (in other words, their Data Definition Language, or DDL for short) from the master table and belong to a single master table. The child tables contain all of the data. These tables are also referred to as Table Partitions.
Partition Function
A partition function is a stored procedure that determines which child table should accept a new record. The master table has a trigger which calls the partition function. There are two typical methodologies for routing records to child tables:
- By Date Values: An example of this is purchase order date. As purchase orders are added to the master table, this function is called by the trigger. If you create partitions by day, each child partition will contain all the purchase orders entered on a particular day. This method is covered by this article.
- By Fixed Values: An example of this is geographic location, such as US states. In this case, you can have 50 child tables, one for each state. As INSERTs are fired against the master table, the partition function sorts each new row into one of the child tables. This methodology isn't covered by this article because it doesn't help us remove older data.
Let's Try Configuring Table Partitions!
The example solution demonstrates the following:
- Automatically creating database table partitions based on date
- Scheduling the export of older table partitions to compressed flat files in the OS
- Dropping old table partitions without impacting performance
- Reloading older partitions so they are available to the master partition table
Take a moment to let that last piece of the solution sink in. Most of the documentation on partitioning I've read simply uses partitioning to keep the database lean and mean; if you needed older data, you'd have to keep an old database backup. I'll show you how to keep the database lean and mean through partitioning while still having the data available when you need it, without database backups.
Throughout this post, the prompt indicates where each command runs: in a shell as the root user, in a shell as a non-root user (e.g. postgres), or within the PostgreSQL database system.
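The prompt conventions look roughly like this (a common convention; the exact prompts on your system may differ):

```
#          <- shell, as root
$          <- shell, as a non-root user (e.g. postgres)
postgres=# <- inside a psql session
```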
What you'll need
The examples below use PostgreSQL 9.2 on Engine Yard. You will also need Git to install plsh.
Summary of Steps
Here's a summary of what we are going to do:
- Create a master table
- Create a trigger function
- Create a table trigger
- Create a partition maintenance function
- Schedule the partition maintenance
- Reload old partitions as needed
Create a Master Table
For this example we'll be creating a table to store basic performance data (cpu, memory, disk) about a group of servers (server_id) every minute (time).
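A minimal sketch of such a master table; the schema name (myschema), table name (stats), and column types here are assumptions:

```sql
CREATE SCHEMA myschema;  -- hypothetical schema name

CREATE TABLE myschema.stats (
    server_id  integer NOT NULL,  -- which server the sample came from
    cpu        real,              -- cpu utilization
    memory     bigint,            -- memory in use
    disk       bigint,            -- disk in use
    "time"     integer NOT NULL   -- sample time as a Unix epoch
);
```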
Notice that in the code above the column name time is in quotes. This is necessary because time is a keyword in PostgreSQL. For more information on Date/Time keywords and functions visit the PostgreSQL Manual.
Create Trigger Function
The trigger function below does the following:
- Creates child tables with dynamically generated CREATE TABLE statements if the child table does not already exist.
- Partitions (child tables) are determined by the values in the "time" column, creating one partition per calendar day.
- Time is stored in epoch format, an integer representation of the number of seconds since 1970-01-01 00:00:00+00. More information on epoch time can be found at http://en.wikipedia.org/wiki/Unix_time
- Each day has 86400 seconds; midnight for a particular day is an epoch timestamp divisible by 86400 without a remainder.
- The name of each child table is the master table's name followed by the partition's date in YYYY-MM-DD format.
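A minimal sketch of such a trigger function in PL/pgSQL, assuming a master table myschema.stats with an integer "time" column (the schema, table, and function names are all assumptions):

```sql
CREATE OR REPLACE FUNCTION myschema.stats_partition_function()
RETURNS TRIGGER AS $$
DECLARE
    _day_start  integer;
    _tablename  text;
BEGIN
    -- Round the epoch value down to midnight (86400 seconds per day)
    _day_start := (NEW."time" / 86400) * 86400;
    _tablename := 'stats_' || to_char(to_timestamp(_day_start), 'YYYY-MM-DD');

    -- Create the child table if it does not already exist
    IF NOT EXISTS (
        SELECT 1 FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relname = _tablename AND n.nspname = 'myschema'
    ) THEN
        EXECUTE format(
            'CREATE TABLE myschema.%I (
                 CHECK ("time" >= %s AND "time" < %s)
             ) INHERITS (myschema.stats)',
            _tablename, _day_start, _day_start + 86400);
    END IF;

    -- Route the new row into its child table
    EXECUTE format('INSERT INTO myschema.%I VALUES ($1.*)', _tablename) USING NEW;
    RETURN NULL;  -- prevent the row from also landing in the master table
END;
$$ LANGUAGE plpgsql;
```

The CHECK constraint on each child table lets the planner skip partitions that can't match a query (constraint exclusion), and returning NULL from the trigger keeps the master table empty.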
Create a Table Trigger
Now that the partition function has been created, an INSERT trigger needs to be added to the master table; it will call the partition function when new records are inserted.
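One possible form, assuming a master table myschema.stats and a trigger function myschema.stats_partition_function (both names are assumptions):

```sql
-- Fire the partition function before each insert on the master table
CREATE TRIGGER stats_insert_trigger
    BEFORE INSERT ON myschema.stats
    FOR EACH ROW
    EXECUTE PROCEDURE myschema.stats_partition_function();
```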
At this point you can start inserting rows into the master table and see them routed into the correct child tables.
Create Partition Maintenance Function
Now let's put the master table on a diet. The function below was built generically to handle partition maintenance, which is why you won't see any syntax specific to our server-stats table.
How it works
- All of the child tables for a particular master table are scanned, looking for any partitions whose names correspond to a date more than 15 days in the past.
- Each "too old" partition is exported/dumped to the local file system by calling the db function myschema.export_partition(text, text). More on this in the next section.
- If and only if the export to the local filesystem was successful, the child table is dropped.
- This function assumes that the folder /db/partition_dump exists on the local db server (more on this in the next section). If you are wondering where the partitions are exported to, this is where you should look!
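A minimal sketch of such a maintenance function, written to take the master table as a parameter so it stays generic (the signature and names are assumptions):

```sql
CREATE OR REPLACE FUNCTION myschema.partition_maintenance(master regclass)
RETURNS void AS $$
DECLARE
    _child   record;
    _cutoff  text := to_char(now() - interval '15 days', 'YYYY-MM-DD');
BEGIN
    -- Scan all child tables of the given master table
    FOR _child IN
        SELECT n.nspname AS schemaname, c.relname AS tablename
        FROM pg_inherits i
        JOIN pg_class c     ON c.oid = i.inhrelid
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE i.inhparent = master
          -- the trailing YYYY-MM-DD suffix compares correctly as text
          AND substring(c.relname from '\d{4}-\d{2}-\d{2}$') < _cutoff
    LOOP
        -- Export first; if the export function raises an error,
        -- the DROP below never runs
        PERFORM myschema.export_partition(_child.schemaname, _child.tablename);
        EXECUTE format('DROP TABLE %I.%I', _child.schemaname, _child.tablename);
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```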
Note that the code above uses the plsh language extension, which is explained below. Also note that on our systems bash is located at /bin/bash; this may vary.
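A sketch of what myschema.export_partition could look like in plsh; the database name (mydb) and the dump file naming are assumptions:

```sql
CREATE OR REPLACE FUNCTION myschema.export_partition(text, text)
RETURNS text AS $$
#!/bin/bash
set -e -o pipefail
# $1 = schema name, $2 = child table name (e.g. stats_2013-01-29)
# Dump the single child table to a compressed flat file
pg_dump -t "$1.\"$2\"" mydb | gzip > /db/partition_dump/$2.sql.gz
echo "exported /db/partition_dump/$2.sql.gz"
$$ LANGUAGE plsh;
```

With `set -e -o pipefail`, a failed pg_dump makes the function raise an error, which is what lets the maintenance job drop a partition only after a successful export.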
That Was Fun, Where are we?
Almost there. So far we've made all the necessary changes within the database to accommodate table partitions:
- Created a new master table
- Created the trigger and trigger function for the master table
- Created partition maintenance functions to export older partitions to the os and drop the old partitions
You could stop here if you'd like and proceed to the section "Now Let's See it in Action", but be sure to continue below to configure automated maintenance.
What we have left to do for automated maintenance is:
- Install the plsh extension
- Set up the OS to store partition dumps
- Create a cron job to automate calling the partition maintenance function
Configure PostgreSQL and OS
Enabling PLSH in PostgreSQL
The PLSH extension is needed for PostgreSQL to execute shell commands. It is used by myschema.export_partition(text, text) to dynamically build a shell string that executes pg_dump. Starting as root, execute the following commands:
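Roughly like this (the source location and build steps are assumptions; you'll need pg_config on your PATH and the PostgreSQL development headers installed):

```shell
cd /usr/local/src
git clone https://github.com/petere/plsh.git
cd plsh
make && make install
```

Then, as the postgres user, enable the language in your database, for example with `CREATE EXTENSION plsh;` (older plsh releases instead ship an install SQL script; check the project's README for your version).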
Create the directory, ensure that the postgres user owns it, and give your deployment user's group permissions as well so that it can read the files. The default deploy user is 'deploy' on Engine Yard Cloud.
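For example (the group name 'deploy' is the Engine Yard default; adjust for your environment):

```shell
mkdir -p /db/partition_dump
chown postgres:deploy /db/partition_dump
chmod 750 /db/partition_dump
```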
Further information on PL/sh can be found in the plsh project's README.
Schedule Partition Maintenance
The commands below will schedule the partition_maintenance job to run at midnight every day:
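For example, as the postgres user (the database name mydb and the function signature here are assumptions; adjust for your own names):

```shell
# Edit the postgres user's crontab
crontab -e
# and add a line like:
# 0 0 * * * psql -d mydb -c "SELECT myschema.partition_maintenance('myschema.stats'::regclass);"
```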
View the cron jobs for the postgres user to ensure the crontab line is correct:
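Still as the postgres user:

```shell
crontab -l
# should show the line you added, e.g.:
# 0 0 * * * psql -d mydb -c "SELECT myschema.partition_maintenance('myschema.stats'::regclass);"
```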
Make sure your /db/partition_dump folder is backed up if you are not using an Engine Yard Cloud instance. If you ever need the data again, you'll need these files to restore the old partitions. This may be as simple as rsyncing (copying) the files to another server just to be sure. We find that sending them to S3 works well for archival purposes.
Now your master tables are scheduled for partition maintenance, and you can rest knowing that you've created something special: a nimble database that will keep itself on a weight-loss program!
Reload Old Partitions
If you have separation anxiety about your old data, or a dull compliance request has landed on your desk, you can reload the old partitions from the file system.
To reload a partition, first navigate to /db/partition_dump on the local db server and identify the file; then, as the postgres user, import the file back into the database.
After the partition file is loaded it will be queryable from the master table. Be aware that the next time the partition maintenance job runs, the newly imported partition will be exported again.
Now Let's See it in Action
Create Child Tables
Let's insert two rows of data to see the creation of new child partitions in action. Open a psql session and execute the following:
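A hypothetical pair of inserts, assuming a master table myschema.stats(server_id, cpu, memory, disk, "time") (the names and values are assumptions). The two epoch values fall on different days, both more than 15 days in the past:

```sql
INSERT INTO myschema.stats (server_id, cpu, memory, disk, "time")
     VALUES (1, 50.0, 1024, 5000, 1359417600);  -- midnight 2013-01-29 UTC
-- creates and fills myschema."stats_2013-01-29"

INSERT INTO myschema.stats (server_id, cpu, memory, disk, "time")
     VALUES (1, 55.5, 2048, 6500, 1359504000);  -- midnight 2013-01-30 UTC
-- creates and fills myschema."stats_2013-01-30"
```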
So what happened? Assuming this is the first time you've run this, two new child tables were created; see the comments inline with the SQL statements for the child tables that were created. The first insert can be seen by selecting against either the parent or the child:
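A hypothetical example, assuming a master table myschema.stats with a child partition for 2013-01-29:

```sql
SELECT * FROM myschema.stats WHERE "time" = 1359417600;  -- via the parent
SELECT * FROM myschema."stats_2013-01-29";               -- via the child directly
```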
Note the use of double quotes around the child partition table name. They aren't there because the table is inherited; they are required because of the hyphens between the year, month, and day.
Perform Partition Maintenance
The two rows we inserted are more than 15 days old, so manually running the partition maintenance job (the same job cron would run) will export these two partitions to the OS and drop the partitions.
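For example, as the postgres user (assuming the maintenance function takes the master table as an argument; adjust for your function's actual signature and database name):

```shell
psql -d mydb -c "SELECT myschema.partition_maintenance('myschema.stats'::regclass);"
```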
When the job is done you can see the two exported files:
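Assuming a master table named stats with two exported daily partitions, the dump directory would contain files along these lines (names are hypothetical):

```shell
ls /db/partition_dump
# stats_2013-01-29.sql.gz
# stats_2013-01-30.sql.gz
```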
Selecting against the master table should yield 0 rows, and the two child tables will no longer exist.
Reload Exported Partitions
If you want to reload the first child partition from the exported file, gunzip it then reload it using psql:
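A hypothetical session, assuming the database name mydb and a dump file for 2013-01-29:

```shell
cd /db/partition_dump
gunzip stats_2013-01-29.sql.gz
psql -d mydb -f stats_2013-01-29.sql
```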
Selecting against the master table will yield 1 row, and the first child table will exist once again.
Our database files reside on a partition mounted at /db, which is separate from our root ('/') partition.
For more information on PostgreSQL extensions, visit the extensions documentation. Note that the database engine doesn't correctly return the number of rows affected (it always reports 0 rows) when performing INSERTs and UPDATEs against the master table, because the trigger returns NULL. If you use Ruby, be sure to adjust your code for the fact that the pg gem won't have the correct value when reporting cmd_tuples. If you are using an ORM, hopefully it adjusts for this accordingly.
Make sure you are backing up the exported partition files in /db/partition_dump; these files lie outside of the database backup path.
The database user that is performing the INSERT against the master table needs to also have DDL permissions to create the child tables.
There is a small performance impact when performing an INSERT against a master table since the trigger function will be executed.
Ensure that you are running the absolute latest point release for your version of PostgreSQL; this ensures you are running the most stable and secure version available.
This solution works for my situation, your requirements may vary so feel free to modify, extend, mutilate, laugh hysterically or copy this for your own use.
One of the original assumptions was the creation of partitions for each 24 hour period, but this can be any interval of time (1 hour, 1 day, 1 week, every 435 seconds) with a few modifications. Part II of this blog post will discuss the necessary changes needed to the partition_maintenance function and table trigger. I'll also explore how to create a second "archive" database that you can use to automatically load old partition data, keeping the primary database lean and mean for everyday use.