How We Migrated a Large Complex Database from MySQL to PostgreSQL at Gotedo
Recently at Gotedo Church Management Software, we were faced with a decision: continue using MySQL and implement hierarchical data completely from scratch, or migrate to PostgreSQL and take advantage of the ltree data type for easy handling of hierarchical data. Other reasons included taking advantage of PostgreSQL's advanced full-text search capabilities and positioning the software for future multi-tenancy requirements with PostgreSQL's multi-schema database design. It also made sense to perform such a huge database-engine migration now, while the software has few active users. The MySQL database had 114 tables with delicate foreign key relationships.
We use AdonisJS for the Gotedo backend, so the examples are written with the AdonisJS Lucid ORM and JavaScript. You can easily adapt them to suit your language or framework. Also, at the time of writing, Gotedo is still using AdonisJS 4.x, so the syntax of the examples will differ a bit from that of AdonisJS 5.x.
Migration Steps
Table Migrations
The first step was refactoring the existing table migration scripts and making them compatible with PostgreSQL. Since AdonisJS uses Knex.js underneath the Lucid ORM, there was minimal refactoring of the migration scripts as Knex.js ensures compatibility with MySQL and PostgreSQL out of the box. However, there were some raw queries in the table migration scripts which needed to be adapted for PostgreSQL.
Generally, the following changes were made to the table migration scripts:
1. No AFTER or BEFORE While Creating Columns
PostgreSQL does not support specifying the column before or after which a new column should be placed; new columns are always appended to the end of the table.
2. Creation of Full-text Indexes
A PostgreSQL GIN index was used to replace the MySQL FULLTEXT index. Learn more here.
/** @type {import('@adonisjs/lucid/src/Schema')} */
const Schema = use('Schema');
class UserProfileSchema extends Schema {
up() {
- this.raw('ALTER TABLE user_profiles ADD FULLTEXT (full_name)');
+ this.raw(`CREATE INDEX user_profiles_full_name_fulltext_idx ON user_profiles USING GIN (to_tsvector('english', full_name));`);
}
}
module.exports = UserProfileSchema;
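For the GIN index above to actually be used, search queries must apply the same to_tsvector('english', ...) expression that the index was built on. Below is a minimal sketch of what such a search could look like with a raw Lucid query; the searchUserProfiles helper is illustrative and not part of the Gotedo codebase.
// Illustrative only: full-text search against the GIN index created above.
// The to_tsvector expression (and the 'english' configuration) must match the
// index definition for PostgreSQL to use the index.
const Database = use('Database');

async function searchUserProfiles(term) {
  const result = await Database.connection('pg').raw(
    `SELECT id, full_name
       FROM user_profiles
      WHERE to_tsvector('english', full_name) @@ plainto_tsquery('english', :term)`,
    { term }
  );
  return result.rows;
}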
3. Creation of BTREE Indexes
/** @type {import('@adonisjs/lucid/src/Schema')} */
const Schema = use('Schema');
class ServiceSchema extends Schema {
up() {
- this.raw('ALTER TABLE services ADD INDEX (service_year)');
- this.raw('ALTER TABLE services ADD INDEX (service_month)');
- this.raw('ALTER TABLE services ADD INDEX (service_day)');
+ this.raw('CREATE INDEX IF NOT EXISTS services_service_year_idx ON services (service_year)');
+ this.raw('CREATE INDEX IF NOT EXISTS services_service_month_idx ON services (service_month)');
+ this.raw('CREATE INDEX IF NOT EXISTS services_service_day_idx ON services (service_day)');
}
}
module.exports = ServiceSchema;
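If you want to confirm that the indexes were created as expected, you can query the pg_indexes view. The snippet below is only an optional check of my own, not part of the migration itself.
// Optional check: list the indexes that now exist on the services table.
const Database = use('Database');

async function listServiceIndexes() {
  const result = await Database.connection('pg').raw(
    `SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'services'`
  );
  console.log(result.rows);
}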
4. Generated Columns
PostgreSQL is very strict about the expressions allowed for generated columns. Most expressions which succeed in MySQL will fail in PostgreSQL, either with the error generation expression is not immutable or because the function simply does not exist in PostgreSQL.
In the case of generating the year, month, and day parts from a date column, the SUBSTRING function could not be used due to the immutable-expression error. So, I opted to use the DATE_PART function instead - which I must admit was very elegant.
/** @type {import('@adonisjs/lucid/src/Schema')} */
const Schema = use('Schema');
class ServiceSchema extends Schema {
up() {
- this.raw(
- 'ALTER TABLE services ADD COLUMN service_year INT(6) GENERATED ALWAYS AS (SUBSTRING(service_date,1,4)*1) STORED AFTER service_date'
- );
+ this.raw("ALTER TABLE services ADD COLUMN service_year INTEGER GENERATED ALWAYS AS (DATE_PART('year', service_date)) STORED");
- this.raw(
- 'ALTER TABLE services ADD COLUMN service_month INT(4) GENERATED ALWAYS AS (SUBSTRING(service_date,6,2)*1) STORED AFTER service_year'
- );
+ this.raw("ALTER TABLE services ADD COLUMN service_month INTEGER GENERATED ALWAYS AS (DATE_PART('month', service_date)) STORED");
- this.raw(
- 'ALTER TABLE services ADD COLUMN service_day INT(4) GENERATED ALWAYS AS (SUBSTRING(service_date,9,2)*1) STORED AFTER service_month'
- );
+ this.raw("ALTER TABLE services ADD COLUMN service_day INTEGER GENERATED ALWAYS AS (DATE_PART('day', service_date)) STORED");
}
}
module.exports = ServiceSchema;
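Because the columns are STORED and generated by PostgreSQL itself, the application never supplies them; they can simply be read back like ordinary columns. A quick, purely illustrative sanity check:
// Illustrative only: the date parts are computed by PostgreSQL on write,
// so reading a few rows back should show them populated automatically.
const Database = use('Database');

async function verifyGeneratedDateParts() {
  const result = await Database.connection('pg').raw(
    `SELECT service_date, service_year, service_month, service_day
       FROM services LIMIT 5`
  );
  console.log(result.rows);
}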
In the case of generating a stored full_name column from the first_name, last_name, and middle_name columns, the IF, CONCAT, and CONCAT_WS functions could not be used. I opted for a CASE expression plus the PostgreSQL string concatenation operator (||). See the example below.
'use strict';
/** @type {import('@adonisjs/lucid/src/Schema')} */
const Schema = use('Schema');
class UserProfileSchema extends Schema {
up() {
// Using CONCAT_WS causes error: generation expression is not immutable
this.raw(
- "ALTER TABLE user_profiles ADD COLUMN full_name VARCHAR(100) GENERATED ALWAYS AS (IF(middle_name IS NULL, CONCAT(first_name,' ', last_name), CONCAT(first_name,' ',middle_name,' ', last_name))) STORED AFTER last_name"
+ "ALTER TABLE user_profiles ADD COLUMN full_name VARCHAR(100) GENERATED ALWAYS AS (CASE WHEN middle_name IS NULL THEN first_name || ' ' || last_name ELSE first_name || ' ' || middle_name || ' ' || last_name END) STORED"
);
}
}
module.exports = UserProfileSchema;
MySQL to PostgreSQL Data Migration with pgloader
Once the table migrations were successful, it was time to transfer the existing data from MySQL to PostgreSQL.
The internet is filled with recommendations for pgloader, but this article IS NOT a recommendation for pgloader. As a matter of fact, if pgloader had worked for me, I might not have had any need to document this process. However, it could work for others, so I will share what I learnt while struggling to make pgloader work and the challenges I faced.
You can find installation instructions here. For my MacBook M1 laptop, I had to build pgloader from source for it to work. The build steps are:
- Download the latest release from github.com/dimitri/pgloader/tags.
- Extract the compressed file.
- Change into the directory of the extracted folder, e.g. cd pgloader-3.6.3.
- Run the command make and wait for the build process to complete.
- Now, you can run pgloader with ./build/bin/pgloader.
- Test the programme with ./build/bin/pgloader --help.
To get pgloader to work, I spent hours, which ran into days, tweaking its configuration file. I eventually arrived at the configuration below. With a configuration file, you can run pgloader with:
./build/bin/pgloader ~/Documents/Mysql_Psql_Migration.txt
Where ~/Documents/Mysql_Psql_Migration.txt is the path to the configuration file.
In the configuration file below, uppercase words wrapped in percent signs (e.g. %MYSQL_USERNAME%) indicate variables to be changed by you. Explanations for all the options used in the configuration can be found on this doc page: Migrating a MySQL Database to PostgreSQL.
LOAD DATABASE
FROM mysql://%MYSQL_USERNAME%:%MYSQL_PASSWORD%@%MYSQL_HOST%/%MYSQL_DB_NAME%
INTO postgresql://%PG_USERNAME%:%PG_PASSWORD%@%PG_HOST%/%PG_DB_NAME%
WITH disable triggers, create tables, create indexes, foreign keys, uniquify index names, reset sequences, drop indexes, drop schema
SET PostgreSQL PARAMETERS
session_replication_role to 'replica'
SET MySQL PARAMETERS
net_read_timeout = '500', net_write_timeout = '500'
ALTER schema '%DB_NAME%' rename to 'public'
BEFORE LOAD DO
$$ CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; $$
AFTER LOAD DO
$$ SET session_replication_role = 'origin'; $$;
pgloader stalled each time I attempted to migrate data with it. It never successfully completed the migration. At the point where it stalled, a visual inspection of the tables showed that foreign keys had not been created even though all the data had been inserted. What is a relational database without referential integrity!
So, I decided to face my fears and write a custom data migration script with an AdonisJS Ace command. It turned out to be the best decision. As expected, that also had its own challenges, but since I was the one creating the data migration script, I had control over every aspect and could fix any errors which surfaced. I will share these experiences below.
Custom Data Migration with AdonisJS
The data migration script is written for AdonisJS 4.x.
Step 1: Create the PostgreSQL Database
This database creation step is very important because you need to define the encoding and collation, which affect how PostgreSQL handles sorting, capitalisation, and character sets.
If you need to drop the PostgreSQL database used for the data migration, use the following command. Be careful not to drop another important database.
psql -U DB_USER -c "DROP DATABASE IF EXISTS DB_NAME WITH (FORCE);"
Where DB_USER and DB_NAME are variables to be changed. The FORCE option will close any active connections to the database and allow it to be dropped. Read more here.
The command for creating a database is:
psql -U DB_USER -c "CREATE DATABASE DB_NAME OWNER DB_USER ENCODING 'UTF-8' LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8' template template0;"
The portion of the psql command above wrapped in double quotes is the actual CREATE DATABASE statement. Learn more about the CREATE DATABASE statement here.
There are two options passed to the psql command above:
- -U: specifies the authenticating user for the command.
- -c: specifies the actual SQL statement to be run, e.g. CREATE DATABASE DB_NAME OWNER DB_USER ENCODING 'UTF-8' LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8' template template0;
Within the CREATE DATABASE statement:
- ENCODING specifies the character set the database supports. Using UTF-8 ensures that the database supports all possible character sets. Learn more here.
- LC_COLLATE specifies the collation, which controls how query results are sorted.
- LC_CTYPE controls how capitalisation is handled within your database.
- template specifies which database template should be used as the base/reference point for creating the new database. If template is not provided, psql will use the template of the maintenance database - usually template1 - which might not have the correct collation and encoding settings we want.
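Once the pg connection from Step 2 below is configured, you can optionally confirm that the database was created with the expected encoding and collation by querying the pg_database catalog. This check is my own addition, not part of the original migration steps.
// Optional check: confirm the encoding, collation, and ctype of the new database.
const Database = use('Database');

async function checkDatabaseEncoding(dbName) {
  const result = await Database.connection('pg').raw(
    `SELECT datname, pg_encoding_to_char(encoding) AS encoding, datcollate, datctype
       FROM pg_database WHERE datname = :dbName`,
    { dbName }
  );
  console.log(result.rows);
}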
Step 2: Prepare the PostgreSQL environmental variables and configuration
For AdonisJS, you will need to add some environment variables for the PostgreSQL database to your .env file. Make sure you set DB_CONNECTION to pg.
DB_CONNECTION=pg
PG_DB_HOST=127.0.0.1
PG_DB_PORT=5432
PG_DB_USER=postgres
PG_DB_PASSWORD=postgres
PG_DB_DATABASE=DB_NAME
Update your config/database.js file to include a pg connection that reads the PG_ variables:
'use strict';
/** @type {import('@adonisjs/framework/src/Env')} */
const Env = use('Env');
/** @type {import('@adonisjs/ignitor/src/Helpers')} */
const Helpers = use('Helpers');
module.exports = {
/*
|--------------------------------------------------------------------------
| Default Connection
|--------------------------------------------------------------------------
|
| Connection defines the default connection settings to be used while
| interacting with SQL databases.
|
*/
connection: Env.get('DB_CONNECTION', 'sqlite'),
/*
|--------------------------------------------------------------------------
| MySQL
|--------------------------------------------------------------------------
|
| Here we define connection settings for MySQL database.
|
| npm i --save mysql
|
*/
mysql: {
client: 'mysql',
connection: {
host: Env.get('DB_HOST', 'localhost'),
port: Env.get('DB_PORT', ''),
user: Env.get('DB_USER', 'root'),
password: Env.get('DB_PASSWORD', ''),
database: Env.get('DB_DATABASE', 'adonis'),
},
debug: Env.get('DB_DEBUG', 'false') === 'true',
},
/*
|--------------------------------------------------------------------------
| PostgreSQL
|--------------------------------------------------------------------------
|
| Here we define connection settings for PostgreSQL database.
|
| npm i --save pg
|
*/
pg: {
client: 'pg',
connection: {
host: Env.get('PG_DB_HOST', 'localhost'),
port: Env.get('PG_DB_PORT', ''),
user: Env.get('PG_DB_USER', 'root'),
password: Env.get('PG_DB_PASSWORD', ''),
database: Env.get('PG_DB_DATABASE', 'adonis'),
},
debug: Env.get('DB_DEBUG', 'false') === 'true',
},
};
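Before migrating anything, it can be worth confirming that the new pg connection actually works. Below is a minimal sketch; where you run it from (a throwaway Ace command, route, or REPL session) is up to you.
// Quick connectivity check against the new PostgreSQL connection.
const Database = use('Database');

async function checkPgConnection() {
  const result = await Database.connection('pg').raw('SELECT version()');
  console.log(result.rows[0].version); // e.g. "PostgreSQL 14.2 ..."
}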
Step 3: Migrate the tables
Now, you are ready to create the tables in the new PostgreSQL database. Change into the directory of your AdonisJS application and run:
./node_modules/.bin/adonis migration:refresh
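If you want to confirm that all the table migrations completed, you can check the migration status with Lucid's standard command:
./node_modules/.bin/adonis migration:status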
Step 4: Create the Migration Command
Here, we implement the custom data migration script as an AdonisJS Ace command.
- Create the command:
./node_modules/.bin/adonis make:command PostgresqlMigrator
- Register the command in start/app.js:
/*
|--------------------------------------------------------------------------
| Commands
|--------------------------------------------------------------------------
|
| Here you store ace commands for your package
|
*/
+ const commands = ['App/Commands/PostgresqlMigrator'];
Step 5: The MySQL to PostgreSQL Migration Script
Open app/Commands/PostgresqlMigrator.js and overwrite the file with the following:
'use strict';
const Config = use('Config');
const Database = use('Database');
const lodash = require('lodash');
const { Command } = require('@adonisjs/ace');
class PostgresqlMigrator extends Command {
static get signature() {
return 'postgresql:migrator';
}
static get description() {
return 'Migrate MySQL data to PostgreSQL';
}
async handle(args, options) {
const pg = Config.get('database.pg');
// Disable foreign key checks
await Database.connection('pg').raw(`SET session_replication_role = 'replica'`);
// Create a transaction
const trx = await Database.connection('pg').beginTransaction();
// Get all tables on the database
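// Note: this assumes the MySQL database shares the same name as the PostgreSQL database (pg.connection.database)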
let allTables = await Database.connection('mysql').raw(`SHOW TABLES FROM ${pg.connection.database}`);
allTables = JSON.parse(JSON.stringify(allTables))[0];
allTables = allTables.map(t => Object.values(t)[0]).filter(t => t !== 'adonis_schema');
const generatedColumns = [
{ tableName: 'financial_budget_items', columns: ['projected_total_cost'] },
{ tableName: 'financial_request_items', columns: ['total_cost'] },
{ tableName: 'services', columns: ['service_day', 'service_month', 'service_year'] },
{ tableName: 'service_attendance', columns: ['service_year', 'service_month', 'service_day'] },
{ tableName: 'personal_profiles', columns: ['full_name', 'date_of_birth_year', 'date_of_birth_month', 'date_of_birth_day'] },
{ tableName: 'user_profiles', columns: ['full_name', 'date_of_birth_month', 'date_of_birth_day', 'date_of_birth_year'] },
];
try {
// Transfer data into all tables
for (let i = 0; i < allTables.length; i++) {
const tableName = allTables[i];
let existingData = await Database.connection('mysql').from(tableName).select('*');
existingData = JSON.parse(JSON.stringify(existingData));
const hasGeneratedColumns = generatedColumns.find(e => e.tableName === tableName);
if (hasGeneratedColumns) {
const newExistingData = [];
for (let j = 0; j < existingData.length; j++) {
const data = existingData[j];
newExistingData.push(lodash.omit(data, hasGeneratedColumns.columns));
}
existingData = newExistingData;
}
//console.log(existingData);
// Insert records
await trx.batchInsert(tableName, existingData, 100);
// Set auto-increments
// First exclude all uuid id columns
let dataType = await trx.raw(
`SELECT table_name, column_name, data_type
from information_schema.columns
where table_catalog = :db
and table_schema = 'public'
and table_name = :table
and column_name = 'id' limit 1;`,
{ db: pg.connection.database, table: tableName }
);
dataType = JSON.parse(JSON.stringify(dataType));
dataType = dataType ? dataType.rows[0] : null;
console.log(dataType);
if (dataType && dataType.data_type !== 'uuid') {
console.log('this will run');
// Set the auto-increment value
await trx.raw(
`SELECT SETVAL(
(SELECT PG_GET_SERIAL_SEQUENCE(:table, 'id')),
(SELECT MAX("id") FROM :table:));
`,
{ table: tableName }
);
}
}
await trx.commit();
} catch (error) {
console.log(error);
await trx.rollback();
}
await Database.connection('pg').raw(`SET session_replication_role = 'origin'`);
// Without the following line, the command will not exit!
Database.close();
}
}
module.exports = PostgresqlMigrator;
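With the command saved and registered, the data migration can then be run using the signature defined above:
./node_modules/.bin/adonis postgresql:migrator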
Some explanations of what is going on within the migration script:
- The handle method is the entry point into the command.
- const pg = Config.get('database.pg');: here, we start by reading the pg database settings in config/database.js.
- await Database.connection('pg').raw(`SET session_replication_role = 'replica'`);: here, we disable foreign key checks during the migration process. This is important because of how intertwined the foreign key relationships are. For example, the organisations table references the users table at the column created_by, while the users table references the organisations table at the column organisation_creator. No matter which table you migrate first, there will always be a referential integrity error. Without disabling foreign key checks, you will get the following error:
{ length: 296, severity: 'ERROR', code: '23503', detail: 'Key (org_size)=(3) is not present in table "organisation_sizes".', hint: undefined, position: undefined, internalPosition: undefined, internalQuery: undefined, where: undefined, schema: 'public', table: 'organisations', column: undefined, dataType: undefined, constraint: 'organisations_org_size_foreign', file: 'ri_triggers.c', line: '2539', routine: 'ri_ReportViolation' }
- const trx = await Database.connection('pg').beginTransaction();: here, we connect to the PostgreSQL database via the pg connection and begin a new database transaction. The transaction will be used to roll back the migrated data if any error occurs. With this, you do not need to drop and re-create the database after each failure. Since we are merely copying over the data with the ids preserved, referential integrity will be maintained after the migration.
- Below, we read all the tables available on the MySQL database via the mysql connection into the allTables variable, then map over the array to get only the table names while excluding the adonis_schema table, which is already populated with the table migration status information.
let allTables = await Database.connection('mysql').raw(`SHOW TABLES FROM ${pg.connection.database}`);
allTables = JSON.parse(JSON.stringify(allTables))[0];
allTables = allTables.map(t => Object.values(t)[0]).filter(t => t !== 'adonis_schema');
- Below, we define an array which contains all tables with generated columns and the names of those generated columns. This is used to skip the migration of such columns.
const generatedColumns = [
  { tableName: 'financial_budget_items', columns: ['projected_total_cost'] },
  { tableName: 'financial_request_items', columns: ['total_cost'] },
  { tableName: 'services', columns: ['service_day', 'service_month', 'service_year'] },
  { tableName: 'service_attendance', columns: ['service_year', 'service_month', 'service_day'] },
  { tableName: 'personal_profiles', columns: ['full_name', 'date_of_birth_year', 'date_of_birth_month', 'date_of_birth_day'] },
  { tableName: 'user_profiles', columns: ['full_name', 'date_of_birth_month', 'date_of_birth_day', 'date_of_birth_year'] },
];
- With the try...catch block, we loop through the allTables array of table names.
- At the statement below, we read/select all columns of the current table from the MySQL connection/database.
let existingData = await Database.connection('mysql').from(tableName).select('*');
- We then check whether the table contains generated columns, and omit those columns if it does.
if (hasGeneratedColumns) {
  const newExistingData = [];
  for (let j = 0; j < existingData.length; j++) {
    const data = existingData[j];
    newExistingData.push(lodash.omit(data, hasGeneratedColumns.columns));
  }
  existingData = newExistingData;
}
Without the above check for generated columns, you will get the following error:
{ length: 172, severity: 'ERROR', code: '428C9', detail: 'Column "full_name" is a generated column.', hint: undefined, position: undefined, internalPosition: undefined, internalQuery: undefined, where: undefined, schema: undefined, table: undefined, column: undefined, dataType: undefined, constraint: undefined, file: 'rewriteHandler.c', line: '908', routine: 'rewriteTargetListIU' }
- We insert the MySQL data into the PostgreSQL database via the transaction reference. The Knex.js batchInsert function is used to control the number of rows (100 in this case) inserted per batch.
await trx.batchInsert(tableName, existingData, 100);
- Since we are inserting the data with the ids from MySQL, we need to set the PostgreSQL sequence (or auto-increment) value to the maximum id value for each migrated table. However, most id columns on the database use the uuid data type, and trying to set the sequence value on a uuid column will result in an error. So, we have to skip the uuid columns as well. The information_schema.columns table holds information about the columns of all tables on the database.
- Below, for each table, we read the data type of the id column from the information_schema.columns table.
let dataType = await trx.raw(
  `SELECT table_name, column_name, data_type
      from information_schema.columns
      where table_catalog = :db
      and table_schema = 'public'
      and table_name = :table
      and column_name = 'id' limit 1;`,
  { db: pg.connection.database, table: tableName }
);
dataType = JSON.parse(JSON.stringify(dataType));
dataType = dataType ? dataType.rows[0] : null;
- If the id column is not of type uuid, we go ahead and set the value of the sequence/auto-increment for the column. Read more about the PG_GET_SERIAL_SEQUENCE function here.
if (dataType && dataType.data_type !== 'uuid') {
  console.log('this will run');
  // Set the auto-increment value
  await trx.raw(
    `SELECT SETVAL(
        (SELECT PG_GET_SERIAL_SEQUENCE(:table, 'id')),
        (SELECT MAX("id") FROM :table:));
    `,
    { table: tableName }
  );
}
Without the above check, you might get an error like function max(uuid) does not exist:
{ length: 203, severity: 'ERROR', code: '42883', detail: undefined, hint: 'No function matches the given name and argument types. You might need to add explicit type casts.', position: '109', internalPosition: undefined, internalQuery: undefined, where: undefined, schema: undefined, table: undefined, column: undefined, dataType: undefined, constraint: undefined, file: 'parse_func.c', line: '636', routine: 'ParseFuncOrColumn' }
- If there are no errors, the transaction is committed at await trx.commit();.
- Foreign key checks are restored via await Database.connection('pg').raw(`SET session_replication_role = 'origin'`);.
- Finally, we close all opened database connections with Database.close();. Without this, the command will not exit.
Another possible error which could be encountered is:
{
length: 136,
severity: 'ERROR',
code: '22P02',
detail: undefined,
hint: undefined,
position: undefined,
internalPosition: undefined,
internalQuery: undefined,
where: "unnamed portal parameter $152 = '...'",
schema: undefined,
table: undefined,
column: undefined,
dataType: undefined,
constraint: undefined,
file: 'uuid.c',
line: '137',
routine: 'string_to_uuid'
}
In this case, check to ensure that you are not trying to insert a plain string into a uuid column. If that is the case, change the column definition to the string type and re-run the table migrations.
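For example, if a column was defined with table.uuid() in its table migration but the MySQL data actually contains free-form strings, the column definition could be relaxed before re-running migration:refresh. The column name external_ref below is purely illustrative.
- table.uuid('external_ref');
+ table.string('external_ref', 36);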
I hope this helps someone out there. I would appreciate your feedback if you found this article helpful.