To import data from MySQL into a MS SQL database, we need to first analyze the output of mysqldump, the backup utility of MySQL. I have covered the most obvious differences to T-SQL in my previous post.
Fortunately, the output of mysqldump is line-oriented, i.e. every CREATE TABLE statement, table column or index, and INSERT statement is in a separate line.
In a brute-force approach, we can therefore parse each line using regular expressions, and do not require full parsing involving tokenizer and grammar.
I found a couple of interesting points that need to be considered when generating T-SQL statements:
- As T-SQL did not support “DROP IF EXISTS” before SQL Server 2016, we need to provide the classical variant “IF OBJECT_ID() IS NOT NULL”
- NVARCHAR(MAX) columns cannot be indexed
- BINARY data needs to be 0x-encoded
- UNIQUE INDEXes should not include NULLable columns
- Multi-row INSERT INTO statements cannot contain more than 1000 rows
- ‘0000-00-00’ is a valid date in MySQL, but not in MSSQL. The value is being replaced by ‘1800-01-01’ to avoid the error message
The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.
Some indexes may require manually editing column collations, e.g. for unique case-sensitive MySQL indexes.
When executing T-SQL statements in SSMS, a number of warnings or errors can occur:
- A warning that indexed data is supported to a maximum length of 960 bytes
- Inserting data may result in the error “String or binary data would be truncated.”
- Some files are too big to execute in SSMS, and need to be executed using sqlcmd
- A batch with too many too large INSERT INTO statements may raise the error
There is insufficient system memory in resource pool ‘default’ to run this query
so we break up the batch using “GO”
- Multiple cascade paths
MS SQL Server does not support ON DELETE CASCADE clauses such that the grandchild record of a parent-child-grandchild relation cannot be uniquely identified. Creating such a FOREIGN KEY raises the error message
Msg 1785, Level 16, State 0, Line 1
Introducing FOREIGN KEY constraint ‘fk_xxx’ on table ‘xxx’ may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
In such cases, either drop the ON DELETE CASCADE if possible, and replace the deletion cascade with an INSTEAD OF DELETE trigger.
I developed mysqldump2mssql, a prototype implementing these rules, which is available on my GitHub. The program has been developed and tested against MySQL dumps of MediaWiki, Joomla, Bugzilla and Nextcloud.
The tool generates up to 4 files for each table defined in the source SQL file:
- create.sql: the CREATE TABLE statement, optional CREATE UNIQUE INDEX statements
- data.sql: the INSERT INTO statements
- fk.sql: the FOREIGN KEY relations to other tables
- fulltext.sql: the CREATE FULLTEXT statement
The filenames of the generated files consist of the sequence number of execution, the table name, and one of the endings mentioned in above table.