Remove Unused References with Visual Studio 2017

I liked the Reference Assistant extension very much, but unfortunately it only works for Visual Studio versions 2010 through 2013, and I always wanted to adapt the extension to later versions of Visual Studio.

Now I did.

So I forked the original code from the repository on GitHub (really originally hosted on CodePlex).

Since I had no experience in working with the VSSDK, I created a dummy project and clicked my way through it, and tried to figure out, where the original code hooked into VS, and how to migrate that code from 2010 to 2017.

One of the main obstacles was the setup of the Options page, which seems to have changed fundamentally. Fortunately, the Extensibility Samples also contain a project named Options covering the Tools->Options dialog.

Of course, migration also means upgrading .Net versions (implied by creating a VS2017 extension project), replacing assembly references with NuGet packages, and a little bit of tuning and tweaking.

You can find the upgraded Reference Assistant on GitHub, the first release version is here.

What I like about this tool is that is lets you preview the changes it is going to make, and exclude references from removal you know are required, as the tool does not consider occasional implied references and assemblies referenced in .config files.

Importing mysqldump into MS SQL Server

To import data from MySQL into a MS SQL database, we need to first analyze the output of mysqldump, the backup utility of MySQL. I have covered the most obvious differences to T-SQL in my previous post.

Fortunately, the output of mysqldump is line-oriented, i.e. every CREATE TABLE statement, table column or index, and INSERT statement is in a separate line.

In a brute-force approach, we can therefore parse each line using regular expressions, and do not require full parsing involving tokenizer and grammar.

I found a couple of interesting points that need to be considered when generating T-SQL statements:

  • As T-SQL did not support “DROP IF EXISTS” before SQL Server 2016, we need to provide the classical variant “IF OBJECT_ID() IS NOT NULL”
  • NVARCHAR(MAX) columns cannot be indexed
  • BINARY data needs to be 0x-encoded
  • UNIQUE INDEXes should not include NULLable columns
  • Multi-row INSERT INTO statements cannot contain more than 1000 rows
  • ‘0000-00-00’ is a valid date in MySQL, but not in MSSQL. The value is being replaced by ‘1800-01-01’ to avoid the error message

    The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.

Some indexes may require manually editing column collations, e.g. for unique case-sensitive MySQL indexes.

When executing T-SQL statements in SSMS, a number of warnings or errors can occur:

  • A warning that indexed data is supported to a maximum length of 960 bytes
  • Inserting data may result in the error “String or binary data would be truncated.”
  • Some files are too big to execute in SSMS, and need to be executed using sqlcmd
  • A batch with too many too large INSERT INTO statements may raise the error

    There is insufficient system memory in resource pool ‘default’ to run this query

    so we break up the batch using “GO”

  • Multiple cascade paths

MS SQL Server does not support ON DELETE CASCADE clauses such that the grandchild record of a parent-child-grandchild relation cannot be uniquely identified. Creating such a FOREIGN KEY raises the error message

Msg 1785, Level 16, State 0, Line 1
Introducing FOREIGN KEY constraint ‘fk_xxx’ on table ‘xxx’ may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.

In such cases, either drop the ON DELETE CASCADE if possible, and replace the deletion cascade with an INSTEAD OF DELETE trigger.

I developed mysqldump2mssql, a prototype implementing these rules, which is available on my GitHub. The program has been developed and tested against MySQL dumps of MediaWiki, Joomla, Bugzilla and Nextcloud.

The tool generates up to 4 files for each table defined in the source SQL file:

  • create.sql: the CREATE TABLE statement, optional CREATE UNIQUE INDEX statements
  • data.sql: the INSERT INTO statements
  • fk.sql: the FOREIGN KEY relations to other tables
  • fulltext.sql: the CREATE FULLTEXT statement

The filenames of the generated files consist of the sequence number of execution, the table name, and one of the endings mentioned in above table.

Retrieving Active GitHub Forks

The problem with GitHub repositories is that, whenever there exists forks for a given repository, it is not immediately obvious which forks are active, containing newer commits, and which forks are just-for-fun forks, which should have been Starred rather than Forked.

Of course, you can always navigate to that forked repository, and check for “commits ahead” and “commits behind”, but that’s really painful.

Apparently there are others that think such a feature would be useful

but the functionality is still not built-in. Both tickets link to helper applications or scripts that might perform the task, though.

So I came across Active GitHub Forks which lists Stars, Forks and Last Push date for each forked repository of a given base repository. (Their are also other solutions such as this Active GitHub Forks page)

So I forked this repository, had a little look on the GitHub API, I learned that you can query the forks of a given repository, sorted by stars or newest or oldest, but the list of forked repositories does not contain the relevant attributes.

Instead, you have to query each of the forked repositories separately to retrieve their attributes “total_commits”, “ahead_by”, “behind_by”. And each separate query weighs on GitHub’s rate limit. (No wonder then that some of the tools screenscrape rather than query the API)

Visit my solution of Active GitHub Forks and query the forked repositories individually.

Active GitHub Forks screenshot