I created two Python scripts to extract German IPA data from German and English Wiktionary dumps.
See my posts
and the corresponding GitHub repository german-ipa-dict
.
I created two Python scripts to extract German IPA data from German and English Wiktionary dumps.
See my posts
and the corresponding GitHub repository german-ipa-dict
.
I recently got github’s email on their changes to authentication tokens, but of course ignored as long as I could.
We recently updated the format of our API authentication tokens, providing additional security benefits to all our customers.
So today was the day to update my local authentication token in TortoiseGit.
I headed over to github’s Personal Access Token page to generate a new token.
But it was not really clear how to add it as authentication for repositories existing on my local PC.
After some google-fu, I opened the TortoiseGit Settings dialog in the root directory of one of my repositories,
changed the Credential helper to Advanced, and entered my github username.
The next TortoiseGit/Push… command open an authentication dialog where I copied the github token in the password field.
After the push was completed, I could verify that the new token was added in Windows under Control Panel / Accounts / Credential Manager / Windows Credentials
I liked the Reference Assistant extension very much, but unfortunately it only works for Visual Studio versions 2010 through 2013, and I always wanted to adapt the extension to later versions of Visual Studio.
Now I did.
So I forked the original code from the repository on GitHub (really originally hosted on CodePlex).
Since I had no experience in working with the VSSDK, I created a dummy project and clicked my way through it, and tried to figure out, where the original code hooked into VS, and how to migrate that code from 2010 to 2017.
One of the main obstacles was the setup of the Options page, which seems to have changed fundamentally. Fortunately, the Extensibility Samples also contain a project named Options covering the Tools->Options dialog.
Of course, migration also means upgrading .Net versions (implied by creating a VS2017 extension project), replacing assembly references with NuGet packages, and a little bit of tuning and tweaking.
You can find the upgraded Reference Assistant on GitHub, the first release version is here.
What I like about this tool is that is lets you preview the changes it is going to make, and exclude references from removal you know are required, as the tool does not consider occasional implied references and assemblies referenced in .config files.
To import data from MySQL into a MS SQL database, we need to first analyze the output of mysqldump, the backup utility of MySQL. I have covered the most obvious differences to T-SQL in my previous post.
Fortunately, the output of mysqldump is line-oriented, i.e. every CREATE TABLE statement, table column or index, and INSERT statement is in a separate line.
In a brute-force approach, we can therefore parse each line using regular expressions, and do not require full parsing involving tokenizer and grammar.
I found a couple of interesting points that need to be considered when generating T-SQL statements:
The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.
Some indexes may require manually editing column collations, e.g. for unique case-sensitive MySQL indexes.
When executing T-SQL statements in SSMS, a number of warnings or errors can occur:
There is insufficient system memory in resource pool ‘default’ to run this query
so we break up the batch using “GO”
MS SQL Server does not support ON DELETE CASCADE clauses such that the grandchild record of a parent-child-grandchild relation cannot be uniquely identified. Creating such a FOREIGN KEY raises the error message
Msg 1785, Level 16, State 0, Line 1
Introducing FOREIGN KEY constraint ‘fk_xxx’ on table ‘xxx’ may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
In such cases, either drop the ON DELETE CASCADE if possible, and replace the deletion cascade with an INSTEAD OF DELETE trigger.
I developed mysqldump2mssql, a prototype implementing these rules, which is available on my GitHub. The program has been developed and tested against MySQL dumps of MediaWiki, Joomla, Bugzilla and Nextcloud.
The tool generates up to 4 files for each table defined in the source SQL file:
The filenames of the generated files consist of the sequence number of execution, the table name, and one of the endings mentioned in above table.
The problem with GitHub repositories is that, whenever there exists forks for a given repository, it is not immediately obvious which forks are active, containing newer commits, and which forks are just-for-fun forks, which should have been Starred rather than Forked.
Of course, you can always navigate to that forked repository, and check for “commits ahead” and “commits behind”, but that’s really painful.
Apparently there are others that think such a feature would be useful
but the functionality is still not built-in. Both tickets link to helper applications or scripts that might perform the task, though.
So I came across Active GitHub Forks which lists Stars, Forks and Last Push date for each forked repository of a given base repository. (Their are also other solutions such as this Active GitHub Forks page)
So I forked this repository, had a little look on the GitHub API, I learned that you can query the forks of a given repository, sorted by stars or newest or oldest, but the list of forked repositories does not contain the relevant attributes.
Instead, you have to query each of the forked repositories separately to retrieve their attributes “total_commits”, “ahead_by”, “behind_by”. And each separate query weighs on GitHub’s rate limit. (No wonder then that some of the tools screenscrape rather than query the API)
Visit my solution of Active GitHub Forks and query the forked repositories individually.