The great Database Diagram mystery

You may (or may not) know that the Database Diagrammer of SSMS stores the created diagrams in a table called dbo.sysdiagrams.

This table has always been a mystery to me, especially the column “definition”, as it contains binary data, and documentation cannot be found about its format.

So when I tried to figure out what all the bytes meant, and my standard editor plus hex viewer plug-in (read: Notepad++) could not help me any further, I searched and found FlexHEX which promised to provide the functionality to annotate and structure binary data. Somehow this did not work out as expected – I could not figure out how to create and edit “structures” – but as I clicked around, I hit the menu “Open OLE Compound File…” as a last resort. What could go wrong.

flexhex compound file.png

To my great surprise, this command actually opened the file and displayed a file structure, with each of the sections containing some data.

As it turned out, the section DSREF-SCHEMA-CONTENTS was the one I was looking for, as it contains, as I later found out, all the textual information in the diagram:

  •  (original) diagram name
  • connection string
  • list of all tables in the diagram (as of last save)

flexhex data diagram.png

This section is stored as a Microsoft Compound File or Compound File Binary Format or Compound Document File, and the easiest way to process it in C# is the nuget package OpenMcdf (source on GitHub).

Writing a small tool, I verified that the sections from the binary data really have those names, and are not artifacts of the hex editor.

Once I got hold of the data contained in section DSREF-SCHEMA-CONTENTS, making sense of the structure of this data was quite straight-forward.

SQL Server Sysobjects Types (Version 2019)

MS SQL Server stores the information about the objects (tables, views, functions, etc) stored in the database in a table, which is accessible by selecting from the sys.objects view. The type of object is given by the column “type”

I collected this cross-version information from Microsoft Docs online, but fixed a couple of historical mistakes there.

Object type: 2008 2012 2016 2019
sys.objects.type
AF Aggregate function (CLR) x x x x
C CHECK constraint x x x x
D DEFAULT (constraint or stand-alone) x x x x
EC Edge constraint x
ET External Table x x
F FOREIGN KEY constraint x x x x
FN SQL scalar function x x x x
FS Assembly (CLR) scalar function x x x x
FT Assembly (CLR) table-valued function x x x x
IF SQL inline table-valued function x x x x
IT Internal table x x x x
P SQL stored procedure x x x x
PC Assembly (CLR) stored procedure x x x x
PG Plan guide x x x x
PK PRIMARY KEY constraint x x x x
R Rule (old-style, stand-alone) x x x x
RF Replication-filter-procedure x x x x
S System base table x x x x
SN Synonym x x x x
SO Sequence object x x x
SQ Service queue x x x x
TA Assembly (CLR) DML trigger x x x x
TF SQL table-valued-function x x x x
TR SQL DML trigger x x x x
TT Table type x x x x
U Table (user-defined) x x x x
UQ UNIQUE constraint x x x x
V View x x x x
X Extended stored procedure x x x x

The sys.objects.type_desc column contains the description corresponding to the type:

type sys.objects.type_desc
AF AGGREGATE_FUNCTION
C CHECK_CONSTRAINT
D DEFAULT_CONSTRAINT
EC
ET
F FOREIGN_KEY_CONSTRAINT
FN SQL_SCALAR_FUNCTION
FS CLR_SCALAR_FUNCTION
FT CLR_TABLE_VALUED_FUNCTION
IF SQL_INLINE_TABLE_VALUED_FUNCTION
IT INTERNAL_TABLE
P SQL_STORED_PROCEDURE
PC CLR_STORED_PROCEDURE
PG PLAN_GUIDE
PK PRIMARY_KEY_CONSTRAINT
R RULE
RF REPLICATION_FILTER_PROCEDURE
S SYSTEM_TABLE
SN SYNONYM
SO SEQUENCE_OBJECT
SQ SERVICE_QUEUE
TA CLR_TRIGGER
TF SQL_TABLE_VALUED_FUNCTION
TR SQL_TRIGGER
TT TYPE_TABLE
U USER_TABLE
UQ UNIQUE_CONSTRAINT
V VIEW
X EXTENDED_STORED_PROCEDURE

My previous post on this topic only covered SQL Server up to version 2008.

Length of UTF-8 VARCHAR in SQL Server

Foreword

For as long as I can remember, a VARCHAR (or CHAR) was always defined as “1 character equals 1 byte”. Different character sets (code pages) where implemented as COLLATIONs, so that you had basic database support for internationalization.

Then came Unicode, and we got NVARCHAR strings (or NCHAR), where the rule was “1 character equals 2 bytes”, and we could store any text from around the world without bothering with code pages, encodings, etc. The .Net framework brought us the string class with similar features and the world was beautiful.

Then, in 2001, came Unicode 3.1 and needed more space:

For the first time, characters are encoded beyond the original 16-bit codespace or Basic Multilingual Plane (BMP or Plane 0). These new characters, encoded at code positions of U+10000 or higher, are synchronized with the forthcoming standard ISO/IEC 10646-2. For further information, see Article IX, Relation to 10646. Unicode 3.1 and 10646-2 define three new supplementary planes.

These additional planes were immediately supported in SQL Server 2012. From now on, using an *_SC collation, NVARCHARs could be 2 or 4 bytes per character.

In C#, the StringInfo class handles supplementary planes, but it seems, they are still a bit behind:

Starting with the .NET Framework 4.6.2, character classification is based on The Unicode Standard, Version 8.0.0. For the .NET Framework 4 through the .NET Framework 4.6.1, it is based on The Unicode Standard, Version 6.3.0. In .NET Core, it is based on The Unicode Standard, Version 8.0.0.

(For the record, the current Unicode version is 12.1, and 13.0 is going to be released soon)

UTF-8 Collations

So now SQL Server 2019 supports UTF-8-enabled collations.

A question on SO quoted the documentation as

A common misconception is to think that CHAR(n) and VARCHAR(n), the n defines the number of characters. But in CHAR(n) and VARCHAR(n) the n defines the string length in bytes (0-8,000). n never defines numbers of characters that can be stored

(emphasis mine) which confused me a little bit, and the quote continues

The misconception happens because when using single-byte encoding, the storage size of CHAR and VARCHAR is n bytes and the number of characters is also n.

(emphasis mine).

This got me investigating, and I had a look into this issue. I create a UTF8-enabled database with a table with all kinds of N/VARCHAR columns

CREATE DATABASE [test-sc] COLLATE Latin1_General_100_CI_AI_KS_SC_UTF8

CREATE TABLE [dbo].[UTF8Test](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [VarcharText] [varchar](50) COLLATE Latin1_General_100_CI_AI NULL,
  [VarcharTextSC] [varchar](50) COLLATE Latin1_General_100_CI_AI_KS_SC NULL,
  [VarcharUTF8] [varchar](50) COLLATE Latin1_General_100_CI_AI_KS_SC_UTF8 NULL,
  [NVarcharText] [nvarchar](50) COLLATE Latin1_General_100_CI_AI_KS NULL,
  [NVarcharTextSC] [nvarchar](50) COLLATE Latin1_General_100_CI_AI_KS_SC NULL,
  [NVarcharUTF8] [nvarchar](50) COLLATE Latin1_General_100_CI_AI_KS_SC_UTF8 NULL
)

I inserted test data from various Unicode ranges

INSERT INTO [dbo].[UTF8Test] ([VarcharText],[VarcharTextSC],[VarcharUTF8],[NVarcharText],[NVarcharTextSC],[NVarcharUTF8])
    VALUES ('a','a','a','a','a','a')
INSERT INTO [dbo].[UTF8Test] ([VarcharText],[VarcharTextSC],[VarcharUTF8],[NVarcharText],[NVarcharTextSC],[NVarcharUTF8])
    VALUES ('ö','ö','ö',N'ö',N'ö',N'ö')
-- U+56D7
INSERT INTO [dbo].[UTF8Test] ([VarcharText],[VarcharTextSC],[VarcharUTF8],[NVarcharText],[NVarcharTextSC],[NVarcharUTF8])
    VALUES (N'囗',N'囗',N'囗',N'囗',N'囗',N'囗')
-- U+2000B
INSERT INTO [dbo].[UTF8Test] ([VarcharText],[VarcharTextSC],[VarcharUTF8],[NVarcharText],[NVarcharTextSC],[NVarcharUTF8])
    VALUES (N'𠀋',N'𠀋',N'𠀋',N'𠀋',N'𠀋',N'𠀋')

then selected the lengths and data lengths of each text field

SELECT TOP (1000) [Id]
    ,[VarcharText],[VarcharTextSC],[VarcharUTF8]
    ,[NVarcharText],[NVarcharTextSC],[NVarcharUTF8]
FROM [test-sc].[dbo].[UTF8Test]
SELECT TOP (1000) [Id]
    ,LEN([VarcharText]) VT,LEN([VarcharTextSC]) VTSC
    ,LEN([VarcharUTF8]) VU
    ,LEN([NVarcharText]) NVT,LEN([NVarcharTextSC]) NVTSC
    ,LEN([NVarcharUTF8]) NVU
FROM [test-sc].[dbo].[UTF8Test]
SELECT TOP (1000) [Id]
    ,DATALENGTH([VarcharText]) VT,DATALENGTH([VarcharTextSC]) VTSC
    ,DATALENGTH([VarcharUTF8]) VU
    ,DATALENGTH([NVarcharText]) NVT,DATALENGTH([NVarcharTextSC]) NVTSC
    ,DATALENGTH([NVarcharUTF8]) NVU
FROM [test-sc].[dbo].[UTF8Test]

Select Lengths.png

I was surprised to find that the old mantra “a VARCHAR only stores single byte characters” needs to be revised when using UTF8 collations.

Table data only

Note that only table columns are associated with collations, but not T-SQL variables, as you cannot declare a collation on a variable

SELECT @VarcharText = [VarcharText],@NVarcharText = [NVarcharText]
FROM [test-sc].[dbo].[UTF8Test]
WHERE [Id] = 4
SELECT @VarcharText, Len(@VarcharText), DATALENGTH(@VarcharText)
    , @NVarcharText, Len(@NVarcharText), DATALENGTH(@NVarcharText)

SELECT @VarcharText = [VarcharTextSC], @NVarcharText = [NVarcharTextSC]
FROM [test-sc].[dbo].[UTF8Test]
WHERE [Id] = 4
SELECT @VarcharText, Len(@VarcharText), DATALENGTH(@VarcharText)
    , @NVarcharText, Len(@NVarcharText), DATALENGTH(@NVarcharText)

SELECT @VarcharText = [VarcharUTF8], @NVarcharText = [NVarcharUTF8]
FROM [test-sc].[dbo].[UTF8Test]
WHERE [Id] = 4
SELECT @VarcharText, Len(@VarcharText), DATALENGTH(@VarcharText)
    , @NVarcharText, Len(@NVarcharText), DATALENGTH(@NVarcharText)

 

Select Variable Lengths.png

 

Retrieving Active GitHub Forks

The problem with GitHub repositories is that, whenever there exists forks for a given repository, it is not immediately obvious which forks are active, containing newer commits, and which forks are just-for-fun forks, which should have been Starred rather than Forked.

Of course, you can always navigate to that forked repository, and check for “commits ahead” and “commits behind”, but that’s really painful.

Apparently there are others that think such a feature would be useful

but the functionality is still not built-in. Both tickets link to helper applications or scripts that might perform the task, though.

So I came across Active GitHub Forks which lists Stars, Forks and Last Push date for each forked repository of a given base repository. (Their are also other solutions such as this Active GitHub Forks page)

So I forked this repository, had a little look on the GitHub API, I learned that you can query the forks of a given repository, sorted by stars or newest or oldest, but the list of forked repositories does not contain the relevant attributes.

Instead, you have to query each of the forked repositories separately to retrieve their attributes “total_commits”, “ahead_by”, “behind_by”. And each separate query weighs on GitHub’s rate limit. (No wonder then that some of the tools screenscrape rather than query the API)

Visit my solution of Active GitHub Forks and query the forked repositories individually.

Active GitHub Forks screenshot

Updating YuJisho: a Unicode CJK Character web dictionary

My online CJK dictionary YuJisho got a facelift again – this time, from ASP.Net MVC3 to the current MVC 5, and from Bootstrap 2 to Bootstrap 4.

I hope this gets rid of the messages in the Google Search Console 😉 :

  •  Text too small to read
  • Clickable elements too close together
  • Viewport not set

There is a little change though: In times of GDPR &co, queries to Wikipedias and Wiktionaries need to invoked by clicking the “Query Wikipedia” button, rather than querying automatically.

click the button

click the button “Query Wikipedia”

results in links to various Wikipedias containing an article

If your browser / operating system fails to display certain Chinese characters, there is now a button “Load Glyphs” which tries to load the unsupported characters’ images as .svg from GlyphWiki.

after load glyphs

after load glyphs u

Please check the About page for more information.

Essential NuGet packages for ASP.Net MVC

Here are some useful packages I came across in recent years:

Topic Package NuGet Source
Logging
log4net nuget apache
Error Logger
elmah.corelibrary nuget github
Elmah.Mvc nuget github
ElmahAppender.log4net nuget github
Profiling
Glimpse nuget github
Glimpse.AspNet nuget
Glimpse.Mvc5 nuget
NHibernate.Glimpse nuget github
Minifier, Bundler
BundleTransformer.Core nuget github
BundleTransformer.Yui nuget github
Data Access
Nhibernate nuget github
Identity Management
NHibernate.AspNet.Identity nuget github
Templating
Handlebars.Net nuget github
RazorEngine nuget github
Documents
EPPlus nuget github
CsvHelper nuget github
PdfSharp nuget pdfsharp
HtmlRenderer.PdfSharp nuget github
Other
DocuVieware Lite
HTML5 Document Viewer For ASP.NET
homepage

SMOscript for Windows 10

I recently got notified that my tool SMOscript raises an error message when run on Windows 10.

I had a look at the problem, and indeed, as soon as the enumeration of database objects (e.g. using commands l (list) or s (script)) is to process a table, the program stop. In case of the script command, an error message is issued

Error: iterating objects: Attempt to retrieve data for object failed for Server ”.

The most recent version of SMOscript was still targeted on .Net 2.0, and used the SMO 100 assemblies. Time to upgrade! 😉

So I changed the .Net version to 4.5 and added the nuget SMO package and changed the .Net version to 4.7 to make the compiler happy.

Tests were performed successfully against SQL Server 2014 and 2019, but now SQL Server 2008 caused an error that I could work-around.

The tool now also supports options to sort database objects by schema and name, or by type and schema and name, rather than relying on the order resulting from the SMO method Database.EnumObjects().

Additionally, the -st (Sort Types) parameter allows to specify a custom sort order for database object types (rather than alphabetically), so that the parameter

smoscript -st Table,View

will process objects of types table and view first, and everything else afterwards. The parameter -st without a list of types will display the available types.

The latest version of SMOscript is available for download here.

Note that there is a regular build “V 0.40”, but also a build using the original SMO 100 assemblies.

smoscript 0.40 command-line parameters

smoscript 0.40 sort SMO Types

Database.EnumObjects() fails on SQL Server 2008

After replacing the SMO 100 assemblies (=SQL 2008) with the SMO 150 (=SQL 2019) nuget package, things worked for 2014 and 2019 databases.

But not for 2008 anymore: Calling Database.EnumObjects() now raised a FailedOperationException:

Microsoft.SqlServer.Management.Smo.FailedOperationException: Enum objects failed for Database ‘myDatabase’.
—> Microsoft.SqlServer.Management.Sdk.Sfc.InvalidVersionEnumeratorException: Operation not supported on version 10.0.5538.
at Microsoft.SqlServer.Management.Smo.XmlReadDoc.LoadInternal()
at Microsoft.SqlServer.Management.Smo.XmlReadDoc.LoadFile(Assembly a, String strFile)
at Microsoft.SqlServer.Management.Smo.SqlObject.LoadInitDataFromAssemblyInternal(Assembly assemblyObject, String file, ServerVersion ver, String alias, StringCollection requestedFields, Boolean store, StringCollection roAfterCreation, DatabaseEngineType databaseEngineType, DatabaseEngineEdition databaseEngineEdition, Stream configXml)
at Microsoft.SqlServer.Management.Smo.SqlObject.LoadInitData(String file, ServerVersion ver, DatabaseEngineType databaseEngineType, DatabaseEngineEdition databaseEngineEdition)
at Microsoft.SqlServer.Management.Sdk.Sfc.ObjectCache.LoadElement(ObjectLoadInfo oli, ServerVersion ver, DatabaseEngineType databaseEngineType, DatabaseEngineEdition databaseEngineEdition)
at Microsoft.SqlServer.Management.Sdk.Sfc.ObjectCache.GetElement(ObjectLoadInfo oli, ServerVersion ver, DatabaseEngineType databaseEngineType, DatabaseEngineEdition databaseEngineEdition)
at Microsoft.SqlServer.Management.Sdk.Sfc.ObjectCache.GetAllElements(Urn urn, ServerVersion ver, DatabaseEngineType databaseEngineType, DatabaseEngineEdition databaseEngineEdition, Object ci)
at Microsoft.SqlServer.Management.Sdk.Sfc.Environment.GetObjectsFromCache(Urn urn, Object ci)
at Microsoft.SqlServer.Management.Sdk.Sfc.Environment.GetData(Request req, Object ci)
at Microsoft.SqlServer.Management.Sdk.Sfc.Enumerator.GetData(Object connectionInfo, Request request)
at Microsoft.SqlServer.Management.Smo.Database.EnumObjects(DatabaseObjectTypes types, SortOrder order)
— End of inner exception stack trace —
at Microsoft.SqlServer.Management.Smo.Database.EnumObjects(DatabaseObjectTypes types, SortOrder order)
at Microsoft.SqlServer.Management.Smo.Database.EnumObjects()

However, iterating over the various SMO collections (Database.Tables, Database.Views, etc.) still works!

SMOscript tries to deal with this situation by handling the exception and calling the original enumeration code from back-in-the-old-days to retrieve the selected SMO objects.

SMO and .Net Frameworks and nuget

As I upgraded my tool SMOscript to .Net version 4.5, things compiled and ran nicely again.

Only after a while I noticed the Build warnings

1>C:\Program Files\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Microsoft.Common.CurrentVersion.targets(2110,5): warning MSB3274: The primary reference “Microsoft.SqlServer.Assessment, Version=1.100.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL” could not be resolved because it was built against the “.NETFramework,Version=v4.6.1” framework. This is a higher version than the currently targeted framework “.NETFramework,Version=v4.5”.
1>C:\Program Files\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Microsoft.Common.CurrentVersion.targets(2110,5): warning MSB3275: The primary reference “Microsoft.SqlServer.Management.Assessment, Version=15.100.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL” could not be resolved because it has an indirect dependency on the assembly “Microsoft.SqlServer.Assessment, Version=1.100.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91” which was built against the “.NETFramework,Version=v4.6.1” framework. This is a higher version than the currently targeted framework “.NETFramework,Version=v4.5”.
1>C:\Program Files\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Microsoft.Common.CurrentVersion.targets(2110,5): warning MSB3274: The primary reference “Microsoft.SqlServer.Management.Assessment, Version=15.100.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL” could not be resolved because it was built against the “.NETFramework,Version=v4.6.2” framework. This is a higher version than the currently targeted framework “.NETFramework,Version=v4.5”.
1>C:\Program Files\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Microsoft.Common.CurrentVersion.targets(2110,5): warning MSB3274: The primary reference “Microsoft.SqlServer.Management.SqlParser, Version=15.100.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL” could not be resolved because it was built against the “.NETFramework,Version=v4.6.2” framework. This is a higher version than the currently targeted framework “.NETFramework,Version=v4.5”.

So whatever functionality is implemented in the Management and Assessment assemblies, they come with the nuget package (latest version 150.18208.0 in my case) and are also copied to the project output.

Apparently the package definition contains the requirement for Framework “>= 4.6.0”, allows installation in a 4.5 project, but really requires 4.6.2. Reminds me of the dependency inconsistencies back in 2013.

Anyway, upgrading the project to .Net 4.7 got rid of these warnings.