Length of UTF-8 VARCHAR in SQL Server

February 21, 2020

Foreword

For as long as I can remember, a VARCHAR (or CHAR) was always defined as “1 character equals 1 byte”. Different character sets (code pages) where implemented as COLLATIONs, so that you had basic database support for internationalization.

Then came Unicode, and we got NVARCHAR strings (or NCHAR), where the rule was “1 character equals 2 bytes”, and we could store any text from around the world without bothering with code pages, encodings, etc. The .Net framework brought us the string class with similar features and the world was beautiful.

Then, in 2001, came Unicode 3.1 and needed more space:

For the first time, characters are encoded beyond the original 16-bit codespace or Basic Multilingual Plane (BMP or Plane 0). These new characters, encoded at code positions of U+10000 or higher, are synchronized with the forthcoming standard ISO/IEC 10646-2. For further information, see Article IX, Relation to 10646. Unicode 3.1 and 10646-2 define three new supplementary planes.

These additional planes were immediately supported in SQL Server 2012. From now on, using an *_SC collation, NVARCHARs could be 2 or 4 bytes per character.

In C#, the StringInfo class handles supplementary planes, but it seems, they are still a bit behind:

Starting with the .NET Framework 4.6.2, character classification is based on The Unicode Standard, Version 8.0.0. For the .NET Framework 4 through the .NET Framework 4.6.1, it is based on The Unicode Standard, Version 6.3.0. In .NET Core, it is based on The Unicode Standard, Version 8.0.0.

(For the record, the current Unicode version is 12.1, and 13.0 is going to be released soon)

UTF-8 Collations

So now SQL Server 2019 supports UTF-8-enabled collations.

A question on SO quoted the documentation as

A common misconception is to think that CHAR(n) and VARCHAR(n), the n defines the number of characters. But in CHAR(n) and VARCHAR(n) the n defines the string length in bytes (0-8,000). n never defines numbers of characters that can be stored

(emphasis mine) which confused me a little bit, and the quote continues

The misconception happens because when using single-byte encoding, the storage size of CHAR and VARCHAR is n bytes and the number of characters is also n.

(emphasis mine).

This got me investigating, and I had a look into this issue. I create a UTF8-enabled database with a table with all kinds of N/VARCHAR columns

CREATE DATABASE [test-sc] COLLATE Latin1_General_100_CI_AI_KS_SC_UTF8

CREATE TABLE [dbo].[UTF8Test](
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [VarcharText] [varchar](50) COLLATE Latin1_General_100_CI_AI NULL,
  [VarcharTextSC] [varchar](50) COLLATE Latin1_General_100_CI_AI_KS_SC NULL,
  [VarcharUTF8] [varchar](50) COLLATE Latin1_General_100_CI_AI_KS_SC_UTF8 NULL,
  [NVarcharText] [nvarchar](50) COLLATE Latin1_General_100_CI_AI_KS NULL,
  [NVarcharTextSC] [nvarchar](50) COLLATE Latin1_General_100_CI_AI_KS_SC NULL,
  [NVarcharUTF8] [nvarchar](50) COLLATE Latin1_General_100_CI_AI_KS_SC_UTF8 NULL
)

I inserted test data from various Unicode ranges

INSERT INTO [dbo].[UTF8Test] ([VarcharText],[VarcharTextSC],[VarcharUTF8],[NVarcharText],[NVarcharTextSC],[NVarcharUTF8])
    VALUES ('a','a','a','a','a','a')
INSERT INTO [dbo].[UTF8Test] ([VarcharText],[VarcharTextSC],[VarcharUTF8],[NVarcharText],[NVarcharTextSC],[NVarcharUTF8])
    VALUES ('ö','ö','ö',N'ö',N'ö',N'ö')
-- U+56D7
INSERT INTO [dbo].[UTF8Test] ([VarcharText],[VarcharTextSC],[VarcharUTF8],[NVarcharText],[NVarcharTextSC],[NVarcharUTF8])
    VALUES (N'囗',N'囗',N'囗',N'囗',N'囗',N'囗')
-- U+2000B
INSERT INTO [dbo].[UTF8Test] ([VarcharText],[VarcharTextSC],[VarcharUTF8],[NVarcharText],[NVarcharTextSC],[NVarcharUTF8])
    VALUES (N'𠀋',N'𠀋',N'𠀋',N'𠀋',N'𠀋',N'𠀋')

then selected the lengths and data lengths of each text field

SELECT TOP (1000) [Id]
    ,[VarcharText],[VarcharTextSC],[VarcharUTF8]
    ,[NVarcharText],[NVarcharTextSC],[NVarcharUTF8]
FROM [test-sc].[dbo].[UTF8Test]
SELECT TOP (1000) [Id]
    ,LEN([VarcharText]) VT,LEN([VarcharTextSC]) VTSC
    ,LEN([VarcharUTF8]) VU
    ,LEN([NVarcharText]) NVT,LEN([NVarcharTextSC]) NVTSC
    ,LEN([NVarcharUTF8]) NVU
FROM [test-sc].[dbo].[UTF8Test]
SELECT TOP (1000) [Id]
    ,DATALENGTH([VarcharText]) VT,DATALENGTH([VarcharTextSC]) VTSC
    ,DATALENGTH([VarcharUTF8]) VU
    ,DATALENGTH([NVarcharText]) NVT,DATALENGTH([NVarcharTextSC]) NVTSC
    ,DATALENGTH([NVarcharUTF8]) NVU
FROM [test-sc].[dbo].[UTF8Test]

Select Lengths.png

I was surprised to find that the old mantra “a VARCHAR only stores single byte characters” needs to be revised when using UTF8 collations.

Table data only

Note that only table columns are associated with collations, but not T-SQL variables, as you cannot declare a collation on a variable

SELECT @VarcharText = [VarcharText],@NVarcharText = [NVarcharText]
FROM [test-sc].[dbo].[UTF8Test]
WHERE [Id] = 4
SELECT @VarcharText, Len(@VarcharText), DATALENGTH(@VarcharText)
    , @NVarcharText, Len(@NVarcharText), DATALENGTH(@NVarcharText)

SELECT @VarcharText = [VarcharTextSC], @NVarcharText = [NVarcharTextSC]
FROM [test-sc].[dbo].[UTF8Test]
WHERE [Id] = 4
SELECT @VarcharText, Len(@VarcharText), DATALENGTH(@VarcharText)
    , @NVarcharText, Len(@NVarcharText), DATALENGTH(@NVarcharText)

SELECT @VarcharText = [VarcharUTF8], @NVarcharText = [NVarcharUTF8]
FROM [test-sc].[dbo].[UTF8Test]
WHERE [Id] = 4
SELECT @VarcharText, Len(@VarcharText), DATALENGTH(@VarcharText)
    , @NVarcharText, Len(@NVarcharText), DATALENGTH(@NVarcharText)

 

Select Variable Lengths.png

 


VS Solution Dependency Visualizer supports .NET Core Project Files

February 5, 2020

When I tried to analyze an unknown Visual Studio solution in my tool VS Solution Dependency Visualizer, I noticed that the project dependencies were not analyzed correctly. In fact, the dependencies were completely ignored.

A bit of research turned up that the solution in question was partly migrated to .NET Core, and the .csproj project files have a slightly different structure than good old .Net project files.

First, the classic .Net project files use a namespace

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

whereas .NET Core project files come without XML namespace, and have the root attribute Sdk:

<Project Sdk="Microsoft.NET.Sdk">

Also, files are not compiled if they are explicitly contained in the .csproj file, but rather by a rule-based Default Compilation mechanism which decides whether files in or below the project directory are compiled, embedded, or ignored.

For a detailed documentation of .NET Core project files, see MS docs or the dotnet GitHub docs.

The latest version of VS Solution Dependency Visualizer with support for .NET Core projects is available for download here.

BTW: In previous versions of this tool, I tried to register VS Solution Dependency Visualizer as an External Tool in all installed versions of Visual Studio.

However, things change all the time, and VS 2013 was the first version that did not support registration via registry, and with VS 2017 things got even more different, and finally I had to give up.

Instructions on how to add VS Solution Dependency Visualizer as an External Tool in your version of Visual Studio can be found in the readme file that comes with the installer.


Handling the “Size Limit” in AD Queries

February 10, 2017

I created a small tool to mirror AD data into an SQL Server database. The AD queries essentially looked like this

var conn = new ADODB.Connection();
conn.Open("Provider=ADsDSOObject", "", "", 0);

const string where = "objectCategory='group' ";      

var qry = string.Format(@"SELECT objectCategory, displayName, [more attributes]
FROM 'LDAP://{0}/{1}' 
WHERE {2}", server, start, where);

object recs;
var rs = conn.Execute(qry, out recs, 0);

for (; !rs.EOF; rs.MoveNext())
{
    // ... process record
}

The attributes available in AD have been taken from here.

The code worked fine for many months, until one day it threw an exception:

System.Runtime.InteropServices.COMException (0x80072023): The size limit for this request was exceeded.
at ADODB.RecordsetClass.MoveNext()

It turned out that the query result suddenly exceeded 3000 records, which may or may not be a magic or configurable limit of fetches for a single AD query – probably also including the number of records each fetch returns. Who knows.

Thanks goes to the internetz, which provided me with a solution which now fetches more than 3000 records. Just replace the conn.Execute() part with

var cmd = new ADODB.Command {ActiveConnection = conn, CommandText = qry};
cmd.Properties["Page Size"].Value = 10000;
cmd.Properties["Timeout"].Value = 30;
cmd.Properties["Cache Results"].Value = false;

Fixing PDFSharp hangs

January 12, 2017

To analyse a couple of PDF files whether they contain only images, I used the latest release build of PDFsharp, version 1.32.

However, when processing a certain file (of unknown origin) using code found in an SO answer

public static IEnumerable ExtractText(this PdfPage page)
{       
    var content = ContentReader.ReadContent(page);      
    var text = content.ExtractText();
    return text;
}   

the ExtractText() function simply would not return.

I upgraded to the most current build 1.50 beta 3, included the source in my project, and ran it in Debug mode, where execution halted in the file PDFsharp\src\PdfSharp\Pdf.Content\CParser.cs line 163 failing an assertion:

#if DEBUG
    default:
        Debug.Assert(false);
        break;
#endif

Without digging too deep into the analysis of PDF files, it was clear that the PDF contained a CSymbol that is not being handled by the library, and thus (most likely) ended up in an infinite loop inside CParser.ParseObject().

I fixed this by replacing the Debug.Assert statement with

        throw new Exception("unhandled PDF symbol " + symbol);

which fixed the situation for me.


Unit Testing ASP.Net MVC 5 Controllers using Rhino Mocks

February 12, 2016

Unit Testing ASP.Net MVC Controllers is not as straight-forward as you may thing, because as you get started, you face questions such as: how to create an HttpContext, an HttpContextBase, should you prefer mocking, and which mocking framework, etc.

After a day of googling and tinkering, I finally came up with a working solution using Rhino.Mocks and MvcContrib, both downloaded via nuget. To analyze Owin black magic, I used the source code repositories of Katana on Codeplex and symbolsource.

[TestClass()]
public class HomeIndexControllerTests
{
  [TestMethod()]
  public void HomeIndexTest()
  {

The authentication routine uses async calls, such as FindByNameAsync(). I tried to declare the test method as public async Task, but after starting the test, the IDE claimed it’s busy, but did not execute the test.

Without Task.Run(), the async method would fail throwing an AggregateException.

I found the work-around to encapsulate the whole test inside a Task.Run():

    Task.Run(() =>
    {
      var builder = new TestControllerBuilder();

First, we mock the ApplicationUserManager

      var us = MockRepository.GenerateStub<IUserStore<AppUser>>();
      var aum = MockRepository.GenerateStub<ApplicationUserManager>(us);

My class for storing users in the user database is called AppUser. The variable appuser is set to null for unauthenticated requests, and contains a valid object if authenticated:

      AppUser appuser = null;
      //appuser = new AppUser { .... values .... };  // uncomment for authenticated request

These stubs return the defined user object:

      us.Stub(u => u.FindByNameAsync(""))
        .IgnoreArguments()
        .Return(Task.FromResult(appuser));
      aum.Stub(m => m.FindByNameAsync((ClaimsIdentity)null, ""))
        .IgnoreArguments()
        .Return(Task.FromResult(appuser));

Create the Owin context and register the user manager:

      var owin = new OwinContext();
      owin.Set(aum);

Use the builder to create the controller under test:

      var controller = builder.CreateController<C.Home.IndexController>();

To test unauthenticated requests, we check whether the class or the method have an [Authorize] attribute

      if (appuser == null)
      {
        var type = controller.GetType();
        var attributes = type.GetCustomAttributes(typeof(AuthorizeAttribute), true);
        var methodInfo = type.GetMethod("Execute" 
          /*, new Type[] { ....parameter types.... } */
        );
        var methodAttributes = methodInfo.GetCustomAttributes(typeof(AuthorizeAttribute), true);
        Assert.IsFalse(attributes.Any() || methodAttributes.Any(), "Unauthorized request not allowed");
      }

Next, we wire up the HttpContext and Owin, and set the request’s cookies:

      var context = controller.HttpContext; 
      // the context is already mocked, so we can stub() it

      var dict = new Dictionary<object, object>();
      dict.Add("owin.Environment", owin.Environment);
      context.Stub(c => c.Items).Return(dict);
      var request = controller.Request;
      request.Stub(r => r.Cookies).Return(new HttpCookieCollection());

Next, we create a principal from the user object (see here for setting the name claim)

      var user = new GenericPrincipal(
        new ClaimsIdentity(
          new Claim[] { 
            new Claim(ClaimTypes.Name, appuser == null ? "" : appuser.UserName) }),
        new string[0]);
      context.User = user;

Then we create a fake HttpContext which matches the mocked HttpContextBase object created above:

      HttpContext.Current = new HttpContext(
        new HttpRequest("", "http://tempuri.org", ""),
        new HttpResponse(new StringWriter())
      );
      HttpContext.Current.Items["owin.Environment"] = owin.Environment;
      if (appuser != null)
        HttpContext.Current.User = user;

We are now ready to execute the controller method

      builder.InitializeController(controller);
      var result = controller.Execute();

And check the results: if the request is successful, the result is typically a ViewResult.

      Assert.IsInstanceOfType(result, typeof(RedirectToRouteResult));
      Assert.IsInstanceOfType(result, typeof(ViewResult));

We can now test the result values, and end the Task definition:

    }).GetAwaiter().GetResult();
  }
}

 


NHibernate SELECT MAX()

February 3, 2016

A form used to edit or add records was to set a number field to the highest assigned integer value plus one.

This is typically achieved by writing

SELECT Max(NumberField)+1 FROM [Table]

and in NHibernate you write something like

result.Number = session.Query<Table>()
    .Max(t => t.Number) + 1;

where Number is defined as int.

While this solution is principally correct, it fails if the table has no records:

Server Error in '/' Application.
Value cannot be null.
Parameter name: item
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.ArgumentNullException: Value cannot be null.
Parameter name: item
Stack Trace:
[ArgumentNullException: Value cannot be null.
Parameter name: item]
 System.ThrowHelper.IfNullAndNullsAreIllegalThenThrow(Object value, ExceptionArgument argName) +4195637
 System.Collections.Generic.List`1.System.Collections.IList.Add(Object item) +32
 NHibernate.Util.<>c__DisplayClass4.<AddAll>b__2() +13
 NHibernate.Util.ArrayHelper.AddAll(IList to, IList from) +445
 NHibernate.Engine.Query.HQLQueryPlan.PerformList(QueryParameters queryParameters, ISessionImplementor session, IList results) +573
 NHibernate.Impl.StatelessSessionImpl.List(IQueryExpression queryExpression, QueryParameters queryParameters, IList results) +329
[GenericADOException: Could not execute query[SQL: SQL not available]]
 NHibernate.Impl.StatelessSessionImpl.List(IQueryExpression queryExpression, QueryParameters queryParameters, IList results) +379
 NHibernate.Impl.AbstractSessionImpl.List(IQueryExpression queryExpression, QueryParameters parameters) +145
 NHibernate.Impl.AbstractQueryImpl2.List() +117
 NHibernate.Linq.DefaultQueryProvider.ExecuteQuery(NhLinqExpression nhLinqExpression, IQuery query, NhLinqExpression nhQuery) +36
 NHibernate.Linq.DefaultQueryProvider.Execute(Expression expression) +50
 NHibernate.Linq.DefaultQueryProvider.Execute(Expression expression) +11
 System.Linq.Queryable.Max(IQueryable`1 source, Expression`1 selector) +283

The problem is, of course, that SELECT MAX(NumberField) on an empty table results in a single record containing a NULL value for the computed column, and the NHibernate mapper throws an exception trying to assign the NULL value to the int property.

Additionally, the SELECT statement above should also be written as

SELECT ISNULL(MAX(Number), 0) + 1

to generate 1 for the first record, which already points in the right direction.

The correct NHibernate statement is therefore

result.Number = session.Query<Table>()
    .Max(t => (int?) t.Number) ?? 0 + 1;

casting the C# int property to a Nullable<int>, which accepts the NULL value resulting from MAX(). The ?? operator is the equivalent of T-SQL’s ISNULL() function.

Fundamentally, the behavior is caused be the definition of the Queryable.Max() method:

public static TResult Max<TSource, TResult>(
  this IQueryable<TSource> source, 
  Expression<Func<TSource, TResult>> selector);

If the selector returns int, then the method also returns int. Could the result type not have been declare as Nullable<TResult>?

What does Linq to Objects do? This simple code

IEnumerable<int> values = new int[0];
Console.WriteLine(values.Max());

throws an InvalidOperationException with the message

Sequence contains no elements

as the corresponding Max() method also does not handle Null values. The correct way in Linq to Objects is to call DefaultIfEmpty() to replace an empty enumerable with one containing a single-element default value:

IEnumerable<int> values = new int[0];
values = values.DefaultIfEmpty();
Console.WriteLine(values.Max());

So, I experimented a bit to find a meaningful solution

public static T? max<T>(IEnumerable<T> values) 
  where T: struct, IComparable<T>
{
  T? result = null;
  foreach (var v in values)
    if (!result.HasValue || (v.CompareTo(result.Value) > 0))
      result = v;
  return result;
}
public static T max<T>(IEnumerable<T> values) 
  where T : class, IComparable<T>
{
  T result = null;
  foreach (var v in values)
   if (result==null || (v.CompareTo(result) > 0))
     result = v;
  return result;
}

only to find that

  • C# does not include generic constraints into method signatures, causing the code to not compile
  • Enumerable.Max() and Queryable.Max() are separate extension methods

In fact, Enumerable.Max() (which handles arrays, Lists, and everything IEnumerable) defines various overloads for base types, and one generic overload, whereas Queryable.Max() only defines the generic method. So the code above would have to be changed to base type-specific methods and one generic method.

Of course, the above code for IEnumerable would have no effect on how NHibernate handles the IQueryable.Max() method. This would have to be dealt with using NHibernate extensions.


Unknown Class “ProfileCommon”

January 19, 2016

I was asked to have a look at a Web Site project, and whether it could be converted into a Web Application project. So I created a web application project, copied and added all project files, and after a bit of editing (adding namespaces, etc.) I came across the error:

The type or namespace name ‘ProfileCommon’ could not be found (are you missing a using directive or an assembly reference?)

I soon found out that the class ProfileCommon is being automatically generated during the build process of a Web Site project, and in the meantime a VS BuildTask and a couple lines of code can be found on the internetz which replicate to original code generator.

To understand what it going on, I opened the original web site project, searched for a reference to ProfileCommon, and hit Go To Definition to retrieve the generated code.

As it turns out, the generator simply parses the configuration/system.web/profile/properties section of the web site’s web.config file.

Every item in the section contains a property definition in the form

<add name="[PropertyName]" defaultValue="[value]" 
    type="[.Net Datatype]" />

From this information, the generator creates a public class like this:

using System;
using System.Web;
using System.Web.Profile;
public class ProfileCommon : System.Web.Profile.ProfileBase {

Each declared property is generated as

  public virtual [Datatype] [PropertyName] {
    get {
      return (([Datatype)(this.GetPropertyValue("[PropertyName]")));
    }
    set {
      this.SetPropertyValue("[PropertyName]", value);
    }
  }

and finally

  public virtual ProfileCommon GetProfile(string username) {
    return ((ProfileCommon)(ProfileBase.Create(username)));
  }
}

My guess is that the ProfileBase.GetPropertyValue() method also evaluates the defaultValue clause somehow, but MSDN does not mention this question.