XML Access Modes, Object Models, and Serialization in .Net

December 22, 2011

While researching my recent post about the XmlSerializer, I realized that there are more ways to process XML data in .Net than I knew about. Here is an overview of how you can access XML data in .Net.

XML data

The .Net framework provides the following classes to store XML data in memory:

  • XSD proxy classes

XML serialization

The .Net framework implements a set of dependent classes to serialize and deserialize XML data. Each class in this list can be instantiated using a parameter of the classes below it:

  • filename (string)

XML query and manipulation

Besides the classes named above, other classes exist to query and manipulate XML data:

Putting it all together

 

For each data model, the overview lists: Access (MSDN summary), Serialization, XML Query.

XmlDocument
  • Access (MSDN summary): “Represents an XML document.” In-Memory
  • Serialization: OuterXml, LoadXml(), Load(XmlReader), Save(XmlWriter)
  • XML Query: XPath SelectSingleNode(), SelectNodes(); Linq To Objects; XPathNavigator CreateNavigator()

XDocument
  • Access (MSDN summary): “Represents an XML document.” In-Memory
  • Serialization: Load(XmlReader), Save(XmlWriter), Parse()
  • XML Query: XmlReader CreateReader(); Linq To Xml; XPathNavigator CreateNavigator()

XSD proxy classes
  • Access (MSDN summary): In-Memory
  • Serialization: XmlSerializer, Introduction on MSDN
  • XML Query: Linq To Objects

XmlReader
  • Access (MSDN summary): “a reader that provides fast, non-cached, forward-only access to XML data.”
  • Serialization: Create(…)
  • XML Query: GetAttribute(), MoveTo*(), Read*()

XmlNodeReader
  • Access (MSDN summary): “a reader that provides fast, non-cached forward only access to XML data in an XmlNode.”
  • Serialization: (XmlNode)
  • XML Query: derived from XmlReader

XPathNavigator
  • Access (MSDN summary): “a cursor model for navigating and editing XML data.” Read-only if created by XPathDocument; editable if created by XmlDocument
  • Serialization: (CreateNavigator())
  • XML Query: GetAttribute(), Value*, MoveTo(); XPath Select(), Select*(); XmlReader ReadSubtree()

XPathDocument
  • Access (MSDN summary): “a fast, read-only, in-memory representation of an XML document”
  • Serialization: Constructor()
  • XML Query: XPathNavigator CreateNavigator()
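To make the access modes listed above more concrete, here is a minimal sketch that reads the same value through four of the data models. The sample document, the class name XmlAccessDemo, and the method names are made up for this illustration; only the framework calls themselves come from .Net.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XmlAccessDemo
{
    // Hypothetical sample document, used only for this illustration.
    public const string Sample =
        "<books><book id=\"1\"><title>A</title></book>" +
        "<book id=\"2\"><title>B</title></book></books>";

    // XmlDocument: in-memory DOM, queried with XPath.
    // (The id is concatenated into the XPath for brevity; do not do this with untrusted input.)
    public static string TitleViaXmlDocument(string xml, string id)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("/books/book[@id='" + id + "']/title");
        return node == null ? null : node.InnerText;
    }

    // XDocument: in-memory tree, queried with LINQ to XML.
    public static string TitleViaXDocument(string xml, string id)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root.Elements("book")
                  .Where(b => (string)b.Attribute("id") == id)
                  .Select(b => (string)b.Element("title"))
                  .FirstOrDefault();
    }

    // XmlReader: forward-only streaming access, nothing is cached.
    public static string TitleViaXmlReader(string xml, string id)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.ReadToFollowing("book"))
            {
                if (reader.GetAttribute("id") == id && reader.ReadToDescendant("title"))
                    return reader.ReadElementContentAsString();
            }
        }
        return null;
    }

    // XPathDocument + XPathNavigator: read-only cursor over an in-memory document.
    public static string TitleViaXPathNavigator(string xml, string id)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        XPathNavigator hit = nav.SelectSingleNode("/books/book[@id='" + id + "']/title");
        return hit == null ? null : hit.Value;
    }
}
```

All four methods should return the same title for a given id; which one fits best depends on whether you need editing (XmlDocument), LINQ queries (XDocument), low memory use (XmlReader), or fast read-only XPath (XPathDocument).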

Conversion and Comparison

MSDN

Converting from XmlDocument to XDocument

Comparing XmlReader to SAX Reader

StackOverflow

Use XDocument as the source for XmlSerializer.Deserialize?

Converting XDocument to XmlDocument and vice versa

XDocument or XMLDocument

XMLSerializer vs XMLReader vs XMLDocument vs XDocument Performance Comparison

Other

XmlDocument versus XDocument versus XmlReader/XmlWriter

Performance: LINQ to XML vs XmlDocument vs XmlReader


Running 32-bit ABCpdf Pro on 64-bit Windows

December 21, 2011

An application I have been developing needed to add professional generation of PDF documents (read: my home-brew translation of HTML to PDF using iTextSharp did not produce decent results). I decided to use ABCpdf which itself uses the MSHTML engine to render HTML and miraculously converts the result to PDF.

The application consists of a web application and a Windows service which both reference a library implementing the HTML generation and the invocation of ABCpdf.

Development and deployment on a 32-bit server were successful, but a production system running 64-bit Windows Server caused several errors:

Exception System.DllNotFoundException

Unable to load DLL ‘ABCpdf8-64.dll’. The specified module could not be found (Exception from HRESULT: 0x8007007E)

This error was caused by the service running in 64-bit mode. (The web application had already been switched to 32-bit only by restricting the application pool to 32-bit execution.) After setting the service exe’s “Platform target” property to x86, the error went away.
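If you are unsure which mode a process actually runs in, a small check like the following can be logged at startup. This is only a sketch; the class and method names are made up, and it is not part of ABCpdf.

```csharp
using System;

public static class BitnessCheck
{
    // IntPtr.Size is 4 bytes in a 32-bit process and 8 bytes in a 64-bit process.
    public static string DescribeProcess()
    {
        return IntPtr.Size == 4 ? "32-bit" : "64-bit";
    }

    // Environment.Is64BitOperatingSystem is available from .NET 4 on.
    public static string DescribeOperatingSystem()
    {
        return Environment.Is64BitOperatingSystem ? "64-bit" : "32-bit";
    }
}
```

A service built with “Platform target” x86 should report a 32-bit process even on a 64-bit operating system.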

Exception WebSupergoo.ABCpdf8.Internal.PDFException

Cannot activate MSHtml engine. Please refer to documentation for more information.

The solution for this exception is to set the AppPool’s “Load User Profile” property to true in the Advanced Settings dialog.

After applying these settings, PDF generation also executed successfully on the 64-bit machine.


Configuring XmlSerializer to reproduce XDocument.Save format

December 21, 2011

I had a clean-up job in a recent project that uses XML to store certain data. What I found:

  • an XSD describing the XML schema
  • C# proxy classes previously generated by the VS xsd.exe tool
  • the actual XML files
  • all of them not always in sync

The code reading and writing the XML files used LINQ to XML to query the files and manually instantiate the XSD-generated classes.

Since the purpose of the proxy classes is to simplify read, write and query operations on statically typed classes, I replaced the System.Xml.Linq code with compile-time type-safe object operations.
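As an illustration of what such type-safe operations can look like, here is a minimal sketch. The App and Setting classes are hypothetical stand-ins for xsd.exe-generated proxy classes (the real ones are not shown in this post), and the element and attribute names are assumptions.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical stand-ins for xsd.exe-generated proxy classes.
[XmlRoot("app")]
public class App
{
    [XmlElement("setting")]
    public Setting[] Settings { get; set; }
}

public class Setting
{
    [XmlAttribute("name")]
    public string Name { get; set; }

    [XmlText]
    public string Value { get; set; }
}

public static class ProxyDemo
{
    // Deserialize the XML text into the statically typed object model.
    public static App Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(App));
        using (var reader = new StringReader(xml))
            return (App)serializer.Deserialize(reader);
    }

    // Query with LINQ to Objects instead of LINQ to XML.
    public static string Lookup(App app, string name)
    {
        return (app.Settings ?? new Setting[0])
               .Where(s => s.Name == name)
               .Select(s => s.Value)
               .FirstOrDefault();
    }
}
```

A typo in a property name is now a compiler error rather than a query that silently returns nothing.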

In order to verify that my changes did not corrupt the file or change any of the data, I needed to compare the original .xml files and the ones generated by XmlSerializer. I noticed differences between the two sets caused by the classes that wrote the files:

  • XDocument.Root.Save does not generate namespace declarations, and writes an explicit <element></element> closing tag for empty data.
  • XmlSerializer does not write an explicit closing tag but uses the <element /> notation, and adds namespace declarations (xmlns:xsi and xmlns:xsd) to the root element.

I found the first hints on how to deal with the closing tag here and here: derive a class from XmlTextWriter. The final code looks like this:

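using System.IO;
using System.Text;
using System.Xml;

// XmlTextWriter that writes empty elements as <element></element>
// instead of <element />.
class MyXmlTextWriter : XmlTextWriter
{
  public MyXmlTextWriter(Stream w)
    : base(w, Encoding.UTF8)
  {
    Indentation = 2;
    IndentChar = ' ';
    Formatting = System.Xml.Formatting.Indented;
  }

  public override void WriteEndElement()
  {
    // WriteState.Element means the start tag is still open, i.e. the
    // element has no content. Switch off indentation so that the end
    // tag stays on the same line as the start tag.
    if (this.WriteState == System.Xml.WriteState.Element)
      Formatting = System.Xml.Formatting.None;
    base.WriteFullEndElement();
    Formatting = System.Xml.Formatting.Indented;
  }
}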

To omit the namespace declarations, one needs to provide an XmlSerializerNamespaces object for the XmlSerializer:

XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
ns.Add(string.Empty, string.Empty);
ns.Add(string.Empty, "http://www.w3.org/2001/XMLSchema-instance");
ns.Add(string.Empty, "http://www.w3.org/2001/XMLSchema");

// serializer, app and filename are defined elsewhere in the application.
// Disposing the writer flushes it and closes the underlying stream.
using (var fs = new FileStream(filename, FileMode.Create, FileAccess.Write))
using (var writer = new MyXmlTextWriter(fs))
  serializer.Serialize(writer, app, ns);

Open Data

December 9, 2011

When I developed my YuJisho online dictionary web application, I was looking for freely available fonts and dictionary data related to CJK languages.

For my dbscript database schema management application, I tried to find as many database schema samples as possible to test the application against.

There is a lot of data (raw, processed and visualized) available on the Internet, but occasionally it is hard to find. This raised the idea of providing a collection of references to free data sets on the web like the Guardian Data Store, and I was thinking about a platform to provide such links.

Now news is out that data.gov plans to release its platform as open source software (GitHub), but the code is still labeled as alpha. (The data.gov HTML says it is based on Socrata, which also provides lots of links to open data.)

Let me know what your experience is with open data or similar platforms.


xkcd: “Wisdom of the Ancients”

December 6, 2011

If you have ever spent hours searching the web for an obscure error message (and ended up with NO solution), let me tell you: you are not alone ;)


Blocking Rogue Bots in IIS7

December 2, 2011

Some crawlers, bots and spiders choose to ignore the robots.txt file, causing high CPU usage and bandwidth consumption for your hosted app.

You can block them using the IIS7 URL Rewrite 2 (download) module.

User-agent blocking can be configured per site (answer on SO) or per server.

For per-site blocking, add the following to the <configuration> section of your web.config:

<system.webServer>
  <rewrite>
    <rules>
      <rule name="Rule1" stopProcessing="true">
        <match url=".*" />
        <conditions>
          <add input="{HTTP_USER_AGENT}" pattern="MyNastySpiderName" />
        </conditions>
        <action type="CustomResponse" statusCode="403"
          statusReason="Forbidden: Access is denied."
          statusDescription="You do not have permission to view this page." />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
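Before deploying such a rule, it can help to test the pattern against a few user-agent strings from your logs. The following is a rough sketch under the assumption that the rule keeps the URL Rewrite defaults (regular expression syntax, case-insensitive matching). The class and method names are made up, and the .Net regex engine only approximates what the module does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UserAgentPatternCheck
{
    // Returns true if the pattern matches anywhere in the user-agent string,
    // which is how an unanchored condition pattern is assumed to behave.
    public static bool WouldBlock(string userAgent, string pattern)
    {
        return Regex.IsMatch(userAgent ?? string.Empty, pattern,
            RegexOptions.ECMAScript | RegexOptions.IgnoreCase);
    }
}
```

For example, WouldBlock("Mozilla/5.0 (compatible; MyNastySpiderName/1.0)", "MyNastySpiderName") matches, while a regular browser user agent does not. Keep the pattern specific enough that it cannot hit legitimate visitors.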

