Finding Images and Links in WordPress Export XML

I re-created two of my websites and needed to find out which images on those sites were referenced by my blog posts, and which links in the blog had to be updated to point to the new sites.

New tasks for my old tool WP XSL GUI.

I created a couple of XSLT files to handle the tasks:

  • List of Images lists the <img> tags for each blog post that contains images
  • List of Images (embedded) generates the same list, but also embeds the images themselves
  • List of Links lists the <a> tags for each blog post that contains links
  • List of Links (linked) generates the same list, but also includes the hyperlinks
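The tool does this with XSLT. As a rough illustration of the same idea outside XSLT, here is a minimal Python sketch that walks the <item> elements of an export, parses each post's content:encoded HTML, and collects the src of every <img> and the href of every <a>. It records URLs rather than whole tags, and the sample export and helper names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace of <content:encoded>, which holds the post body in a WordPress export
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical sample export: one post with an image and a link, one without
SAMPLE = """<rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
  <item>
    <title>Hello</title>
    <content:encoded><![CDATA[<p><img src="http://example.com/a.png" /> See
<a href="http://example.com/old">the old site</a>.</p>]]></content:encoded>
  </item>
  <item>
    <title>Plain</title>
    <content:encoded><![CDATA[<p>No images or links here.</p>]]></content:encoded>
  </item>
</channel>
</rss>"""


class TagCollector(HTMLParser):
    """Collects image sources and link targets from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.images = []  # src of each <img>
        self.links = []   # href of each <a>

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])


def list_images_and_links(xml_text):
    """Return {post title: (image URLs, link URLs)} for posts that contain either."""
    root = ET.fromstring(xml_text)
    result = {}
    for item in root.iter("item"):
        body = item.findtext(f"{{{CONTENT_NS}}}encoded", default="")
        collector = TagCollector()
        collector.feed(body)
        collector.close()
        if collector.images or collector.links:
            result[item.findtext("title")] = (collector.images, collector.links)
    return result


for title, (images, links) in list_images_and_links(SAMPLE).items():
    print(title, images, links)
```

Posts without any image or link are left out, which matches the lists only covering posts that contain images or links.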

Additional changes include:

  • Upgrading the .NET Framework from 2.0 to 4.6
  • Replacing the setup project with Advanced Installer for Visual Studio
  • Suggesting a default output file name based on the source file name and the selected transformation
  • Changing the font to Segoe UI

(Screenshot: WP XSL GUI 1.04)

The latest version 1.04 of WP XSL GUI is available for download here.

WordPress XML Export Converter: New Version 1.03

While preparing this blog’s Table of Contents page I noticed that my WordPress tool wpxslgui would not process WP’s Export XML as it did before.

Looking at the file, I found that the export format had changed from

xmlns:wp="http://wordpress.org/export/1.1/"

to

xmlns:wp="http://wordpress.org/export/1.2/"

Well, if this introduction sounds familiar, you are right.
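Because an XSLT stylesheet binds the wp prefix to one exact URI, an export using a different URI is not rejected; every wp:-prefixed step simply matches nothing. A check like the following Python sketch, run before the transformation, would turn that silent failure into a clear message. The expected URI and the helper names are assumptions for the example, not part of wpxslgui.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI the (hypothetical) stylesheet was written against
EXPECTED_WP_NS = "http://wordpress.org/export/1.1/"

# Hypothetical export header using the newer namespace
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
<channel><title>My blog</title></channel>
</rss>"""


def wp_namespace(xml_text):
    """Return the URI bound to the 'wp' prefix, or None if it is not declared."""
    data = io.BytesIO(xml_text.encode("utf-8"))
    # The "start-ns" event yields one (prefix, uri) tuple per namespace declaration
    for _event, (prefix, uri) in ET.iterparse(data, events=("start-ns",)):
        if prefix == "wp":
            return uri
    return None


def export_version(uri):
    """Turn 'http://wordpress.org/export/1.2/' into '1.2'."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


found = wp_namespace(SAMPLE)
if found is None:
    print("No wp namespace declared: probably not a WordPress export")
elif found != EXPECTED_WP_NS:
    print(f"Export format {export_version(found)} is not the expected "
          f"{export_version(EXPECTED_WP_NS)}; wp: paths would match nothing")
```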

Unfortunately, this time the change of the wp namespace also meant that wp:category is no longer available as a defined element.

For the Table of Contents XSLT, this means that the categories have to be extracted from the <category> child elements of the posts (the <item> elements). The other two XSLT files, which generate the Single HTML and the Word HTML output, are not affected by this change.
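Each post lists its categories as <category domain="category" nicename="..."> elements, while tags use the same element name with domain="post_tag". A minimal Python sketch of this extraction step, using ElementTree and a made-up sample export, keeps every reference of every published post, duplicates included:

```python
import xml.etree.ElementTree as ET

NS = {"wp": "http://wordpress.org/export/1.2/"}

# Hypothetical sample: two published posts (one with a tag) and one draft
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
  <item>
    <title>First</title>
    <category domain="category" nicename="xslt"><![CDATA[XSLT]]></category>
    <category domain="post_tag" nicename="xml"><![CDATA[xml]]></category>
    <wp:post_type>post</wp:post_type>
    <wp:status>publish</wp:status>
  </item>
  <item>
    <title>Second</title>
    <category domain="category" nicename="wordpress"><![CDATA[WordPress]]></category>
    <category domain="category" nicename="xslt"><![CDATA[XSLT]]></category>
    <wp:post_type>post</wp:post_type>
    <wp:status>publish</wp:status>
  </item>
  <item>
    <title>Unfinished</title>
    <category domain="category" nicename="drafts"><![CDATA[Drafts]]></category>
    <wp:post_type>post</wp:post_type>
    <wp:status>draft</wp:status>
  </item>
</channel>
</rss>"""


def published_post_categories(xml_text):
    """Return (nicename, label) for every category reference of every published post.

    Duplicates are kept in document order, like the node-set selected by
    item[wp:post_type = 'post' and wp:status = 'publish']/category[@domain='category'].
    """
    root = ET.fromstring(xml_text)
    refs = []
    for item in root.iterfind("channel/item"):
        if (item.findtext("wp:post_type", namespaces=NS) == "post"
                and item.findtext("wp:status", namespaces=NS) == "publish"):
            # domain="post_tag" marks tags; only real categories are kept
            for cat in item.iterfind("category[@domain='category']"):
                refs.append((cat.get("nicename"), cat.text))
    return refs


print(published_post_categories(SAMPLE))
```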

Since each category occurs once for every post assigned to it, the categories need to be deduplicated and sorted before they are written to the result file. This is done using the so-called Muenchian method.

To collect the categories, an <xsl:key> element indexes the <category> elements by their nicename attribute:

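<!-- Index only categories of published posts, so that key(...)[1] is always a node the apply-templates below actually selects -->
<xsl:key name="categories" match="/rss/channel/item[wp:post_type = 'post' and wp:status = 'publish']/category[@domain='category']" use="@nicename" />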

The categories of all published posts are selected and sorted using <xsl:apply-templates>:

<xsl:apply-templates 
    select="item[wp:post_type = 'post' and wp:status = 'publish']/category[@domain='category']" 
    mode="foo" >
  <xsl:sort order="ascending" select="text()"/>
</xsl:apply-templates>

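A template then matches a category only if it is the first node that key() returns for its nicename (the generate-id() comparison), so each category is output exactly once: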

<xsl:template 
    match="item[wp:post_type = 'post' and wp:status = 'publish']/category
       [ generate-id() = generate-id(key('categories', @nicename)[1]) ]" 
    mode="foo">
  <xsl:value-of select="text()"/>
  <xsl:text>
</xsl:text>
</xsl:template>

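Every selected <category> element that does not match this template falls through to XSLT’s built-in template rules, which would copy its text into the output. An empty template for text() in the same mode suppresses that: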

  <xsl:template match="text()" mode="foo"></xsl:template>
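Outside XSLT, the same grouping is a dictionary lookup. This minimal Python sketch assumes the category references have already been collected as (nicename, label) pairs in document order; the dictionary plays the role of xsl:key, keeping only the first entry per nicename corresponds to the generate-id() test, and sorted() stands in for xsl:sort. The input data is made up.

```python
def unique_sorted_categories(refs):
    """Keep the first (nicename, label) pair per nicename, then sort by label.

    refs: category references in document order, duplicates included.
    """
    first_by_nicename = {}
    for nicename, label in refs:
        # setdefault ignores later duplicates, like key('categories', @nicename)[1]
        first_by_nicename.setdefault(nicename, label)
    # Corresponds to <xsl:sort order="ascending" select="text()"/>
    return sorted(first_by_nicename.values())


# Hypothetical input: category references as they occur across several posts
refs = [("xslt", "XSLT"), ("wordpress", "WordPress"),
        ("xslt", "XSLT"), ("dotnet", ".NET")]
print("\n".join(unique_sorted_categories(refs)))
```

One difference worth knowing: Python's sorted() compares code points, while xsl:sort uses a language-dependent collation, so upper- and lower-case labels may be ordered differently.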

The main features of wpxslgui remained the same:

  • Convert WordPress XML to HTML Table of Contents with links to the original blog
  • Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)
  • Convert WordPress XML to Word HTML document (can be saved as .doc or .docx in Word)

After downloading the latest version of wpxslgui, export your WordPress blog to XML (select “All content”), and convert the file into any of the supported output formats.

WordPress XML Export Converter updated

While preparing this blog’s Table of Contents page I noticed that my WordPress tool wpxslgui would not process WP’s Export XML as it did before.

Looking at the file, I found that the export format had changed from

xmlns:wp="http://wordpress.org/export/1.0/"

to

xmlns:wp="http://wordpress.org/export/1.1/"

and after fixing the .xsl files, the tool worked again.
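A changed URI breaks a stylesheet because namespaced names are compared by URI, not by prefix: wp:status in a 1.0-based stylesheet means {http://wordpress.org/export/1.0/}status, which no longer occurs in a 1.1 export. The Python sketch below shows the effect with ElementTree, plus one hypothetical workaround that matches on the local name only (comparable to testing local-name() in XPath). It is an illustration, not the fix applied to the .xsl files.

```python
import xml.etree.ElementTree as ET

# Namespace the old (hypothetical) lookup code was written against
OLD_WP_NS = "http://wordpress.org/export/1.0/"

# Hypothetical item from a newer export using the 1.1 namespace
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.1/">
<channel><item>
  <title>Hello</title>
  <wp:status>publish</wp:status>
</item></channel>
</rss>"""


def find_by_local_name(elem, local_name):
    """Return the first child whose tag, ignoring its namespace, equals local_name."""
    for child in elem:
        # ElementTree stores namespaced tags as "{uri}local"
        if child.tag.rsplit("}", 1)[-1] == local_name:
            return child
    return None


item = ET.fromstring(SAMPLE).find("channel/item")
strict = item.find(f"{{{OLD_WP_NS}}}status")  # None: the URI no longer matches
loose = find_by_local_name(item, "status")     # found regardless of the version
print(strict, loose.text)
```

Matching by local name trades away the protection namespaces provide: an element with the same local name from another namespace would match too.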

I used the opportunity to fix the file operations so that they read and write UTF-8 encoded files, as suggested in a previous comment.
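The wpxslgui code is .NET and not shown here. As a language-neutral illustration of why the encoding should be stated explicitly on both sides, this Python sketch writes a title as UTF-8 and reads it back once as UTF-8 and once as Windows-1252, the kind of mismatch that occurs when one side falls back to a legacy default. The sample title and helper name are made up.

```python
import tempfile
from pathlib import Path

# Hypothetical post title with non-ASCII characters
TITLE = "Grüße aus Wien"


def round_trip(text, write_encoding, read_encoding):
    """Write text to a temporary file with one encoding and read it back with another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "output.html"
        path.write_text(text, encoding=write_encoding)
        return path.read_text(encoding=read_encoding)


ok = round_trip(TITLE, "utf-8", "utf-8")
garbled = round_trip(TITLE, "utf-8", "cp1252")  # a typical Windows ANSI code page
print(ok)
print(garbled)
```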

I also modified the .xsl files to output published posts only. The previous versions ignored the “published” flag and would output drafts and feedback entries.
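The filter comes down to two child elements of each <item>: wp:post_type and wp:status. A minimal Python sketch of the same selection, using ElementTree with a prefix map and a made-up sample export, could look like this:

```python
import xml.etree.ElementTree as ET

NS = {"wp": "http://wordpress.org/export/1.1/"}

# Hypothetical export: a published post, a draft, a published page,
# and a feedback entry
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.1/">
<channel>
  <item><title>Hello</title>
    <wp:post_type>post</wp:post_type><wp:status>publish</wp:status></item>
  <item><title>Work in progress</title>
    <wp:post_type>post</wp:post_type><wp:status>draft</wp:status></item>
  <item><title>About</title>
    <wp:post_type>page</wp:post_type><wp:status>publish</wp:status></item>
  <item><title>Reader message</title>
    <wp:post_type>feedback</wp:post_type><wp:status>publish</wp:status></item>
</channel>
</rss>"""


def published_post_titles(xml_text):
    """Return the titles of items that are posts with status 'publish'."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("title")
        for item in root.iterfind("channel/item")
        # Same condition as the XPath predicate
        # [wp:post_type = 'post' and wp:status = 'publish']
        if item.findtext("wp:post_type", namespaces=NS) == "post"
        and item.findtext("wp:status", namespaces=NS) == "publish"
    ]


print(published_post_titles(SAMPLE))
```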

The main features of wpxslgui remained the same:

  • Convert WordPress XML to HTML Table of Contents with links to the original blog
  • Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)
  • Convert WordPress XML to Word HTML document (can be saved as .doc or .docx in Word)

After downloading the latest version of wpxslgui, export your WordPress blog to XML (select “All content”), and convert the file into any of the supported output formats.

Fan Spam

All these comments originated from 109.230.246.23 (utrace, dnsstuff) and were posted on my Products page:

What an exciting article, preserve writing companion

Sweetheart, this site is without a doubt fabolous, i simply like it

hi there, your website is wonderful. We do thank you for job

Sweetheart, this amazing site can be fabolous, i recently fantastic

How much of an intriguing posting, keep producing better half

How much of an important article, continue to keep creating mate

What an unique posting, continue to keep crafting special someone

… accidentally, of course, naming a web address ending in .pl.

See this WordPress Support page on how to disable comments on your Pages, if you are hit by similar spam.

New Version of wpxslgui

wpxslgui is a Windows application that converts an XML file generated by the WordPress Export function into an HTML or Word HTML document.

Due to an element with an undeclared namespace prefix in the file generated by WordPress, the application threw an exception, and the XML file had to be edited manually to remove these elements (see technical details here).
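Namespace-aware XML parsers refuse a document in which a prefix is used without being declared, because they cannot resolve the element's full name. The following Python sketch reproduces that failure with a made-up prefix x and a made-up namespace URI, and shows that declaring the prefix lets the same document parse. It illustrates the error only, not how wpxslgui was fixed.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the "x" prefix is used without an xmlns:x declaration
BROKEN = "<rss><channel><item><x:extra>1</x:extra></item></channel></rss>"

# The same fragment with the prefix bound to a made-up namespace URI
FIXED = ('<rss xmlns:x="urn:example:x"><channel><item>'
         '<x:extra>1</x:extra></item></channel></rss>')


def try_parse(xml_text):
    """Return (root, None) if the XML parses, or (None, error message) if it is rejected."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        # expat reports an undeclared prefix as "unbound prefix"
        return None, str(err)


print(try_parse(BROKEN)[1])
root, error = try_parse(FIXED)
print(root.find("channel/item/{urn:example:x}extra").text)
```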

This bug has now been fixed, and the new version of wpxslgui can be downloaded here.

Export your WordPress blog to XML, and convert it into any of the supported output formats.

wpxslgui – WordPress XML Export Converter

wpxslgui is a Windows application that converts an XML file generated by the WordPress Export function into an HTML or Word HTML document.

This new program is based on the two XSL stylesheets I created earlier to process WordPress XML exports: the “Single HTML” XSL and the “Table of Contents” XSL.

The program’s features are:

  • Convert WordPress XML to HTML Table of Contents with links to the original blog
  • Convert WordPress XML to a single HTML file allowing filter by category (JavaScript)
  • Convert WordPress XML to Word HTML document (can be saved as .doc or .docx in Word)

Simply download wpxslgui, export your WordPress blog to XML, and convert it into any of the supported output formats.

Let me know what you think about it 😉

Generating Table of Contents from WordPress Export

WordPress implements an Export function which allows bloggers to download the contents of their blog as a single XML file.

In a previous post I described an XSLT file to convert the exported XML into a single HTML file. A click on an article’s title displayed the whole article using a little JavaScript and CSS.

This time, I wanted to create an HTML table of contents listing all the blog’s categories. When a category is selected, only the titles of the posts in that category should be displayed, each linking to the original blog article.
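The data behind such a page is a mapping from each category to the titles and links of its posts. The XSLT itself is not reproduced here; as a hedged sketch of that grouping step only, in Python with a made-up sample export and without the HTML and JavaScript that show one category at a time:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"wp": "http://wordpress.org/export/1.2/"}

# Hypothetical sample export with two published posts
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
  <item>
    <title>Post A</title><link>http://example.com/a</link>
    <category domain="category" nicename="xslt"><![CDATA[XSLT]]></category>
    <category domain="category" nicename="wordpress"><![CDATA[WordPress]]></category>
    <wp:post_type>post</wp:post_type><wp:status>publish</wp:status>
  </item>
  <item>
    <title>Post B</title><link>http://example.com/b</link>
    <category domain="category" nicename="xslt"><![CDATA[XSLT]]></category>
    <wp:post_type>post</wp:post_type><wp:status>publish</wp:status>
  </item>
</channel>
</rss>"""


def table_of_contents(xml_text):
    """Map each category label to the (title, link) pairs of its published posts."""
    root = ET.fromstring(xml_text)
    toc = defaultdict(list)
    for item in root.iterfind("channel/item"):
        if (item.findtext("wp:post_type", namespaces=NS) != "post"
                or item.findtext("wp:status", namespaces=NS) != "publish"):
            continue  # skip pages, drafts and other item types
        entry = (item.findtext("title"), item.findtext("link"))
        for cat in item.iterfind("category[@domain='category']"):
            toc[cat.text].append(entry)
    # Alphabetical order of categories, as in a table of contents
    return dict(sorted(toc.items()))


for category, posts in table_of_contents(SAMPLE).items():
    print(category)
    for title, link in posts:
        print(f"  {title}: {link}")
```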

The new XSLT is available for download here.

Applied to this blog, the generated Table of Contents is here.

Converting WordPress Export XML to HTML

WordPress implements an Export function which allows bloggers to download the contents of their blog as a single XML file.

Building on my earlier work in dbscript, which generates HTML documentation of a database schema using XML and XSL, I adapted dbscript’s “Single HTML” XSL to transform the WordPress Export XML into an HTML page.

This HTML page lists all article titles with their dates, categories, and a link to the original URL in the blog.

Clicking an article’s title expands the list to display the contents of the selected article.

To display paragraphs properly, I needed to replace newlines with <br /> elements, using the technique described on this page, which implements various replace operations in XSL.
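XSLT 1.0 has no built-in string replace function (fn:replace() only arrived with XPath 2.0), which is why a recursive template is needed there. For comparison, in a general-purpose language this is a plain string replacement; a minimal Python sketch, assuming the post body is already HTML as in the export and using a made-up sample body:

```python
def newlines_to_br(content):
    """Replace every line break in post content with a <br /> element.

    The content is assumed to be HTML already (as in content:encoded),
    so it is deliberately not escaped.
    """
    # Normalize Windows and old Mac line endings so each break yields one <br />
    normalized = content.replace("\r\n", "\n").replace("\r", "\n")
    # Keep the newline after <br /> so the generated HTML source stays readable
    return normalized.replace("\n", "<br />\n")


# Hypothetical post body with a blank line between two paragraphs
body = "First paragraph.\r\n\r\nSecond paragraph with <b>bold</b> text."
print(newlines_to_br(body))
```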

The WordPress HTML XSL file is available for download here.