To check whether a couple of PDF files contain only images, I used the latest release build of PDFsharp, version 1.32.
However, when processing a certain file (of unknown origin) using code found in an SO answer:
    public static IEnumerable<string> ExtractText(this PdfPage page)
    {
        var content = ContentReader.ReadContent(page);
        var text = content.ExtractText();
        return text;
    }
the ExtractText() function simply would not return.
I upgraded to the most current build, 1.50 beta 3, included the source in my project, and ran it in Debug mode. Execution halted in the file PDFsharp\src\PdfSharp\Pdf.Content\CParser.cs, line 163, on a failing assertion:
    #if DEBUG
        default:
            Debug.Assert(false);
            break;
    #endif
Without digging too deep into the analysis of PDF files, it was clear that the PDF contained a CSymbol that the library does not handle, so the parser (most likely) ended up in an infinite loop inside CParser.ParseObject().
I worked around this by replacing the Debug.Assert statement with

    throw new Exception("unhandled PDF symbol " + symbol);

which resolved the situation for me.
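
For illustration, here is a minimal, self-contained sketch of the fail-fast idea. The Symbol enum and the ParserSketch class are hypothetical stand-ins, not PDFsharp's actual types. Note that Debug.Assert calls are removed when DEBUG is not defined, and a default branch wrapped in #if DEBUG is missing from a Release build altogether, so an unknown symbol would presumably pass through the switch unnoticed there. An unconditional throw in the default branch surfaces the problem in either configuration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the parser's symbol type (not PDFsharp's CSymbol).
enum Symbol { Integer, Name, Operator, Unknown, Eof }

static class ParserSketch
{
    // Walks the symbols and returns how many were handled before Eof.
    // Throws instead of silently ignoring a symbol it does not know.
    public static int Parse(IEnumerable<Symbol> symbols)
    {
        int handled = 0;
        foreach (var symbol in symbols)
        {
            switch (symbol)
            {
                case Symbol.Integer:
                case Symbol.Name:
                case Symbol.Operator:
                    handled++;
                    break;

                case Symbol.Eof:
                    return handled;

                // Not wrapped in #if DEBUG, so it is active in Debug and Release.
                default:
                    throw new NotSupportedException("unhandled PDF symbol " + symbol);
            }
        }
        return handled;
    }
}
```

A caller that scans many files could then catch the exception per file, log it, and move on to the next file instead of hanging.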