Markup languages: contemplated, categorized, and criticized



As part of my FOSS project in digital typesetting, I am interested in the depths of markup languages. How did markup languages emerge? How powerful and extensible should they be? In this talk, I want to summarize my findings.

Markup languages like Markdown and MediaWiki are widely used by software developers to specify formatted content. One might think that all problems are solved, because most software developers resort to “Markdown” whenever users need to write text these days. But are they? Once you discover the differences between Markdown dialects, you start to wonder whether Markdown is really a good choice.

In my investigations, I started with the simple question of how to build a simple markup language parser. It turns out that the simplicity of the parser is orthogonal to user friendliness. At the same time, I recognized that escaping mechanisms can indeed be designed to be simple and memorable, yet they are too often forgotten. When I continued with the parsing topic, I recognized that markup languages do not all serve the same purpose and need to be categorized. But this seems nontrivial. What is a “lightweight” markup language? What is a “document” markup language? And how can the simplicity of syntax be measured? What makes “XML” extensible? And what limits the adoption of markup languages besides Markdown?

In this talk, I want to revisit how SGML turned into XML, remember the good old BBCode of board software from the 2000s, look at how wiki syntaxes were popularized and vanished, and see how GitHub made Markdown a big player. Afterwards, I will present my findings and propose how to address the related questions.