In this post I want to talk about some of the misunderstandings I frequently encounter when discussing markdown as a format for authoring scholarly documents.
Microsoft Word is of course what almost all authors use in the life sciences and many other disciplines. One big reason for this is the file formats accepted my manuscript submission systems. By limiting the options to Microsoft Word (and maybe LaTeX), you make it impossible for authors to use other tools, even if they wanted to. Publishers should accept manuscripts in any reasonable file format, as I have argued before.
There are of course numerous alternatives to markdown, including Textile, AsciiDoc, MediaWiki Markup and reStructuredText. There will always be features that are better implemented in one of these languages, but I don’t think there is room for more than one major initiative for a scholarly markup language. And markdown has the right mix of features and broad support by tools and the community.
Related to this there is the argument against markdown that the format is a mess and that there are too many versions (or flavors) of it. While that is certainly a big problem with markdown, I would argue that with Pandoc we have a nice standard and reference implementation for Scholarly Markdown. Pandoc is constantly evolving, and the addition of support for arbitrary YAML metadata was the biggest new feature in 2013 for me.
LaTeX is a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation.
The LaTeX Team, 2014
Although LaTeX has solved many of the problems Scholarly Markdown tries to tackle a long time ago, it is still something else. LaTeX at its core is a typesetting system, which is not something Scholarly Markdown cares about for two reasons: a) the focus is on authoring documents, which are then submitted to other systems at publishers and elsewhere that are specialized in producing the final document, and b) the focus is on HTML and the web as this is where we want most of the interactions with scholarly documents to take place. This means that
We have to be very careful that we keep the right balance of simplicity and features in Scholarly Markdown. This means that sometimes we should just include the LaTeX code, e.g. for math.
WYSIWYG - What You See Is What You Get - is a user interface metaphor that is both a blessing and a curse. We desperately need better writing tools, and this of course also means user interfaces that help with that task. But the focus on creating a new authoring environment that focusses too much on WYSIWYG creates several problems:
Version control via git is central to Scholarly Markdown, and this can also be challenging for a WYSIWYG environment. But there are many good examples of how to make this work.
At the end of the day most scholarly publications in the life sciences are converted into JATS XML. Unfortunately central aspects of the format (e.g. the required document structure or required attributes) are difficult to enforce in an authoring environment. Even if you build a tool that can nicely handle this, I’m not so sure we want to burden an author with this, especially since the manuscript will usually undergo a lot of changes before it is accepted and then published.
Although the future for consuming scholarly documents is clearly HTML (and ePub), and there are great HTML editors, I’m not so sure that HTML will become the default for authoring environments. This is the reason why markdown and related markup languages were invented, and even with modern WYSIWYG editors working directly with HTML is not always the best choice. HTML has two problems: a) it is not as human-readable as markdown and therefore requires an additional layer for authoring, and b) it is not as structured as XML, which makes it difficult to create some of the rigid document structure required for scholarly documents. O’Reilly is trying to get more structure into HTML for print and digital books with HTMLBook, but with too much structure you might run into similar problems for authoring as discussed above for JATS XML. And of course you can include HTML in markdown documents.