Markdown Language Support



Markdown and Visual Studio Code. Working with Markdown files in Visual Studio Code is simple, straightforward, and fun. Besides VS Code's basic editing, there are a number of Markdown specific features that will help you be more productive.

Marked is a popular Markdown parser written in JavaScript. I looked into its source code and found that it was actually not hard to understand. All the important code is in a single JavaScript file of only about a thousand lines.

The source code is made up of the following parts:

  1. Rules/grammar for block-level components (such as headings and blockquotes), together with a Lexer.
  2. Rules/grammar for inline components (such as strong tags and links), together with an InlineLexer.
  3. A Renderer that outputs the final HTML code.
  4. A Parser that takes the results from the lexer and passes them to the Renderer.
  5. Some helper functions.

Alright, I know some of these words might sound unfamiliar. I will explain them in the sections below.

A Brief and Incomplete Introduction of Some Compiler Concepts

I have never taken a compiler design class, so I can't go into the details of how a compiler works. Basically, a program needs to go through 5 stages before it can be executed.

Lexer

A lexer converts a program into a list of tokens. Tokens are anything that has meaning in a program, for example, variables, operators, semicolons, or even indentation. A simple program like the one below,

    a = 12; b = 2; c = a + b;

can be converted into the following tokens: a, =, 12, ;, b, =, 2, ;, c, =, a, +, b, ;.

If you want to write a new language, most likely you don't need to implement a Lexer. You can find a lexer implementation in any language of your choice.

How does the lexer work internally? Intuitively, it just reads through the code from beginning to end and tries each pattern at the current position. If one of them matches, it spits out a token and moves the read head to the next position.
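The loop described above can be sketched in a few lines of JavaScript. This is only a minimal illustration for the tiny assignment language used in the example, not code from Marked; the pattern list and the tokenize function are made-up names.

```javascript
// Hypothetical token patterns for a tiny assignment language (not from Marked).
const PATTERNS = [
  { type: "whitespace", regex: /^\s+/ },
  { type: "number", regex: /^\d+/ },
  { type: "identifier", regex: /^[A-Za-z_]\w*/ },
  { type: "operator", regex: /^[=+\-*\/]/ },
  { type: "semicolon", regex: /^;/ },
];

function tokenize(source) {
  const tokens = [];
  let rest = source; // the unread part of the input; its start is the read head
  while (rest.length > 0) {
    let matched = false;
    // Try each pattern in order at the current read position.
    for (const { type, regex } of PATTERNS) {
      const cap = regex.exec(rest);
      if (cap === null) continue;
      // Whitespace only separates tokens, so it is consumed but not emitted.
      if (type !== "whitespace") {
        tokens.push({ type, value: cap[0] });
      }
      // Move the read head past the matched text.
      rest = rest.slice(cap[0].length);
      matched = true;
      break;
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character: ${rest[0]}`);
    }
  }
  return tokens;
}

const values = tokenize("a = 12; b = 2; c = a + b;").map((t) => t.value);
console.log(values.join(" ")); // a = 12 ; b = 2 ; c = a + b ;
```

The order of the patterns matters in a lexer like this: the first pattern that matches wins, so more specific patterns should come before more general ones.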

For Markdown, the tokens are the elements. For example,

    # Beautiful Day

    This is a beautiful day, because finally someone reads my blog

    > Thank You

is made of three tokens: a level 1 heading (value: 'Beautiful Day'), a paragraph (value: 'This is a beautiful day, because finally someone reads my blog') and a blockquote (value: 'Thank You').
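To make this concrete, here is a rough sketch of a block-level lexer that produces those three tokens. The rules below are heavily simplified and hypothetical (Marked's real grammar handles many more cases), and the token shape is an assumption made for illustration.

```javascript
// Simplified, hypothetical block rules; Marked's real rules cover many more cases.
const BLOCK_RULES = [
  { type: "space", regex: /^\n+/ },
  { type: "heading", regex: /^(#{1,6}) +([^\n]+)(?:\n|$)/ },
  { type: "blockquote", regex: /^> ?([^\n]+)(?:\n|$)/ },
  // Catch-all: any other non-empty line is treated as a paragraph.
  { type: "paragraph", regex: /^([^\n]+)(?:\n|$)/ },
];

function lexBlocks(src) {
  const tokens = [];
  while (src.length > 0) {
    for (const { type, regex } of BLOCK_RULES) {
      const cap = regex.exec(src);
      if (cap === null) continue;
      // Consume the matched text so the next iteration starts after it.
      src = src.slice(cap[0].length);
      if (type === "heading") {
        tokens.push({ type, depth: cap[1].length, text: cap[2] });
      } else if (type !== "space") {
        tokens.push({ type, text: cap[1] });
      }
      break; // "space" and "paragraph" together match any non-empty input
    }
  }
  return tokens;
}

const sample = [
  "# Beautiful Day",
  "",
  "This is a beautiful day, because finally someone reads my blog",
  "",
  "> Thank You",
  "",
].join("\n");

console.log(lexBlocks(sample).map((t) => t.type)); // [ 'heading', 'paragraph', 'blockquote' ]
```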

Parser

A parser tries to understand the program at a higher level than the lexer. Based on the tokens, the parser can split the program into different sections. For example, if it sees a semicolon token, it knows that the tokens before and after it belong to two different statements. If it sees an operator surrounded by variables, it's probably an expression. Some sections might contain multiple smaller sections. For example, an if block is composed of a condition expression, the then clause, and possibly the else clause. We can view the structure of a program as a tree. The root is the whole program, and the leaves are variables, constants, operators, etc. This tree is called an AST (Abstract Syntax Tree). Wikipedia's article on abstract syntax trees includes an example of an AST.

In terms of Markdown, a parser doesn't need to build an AST. The parser in Marked.js simply loops over the tokens.

Yeah, that's right: since Markdown doesn't have many nested structures, it just parses the tokens one by one, in order.
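A rough sketch of such a token-by-token parser is shown below. It is not the actual Marked source: the renderer object, the token shape, and the parse function are assumptions made for illustration, and the tokens are written by hand so the example stands on its own.

```javascript
// Hypothetical renderer: one method per token type, each returning an HTML string.
const renderer = {
  heading: (text, depth) => `<h${depth}>${text}</h${depth}>\n`,
  paragraph: (text) => `<p>${text}</p>\n`,
  blockquote: (text) => `<blockquote>${text}</blockquote>\n`,
  hr: () => "<hr>\n",
};

// Walk the token list once, in order, and concatenate the rendered output.
// No tree is built: each token is handled independently of its neighbors.
function parse(tokens) {
  let out = "";
  for (const token of tokens) {
    switch (token.type) {
      case "hr":
        out += renderer.hr();
        break;
      case "heading":
        out += renderer.heading(token.text, token.depth);
        break;
      case "paragraph":
        out += renderer.paragraph(token.text);
        break;
      case "blockquote":
        out += renderer.blockquote(token.text);
        break;
      default:
        throw new Error(`Unknown token type: ${token.type}`);
    }
  }
  return out;
}

// Hand-written tokens standing in for the lexer's output.
const tokens = [
  { type: "heading", depth: 1, text: "Beautiful Day" },
  { type: "paragraph", text: "This is a beautiful day, because finally someone reads my blog" },
  { type: "blockquote", text: "Thank You" },
];

console.log(parse(tokens));
```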

A lexer and a parser are all we need in order to parse a language as simple as Markdown. If you are interested in learning about the further steps in compiling or interpreting a programming language, I would suggest watching this introductory video.

Lexer vs. InlineLexer

In Marked.js, there are two kinds of lexers, and their implementations are very similar.

Lexer breaks the input string into tokens like heading, code, table, blockquotes, etc.

After the lexer generates those tokens, a parser tries to convert each token into HTML output. If a token is very simple, such as space, the parser just does the conversion itself (it just returns ' ').

If a token is more complicated, such as hr (horizontal rule), the parser calls a renderer to produce an HTML string. If a token might contain inline elements (a paragraph might contain bold tags, for example), the parser calls inlineLexer to lex the token further.

It's interesting that inlineLexer actually shortens the path from a raw string to HTML output. When it finds a match for a rule, it simply returns HTML output instead of returning a token.
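The idea of an inline lexer that emits HTML directly, without an intermediate token list, can be sketched as follows. The two rules and the inlineToHtml function are hypothetical simplifications, not Marked's own rules, and the sketch does not escape HTML.

```javascript
// Hypothetical inline rules: each pairs a pattern with a function that renders HTML.
const INLINE_RULES = [
  { regex: /^\*\*([^*]+)\*\*/, render: (cap) => `<strong>${cap[1]}</strong>` },
  { regex: /^\[([^\]]+)\]\(([^)]+)\)/, render: (cap) => `<a href="${cap[2]}">${cap[1]}</a>` },
  // Catch-all: one character, then plain text up to the next possible rule start.
  { regex: /^[\s\S][^*\[]*/, render: (cap) => cap[0] },
];

function inlineToHtml(src) {
  let out = "";
  while (src.length > 0) {
    for (const { regex, render } of INLINE_RULES) {
      const cap = regex.exec(src);
      if (cap === null) continue;
      // Append HTML right away instead of pushing a token.
      out += render(cap);
      src = src.slice(cap[0].length);
      break; // the catch-all rule guarantees progress on every pass
    }
  }
  return out;
}

console.log(inlineToHtml("Read **this** [post](https://example.com) now"));
// Read <strong>this</strong> <a href="https://example.com">post</a> now
```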

The Process

So to sum up, this is how Marked.js turns Markdown in string form into HTML output (also a string):

  1. Construct the Lexer and inlineLexer with regex rules that match the different elements of Markdown.
  2. Call the Lexer on the input string, turning it into an array of tokens.
  3. The Parser iterates over the tokens and converts them into HTML strings.
  4. If a token might contain inline elements, call inlineLexer on it to find the smaller elements, then concatenate the output HTML for those smaller elements.
  5. Concatenate the outputs for all tokens.
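The five steps above can be wired together in a compact end-to-end sketch. Everything here is a simplified assumption rather than Marked's real code: only headings, paragraphs, and bold text are supported, the inline step uses a single global regex replace instead of a read-head loop, and the renderer is passed in as a plain object so that it can be swapped out.

```javascript
// Step 1: regex rules (hypothetical and far smaller than Marked's real grammar).
const block = {
  heading: /^(#{1,6}) +([^\n]+)\n*/,
  paragraph: /^([^\n]+)\n*/,
};
const inline = { strong: /\*\*([^*]+)\*\*/g };

// Step 2: the lexer turns the input string into an array of block tokens.
function lex(src) {
  const tokens = [];
  src = src.replace(/^\n+/, ""); // drop leading blank lines
  while (src.length > 0) {
    let cap = block.heading.exec(src);
    if (cap !== null) {
      tokens.push({ type: "heading", depth: cap[1].length, text: cap[2] });
    } else {
      cap = block.paragraph.exec(src); // always matches a non-empty line
      tokens.push({ type: "paragraph", text: cap[1] });
    }
    src = src.slice(cap[0].length);
  }
  return tokens;
}

// Step 4: the inline lexer returns HTML directly for inline elements.
function lexInline(text, renderer) {
  return text.replace(inline.strong, (_, inner) => renderer.strong(inner));
}

// Steps 3 and 5: convert each token to HTML, then concatenate the results.
function toHtml(src, renderer) {
  return lex(src)
    .map((token) => {
      const body = lexInline(token.text, renderer);
      return token.type === "heading"
        ? renderer.heading(body, token.depth)
        : renderer.paragraph(body);
    })
    .join("");
}

const htmlRenderer = {
  strong: (text) => `<strong>${text}</strong>`,
  heading: (text, depth) => `<h${depth}>${text}</h${depth}>\n`,
  paragraph: (text) => `<p>${text}</p>\n`,
};

console.log(toHtml("# Hello\n\nThis is **bold** text.\n", htmlRenderer));
```

Because the renderer is just an object with one function per element, a different renderer with the same three functions could produce a different output format without touching the lexer or the parser.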

Why Do We Care

Why do we even bother looking into the source code of a library? I think it's interesting to see how things work. As inexperienced developers, we tend to view some libraries as pure magic and try to avoid their internals. But as soon as we dig into them, we find that they are actually very reasonable.

What interests me about this particular library is that it breaks the parsing of Markdown into several steps. What if we changed the renderer to output React elements instead of HTML strings? I will do more research on it and hopefully have a post about it. Stay tuned.

Markdown is a lightweight markup language for adding formatting elements to plain text. PyCharm recognizes Markdown files, provides a dedicated editor with highlighting, completion, and formatting, and shows the rendered HTML in a live preview pane.

Create a new Markdown file

By default, PyCharm recognizes any file with the .md or .markdown extension as a Markdown file.

  1. Right-click a directory in the Project tool window Alt+1 and select New | File.

    Alternatively, you can select the necessary directory, press Alt+Insert, and then select File.

  2. Enter a name for your file with a recognized extension, for example: readme.md.

Support

The Markdown editor provides several basic formatting actions in the toolbar:

  • Bold

  • Strikethrough

  • Italic

  • Code

  • Decrease heading level

  • Increase heading level

  • Convert an inline link to a reference link

You can use the preview pane to see the rendered HTML.

There is also completion for links to files in the current project, for example, if you need to reference source code, images, or other Markdown files.

Code blocks

To insert a fenced code block, use triple backticks (```) before and after the code block. If you specify the language for the code block, by default, the Markdown editor injects the corresponding language. This enables syntax highlighting and other coding assistance features for the specified language: code completion, inspections, and intention actions.

Disable coding assistance in code blocks

If your code blocks are not meant to be syntactically correct, you may want to disable language injection and syntax error highlighting in code blocks.

  1. In the Settings/Preferences dialog Ctrl+Alt+S, select Languages & Frameworks | Markdown.

  2. Configure the following options:

    Disable automatic language injection in code fences: Do not inject any coding assistance for code blocks.
    Hide errors in code fences: Do not check the syntax for errors.
  3. Click OK to apply the changes.

Diagrams

The Markdown editor can render diagrams defined with Mermaid and PlantUML. This is disabled by default and requires the corresponding Markdown extensions.

Enable diagram support

  1. In the Settings/Preferences dialog Ctrl+Alt+S, select Languages & Frameworks | Markdown.

  2. Enable either Mermaid or PlantUML under Markdown Extensions.

  3. After PyCharm downloads the relevant extensions, click OK to apply the changes.

HTML preview

By default, the Markdown editor shows a preview pane next to it with the HTML rendered from the Markdown file. You can use the buttons in the top right corner of the Markdown editor to show only the editor or only the preview pane.

The scrollbars in the editor and in the preview pane are synchronized, meaning that the location in the preview pane corresponds to the location in the source. To disable this, click the scroll synchronization button in the top right corner of the Markdown editor.

To split the editor and preview pane horizontally (top and bottom) instead of the default vertical split, in the Settings/Preferences dialog Ctrl+Alt+S, select Languages & Frameworks | Markdown, and then select Split horizontally under Editor and Preview Panel Layout.

Custom CSS

PyCharm provides default style sheets for rendering HTML in the preview pane. These style sheets were designed to be consistent with the default UI themes. You can configure specific CSS rules to make small presentation changes (for example, change the font size for headings or line spacing in lists) or you can provide an entirely new CSS to better match your expected output (for example, if you want to replicate the GitHub Markdown style).

  1. In the Settings/Preferences dialog Ctrl+Alt+S, select Languages & Frameworks | Markdown.

  2. Configure the settings under Custom CSS:

    • Select Load from URI to specify the location of a custom CSS file.

    • Select Add CSS rules to enter specific CSS rules that you want to override.

Reformat Markdown files

PyCharm can format Markdown files with proper line wrappings, blank lines, and indentation. For more information, see Reformat and rearrange code.

  • From the main menu, select Code | Reformat Code or press Ctrl+Alt+L.

PyCharm formats the contents according to the code style settings for Markdown files.

Configure Markdown code style settings

  • In the Settings/Preferences dialog Ctrl+Alt+S, select Editor | Code Style | Markdown.

Markdown code style settings include the following:

Configure the options for breaking lines.

Hard wrap at: Specify at which column to put a line break. PyCharm shows a vertical line at the specified column and breaks lines between words, not within words.
Wrap on typing: Add line breaks as you type. Disable this option to add line breaks only when PyCharm performs formatting.
Visual guides: Show an additional vertical line at the specified column.

Configure the options for nesting text blocks and alignment within a block.

Use tab character: Use the tab character for indentation. Disable this option to use spaces for indentation.
Smart tabs: Nest blocks with tabs and align with spaces. Disable this option to use only tabs and replace spaces that fit the specified tab size with tabs.
Tab size: Specify the number of spaces to render in place of one tab character.
Indent: Specify the number of spaces used for each indentation level.
Continuation indent: Specify the number of spaces used for continuing the same text block.
Keep indents on empty lines: Retain tabs and spaces on empty lines. By default, this option is disabled and PyCharm removes tabs and spaces if there is nothing else on that line.

Set the maximum and minimum number of blank lines to keep for various text elements.

Around header: Before and after chapter headings.
Around block elements: Before and after code blocks.
Between paragraphs: Between two adjacent paragraphs.

Specify which elements should have exactly one space.

Between words: Remove extra spaces between words.
After header symbol: Remove extra spaces or add a missing space between the header symbol and the header title.
After list marker: Remove extra spaces or add a missing space between the list item marker and the list item text.
After blockquote marker: Remove extra spaces or add a missing space between the block quote marker and the text of the block quote.

Productivity tips

Customize highlighting for Markdown

PyCharm highlights various Markdown elements according to the color scheme settings.

  1. In the Settings/Preferences dialog Ctrl+Alt+S, select Editor | Color Scheme | Markdown.

  2. Select the color scheme, accept the highlighting settings inherited from defaults, or customize them as described in Configuring colors and fonts.

Navigate in a large Markdown file

  • Use the Structure tool window Alt+7 or the File Structure popup Ctrl+F12 to view and jump to the relevant headings.

Markdown does not have dedicated syntax for commenting out lines. However, it is possible to emulate a comment line using a link label without an address, with the comment text in parentheses.

There must be a blank line before the link label.

  • Put the caret at the line that you want to comment out and press Ctrl+/.

    This will add a link label with the commented out text in parentheses and a blank line before it if necessary. Press the same shortcut to uncomment.

Last modified: 06 April 2021