
# remark-parse

[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
[![Sponsors][sponsors-badge]][collective]
[![Backers][backers-badge]][collective]
[![Chat][chat-badge]][chat]

[Parser][] for [**unified**][unified].
Parses Markdown to [**mdast**][mdast] syntax trees.
Used in the [**remark** processor][remark] but can be used on its own as well.
Can be [extended][extend] to change how Markdown is parsed.

## Sponsors

<!--lint ignore no-html-->

<table>
  <tr valign="top">
    <td width="33.33%" align="center" colspan="2">
      <a href="https://www.gatsbyjs.org">Gatsby</a><br>🥇<br><br>
      <a href="https://www.gatsbyjs.org"><img src="https://avatars1.githubusercontent.com/u/12551863?s=900&v=4"></a>
    </td>
    <td width="33.33%" align="center" colspan="2">
      <a href="https://vercel.com">Vercel</a><br>🥇<br><br>
      <!--OC has a sharper image-->
      <a href="https://vercel.com"><img src="https://images.opencollective.com/vercel/d8a5bee/logo/512.png"></a>
    </td>
    <td width="33.33%" align="center" colspan="2">
      <a href="https://www.netlify.com">Netlify</a><br><br><br>
      <!--OC has a sharper image-->
      <a href="https://www.netlify.com"><img src="https://images.opencollective.com/netlify/4087de2/logo/512.png"></a>
    </td>
  </tr>
  <tr valign="top">
    <td width="16.67%" align="center">
      <a href="https://www.holloway.com">Holloway</a><br><br><br>
      <a href="https://www.holloway.com"><img src="https://avatars1.githubusercontent.com/u/35904294?s=300&v=4"></a>
    </td>
    <td width="16.67%" align="center">
      <a href="https://themeisle.com">ThemeIsle</a><br>🥉<br><br>
      <a href="https://themeisle.com"><img src="https://twitter-avatar.now.sh/themeisle"></a>
    </td>
    <td width="16.67%" align="center">
      <a href="https://boostio.co">BoostIO</a><br>🥉<br><br>
      <a href="https://boostio.co"><img src="https://avatars1.githubusercontent.com/u/13612118?s=300&v=4"></a>
    </td>
    <td width="16.67%" align="center">
      <a href="https://expo.io">Expo</a><br>🥉<br><br>
      <a href="https://expo.io"><img src="https://avatars1.githubusercontent.com/u/12504344?s=300&v=4"></a>
    </td>
    <td width="50%" align="center" colspan="2">
      <br><br><br><br>
      <a href="https://opencollective.com/unified"><strong>You?</strong></a>
    </td>
  </tr>
</table>

## Install

[npm][]:

```sh
npm install remark-parse
```

## Use

```js
var unified = require('unified')
var createStream = require('unified-stream')
var markdown = require('remark-parse')
var remark2rehype = require('remark-rehype')
var html = require('rehype-stringify')

var processor = unified()
  .use(markdown, {commonmark: true})
  .use(remark2rehype)
  .use(html)

process.stdin.pipe(createStream(processor)).pipe(process.stdout)
```

[See **unified** for more examples »][unified]

## Contents

*   [API](#api)
    *   [`processor().use(parse[, options])`](#processoruseparse-options)
    *   [`parse.Parser`](#parseparser)
*   [Extending the `Parser`](#extending-the-parser)
    *   [`Parser#blockTokenizers`](#parserblocktokenizers)
    *   [`Parser#blockMethods`](#parserblockmethods)
    *   [`Parser#inlineTokenizers`](#parserinlinetokenizers)
    *   [`Parser#inlineMethods`](#parserinlinemethods)
    *   [`function tokenizer(eat, value, silent)`](#function-tokenizereat-value-silent)
    *   [`tokenizer.locator(value, fromIndex)`](#tokenizerlocatorvalue-fromindex)
    *   [`eat(subvalue)`](#eatsubvalue)
    *   [`add(node[, parent])`](#addnode-parent)
    *   [`add.test()`](#addtest)
    *   [`add.reset(node[, parent])`](#addresetnode-parent)
    *   [Turning off a tokenizer](#turning-off-a-tokenizer)
*   [Security](#security)
*   [Contribute](#contribute)
*   [License](#license)

## API

[See **unified** for API docs »][unified]

### `processor().use(parse[, options])`

Configure the `processor` to read Markdown as input and process
[**mdast**][mdast] syntax trees.

##### `options`

Options can be passed directly, or passed later through
[`processor.data()`][data].

###### `options.gfm`

GFM mode (`boolean`, default: `true`).

```markdown
hello ~~hi~~ world
```

Turns on:

*   [Fenced code blocks](https://help.github.com/articles/creating-and-highlighting-code-blocks#fenced-code-blocks)
*   [Autolinking of URLs](https://help.github.com/articles/autolinked-references-and-urls)
*   [Deletions (strikethrough)](https://help.github.com/articles/basic-writing-and-formatting-syntax#styling-text)
*   [Task lists](https://help.github.com/articles/basic-writing-and-formatting-syntax#task-lists)
*   [Tables](https://help.github.com/articles/organizing-information-with-tables)

###### `options.commonmark`

CommonMark mode (`boolean`, default: `false`).

```markdown
This is a paragraph
    and this is also part of the preceding paragraph.
```

Allows:

*   Empty lines to split block quotes
*   Parentheses (`(` and `)`) around link and image titles
*   Any escaped [ASCII punctuation][escapes] character
*   Closing parenthesis (`)`) as an ordered list marker
*   URL definitions in block quotes

Disallows:

*   Indented code blocks directly following a paragraph
*   ATX headings (`# Hash headings`) without spacing after opening hashes and
    before closing hashes
*   Setext headings (`Underline headings\n---`) when following a paragraph
*   Newlines in link and image titles
*   White space in link and image URLs in auto-links (links in brackets, `<` and
    `>`)
*   Lazy block quote continuation, lines not preceded by a greater than
    character (`>`), for lists, code, and thematic breaks

###### `options.pedantic`

⚠️ Pedantic was previously used to mimic old-style Markdown mode: no tables, no
fenced code, and with many bugs.
It’s currently still “working”, but please do not use it; it’ll be removed in
the future.

###### `options.blocks`

Blocks (`Array.<string>`, default: list of [block HTML elements][blocks]).

```markdown
<block>foo
</block>
```

Defines which HTML elements are seen as block level.

### `parse.Parser`

Access to the [parser][], if you need it.

## Extending the `Parser`

Typically, using [*transformers*][transformer] to manipulate a syntax tree
produces the desired output.
Sometimes, such as when introducing new syntactic entities with a certain
precedence, interfacing with the parser is necessary.

If the `remark-parse` plugin is used, it adds a [`Parser`][parser] constructor
function to the `processor`.
Other plugins can add tokenizers to its prototype to change how Markdown is
parsed.

The below plugin adds a [tokenizer][] for at-mentions.

```js
module.exports = mentions

function mentions() {
  var Parser = this.Parser
  var tokenizers = Parser.prototype.inlineTokenizers
  var methods = Parser.prototype.inlineMethods

  // Add an inline tokenizer (defined in the following example).
  tokenizers.mention = tokenizeMention

  // Run it just before `text`.
  methods.splice(methods.indexOf('text'), 0, 'mention')
}
```

### `Parser#blockTokenizers`

Map of names to [tokenizer][]s (`Object.<Function>`).
These tokenizers (such as `fencedCode`, `table`, and `paragraph`) eat from the
start of a value to a line ending.

See `#blockMethods` below for a list of methods that are included by default.

### `Parser#blockMethods`

List of `blockTokenizers` names (`Array.<string>`).
Specifies the order in which tokenizers run.

Precedence of default block methods is as follows:

<!--methods-block start-->

*   `blankLine`
*   `indentedCode`
*   `fencedCode`
*   `blockquote`
*   `atxHeading`
*   `thematicBreak`
*   `list`
*   `setextHeading`
*   `html`
*   `definition`
*   `table`
*   `paragraph`

<!--methods-block end-->
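A plugin that adds a new block construct typically splices its method name into
this list at the desired precedence, just like the inline `mention` example
above. A self-contained sketch on a copy of the default order (`fancyBlock` is
a hypothetical name):

```js
// Copy of the default block method order listed above.
var blockMethods = [
  'blankLine', 'indentedCode', 'fencedCode', 'blockquote',
  'atxHeading', 'thematicBreak', 'list', 'setextHeading',
  'html', 'definition', 'table', 'paragraph'
]

// Run a custom `fancyBlock` tokenizer just before `paragraph`, so
// paragraphs do not swallow its syntax first.
blockMethods.splice(blockMethods.indexOf('paragraph'), 0, 'fancyBlock')

console.log(blockMethods.slice(-2)) // ['fancyBlock', 'paragraph']
```

In a real plugin you would perform the same splice on
`Parser.prototype.blockMethods` and register the tokenizer on
`Parser.prototype.blockTokenizers`.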

### `Parser#inlineTokenizers`

Map of names to [tokenizer][]s (`Object.<Function>`).
These tokenizers (such as `url`, `reference`, and `emphasis`) eat from the start
of a value.
To increase performance, they depend on [locator][]s.

See `#inlineMethods` below for a list of methods that are included by default.

### `Parser#inlineMethods`

List of `inlineTokenizers` names (`Array.<string>`).
Specifies the order in which tokenizers run.

Precedence of default inline methods is as follows:

<!--methods-inline start-->

*   `escape`
*   `autoLink`
*   `url`
*   `email`
*   `html`
*   `link`
*   `reference`
*   `strong`
*   `emphasis`
*   `deletion`
*   `code`
*   `break`
*   `text`

<!--methods-inline end-->

### `function tokenizer(eat, value, silent)`

There are two types of tokenizers: block level and inline level.
Both are functions, and work the same, but inline tokenizers must have a
[locator][].

The following example shows an inline tokenizer that is added by the mentions
plugin above.

```js
tokenizeMention.notInLink = true
tokenizeMention.locator = locateMention

function tokenizeMention(eat, value, silent) {
  var match = /^@(\w+)/.exec(value)

  if (match) {
    if (silent) {
      return true
    }

    return eat(match[0])({
      type: 'link',
      url: 'https://social-network/' + match[1],
      children: [{type: 'text', value: match[0]}]
    })
  }
}
```

Tokenizers *test* whether a document starts with a certain syntactic entity.
In *silent* mode, they return whether that test passes.
In *normal* mode, they consume that token, a process which is called “eating”.
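The two modes can be demonstrated with a stub `eat` (`fakeEat` below is an
illustration-only stand-in; the real `eat` also patches positional information
and adds the node to the tree):

```js
// Same tokenizer as above, repeated so this sketch is self-contained.
function tokenizeMention(eat, value, silent) {
  var match = /^@(\w+)/.exec(value)

  if (match) {
    if (silent) return true
    return eat(match[0])({type: 'text', value: match[0]})
  }
}

// Illustration-only stand-in for `eat`.
function fakeEat(subvalue) {
  return function add(node) {
    return node
  }
}

console.log(tokenizeMention(fakeEat, '@alice hi', true)) // true
console.log(tokenizeMention(fakeEat, '@alice hi', false)) // {type: 'text', value: '@alice'}
console.log(tokenizeMention(fakeEat, 'no mention here', true)) // undefined
```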

Locators enable inline tokenizers to function faster by providing where the next
entity may occur.

###### Signatures

*   `Node? = tokenizer(eat, value)`
*   `boolean? = tokenizer(eat, value, silent)`

###### Parameters

*   `eat` ([`Function`][eat]) — Eat, when applicable, an entity
*   `value` (`string`) — Value which may start an entity
*   `silent` (`boolean`, optional) — Whether to detect or consume

###### Properties

*   `locator` ([`Function`][locator]) — Required for inline tokenizers
*   `onlyAtStart` (`boolean`) — Whether nodes can only be found at the beginning
    of the document
*   `notInBlock` (`boolean`) — Whether nodes cannot be in block quotes or lists
*   `notInList` (`boolean`) — Whether nodes cannot be in lists
*   `notInLink` (`boolean`) — Whether nodes cannot be in links

###### Returns

*   `boolean?`, in *silent* mode — whether a node can be found at the start of
    `value`
*   [`Node?`][node], in *normal* mode — the node, if one can be found at the
    start of `value`

### `tokenizer.locator(value, fromIndex)`

Locators are required for inline tokenizers.
Their role is to keep parsing performant.

The following example shows a locator that is added by the mentions tokenizer
above.

```js
function locateMention(value, fromIndex) {
  return value.indexOf('@', fromIndex)
}
```

Locators enable inline tokenizers to function faster by providing information on
where the next entity *may* occur.
Locators may be wrong; it’s OK if there actually isn’t a node to be found at the
index they return.
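To see why this helps, consider a simplified scanning loop (a sketch, not the
actual parser internals): instead of testing an inline tokenizer at every
character, the parser can jump straight from one candidate index to the next.

```js
function locateMention(value, fromIndex) {
  return value.indexOf('@', fromIndex)
}

// Sketch: collect every index where a mention *may* start.
function possibleStarts(value, locate) {
  var indices = []
  var index = locate(value, 0)

  while (index !== -1) {
    indices.push(index)
    index = locate(value, index + 1)
  }

  return indices
}

console.log(possibleStarts('hi @alice and @bob', locateMention)) // [3, 14]
```

Only those candidate indices then need a full (possibly expensive) tokenizer
check.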

###### Parameters

*   `value` (`string`) — Value which may contain an entity
*   `fromIndex` (`number`) — Position to start searching at

###### Returns

`number` — Index at which an entity may start, or `-1` otherwise.

### `eat(subvalue)`

```js
var add = eat('foo')
```

Eat `subvalue`, which is a string at the start of the [tokenized][tokenizer]
`value`.

###### Parameters

*   `subvalue` (`string`) — Value to eat

###### Returns

[`add`][add].

### `add(node[, parent])`

```js
var add = eat('foo')

add({type: 'text', value: 'foo'})
```

Add [positional information][position] to `node` and add `node` to `parent`.

###### Parameters

*   `node` ([`Node`][node]) — Node to patch position on and to add
*   `parent` ([`Parent`][parent], optional) — Place to add `node` to in the
    syntax tree.
    Defaults to the currently processed node

###### Returns

[`Node`][node] — The given `node`.

### `add.test()`

Get the [positional information][position] that would be patched on `node` by
`add`.

###### Returns

[`Position`][position].

### `add.reset(node[, parent])`

`add`, but resets the internal position.
Useful for example in lists, where the same content is first eaten for a list,
and later for list items.

###### Parameters

*   `node` ([`Node`][node]) — Node to patch position on and insert
*   `parent` ([`Node`][node], optional) — Place to add `node` to in
    the syntax tree.
    Defaults to the currently processed node

###### Returns

[`Node`][node] — The given node.

### Turning off a tokenizer

In some situations, you may want to turn off a tokenizer to avoid parsing that
syntactic feature.

Preferably, use the [`remark-disable-tokenizers`][remark-disable-tokenizers]
plugin to turn off tokenizers.

Alternatively, this can be done by replacing the tokenizer from
`blockTokenizers` (or `blockMethods`) or `inlineTokenizers` (or
`inlineMethods`).

The following example turns off indented code blocks:

```js
var remarkParse = require('remark-parse')

remarkParse.Parser.prototype.blockTokenizers.indentedCode = indentedCode

function indentedCode() {
  return true
}
```
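Removing the method name from `blockMethods` also works. A self-contained
sketch on an abbreviated copy of the list (in practice you would operate on
`Parser.prototype.blockMethods`):

```js
// Abbreviated copy of the default block method order.
var blockMethods = ['blankLine', 'indentedCode', 'fencedCode', 'paragraph']

// Drop `indentedCode` so the tokenizer never runs.
var index = blockMethods.indexOf('indentedCode')
if (index !== -1) blockMethods.splice(index, 1)

console.log(blockMethods) // ['blankLine', 'fencedCode', 'paragraph']
```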

## Security

As Markdown is sometimes used for HTML, and improper use of HTML can open you up
to a [cross-site scripting (XSS)][xss] attack, use of remark can also be unsafe.
When going to HTML, use remark in combination with the [**rehype**][rehype]
ecosystem, and use [`rehype-sanitize`][sanitize] to make the tree safe.

Use of remark plugins could also open you up to other attacks.
Carefully assess each plugin and the risks involved in using them.

## Contribute

See [`contributing.md`][contributing] in [`remarkjs/.github`][health] for ways
to get started.
See [`support.md`][support] for ways to get help.
Ideas for new plugins and tools can be posted in [`remarkjs/ideas`][ideas].

A curated list of awesome remark resources can be found in [**awesome
remark**][awesome].

This project has a [code of conduct][coc].
By interacting with this repository, organization, or community you agree to
abide by its terms.

## License

[MIT][license] © [Titus Wormer][author]

<!-- Definitions -->

[build-badge]: https://img.shields.io/travis/remarkjs/remark.svg

[build]: https://travis-ci.org/remarkjs/remark

[coverage-badge]: https://img.shields.io/codecov/c/github/remarkjs/remark.svg

[coverage]: https://codecov.io/github/remarkjs/remark

[downloads-badge]: https://img.shields.io/npm/dm/remark-parse.svg

[downloads]: https://www.npmjs.com/package/remark-parse

[size-badge]: https://img.shields.io/bundlephobia/minzip/remark-parse.svg

[size]: https://bundlephobia.com/result?p=remark-parse

[sponsors-badge]: https://opencollective.com/unified/sponsors/badge.svg

[backers-badge]: https://opencollective.com/unified/backers/badge.svg

[collective]: https://opencollective.com/unified

[chat-badge]: https://img.shields.io/badge/chat-spectrum-7b16ff.svg

[chat]: https://spectrum.chat/unified/remark

[health]: https://github.com/remarkjs/.github

[contributing]: https://github.com/remarkjs/.github/blob/HEAD/contributing.md

[support]: https://github.com/remarkjs/.github/blob/HEAD/support.md

[coc]: https://github.com/remarkjs/.github/blob/HEAD/code-of-conduct.md

[ideas]: https://github.com/remarkjs/ideas

[awesome]: https://github.com/remarkjs/awesome-remark

[license]: https://github.com/remarkjs/remark/blob/main/license

[author]: https://wooorm.com

[npm]: https://docs.npmjs.com/cli/install

[unified]: https://github.com/unifiedjs/unified

[data]: https://github.com/unifiedjs/unified#processordatakey-value

[remark]: https://github.com/remarkjs/remark/tree/main/packages/remark

[blocks]: https://github.com/remarkjs/remark/blob/main/packages/remark-parse/lib/block-elements.js

[mdast]: https://github.com/syntax-tree/mdast

[escapes]: https://spec.commonmark.org/0.29/#backslash-escapes

[node]: https://github.com/syntax-tree/unist#node

[parent]: https://github.com/syntax-tree/unist#parent

[position]: https://github.com/syntax-tree/unist#position

[parser]: https://github.com/unifiedjs/unified#processorparser

[transformer]: https://github.com/unifiedjs/unified#function-transformernode-file-next

[extend]: #extending-the-parser

[tokenizer]: #function-tokenizereat-value-silent

[locator]: #tokenizerlocatorvalue-fromindex

[eat]: #eatsubvalue

[add]: #addnode-parent

[remark-disable-tokenizers]: https://github.com/zestedesavoir/zmarkdown/tree/HEAD/packages/remark-disable-tokenizers

[xss]: https://en.wikipedia.org/wiki/Cross-site_scripting

[rehype]: https://github.com/rehypejs/rehype

[sanitize]: https://github.com/rehypejs/rehype-sanitize