he 
 
 
he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.
Installation
Via npm:
npm install he
Via Bower:
bower install he
Via Component:
component install mathiasbynens/he
In a browser:
<script src="he.js"></script>
In Node.js, io.js, Narwhal, and RingoJS:
var he = require('he');
In Rhino:
load('he.js');
Using an AMD loader like RequireJS:
require(
  {
    'paths': {
      'he': 'path/to/he'
    }
  },
  ['he'],
  function(he) {
    console.log(he);
  }
);
API
he.version
A string representing the semantic version number.
he.encode(text, options)
This function takes a string of text and encodes (by default) any
symbols that aren’t printable ASCII symbols and &,
<, >, ", ',
and `, replacing them with character references.
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
As long as the input string contains allowed code points only, the return value of this function is always valid HTML. Any (invalid) code points in the input that cannot be represented using a character reference are not encoded:
he.encode('foo \0 bar');
// → 'foo \0 bar'
However, enabling the
strict option causes invalid code points to throw an
exception. With strict enabled, he.encode
either throws (if the input contains invalid code points) or returns a
string of valid HTML.
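To make the handling of astral symbols concrete, here is a minimal sketch of how symbols outside ASCII can be turned into hexadecimal character references. The function name encodeNonAscii is hypothetical and this is not he's actual implementation: it leaves all ASCII input untouched, so it escapes neither the unsafe symbols listed above nor control characters, and it performs no validity checks.

```javascript
// Hypothetical helper, not part of he: replace every symbol above U+007F
// with a hexadecimal character reference.
function encodeNonAscii(text) {
  let result = '';
  // `for...of` iterates by code point, so a surrogate pair such as 𝌆 is
  // visited once instead of as two separate halves.
  for (const symbol of text) {
    const codePoint = symbol.codePointAt(0);
    result += codePoint <= 0x7F
      ? symbol
      : '&#x' + codePoint.toString(16).toUpperCase() + ';';
  }
  return result;
}

encodeNonAscii('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
```

Iterating by code point rather than by UTF-16 code unit is what keeps 𝌆 from being emitted as two separate surrogate references.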
The options object is optional. It recognizes the
following properties:
useNamedReferences
The default value for the useNamedReferences option is
false. This means that encode() will not use
any named character references (e.g. &copy;) in the
output; hexadecimal escapes (e.g. &#xA9;) will be used
instead. Set it to true to enable the use of named
references.
Note that if compatibility with older browsers is a concern, this option should remain disabled.
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
// Passing an `options` object to `encode`, to explicitly disallow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
// Passing an `options` object to `encode`, to explicitly allow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true
});
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
decimal
The default value for the decimal option is
false. If the option is enabled, encode will
generally use decimal escapes (e.g. &#169;) rather than
hexadecimal escapes (e.g. &#xA9;). Apart from this
substitution, the basic behavior remains the same when combined with
other options. For example, if both the
useNamedReferences and decimal options are enabled,
named references (e.g. &copy;) are used over decimal
escapes. Symbols without a named reference are encoded using
decimal escapes.
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
// Passing an `options` object to `encode`, to explicitly disable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
// Passing an `options` object to `encode`, to explicitly enable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': true
});
// → 'foo &#169; bar &#8800; baz &#119558; qux'
// Passing an `options` object to `encode`, to explicitly allow named references and decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true,
  'decimal': true
});
// → 'foo &copy; bar &ne; baz &#119558; qux'
encodeEverything
The default value for the encodeEverything option is
false. This means that encode() will not use
any character references for printable ASCII symbols that don’t need
escaping. Set it to true to encode every symbol in the
input string. When set to true, this option takes
precedence over allowUnsafeSymbols (i.e. setting the latter
to true in such a case has no effect).
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
// Passing an `options` object to `encode`, to explicitly encode all symbols:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
// This setting can be combined with the `useNamedReferences` option:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true,
  'useNamedReferences': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
strict
The default value for the strict option is
false. This means that encode() will encode
any HTML text content you feed it, even if it contains any symbols that
cause parse
errors. To throw an error when such invalid HTML is encountered, set
the strict option to true. This option makes
it possible to use he as part of HTML parsers and HTML
validators.
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.encode('\x01');
// → '&#x1;'
// Passing an `options` object to `encode`, to explicitly enable error-tolerant mode:
he.encode('\x01', {
  'strict': false
});
// → '&#x1;'
// Passing an `options` object to `encode`, to explicitly enable strict mode:
he.encode('\x01', {
  'strict': true
});
// → Parse error
allowUnsafeSymbols
The default value for the allowUnsafeSymbols option is
false. This means that characters that are unsafe for use
in HTML content (&, <,
>, ", ', and `)
will be encoded. When set to true, only non-ASCII
characters will be encoded. If the encodeEverything option
is set to true, this option will be ignored.
he.encode('foo © and & ampersand', {
  'allowUnsafeSymbols': true
});
// → 'foo &#xA9; and & ampersand'
Overriding default encode options globally
The global default setting can be overridden by modifying the
he.encode.options object. This saves you from passing in an
options object for every call to encode if you
want to use the non-default setting.
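The pattern of keeping defaults on the function itself and letting per-call options override them can be sketched as follows. The toReference function is a hypothetical stand-in written for illustration, not he's code; it only shows how a global options object and a per-call options object might be merged.

```javascript
// Hypothetical stand-in, not part of he: format one symbol as a numeric
// character reference, with overridable global defaults.
function toReference(symbol, options) {
  // Per-call options take precedence over the global defaults.
  const settings = Object.assign({}, toReference.options, options);
  const codePoint = symbol.codePointAt(0);
  return settings.decimal
    ? '&#' + codePoint + ';'
    : '&#x' + codePoint.toString(16).toUpperCase() + ';';
}
toReference.options = { 'decimal': false };

toReference('©');
// → '&#xA9;'
toReference('©', { 'decimal': true });
// → '&#169;'

// Override the global default; later calls pick it up automatically:
toReference.options.decimal = true;
toReference('©');
// → '&#169;'
```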
// Read the global default setting:
he.encode.options.useNamedReferences;
// → `false` by default
// Override the global default setting:
he.encode.options.useNamedReferences = true;
// Using the global default setting, which is now `true`:
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
he.decode(html, options)
This function takes a string of HTML and decodes any named and numerical character references in it using the algorithm described in section 12.2.4.69 of the HTML spec.
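As a rough illustration of the numeric half of that task, the sketch below decodes only well-formed numeric character references. The helper name decodeNumericReferences is hypothetical, and the code deliberately ignores everything else the spec algorithm covers: named references, references without a trailing semicolon, and the special remapping of certain code points.

```javascript
// Hypothetical helper, not part of he: decode well-formed numeric
// character references such as &#xA9; or &#169;.
function decodeNumericReferences(html) {
  const pattern = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g;
  return html.replace(pattern, function(match, hex, dec) {
    const codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of throwing.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

decodeNumericReferences('foo &#xA9; bar &#8800; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'
```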
he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'
The options object is optional. It recognizes the
following properties:
isAttributeValue
The default value for the isAttributeValue option is
false. This means that decode() will decode
the string as if it were used in a
text context in an HTML document. HTML has different rules for parsing
character references in attribute values — set this option to
true to treat the input string as if it were used as an
attribute value.
// Using the global default setting (defaults to `false`, i.e. HTML text context):
he.decode('foo&ampbar');
// → 'foo&bar'
// Passing an `options` object to `decode`, to explicitly assume an HTML text context:
he.decode('foo&ampbar', {
  'isAttributeValue': false
});
// → 'foo&bar'
// Passing an `options` object to `decode`, to explicitly assume an HTML attribute value context:
he.decode('foo&ampbar', {
  'isAttributeValue': true
});
// → 'foo&ampbar'
strict
The default value for the strict option is
false. This means that decode() will decode
any HTML text content you feed it, even if it contains any entities that
cause parse
errors. To throw an error when such invalid HTML is encountered, set
the strict option to true. This option makes
it possible to use he as part of HTML parsers and HTML
validators.
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.decode('foo&ampbar');
// → 'foo&bar'
// Passing an `options` object to `decode`, to explicitly enable error-tolerant mode:
he.decode('foo&ampbar', {
  'strict': false
});
// → 'foo&bar'
// Passing an `options` object to `decode`, to explicitly enable strict mode:
he.decode('foo&ampbar', {
  'strict': true
});
// → Parse error
Overriding default decode options globally
The global default settings for the decode function can
be overridden by modifying the he.decode.options object.
This saves you from passing in an options object for every
call to decode if you want to use a non-default
setting.
// Read the global default setting:
he.decode.options.isAttributeValue;
// → `false` by default
// Override the global default setting:
he.decode.options.isAttributeValue = true;
// Using the global default setting, which is now `true`:
he.decode('foo&ampbar');
// → 'foo&ampbar'
he.escape(text)
This function takes a string of text and escapes it for use in text
contexts in XML or HTML documents. Only the following characters are
escaped: &, <, >,
", ', and `.
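A minimal sketch of this kind of escaping is a single replacement pass over a fixed map of the six characters. The names escapeMap and escapeText are hypothetical, and the specific references chosen for each character are an assumption for illustration rather than a statement about he's internals.

```javascript
// Hypothetical map and helper, not part of he: the replacement chosen for
// each of the six characters is an assumption.
const escapeMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  '\'': '&#x27;',
  '`': '&#x60;'
};

function escapeText(text) {
  // A single pass, so the `&` of an inserted reference is never re-escaped.
  return text.replace(/[&<>"'`]/g, function(symbol) {
    return escapeMap[symbol];
  });
}

escapeText('<a href="x">Tom & Jerry</a>');
// → '&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;'
```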
he.escape('<img src=\'x\' onerror="prompt(1)">');
// → '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
he.unescape(html, options)
he.unescape is an alias for he.decode. It
takes a string of HTML and decodes any named and numerical character
references in it.
Using the he binary
To use the he binary in your shell, simply install
he globally using npm:
npm install -g he
After that you will be able to encode/decode HTML entities from the command line:
$ he --encode 'föo ♥ bår 𝌆 baz'
f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz
$ he --encode --use-named-refs 'föo ♥ bår 𝌆 baz'
f&ouml;o &hearts; b&aring;r &#x1D306; baz
$ he --decode 'f&ouml;o &hearts; b&aring;r &#x1D306; baz'
föo ♥ bår 𝌆 baz
Read a local text file, encode it for use in an HTML text context, and save the result to a new file:
$ he --encode < foo.txt > foo-escaped.html
Or do the same with an online text file:
$ curl -sL "http://git.io/HnfEaw" | he --encode > escaped.html
Or, the opposite: read a local file containing a snippet of HTML in a text context, decode it back to plain text, and save the result to a new file:
$ he --decode < foo-escaped.html > foo.txt
Or do the same with an online HTML snippet:
$ curl -sL "http://git.io/HnfEaw" | he --decode > decoded.txt
See he --help for the full list of options.
Support
he has been tested in at least:
- Chrome 27-50
- Firefox 3-45
- Safari 4-9
- Opera 10-12, 15-37
- IE 6-11
- Edge
- Narwhal 0.3.2
- Node.js v0.10, v0.12, v4, v5
- PhantomJS 1.9.0
- Rhino 1.7RC4
- RingoJS 0.8-0.11
 
Unit tests & code coverage
After cloning this repository, run npm install to
install the dependencies needed for he development and testing. You may
want to install Istanbul globally using
npm install istanbul -g.
Once that’s done, you can run the unit tests in Node using
npm test or node tests/tests.js. To run the
tests in Rhino, Ringo, Narwhal, and web browsers as well, use
grunt test.
To generate the code coverage report, use
grunt cover.
Acknowledgements
Thanks to Simon Pieters (@zcorpan) for the many suggestions.
Author
Mathias Bynens
License
he is available under the MIT license.