This repository has been archived on 2020-11-02. You can view files and clone it, but cannot push or open issues or pull requests.

[![Build Status](https://travis-ci.org/Borewit/strtok3.svg?branch=master)](https://travis-ci.org/Borewit/strtok3)
[![NPM version](https://badge.fury.io/js/strtok3.svg)](https://npmjs.org/package/strtok3)
[![npm downloads](http://img.shields.io/npm/dm/strtok3.svg)](https://npmcharts.com/compare/strtok3,token-types?start=1200&interval=30)
[![Dependabot Status](https://api.dependabot.com/badges/status?host=github&repo=Borewit/music-metadata)](https://dependabot.com)[![Coverage status](https://coveralls.io/repos/github/Borewit/strtok3/badge.svg?branch=master)](https://coveralls.io/github/Borewit/strtok3?branch=master)
[![DeepScan grade](https://deepscan.io/api/teams/5165/projects/8526/branches/103329/badge/grade.svg)](https://deepscan.io/dashboard#view=project&tid=5165&pid=8526&bid=103329)
[![Known Vulnerabilities](https://snyk.io/test/github/Borewit/strtok3/badge.svg?targetFile=package.json)](https://snyk.io/test/github/Borewit/strtok3?targetFile=package.json)
[![Total alerts](https://img.shields.io/lgtm/alerts/g/Borewit/strtok3.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/Borewit/strtok3/alerts/)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/59dd6795e61949fb97066ca52e6097ef)](https://www.codacy.com/app/Borewit/strtok3?utm_source=github.com&utm_medium=referral&utm_content=Borewit/strtok3&utm_campaign=Badge_Grade)
[![Language grade: JavaScript](https://img.shields.io/lgtm/grade/javascript/g/Borewit/strtok3.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/Borewit/strtok3/context:javascript)
# strtok3

A promise based streaming [*tokenizer*](#tokenizer) for [Node.js](http://nodejs.org) and browsers.
This node module is a successor of [strtok2](https://github.com/Borewit/strtok2).

The `strtok3` contains a few methods to turn different input into a [*tokenizer*](#tokenizer). Designed to
*   Support a streaming environment
*   Decoding of binary data, strings and numbers in mind
*   Read [predefined](https://github.com/Borewit/token-types) or custom tokens.
*   Optimized [*tokenizers*](#tokenizer) for reading from [file](#method-strtok3fromfile), [stream](#method-strtok3fromstream) or [buffer](#method-strtok3frombuffer).

It can read from:
*   A file (taking a file path as an input)
*   A Node.js [stream](https://nodejs.org/api/stream.html).
*   A [Buffer](https://nodejs.org/api/buffer.html)
*   HTTP chunked transfer provided by [@tokenizer/http](https://github.com/Borewit/tokenizer-http).
*   Chunked [Amazon S3](https://aws.amazon.com/s3) access provided by [@tokenizer/s3](https://github.com/Borewit/tokenizer-s3).

## Installation

```sh
npm install strtok3
```

## API

Use one of the methods to instantiate an [*abstract tokenizer*](#tokenizer):
*   [strtok3.fromFile](#method-strtok3fromfile)
*   [strtok3.fromStream](#method-strtok3fromstream)
*   [strtok3.fromBuffer](#method-strtok3frombuffer)

### strtok3 methods

All of the strtok3 methods return a [*tokenizer*](#tokenizer), either directly or via a promise.

#### Method `strtok3.fromFile()`

| Parameter | Type                  | Description                |
|-----------|-----------------------|----------------------------|
| path      | Path to file (string) | Path to file to read from  |

Note that [file-information](#file-information) is automatically added.

Returns, via a promise, a [*tokenizer*](#tokenizer) which can be used to parse a file.

```js
const strtok3 = require('strtok3');
const Token = require('token-types');
    
(async () => {

  const tokenizer = await strtok3.fromFile("somefile.bin");
         try {
    const myNumber = await tokenizer.readToken(Token.UINT8);
    console.log(`My number: ${myNumber}`);
  } finally {
    tokenizer.close(); // Close the file
  } 
})();

```

#### Method `strtok3.fromStream()`

Create [*tokenizer*](#tokenizer) from a node.js [readable stream](https://nodejs.org/api/stream.html#stream_class_stream_readable).

| Parameter |  Optional | Type                                                                        | Description              |
|-----------|-----------|-----------------------------------------------------------------------------|--------------------------|
| stream    | no        | [Readable](https://nodejs.org/api/stream.html#stream_class_stream_readable) | Stream to read from      |
| fileInfo  | yes       | [IFileInfo](#IFileInfo)                                                     | Provide file information |

Returns a [*tokenizer*](#tokenizer), via a Promise, which can be used to parse a buffer.

```js
const strtok3 = require('strtok3');
const Token = require('token-types');

strtok3.fromStream(stream).then(tokenizer => {
  return tokenizer.readToken(Token.UINT8).then(myUint8Number => {
    console.log(`My number: ${myUint8Number}`);
  });
});
```

#### Method `strtok3.fromBuffer()`

| Parameter | Optional | Type                                         | Description              |
|-----------|----------|----------------------------------------------|--------------------------|
| buffer    | no       | [Buffer](https://nodejs.org/api/buffer.html) | Buffer to read from      |
| fileInfo  | yes      | [IFileInfo](#IFileInfo)                      | Provide file information |

Returns a [*tokenizer*](#tokenizer) which can be used to parse the provided buffer.

```js
const strtok3 = require('strtok3');
    
const tokenizer = strtok3.fromBuffer(buffer);

tokenizer.readToken(Token.UINT8).then(myUint8Number => {
  console.log(`My number: ${myUint8Number}`);
});
```

## Tokenizer
The tokenizer allows us to *read* or *peek* from the *tokenizer-stream*. The *tokenizer-stream* is an abstraction of a [stream](https://nodejs.org/api/stream.html), file or [Buffer](https://nodejs.org/api/buffer.html).
It can also be translated in chunked reads, as done in [@tokenizer/http](https://github.com/Borewit/tokenizer-http);

What is the difference with Nodejs.js stream?
*   The *tokenizer-stream* supports jumping / seeking in a the *tokenizer-stream* using [`tokenizer.ignore()`](#method-tokenizerignore)
*   In addition to *read* methods, it has *peek* methods, to read a ahead and check what is coming.

The [tokenizer.position](#attribute-tokenizerposition) keeps tracks of the read position.

### strtok3 attributes

#### Attribute `tokenizer.fileInfo`
Optional attribute describing the file information, see [IFileInfo](#IFileInfo)

#### Attribute `tokenizer.position`
Pointer to the current position in the [*tokenizer*](#tokenizer) stream.
If a *position* is provided to a *read* or *peek* method, is should be, at least, equal or greater than this value.

### Tokenizer methods 

There are to groups of methods

*   *read* methods: used to read a *token* of [Buffer](https://nodejs.org/api/buffer.html) from the [*tokenizer*](#tokenizer). The position of the *tokenizer-stream* will advance with the size of the token.
*   *peek* methods: same as the read, but it will *not* advance the pointer. It allows to read (peek) ahead.

#### Method `tokenizer.readBuffer()`

Read buffer from stream.
`readBuffer(buffer, options?)`

| Parameter  | Type                                                           | Description                                                                                                                                                                                                                            |
|------------|----------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| buffer     | [Buffer](https://nodejs.org/api/buffer.html) | Uint8Array | Target buffer to write the data read to                                                                                                                                                                                                |
| options    | [IReadChunkOptions](#ireadchunkoptions)                        | An integer specifying the number of bytes to read                                                                                                                                                                                      |

Return value `Promise<number>` Promise with number of bytes read. The number of bytes read maybe if less, *mayBeLess* flag was set.

#### Method `tokenizer.peekBuffer()`

Peek (read ahead) buffer from [*tokenizer*](#tokenizer)
`peekBuffer(buffer,  options?)`

| Parameter  | Type                                    | Description                                         |
|------------|-----------------------------------------|-----------------------------------------------------|
| buffer     | Buffer &#124; Uint8Array                | Target buffer to write the data read (peeked) to.   |
| options    | [IReadChunkOptions](#ireadchunkoptions) | An integer specifying the number of bytes to read.  |                                                                                                                           |

Return value `Promise<number>` Promise with number of bytes read. The number of bytes read maybe if less, *mayBeLess* flag was set.

#### Method `tokenizer.readToken()`

Read a *token* from the tokenizer-stream.
`readToken(token, position?)`

| Parameter  | Type                    | Description                                                                                                           |
|------------|-------------------------|---------------------------------------------------------------------------------------------------------------------- |
| token      | [IGetToken](#IGetToken) | Token to read from the tokenizer-stream.                                                                              |
| position?  | number                  | Offset where to begin reading within the file. If position is null, data will be read from the current file position. |

Return value `Promise<number>`. Promise with number of bytes read. The number of bytes read maybe if less, *mayBeLess* flag was set.

#### Method `tokenizer.peekToken()`

Peek a *token* from the [*tokenizer*](#tokenizer).
`peekToken(token, position?)`

| Parameter  | Type                       | Description                                                                                                             |
|------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------|
| token      | [IGetToken<T>](#IGetToken) | Token to read from the tokenizer-stream.                                                                                |
| position?  | number                     | Offset where to begin reading within the file. If position is null, data will be read from the current file position.   |

Return value `Promise<T>` Promise with token value peeked from the [*tokenizer*](#tokenizer).

#### Method `tokenizer.readNumber()`

Peek a numeric [*token*](#token) from the [*tokenizer*](#tokenizer).
`readNumber(token)`

| Parameter  | Type                            | Description                                        |
|------------|---------------------------------|----------------------------------------------------|
| token      | [IGetToken<number>](#IGetToken) | Numeric token to read from the tokenizer-stream.   |

Return value `Promise<number>` Promise with number peeked from the *tokenizer-stream*.

#### Method `tokenizer.ignore()`

Peek a numeric [*token*](#token) from the tokenizer-stream.
`ignore(length)`

| Parameter  | Type   | Description                                                          |
|------------|--------|----------------------------------------------------------------------|
| ignore     | number | Numeric of bytes to ignore. Will advance the `tokenizer.position`    |

Return value `Promise<number>` Promise with number peeked from the *tokenizer-stream*.

#### Method `tokenizer.close()`
Clean up resources, such as closing a file pointer if applicable.

### IReadChunkOptions

Each attribute is optional:

| Attribute | Type    | Description                                                                                                                                                                                                                   |
|-----------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| offset    | number  | The offset in the buffer to start writing at; if not provided, start at 0                                                                                                                                                     |
| length    | number  | Requested number of bytes to read.                                                                                                                                                                                            |
| position  | number  | Position where to peek from the file. If position is null, data will be read from the [current file position](#attribute-tokenizerposition). Position may not be less then [tokenizer.position](#attribute-tokenizerposition) |
| mayBeLess | boolean | If and only if set, will not throw an EOF error if less then the requested *mayBeLess* could be read.                                                                                                                         |

Example:
```js
  tokenizer.peekBuffer(buffer, {mayBeLess: true});
```

## IFileInfo

File information interface which describes the underlying file, each attribute is optional.

| Attribute | Type    | Description                                                                                       |
|-----------|---------|---------------------------------------------------------------------------------------------------|
| size      | number  | File size in bytes                                                                                | 
| mimeType  | number  | [MIME-type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) of file. |
| path      | number  | File path                                                                                         |
| url       | boolean | File URL                                                                                          |

## Token

The *token* is basically a description what to read form the [*tokenizer-stream*](#tokenizer). 
A basic set of *token types* can be found here: [*token-types*](https://github.com/Borewit/token-types).

A token is something which implements the following interface:
```ts
export interface IGetToken<T> {

  /**
   * Length in bytes of encoded value
   */
  len: number;

  /**
   * Decode value from buffer at offset
   * @param buf Buffer to read the decoded value from
   * @param off Decode offset
   */
  get(buf: Buffer, off: number): T;
}
```
The *tokenizer* reads `token.len` bytes from the *tokenizer-stream* into a Buffer.
The `token.get` will be called with the Buffer. `token.get` is responsible for conversion from the buffer to the desired output type.

## Browser compatibility
To exclude fs based dependencies, you can use a submodule-import from 'strtok3/lib/core'.

| function              | 'strtok3'           | 'strtok3/lib/core'  |
| ----------------------| --------------------|---------------------|
| `parseBuffer`         | ✓                   | ✓                   |
| `parseStream`         | ✓                   | ✓                   |
| `fromFile`            | ✓                   |                     |

### Working with Web-API readable stream
To convert a [Web-API readable stream](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader) into a [Node.js readable stream]((https://nodejs.org/api/stream.html#stream_readable_streams)), you can use [readable-web-to-node-stream](https://github.com/Borewit/readable-web-to-node-stream) to convert one in another.

Example submodule-import:
```js
const strtok3core = require('strtok3/lib/core'); // Submodule-import to prevent Node.js specific dependencies
const {ReadableWebToNodeStream} = require('readable-web-to-node-stream');

(async () => {

  const response = await fetch(url);
  const readableWebStream = response.body; // Web-API readable stream
  const nodeStream = new ReadableWebToNodeStream(readableWebStream); // convert to Node.js readable stream
  
  const tokenizer = strtok3core.fromStream(nodeStream); // And we now have tokenizer in a web environment
})();
```