
[tests]:  http://img.shields.io/travis/mafintosh/csv-parser.svg
[tests-url]: http://travis-ci.org/mafintosh/csv-parser

[cover]: https://codecov.io/gh/mafintosh/csv-parser/branch/master/graph/badge.svg
[cover-url]: https://codecov.io/gh/mafintosh/csv-parser

[size]: https://packagephobia.now.sh/badge?p=csv-parser
[size-url]: https://packagephobia.now.sh/result?p=csv-parser

# csv-parser

[![tests][tests]][tests-url] [![cover][cover]][cover-url] [![size][size]][size-url]

Streaming CSV parser that aims for maximum speed as well as compatibility with
the csv-spectrum CSV acid test suite.

csv-parser can convert CSV into JSON at a rate of around 90,000 rows per
second. Performance varies with the data used; try `bin/bench.js`
to benchmark your data.

csv-parser can be used in the browser with browserify.

neat-csv can be used if a Promise-based
interface to csv-parser is needed.

_Note: This module requires Node v8.16.0 or higher._

## Benchmarks


⚡️ csv-parser is greased-lightning fast

```console
npm run bench

  Filename                 Rows Parsed  Duration
  backtick.csv                       2     3.5ms
  bad-data.csv                       3    0.55ms
  basic.csv                          1    0.26ms
  comma-in-quote.csv                 1    0.29ms
  comment.csv                        2    0.40ms
  empty-columns.csv                  1    0.40ms
  escape-quotes.csv                  3    0.38ms
  geojson.csv                        3    0.46ms
  large-dataset.csv               7268      73ms
  newlines.csv                       3    0.35ms
  no-headers.csv                     3    0.26ms
  option-comment.csv                 2    0.24ms
  option-escape.csv                  3    0.25ms
  option-maxRowBytes.csv          4577      39ms
  option-newline.csv                 0    0.47ms
  option-quote-escape.csv            3    0.33ms
  option-quote-many.csv              3    0.38ms
  option-quote.csv                   2    0.22ms
  quotes+newlines.csv                3    0.20ms
  strict.csv                         3    0.22ms
  latin.csv                          2    0.38ms
  mac-newlines.csv                   2    0.28ms
  utf16-big.csv                      2    0.33ms
  utf16.csv                          2    0.26ms
  utf8.csv                           2    0.24ms
```

## Install


Using npm:

```console
$ npm install csv-parser
```

Using yarn:

```console
$ yarn add csv-parser
```

## Usage


To use the module, create a readable stream to a desired CSV file, instantiate
`csv`, and pipe the stream to `csv`.

Suppose you have a CSV file `data.csv` which contains the data:

```
NAME,AGE
Daffy Duck,24
Bugs Bunny,22
```

It could then be parsed, and results shown like so:

``` js
const csv = require('csv-parser');
const fs = require('fs');
const results = [];

fs.createReadStream('data.csv')
  .pipe(csv())
  .on('data', (data) => results.push(data))
  .on('end', () => {
    console.log(results);
    // [
    //   { NAME: 'Daffy Duck', AGE: '24' },
    //   { NAME: 'Bugs Bunny', AGE: '22' }
    // ]
  });
```

To specify options for csv, pass an object argument to the function. For
example:

``` js
csv({ separator: '\t' });
```

## API


### csv([options | headers])

Returns: `Array[Object]`

#### options

Type: `Object`

As an alternative to passing an options object, you may pass an Array[String]
which specifies the headers to use. For example:

``` js
csv(['Name', 'Age']);
```

If you need to specify options _and_ headers, please use the object notation
with the `headers` property as shown below.

##### escape

Type: `String`
Default: `"`

A single-character string used to specify the character used to escape strings
in a CSV row.

##### headers

Type: `Array[String] | Boolean`

Specifies the headers to use. Headers define the property key for each value in
a CSV row. If no headers option is provided, csv-parser will use the first
line in a CSV file as the header specification.

If `false`, specifies that the first row in a data file does _not_ contain
headers, and instructs the parser to use the column index as the key for each column.
Using `headers: false` with the same `data.csv` example from above would yield:

``` js
[
  { '0': 'Daffy Duck', '1': '24' },
  { '0': 'Bugs Bunny', '1': '22' }
]
```

_Note: If using the `headers` option on a file which contains headers on the first line, specify `skipLines: 1` to skip over the row, or the headers row will appear as normal row data. Alternatively, use the `mapHeaders` option to manipulate existing headers in that scenario._

##### mapHeaders

Type: `Function`

A function that can be used to modify the values of each header. Return a `String` to modify the header. Return `null` to remove the header, and its column, from the results.

``` js
csv({
  mapHeaders: ({ header, index }) => header.toLowerCase()
})
```

Parameters

**header** _String_ The current column header.
**index** _Number_ The current column index.
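
For instance, header normalization and column removal can be combined in one callback. A minimal sketch, where `normalizeHeader` is a hypothetical helper name (not part of csv-parser):

``` js
// Hypothetical mapHeaders callback: snake_cases each header and drops
// columns whose header is blank by returning null.
const normalizeHeader = ({ header, index }) => {
  const name = header.trim().toLowerCase().replace(/\s+/g, '_');
  return name === '' ? null : name; // null removes the header and its column
};

// Usage (assuming csv-parser is installed):
// csv({ mapHeaders: normalizeHeader })
```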

##### mapValues

Type: `Function`

A function that can be used to modify the content of each column. The return value will replace the current column content.

``` js
csv({
  mapValues: ({ header, index, value }) => value.toLowerCase()
})
```

Parameters

**header** _String_ The current column header.
**index** _Number_ The current column index.
**value** _String_ The current column value (or content).
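
Because the parser emits every value as a string (see the `AGE: '24'` output in Usage), `mapValues` is a natural place to cast numeric columns. A sketch, where `castNumeric` is a hypothetical helper (not part of the library):

``` js
// Hypothetical mapValues callback: converts purely numeric strings to
// numbers and leaves everything else untouched.
const castNumeric = ({ header, index, value }) =>
  /^-?\d+(\.\d+)?$/.test(value) ? Number(value) : value;

// Usage (assuming csv-parser is installed):
// csv({ mapValues: castNumeric })
```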

##### newline

Type: `String`
Default: `\n`

Specifies a single-character string to denote the end of a line in a CSV file.

##### quote

Type: `String`
Default: `"`

Specifies a single-character string to denote a quoted string.

##### raw


Type: `Boolean`

If `true`, instructs the parser not to decode UTF-8 strings.

##### separator

Type: `String`
Default: `,`

Specifies a single-character string to use as the column separator for each row.

##### skipComments

Type: `Boolean | String`
Default: `false`

Instructs the parser to ignore lines which represent comments in a CSV file. Since there is no specification that dictates what a CSV comment looks like, comments should be considered non-standard. The "most common" character used to signify a comment in a CSV file is `#`. If this option is set to `true`, lines which begin with `#` will be skipped. If a custom character is needed to denote a commented line, this option may be set to a string which represents the leading character(s) signifying a comment line.
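
The skipping rule described above can be sketched as a plain predicate (this mirrors the documented behaviour; it is not the library's internal code):

``` js
// true      -> lines starting with '#' are skipped
// a string  -> lines starting with that string are skipped
// false     -> no lines are skipped
const isComment = (line, skipComments) => {
  if (!skipComments) return false;
  const prefix = skipComments === true ? '#' : skipComments;
  return line.startsWith(prefix);
};
```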

##### skipLines

Type: `Number`
Default: `0`

Specifies the number of lines at the beginning of a data file that the parser should
skip over, prior to parsing headers.

##### maxRowBytes

Type: `Number`
Default: `Number.MAX_SAFE_INTEGER`

Maximum number of bytes per row. An error is thrown if a line exceeds this value. The default value (2^53 − 1) effectively allows rows of up to about 8 pebibytes.

##### strict

Type: `Boolean`
Default: `false`

If `true`, instructs the parser that the number of columns in each row must match
the number of headers specified, and an error is thrown otherwise.
If `false`, headers are mapped to columns by index:
   fewer columns than headers: a column missing from the middle of a row shifts the remaining values, resulting in wrong property mappings!
   more columns than headers: each additional column is keyed as `"_" + index`, e.g. `"_10": "value"`.
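
The non-strict column mapping described above can be illustrated with a small sketch (`mapRow` is a hypothetical illustration, not the library's implementation):

``` js
// Maps one row's cells onto the known headers; cells beyond the headers
// get "_" + index keys, mirroring the non-strict behaviour described above.
function mapRow(headers, cells) {
  const row = {};
  cells.forEach((value, i) => {
    const key = i < headers.length ? headers[i] : '_' + i;
    row[key] = value;
  });
  return row;
}
```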

## Events


The following events are emitted during parsing:

### data


Emitted for each row of data parsed, with the notable exception of the header
row. Please see Usage for an example.

### headers


Emitted after the header row is parsed. The first parameter of the event
callback is an Array[String] containing the header names.

``` js
fs.createReadStream('data.csv')
  .pipe(csv())
  .on('headers', (headers) => {
    console.log(`First header: ${headers[0]}`);
  })
```

### Readable Stream Events


Events available on Node's built-in Readable streams
are also emitted. The `end` event should be used to detect the end of parsing.

## CLI


This module also provides a CLI which will convert CSV to
newline-delimited JSON. The following CLI flags can be
used to control how input is parsed:

```
Usage: csv-parser [filename?] [options]

  --escape,-e         Set the escape character (defaults to quote value)
  --headers,-h        Explicitly specify csv headers as a comma separated list
  --help              Show this help
  --output,-o         Set output file. Defaults to stdout
  --quote,-q          Set the quote character ('"' by default)
  --remove            Remove columns from output by header name
  --separator,-s      Set the separator character ("," by default)
  --skipComments,-c   Skip CSV comments that begin with '#'. Set a value to change the comment character.
  --skipLines,-l      Set the number of lines to skip before parsing headers
  --strict            Require column length match headers length
  --version,-v        Print out the installed version
```

For example, to parse a TSV file:

```
cat data.tsv | csv-parser -s $'\t'
```

## Encoding


Users may encounter issues with the encoding of a CSV file. Transcoding the
source stream can be done neatly with modules such as:
- [iconv-lite](https://www.npmjs.com/package/iconv-lite)
- [iconv](https://www.npmjs.com/package/iconv)

Or native [iconv](http://man7.org/linux/man-pages/man1/iconv.1.html) if part
of a pipeline.

## Byte Order Marks


Some CSV files may be generated with, or contain, a leading Byte Order Mark (BOM). This may cause issues parsing headers and/or data from your file. From Wikipedia:

> The Unicode Standard permits the BOM in UTF-8, but does not require nor recommend its use. Byte order has no meaning in UTF-8.

To use this module with a file containing a BOM, please use a module like strip-bom-stream in your pipeline:

``` js
const fs = require('fs');
const csv = require('csv-parser');
const stripBom = require('strip-bom-stream');

fs.createReadStream('data.csv')
  .pipe(stripBom())
  .pipe(csv())
  ...
```

When using the CLI, the BOM can be removed by first running:

```console
$ sed $'s/\xEF\xBB\xBF//g' data.csv
```

## Meta