pdf2table

pdf2table is a Node.js library that attempts to extract tables from a PDF.

Each table is returned as an array of rows.

It uses pdf2json to read the PDF data.

Install

You can install pdf2table using the Node Package Manager (npm):

npm install pdf2table

Simple example

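var pdf2table = require('pdf2table');
var fs = require('fs');

// Read the PDF into a buffer, then hand the buffer to pdf2table.
fs.readFile('./test.pdf', function (err, buffer) {
    if (err) return console.error(err);

    // rowsdebug is unused in this example.
    pdf2table.parse(buffer, function (err, rows, rowsdebug) {
        if (err) return console.error(err);

        // Print the extracted rows.
        console.log(rows);
    });
});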
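
Since the tables come back as an array of rows, a common next step is to write them out as CSV. The sketch below assumes each row is itself an array of cell values, which this README does not spell out; rowsToCsv is a hypothetical helper and not part of pdf2table.

```javascript
// Hypothetical helper, not part of pdf2table.
// Assumption: `rows` is an array of rows, each row an array of cell values.
function rowsToCsv(rows) {
    return rows.map(function (row) {
        return row.map(function (cell) {
            var text = String(cell);
            // Quote cells that contain a comma, a double quote, or a line break,
            // and double any embedded double quotes.
            if (/[",\r\n]/.test(text)) {
                return '"' + text.replace(/"/g, '""') + '"';
            }
            return text;
        }).join(',');
    }).join('\n');
}
```

Inside the parse callback from the example above, console.log(rowsToCsv(rows)) would then print one CSV line per extracted row.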

Getting raw table data

By default, pdf2table strips the x-axis position data from the result. That data may be needed to reconstruct the layout of the table extracted from the PDF. To retrieve it, pass true as the raw argument, which is false by default.

pdf2table.parse(buffer, callback, raw = false)
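
Because the callback sits in the middle of the argument list, with raw after it, helpers that expect the callback last (such as Node's util.promisify) do not fit this signature directly. The following is a minimal sketch of a Promise wrapper that keeps the argument order shown above. parseAsync is a hypothetical name, and the parse function is passed in rather than required so the sketch makes no assumption about what the raw output looks like.

```javascript
// Hypothetical helper, not part of pdf2table.
// `parse` is any function with the signature parse(buffer, callback, raw).
function parseAsync(parse, buffer, raw = false) {
    return new Promise(function (resolve, reject) {
        parse(buffer, function (err, rows) {
            if (err) return reject(err);
            resolve(rows);
        }, raw); // raw is the third argument, after the callback
    });
}
```

With the library installed, a call could look like parseAsync(pdf2table.parse, buffer, true).then(console.log), which would print the rows with the x-axis data retained.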

Note

This is a simplistic implementation of table extraction. If your PDF contains content that is not a table, pdf2table will still attempt to shape that content into rows. Improvements and pull requests are welcome.
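
One way to cope with non-table content is to filter the result afterwards. The sketch below keeps only rows that have the number of cells you expect from your table. It assumes each row is an array of cells, and keepRowsWithColumns is a hypothetical helper, not something pdf2table provides. It is a heuristic: stray text that happens to split into the same number of cells will still get through.

```javascript
// Hypothetical helper, not part of pdf2table.
// Assumption: `rows` is an array of rows, each row an array of cells.
function keepRowsWithColumns(rows, columnCount) {
    return rows.filter(function (row) {
        // Drop anything that is not an array or has the wrong number of cells.
        return Array.isArray(row) && row.length === columnCount;
    });
}
```

For a three-column table, keepRowsWithColumns(rows, 3) would drop single-cell rows such as page titles or footers.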
