
pm-mic/pdf2table

 
 

pdf2table

pdf2table is a Node.js library that attempts to extract tables from a PDF.

The tables are returned as an array of rows.

It uses pdf2json to read the PDF data.

Install

You can install pdf2table using the Node Package Manager (npm):

npm install pdf2table

Simple example

var pdf2table = require('pdf2table');
var fs = require('fs');

// Read the PDF into a buffer, then hand it to pdf2table.
fs.readFile('./test.pdf', function (err, buffer) {
    if (err) return console.error(err);

    pdf2table.parse(buffer, function (err, rows, rowsdebug) {
        if (err) return console.error(err);

        // rows is the array of extracted rows.
        console.log(rows);
    });
});

Getting raw table data

X-axis data is stripped from the output by default, but it may be needed to reconstruct the table extracted from the PDF. To keep it, pass true as the third argument, raw, which defaults to false.

pdf2table.parse(buffer, callback, raw = false)
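
As a rough sketch of how the raw argument might be passed in practice, the helper below makes the flag explicit. The names parseWithOptions and options.raw are hypothetical and not part of pdf2table. The parser is passed in as a parameter, so the sketch relies only on the parse(buffer, callback, raw) signature shown above. The shape of the raw output is not described here, so the helper simply forwards whatever comes back.

```javascript
// Hypothetical helper (not part of pdf2table): makes the `raw` flag explicit.
// `parser` is expected to expose parse(buffer, callback, raw), as described above.
function parseWithOptions(parser, buffer, options, callback) {
    // Default to false when no option is given, matching the documented default.
    var raw = Boolean(options && options.raw);

    parser.parse(buffer, function (err, rows) {
        if (err) return callback(err);
        callback(null, rows);
    }, raw);
}

// Usage, assuming pdf2table is installed and `buffer` holds the PDF contents:
// parseWithOptions(require('pdf2table'), buffer, { raw: true }, function (err, rows) {
//     if (err) return console.error(err);
//     console.log(rows);
// });
```

Passing the parser in rather than requiring it inside the helper is only a convenience for this sketch; it keeps the example runnable without a PDF on hand.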

Note

Note that this is a simplistic implementation. If your PDF contains content that is not a table, pdf2table will still attempt to shape that content into rows. Improvements and pull requests are welcome.
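
Since non-table content also ends up as rows, one possible workaround is to filter the result afterwards. The sketch below is a minimal example, assuming each row is an array of cells and that the real table has a known number of columns. keepRowsWithColumns is a hypothetical helper, not part of pdf2table, and the sample data is made up for illustration.

```javascript
// Hypothetical post-processing helper (not part of pdf2table).
// Assumes `rows` is an array of arrays, one inner array per extracted row.
function keepRowsWithColumns(rows, expectedColumns) {
    return rows.filter(function (row) {
        return Array.isArray(row) && row.length === expectedColumns;
    });
}

// Made-up sample of what mixed output might look like.
var extracted = [
    ['Report title'],              // stray heading text shaped into a row
    ['Name', 'Qty', 'Price'],
    ['Widget', '2', '9.99'],
    ['Page 1 of 3']                // stray footer text
];

// Keeps only the two three-column rows.
console.log(keepRowsWithColumns(extracted, 3));
```

Filtering on column count is a crude heuristic: it drops genuine rows with missing cells and keeps stray text that happens to split into the same number of columns.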
