Extract text from a PDF with pdftotext

This package provides a class to extract text from a PDF.

\Ottosmops\Pdftotext\Extract::getText('/path/to/file.pdf') //returns the text from the pdf

Requirements

The package uses the pdftotext command-line tool. Make sure it is installed and on your PATH; you can check with: which pdftotext

For installation instructions, see poppler-utils.

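If the installed binary is not found (error message: "The command "which pdftotext" failed."), you can either pass the full path to the constructor (see below) or add the directory where pdftotext lives to PATH before you use the Extract class, for example with putenv('PATH=' . getenv('PATH') . ':/usr/local/bin:/usr/bin'). PHP does not expand $PATH inside a single-quoted string, so the current value has to be read with getenv().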
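One way to avoid depending on PATH is to probe a few likely locations and hand the first match to the constructor. The following is only a sketch: findPdftotext() is a hypothetical helper that is not part of this package, and the candidate paths are assumptions that need adjusting for your system.

```php
<?php

/**
 * Return the first executable path from a list of candidates, or null.
 * Hypothetical helper; not part of ottosmops/pdftotext.
 */
function findPdftotext(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

// Candidate locations are assumptions; adjust them for your system.
$binary = findPdftotext([
    '/usr/bin/pdftotext',
    '/usr/local/bin/pdftotext',
    '/opt/homebrew/bin/pdftotext',
]);
```

If $binary is not null, it can be passed as the constructor argument, as in new Extract($binary).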

Installation

composer require ottosmops/pdftotext

Usage

Extracting text from a PDF:

use Ottosmops\Pdftotext\Extract;

$text = (new Extract())
  ->pdf('file.pdf')
  ->text();

Security note: If you pass user input as options or filenames to the library, make sure to validate or escape them to avoid shell injection. The library uses symfony/process, which provides basic protection, but unsafe options could still cause issues.
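A minimal sketch of such validation, assuming uploads live in a single directory and that only a small set of options is needed. resolvePdfPath(), checkOption() and the allow-list are hypothetical and not part of this package.

```php
<?php

/**
 * Resolve a user-supplied file name inside a fixed base directory.
 * Hypothetical helper; not part of ottosmops/pdftotext.
 */
function resolvePdfPath(string $baseDir, string $userFile): string
{
    $base = realpath($baseDir);
    // basename() drops any directory components such as "../".
    $path = $base === false
        ? false
        : realpath($base . DIRECTORY_SEPARATOR . basename($userFile));

    if ($path === false || !is_file($path)) {
        throw new InvalidArgumentException('File not found in the base directory.');
    }
    if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) !== 'pdf') {
        throw new InvalidArgumentException('Only .pdf files are accepted.');
    }
    // realpath() also resolves symlinks, so re-check containment.
    if (strncmp($path, $base . DIRECTORY_SEPARATOR, strlen($base) + 1) !== 0) {
        throw new InvalidArgumentException('File is outside the base directory.');
    }
    return $path;
}

/**
 * Accept only options from a fixed allow-list.
 * Hypothetical helper; the list below is an example, not a recommendation.
 */
function checkOption(string $option): string
{
    $allowed = ['-layout', '-raw', '-nopgbrk'];
    if (!in_array($option, $allowed, true)) {
        throw new InvalidArgumentException('Option not allowed: ' . $option);
    }
    return $option;
}
```

The checked values can then be passed to pdf() and options() instead of the raw user input.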

You can set the path to the binary and specify options:

$text = (new Extract('/path/to/pdftotext'))
  ->pdf('path/to/file.pdf')
  ->options('-layout')
  ->text();

Default options are: -eol unix -enc UTF-8 -raw

License

The MIT License (MIT). Please see License File for more information.
