
This question is for referencing and comparing. The solution is the accepted answer below.

Many hours have I searched for a fast and easy, but above all accurate, way to get the number of pages in a PDF document. Since I work for a graphic printing and reproduction company that works a lot with PDFs, the number of pages in a document must be precisely known before it is processed. The PDF documents come from many different clients, so they aren't generated with the same application and/or don't use the same compression method.

Here are some of the answers I found insufficient or simply NOT working:

Using Imagick (a PHP extension)

Imagick requires a lot of installation, Apache needs to be restarted, and when I finally had it working, it took amazingly long to process (2-3 minutes per document) and it always returned 1 page for every document (I haven't seen a working copy of Imagick so far), so I threw it away. That was with both the getNumberImages() and identifyImage() methods.

Using FPDI (a PHP library)

FPDI is easy to use and install (just extract the files and call a PHP script), BUT many of the compression techniques are not supported by FPDI. It then returns an error:

    FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.

Opening a stream and searching with a regular expression

This opens the PDF file in a stream and searches for some kind of string containing the page count or something similar ($f is the path to the PDF file):

    $stream = fopen($f, "r");
    $content = fread($stream, filesize($f));
    // Regular expressions found by Googling (all linked to SO answers):
    $regex = "/\/Count\s+(\d+)/";
    if (preg_match_all($regex, $content, $matches)) {
        // Assumed completion: take the largest number captured.
        $count = max($matches[1]);
    }

/\/Count\s+(\d+)/ (looks for /Count followed by a number) doesn't work because only a few documents have the parameter /Count inside, so most of the time it doesn't return anything.
/\/Page\W*(\d+)/ (looks for /Page followed by a number) doesn't get the number of pages; it mostly matches some other data.
/\/N\s+(\d+)/ (looks for /N followed by a number) doesn't work either, as the documents can contain multiple values of /N, most of which, if not all, do not contain the page count.

So, what does work reliably and accurately?

A simple command line executable called pdfinfo. It is downloadable for Linux and Windows. You download a compressed file containing several little PDF-related programs. One of those files is pdfinfo (or pdfinfo.exe for Windows). An example of the data returned by running it on a PDF document (excerpt):

    Title: test1.pdf
    Producer: Acrobat Distiller 9.2.0 (Windows)

Comments:

Pdfinfo has changed its behavior, so when trying to read the image size in function retrieveMetaData() (in file ) you don't get the expected result while using the parameter -meta! The image size will be registered in the database as 0 x 0 pixels and the thumbnail/picture of the PDF page won't be rendered. It helps to comment out the two lines in this function with "-meta" and " -l 999999", or at least the parameter "-meta" (losing some metadata). This could be solved by refactoring the different information requests into separate function calls with different parameters!

Using which pdfinfo, and what version of it? (There are multiple forks of the pdfinfo utility.)

This is clearly not the case in general for all versions of pdfinfo, since it works totally fine on Wikipedia and friends.

Err, the flow post had the actual useful information.
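
As a rough illustration of the pdfinfo approach described above, here is a minimal sketch of how the page count could be pulled out of pdfinfo's text output. It is written in Python rather than the PHP used in the post, and it assumes that the output contains a line of the form "Pages: <number>" and that the executable is reachable as "pdfinfo"; the function names parse_page_count and count_pages are hypothetical helpers, not part of any library.

```python
import re
import subprocess
from typing import Optional


def parse_page_count(pdfinfo_output: str) -> Optional[int]:
    """Return the number on the 'Pages:' line of pdfinfo's output, or None.

    Assumption: pdfinfo prints one 'Key: value' pair per line, including a
    line that starts with 'Pages:' followed by an integer.
    """
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def count_pages(pdf_path: str, pdfinfo_bin: str = "pdfinfo") -> Optional[int]:
    """Run pdfinfo on pdf_path and return the page count it reports.

    Assumption: pdfinfo_bin names an executable on the PATH (or a full path,
    e.g. to pdfinfo.exe on Windows). A non-zero exit status raises
    subprocess.CalledProcessError.
    """
    result = subprocess.run(
        [pdfinfo_bin, pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_page_count(result.stdout)
```

With pdfinfo installed, calling count_pages("test1.pdf") would return the reported page count as an integer, or None if no "Pages:" line is found. Keeping the parsing separate from the subprocess call makes it possible to check the parser against captured output without needing the executable.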
