PDFToText (Function)

ONLINE HELP
WINDEV, WEBDEV AND WINDEV MOBILE



In French: PDFVersTexte

Extracts text from a PDF file.

New in version 2025

The syntax for manipulating pdfDocument variables is now available.

Example

MaChaîne is string
// Extract the text of all the pages of the file
MaChaîne = PDFToText("C:\Temp\MonDocument.pdf")
// Display in a multiline edit control
SAI_SaisieTexteMulti = MaChaîne

// Open the PDF in a pdfDocument variable, then extract the text of pages 1 to 2
MonPDF is pdfDocument = PDFOpen("test.pdf")
MaChaîne = PDFToText(MonPDF, "1-2")
// Display in a multiline edit control
SAI_SaisieTexteMulti = MaChaîne

Syntax

Extracting the content of a PDF using the file path

<Result> = PDFToText(<PDF file> [, <Pages to extract> [, <Password> [, <Options>]]])

<Result>: Character string

Text of the PDF file.

<PDF file>: Character string

Name and path of the PDF file to be analyzed.

<Pages to extract>: Optional character string

Range of pages from which the text will be extracted. The format is the same as in standard print dialog boxes: individual page numbers or page ranges separated by semicolons. For example, "1;3;4;6-10;12" extracts the text of pages 1, 3, 4, 6 to 10, and 12.
If this parameter is not specified or is an empty string (""), all pages are extracted.
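A minimal sketch of the page-range format, assuming a file at the hypothetical path "C:\Temp\MyDocument.pdf" that contains at least 12 pages:

```wlanguage
sFile is string = "C:\Temp\MyDocument.pdf"	// hypothetical path
sText is string
// Pages 1, 3, 4, 6 to 10, and 12
sText = PDFToText(sFile, "1;3;4;6-10;12")
// Empty range: all pages are extracted (same as omitting the parameter)
sText = PDFToText(sFile, "")
```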

<Password>: Optional string or Secret string

Password required to open the PDF file, if the file is password-protected.
New in version 2025
Secret strings: If you use the secret string vault, the type of secret string used for this parameter must be "ANSI or Unicode string".
To learn more about secret strings and how to use the vault, see Secret string vault.
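A minimal sketch that uses PDFIsProtected to decide whether the <Password> parameter is needed. The file path and the password "MyPassword" are hypothetical. In a real application, the password would come from the user or from the secret string vault.

```wlanguage
sFile is string = "C:\Temp\MyDocument.pdf"	// hypothetical path
sText is string
IF PDFIsProtected(sFile) THEN
	// Empty page range (all pages), followed by the password
	sText = PDFToText(sFile, "", "MyPassword")	// hypothetical password
ELSE
	sText = PDFToText(sFile)
END
```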

<Options>: Optional Integer constant

Text splitting mode:
pttCompatible: Split PDF text using the algorithm from versions 24 and earlier.
pttDefault (default value): Split PDF text using an optimized algorithm. The result may differ from the result produced by previous versions.
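A sketch that compares the two splitting modes on the same file. Because <Options> comes after <Password>, this example assumes that an empty string can be passed for <Password> when the file is not protected. Check this assumption against your version. The file path is hypothetical.

```wlanguage
sFile is string = "C:\Temp\MyDocument.pdf"	// hypothetical path
sOldSplit is string
sNewSplit is string
// Algorithm from versions 24 and earlier (for example, to match results from an older application)
// Assumption: "" is accepted as <Password> for an unprotected file
sOldSplit = PDFToText(sFile, "", "", pttCompatible)
// Optimized algorithm (default value, same as omitting <Options>)
sNewSplit = PDFToText(sFile, "", "", pttDefault)
IF sOldSplit <> sNewSplit THEN
	Trace("The two splitting modes give different results for this file.")
END
```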

Extracting the content of a PDF document present in a pdfDocument variable

<Result> = PDFToText(<PDF document> [, <Pages to extract>])

<Result>: Character string

Text of the PDF document.

<PDF document>: pdfDocument variable

Name of the pdfDocument variable to be used.

<Pages to extract>: Optional character string

Range of pages from which the text will be extracted. The format is the same as in standard print dialog boxes: individual page numbers or page ranges separated by semicolons. For example, "1;3;4;6-10;12" extracts the text of pages 1, 3, 4, 6 to 10, and 12.
If this parameter is not specified or is an empty string (""), all pages are extracted.

Remarks

Converting PDF to text

When converting a PDF to text, the document formatting is lost.
Text is extracted in the order in which the PDF commands appear and is written sequentially to the resulting string. Text blocks, paragraphs, and carriage returns are preserved.
Unicode characters are not returned.
Data from a PDF form is not extracted (this data is not stored in the PDF file).

Special cases

Use PDFIsProtected to find out whether a password is required to open a PDF file.
PDFNumberOfPages returns the total number of pages in a PDF file.
Starting with version 28, this function is supported on 32-bit ARM processors only if the pttCompatible constant is used. New PDF features require 64-bit execution mode.
If an application must run on devices with 32-bit ARM processors, it must be generated with WINDEV Mobile 27.
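PDFNumberOfPages can be combined with <Pages to extract> to process a document one page at a time. The following is a minimal sketch, assuming an unprotected file at a hypothetical path:

```wlanguage
sFile is string = "C:\Temp\MyDocument.pdf"	// hypothetical path
nPageCount is int = PDFNumberOfPages(sFile)
sPageText is string
nPage is int
FOR nPage = 1 TO nPageCount
	// A single page number is a valid page range
	sPageText = PDFToText(sFile, NumToString(nPage))
	Trace("Page " + NumToString(nPage) + ": " + NumToString(Length(sPageText)) + " characters")
END
```

Extracting one page per call makes it possible to track the page each piece of text came from. The trade-off is that the file is opened once per page.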

Business / UI classification: Business Logic

Component: wd300wdpdf.dll