PDFToText extracts the text from a PDF file.
Example
MyString is string
MyString = PDFToText("C:\Temp\MyDocument.pdf")
// Display text in a multiline Edit control
EDT_EditMultiText = MyString
WINDEV, WEBDEV - Server code, iPhone/iPad
MyPDF is pdfDocument = PDFOpen("test.pdf")
MyString is string
// Extract the text of pages 1 and 2 only
MyString = PDFToText(MyPDF, "1-2")
// Display text in a multiline Edit control
EDT_EditMultiText = MyString
Syntax

Extracting the content of a PDF using the file path

<Result> = PDFToText(<PDF file> [, <Pages to extract> [, <Password> [, <Options>]]])
<Result>: Character string
Text of the PDF file.
<PDF file>: Character string
Name and path of the PDF file to be analyzed.
<Pages to extract>: Optional character string
Range of pages to extract text from. The range uses the same notation as the page-range box of a standard print window: use semicolons to separate individual pages or ranges, and a hyphen to define a range. For example, "1;3;4;6-10;12" means that the text of pages 1, 3, 4, 6 to 10, and 12 will be extracted.
If this parameter is not specified or is an empty string (""), all pages are extracted.
<Password>: Optional character string
Password required to open the file if the PDF file is password protected.
<Options>: Optional integer constant
Text splitting mode:
  • pttCompatible: Split PDF text using the algorithm from versions 24 and earlier.
  • pttDefault (default value): Split PDF text using an optimized algorithm. This splitting may be different from previous versions.
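The parameters of this syntax can be combined as in the following minimal sketch. The password "MyPassword" and the page range are hypothetical values chosen for illustration, and the file is assumed to be password protected.

```wlanguage
// Path taken from the example above; password and page range are hypothetical
sFile is string = "C:\Temp\MyDocument.pdf"
sText is string

// Extract pages 1 and 3 to 5 of a password-protected file,
// using the splitting algorithm of versions 24 and earlier
sText = PDFToText(sFile, "1;3-5", "MyPassword", pttCompatible)

// Display text in a multiline Edit control
EDT_EditMultiText = sText
```

Passing pttCompatible is only useful when existing code depends on the way earlier versions split the text; otherwise the last parameter can be omitted.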
Android: the following syntax is not available in Android.

Extracting the content of a PDF document present in a pdfDocument variable

<Result> = PDFToText(<PDF document> [, <Pages to extract>])
<Result>: Character string
Text of the PDF file.
<PDF document>: pdfDocument variable
Name of the pdfDocument variable to be used.
<Pages to extract>: Optional character string
Range of pages to extract text from. The range uses the same notation as the page-range box of a standard print window: use semicolons to separate individual pages or ranges, and a hyphen to define a range. For example, "1;3;4;6-10;12" means that the text of pages 1, 3, 4, 6 to 10, and 12 will be extracted.
If this parameter is not specified or is an empty string (""), all pages are extracted.
Remarks

Converting PDF to text

  • When converting a PDF to text, the document formatting is lost.
  • Text is extracted in the order in which the PDF commands appear and is written sequentially in the resulting string. Text blocks and paragraphs are preserved (as well as carriage returns).
  • Unicode characters are not returned.
  • Data from a PDF form is not extracted (this data is not stored in the PDF file).
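Because carriage returns are preserved in the result, the extracted text can be processed line by line. The sketch below assumes that the lines are separated by the CR constant; the variable names are illustrative.

```wlanguage
// Path taken from the example above
sText is string = PDFToText("C:\Temp\MyDocument.pdf")

// Browse the extracted text line by line
// (assumption: lines are separated by CR)
FOR EACH STRING sLine OF sText SEPARATED BY CR
	// Ignore empty lines
	IF sLine <> "" THEN
		Trace(sLine)
	END
END
```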

Special cases

  • PDFIsProtected indicates whether a password is required to open a PDF file.
  • PDFNumberOfPages returns the total number of pages in a PDF file.
  • Android: Starting with version 28, this function is supported by 32-bit ARM processors only if the pvtCompatible constant is used. New PDF features require a 64-bit execution mode.
    If an application is to be run on devices with 32-bit ARM processors, it must be generated with WINDEV Mobile 27.
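The two helper functions above can be combined with PDFToText as in the following sketch. The password "MyPassword" is hypothetical, and extracting only the last page is just one possible use of the page count.

```wlanguage
// Path taken from the example above
sFile is string = "C:\Temp\MyDocument.pdf"
sText is string

IF PDFIsProtected(sFile) THEN
	// A password is required: "MyPassword" is a hypothetical value
	sText = PDFToText(sFile, "", "MyPassword")
ELSE
	// No password required: extract the last page only
	nPages is int = PDFNumberOfPages(sFile)
	sText = PDFToText(sFile, NumToString(nPages))
END

// Display text in a multiline Edit control
EDT_EditMultiText = sText
```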
Business / UI classification: Business Logic
Component: wd300wdpdf.dll
Minimum version required
  • Version 14

Last update: 04/22/2024
