PC SOFT

WINDEV, WEBDEV AND WINDEV MOBILE
ONLINE HELP

PDFToText (Function)
In French: PDFVersTexte
Extracts the text found in a PDF file.
Versions 25 and later
WEBDEV - Server code (Linux): This function is now available for WEBDEV sites on Linux.
WINDEV (Linux): This function is now available for WINDEV applications on Linux.
Android: This function is now available for Android applications.
Example
MyString is string
MyString = PDFToText("C:\Temp\MyDocument.pdf")
// Display in a multiline edit control
EDT_EditMultiText = MyString
Syntax
<Result> = PDFToText(<PDF file> [, <Pages to extract> [, <Password> [, <Options>]]])
<Result>: Character string
Text of PDF file.
<PDF file>: Character string (with quotes)
Name and path of PDF file to analyze.
<Pages to extract>: Optional character string (with quotes)
Range of pages the text must be extracted from. The format is identical to the one used in standard print dialog boxes: individual page numbers or ranges of pages, separated by semicolons. For example, "1;3;4;6-10;12" means that pages 1, 3, 4, 6 to 10, and 12 will be processed.
If this parameter is not specified or if it corresponds to an empty string (""), all the pages are extracted.
<Password>: Optional character string (with quotes)
Password required to open the file if the PDF file is password protected.
<Options>: Optional integer constant
Versions 25 and later
Text splitting mode:
pttCompatible: Split PDF text using the algorithm from versions 24 and earlier.
pttDefault (default value): Split PDF text using an optimized algorithm. The resulting split may differ from previous versions.
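The optional parameters described above can be combined as in the following sketch. The file paths and the password are hypothetical placeholders, and the calls only follow the syntax documented on this page.

```wlanguage
sText is string

// Extract pages 1 to 3 and page 5 from an unprotected file
// (hypothetical path)
sText = PDFToText("C:\Temp\MyDocument.pdf", "1-3;5")

// Extract all the pages ("" = every page) from a protected file,
// using the splitting algorithm from versions 24 and earlier
// (hypothetical path and password)
sText = PDFToText("C:\Temp\MyProtected.pdf", "", "MyPassword", pttCompatible)
```

Passing an empty string for <Pages to extract> makes it possible to supply the later parameters while still extracting the whole document.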
Remarks

Conversion from PDF to text

  • The formatting of the document is lost when the conversion is performed from PDF to text.
  • The text is extracted in the order of appearance of the PDF commands and is written sequentially into the result string. The organization of the text in paragraphs and in blocks is kept (as well as the CR characters).
  • The Unicode characters are not returned.
  • The data found in a PDF form is not extracted (this data is not stored in the PDF file).
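Because the CR characters are kept, the result can be processed line by line. The following is a minimal sketch, assuming the lines of the result are separated by the CR constant; the file path is a hypothetical placeholder and the line counter is only for illustration.

```wlanguage
// Hypothetical path
sText is string = PDFToText("C:\Temp\MyDocument.pdf")
nLines is int

// Browse the extracted text line by line
FOR EACH STRING sLine OF sText SEPARATED BY CR
	// Ignore the lines that contain only spaces
	IF NoSpace(sLine) <> "" THEN
		nLines++
		Trace(sLine)
	END
END
Trace("Non-empty lines: " + nLines)
```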

Special cases

  • PDFIsProtected is used to find out whether a password is required to open a PDF file.
  • PDFNumberOfPages returns the total number of pages found in a PDF file.
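The two functions above can be used to check a file before extracting its text. The following is a minimal sketch under stated assumptions: the file path is a hypothetical placeholder, and PDFNumberOfPages is called with the file name only, on a file that is not protected.

```wlanguage
// Hypothetical path
sFile is string = "C:\Temp\MyDocument.pdf"
nPages is int
sLastPage is string

IF PDFIsProtected(sFile) THEN
	// The password must be passed as third parameter of PDFToText
	Info("This PDF file is password protected.")
ELSE
	nPages = PDFNumberOfPages(sFile)
	// Extract the text of the last page only
	sLastPage = PDFToText(sFile, NumToString(nPages))
	Info("Number of pages: " + nPages, sLastPage)
END
```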
Business / UI classification: Business Logic
Component: wd250img.dll
Minimum version required
  • Version 14
Comments
Example
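PROCEDURE Pdf_Extrair_Txt()

// Ask for confirmation before extracting (EDT_Pag_Final contains the number of pages)
IF YesNo("The PDF contains " + EDT_Pag_Final + " pages. Do you want to continue?") THEN

	x is int
	sTexto is string

	// Extract the pages one by one, adding a CR after each page
	FOR x = 1 TO EDT_Pag_Final
		sTexto += PDFToText(EDT_Path, "" + x) + CR
	END

	// Save the extracted text and open it in the associated application
	fSaveText("c:\temp\extraido.txt", sTexto)
	ShellExecute("c:\temp\extraido.txt")

END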
BOLLER
05 Mar. 2020