OCRExtractText (Function)

New WINDEV, WEBDEV and WINDEV Mobile 26 feature!

In French: OCRExtraitTexte

Reads the text contained in an image.

Example

// Extract all the text contained in an image
MyImage is Image
let MyString = OCRExtractText(MyImage)

// Extract only the text found in a rectangular area
MyImage is Image
r is Rectangle
r.X = 346
r.Y = 2258
r.Width = (2158-346)    // 1812
r.Height = (2323-2258)  // 65
let sString = OCRExtractText(MyImage, r)
Trace(sString)

// Same area described as a polygon:
// the area read is the rectangle that contains the polygon
p is Polygon
p.Point[1]..X = 346
p.Point[1]..Y = 2258
p.Point[2]..X = 2158
p.Point[2]..Y = 2258
p.Point[3]..X = 2158
p.Point[3]..Y = 2323
p.Point[4]..X = 346
p.Point[4]..Y = 2323
let sString2 = OCRExtractText(MyImage, p)
Trace(sString2)

Syntax

<Result> = OCRExtractText(<Image to use> [, <Area to read>])

<Result>: Character string

Text extracted from the image.

<Image to use>: Control name, Image variable, character string (with quotes)

Image in which the text must be detected. This image can be:
an Image control,
an Image variable,
an Image Memo item,
the path of an image file,
the path of a PDF file.
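
As a minimal sketch of these input forms, the calls below pass a file path, a PDF path and an Image control. The file paths and the control name IMG_Scan are hypothetical and only serve as an illustration:

// Hypothetical image file path
sImageText is string = OCRExtractText("C:\Temp\scan.png")
Trace(sImageText)

// Hypothetical PDF path: without <Area to read>, all pages are read
sPdfText is string = OCRExtractText("C:\Temp\report.pdf")
Trace(sPdfText)

// IMG_Scan is a hypothetical Image control of the current window
sControlText is string = OCRExtractText(IMG_Scan)
Trace(sControlText)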

<Area to read>: Optional Rectangle or Polygon variable

Name of the Rectangle variable that represents the area containing the text to be extracted.
Name of the Polygon variable that represents the area containing the text to be extracted. In this case, the area read is the bounding rectangle of the polygon (the smallest rectangle that contains it).
If this parameter is not specified, all the text in the image is extracted.

Remarks

On Windows, Linux and Android, the OCR engine can use Tesseract Legacy or LSTM models.
For PDF files:
- if the <Area to read> parameter is not specified, OCRExtractText will extract the text from all pages of the specified PDF file.
- if the <Area to read> parameter is specified, the desired page must be extracted as an image using PDFExtractPage (even if the PDF file has only one page). This image can then be used with OCRExtractText.
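
The second case can be sketched as follows. The file path, page number and area coordinates are hypothetical, and the sketch assumes that PDFExtractPage takes the PDF path and a page number and returns an Image variable:

// Hypothetical PDF path; page 1 is extracted as an image
imgPage is Image = PDFExtractPage("C:\Temp\report.pdf", 1)

// Hypothetical area, expressed in the coordinates of the extracted image
rArea is Rectangle
rArea.X = 100
rArea.Y = 200
rArea.Width = 800
rArea.Height = 60

// Read only the text found in this area of the page
sPageText is string = OCRExtractText(imgPage, rArea)
Trace(sPageText)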

To get the best results possible, it is recommended to:
- Use a high-resolution image.
- Crop the image around the text if possible (avoid unnecessary areas).
- Limit text skew. Slightly skewed images can still be read, but the quality of the result will be affected.
- Limit the number of models/languages used.
If the image used corresponds to an Image control, OCR works directly on the source image. Therefore, the changes made in the Image control (image size, for example) are not taken into account. To apply these changes, the image must be saved first.
If the image used (via an Image control or not) is a PDF file, its quality is set to 300 DPI.
OCR can only detect printed text. It cannot recognize handwritten text.

If the image used corresponds to an Image control and the source image is smaller than the control, the <Area to read> parameter must be specified with the coordinates of the source image and not with the coordinates of the Image control. CoordinateImageControlToImage can be used to convert these coordinates.
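
A possible way to build the area from control coordinates is sketched below. IMG_Scan is a hypothetical Image control and the point values are hypothetical. The sketch assumes that CoordinateImageControlToImage accepts the Image control and a Point variable expressed in control coordinates, and returns a Point variable in source image coordinates. Check the function's own help page for the exact syntax.

// Corners of the area, in control coordinates (hypothetical values)
ptTopLeft is Point
ptTopLeft.X = 20
ptTopLeft.Y = 40
ptBottomRight is Point
ptBottomRight.X = 220
ptBottomRight.Y = 70

// Assumed syntax: (Image control, point in control) -> point in source image
ptImgTopLeft is Point = CoordinateImageControlToImage(IMG_Scan, ptTopLeft)
ptImgBottomRight is Point = CoordinateImageControlToImage(IMG_Scan, ptBottomRight)

// Area expressed in the coordinates of the source image
rArea is Rectangle
rArea.X = ptImgTopLeft.X
rArea.Y = ptImgTopLeft.Y
rArea.Width = ptImgBottomRight.X - ptImgTopLeft.X
rArea.Height = ptImgBottomRight.Y - ptImgTopLeft.Y

Trace(OCRExtractText(IMG_Scan, rArea))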

Business / UI classification: Business Logic

Component: wd260ocr.dll