ONLINE HELP
 WINDEVWEBDEV AND WINDEV MOBILE

Help / WLanguage / WLanguage functions / Standard functions / OCR functions
WINDEV
WindowsLinuxUniversal Windows 10 AppJavaReports and QueriesUser code (UMC)
WEBDEV
WindowsLinuxPHPWEBDEV - Browser code
WINDEV Mobile
AndroidAndroid Widget iPhone/iPadIOS WidgetApple WatchMac CatalystUniversal Windows 10 App
Others
Stored procedures
Reads the text contained in an image.
Example
MyImage is Image
let MyString = OCRExtractText(MyImage)
MyImage is Image
r is Rectangle
r.X=346
r.Y=2258
r.Width = (2158-346)
r.Height = (2323-2258)
let sString = OCRExtractText(MyImage, r)
Trace(sString)
p is Polygon
p.Point[1].X = 346
p.Point[1].Y = 2258
p.Point[2].X = 2158
p.Point[2].Y = 2258
p.Point[3].X = 2158
p.Point[3].Y = 2323
p.Point[4].X = 346
p.Point[4].Y = 2323
let sString2 = OCRExtractText(MyImage, p)
Trace(sString2)
Syntax
<Result> = OCRExtractText(<Image to use> [, <Area to read>])
<Result>: Character string
Text extracted from the image.
<Image to use>: Control name, Image variable, character string
Image in which the text areas must be detected. The image can correspond to:
  • an Image control,
  • an Image variable,
  • an Image Memo item,
  • the path of an image file
  • the path of PDF file.
<Area to read>: Optional Rectangle or Polygon variable
  • Name of the Rectangle variable that represents the area containing the text to be extracted.
  • Name of the Polygon variable that represents the area containing the text to be extracted. In this case, the area read corresponds to the rectangle that contains the polygon.
By default, if this parameter is not specified, all the text in the image is extracted.
Remarks
  • AndroidAndroid Widget The Legacy engine is used. Custom models (.traineddata files) must be compatible with this engine.
  • WindowsLinux Legacy and LSTM engines can be used in WINDEV applications (Windows and Linux). LSTM models are provided by default.
  • For PDF files:
    • if the <Area to read> parameter is not specified, OCRExtractText will extract the text from all pages of the specified PDF file.
    • if the <Area to read> parameter is specified, the desired page must be extracted as an image using PDFExtractPage (even if the PDF file has only one page). This image can then be used with OCRExtractText.
  • To get the best results possible, it is recommended to:
    • Use a high-resolution image.
    • Crop the image around the text if possible (avoid unnecessary areas).
    • Limit text skew. If the image is slightly skewed, OCR may be able to detect the text, but the quality will be affected.
      iPhone/iPad Skewed images can be read.
    • Limit the number of models/languages used.
  • Note that, if the image used corresponds to an Image control, the source image will be directly manipulated. Therefore, the changes made in the Image control (image size for example) will not be taken into account. To apply these changes, it is necessary to save the image.
  • Note that, if the image used (via an Image control or not) is a PDF file, its quality will be set to 300 DPI.
  • OCR can only detect printed text. It cannot recognize handwritten text.
  • "White" text is not recognized.
  • If the image used corresponds to an Image control and the source image is smaller than the control, the <Area to read> parameter must be specified with the coordinates of the source image and not with the coordinates of the Image control. CoordinateImageControlToImage can be used to convert these coordinates.
Related Examples:
OCR functions Unit examples (WINDEV): OCR functions
[ + ] This example shows how to use OCR functions in WINDEV.
Business / UI classification: Business Logic
Component: wd290ocr.dll
Minimum version required
  • Version 26
This page is also available for…
Comments
Click [Add] to post a comment

Last update: 07/04/2023

Send a report | Local help