PC SOFT

ONLINE HELP
 WINDEV, WEBDEV AND WINDEV MOBILE

OCRExtractText (Function)
Reads the text contained in an image.
Example
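// Extract all the text contained in an image
// (MyImage must contain the image to read; the loading code is not shown)
MyImage is Image
let MyString = OCRExtractText(MyImage)
// Extract only the text found in a rectangular area
// (coordinates are expressed in the coordinates of the source image)
r is Rectangle
r.X = 346
r.Y = 2258
r.Width = (2158-346)
r.Height = (2323-2258)
let sString = OCRExtractText(MyImage, r)
Trace(sString)
// Same area described by a Polygon variable:
// the area read is the rectangle that contains the polygon
p is Polygon
p.Point[1]..X = 346
p.Point[1]..Y = 2258
p.Point[2]..X = 2158
p.Point[2]..Y = 2258
p.Point[3]..X = 2158
p.Point[3]..Y = 2323
p.Point[4]..X = 346
p.Point[4]..Y = 2323
let sString2 = OCRExtractText(MyImage, p)
Trace(sString2)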
Syntax
<Result> = OCRExtractText(<Image to use> [, <Area to read>])
<Result>: Character string
Text extracted from the image.
<Image to use>: Control name, Image variable, character string
Image from which the text must be extracted. This image can correspond to:
  • an Image control,
  • an Image variable,
  • an Image Memo item,
  • the path of an image file,
  • the path of a PDF file.
<Area to read>: Optional Rectangle or Polygon variable
  • Name of the Rectangle variable that represents the area containing the text to be extracted.
  • Name of the Polygon variable that represents the area containing the text to be extracted. In this case, the area read is the rectangle that contains the polygon (its bounding rectangle).
If this parameter is not specified, all the text in the image is extracted.
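When a Polygon variable is passed, the area actually read can be worked out by hand. A minimal sketch, assuming that "the rectangle that contains the polygon" is the axis-aligned bounding rectangle of the polygon's vertices (x_i, y_i), and that Width and Height are computed as coordinate differences, as in the Rectangle example above:

```latex
\begin{aligned}
X &= \min_i x_i, & Y &= \min_i y_i,\\
\mathrm{Width} &= \max_i x_i - \min_i x_i, & \mathrm{Height} &= \max_i y_i - \min_i y_i.
\end{aligned}
```

For the polygon in the example, x ∈ {346, 2158} and y ∈ {2258, 2323}, which gives X = 346, Y = 2258, Width = 2158 - 346 = 1812 and Height = 2323 - 2258 = 65. This is the same area as the Rectangle variable r, so both calls should read the same text.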
Remarks
  • On Windows, Linux and Android, the OCR engine can use either the Legacy or the LSTM models from Tesseract.
  • For PDF files:
    • if the <Area to read> parameter is not specified, OCRExtractText extracts the text from all the pages of the specified PDF file.
    • if the <Area to read> parameter is specified, the desired page must first be extracted as an image with PDFExtractPage (even if the PDF file contains only one page). This image can then be passed to OCRExtractText.
  • To get the best possible results, it is recommended to:
    • Use a high-resolution image.
    • Crop the image around the text if possible (avoid unnecessary areas).
    • Limit text skew. OCR may still detect the text in a slightly skewed image, but recognition quality will be reduced.
    • Limit the number of models/languages used.
  • If the image corresponds to an Image control, OCR is performed directly on the control's source image. Changes made in the Image control (for example, to the image size) are therefore not taken into account. To apply these changes, save the image first.
  • If the image is a PDF file (whether or not it is displayed in an Image control), its resolution is set to 300 DPI.
  • OCR can only detect printed text. It cannot recognize handwritten text.
  • If the image corresponds to an Image control and the source image is smaller than the control, <Area to read> must be specified in the coordinates of the source image, not in the coordinates of the Image control. CoordinateImageControlToImage can be used to convert these coordinates.
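As a rough guide to the 300 DPI resolution used for PDF files, the pixel size of a rendered page is its physical size in inches multiplied by the resolution. A minimal worked example, assuming an A4 page (210 mm × 297 mm, with 1 inch = 25.4 mm) rendered at exactly 300 DPI:

```latex
\begin{aligned}
\text{width}  &= \frac{210}{25.4} \times 300 \approx 2480 \ \text{px},\\
\text{height} &= \frac{297}{25.4} \times 300 \approx 3508 \ \text{px}.
\end{aligned}
```

The actual image size depends on the page size stored in the PDF file; this calculation only gives an order of magnitude for the resolution of the image that is read.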
Business / UI classification: Business Logic
Component: wd270ocr.dll
Minimum version required
  • Version 26