OCRExtractText (Function)
In French: OCRExtraitTexte
Reads the text contained in an image.

// Read all the text in the image
MyImage is Image
let MyString = OCRExtractText(MyImage)

// Read only the text inside a rectangle
MyImage is Image
r is Rectangle
r.X = 346
r.Y = 2258
r.Width = (2158-346)
r.Height = (2323-2258)
let sString = OCRExtractText(MyImage, r)
Trace(sString)

// Read the text inside the rectangle that contains a polygon
p is Polygon
p.Point[1].X = 346
p.Point[1].Y = 2258
p.Point[2].X = 2158
p.Point[2].Y = 2258
p.Point[3].X = 2158
p.Point[3].Y = 2323
p.Point[4].X = 346
p.Point[4].Y = 2323
let sString2 = OCRExtractText(MyImage, p)
Trace(sString2)

Syntax
<Result> = OCRExtractText(<Image to use> [, <Area to read>])
<Result>: Character string
Text extracted from the image.
<Image to use>: Control name, Image variable, character string
Image in which the text areas must be detected. The image can correspond to:
- an Image control,
- an Image variable,
- an Image Memo item,
- the path of an image file,
- the path of a PDF file.
<Area to read>: Optional Rectangle or Polygon variable
- Name of the Rectangle variable that represents the area containing the text to be extracted.
- Name of the Polygon variable that represents the area containing the text to be extracted. In this case, the area read corresponds to the rectangle that contains the polygon.
By default, if this parameter is not specified, all the text in the image is extracted.
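To illustrate the Polygon case, the following minimal sketch computes the rectangle that contains a polygon, using the four points from the example above. It is written in Python rather than WLanguage, purely as a language-neutral illustration. The names Rect and bounding_rectangle are hypothetical and are not part of the WLanguage API. The sketch also assumes that "the rectangle that contains the polygon" means the smallest axis-aligned rectangle enclosing all the points, which this page does not spell out.

```python
from typing import NamedTuple, Sequence, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle (hypothetical helper, not a WLanguage type)."""
    x: int
    y: int
    width: int
    height: int


def bounding_rectangle(points: Sequence[Tuple[int, int]]) -> Rect:
    """Return the smallest axis-aligned rectangle enclosing all points.

    Assumption: this mirrors how a Polygon area is reduced to a rectangle
    before reading; the actual engine behavior is not documented here.
    """
    if not points:
        raise ValueError("polygon has no points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Rect(min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


# Same four corners as the Polygon in the example above
polygon = [(346, 2258), (2158, 2258), (2158, 2323), (346, 2323)]
print(bounding_rectangle(polygon))
```

With these points, the computed rectangle has X = 346, Y = 2258, Width = 1812 and Height = 65, the same values as the Rectangle in the earlier example. Under the stated assumption, both calls would therefore read the same area.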
Remarks
- The Legacy engine is used. Custom models (.traineddata files) must be compatible with this engine.
- For PDF files:
  - if the <Area to read> parameter is not specified, OCRExtractText will extract the text from all pages of the specified PDF file.
  - if the <Area to read> parameter is specified, the desired page must be extracted as an image using PDFExtractPage (even if the PDF file has only one page). This image can then be used with OCRExtractText.
- To get the best possible results, it is recommended to:
  - Use a high-resolution image.
  - Crop the image around the text if possible (avoid unnecessary areas).
  - Limit text skew. Skewed images can be read: if the image is slightly skewed, OCR may still be able to detect the text, but the quality will be affected.
  - Limit the number of models/languages used.
- If the image used corresponds to an Image control, the source image is used directly. Therefore, changes made in the Image control (image size for example) are not taken into account. To apply these changes, the image must be saved first.
- If the image used (via an Image control or not) is a PDF file, its quality is set to 300 DPI.
- OCR can only detect printed text. It cannot recognize handwritten text.
- "White" text is not recognized.
- If the image used corresponds to an Image control and the source image is smaller than the control, the <Area to read> parameter must be specified with the coordinates of the source image and not with the coordinates of the Image control. CoordinateImageControlToImage can be used to convert these coordinates.
Related Examples:
Unit examples (WINDEV): OCR functions
This example shows how to use OCR functions in WINDEV.
Business / UI classification: Business Logic