PC SOFT

ONLINE HELP
WINDEV, WEBDEV AND WINDEV MOBILE

WINDEV: Windows, Linux, Universal Windows 10 App, Java, Reports and Queries, User code (UMC)
WEBDEV: Windows, Linux, PHP, WEBDEV - Browser code
WINDEV Mobile: Android, Android Widget, iPhone/iPad, Apple Watch, Universal Windows 10 App, Windows Mobile
Others: Stored procedures
HTMLToText converts an HTML string or an HTML buffer into a text string. The following operations are performed during the conversion:
  • Deletion of HTML tags,
  • Conversion of HTML special characters,
  • Conversion of the CR characters (Carriage Return) found in the HTML source to spaces,
  • Conversion of multiple spaces to single spaces.
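The four operations above can be modeled in a few lines. This sketch is written in Python rather than WLanguage so that it can be run and checked; the function name `html_to_text_sketch` is hypothetical, it is not PC SOFT's implementation, and it deliberately ignores the tag-dependent line breaks described under Remarks. Treating LF like CR is also an assumption.

```python
import html
import re


def html_to_text_sketch(source: str) -> str:
    """Hypothetical model of the four operations (not PC SOFT's code)."""
    # 1. Delete HTML tags; comments first so "<!--...-->" is removed whole.
    text = re.sub(r"<!--.*?-->", "", source, flags=re.DOTALL)
    text = re.sub(r"<[^>]+>", "", text)
    # 2. Convert HTML special characters (entities such as &quot; or &amp;).
    text = html.unescape(text)
    # 3. Convert CR characters to spaces (LF treated the same: an assumption).
    text = text.replace("\r", " ").replace("\n", " ")
    # 4. Collapse runs of ordinary spaces; &nbsp; became U+00A0 and is kept.
    return re.sub(r" {2,}", " ", text)


print(html_to_text_sketch('<!--test--><b><i>&quot;Hello!&quot;</i></b>'))  # "Hello!"
```

Decoding entities after removing tags matters: an escaped &lt;b&gt; ends up visible as "<b>" instead of being deleted as a tag.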
Versions 15 and later
PHP: This function is now available for PHP sites.
Android: This function is now available for Android applications.
Versions 18 and later
Android Widget: This function is now available in Android Widget mode.
Versions 21 and later
iPhone/iPad: This function is now available for iPhone/iPad applications.
Universal Windows 10 App: This function is now available in Universal Windows 10 App mode.
Example
MyHTMLText is string
MyHTMLText = "<!--test--><b><i>&quot;Hello!&quot;</i></b>"
Text is string = HTMLToText(MyHTMLText)
// Text is set to: "Hello!"
User code (UMC)
// If the HTML document is set to:
//<HTML>
// <HEAD>
//  <TITLE>This is a test for a Web page</TITLE>
//  <META http-equiv="content-type" content="text/html; charset=UTF-8">
// </HEAD>
//<BODY>
// <P>This is &nbsp;&nbsp;&nbsp;&nbsp; an HTML page in English</P>
// It contains 1 paragraph<BR /><DD>a tab<BR />and 3 line breaks
//  <BR /><A href="http://www.pcsoft.fr">This is a link</A>
// </BODY>
//</HTML>
 
Text = HTMLToText(MyHTMLText)
// Text will contain:
// This is        an HTML page in English
//
// It contains 1 paragraph
//   a tab
// and 3 line breaks
// This is a link
Syntax
<Result> = HTMLToText(<Text in HTML format> [, <Charset used>])
<Result>: Character string
Text obtained by converting the HTML. It is encoded in the current character set of WINDEV or WEBDEV.
<Text in HTML format>: Character string (with quotes) or buffer
Text to convert.
<Charset used>: Optional Integer constant
Constant identifying the character set used to write the <Text in HTML format>.
The current character set of WINDEV or WEBDEV is used by default (charsetCurrent constant).
If the <Text in HTML format> declares its own character set (CONTENT attribute of a <META> tag), this declaration takes priority over this parameter.
For more details on these constants, see Correspondence between languages, sublanguages, character sets and nations.
Android, Android Widget: This parameter is not available.
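The priority rules for <Charset used> can be summarized as a small decision function. This is a hedged Python sketch: `effective_charset` and its parameters are hypothetical, and Python codec names such as "cp1256" only stand in for the WLanguage charset constants.

```python
from typing import Optional


def effective_charset(meta_charset: Optional[str],
                      charset_param: Optional[str],
                      current_charset: str) -> str:
    """Hypothetical model of how the input charset is chosen."""
    # A charset declared inside the HTML takes priority over the parameter.
    if meta_charset:
        return meta_charset
    # Otherwise use <Charset used> when the caller passes it...
    if charset_param:
        return charset_param
    # ...else fall back to the current character set (charsetCurrent).
    return current_charset


print(effective_charset(None, "cp1256", "cp1252"))  # cp1256
```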
Remarks

Conversion rules

  • HTML tags are analyzed to keep as much of the layout as possible in the output text (CR characters, spaces, tabs, etc.). Text formatting (bold, italic, colors, etc.) is not kept.
  • The following elements do not appear in the output text:
    • HTML tags
    • the content of the header (information in the <HEAD> tag)
    • comments
    • control texts
    • scripts
    • SSL definitions
    • CSS styles (except "color" attributes)
    • form elements
  • Management of CR characters
    • 2 Carriage Returns replace each of the following tags: <P>, <H1> to <H6>, <TABLE>, <UL> or <OL>
    • 1 Carriage Return replaces each of the following tags: <BR>, <TR>, <LI>, <DD> or <DIV>
    • When several identical <TR>, <LI>, <DD> or <DIV> tags follow one another, a single Carriage Return is inserted for the whole sequence. Consecutive <BR> tags are not merged: each one produces its own Carriage Return.
  • Management of tables
    • A CR character is inserted for each table row (<TR> tag).
    • A tab is inserted for each table cell (<TD> tag).
  • Management of special characters
    Special characters are the character entities defined in the HTML standard. For example, a non-breaking space can be written as "&nbsp;". These entities are automatically converted to the corresponding characters.
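The CR and tab rules above can be sketched with Python's standard `html.parser` module. Everything here is an assumption for illustration: the class and function names are hypothetical, a Carriage Return is modeled as CR+LF, "follow one another" is read as "with no text in between", and whitespace-only text is skipped for brevity.

```python
from html.parser import HTMLParser
from typing import List, Optional

CR = "\r\n"  # assumption: a Carriage Return is modeled as CR+LF
DOUBLE_CR_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "table", "ul", "ol"}
SINGLE_CR_TAGS = {"br", "tr", "li", "dd", "div"}
MERGED_TAGS = {"tr", "li", "dd", "div"}  # identical runs give one CR; <BR> excluded


class LayoutSketch(HTMLParser):
    """Hypothetical model of the CR and tab rules (not PC SOFT's code)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self.last_tag: Optional[str] = None  # last start tag with no text since

    def handle_starttag(self, tag, attrs):
        if tag in DOUBLE_CR_TAGS:
            self.parts.append(CR * 2)
        elif tag in SINGLE_CR_TAGS:
            # Skip the CR when the same mergeable tag was just seen.
            if not (tag in MERGED_TAGS and tag == self.last_tag):
                self.parts.append(CR)
        elif tag == "td":
            self.parts.append("\t")  # one tab per table cell
        self.last_tag = tag

    def handle_data(self, data):
        if data.strip():  # whitespace-only text skipped for brevity
            self.parts.append(data)
            self.last_tag = None


def layout(source: str) -> str:
    parser = LayoutSketch()
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


print(repr(layout("<table><tr><td>1<td>2</table>")))  # '\r\n\r\n\r\n\t1\t2'
```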

Supported tags

Unsupported tags are ignored, but their content is kept as text.
The supported tags are as follows:
  • <PRE>
  • <UL>: Line break + tab
  • <OL>: Line break + tab
  • <LI>: Tab
  • <H1>: Line break above and below
  • <H2>: Line break above and below
  • <H3>: Line break above and below
  • <H4>: Line break above and below
  • <H5>: Line break above and below
  • <H6>: Line break above and below
  • <P>: Line break above and below
  • <BR>: Line break
  • <DL>: Line break
  • <DT>: Line break
  • <DD>: Tab and line break
  • <TABLE>: Line break
  • <TR>: Line break
  • <TD>: Elements separated by a tab
  • <HEAD>: Content ignored, except for the parameters of the character set
  • <STYLE>: Content ignored
  • <SCRIPT>: Content ignored
  • <!-- -->: Comments ignored
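The "Content ignored" rows and the rule for unsupported tags can be sketched the same way. In this hypothetical Python model, text inside <HEAD>, <STYLE> and <SCRIPT> is dropped, comments are ignored (the default behavior of `html.parser.HTMLParser`), and the text of any other tag, such as an unsupported <BLINK>, is kept. The other rows of the list are not modeled.

```python
from html.parser import HTMLParser
from typing import List

IGNORED_CONTENT = {"head", "style", "script"}  # content dropped entirely


class TextKeeper(HTMLParser):
    """Hypothetical model: keep text except inside HEAD, STYLE and SCRIPT."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside an ignored element
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_CONTENT:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in IGNORED_CONTENT and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)

    # handle_comment is not overridden: HTMLParser ignores <!-- --> by default.


def visible_text(source: str) -> str:
    parser = TextKeeper()
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


print(visible_text("<head><title>T</title></head><!-- c --><blink>kept</blink>"))  # kept
```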

Managing the character set

To determine the character set used in the HTML text, HTMLToText reads the CONTENT attribute of a <META> tag (for example, content="text/html; charset=UTF-8").
If this tag is not found, the character set used to write the HTML text must be specified in <Charset used>.
For example, if the HTML content uses an Arabic character set that is not specified, while WINDEV/WEBDEV uses a French character set by default, the output text will contain invalid characters.
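The detection step can be sketched as a search for charset= inside the CONTENT attribute of a <META> tag, with <Charset used> as the fallback. In this hypothetical Python sketch, the default "cp1252" stands in for a French (Western European) current character set and "cp1256" for an Arabic one; the HTML5 form <meta charset="..."> is not handled because the text above only mentions the CONTENT attribute.

```python
import re
from typing import Optional

# charset=... inside the CONTENT attribute of a <META> tag, for example:
# <META http-equiv="content-type" content="text/html; charset=UTF-8">
META_CHARSET = re.compile(
    rb"""<meta[^>]*content\s*=\s*["'][^"']*charset=([A-Za-z0-9_-]+)""",
    re.IGNORECASE,
)


def detect_meta_charset(raw: bytes) -> Optional[str]:
    """Return the charset declared in a <META> CONTENT attribute, or None."""
    match = META_CHARSET.search(raw)
    return match.group(1).decode("ascii") if match else None


def decode_html(raw: bytes, charset_used: str = "cp1252") -> str:
    """Decode raw HTML: the <META> declaration wins over charset_used."""
    charset = detect_meta_charset(raw) or charset_used
    return raw.decode(charset, errors="replace")


arabic = "<p>مرحبا</p>".encode("cp1256")  # no <META> tag in this page
print(decode_html(arabic, "cp1256"))        # <p>مرحبا</p>
print(decode_html(arabic))                  # invalid characters with cp1252
```

Decoding the Arabic bytes with "cp1256" restores the text, while the "cp1252" fallback produces the invalid characters mentioned above.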
Remarks:
  • If the output text contains several question marks ("?"), the characters of the character set used in the HTML document cannot be represented with the characters of the current language.
  • The UTF-8 character set is commonly used to encode Web pages.
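The question marks come from converting text into a character set that has no code for some of its characters. A minimal Python illustration, again assuming "cp1252" as the current (French) character set; `to_current_charset` is a hypothetical helper:

```python
def to_current_charset(text: str, current: str = "cp1252") -> str:
    """Round-trip text through the current charset; unmappable characters become '?'."""
    # errors="replace" substitutes b"?" for each character the charset cannot encode.
    return text.encode(current, errors="replace").decode(current)


print(to_current_charset("café"))          # café (é exists in cp1252)
print(to_current_charset("Salam: مرحبا"))  # Salam: ?????
```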
Android, Android Widget

Limitations

The result produced by HTMLToText in Android may differ from the result produced in Windows: the conversion rules and the list of supported tags described above do not apply in Android.
Related Examples:
The HTMLTo functions (Unit examples, WEBDEV): This example explains how to use the HTMLToRTF and HTMLToText functions of WLanguage.
Switching from the RTF format to the HTML format (Unit examples, WINDEV): Using RTFToHTML and RTFToText.
WD Mail (Complete examples, WINDEV): This application is an email client developed in WINDEV. It is based on the Email objects.
This email client is used to retrieve and send emails by using the POP, IMAP and SMTP protocols.
Filters can be applied to incoming emails.
The application can also be used to manage several email accounts. Emails are written in an HTML edit control.
HTML types (HTMLDocument, HTMLNode, HTMLAttribute) (Unit examples, WINDEV): This example shows how to use the HTMLXxx WLanguage types (HTMLDocument, HTMLNode, HTMLAttribute).
Component: wd260rtf.dll
Minimum version required
  • Version 12