PC SOFT

WINDEV, WEBDEV AND WINDEV MOBILE
ONLINE HELP

  • Conversion rules
  • Supported tags
  • Managing the character set
WINDEV
Windows, Linux, Universal Windows 10 App, Java, Reports and Queries, User code (UMC)
WEBDEV
Windows, Linux, PHP, WEBDEV - Browser code
WINDEV Mobile
Android, Android Widget, iPhone/iPad, Apple Watch, Universal Windows 10 App, Windows Mobile
Others
Stored procedures
Converts an HTML string or an HTML buffer into a string in RTF format. The following operations are performed during the conversion:
  • The HTML tags are deleted,
  • The special HTML characters are converted,
  • The CR characters (Carriage Return) are converted into space characters,
  • Runs of multiple spaces are collapsed into a single space.
The formatting (bold, italic, colors, ...) is preserved as closely as possible.
Versions 21 and later
iPhone/iPad This function is now available for the iPhone/iPad applications.
Universal Windows 10 App This function is now available in Universal Windows 10 App mode.
Example
MyHTMLText is string = "<!--test-->&quot;Hello!&quot;"
Text is string = HTMLToRTF(MyHTMLText)
// Text contains the RTF equivalent of: "Hello!"
// (the comment is dropped and &quot; is converted into a quotation mark)
Syntax
<Result> = HTMLToRTF(<Text in HTML Format> [, <Charset Used>])
<Result>: Character string
RTF text resulting from the HTML conversion. It is encoded in the current character set of WINDEV or WEBDEV.
<Text in HTML Format>: Character string or buffer (with quotes)
Text to convert.
<Charset Used>: Optional Integer constant
Constant identifying the character set in which the <Text in HTML Format> is written. The current character set of WINDEV or WEBDEV is used by default (charsetCurrent constant). If the <Text in HTML Format> itself declares its character set, that declaration takes precedence over this parameter.
See Correspondence between languages, sub-languages, character sets and nations for more details.
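The two forms of the syntax can be sketched as follows. This is a minimal illustration, not a reference implementation: the variable names sHTML and sRTF are hypothetical, and only the HTMLToRTF function and the charsetCurrent constant come from this page.

```wlanguage
// Hypothetical HTML content used for illustration
sHTML is string = "<P>Tom &amp; <B>Jerry</B></P>"

// Form 1: <Charset Used> omitted, the current character set is assumed
sRTF is string = HTMLToRTF(sHTML)

// Form 2: same call with the default value written explicitly
sRTF = HTMLToRTF(sHTML, charsetCurrent)
```

Both calls should behave identically, since charsetCurrent is the documented default.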
Remarks

Conversion rules

  • The HTML tags are analyzed in order to keep the best possible layout in the output text (CR characters, space characters, tabs). Character formatting is preserved as closely as possible: bold, italic, colors, ...
  • The following elements do not appear in the RTF output:
    • the HTML tags
    • the content of the "header" (information found in the <HEAD> tag)
    • the comments
    • the control texts
    • the scripts
    • the SSL definitions
    • the CSS styles (except the "color" attributes)
  • Management of CR characters
    • 2 CR characters are inserted to replace the following tags: <P>, <H1> to <H6>, <TABLE>, <UL> or <OL>
    • 1 CR character is inserted to replace the following tags: <BR>, <TR>, <LI>, <DD> or <DIV>
    • 1 single CR character is inserted if several identical tags (<TR>, <LI>, <DD> or <DIV>) are found one after another (except for <BR> tags)
  • Management of tables
    • A CR character is inserted for each table row (<TR> tag).
    • A tab is inserted for each table cell (<TD> tag).
  • Management of special characters
    A special character is a character entity defined in the HTML standard. For example, a non-breaking space can be written as "&nbsp;". These entities are converted automatically.
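The table rules above can be checked with a short test. This is a sketch under assumptions: the HTML content and the variable names are hypothetical, and Trace is used only to display the raw RTF source in the debugger trace window.

```wlanguage
// Hypothetical two-row, two-column HTML table
sHTML is string = "<TABLE><TR><TD>Name</TD><TD>Qty</TD></TR><TR><TD>Pen</TD><TD>3</TD></TR></TABLE>"

// Convert to RTF
sRTF is string = HTMLToRTF(sHTML)

// According to the rules above, each <TR> should produce a line break
// and each <TD> a tab in the RTF output.
// Display the raw RTF source to inspect the result.
Trace(sRTF)
```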

Supported tags

Unsupported tags are ignored: the tag itself is dropped and its content is treated as plain text.
The supported tags are as follows:
  • <PRE>
  • <UL>: Line break + tabulation
  • <OL>: Line break + tabulation
  • <LI>: Tabulation
  • <H1>: Line break before and line break after, bold and size of the font applied
  • <H2>: Line break before and line break after, bold and size of the font applied
  • <H3>: Line break before and line break after, bold and size of the font applied
  • <H4>: Line break before and line break after, bold and size of the font applied
  • <H5>: Line break before and line break after, bold and size of the font applied
  • <H6>: Line break before and line break after, bold and size of the font applied
  • <P>: Line break before and line break after
  • <BR>: Line break
  • <B>: Bold
  • <STRONG>: Bold
  • <I>: Italics
  • <EM>: Italics
  • <FONT>: Size and color
  • <A HREF>: Hypertext link
  • <SPAN>: Style: Color
  • <DL>: Line break
  • <DT>: Line break
  • <DD>: Tabulation and line break
  • <TABLE>: Line break
  • <TR>: Line break
  • <TD>: Elements separated by a tabulation
  • <HEAD>: Content ignored, except for the parameters of the character set
  • <STYLE>: Content ignored
  • <SCRIPT>: Content ignored
  • <!-- -->: Comments ignored
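To see how supported and unsupported tags are handled, the following sketch converts a small fragment that mixes both. The content and the names are assumptions made for illustration: <MARQUEE> stands in for any tag missing from the list above, and EDT_Preview is a hypothetical edit control configured to display RTF.

```wlanguage
// <H1>, <B> and <EM> are in the list of supported tags.
// <MARQUEE> is not: the tag should be dropped and its content kept as text.
sHTML is string = "<H1>Title</H1><P>Some <B>bold</B> and <EM>emphasized</EM> text, then <MARQUEE>plain text</MARQUEE>.</P>"

sRTF is string = HTMLToRTF(sHTML)

// Display the result in an RTF edit control (hypothetical control name)
EDT_Preview = sRTF
```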

Managing the character set

To determine the character set used in the HTML text, HTMLToRTF reads the information found in the CONTENT attribute of a <META> tag.
If this tag is not found, the character set used to write the HTML text must be specified in <Charset Used>.
For example, if the HTML content uses an Arabic character set while WINDEV/WEBDEV uses a French character set by default, the output text will contain invalid characters unless the character set is specified.
Notes:
  • If the output text contains "?" characters, it means that some characters of the character set used in the HTML document cannot be represented in the character set of the current language.
  • The UTF8 character set is commonly used to encode Web pages.
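The Arabic case described above can be sketched as follows. This is an illustration under assumptions: the file name is hypothetical, the HTML is assumed to contain no <META> tag declaring its character set, and charsetArabic is assumed to be the WLanguage charset constant matching the content.

```wlanguage
// Load HTML written in an Arabic character set (hypothetical file name).
// The content is assumed NOT to declare its character set in a <META> tag.
sArabicHTML is string = fLoadText("page_ar.html")

// Without the second parameter, the current character set would be assumed
// and the output could contain invalid or "?" characters.
sRTF is string = HTMLToRTF(sArabicHTML, charsetArabic)
```

If the HTML does declare its character set in a <META> tag, that declaration takes precedence and the second parameter has no effect.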
Related Examples:
The HTMLTo functions, Unit examples (WEBDEV): this example explains how to use the HTMLToRTF and HTMLToText functions of WLanguage.
Component: wd250rtf.dll
Minimum version required
  • Version 12