ONLINE HELP
 WINDEV, WEBDEV AND WINDEV MOBILE

HTMLToRTF converts an HTML string or an HTML buffer to an RTF string. The following operations are performed during the conversion:
  • Deletion of HTML tags
  • Conversion of HTML special characters
  • Conversion of CR characters (Carriage Return) to spaces
  • Conversion of multiple spaces to single spaces
Formatting is preserved as far as possible.
Example
MonTexteHTML is string = "<!--test--><b><i>&quot;Bonjour !&quot;</i>"
Texte is string = HTMLToRTF(MonTexteHTML)
// Texte contains the RTF code corresponding to: "Bonjour !"
// If the HTML document is:
//<HTML>
// <HEAD>
//  <TITLE>Ceci est un essai de page Web</TITLE>
//  <META http-equiv="content-type" content="text/html; charset=UTF-8">
// </HEAD>
//<BODY>
// <H2>Ceci est      une page HTML   en Français</H2>
//  <A href="http://www.pcsoft.fr">Ceci est un lien</A>
// </BODY>
//</HTML>

Texte = HTMLToRTF(MonTexteHTML)
// Texte will contain the RTF code corresponding to the following text:
// Ceci est une page HTML en Français
//
// Ceci est un lien
Syntax
<Result> = HTMLToRTF(<Text in HTML format> [, <Charset used>])
<Result>: Character string
RTF text resulting from the HTML conversion. The result is encoded in the current character set of WINDEV or WEBDEV.
<Text in HTML format>: String or buffer
Text to convert.
<Charset used>: Optional Integer constant
Constant identifying the character set in which the <Text in HTML format> is written. By default, the current character set of WINDEV or WEBDEV is used (charsetCurrent constant). If the <Text in HTML format> itself specifies a character set, that information takes priority over this parameter.
For more details on these constants, see Correspondence between languages, sub-languages, character sets and nations.
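The following minimal sketch shows both parameters of the syntax in use. EDT_Preview is a hypothetical RTF edit control and sHTML is an illustrative variable; neither comes from this page.

sHTML is string = "<H2>Title</H2><P>Some <I>italic</I> text</P>"
// charsetCurrent is the default value: it is written here only to show the optional parameter
// EDT_Preview is a hypothetical edit control of RTF type
EDT_Preview = HTMLToRTF(sHTML, charsetCurrent)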
Remarks

Conversion rules

  • The HTML tags are analyzed to keep the best possible formatting in the output text (CR characters, spaces, tabs, etc.). Formatting is preserved as far as possible: bold, italics, colors, etc.
  • The following elements do not appear in the RTF output:
    • HTML tags
    • content of the "header" (information in the <HEAD> tag)
    • comments
    • control texts
    • scripts
    • SSL definitions
    • CSS styles (except "color" attributes)
  • Management of CR characters
    • 2 Carriage Returns are inserted to replace the following tags: <P>, <H1> to <H6>, <TABLE>, <UL> or <OL>
    • 1 Carriage Return is inserted to replace the following tags: <BR>, <TR>, <LI>, <DD> or <DIV>
    • 1 single Carriage Return is inserted if several identical tags (<TR>, <LI>, <DD> or <DIV>) follow one another (except for <BR> tags)
  • Management of tables
    • A CR character is inserted for each table row (<TR> tag).
    • A tab is inserted for each table cell (<TD> tag).
  • Management of special characters
    A special character is a character defined in the HTML standard. For example, a non-breaking space can be written as "&nbsp;". These characters are converted automatically.
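As an illustration of the rules for table rows and cells, the following sketch converts a small two-row table. The variable names are illustrative, and the layout described in the comments is what the rules above suggest rather than verified output.

sHTML is string = "<TABLE><TR><TD>A1</TD><TD>B1</TD></TR><TR><TD>A2</TD><TD>B2</TD></TR></TABLE>"
sRTF is string = HTMLToRTF(sHTML)
// According to the rules above, sRTF should hold the RTF code for two lines of text,
// with the cells of each row (A1 and B1, then A2 and B2) separated by tabs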

Supported tags

Unsupported tags are ignored, but their content is treated as text.
The supported tags are as follows:
  • <PRE>
  • <UL>: Line feed + Tab
  • <OL>: Line feed + Tab
  • <LI>: Tab
  • <H1>: Line feed before and line feed after, bold and font size applied.
  • <H2>: Line feed before and line feed after, bold and font size applied.
  • <H3>: Line feed before and line feed after, bold and font size applied.
  • <H4>: Line feed before and line feed after, bold and font size applied.
  • <H5>: Line feed before and line feed after, bold and font size applied.
  • <H6>: Line feed before and line feed after, bold and font size applied.
  • <P>: Line feed before and line feed after
  • <BR>: Line break
  • <B>: Bold
  • <STRONG>: Bold
  • <I>: Italic
  • <EM>: Italic
  • <FONT>: Size and color
  • <A HREF>: Hypertext link
  • <SPAN>: Style: Color
  • <DL>: Line break
  • <DT>: Line break
  • <DD>: Tab and line feeds
  • <TABLE>: Line break
  • <TR>: Line break
  • <TD>: Tab-separated elements
  • <HEAD>: Content ignored, except for character set parameters
  • <STYLE>: Content ignored
  • <SCRIPT>: Content ignored
  • <!-- -->: Comments ignored
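The sketch below mixes a supported tag with one that is not in the list (<U> is used here as an assumed example of an unsupported tag). The comments describe the behavior the list implies, not verified output.

sHTML is string = "<P><B>Important:</B> <U>read this</U></P>"
sRTF is string = HTMLToRTF(sHTML)
// <B> is in the list: "Important:" is expected to be bold in the RTF result
// <U> is not in the list: the tag should be ignored and "read this" kept as plain text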

Managing the character set

To identify the character set used in the HTML text, HTMLToRTF uses the information in the CONTENT attribute of a <META> tag.
If this tag is not found, the character set used to write the HTML text must be specified in <Charset used>.
For example, if the HTML content uses an Arabic character set, no character set is specified, and WINDEV/WEBDEV uses a French character set by default, the output text will contain invalid characters.
Remarks:
  • If the output text contains several question marks ("?"), the characters of the character set used in the HTML document cannot be represented in the character set of the current language.
  • The UTF-8 character set is commonly used to encode Web pages.
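A minimal sketch of passing the character set explicitly, assuming an HTML file written in an Arabic character set with no <META> information. The file path is hypothetical, and charsetArabic is assumed to be the matching constant (check the correspondence page mentioned in the syntax).

// Hypothetical path: replace with the HTML file to convert
bufHTML is Buffer = fLoadBuffer("C:\Temp\page.html")
// The file has no <META> charset information, so the character set is passed explicitly
sRTF is string = HTMLToRTF(bufHTML, charsetArabic)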
Related examples:
  • The HTMLTo functions (WEBDEV unit example): explains how to use the HTMLToRTF and HTMLToText functions of WLanguage.
Component: wd300rtf.dll
Minimum version required
  • Version 12
Last update: 03/27/2025
