|
|
|
|
- Remark on the syntax "Searching for substrings between two given separators"
- ExtractStringBetween and UNICODE
ExtractStringBetween (Function) In french: ExtraitChaineEntre Allows you to: - extract a substring between two given separators from a character string.
- search for substrings between two given separators in a character string.
For example, this function allows you to extract data between two tags in HTML, XML or JSON. Remarks: - Searching for substrings takes less time than extracting substrings.
- You can use arrays of separators. This allows you to use several pairs of separators at the same time.
New in version 28Country is string = [ <country>France</country><country>Italy</country> <country>Germany</country><country>Spain</country> ] ExtractStringBetween(Country, 1, "<country>", "</country>") // Returns "France" ExtractStringBetween(Country, 2, "<country>", "</country>") // Returns "Italy" ExtractStringBetween(Country, 3, "<country>", "</country>") // Returns "Germany" ExtractStringBetween(Country, 4, "<country>", "</country>") // Returns "Spain" ExtractStringBetween(Country, 5, "<country>", "</country>") // Returns EOT
MyString is string = [ <red fruit>Strawberry</red fruit> <red fruit>Raspberry</red fruit> <exotic fruit>Cacao</exotic fruit> <exotic fruit>Banana</exotic fruit> ] ExtractStringBetween(MyString, 1, ["<red fruit>" , "<exotic fruit>"], ... ["</red fruit>" , "</exotic fruit>"]) // Returns "Strawberry" ExtractStringBetween(MyString, 2, ["<red fruit>" , "<exotic fruit>"], ... ["</red fruit>" , "</exotic fruit>"]) // Returns "Raspberry" ExtractStringBetween(MyString, 3, ["<red fruit>" , "<exotic fruit>"], ... ["</red fruit>" , "</exotic fruit>"]) // Returns "Cacao" ExtractStringBetween(MyString, 4, ["<red fruit>" , "<exotic fruit>"], ... ["</red fruit>" , "</exotic fruit>"]) // Returns "Banana"
// Search for all the substrings Country is string = [ <country>France</country><country>Italy</country> <country>Germany</country><country>Spain</country> ] SubString is string = ExtractStringBetween(Country, firstRank, "<country>", "</country>") WHILE SubString <> EOT Trace(SubString) // Returns "France", "Italy", "Germany", "Spain" SubString = ExtractStringBetween(Country, nextRank, "<country>", "</country>") END
// Search for all the substrings // The separators are in arrays sString is string = [ <red fruit>Strawberry</red fruit> <red fruit>Raspberry</red fruit> <exotic fruit>Cacao</exotic fruit> <exotic fruit>Banana</exotic fruit> ] sResult is string = ExtractStringBetween(sString, firstRank, ["<red fruit>" , ... "<exotic fruit>"], ["</red fruit>" , "</exotic fruit>"]) WHILE sResult <> EOT Trace(sResult) sResult = ExtractStringBetween(sString, nextRank, ["<red fruit>" , "<exotic fruit>"], ... ["</red fruit>" , "</exotic fruit>"]) END
Syntax
Extracting a substring between two given separators from a string Hide the details
<Result> = ExtractStringBetween(<Initial string> , <Index> , <Start separator> [, <End separator> [, <Options>]])
<Result>: Character string Corresponds to:- The substring between <Start separator> at <Index> and <End separator> if the FromEnd constant is not specified.
- The substring between <Start separator> at <Index> from the end of the string and the next end separator if the FromEnd constant is specified.
- The EOT constant in one of the following cases:
- if <Index> is greater than the number of start separators followed by end separators in the string,
- if all separators are empty strings ("").
<Initial string>: Character string Character string (up to 2 GB) containing the string to extract. <Index>: Integer Position of start separator followed by an end separator. Remark: a start separator is not taken into account if there is no end separator between it and the previous start separator, unless the following conditions are met: - <Index> is set to 1,
- the FromEnd option is not specified.
<Start separator>: String or Array of strings This parameter can correspond to:- The string that delimits the beginning of substrings. This string is not included in the result.
- An array of strings. The different strings in the array allow delimiting the beginning of substrings. The separators are not included in the result.
<End separator>: Optional character string or optional array of strings This parameter can correspond to:- The string that delimits the end of substrings. This string is not included in the result.
- An array of strings. The different strings in the array delimit the substrings. The separators are not included in the result.
If this parameter is not specified, the end separator will be identical to <Start separator>. <Options>: Optional Integer constant Search direction and characteristics: | | FromBeginning (Default value) | Search performed from the first character of the string to the last one. | FromEnd | Search performed from the last character of the string to the first one. | IgnoreCase | Case and accent insensitive search (ignores uppercase/lowercase differences). | WholeWord | Searches for a whole word (between punctuation characters or spaces). |
Searching for substrings between two given separators in a character string Hide the details
<Result> = ExtractStringBetween(<Initial string> , <Search options> , <Start separator> [, <End separator> [, <Options>]])
<Result>: Character string Corresponds to:- the next or previous substring according to the specified search direction. <Result> contains no separator.
- the EOT constant at the end of the search.
<Initial string>: Character string Character string (up to 2 GB) containing the string to extract. <Search options>: Integer constant Search direction: | | firstRank | Starts searching for substrings separated by the specified separators from the beginning of the string. | lastRank | Starts searching for substrings separated by the specified separators from the end of the string. | nextRank | Continues the search started with firstRank | previousRank | Continues the search started with lastRank |
<Start separator>: String or Array of strings This parameter can correspond to:- The string that delimits the beginning of substrings. This string is not included in the result.
- An array of strings. The different strings in the array allow delimiting the beginning of substrings. The separators are not included in the result.
<End separator>: Optional character string or optional array of strings This parameter can correspond to:- The string that delimits the end of substrings. This string is not included in the result.
- An array of strings. The different strings in the array delimit the substrings. The separators are not included in the result.
If this parameter is not specified, the end separator will be identical to <Start separator>. <Options>: Optional Integer constant Search characteristics: | | IgnoreCase | Case and accent insensitive search (ignores uppercase/lowercase differences). | WholeWord | Searches for a whole word (between punctuation characters or spaces). |
Remarks Remark on the syntax "Searching for substrings between two given separators" - This type of search can only be used on constant strings. Therefore, an element of the project (variable, control, item, etc.) must be used as initial string.
- When a search is started with the firstRank or lastRank constants, the search information is stored in memory until all the substrings have been examined. Therefore, this type of search should be used only when all the substrings are to be examined.
ExtractStringBetween and UNICODE <Initial string>, <Start separator> and <End separator> can correspond to: - ANSI strings.
- UNICODE strings.
- buffers.
You have the ability to use ANSI strings, Unicode strings and buffers in the different parameters of the function. The following conversion rule is used for the Ansi systems (Windows or Linux): - If at least one of the strings is a buffer, all the strings are converted to buffers and the operation is performed with buffers.
- If the first condition is not met and there is at least one Unicode string, all the strings are converted to Unicode and the operation is performed in Unicode (the conversion is performed with the current character set, if necessary).
- Otherwise, the operation is performed in Ansi.
The conversion rule used for Unicode systems is as follows: - If at least one of the strings is a buffer, all the strings are converted to buffers and the operation is performed with buffers.
- Otherwise, the operation is performed in Unicode.
Reminder: The linguistic parameters used are defined during the call to ChangeCharset.
This page is also available for…
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|