Message Validation/Transformation Assertions

The Message Validation/Transformation assertions configure the XML transformations and validation schemas applied to service messages.
(1) Depending on which API Gateway product you have installed, not all the assertions within this category may be available. See Product Summary for a list of which features are available for each product.
(2) This category may also include custom-created encapsulated assertions. For more information, see Encapsulated Assertions.
About Character Encoding
HTTP PUT and POST requests, as well as most HTTP responses, typically include a Content-Type header that declares the kind of content being sent. For text documents like XML and HTML, the Content-Type header can include an additional "charset" parameter declaring how the characters in the content were encoded into bytes for transfer. For example, the most common Content-Type for XML documents is:
Content-Type: text/xml; charset="utf-8"
Web servers often infer the Content-Type for static files based on the file extension; some may even read the first few bytes of the file to make a more informed deduction. Occasionally, systems will send HTTP requests or responses with a Content-Type header that doesn't match the contents, either because the system is unable to extrapolate the actual type of the content, or because it has guessed incorrectly.
The Evaluate Regular Expression assertion works with characters rather than bytes, so it needs to decode the content before it can evaluate a regular expression against it. In order to decode the content, this assertion needs to know the encoding scheme that was used originally.
If the Content-Type header is missing or has no "charset" parameter, the Gateway will assume the content was encoded with ISO8859-1 (as per RFC 2616, the HTTP/1.1 specification). For content that only contains 7-bit characters (i.e., code points between U+0000 and U+007F), UTF-8 and ISO8859-1 produce identical bytes, so mistaking one of these two encodings for the other causes no problems. However, content in other encodings, such as UTF-16, will still be decoded incorrectly.
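The fallback rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not the Gateway's implementation: the function name effective_charset and the sample values are hypothetical, and the header is parsed with Python's standard email.message module.

```python
from email.message import Message

DEFAULT_CHARSET = "iso-8859-1"  # fallback used when no charset is declared


def effective_charset(content_type):
    """Return the declared charset, or the ISO8859-1 fallback.

    Hypothetical helper for illustration; not part of the Gateway.
    """
    if not content_type:
        return DEFAULT_CHARSET  # header missing entirely
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased, unquoted "charset"
    # parameter, or None when the parameter is absent.
    return msg.get_content_charset() or DEFAULT_CHARSET


print(effective_charset('text/xml; charset="utf-8"'))  # utf-8
print(effective_charset("text/xml"))                   # iso-8859-1
print(effective_charset(None))                         # iso-8859-1

# 7-bit content encodes to the same bytes in both encodings,
# but not in UTF-16.
ascii_text = "<order id='42'/>"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("iso-8859-1")
utf16_differs = ascii_text.encode("utf-16") != ascii_text.encode("utf-8")
print(same_bytes, utf16_differs)  # True True
```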
UTF-8 can encode any Unicode character, including those used in the vast majority of the world's languages, whereas ISO8859-1 is restricted to a small subset of characters, primarily ones that are relevant to Western European languages. There are many other non-Unicode character sets, each designed for use in different locales, but ISO8859-1 is the most common in North America and is nearly identical to Windows-1252, the default on Western-language versions of Microsoft Windows.
 
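The following are examples of characters that cannot be encoded using 7 bits. UTF-8 encodes each of them using multiple bytes, whereas single-byte encodings use one byte with a numeric value > 127. ISO8859-1 itself covers accented characters, ©, ®, and currency symbols such as £ and ¥; smart quotes, en and em dashes, and ™ belong to its Windows-1252 superset and are often mislabeled as ISO8859-1: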
  •   "smart quotes" (also known as curly quotes)
  •   en and em dashes (as opposed to plain hyphens)
  •   copyright © and trademark ® ™ symbols
  •   accented characters
  •   currency symbols other than $
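As a brief illustration of the byte-level difference, the following Python sketch encodes three characters from the list both ways. The helper name byte_forms is hypothetical; the byte values come from Python's built-in codecs.

```python
import unicodedata


def byte_forms(ch):
    """Return (ISO8859-1 bytes, UTF-8 bytes) for one character, as hex strings."""
    return ch.encode("iso-8859-1").hex(), ch.encode("utf-8").hex()


# An accented letter, the copyright sign, and the pound sign.
for ch in ["\u00e9", "\u00a9", "\u00a3"]:
    latin1_hex, utf8_hex = byte_forms(ch)
    print(unicodedata.name(ch), "ISO8859-1:", latin1_hex, "UTF-8:", utf8_hex)

# LATIN SMALL LETTER E WITH ACUTE ISO8859-1: e9 UTF-8: c3a9
# COPYRIGHT SIGN ISO8859-1: a9 UTF-8: c2a9
# POUND SIGN ISO8859-1: a3 UTF-8: c2a3
```

Each ISO8859-1 form is a single byte above 0x7F, while each UTF-8 form is a two-byte sequence.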
Summary
If the assumed or declared encoding is ISO8859-1, the Evaluate Regular Expression assertion will never fail due to a character conversion error, because any byte can be decoded into a valid ISO8859-1 character. However, if the content is assumed or declared to be ISO8859-1 but the content was actually encoded with UTF-8 and contains non-7-bit characters, the document may be silently corrupted.
In this case, enter "UTF-8" in the Override character encoding field to correctly decode the content.
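The silent corruption described above can be reproduced with a short Python sketch. It uses Python's own codecs and regular expressions rather than the Gateway, and the sample text is an arbitrary assumption; the point is only that the wrong decode raises no error, yet a pattern no longer matches.

```python
import re

original = "caf\u00e9"            # "cafe" with an accented final letter
wire = original.encode("utf-8")   # bytes actually sent: 63 61 66 c3 a9

# Decoding UTF-8 bytes as ISO8859-1 never fails, because every byte
# maps to some ISO8859-1 character. The result is silently wrong:
# the two bytes c3 a9 become two separate characters.
wrong = wire.decode("iso-8859-1")
print(len(original), len(wrong))  # 4 5

# A pattern written against the intended text does not match.
print(re.search(original, wrong) is None)  # True

# Decoding with the correct encoding (the equivalent of overriding
# the character encoding with UTF-8) restores the text.
right = wire.decode("utf-8")
print(re.search(original, right) is not None)  # True
```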
On the other hand, if the content is assumed or declared to have been encoded with UTF-8, but the content actually contains 8-bit ISO8859-1 characters, the Gateway will likely throw an exception during the decoding process and the Evaluate Regular Expression assertion will fail. This is because UTF-8 has a prescribed syntax for non-7-bit characters that few ISO8859-1 byte sequences will match accidentally.
In this case, enter "ISO8859-1" in the Override character encoding field to correctly decode the content.
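The opposite failure can be sketched the same way. Again this is Python's decoder standing in for the Gateway's, with an arbitrary sample string: a lone ISO8859-1 byte above 0x7F is not a well-formed UTF-8 sequence, so strict decoding raises an error instead of producing corrupted text.

```python
original = "caf\u00e9"                # "cafe" with an accented final letter
wire = original.encode("iso-8859-1")  # bytes actually sent: 63 61 66 e9

# In UTF-8, the byte e9 announces a three-byte sequence, but no
# continuation bytes follow, so the decoder rejects the input.
try:
    wire.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True

# Decoding with the encoding that was actually used (the equivalent of
# overriding the character encoding with ISO8859-1) succeeds.
recovered = wire.decode("iso-8859-1")
print(recovered == original)  # True
```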