Improving performance and compatibility by using UTF-8 in SQL Server 2019 - Ronen Ariely

Published: 12 October 2024
Channel: PASS Data Community Summit

SQL Server's Unicode support started with the UCS-2 encoding, which maps the range 0-65535 using 2 bytes per character. Support for UTF-16 was added later, extending the range up to 1114111; each character in the additional range is built from two 16-bit code units (a surrogate pair) and takes 4 bytes. Traditionally, Unicode data types such as NVARCHAR/NCHAR are marked with N for "National Character".
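The jump from 2 to 4 bytes can be checked outside SQL Server. The following is a minimal Python sketch, not taken from the session; the helper names `utf16_units` and `surrogate_pair` are made up for illustration. It shows that a character in the 0-65535 range needs one 16-bit unit, while a character above that range (an emoji here) is split into a surrogate pair.

```python
def utf16_units(ch: str) -> list[int]:
    """Return the 16-bit code units of `ch` when encoded as UTF-16 (little-endian, no BOM)."""
    raw = ch.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Compute the UTF-16 surrogate pair for a code point above 65535 by hand."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points in 0x10000..0x10FFFF use surrogate pairs")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits go into the low surrogate
    return high, low


letter = "A"            # U+0041, inside the 0-65535 range
emoji = "\U0001F600"    # U+1F600, outside the 0-65535 range

print([hex(u) for u in utf16_units(letter)])   # one code unit, 2 bytes
print([hex(u) for u in utf16_units(emoji)])    # two code units, 4 bytes
print([hex(u) for u in surrogate_pair(ord(emoji))])
```

The hand-computed pair matches what the encoder produces, which is the same pair of 16-bit values that a UTF-16 string stores for that character.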

SQL Server 2019 introduced collations with the "_UTF8" suffix, which fully support UTF-8 in "non-National Character" data types such as CHAR/VARCHAR. UTF-8 can reduce data size dramatically, by up to 50% in some cases, but can have the opposite effect in others. Migrating to the new feature requires an in-depth understanding of how SQL Server stores and uses the data, in order to prevent unwanted side effects.
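Whether UTF-8 shrinks or grows the data depends on which characters are stored. The sketch below is a rough Python illustration, not code from the session: it compares the raw encoded byte counts of UTF-8 and UTF-16 for a few sample strings. The sample strings and the `encoded_sizes` helper are assumptions chosen for the example, and the numbers ignore any SQL Server row overhead or compression.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for `text`, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Ten characters per sample so the byte counts are easy to compare.
samples = {
    "ASCII letters": "a" * 10,          # 1 byte each in UTF-8, 2 in UTF-16
    "Hebrew letters": "\u05D0" * 10,    # 2 bytes each in both encodings
    "Japanese kana": "\u3042" * 10,     # 3 bytes each in UTF-8, 2 in UTF-16
    "Emoji": "\U0001F600" * 10,         # 4 bytes each in both encodings
}

for name, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    change = (utf8_bytes - utf16_bytes) / utf16_bytes * 100
    print(f"{name:15} UTF-8: {utf8_bytes:3}  UTF-16: {utf16_bytes:3}  change: {change:+.0f}%")
```

ASCII-only text is the case where UTF-8 halves the size; text made of characters that need 3 bytes in UTF-8 (most East Asian scripts) grows by half instead, and 2-byte and 4-byte characters come out even.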

In this session we cover the different string data types and collations, supplementary characters and surrogate pairs, code pages, encodings (ASCII, Unicode, UCS-2, UTF-16, and UTF-8), emoji, and more. At the end of the session, we dive into SQL Server internals and examine how SQL Server stores the data behind the scenes.