QR-Codes and UTF-8 encoding
This article explains issues relating to the use of UTF-8 encoded data in QR-Codes.
There are several ways in which data can be stored in a QR-Code, one of which is called Byte Mode. This mode stores data as a sequence of 8-bit bytes, in other words an array of integer values ranging from 0 to 255. By default these values are interpreted using the ISO/IEC 8859-1 character set, in which each value from 0 to 255 maps to a single character, so at most 256 characters are available. To extend the range of characters it is possible to use an alternative encoding such as UTF-8, but there are some issues that developers should be aware of.
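As a small illustration of the difference, the following Python sketch shows the byte values that would be placed in a Byte Mode segment for the same text under each encoding. The helper name byte_mode_payload is hypothetical and is used here only for illustration; it is not part of any barcode library.

```python
def byte_mode_payload(text: str, encoding: str) -> list[int]:
    """Return the 8-bit values (0-255) that represent text in the given encoding."""
    return list(text.encode(encoding))


# One byte in ISO/IEC 8859-1, two bytes in UTF-8.
print([hex(b) for b in byte_mode_payload("æ", "iso-8859-1")])  # ['0xe6']
print([hex(b) for b in byte_mode_payload("æ", "utf-8")])       # ['0xc3', '0xa6']

# Characters outside ISO/IEC 8859-1, such as the euro sign, need another encoding.
try:
    byte_mode_payload("€", "iso-8859-1")
except UnicodeEncodeError:
    print("'€' cannot be encoded as ISO/IEC 8859-1")
print([hex(b) for b in byte_mode_payload("€", "utf-8")])        # ['0xe2', '0x82', '0xac']
```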
Auto-detection of UTF-8
A barcode reader does not automatically know that data has been encoded using UTF-8 unless an ECI has been used (see below). It can, however, make a guess.
A test can be applied to a sequence of byte values to see whether or not it can be interpreted as UTF-8 data, however this does not mean that the sequence isn’t also valid ISO/IEC 8859-1 data. For example, the character æ is represented by the UTF-8 byte sequence C3 A6 (hex values). But C3 A6 in ISO/IEC 8859-1 represents the characters Ã and ¦. So this test relies on the fact that the combination Ã¦ is unlikely to appear in real text.
In practice this test works well but developers should be aware of the possibility of incorrect interpretation of valid ISO/IEC 8859-1 data.
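The guess described above can be sketched in Python as follows. This is a minimal illustration of the general technique (try a strict UTF-8 decode, fall back to ISO/IEC 8859-1), not necessarily the exact test used by any particular reader. The function name decode_byte_mode and its auto_utf8 flag are hypothetical.

```python
def decode_byte_mode(data: bytes, auto_utf8: bool = True) -> str:
    """Decode Byte Mode data, optionally guessing UTF-8 first."""
    if auto_utf8:
        try:
            # Strict decoding fails on any byte sequence that is not well-formed UTF-8.
            return data.decode("utf-8")
        except UnicodeDecodeError:
            pass
    # Every byte value maps to a code point here, so this decode never fails.
    return data.decode("iso-8859-1")


print(decode_byte_mode(b"\xc3\xa6"))                   # æ  (guessed as UTF-8)
print(decode_byte_mode(b"\xc3\xa6", auto_utf8=False))  # Ã¦ (read as ISO/IEC 8859-1)
print(decode_byte_mode(b"\xe6"))                       # æ  (not valid UTF-8, so fallback)
```

The second call shows the ambiguity: if the author really did intend Ã¦ in ISO/IEC 8859-1, the automatic guess would return æ instead.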
Extended Channel Interpretation (ECI)
A more reliable way to encode UTF-8 data in a QR-Code is to include an ECI block in the data to specifically inform the reader that the next block of bytes is using UTF-8 rather than the default ISO/IEC 8859-1 encoding. The ECI block should have the value 000026. Please refer to section 6.4.2 of ISO/IEC 18004 Second Edition 2006-09-01 for further information.
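For readers building the bit stream by hand, the sketch below shows how such an ECI header might be laid out in front of a Byte Mode segment. It is based on a general reading of ISO/IEC 18004 (ECI mode indicator 0111, Byte Mode indicator 0100, 8-bit character count for symbol versions 1 to 9) and should be checked against the standard before use. The function names are hypothetical, and padding, error correction and the terminator are omitted.

```python
def eci_header_bits(assignment_number: int) -> str:
    """Return the ECI mode indicator followed by the ECI designator, as a bit string."""
    if not 0 <= assignment_number <= 999999:
        raise ValueError("ECI assignment number must be in 0..999999")
    if assignment_number < 128:
        designator = format(assignment_number, "08b")           # 0bbbbbbb
    elif assignment_number < 16384:
        designator = "10" + format(assignment_number, "014b")   # 10bbbbbb bbbbbbbb
    else:
        designator = "110" + format(assignment_number, "021b")  # 110bbbbb + 2 more bytes
    return "0111" + designator


def utf8_segment_bits(text: str) -> str:
    """ECI 000026 header plus a Byte Mode segment (assumes versions 1-9)."""
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("8-bit character count only holds up to 255 bytes")
    count = format(len(data), "08b")
    payload = "".join(format(b, "08b") for b in data)
    return eci_header_bits(26) + "0100" + count + payload


print(eci_header_bits(26))     # 011100011010
print(utf8_segment_bits("æ"))  # 0111000110100100000000101100001110100110
```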
Versions of SoftekBarcode.dll
Versions of SoftekBarcode.dll starting from 126.96.36.199 support automatic detection of UTF-8 data and ECI/UTF-8 interpretation. An advanced parameter called QRCodeAutoUTF8 controls automatic detection of UTF-8 and can be set to either 1 (True) or 0 (False). If barcodes are using an ECI block then it is safe to set QRCodeAutoUTF8 to 0 (False). Please contact firstname.lastname@example.org if you require a version with these features.