Creating a file of UTF-8 data to pass to a hash or signature function in the CryptoSys PKI Toolkit is tricky. The "text" file containing the input must contain exactly the correct bytes, with no Byte Order Mark (BOM) header and no trailing CR-LF characters. UTF-8 files created by some applications (like .NET) may have these additional bytes added to them whether you want them or not. If there are any additional bytes, even just one, the signature will be wrong!
Worse, when you open these files in a UTF-8-aware text editor, you won't see these extra bytes, because the editor is expecting them and doesn't show them. And if you open and then save, your editor may add these extra bytes without telling you. Windows Notepad does this, for example.
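Because a text editor may hide these bytes, it can help to test the raw bytes directly. The following is a minimal sketch in Python, not part of the CryptoSys PKI Toolkit; the function name `find_extra_bytes` is made up for illustration, and it assumes the required data never legitimately ends in a CR or LF.

```python
import codecs

def find_extra_bytes(data: bytes) -> list:
    """Return a list of problems found in the raw bytes of an input file."""
    problems = []
    # codecs.BOM_UTF8 is the three bytes ef bb bf
    if data.startswith(codecs.BOM_UTF8):
        problems.append("starts with UTF-8 BOM (ef bb bf)")
    # Count any CR (0d) or LF (0a) bytes at the very end of the data
    stripped = data.rstrip(b"\r\n")
    if len(stripped) != len(data):
        problems.append(f"{len(data) - len(stripped)} trailing CR/LF byte(s)")
    return problems

# Hypothetical sample: a short piped string with a BOM and one CR-LF added
print(find_extra_bytes(b"\xef\xbb\xbf||2.0|A||\r\n"))
# An empty list means neither problem was found
print(find_extra_bytes(b"||2.0|A||"))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) so that nothing is translated on the way in.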
Here is a simple command-line program based on the hexdump command in Linux. We will use it to examine the data files.
hexdump for Windows
Download the EXE file and put it in a directory that Windows will search. Open a command prompt window (Start > Run > cmd) or (Start > Programs > Accessories > Command Prompt).
> hexdump -C Muestra-v2_PipedString-UTF8.txt
Note the "-C" option (that's a capital letter C) and enclose the filename in quotes if it has spaces in it, e.g. "file name".
These three example files can be downloaded as a zipped file (1.6 kB).
000000 7c 7c 32 2e 30 7c 41 7c 31 7c 32 30 30 39 2d 30  ||2.0|A|1|2009-0
000010 38 2d 31 36 54 31 36 3a 33 30 3a 30 30 7c 31 7c  8-16T16:30:00|1|
000020 32 30 30 39 7c 69 6e 67 72 65 73 6f 7c 55 6e 61  2009|ingreso|Una
000030 20 73 6f 6c 61 20 65 78 68 69 62 69 63 69 c3 b3   sola exhibici..
000040 6e 7c 33 35 30 2e 30 30 7c 35 2e 32 35 7c 33 39  n|350.00|5.25|39
000050 37 2e 32 35 7c 49 53 50 39 30 30 39 30 39 51 38  7.25|ISP900909Q8
000060 38 7c 49 6e 64 75 73 74 72 69 61 73 20 64 65 6c  8|Industrias del
000070 20 53 75 72 20 50 6f 6e 69 65 6e 74 65 2c 20 53   Sur Poniente, S
000080 2e 41 2e 20 64 65 20 43 2e 56 2e 7c 41 6c 76 61  .A. de C.V.|Alva
000090 72 6f 20 4f 62 72 65 67 c3 b3 6e 7c 33 37 7c 33  ro Obreg..n|37|3
0000a0 7c 43 6f 6c 2e 20 52 6f 6d 61 20 4e 6f 72 74 65  |Col. Roma Norte
0000b0 7c 4d c3 a9 78 69 63 6f 7c 43 75 61 75 68 74 c3  |M..xico|Cuauht.
0000c0 a9 6d 6f 63 7c 44 69 73 74 72 69 74 6f 20 46 65  .moc|Distrito Fe
0000d0 64 65 72 61 6c 7c 4d c3 a9 78 69 63 6f 7c 30 36  deral|M..xico|06
0000e0 37 30 30 7c 50 69 6e 6f 20 53 75 61 72 65 7a 7c  700|Pino Suarez|
0000f0 32 33 7c 43 65 6e 74 72 6f 7c 4d 6f 6e 74 65 72  23|Centro|Monter
000100 72 65 79 7c 4d 6f 6e 74 65 72 72 65 79 7c 4e 75  rey|Monterrey|Nu
000110 65 76 6f 20 4c c3 a9 6f 6e 7c 4d c3 a9 78 69 63  evo L..on|M..xic
000120 6f 7c 39 35 34 36 30 7c 43 41 55 52 33 39 30 33  o|95460|CAUR3903
000130 31 32 53 38 37 7c 52 6f 73 61 20 4d 61 72 c3 ad  12S87|Rosa Mar..
000140 61 20 43 61 6c 64 65 72 c3 b3 6e 20 55 72 69 65  a Calder..n Urie
000150 67 61 73 7c 54 6f 70 6f 63 68 69 63 6f 7c 35 32  gas|Topochico|52
000160 7c 4a 61 72 64 69 6e 65 73 20 64 65 6c 20 56 61  |Jardines del Va
000170 6c 6c 65 7c 4d 6f 6e 74 65 72 72 65 79 7c 4d 6f  lle|Monterrey|Mo
000180 6e 74 65 72 72 65 79 7c 4e 75 65 76 6f 20 4c 65  nterrey|Nuevo Le
000190 c3 b3 6e 7c 4d c3 a9 78 69 63 6f 7c 39 35 34 36  ..n|M..xico|9546
0001a0 35 7c 31 30 7c 43 61 6a 61 7c 56 61 73 6f 73 20  5|10|Caja|Vasos 
0001b0 64 65 63 6f 72 61 64 6f 73 7c 32 30 2e 30 30 7c  decorados|20.00|
0001c0 32 30 30 7c 31 7c 70 69 65 7a 61 7c 43 68 61 72  200|1|pieza|Char
0001d0 6f 6c 61 20 6d 65 74 c3 a1 6c 69 63 61 7c 31 35  ola met..lica|15
0001e0 30 2e 30 30 7c 31 35 30 7c 49 56 41 7c 31 35 2e  0.00|150|IVA|15.
0001f0 30 30 7c 35 32 2e 35 30 7c 7c                    00|52.50||
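Note that each accented character is stored as two bytes: for example, the letter é in "México" is 0xc3, 0xa9, which shows as two dots ("M..xico") in the right-hand column. This is what we want: the file is in UTF-8 encoding, with no extra bytes before the first "||" or after the last.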
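If the hexdump program is not to hand, a similar listing can be produced with a few lines of Python. This is a rough sketch only, not the utility described above; the function name `hexdump` and the exact column spacing are choices made for illustration.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex values, printable ASCII."""
    pad = width * 3 - 1                      # width of a full hex column
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:06x} {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)

# "M\u00e9xico" is "México"; the é becomes the two bytes c3 a9 in UTF-8
print(hexdump("||2.0|M\u00e9xico||".encode("utf-8")))
```

For a file, pass in the result of `open(path, "rb").read()`.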
> hexdump -C Muestra-v2_PipedString-Latin1.txt
000000 7c 7c 32 2e 30 7c 41 7c 31 7c 32 30 30 39 2d 30  ||2.0|A|1|2009-0
000010 38 2d 31 36 54 31 36 3a 33 30 3a 30 30 7c 31 7c  8-16T16:30:00|1|
000020 32 30 30 39 7c 69 6e 67 72 65 73 6f 7c 55 6e 61  2009|ingreso|Una
000030 20 73 6f 6c 61 20 65 78 68 69 62 69 63 69 f3 6e   sola exhibici.n
000040 7c 33 35 30 2e 30 30 7c 35 2e 32 35 7c 33 39 37  |350.00|5.25|397
000050 2e 32 35 7c 49 53 50 39 30 30 39 30 39 51 38 38  .25|ISP900909Q88
000060 7c 49 6e 64 75 73 74 72 69 61 73 20 64 65 6c 20  |Industrias del 
000070 53 75 72 20 50 6f 6e 69 65 6e 74 65 2c 20 53 2e  Sur Poniente, S.
000080 41 2e 20 64 65 20 43 2e 56 2e 7c 41 6c 76 61 72  A. de C.V.|Alvar
000090 6f 20 4f 62 72 65 67 f3 6e 7c 33 37 7c 33 7c 43  o Obreg.n|37|3|C
0000a0 6f 6c 2e 20 52 6f 6d 61 20 4e 6f 72 74 65 7c 4d  ol. Roma Norte|M
0000b0 e9 78 69 63 6f 7c 43 75 61 75 68 74 e9 6d 6f 63  .xico|Cuauht.moc
0000c0 7c 44 69 73 74 72 69 74 6f 20 46 65 64 65 72 61  |Distrito Federa
0000d0 6c 7c 4d e9 78 69 63 6f 7c 30 36 37 30 30 7c 50  l|M.xico|06700|P
0000e0 69 6e 6f 20 53 75 61 72 65 7a 7c 32 33 7c 43 65  ino Suarez|23|Ce
0000f0 6e 74 72 6f 7c 4d 6f 6e 74 65 72 72 65 79 7c 4d  ntro|Monterrey|M
000100 6f 6e 74 65 72 72 65 79 7c 4e 75 65 76 6f 20 4c  onterrey|Nuevo L
000110 e9 6f 6e 7c 4d e9 78 69 63 6f 7c 39 35 34 36 30  .on|M.xico|95460
000120 7c 43 41 55 52 33 39 30 33 31 32 53 38 37 7c 52  |CAUR390312S87|R
000130 6f 73 61 20 4d 61 72 ed 61 20 43 61 6c 64 65 72  osa Mar.a Calder
000140 f3 6e 20 55 72 69 65 67 61 73 7c 54 6f 70 6f 63  .n Uriegas|Topoc
000150 68 69 63 6f 7c 35 32 7c 4a 61 72 64 69 6e 65 73  hico|52|Jardines
000160 20 64 65 6c 20 56 61 6c 6c 65 7c 4d 6f 6e 74 65   del Valle|Monte
000170 72 72 65 79 7c 4d 6f 6e 74 65 72 72 65 79 7c 4e  rrey|Monterrey|N
000180 75 65 76 6f 20 4c 65 f3 6e 7c 4d e9 78 69 63 6f  uevo Le.n|M.xico
000190 7c 39 35 34 36 35 7c 31 30 7c 43 61 6a 61 7c 56  |95465|10|Caja|V
0001a0 61 73 6f 73 20 64 65 63 6f 72 61 64 6f 73 7c 32  asos decorados|2
0001b0 30 2e 30 30 7c 32 30 30 7c 31 7c 70 69 65 7a 61  0.00|200|1|pieza
0001c0 7c 43 68 61 72 6f 6c 61 20 6d 65 74 e1 6c 69 63  |Charola met.lic
0001d0 61 7c 31 35 30 2e 30 30 7c 31 35 30 7c 49 56 41  a|150.00|150|IVA
0001e0 7c 31 35 2e 30 30 7c 35 32 2e 35 30 7c 7c        |15.00|52.50||
This time, words with accented characters like "México" are shown with only one byte for the letter é with hex value 0xe9. This is not what we want. The file is in ISO-8859-1 or Latin-1 encoding and you need to convert it to UTF-8 or your hash value and signatures will be wrong.
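One way to do the conversion is sketched below in Python. It is an illustration under stated assumptions, not a toolkit function: the name `to_utf8` is made up, and any input that is not already valid UTF-8 is assumed to be Latin-1. The strict decode acts as a simple test for UTF-8, so data that is already correct is left alone rather than being converted twice.

```python
def to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8, assuming any non-UTF-8 input is Latin-1."""
    try:
        data.decode("utf-8")      # strict check: raises if not valid UTF-8
        return data               # already UTF-8 (or plain ASCII): unchanged
    except UnicodeDecodeError:
        # Each Latin-1 byte maps to one character, re-encoded here as UTF-8
        return data.decode("latin-1").encode("utf-8")

latin1 = b"|M\xe9xico|"           # é as the single Latin-1 byte e9
print(to_utf8(latin1).hex(" "))   # 7c 4d c3 a9 78 69 63 6f 7c
```

Check the result with a hex dump before hashing or signing it.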
> hexdump -C Muestra-v2_PipedString-UTF8-BOM.txt
000000 ef bb bf 7c 7c 32 2e 30 7c 41 7c 31 7c 32 30 30  ...||2.0|A|1|200
000010 39 2d 30 38 2d 31 36 54 31 36 3a 33 30 3a 30 30  9-08-16T16:30:00
000020 7c 31 7c 32 30 30 39 7c 69 6e 67 72 65 73 6f 7c  |1|2009|ingreso|
000030 55 6e 61 20 73 6f 6c 61 20 65 78 68 69 62 69 63  Una sola exhibic
000040 69 c3 b3 6e 7c 33 35 30 2e 30 30 7c 35 2e 32 35  i..n|350.00|5.25
000050 7c 33 39 37 2e 32 35 7c 49 53 50 39 30 30 39 30  |397.25|ISP90090
000060 39 51 38 38 7c 49 6e 64 75 73 74 72 69 61 73 20  9Q88|Industrias 
000070 64 65 6c 20 53 75 72 20 50 6f 6e 69 65 6e 74 65  del Sur Poniente
000080 2c 20 53 2e 41 2e 20 64 65 20 43 2e 56 2e 7c 41  , S.A. de C.V.|A
000090 6c 76 61 72 6f 20 4f 62 72 65 67 c3 b3 6e 7c 33  lvaro Obreg..n|3
0000a0 37 7c 33 7c 43 6f 6c 2e 20 52 6f 6d 61 20 4e 6f  7|3|Col. Roma No
0000b0 72 74 65 7c 4d c3 a9 78 69 63 6f 7c 43 75 61 75  rte|M..xico|Cuau
0000c0 68 74 c3 a9 6d 6f 63 7c 44 69 73 74 72 69 74 6f  ht..moc|Distrito
0000d0 20 46 65 64 65 72 61 6c 7c 4d c3 a9 78 69 63 6f   Federal|M..xico
0000e0 7c 30 36 37 30 30 7c 50 69 6e 6f 20 53 75 61 72  |06700|Pino Suar
0000f0 65 7a 7c 32 33 7c 43 65 6e 74 72 6f 7c 4d 6f 6e  ez|23|Centro|Mon
000100 74 65 72 72 65 79 7c 4d 6f 6e 74 65 72 72 65 79  terrey|Monterrey
000110 7c 4e 75 65 76 6f 20 4c c3 a9 6f 6e 7c 4d c3 a9  |Nuevo L..on|M..
000120 78 69 63 6f 7c 39 35 34 36 30 7c 43 41 55 52 33  xico|95460|CAUR3
000130 39 30 33 31 32 53 38 37 7c 52 6f 73 61 20 4d 61  90312S87|Rosa Ma
000140 72 c3 ad 61 20 43 61 6c 64 65 72 c3 b3 6e 20 55  r..a Calder..n U
000150 72 69 65 67 61 73 7c 54 6f 70 6f 63 68 69 63 6f  riegas|Topochico
000160 7c 35 32 7c 4a 61 72 64 69 6e 65 73 20 64 65 6c  |52|Jardines del
000170 20 56 61 6c 6c 65 7c 4d 6f 6e 74 65 72 72 65 79   Valle|Monterrey
000180 7c 4d 6f 6e 74 65 72 72 65 79 7c 4e 75 65 76 6f  |Monterrey|Nuevo
000190 20 4c 65 c3 b3 6e 7c 4d c3 a9 78 69 63 6f 7c 39   Le..n|M..xico|9
0001a0 35 34 36 35 7c 31 30 7c 43 61 6a 61 7c 56 61 73  5465|10|Caja|Vas
0001b0 6f 73 20 64 65 63 6f 72 61 64 6f 73 7c 32 30 2e  os decorados|20.
0001c0 30 30 7c 32 30 30 7c 31 7c 70 69 65 7a 61 7c 43  00|200|1|pieza|C
0001d0 68 61 72 6f 6c 61 20 6d 65 74 c3 a1 6c 69 63 61  harola met..lica
0001e0 7c 31 35 30 2e 30 30 7c 31 35 30 7c 49 56 41 7c  |150.00|150|IVA|
0001f0 31 35 2e 30 30 7c 35 32 2e 35 30 7c 7c 0d 0a 0d  15.00|52.50||...
000200 0a 0d 0a 0d 0a 0d 0a 0d 0a 0d a0                 ...........
This file is in UTF-8 encoding (see the double dots in "M..xico") but it has two problems.
It starts with the three bytes 0xef, 0xbb, 0xbf. This is the UTF-8 Byte Order Mark, which indicates to an application reading the file that the data following is UTF-8 encoded.
It ends with a run of 0x0d, 0x0a, 0x0d, 0x0a, ... bytes. These are CR-LF pairs (newline characters), giving the file a few extra "lines" at the end.
If you compute the hash value or signature on this file it will be wrong. The extra bytes added to the required data will cause the value to be different (i.e. wrong).
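The effect is easy to demonstrate. The sketch below uses Python's standard `hashlib` rather than the toolkit, on a short made-up piped string; the helper name `strip_extras` is hypothetical, and it assumes the required data never ends in a CR or LF of its own.

```python
import codecs
import hashlib

def strip_extras(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM and any trailing CR/LF bytes."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.rstrip(b"\r\n")

# Made-up sample data; "M\u00e9xico" is "México"
clean = "||2.0|A|M\u00e9xico||".encode("utf-8")
padded = codecs.BOM_UTF8 + clean + b"\r\n\r\n"

# The digests differ because the input bytes differ
print(hashlib.md5(clean).hexdigest() == hashlib.md5(padded).hexdigest())  # False
# After stripping, the bytes (and therefore the digest) match again
print(strip_extras(padded) == clean)  # True
```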
A Byte Order Mark (BOM) is used in Unicode to indicate the "endianness" of the data. This is useful for UTF-16 (an extension of the old UCS-2), where most characters are stored as two bytes; characters outside the Basic Multilingual Plane take four bytes, but in practice you should almost always see just two. Different computers store these pairs of bytes in different orders (big-endian or little-endian) depending on their architecture. The BOM character for UTF-16 is U+FEFF and will be stored as either (0xFE, 0xFF) on a big-endian machine or (0xFF, 0xFE) on a little-endian one. These may show up as þÿ or ÿþ respectively when you view the file.
The BOM for UTF-8 is the three bytes 0xef, 0xbb, 0xbf. It may show up as ï»¿ when you view the file in an editor that is not UTF-8-aware, or as ´╗┐ on the command-line console. The use of a BOM for UTF-8 is not recommended. It gives no indication of byte order (despite its name), because UTF-8 has only one byte order, and UTF-8 data can be detected by a simple test anyway. Even so, Windows Notepad (particularly older versions) will add this BOM when it saves.
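The byte values quoted above can be confirmed by encoding the character U+FEFF in each scheme. This is a short Python sketch using only the standard library; the choice of Latin-1 and code page 850 for the "viewed as" strings is an assumption about which legacy encodings a viewer or console might use.

```python
import codecs

bom = "\ufeff"                            # the BOM character U+FEFF

print(bom.encode("utf-16-be").hex(" "))   # fe ff
print(bom.encode("utf-16-le").hex(" "))   # ff fe
print(bom.encode("utf-8").hex(" "))       # ef bb bf

# How the three UTF-8 BOM bytes look if misread in a legacy encoding
as_latin1 = codecs.BOM_UTF8.decode("latin-1")   # U+00EF U+00BB U+00BF
as_cp850 = codecs.BOM_UTF8.decode("cp850")      # U+00B4 U+2557 U+2510
```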
If you are using the tools from CryptoSys PKI to read in data from a file, the bytes will be read exactly as they are. The toolkit does not test for a BOM and skip over it. This is by design. Any message digest hash values or signatures computed from data with these extra bytes will be wrong.
The correct MD5 hash of the data in the first example above is (0x)4CD8ED248D7A02314C50778A37D1522D.
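As an independent cross-check, the digest of a file's exact bytes can be computed with Python's standard `hashlib`. This is only a sketch and not the toolkit's own function; `md5_of_file` is a made-up name, and the demonstration uses a small temporary file rather than the example files.

```python
import hashlib
import os
import tempfile

def md5_of_file(path: str) -> str:
    """Return the MD5 digest of a file's exact bytes as uppercase hex."""
    # Binary mode: no newline translation and no decoding of the contents
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest().upper()

# Demonstration on a temporary file holding made-up piped data
data = "||2.0|M\u00e9xico||".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
print(md5_of_file(tmp.name) == hashlib.md5(data).hexdigest().upper())  # True
os.remove(tmp.name)
```

Run against a byte-for-byte copy of the first example file, this should give the value quoted above; if it does not, the file contains extra or different bytes.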
To fix, either use a hex editor to make sure the file is correct (we use Frhed Free Hex Editor) or take extra care when creating your text files and check with the hexdump utility before using.
If you use Notepad++ then open your text file and use the menu options Encoding > Convert to UTF-8 without BOM then save again.
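If you create the file from a program instead, you can control the bytes directly. The following Python sketch (the helper name `write_exact_utf8` and the sample text are made up for illustration) encodes the string itself and writes in binary mode, so no BOM and no newline is added.

```python
import os
import tempfile

def write_exact_utf8(path: str, text: str) -> None:
    """Write text as UTF-8 bytes with no BOM and no added newline."""
    # The "utf-8" codec never emits a BOM; "utf-8-sig" is the one that does
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

text = "||2.0|M\u00e9xico||"      # "México", made-up sample data
path = os.path.join(tempfile.mkdtemp(), "piped.txt")
write_exact_utf8(path, text)

with open(path, "rb") as f:
    raw = f.read()
print(raw.hex(" "))                            # starts 7c 7c, not ef bb bf
print(text.encode("utf-8-sig")[:3].hex(" "))   # ef bb bf
```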
Alternatively, instead of reading from a file, put your data in a string and pass that instead. In VB6 replace this
strDataFile = "Muestra-v2_PipedString-UTF8.txt"
ReDim abDigest(PKI_MD5_BYTES - 1)
nRet = HASH_File(abDigest(0), PKI_MD5_BYTES, strDataFile, PKI_HASH_MD5)
with this
strData = "||2.0|A|1|2009-08-16T16:30:00 ... IVA|15.00|52.50||"
' Convert string to UTF-8.
nLen = CNV_UTF8FromLatin1("", 0, strData)
strDataUTF8 = String(nLen, " ")
nLen = CNV_UTF8FromLatin1(strDataUTF8, Len(strDataUTF8), strData)
' Convert string to bytes
abData = StrConv(strDataUTF8, vbFromUnicode)
' Compute the MD5 hash value
ReDim abDigest(PKI_MD5_BYTES - 1)
nRet = HASH_Bytes(abDigest(0), PKI_MD5_BYTES, abData(0), UBound(abData) + 1, PKI_HASH_MD5)
In VB.NET, it's much simpler:
strData = "||2.0|A|1|2009-08-16T16:30:00 ... IVA|15.00|52.50||"
' Convert string to bytes in UTF-8 encoding
abData = System.Text.Encoding.UTF8.GetBytes(strData)
abDigest = Hash.BytesFromBytes(abData, HashAlgorithm.Md5)
See also Creating the message digest of the piped-string on the SAT Mexico page.
For more information or to comment on this page, please send us a message.
This page last updated 15 August 2025