2010年8月2日星期一

MD5 asp.net unicode SQL server

In asp.net (C#) , all string are in unicode.

if you want to caculate a string's MD5 hash (ansi string submited from page), maybe you won't get the right result.

There is no way to deal with it within C# ( i haven't found way)

But if you want to use MD5 hash in SQL server, use stored procedure could slove this problem.

Param:
Declare @oldvar nvarchar(20)
Declare @newvar varchar(20)
select @newvar=oldvar would convert nvarchar into varchar

for detail, check this website:
http://weblogs.sqlteam.com/mladenp/archive/2009/04/28/Comparing-SQL-Server-HASHBYTES-function-and-.Net-hashing.aspx

Example in SQL Server

-- note the size 15 of all datatypes
declare @val1 varchar(15),
@val2 nvarchar(15),
@val3 char(15),
@val4 nchar(15)

-- example of 1 byte/char text
-- all variables are of same length
select @val1 = '1234567890',
@val2 = N'1234567890',
@val3 = '1234567890',
@val4 = N'1234567890'

-- all 4 return different results
select HASHBYTES('md5', @val1) as MD5_varchar, -- result = 0xE807F1FCF82D132F9BB018CA6738A19F
-- just to show that collation doesn't change the hash
HASHBYTES('md5', @val1 collate Cyrillic_General_BIN2) as MD5_varchar_collation, -- result = 0xE807F1FCF82D132F9BB018CA6738A19F
HASHBYTES('md5', @val2) as MD5_Nvarchar, -- result = 0xE15E31C3D8898C92AB172A4311BE9E84
HASHBYTES('md5', @val3) as MD5_char, -- result = 0x2120C3F3423F89BA8A65ABD933321884
HASHBYTES('md5', @val4) as MD5_Nchar -- result = 0x90DEF5840F3A31174CA44E2022F743B6

-- example of 2 bytes/char text
-- 中文 means Chinese in written text
-- converting 2 bytes/char text to varchar cuts the text in half
select @val1 = '中文', -- this cuts the text in half
@val2 = N'中文', -- this stores the whole text
@val3 = '中文', -- this cuts the text in half
@val4 = N'中文' -- this stores the whole text

-- all 4 return different results
select HASHBYTES('md5', @val1) as MD5_varchar, -- result = 0xEA03FCB8C47822BCE772CF6C07D0EBBB
HASHBYTES('md5', @val2) as MD5_Nvarchar, -- result = 0x73C6C8CD2F94355EF015E5265D5E65B1
HASHBYTES('md5', @val3) as MD5_char, -- result = 0xA13C45A38853677887B4839071537634
HASHBYTES('md5', @val4) as MD5_Nchar -- result = 0xEADEBD3BD72A481C43C828E0C550145C
The catch here is the data type difference. Nvarchar and nchar take twice more bytes to store data than varchar and char. Since char and nchar pad the right side of the string with spaces to fill the gap up to defined data type length the spaces are also used in MD5 calculation. A collation plays no part in calculating hash values since collations are only applied to sorts and comparisons.

However care must be taken with texts that need 2 bytes per char of storage space like Chinese text. Storing it in a varchar variable cuts it in half.

Example in .Net - C#

For generating MD5 hashes I’ve used the code from this site which turned up as the first result on Google for “MD5 in C#”:

public string GetMD5Hash(string input)
{
System.Security.Cryptography.MD5CryptoServiceProvider x = new System.Security.Cryptography.MD5CryptoServiceProvider();
//byte[] bs = System.Text.Encoding.ASCII.GetBytes(input);
//byte[] bs = System.Text.Encoding.UTF7.GetBytes(input);
//byte[] bs = System.Text.Encoding.UTF8.GetBytes(input);
byte[] bs = System.Text.Encoding.Unicode.GetBytes(input);
//byte[] bs = System.Text.Encoding.UTF32.GetBytes(input);
bs = x.ComputeHash(bs);
System.Text.StringBuilder s = new System.Text.StringBuilder();
foreach (byte b in bs)
{
s.Append(b.ToString("x2").ToLower());
}
string password = s.ToString();
return password;
}

// hashes for the simple “1234567890” text in all encodings
ASCII: e807f1fcf82d132f9bb018ca6738a19f
UTF7: e807f1fcf82d132f9bb018ca6738a19f
UTF8: e807f1fcf82d132f9bb018ca6738a19f
Unicode: e15e31c3d8898c92ab172a4311be9e84
UTF32: 6a57502c29a5081f03cb70e0ad38ecc7

// hashes for the complex “中文” text in all encodings
ASCII: ea03fcb8c47822bce772cf6c07d0ebbb
UTF7: eb02105e5c51a33f21e8da7f8102cfda
UTF8: a7bac2239fcdcb3a067903d8077c4a07
Unicode: 73c6c8cd2f94355ef015e5265d5e65b1


UTF32: 65fe91b81ed1107566f9f9f5ed4ccaf1

All strings in .Net store their chars in 2 bytes by default. When hashing values we have to take this into account and use proper text Encoding. .Net supports 5 encodings: ASCII (7 bits per char), UTF7 (7 bits per char), UTF8 (8 bits = 1 byte per char), Unicode (UTF-16) (16 bits = 2 bytes per char) and UTF32 (32 bits = 4 bytes per char).

For text with only first 127 chars in the ASCII table ASCII, UTF7 and UTF8 encodings all return the same hash, but with UTF16 and UTF32 comes endianness so they don’t return the same hash values. Also note that the ASCII and UTF7 encoding aren’t recommended to be used anymore except in legacy apps. UTF8 should be used instead.

For text with complex chars we have to use Unicode or UTF32 or we loose char information.

Putting .Net and SQL Server together

It turns out that such a simple thing as hashing can become a serious issue if we’re not careful. Trouble always awaits when dealing with text and encodings. :)

When using SQL Server’s varchar data type the .Net encoding to go with is UTF8 since it’s the fastest and most optimized of the three (ASCII, UTF7, UTF8). When using the nvarchar data type to go is Unicode (UTF16) but we also have to know the texts endianness to create correct hashes. UTF32 is practically useless in this case because SQL Server doesn’t have a data type that stores text in 4 bytes/char so we’ll never get the same results if we use it.

This advice only applies when creating hashes both in .Net and SQL server and comparing them. If we’re creating hashes in .Net and only store them in a database then we don’t have to worry about this.

=========================
examples by myself:

update computer set
md5machineid=SUBSTRING((master.dbo.fn_varbintohexstr(HASHBYTES('MD5',substring(MachineID,1,23)))),3,32)

stored procedure:
CREATE PROCEDURE CheckMachineID
-- Add the parameters for the stored procedure here
@MD5MachineID nvarchar(32),
@tmpRString nvarchar(32),
@MixedMD5MachineID nvarchar(32)

AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
declare @RString varchar(32)
select @RString=@tmpRString
select MachineID from computer
where MD5MachineID=@MD5MachineID and
SUBSTRING(
(master.dbo.fn_varbintohexstr(
HASHBYTES('MD5',MachineID+@RString))),3,32)=@MixedMD5MachineID
-- Insert statements for procedure here

END

没有评论: