Click here to Skip to main content
15,886,578 members
Articles / Database Development / SQL Server

ASCII/Multibyte to Unicode Conversion

Rate me:
Please Sign up or sign in to vote.
3.10/5 (8 votes)
16 Feb 2007CPOL3 min read 166.4K   26   23
An article about converting ASCII/Multibyte to Unicode using C# and SQL/CLR integration.

Sample image

Introduction

A common problem when converting legacy database applications is that customers have a lot of data and they don't want to loose it. Data strings are usually encoded in ASCII /Multibyte, but our new system uses NVARCHAR fields, so the problem is that we have to convert old data into Unicode. This is exactly what we'll do here.

I have to highlight that the approach I will present is 'oversized' for most cases: in a common scenario, you will deal with a single codepage; in that case, SQL Server built-in tools are enough. But there are situations that require a more advanced approach, such as when you have a legacy database that stores strings with different codepages.

As a complete example to show you capabilities of SQL-CLR integration, I've also decided to use a Win32 API to perform the conversion so that you can also see how to use P/Invoke from SQL. This can be useful if you have an old DLL and want to use it... but *beware* that it can be really dangerous... if you don't have full knowledge of what you're doing, you can keep down the entire SQL Server process !!!

Just a note: I do not provide a test project since the code I will show here is really simple, you can use copy & paste, and it's faster!

Solution

The solution I want to show is simple and powerful at the same time. It is made of two logical parts: first, we'll build the conversion routine as a standard C# function. Second, we'll integrate it into SQL Server as a function, so you can use with T-SQL.

Clearly, depending on your application scenario, this may not be the best approach, so once you have the conversion routine, you may choose to follow my approach or you may prefer to use it in an external application that performs a batch conversion, or you may think of something else.

Part #1: Create the conversion routine

I've used the following well known Win32 API:

C++
int MultiByteToWideChar(
  UINT CodePage, 
  DWORD dwFlags,         
  LPCSTR lpMultiByteStr, 
  int cbMultiByte,       
  LPWSTR lpWideCharStr,  
  int cchWideChar        
);

Here, there's the C# signature that can be used for P/Invoking:

C#
[DllImport("kernel32.dll")]
private static extern int MultiByteToWideChar(
  uint CodePage,
  uint dwFlags,
  [MarshalAs(UnmanagedType.LPArray)] Byte[] lpMultiByteStr,
  int cbMultiByte,
  [Out, MarshalAs(UnmanagedType.LPArray)] Byte[] lpWideCharStr,
  int cchWideChar);

Now it's quite easy to write a class that performs a conversion:

C#
namespace ConvUtils {

  public static class Unicode {

    public static SqlString ConvToUnicode(SqlInt32 codepage , SqlString multibyteString) {
      byte[] b = (byte[])iConvToMultibyteArray(multibyteString);
      return (SqlString)ToUnicode((uint)(int)codepage, b);
    }

    private static string ToUnicode(uint codepage, Byte[] lpMultiByteStr) { 
      Byte[] lpWideCharStr = new Byte[2*lpMultiByteStr.Length];
      MultiByteToWideChar(codepage, 0, lpMultiByteStr, lpMultiByteStr.Length,
         lpWideCharStr, 2*lpMultiByteStr.Length);
      return System.Text.Encoding.Unicode.GetString(lpWideCharStr);
    }


    private static SqlBinary iConvToMultibyteArray(SqlString multibyteString) {
      byte[] result = multibyteString.GetUnicodeBytes();
      return (SqlBinary)result;
    }
  }
}

The example is quite easy, and does not require any other explanation. The SQL types has been used because I will integrate it into SQL Server, but if you don't need it, you may replace them with strings and byte arrays.

I've also decided to let the codepage be a parameter since the codepage on your PC or on SQL Server can be different from the one needed for the conversion.

Part #2: SQL Server code

OK, now we have a C# code that converts ASCII/Multibyte into Unicode. The next step is to integrate it into SQL Server so that any database user can have access to this conversion routine:

First, create a DLL that can be hosted by SQL Server: all we have to do is to add the [SQLFunction] attribute to ConvToUnicode:

C#
[SQLFunction] 
public static SqlString ConvToUnicode( ...

Finally, build the DLL and integrate it into SQL Server with a script like the following:

SQL
use TESTDB 
go 
exec sp_configure "clr enabled", '1' 
go 
reconfigure 
go 
-- for test purpouses only, not recommended in production environments
ALTER DATABASE TESTDB SET TRUSTWORTHY ON 
go 

begin try 
  CREATE ASSEMBLY [asmUni] FROM 'c:\project_output_dir\uniconv.dll'
     with permission_set=UNSAFE 
end try 
begin catch 
  alter assembly [asmUni] FROM 'c:\project_output_dir\uniconv.dll' WITH UNCHECKED DATA 
end catch 
go

if exists(
    select name from sys.objects where name = 
    'csConvToUnicode') drop function [dbo].[csConvToUnicode] 
go 
CREATE FUNCTION [dbo].[csConvToUnicode] ( 
@codepage int, 
@multibytestr nvarchar(max) 
) returns nvarchar(max) 
AS EXTERNAL name [asmUni].[ConvUtils.Unicode].[ConvToUnicode] 
go

That's all!

Now you can use this function like any other, for example, in a classic Select statement, to create a View or to create a Trigger that automatically keeps your data updated.

Here is a final example of how we can use this function in a T-SQL statement (950 is the codepage for traditional Chinese):

SQL
select 
  description, 
  dbo.csConvToUnicode(950, description) as converted 
from testtable

description             converted
----------------------- -------------------
¨àµ£ºô¸ô¦w¥þ            兒童網路安全
°ê»Ú¸ê°T °T°T°T°T°T        國際資訊 訊訊訊訊訊
°ê»Ú¸ê°T °T°T°T°T°T        國際資訊 訊訊訊訊訊
a                    a
Áô¨pÃ…v¬Fµ¦            隱私權æ"¿ç­–
test c                test c

Conclusion

I have shown a really simple but powerful way to leverage CLR integration provided by SQL Server to convert old ASCII/Multibyte data into Unicode. I hope that this is a good starting point for your personal solution.

History

  • February 16, 2007 - Added more comments on introduction.
  • February 8, 2007 - First version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Italy Italy
10 years of hard coding... and it's going on !

Comments and Discussions

 
QuestionNot work in my case Pin
Ruohong23-Nov-10 2:11
Ruohong23-Nov-10 2:11 
AnswerRe: Not work in my case Pin
carlop()23-Nov-10 10:08
carlop()23-Nov-10 10:08 
QuestionNot work with arabic Pin
Ali Habib11-Jan-09 23:27
Ali Habib11-Jan-09 23:27 
AnswerRe: Not work with arabic Pin
carlop()12-Jan-09 9:52
carlop()12-Jan-09 9:52 
GeneralRe: Not work with arabic Pin
Ali Habib12-Jan-09 9:56
Ali Habib12-Jan-09 9:56 
GeneralRe: Not work with arabic Pin
carlop()12-Jan-09 10:19
carlop()12-Jan-09 10:19 
GeneralIt's just what I want, would u please share your dll here? many thks! Pin
frx21-Dec-08 17:09
frx21-Dec-08 17:09 
GeneralRe: It's just what I want, would u please share your dll here? many thks! Pin
carlop()21-Dec-08 21:01
carlop()21-Dec-08 21:01 
GeneralRe: It's just what I want, would u please share your dll here? many thks! Pin
frx22-Dec-08 22:48
frx22-Dec-08 22:48 
GeneralPerhaps an alternative using System.Text.Encoding Pin
Cyrus Chan26-Mar-07 22:29
Cyrus Chan26-Mar-07 22:29 
QuestionWhy reinvent the wheel? Pin
Mihai Nita8-Feb-07 8:26
Mihai Nita8-Feb-07 8:26 
AnswerRe: Why reinvent the wheel? Pin
carlop()8-Feb-07 9:37
carlop()8-Feb-07 9:37 
GeneralRe: Why reinvent the wheel? [modified] Pin
Mihai Nita8-Feb-07 12:39
Mihai Nita8-Feb-07 12:39 
GeneralRe: Why reinvent the wheel? Pin
carlop()8-Feb-07 13:06
carlop()8-Feb-07 13:06 
GeneralRe: Why reinvent the wheel? Pin
Mihai Nita8-Feb-07 16:26
Mihai Nita8-Feb-07 16:26 
GeneralRe: Why reinvent the wheel? Pin
carlop()8-Feb-07 21:08
carlop()8-Feb-07 21:08 
GeneralRe: Why reinvent the wheel? Pin
Mihai Nita8-Feb-07 23:24
Mihai Nita8-Feb-07 23:24 
GeneralRe: Why reinvent the wheel? Pin
carlop()8-Feb-07 23:42
carlop()8-Feb-07 23:42 
GeneralRe: Why reinvent the wheel? Pin
pemb31-Jul-07 17:10
pemb31-Jul-07 17:10 
AnswerRe: It's still a good example of CLR integration Pin
Jcmorin8-Feb-07 11:26
Jcmorin8-Feb-07 11:26 
GeneralRe: It's still a good example of CLR integration Pin
carlop()8-Feb-07 11:33
carlop()8-Feb-07 11:33 
GeneralRe: It's still a good example of CLR integration Pin
Mihai Nita8-Feb-07 12:42
Mihai Nita8-Feb-07 12:42 
GeneralRe: It's still a good example of CLR integration Pin
carlop()8-Feb-07 21:11
carlop()8-Feb-07 21:11 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.