
CodePage File Converter

13 Apr 2007 | CPOL | 2 min read
Converts text files from one code page to another

Introduction

There are some people on planet Earth who still don't understand the difference between a byte and a character. So, let's start with the basics.

A byte is information storage. 1 byte = 8 bits. That is it.

A character is any written symbol, from English letters to Chinese characters to special characters and scientific or mathematical symbols. For the computer to store characters, they must be encoded, and there are many encodings to choose from. In an 8-bit encoding (such as the Windows "ANSI" code pages), every character is stored in a single byte. Other encodings use 7 bits per character (like US-ASCII), 16-bit code units (like UTF-16, which Windows simply calls "Unicode"), or a variable number of bytes per character (like UTF-8).
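To make the distinction concrete, here is a minimal sketch in Python (chosen only for illustration; it is not part of the converter described in this article). The helper name `byte_lengths` and the sample word are assumptions made for the example. It counts how many bytes the same four characters need under three encodings.

```python
def byte_lengths(text, encodings=("cp1256", "utf-8", "utf-16-le")):
    """Return {encoding name: number of bytes needed to store text}."""
    return {name: len(text.encode(name)) for name in encodings}


# Four Arabic letters (seen, lam, alef, meem), written as escapes so the
# source stays plain ASCII.
sample = "\u0633\u0644\u0627\u0645"

print(len(sample), "characters")
for name, size in byte_lengths(sample).items():
    print(f"{name:10} {size} bytes")
```

The character count stays the same in every case; only the number of bytes changes with the encoding.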

A Code Page identifies the encoding used to turn characters into a stream of bytes. Here are some commonly used Code Pages:

Code Page   Name           AKA
1200        utf-16         Unicode
1250        windows-1250   Central European (Windows)
1251        windows-1251   Cyrillic (Windows)
1252        windows-1252   Western European (Windows)
1253        windows-1253   Greek (Windows)
1254        windows-1254   Turkish (Windows)
1255        windows-1255   Hebrew (Windows)
1256        windows-1256   Arabic (Windows)
12000       utf-32         Unicode (UTF-32)
20127       us-ascii       US-ASCII
20936       x-cp20936      Chinese Simplified (GB2312-80)
20949       x-cp20949      Korean Wansung
28591       iso-8859-1     Western European (ISO)
65001       utf-8          Unicode (UTF-8)

Background

A file is a stream of bytes. If the file is a text file, that stream of bytes represents characters encoded in one of the Code Pages mentioned above. What the file does not store, however, is which Code Page was used to do the encoding (although we can write algorithms that make a best guess). So, if the file is written in a Code Page that the system interpreting it does not expect, a conversion is needed to re-encode the file in the expected Code Page.
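As an illustration of what such a "best guess" can look like, here is a small Python sketch. It is an assumption for the sake of example, not the approach used by the converter: the function name `guess_encoding` and its `fallback` parameter are hypothetical. It checks for a byte-order mark, then tests whether the bytes are valid UTF-8, and otherwise falls back to a code page supplied by the caller. Single-byte code pages such as windows-1252 and windows-1256 cannot be told apart this way, which is exactly why the user normally has to state the input Code Page.

```python
import codecs


def guess_encoding(data, fallback="cp1252"):
    """Best-guess the encoding of raw bytes; a heuristic, not a guarantee."""
    # A byte-order mark, when present, is the only explicit hint in the file.
    # UTF-32 LE is tested before UTF-16 LE because its mark begins with the
    # same two bytes.
    marks = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for mark, name in marks:
        if data.startswith(mark):
            return name
    # Text in a legacy code page is rarely valid UTF-8 by accident.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


arabic = "\u0633\u0644\u0627\u0645"
print(guess_encoding(arabic.encode("utf-8")))
print(guess_encoding(arabic.encode("cp1256"), fallback="cp1256"))
```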

For the example in the screenshot below, if the Regional and Language Options in Windows XP set the language for non-Unicode programs to Arabic (Egypt), TXT files will be encoded with Windows-1256.

If the text file is later opened on a system that has a different setting in Regional and Language Options (such as English (United States)), the file will be interpreted incorrectly.

Screenshot - screen_shot_1.jpg
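The scenario above can be reproduced in a few lines. The following Python sketch is an illustration only; the helper name `misread` is hypothetical. It encodes Arabic text as Windows-1256 and then decodes the same bytes as Windows-1252, which is roughly what happens when the file moves from an Arabic (Egypt) system to an English (United States) one.

```python
def misread(text, written_as, read_as):
    """Encode text with one code page, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


original = "\u0633\u0644\u0627\u0645"  # four Arabic letters
garbled = misread(original, "cp1256", "cp1252")

print(garbled)  # Western European letters instead of Arabic ones

# The bytes themselves are intact, so reading them with the right
# code page recovers the original text.
recovered = garbled.encode("cp1252").decode("cp1256")
print(recovered == original)
```

Nothing is lost in this case because the bytes are unchanged; only their interpretation is wrong.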

Using the Code

The software requires .NET 2.0 to run. First, provide the path of the input file and its Code Page. Then provide the path for the output file and the target Code Page.

Screenshot - screen_shot_2.jpg
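The article does not show the tool's source code, so the following is only a sketch of the same idea in Python, under the assumption that the conversion amounts to decoding the input with its Code Page and encoding the text again with the target one. The function name `convert_file` and the file names are made up for the example.

```python
import os
import tempfile


def convert_file(src_path, src_codepage, dst_path, dst_codepage):
    """Re-encode a text file from one code page to another."""
    # newline="" keeps the line endings exactly as they are in the input.
    with open(src_path, "r", encoding=src_codepage, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_codepage, newline="") as dst:
        dst.write(text)


# Usage: create a small Windows-1256 file and convert it to UTF-8.
with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "input.txt")
    target = os.path.join(folder, "output.txt")
    with open(source, "wb") as f:
        f.write("\u0633\u0644\u0627\u0645\r\n".encode("cp1256"))
    convert_file(source, "cp1256", target, "utf-8")
    with open(target, "rb") as f:
        print(f.read())
```

A character that does not exist in the target Code Page raises `UnicodeEncodeError` here; a real tool would have to decide whether to stop or substitute a placeholder.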

History

  • 13th April, 2007: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Egypt

Comments and Discussions

 
Question: Arabic code page conversion
peyman.a15-Apr-13 15:01
General: thanks!
Member 978286424-Jan-13 4:01
Question: Need Help
Nazim Iqbal 5-Oct-12 13:27
Hi,

I need your help. I am working on a MySQL and PHP based project running on WAMP Server.
I have a field, storytext, that holds data collected in Urdu. Its details are:

Field      Type  Collation        Null  Default
storytext  text  utf8_unicode_ci  Yes   NULL

Retrieving the Urdu data from MySQL 5.5.8 running on WAMP Server with PHP works fine.

The software is for internal use only, not web based.
------------------------------------------------------------------------------------------
HERE IS THE PROBLEM:

One section of the project is built in Visual Studio .NET 2010. It has a DataGrid, and clicking any row shows the full text in a text box.

The text box and the DataGrid should show the Urdu text from the database, but this is where the problem appears.

This is what I get:

DESP. ITEM

پروگرام رانا مب
شر ايٹ پرائم ٹائم ميں
گ٠تگو کرتے Û ÙˆØ¦Û’ وزير٠اعظم Ú©Û’
وکيل Ú†ÙˆÛ Ø¯Ø±ÙŠ اعتزاز احسن
کا Ú©Û Ù†Ø§ تھا Ú©Û ØµØ¯Ø± Ú©Û’ Ø¹Û Ø¯Û’
Ú©Ùˆ استشني حاصل Û Û’
وزير٠اعظم Ø

-------------------------------------------------------------------------------------------------------
I also use this connection string in Visual Studio:

cnString = "datasource=localhost;username=root;password=;database=spidernews;charset=utf8;"

How can I solve this problem? Can you help me out?
The Urdu text is not showing.

Here is the whole Visual Studio (VB) code I am using:
------------------------------------------------------------------------------------------------------------------------



Imports MySql.Data.MySqlClient

Public Class Form1

    Dim conn As Common.DbConnection
    Dim da As Common.DbDataAdapter
    Dim ds As DataSet = New DataSet
    Dim cnString As String
    Dim sqlQRY As String
    'Dim dt As DataTable
    'Dim strBuilder As Char




    '''' SCROLLING TEXT START
    Dim scrollingText As String = "Spider NEWS "
    Dim txtStr(scrollingText.Length - 1) As String
    Dim txtPos As Integer = -1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load

        'strBuilder.ToString ("tblnews" & char(34))


        cnString = "datasource=localhost;username=root;password=;database=spidernews;charset=utf8;"

        sqlQRY = "SET NAMES UTF8; SET CHARACTER SET UTF8;Select slugid,slug,storytext,storyduration,segmentdescription,prioritydescription,mosstatus from tblnews"
        'sqlQRY = "SET NAMES UTF8; SET CHARACTER SET latin1;Select slugid,slug,storytext,storyduration,segmentdescription,prioritydescription,mosstatus from tblnews"
        conn = New MySqlConnection(cnString)

        Try
            conn.Open()
            da = New MySqlDataAdapter(sqlQRY, conn)
            Dim cb As MySqlCommandBuilder = New MySqlCommandBuilder(da)

            da.Fill(ds, "tblnews")

            DataGridView1.DataSource = ds
            DataGridView1.DataMember = "tblnews"
            '''''


        Catch ex As Common.DbException
            MsgBox(ex.ToString)
        Finally
            conn.Close()
        End Try

        '''' SCROLLING TEXT START
        For idx As Integer = 0 To UBound(txtStr)
            Dim workedString As String = ""
            workedString = scrollingText.Substring(idx) & " " & scrollingText.Substring(0, idx)
            txtStr(idx) = workedString
        Next
        Timer1.Interval = 5000
        Timer1.Enabled = True
        Timer1.Start()
        '''' SCROLLING TEXT END
        Label2.Text = System.DateTime.Now

    End Sub

    Private Sub Save_Click(ByVal sender As System.Object, ByVal e As System.EventArgs)

        da.Update(ds, "" + TextBox1.Text + "")
        MsgBox("Data sent", MsgBoxStyle.OkOnly, "Success")

    End Sub

    Private Sub DataGridView1_CellContentClick(ByVal sender As System.Object, ByVal e As System.Windows.Forms.DataGridViewCellEventArgs) Handles DataGridView1.CellContentClick

        Dim i As Integer
        i = DataGridView1.CurrentRow.Index
        RichTextBox1.Text = DataGridView1.Item(2, i).Value

    End Sub

    Private Sub Timer1_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer1.Tick

        Dim i As Integer
        i = DataGridView1.CurrentRow.Index

        '''' SCROLLING TEXT START
        txtPos += 1
        Dim timerStr As String
        timerStr = txtStr(txtPos)
        RichTextBox1.Text = DataGridView1.Item(2, i).Value

        Label1.Text = timerStr
        If txtPos = UBound(txtStr) Then txtPos = -1
        '''' SCROLLING TEXT END

    End Sub

End Class

Question: very useful
marsze 8-Mar-12 7:27
General: asmo 449
dice13725-May-09 21:49
Question: Error - problem of code page translation from 1256-1252
yusuf_kumar4418-May-08 4:02
General: Just Another Thank You
bear00729-Sep-07 23:54
Question: convert?
sharp_k16-Aug-07 10:22
General: Explain & show source code
Olivier Oswald 18-Jul-07 1:42
Question: How to Identify Mail Language ?
ArunkumarSundaravelu 4-Jun-07 4:18
Question: And?
John R. Shaw 14-Apr-07 18:21
General: Improvement idea
Mihai Nita 13-Apr-07 6:42
General: Re: Improvement idea
emad_awad 13-Apr-07 8:21
General: Re: Improvement idea
unlimited 13-Apr-07 14:26
General: Re: Improvement idea
Mihai Nita 13-Apr-07 22:03
General: Re: Improvement idea
emad_awad 14-Apr-07 2:44
General: Re: Improvement idea
Mihai Nita 14-Apr-07 22:43
