
C#

 
Question: ftp server folders (Priya Prk, 5-Apr-10 6:23)
Answer: Re: ftp server folders (Not Active, 5-Apr-10 6:47)
General: Re: ftp server folders (Priya Prk, 5-Apr-10 21:16)
General: Re: ftp server folders (Not Active, 6-Apr-10 2:47)
Question: get the file type of a file (aleroot, 5-Apr-10 4:32)
Answer: Re: get the file type of a file [modified] (PIEBALDconsult, 5-Apr-10 4:53)
General: Re: get the file type of a file (aleroot, 5-Apr-10 7:15)
General: Re: get the file type of a file [modified] (harold aptroot, 5-Apr-10 7:19)

Well, you know, that is not really correct, or rather, it is too simplistic. For starters, UTF-8 doesn't require a byte order mark, and non-text data can very easily start with an accidental byte order mark, since a BOM is only two or three bytes long. You should really write a more thorough test. I'll see what I can come up with and may get back to you.
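To make the point about byte order marks concrete, here is a minimal sketch (not part of the original post) of a BOM-only check. `BomSniffer` and `DetectBom` are hypothetical names, and UTF-32 BOMs are deliberately ignored to keep it short. It only reports which BOM the data starts with, which is exactly why it cannot decide on its own whether the data is text.

```csharp
using System;

// Hypothetical helper (not from the post): reports which byte order mark,
// if any, the data starts with. UTF-32 BOMs are ignored in this sketch.
// A match only means "these bytes look like a BOM"; it does not prove the data is text.
static class BomSniffer
{
    public static string DetectBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return "UTF-8";     // EF BB BF
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return "UTF-16LE";  // FF FE
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return "UTF-16BE";  // FE FF
        return "none";          // no BOM, which is the normal case for UTF-8 text
    }
}
```

An arbitrary binary buffer such as `{ 0xFF, 0xFE, 0x12, 0x34 }` is reported as "UTF-16LE", while UTF-8 text produced by `Encoding.UTF8.GetBytes` carries no BOM and is reported as "none": both failure modes mentioned above.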

Edit: how about this?

using System;

static bool IsText(byte[] data)
{
    // this lower limit should be at least 4
    // for anything shorter, the chance that it is
    // binary but looks like text is far too big
    if (data.Length < 16)
        throw new ArgumentException("Data is too short to guess the encoding", nameof(data));

    byte asciitester = 0;
    for (int i = 0; i < data.Length; i++)
        asciitester |= data[i];

    if (asciitester < 0x80)
        return true; //pure ASCII so probably text

    if (IsValidUTF8(data))
        return true; // valid UTF-8 is probably text

    if (IsValidUTF16(data))
        return true; // valid UTF-16 is probably text

    // possibly insert test for UTF-32, but it is exceedingly rare

    return false;
}
static bool IsValidUTF16(byte[] data)
{
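    if (data.Length < 2 || (data.Length & 1) == 1)
        return false;   // too short, or odd length: can not be UTF-16
    bool LittleEndian;
    // compare the BOM bytes directly; BitConverter.ToUInt16 follows the
    // machine's byte order, so its result would differ between platforms
    if (data[0] == 0xFF && data[1] == 0xFE)
        LittleEndian = true;    // FF FE = UTF-16LE
    else if (data[0] == 0xFE && data[1] == 0xFF)
        LittleEndian = false;   // FE FF = UTF-16BE
    else
        return false;   // BOM is required for UTF-16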

    bool SecondOfSurrogatePair = false;

    for (int i = 2; i < data.Length; i+=2)
    {
        int code;
        if (LittleEndian)
            code = data[i] | (data[i + 1] << 8);
        else
            code = (data[i] << 8) | data[i + 1];

        if (SecondOfSurrogatePair)
        {
            if (code < 0xDC00 || code > 0xDFFF)
                return false;
            SecondOfSurrogatePair = false;
        }
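        else
        {
            if (code >= 0xDC00 && code <= 0xDFFF)
                return false;   // low surrogate without a preceding high surrogate
            if (code >= 0xD800 && code <= 0xDBFF)
                SecondOfSurrogatePair = true;
        }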
    }

    return !SecondOfSurrogatePair; // invalid if the data ended while the second half of a surrogate pair was still expected
}
static bool IsValidUTF8(byte[] data)
{
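    int index = 0;
    // skip an optional UTF-8 BOM (EF BB BF); a lone 0xEF must not be rejected,
    // because it also starts ordinary 3-byte sequences (U+F000 to U+FFFF)
    if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        index = 3;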
    int mode = 0; // number of bytes it needs to complete the current code
    for (int i = index; i < data.Length; i++)
    {
        switch (mode)
        {
            case 0: // first byte
                if (data[i] < 0x80)
                    break;
                else if (data[i] < 0xC2)
                    return false; // 80 to BF can not be a first byte; C0 and C1 would only give overlong encodings
                else if (data[i] < 0xE0)
                    mode = 1; // start of 2-byte sequence
                else if (data[i] < 0xF0)
                    mode = 2; // start of 3-byte sequence
                else if (data[i] < 0xF5)
                    mode = 3; // start of 4-byte sequence
                else
                    return false; // invalid or restricted codes
                break;

            case 1: // some bytes needed
            case 2:
            case 3:
                if (data[i] < 0x80 || data[i] >= 0xC0)
                    return false; // incomplete code
                mode--;
                break;
            default:
                return false; // something got messed up
        }
    }
    // it will get here if there are no more bytes,
    // so if it still needs some, it can't be valid
    return mode == 0;
}
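The comment in IsText leaves UTF-32 out. If you do want it, one possible check in the same style as IsValidUTF16 is sketched below. `Utf32Check` and `IsValidUTF32` are hypothetical names, and requiring a BOM is an assumption carried over from the UTF-16 test; it rejects any 4-byte unit above U+10FFFF or in the surrogate range.

```csharp
using System;

// Hypothetical IsValidUTF32 sketch in the style of IsValidUTF16:
// requires a BOM (FF FE 00 00 = little endian, 00 00 FE FF = big endian),
// then checks that every 4-byte unit is a Unicode scalar value.
static class Utf32Check
{
    public static bool IsValidUTF32(byte[] data)
    {
        // length must be a non-zero multiple of 4
        if (data.Length < 4 || (data.Length & 3) != 0)
            return false;

        bool littleEndian;
        if (data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
            littleEndian = true;
        else if (data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
            littleEndian = false;
        else
            return false; // BOM required (assumption, as in IsValidUTF16)

        for (int i = 4; i < data.Length; i += 4)
        {
            uint code = littleEndian
                ? (uint)(data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24))
                : (uint)((data[i] << 24) | (data[i + 1] << 16) | (data[i + 2] << 8) | data[i + 3]);

            // above U+10FFFF or a surrogate code point: not a valid scalar value
            if (code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF))
                return false;
        }
        return true;
    }
}
```

For the plain true/false answer of IsText the order of the tests does not matter, but if you also want to know which encoding matched, run a UTF-32 test before the UTF-16 one: the UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE), and UTF-32LE data usually also passes the UTF-16 check.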


I suppose you could also test whether there are any unprintable characters.
modified on Monday, April 5, 2010 2:02 PM
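On that last suggestion: one simple way to test for unprintable characters is to count control bytes. This is a minimal sketch under its own assumptions: `PrintableCheck`, `HasUnprintableBytes` and the `maxFraction` tolerance are hypothetical, tab, LF, FF and CR are treated as normal text, and the check only makes sense for ASCII or UTF-8 data, since UTF-16 text is full of 0x00 bytes.

```csharp
using System;

// Hypothetical extra heuristic: counts control bytes that rarely appear in
// plain text. Tab (09), LF (0A), FF (0C) and CR (0D) are allowed; every other
// byte below 0x20, plus DEL (0x7F), counts as unprintable.
// Bytes >= 0x80 are left alone so that UTF-8 text is not penalised.
static class PrintableCheck
{
    public static bool HasUnprintableBytes(byte[] data, double maxFraction = 0.0)
    {
        if (data.Length == 0)
            return false;

        int unprintable = 0;
        foreach (byte b in data)
        {
            bool allowedControl = b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D;
            if ((b < 0x20 && !allowedControl) || b == 0x7F)
                unprintable++;
        }
        // with the default of 0.0, a single unprintable byte is enough
        return (double)unprintable / data.Length > maxFraction;
    }
}
```

One option is to run it only after the ASCII or UTF-8 test in IsText has succeeded and treat a true result as "probably binary"; a small nonzero `maxFraction` tolerates the odd stray control byte.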

General: Re: get the file type of a file (harold aptroot, 5-Apr-10 12:45)
General: Re: get the file type of a file (Dave Kreskowiak, 5-Apr-10 7:54)
General: Re: get the file type of a file (PIEBALDconsult, 5-Apr-10 8:48)
General: Re: get the file type of a file (Dave Kreskowiak, 5-Apr-10 10:19)
General: Re: get the file type of a file (PIEBALDconsult, 5-Apr-10 12:02)
Answer: Re: get the file type of a file (harold aptroot, 5-Apr-10 5:11)
General: Re: get the file type of a file (PIEBALDconsult, 5-Apr-10 9:29)
General: Re: get the file type of a file (harold aptroot, 5-Apr-10 12:35)
Answer: Re: get the file type of a file (AspDotNetDev, 5-Apr-10 11:40)
Question: Is there any lnk parser in C#? (newcoder1999, 5-Apr-10 4:07)
Question: White Flicker When Observing an Application Using Double Buffer Over RDP (Catfish540, 5-Apr-10 3:33)
Question: FileSystemWatcher HELP PLEASE !! (Rikq, 4-Apr-10 21:21)
Answer: Re: FileSystemWatcher HELP PLEASE !! (OriginalGriff, 4-Apr-10 21:54)
Answer: Re: FileSystemWatcher HELP PLEASE !! (FyreWyrm, 5-Apr-10 16:12)
Question: How to send Email in C# through the Exchange Server? (ravis19, 4-Apr-10 21:00)
Answer: Re: How to send Email in C# through the Exchange Server? (Abhinav S, 4-Apr-10 21:49)
Question: How to ATL COM exe(out-of-proc) in .net? (SRKSHOME, 4-Apr-10 20:32)
