C#

 
[General] Re: Reflection with a List
PIEBALDconsult, 5-Apr-10 15:30
[Question] update twitter status
Jassim Rahma, 5-Apr-10 11:58
[Answer] Re: update twitter status
Not Active, 5-Apr-10 12:02
[General] Re: update twitter status
FyreWyrm, 5-Apr-10 15:52
[Answer] Re: update twitter status
Tony Richards, 5-Apr-10 12:02
[Question] I hope this is a stupid question - Pulling CLOB data from a Stream, performing logic, and then spinning off a new Stream for further processing
Alaric_, 5-Apr-10 10:16
[Question] setting identity specification to yes from c#
teknolog123, 5-Apr-10 8:14
[Answer] Re: setting identity specification to yes from c#
Dan Mos, 5-Apr-10 9:00
[General] Re: setting identity specification to yes from c#
teknolog123, 5-Apr-10 9:06
[General] Re: setting identity specification to yes from c#
AspDotNetDev, 5-Apr-10 10:34
[General] Re: setting identity specification to yes from c#
teknolog123, 5-Apr-10 22:48
[Question] Pass variable from GridViewRowEvent method to DataListItemEvent method in C# using ASP.NET
chris.hagelstein, 5-Apr-10 6:43
[Question] ftp server folders
Priya Prk, 5-Apr-10 6:23
[Answer] Re: ftp server folders
Not Active, 5-Apr-10 6:47
[General] Re: ftp server folders
Priya Prk, 5-Apr-10 21:16
[General] Re: ftp server folders
Not Active, 6-Apr-10 2:47
[Question] get the file type of a file
aleroot, 5-Apr-10 4:32
[Answer] Re: get the file type of a file [modified]
PIEBALDconsult, 5-Apr-10 4:53
[General] Re: get the file type of a file
aleroot, 5-Apr-10 7:15
[General] Re: get the file type of a file [modified]
harold aptroot, 5-Apr-10 7:19
Well, you know, that is not really correct, or rather, it is too simplistic. For starters, UTF-8 doesn't require a byte order mark, and non-text data can very easily start with an accidental byte order mark (since the mark is so short), so you really should write a more thorough test. I'll see what I can come up with and may get back to you.
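A quick way to see both points, as a minimal sketch (the `BomDemo` class and its methods are hypothetical names, not from the thread): .NET's `Encoding.UTF8.GetBytes` never prepends a BOM, even though `Encoding.UTF8.GetPreamble()` returns the three bytes `EF BB BF`, and a naive "starts with the BOM, so it is text" test accepts any binary buffer that happens to begin with those bytes.

```csharp
using System.Text;

static class BomDemo
{
    // The naive test: "starts with the UTF-8 BOM (EF BB BF), so it is text".
    public static bool StartsWithUtf8Bom(byte[] data) =>
        data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;

    // Perfectly valid UTF-8 text: GetBytes never prepends the preamble (BOM),
    // even though Encoding.UTF8.GetPreamble() returns EF BB BF.
    public static byte[] Utf8TextWithoutBom() => Encoding.UTF8.GetBytes("héllo wörld");

    // Arbitrary binary data that merely happens to start with EF BB BF.
    public static byte[] BinaryWithAccidentalBom() =>
        new byte[] { 0xEF, 0xBB, 0xBF, 0x00, 0xFF, 0xFE, 0x01, 0x9C };
}
```

With these, `StartsWithUtf8Bom(Utf8TextWithoutBom())` is false although the bytes are valid UTF-8, and `StartsWithUtf8Bom(BinaryWithAccidentalBom())` is true although the bytes are not text, so the BOM alone settles nothing in either direction.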

Edit: how about this?

using System;

static class TextDetector
{
    public static bool IsText(byte[] data)
    {
        if (data == null)
            throw new ArgumentNullException(nameof(data));

        // this lower limit should be at least 4;
        // for anything shorter, the chance that it is
        // binary but looks like text is far too big
        if (data.Length < 16)
            throw new ArgumentException("Data is too short to guess the encoding", nameof(data));

        byte asciitester = 0;
        for (int i = 0; i < data.Length; i++)
            asciitester |= data[i];

        if (asciitester < 0x80)
            return true; // pure ASCII, so probably text

        if (IsValidUTF8(data))
            return true; // valid UTF-8 is probably text

        if (IsValidUTF16(data))
            return true; // valid UTF-16 is probably text

        // possibly insert a test for UTF-32, but it is exceedingly rare

        return false;
    }

    static bool IsValidUTF16(byte[] data)
    {
        if (data.Length < 2 || (data.Length & 1) == 1)
            return false;   // too short or odd length cannot be UTF-16

        // compare the BOM bytes directly; BitConverter.ToUInt16 would
        // give the opposite answer on a big-endian machine
        bool LittleEndian;
        if (data[0] == 0xFF && data[1] == 0xFE)
            LittleEndian = true;    // FF FE
        else if (data[0] == 0xFE && data[1] == 0xFF)
            LittleEndian = false;   // FE FF
        else
            return false;   // BOM is required for UTF-16

        bool SecondOfSurrogatePair = false;

        for (int i = 2; i < data.Length; i += 2)
        {
            int code;
            if (LittleEndian)
                code = data[i] | (data[i + 1] << 8);
            else
                code = (data[i] << 8) | data[i + 1];

            if (SecondOfSurrogatePair)
            {
                if (code < 0xDC00 || code > 0xDFFF)
                    return false; // a high surrogate must be followed by a low one
                SecondOfSurrogatePair = false;
            }
            else
            {
                if (code >= 0xD800 && code <= 0xDBFF)
                    SecondOfSurrogatePair = true;
                else if (code >= 0xDC00 && code <= 0xDFFF)
                    return false; // low surrogate without a preceding high one
            }
        }

        // if the data ended while the second half of a surrogate pair
        // was still expected, it is not valid UTF-16
        return !SecondOfSurrogatePair;
    }

    static bool IsValidUTF8(byte[] data)
    {
        int index = 0;
        // skip the optional BOM; a lone 0xEF may still start
        // an ordinary 3-byte sequence, so it is not rejected here
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            index = 3;

        int mode = 0; // number of bytes it needs to complete the current code
        for (int i = index; i < data.Length; i++)
        {
            switch (mode)
            {
                case 0: // first byte
                    if (data[i] < 0x80)
                        break;
                    else if (data[i] < 0xC2)
                        return false; // 80 - BF cannot be a first byte; C0 and C1 are always overlong
                    else if (data[i] < 0xE0)
                        mode = 1; // start of 2-byte sequence
                    else if (data[i] < 0xF0)
                        mode = 2; // start of 3-byte sequence
                    else if (data[i] < 0xF5)
                        mode = 3; // start of 4-byte sequence
                    else
                        return false; // invalid or restricted codes
                    break;

                case 1: // some bytes needed
                case 2:
                case 3:
                    if (data[i] < 0x80 || data[i] >= 0xC0)
                        return false; // incomplete code
                    mode--;
                    break;
                default:
                    return false; // something got messed up
            }
        }
        // it will get here if there are no more bytes,
        // so if it still needs some, it can't be valid
        return mode == 0;
    }
}
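If writing the UTF-8 state machine by hand is not a goal, the framework's decoder can do that part of the validation instead. This is a minimal sketch assuming a strict `UTF8Encoding` (constructed with `throwOnInvalidBytes: true`) is acceptable; the `StrictUtf8` class and `IsWellFormedUtf8` method are hypothetical names. With that flag, `GetString` throws `DecoderFallbackException` on malformed input instead of silently substituting U+FFFD, and the decoder also rejects overlong forms such as `C0 AF` and UTF-8-encoded surrogates such as `ED A0 80`, which a lead-byte/continuation-byte range check like `IsValidUTF8` does not fully catch.

```csharp
using System.Text;

static class StrictUtf8
{
    // encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Returns true if the whole buffer decodes as well-formed UTF-8.
    // A leading BOM (EF BB BF) is itself valid UTF-8 (U+FEFF), so it is accepted.
    public static bool IsWellFormedUtf8(byte[] data)
    {
        try
        {
            Strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            // malformed, overlong, surrogate or truncated sequence
            return false;
        }
    }
}
```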


I suppose you could also test for unprintable characters.
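The unprintable-character idea could look roughly like this. It is a minimal sketch that assumes the bytes have already been decoded to a `string` by one of the checks above, and it treats any control character other than tab, line feed, form feed and carriage return as a sign of binary data; the `PrintableCheck` class, the `ContainsUnprintable` name and the set of allowed characters are assumptions, not part of the original post.

```csharp
static class PrintableCheck
{
    // Control characters that commonly appear in ordinary text files (assumed list).
    private static bool IsAllowedControl(char c) =>
        c == '\t' || c == '\n' || c == '\r' || c == '\f';

    // Returns true if the decoded text contains a control character
    // (U+0000-U+001F or U+007F-U+009F, per char.IsControl)
    // that plain text normally does not contain.
    public static bool ContainsUnprintable(string text)
    {
        foreach (char c in text)
        {
            if (char.IsControl(c) && !IsAllowedControl(c))
                return true;
        }
        return false;
    }
}
```

A NUL character is the classic giveaway here: many binary formats contain `0x00` bytes, and the `asciitester` shortcut in `IsText` accepts such data as pure ASCII as long as every byte is below `0x80`.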
modified on Monday, April 5, 2010 2:02 PM

[General] Re: get the file type of a file
harold aptroot, 5-Apr-10 12:45
[General] Re: get the file type of a file
Dave Kreskowiak, 5-Apr-10 7:54
[General] Re: get the file type of a file
PIEBALDconsult, 5-Apr-10 8:48
[General] Re: get the file type of a file
Dave Kreskowiak, 5-Apr-10 10:19
[General] Re: get the file type of a file
PIEBALDconsult, 5-Apr-10 12:02
