I need to pull a CLOB field from an Oracle table. It just so happens that the character string stored in that CLOB field conforms to an XML Schema, with a few character replacements that I have to scrub: less-than signs have been replaced with &lt; and greater-than signs have been replaced with &gt;.
I mention that because it throws a significant wrinkle into the processing that I have to do. I have to pull a CLOB out of a DataReader's stream, scrub the string, and then dump the string back into a (MemoryStream???) to pass into the XML DOM objects. I have no idea if this is even close to what I need to be doing, as I have very little experience working directly with Streams in this way.
Can someone tell me how they would go about pulling the string data out of the DataReader stream and passing it into an XmlTextReader as a separate stream for parsing?
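For what it's worth, here is a minimal sketch of one way this could go, not a definitive implementation. It assumes the provider can hand the CLOB back as a string through IDataRecord.GetString (worth verifying for your Oracle driver) and that only the two replacements described above need to be undone. The class and method names (ClobXml, Scrub, Parse, Load) are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

public static class ClobXml
{
    // Undo the two replacements described in the question.
    public static string Scrub(string raw)
    {
        return raw.Replace("&lt;", "<").Replace("&gt;", ">");
    }

    // Parse the scrubbed text. A StringReader is enough here,
    // so no intermediate MemoryStream is needed.
    public static XmlDocument Parse(string raw)
    {
        var doc = new XmlDocument();
        using (var text = new StringReader(Scrub(raw)))
        using (var xml = XmlReader.Create(text))
        {
            doc.Load(xml);
        }
        return doc;
    }

    // Assumption: the provider returns the CLOB column as a string.
    public static XmlDocument Load(IDataRecord record, int ordinal)
    {
        return Parse(record.GetString(ordinal));
    }
}
```

If some API really does insist on a Stream, wrapping the scrubbed string with new MemoryStream(Encoding.UTF8.GetBytes(scrubbed)) should work as well, but the TextReader route avoids the extra encoding round trip.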
A clustered index will store the records in approximately the order of the index. That can be useful if you read a range of records because they can all be read from the hard drive without performing extra seeks. Since the data can only be ordered one way, you can only have one clustered index on each table. Typically, the primary key is chosen as the clustered index, unless you have a good reason to choose another one.
And I get just "0" for the "abc" value in my file. I know the GridViewEvent method works because the variable
correctly sums up all the data each time it is called by My_RowDataBound2, and inserted into the appropriate place in the GridView on my ASP.NET Page. But "abc" just shows 0. abc is just supposed to show the number of rows contained in the GridView.
Basically, I just need to get the "x" value from
method, pass it to the
event, which then passes it as a label to the ASP.NET Page. But the label is just showing "0".
0) You could read all the bytes and see whether or not they are all reasonable for a text (ASCII) file.
1) Use a Process to open the file with Notepad or another text editor and see what happens, then ask the user whether or not it looks like text.
Well you know, that is really not correct - or rather, too simplistic. UTF-8 doesn't require a byte order mark, for starters, and non-text can very easily start with an accidental byte order mark (since it is so short). You should really write a more complex test. I'll see what I can come up with; I may get back to you.
Edit: how about this?
static bool IsText(byte[] data)
{
    // this lower limit should be at least 4;
    // for anything shorter, the chance that it is
    // binary but looks like text is far too big
    if (data.Length < 16)
        throw new Exception("Data is too short to guess the encoding");
    byte asciitester = 0;
    for (int i = 0; i < data.Length; i++)
        asciitester |= data[i];
    if (asciitester < 0x80)
        return true; // pure ASCII so probably text
    if (IsValidUTF8(data))
        return true; // valid UTF-8 is probably text
    if (IsValidUTF16(data))
        return true; // valid UTF-16 is probably text
    // possibly insert test for UTF-32, but it is exceedingly rare
    return false;
}
static bool IsValidUTF16(byte[] data)
{
    if ((data.Length & 1) == 1)
        return false; // odd length can not be UTF-16
    bool LittleEndian;
    // ToUInt16 uses the machine's byte order; the constants below assume a little-endian machine
    ushort BOM = BitConverter.ToUInt16(data, 0);
    if (BOM == 0xFFFE)
        LittleEndian = false;
    else if (BOM == 0xFEFF)
        LittleEndian = true;
    else
        return false; // BOM is required for UTF-16
    bool SecondOfSurrogatePair = false;
    for (int i = 2; i < data.Length; i += 2)
    {
        int code;
        if (LittleEndian)
            code = data[i] | (data[i + 1] << 8);
        else
            code = (data[i] << 8) | data[i + 1];
        if (SecondOfSurrogatePair)
        {
            if (code < 0xDC00 || code > 0xDFFF)
                return false; // first half must be followed by a second half
            SecondOfSurrogatePair = false;
        }
        else if (code >= 0xD800 && code <= 0xDBFF)
            SecondOfSurrogatePair = true;
    }
    // not valid if the data ended while the second half of a surrogate pair is expected
    return !SecondOfSurrogatePair;
}
static bool IsValidUTF8(byte[] data)
{
    int index = 0;
    // skip an optional BOM; 0xEF can also start an ordinary 3-byte sequence, so a mismatch is not an error
    if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        index = 3;
    int mode = 0; // number of bytes it needs to complete the current code
    for (int i = index; i < data.Length; i++)
    {
        switch (mode)
        {
            case 0: // first byte
                if (data[i] < 0x80)
                    mode = 0;
                else if (data[i] < 0xC0)
                    return false; // 80 - C0 can not be a first byte
                else if (data[i] < 0xE0)
                    mode = 1; // start of 2-byte sequence
                else if (data[i] < 0xF0)
                    mode = 2; // start of 3-byte sequence
                else if (data[i] < 0xF5)
                    mode = 3; // start of 4-byte sequence
                else
                    return false; // invalid or restricted codes
                break;
            case 1: case 2: case 3: // some bytes needed
                if (data[i] < 0x80 || data[i] >= 0xC0)
                    return false; // incomplete code
                mode--;
                break;
            default:
                return false; // something got messed up
        }
    }
    // it gets here when there are no more bytes; if it still needs some, it can't be valid
    return mode == 0;
}
I suppose you could test to see if there are any unprintable characters, too.
Btw, could I get some feedback on this code, please? Specifically on whether the result is correct as far as it tests. This test is "supposed to" be very optimistic in what it accepts as text, so I'm especially interested in whether there are valid UTF-16 or UTF-8 strings that get rejected by the test. I'm not exactly a Unicode expert.
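A rough sketch of what such a check might look like, assuming single-byte (ASCII or UTF-8) data. For UTF-16 the bytes would have to be decoded first, since ordinary UTF-16 text is full of zero bytes. The names PrintableCheck and HasNoControlBytes are made up for illustration, and which control characters to tolerate (here only tab, LF and CR) is a judgment call.

```csharp
using System;

public static class PrintableCheck
{
    // Returns true if the data contains no ASCII control bytes
    // other than tab (0x09), line feed (0x0A) and carriage return (0x0D).
    public static bool HasNoControlBytes(byte[] data)
    {
        foreach (byte b in data)
        {
            bool allowedWhitespace = b == 0x09 || b == 0x0A || b == 0x0D;
            if ((b < 0x20 && !allowedWhitespace) || b == 0x7F)
                return false; // control character found, probably not text
        }
        return true;
    }
}
```

Bytes of 0x80 and above are left alone here on purpose, because they are normal inside UTF-8 multi-byte sequences.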