I'm reading a WAV file with non-standard data in the chunk (basically, the WAV file has, rightly or wrongly, ID tags in the chunk - which is something I want my code to handle so that it can clean up these kinds of files).

Sometimes the following code works fine when processing a non-standard chunk in a WAV file. The first two ReadChars statements always work, but in the first DO WHILE loop, where I'm searching for 'fmt ', the BinaryReader statement SubChunk1ID = brObject.ReadChars(4) sometimes (not always, depending on which file I'm processing) generates this error:

The output char buffer is too small to contain the decoded characters, encoding 'Unicode (UTF-8)' fallback 'System.Text.DecoderReplacementFallback' Parameter name: chars

The DO WHILE loop does one successful ReadChars, bringing back the text "JUNK" (which is actually there in the file). On the second pass, it generates the error. The hex values that it attempts to read in the second pass are "F0 00 00 00".

Obviously there is some unusual encoding that is exceeding the length of a standard string variable (that's how it looks to me). No surprise, I guess, because this is binary data, not text.

I'm thinking I need to do something else besides ReadChars, but I'm not very experienced in VB.Net and don't know which alternative would be best for getting encoded data out of a FileStream.

Or perhaps there's something I'm missing about how ReadChars behaves in C# vs VB.Net, which I'm not accounting for.

Any thoughts and comments would be welcomed!

VB
Class WaveIO

    Public IsWaveFile As Boolean = False

    ' *** The "RIFF" chunk descriptor
    Public ChunkID As String
    Public ChunkSize As Integer
    Public Format As String

    ' *** The "fmt" sub-chunk descriptor
    Public SubChunk1ID As String
    Public Subchunk1Size As UInt32
    Public AudioFormat As UInt16
    Public Channels As UInt16
    Public SampleRate As UInt32
    Public ByteRate As UInt32
    Public BlockAlign As UInt16
    Public BitsPerSample As UInt16
    Public SubChunk2ID As String
    Public Subchunk2Size As UInt32

    Public FmtSubChunkOffset As Integer
    Public DataSubChunkOffset As Integer


    ' *** Size of the entire file in bytes, minus 8 bytes for the two fields
    ' *** not included in the ChunkSize field (ChunkID and ChunkSize).
    Public FileLength As UInt32


    ' *** Size of the data in bytes, minus 44 bytes length of the "Riff" chunk 
    ' *** and the "fmt" sub-chunk.
    Public DataLength As UInt32


    Public Sub WaveHeader(ByVal strPath As String)

        Dim fsObject As FileStream
        Dim brObject As BinaryReader


        fsObject = New FileStream(strPath, FileMode.Open, FileAccess.Read)
        brObject = New BinaryReader(fsObject)

        Dim byteData() As Byte
        Dim charData() As Char


        ' *** To be a valid WAV file, the first 12 bytes must be the 'RIFF' chunk descriptor.
        ' ***
        ' ***   RIFF Chunk - The first 4 characters must be 'RIFF' (Resource Interchange File Format).
        ChunkID = brObject.ReadChars(4)

        If ChunkID <> "RIFF" Then

            IsWaveFile = False
            Exit Sub

        End If


        ' ***   RIFF Chunk - The next 4 bytes hold the file's total size, less 
        ' ***                the first 8 bytes.
        ChunkSize = brObject.ReadInt32

        ' ***   RIFF Chunk - The next 4 characters should be 'WAVE' (Waveform Audio File Format)
        Format = brObject.ReadChars(4)

        If Format <> "WAVE" Then

            IsWaveFile = False
            Exit Sub

        End If


        ' *** The 'RIFF' chunk descriptor looks to be valid.

        ' *** What needs to be identified next is the 'fmt ' (Format) sub-chunk descriptor.
        ' *** This may or may not immediately follow the 'RIFF' chunk. Some programs and
        ' *** applications will embed other 'chunks' like 'INFO', 'JUNK', etc. (most likely
        ' *** meta data, i.e. ID tags.)

        ' *** Look for a sub-chunk id of 'fmt '.
        '        Do While (brObject.PeekChar <> -1)
        ReDim byteData(3)
        ReDim charData(3)

        Do While 1 = 1

            Try

                SubChunk1ID = brObject.ReadChars(4)

            Catch Ex As ArgumentException

                MessageBox.Show(Ex.Message, "ArgumentException", MessageBoxButton.OK, MessageBoxImage.Information)

            Catch ex As Exception

                MessageBox.Show(ex.Message, "Unexpected", MessageBoxButton.OK, MessageBoxImage.Information)

                Exit Sub

            End Try

            MessageBox.Show("SubChunk1ID = " & SubChunk1ID, "Test", MessageBoxButton.OK, MessageBoxImage.Information)
            MessageBox.Show("Position = " & fsObject.Position.ToString, "Test", MessageBoxButton.OK, MessageBoxImage.Information)
            MessageBox.Show("Length = " & fsObject.Length.ToString, "Test", MessageBoxButton.OK, MessageBoxImage.Information)


            ' *** Found the target...
            If SubChunk1ID = "fmt " Then

                Exit Do

            End If

            ' *** Back up 3 positions in the file so that it is reading byte-by-byte.
            fsObject.Position -= 3


            ' *** If the 'fmt ' sub-chunk has not been found before the end of the 
            ' *** file will be reached, then this is not a valid 'WAV' file.
            If fsObject.Position >= fsObject.Length Then

                IsWaveFile = False
                Exit Sub

            End If

        Loop

It looks like a character (UTF) encoding issue. See if the code below works:

Dim bytes() As Byte = brObject.ReadBytes(4)
Format = System.Text.Encoding.ASCII.GetString(bytes)
If Format <> "WAVE" Then

    IsWaveFile = False
    Exit Sub

End If
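
The same approach can be extended to the search loop in the question. Below is a minimal sketch, assuming the standard RIFF layout in which every sub-chunk starts with a 4-byte ID followed by a 4-byte little-endian size, and in which odd-sized chunk data is padded to an even length. `ReadFourCC` and `FindChunk` are hypothetical helper names, not part of the original code.

```vb
Imports System.IO
Imports System.Text

Module RiffSketch

    ' Hypothetical helper: read a 4-byte chunk ID as raw bytes and decode it as ASCII.
    ' Returns Nothing when fewer than 4 bytes remain in the stream.
    Function ReadFourCC(ByVal reader As BinaryReader) As String
        Dim bytes() As Byte = reader.ReadBytes(4)
        If bytes.Length < 4 Then Return Nothing
        Return Encoding.ASCII.GetString(bytes)
    End Function

    ' Hypothetical helper: starting just after the 12-byte RIFF header, walk the
    ' sub-chunks until one with the requested ID is found. On success the stream
    ' is positioned at that chunk's size field.
    Function FindChunk(ByVal reader As BinaryReader, ByVal targetId As String) As Boolean
        Dim stream As Stream = reader.BaseStream
        Do
            Dim id As String = ReadFourCC(reader)
            If id Is Nothing Then Return False          ' end of file reached
            If id = targetId Then Return True
            If stream.Length - stream.Position < 4 Then Return False
            Dim size As UInt32 = reader.ReadUInt32()    ' chunk size, little-endian
            ' Skip the chunk body; RIFF pads odd-sized chunks with one extra byte.
            stream.Seek(CLng(size) + CLng(size Mod 2UI), SeekOrigin.Current)
        Loop
    End Function

End Module
```

With the reader positioned just after 'WAVE', a call such as `FindChunk(brObject, "fmt ")` could stand in for the byte-by-byte loop, so a 'JUNK' chunk that precedes 'fmt ' is skipped in one step instead of being decoded as text.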
 
 
Comments
TheBitSlinger 27-Apr-11 17:18pm    
That was the issue, and ReadBytes worked! I'd been playing around with ReadBytes, but could not figure out how to decode the byte array. I'd tried Base64 decoding with no joy.
Sergey Alexandrovich Kryukov 28-Apr-11 0:08am    
Great catch! My 5.
--SA
Since you opened a BinaryReader, I think you should use the ReadBytes method.
A char is not always 1 byte; in the case of Unicode, it is 2 bytes.
Reading 4 Unicode chars is the same as reading 8 bytes.
If the byte pattern looks like a Unicode character,
the ReadChars method might read more than 4 bytes.

You should always read bytes.
And then convert the bytes to a string, if it represents a string.
When converting you must specify the encoding format.
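
To illustrate the point, here is a minimal sketch that uses an in-memory stream in place of a real file (the byte values are the ones quoted in the question; the module name is hypothetical). `ReadBytes` and `ReadUInt32` work on raw bytes and do not go through the reader's text encoding, so only the explicit `Encoding.ASCII.GetString` call decides how the ID is decoded.

```vb
Imports System
Imports System.IO
Imports System.Text

Module ReadBytesSketch

    Sub Main()
        ' "JUNK" followed by F0 00 00 00, as described in the question.
        Dim data() As Byte = {&H4A, &H55, &H4E, &H4B, &HF0, &H0, &H0, &H0}

        Using reader As New BinaryReader(New MemoryStream(data))
            ' Read the ID as bytes, then decode with an explicitly chosen encoding.
            Dim id As String = Encoding.ASCII.GetString(reader.ReadBytes(4))

            ' The next four bytes are read as a little-endian unsigned integer.
            Dim size As UInt32 = reader.ReadUInt32()

            Console.WriteLine(id)      ' JUNK
            Console.WriteLine(size)    ' 240
        End Using
    End Sub

End Module
```

In a RIFF file the four bytes after a chunk ID hold the chunk size, so the `F0 00 00 00` from the question would presumably be the length of the 'JUNK' chunk (240 bytes) rather than text.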
 
 
Comments
TheBitSlinger 27-Apr-11 17:19pm    
You were right! I'm guessing I could face the same issue with ReadInt32, ReadInt16, etc.?
Sergey Alexandrovich Kryukov 28-Apr-11 0:08am    
Misleading! Unicode itself does not define bytes per character, UTFs do.
Many think Unicode is 16-bit code. Wrong! (read http://unicode.org/).
UTF-8 character size can be 1, 2, 3 or 4 bytes (surprise?).
Even UTF-16 is not a 16-bit code: it is a combination of 16-bit code points (of the BMP, the Basic Multilingual Plane) and all higher code points, each expressed as a pair of 16-bit values (called surrogate pairs).

Please, before posting an answer, learn things yourself. If you don't trust my facts, read http://unicode.org/.
--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)