
DirectShow Filters Development Part 3: Transform Filters

15 Mar 2011 (CPOL)
A text overlay filter and a JPEG/JPEG2000 encoder using transform filters.

Introduction

Transform filters are probably the most interesting pieces of the DirectShow puzzle. They encapsulate complex image and video processing algorithms. From a filter development point of view, they are no harder to implement than other filter types, but they do require some additional coding and method overrides. As with rendering and source filters, DirectShow provides base classes from which you should inherit when implementing your own transform filter.

Transform filters have at least two pins: one input pin and one output pin. They are divided into two categories - copy-transform filters and in-place transform filters. As the names imply, a copy-transform filter takes the data from the input pin, transforms it, and writes the outcome to a separate sample delivered through the output pin, whereas an in-place filter performs its work directly on the input sample and passes that same sample on through the output pin to the downstream filter.
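
The difference between the two categories can be sketched without any DirectShow types. In the snippet below, Sample, InvertCopy, and InvertInPlace are hypothetical names used only for illustration; a plain byte vector stands in for a media sample, and inverting the bytes stands in for a real transform:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a media sample: just a byte buffer.
using Sample = std::vector<std::uint8_t>;

// Copy-transform: reads the input sample and writes the result into a
// separate output sample, which may differ in size and format.
inline void InvertCopy(const Sample& in, Sample& out)
{
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<std::uint8_t>(255 - in[i]);
}

// In-place transform: modifies the sample it was given; that same
// sample is then handed on downstream.
inline void InvertInPlace(Sample& sample)
{
    for (std::uint8_t& b : sample)
        b = static_cast<std::uint8_t>(255 - b);
}
```

The copy variant leaves its input untouched, which is what an encoder needs since its output has a different format. The in-place variant avoids a buffer copy, which is why it suits filters that keep the media type unchanged.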

DirectShow provides three base classes for writing transform filters:

  1. CTransformFilter - base class for copy-transform filters
  2. CTransInPlaceFilter - base class for in-place transforms
  3. CVideoTransformFilter - designed for video decoding and has built-in quality control management for dropping frames in case of flooding

I will cover the first two classes in this article: a CTransInPlaceFilter descendant will be used for a text overlay filter, and a CTransformFilter descendant will be used for a JPEG/JPEG2000 encoder.

Before we continue, you should take a look at part 1 of this series, since the filter development prerequisites, filter registration, and filter debugging are all the same.

Text Overlay Filter

A text overlay filter adds some user defined text to each and every frame that goes through the filter. It can be used for displaying subtitles or a logo. Adding text to the video frame does not change its media subtype or format, therefore an in-place transform suits perfectly. I will be using GDI+ for overlays, as it provides a convenient API for creating in-place bitmaps and drawing characters on a bitmap.

C++
using namespace Gdiplus;
using namespace std;

class CTextOverlay : public CTransInPlaceFilter, public ITextAdditor
{
public:
        DECLARE_IUNKNOWN;

        CTextOverlay(LPUNKNOWN pUnk, HRESULT *phr);
        virtual ~CTextOverlay(void);

        virtual HRESULT CheckInputType(const CMediaType* mtIn);
        virtual HRESULT SetMediaType(PIN_DIRECTION direction, const CMediaType *pmt);
        virtual HRESULT Transform(IMediaSample *pSample);

        static CUnknown *WINAPI CreateInstance(LPUNKNOWN pUnk, HRESULT *phr); 
        STDMETHODIMP NonDelegatingQueryInterface(REFIID riid, void ** ppv);

        STDMETHODIMP AddTextOverlay(WCHAR* text, DWORD id, RECT position, 
                     COLORREF color = RGB(255, 255, 255), float fontSize = 20);
        STDMETHODIMP Clear(void);
        STDMETHODIMP Remove(DWORD id);

private:
        ULONG_PTR m_gdiplusToken;
        VIDEOINFOHEADER m_videoInfo;
        PixelFormat m_pixFmt;
        int m_stride;
        map<DWORD, Overlay*> m_overlays;
};

CTransInPlaceFilter leaves two methods pure virtual, Transform and CheckInputType, so both must be implemented in your class. CheckInputType is called for each proposed media type during the pin connection negotiation. Since a transform filter has at least two pins, SetMediaType takes a direction argument which indicates whether the connection is being made on the input or the output pin. You may want to save both the input and output video headers. In this case, I only need the input video header since it is exactly the same as the output:

C++
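HRESULT CTextOverlay::SetMediaType(PIN_DIRECTION direction, const CMediaType *pmt)
{
   if(direction == PINDIR_INPUT)
   {
      // Keep a copy of the input video header; the output uses the same format
      VIDEOINFOHEADER* pvih = (VIDEOINFOHEADER*)pmt->pbFormat;
      m_videoInfo = *pvih;
      HRESULT hr = GetPixleFormat(m_videoInfo.bmiHeader.biBitCount, &m_pixFmt);
      if(FAILED(hr))
      {
             return hr;
      }

      // Rows of an uncompressed RGB bitmap are padded to a multiple of 4 bytes,
      // so round the row length up instead of using width * bytes per pixel
      const BITMAPINFOHEADER& bih = m_videoInfo.bmiHeader;
      m_stride = ((bih.biWidth * bih.biBitCount + 31) & ~31) / 8;
   }

   return S_OK;
}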
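
One detail worth spelling out is the stride. Rows in an uncompressed RGB DIB are padded to a multiple of four bytes, so the stride is not always the width multiplied by the bytes per pixel. A minimal sketch of the usual calculation, with a hypothetical helper name (DibStride is not part of the filter source):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte length of one row of an uncompressed DIB,
// rounded up to the next multiple of 4 bytes (32 bits).
inline std::int32_t DibStride(std::int32_t width, std::int32_t bitsPerPixel)
{
    assert(width > 0 && bitsPerPixel > 0);
    return ((width * bitsPerPixel + 31) & ~31) / 8;
}
```

For a 101 pixel wide, 24-bit frame this gives 304 bytes per row rather than 303. For 32-bit frames the padding never matters, because every row is already a multiple of four bytes.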

The filter accepts only RGB formats with 15, 16, 24, and 32 bits per pixel. Using the GDI+ Bitmap class, it is possible to create a bitmap object directly on top of the sample buffer without any buffer copy. After that, I create a Graphics object from that bitmap and call the Graphics::DrawString method to draw the user defined text on the bitmap:

C++
HRESULT CTextOverlay::Transform(IMediaSample *pSample)
{
       // Protect m_overlays against concurrent Add/Remove/Clear calls
       CAutoLock lock(m_pLock);

       BYTE* pBuffer = NULL;
       Status s = Ok;
       map<DWORD, Overlay*>::iterator it;

       HRESULT hr = pSample->GetPointer(&pBuffer);
       if(FAILED(hr))
       {
           return hr;
       }

       // Wrap the sample buffer in a GDI+ bitmap without copying it
       BITMAPINFOHEADER bih = m_videoInfo.bmiHeader;
       Bitmap bmp(bih.biWidth, bih.biHeight, m_stride, m_pixFmt, pBuffer);
       Graphics g(&bmp);
       
       for ( it = m_overlays.begin() ; it != m_overlays.end(); ++it )
       {
          Overlay* over = it->second;
          SolidBrush brush(over->color);
          Font font(FontFamily::GenericSerif(), over->fontSize);
          s = g.DrawString(over->text, -1, &font, over->pos, 
                           StringFormat::GenericDefault(), &brush);
          if(s != Ok)
          {
             // StringCchPrintfW (strsafe.h) truncates long text instead of overflowing msg
             WCHAR msg[256];
             StringCchPrintfW(msg, 256, L"Failed to draw text : %s\n", over->text);
             ::OutputDebugStringW(msg);
          }
       }

       return S_OK;
}

Using the ITextAdditor interface, you can add a text overlay with an ID, remove an overlay by its ID, or remove all overlays. Each overlay contains the text, the bounding rectangle, the color, and the font size:

C++
DECLARE_INTERFACE_(ITextAdditor, IUnknown)
{
   STDMETHOD(AddTextOverlay)(WCHAR* text, DWORD id, RECT position, 
             COLORREF color, float fontSize) PURE;
   STDMETHOD(Clear)(void) PURE;
   STDMETHOD(Remove)(DWORD id) PURE;
};

Overlay objects are stored in a map in a thread safe manner, so you can freely add and remove overlays during playback. Thread-safety in the DirectShow framework is achieved using critical sections and the CAutoLock class. A CAutoLock object is usually declared at the beginning of a method; it acquires the critical section in its constructor and releases it when the object goes out of scope at the end of the method.
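
The same locking pattern can be sketched in standard C++, with std::mutex and std::lock_guard standing in for CCritSec and CAutoLock. OverlayInfo and OverlayStore are hypothetical names, not the filter's actual classes. As a design assumption of this sketch, the map owns its entries through std::unique_ptr, so Remove and Clear free each overlay exactly once:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical overlay record; the filter's Overlay also stores a
// bounding rectangle and a font size.
struct OverlayInfo
{
    std::wstring  text;
    unsigned long color;
};

// Hypothetical container that guards its map the way the filter guards m_overlays.
class OverlayStore
{
public:
    void Add(unsigned long id, const std::wstring& text, unsigned long color)
    {
        std::lock_guard<std::mutex> lock(m_mutex);   // released at scope exit
        m_items[id] = std::unique_ptr<OverlayInfo>(new OverlayInfo{text, color});
    }

    // Returns true when an overlay with this id existed and was removed.
    bool Remove(unsigned long id)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_items.erase(id) == 1;
    }

    void Clear()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.clear();
    }

    std::size_t Count() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_items.size();
    }

private:
    mutable std::mutex m_mutex;
    std::map<unsigned long, std::unique_ptr<OverlayInfo>> m_items;
};
```

Adding an overlay under an ID that is already present replaces the old entry, and removing an unknown ID is a harmless no-op that returns false.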

JPEG / JPEG2000 Encoder

It took me a while to decide what type of video encoding to implement, and eventually I decided to make a simple intra-frame encoder - each video frame is encoded with no reference to the previous or next frame. This type of encoding is easier to implement than inter-frame encoding standards like MPEG4 or H264, but it produces a higher bitrate since much of the pixel information is redundant between neighboring frames. I also created a base class for other intra-frame encoder types, and you can easily swap the implementation by inheriting from CBaseCompressor and updating the factory method which creates the concrete implementations:

C++
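struct CBaseCompressor
{
     // Virtual destructor so a concrete encoder can be deleted through this base type
     virtual ~CBaseCompressor() {}

     // Prepares the encoder for frames described by the given bitmap header
     virtual HRESULT Init(BITMAPINFOHEADER* pBih) PURE;
 
     // Encodes one frame from pInput into pOutput and reports the encoded size
     virtual HRESULT Compress(BYTE* pInput, DWORD inputSize, BYTE* pOutput, 
     DWORD* outputSize) PURE;
 
     virtual HRESULT SetQuality(BYTE quality) PURE;
 
     // Returns the media subtype GUID and the biCompression value of the output
     virtual HRESULT GetMediaSubTypeAndCompression(GUID* mediaSubType,
     DWORD* compression) PURE;
};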

By default, the encoding standard is JPEG, and it is based on code I found here on CodeProject. Using the IJ2KEncoder::SetEncoderType method, you can change the implementation to the JPEG2000 encoding standard, which is based on the OpenJpeg library. Please note that once one of the filter's pins is connected, you cannot change the encoder implementation, so it is best to set the desired encoding algorithm right after filter creation.
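
The factory idea and the "no switching once connected" rule can be sketched in portable C++. All names below (EncoderKind, Compressor, CreateCompressor, EncoderFilter) are hypothetical simplifications; the real filter works with CBaseCompressor and HRESULT return values, and its factory is not shown in this article:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical names throughout; this only illustrates the pattern.
enum class EncoderKind { Jpeg, Jpeg2000 };

struct Compressor
{
    virtual ~Compressor() {}
    virtual std::string Name() const = 0;
};

struct JpegCompressor : Compressor
{
    std::string Name() const override { return "JPEG"; }
};

struct Jpeg2000Compressor : Compressor
{
    std::string Name() const override { return "JPEG2000"; }
};

// Factory method: the only place that knows the concrete encoder types.
inline std::unique_ptr<Compressor> CreateCompressor(EncoderKind kind)
{
    if (kind == EncoderKind::Jpeg2000)
        return std::unique_ptr<Compressor>(new Jpeg2000Compressor());
    return std::unique_ptr<Compressor>(new JpegCompressor());
}

// Stand-in for the filter: JPEG by default, and the encoder can only be
// swapped while no pin is connected.
class EncoderFilter
{
public:
    EncoderFilter() : m_connected(false), m_encoder(CreateCompressor(EncoderKind::Jpeg)) {}

    bool SetEncoderType(EncoderKind kind)
    {
        if (m_connected)
            return false;               // too late, a media type is already agreed
        m_encoder = CreateCompressor(kind);
        return true;
    }

    void Connect() { m_connected = true; }
    std::string EncoderName() const { return m_encoder->Name(); }

private:
    bool m_connected;
    std::unique_ptr<Compressor> m_encoder;
};
```

Adding another intra-frame codec then means adding one derived class and one branch in the factory, while the rest of the filter stays unchanged.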

JPEG2000 and Media Sub Types

For the JPEG compressor, DirectShow provides a built-in media sub type called MEDIASUBTYPE_MJPG, which is declared in the uuids.h file. For JPEG2000, I could not find any appropriate GUID, so I created one using the following macro definition:

C++
DEFINE_GUID( MEDIASUBTYPE_MJ2C, MAKEFOURCC('M', 'J', '2', 'C'),
             0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 
             0x00, 0x38, 0x9b, 0x71);

When using the BITMAPINFOHEADER structure for compressed images, you have to set the biCompression field to the same FOURCC, MAKEFOURCC('M', 'J', '2', 'C'). This way, the filter can connect to JPEG2000 decoders that recognize this FOURCC.
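
To see what the macro actually produces, here is a portable sketch of the FOURCC packing and of the GUID built from it. MakeFourCC, GuidParts, and SubtypeFromFourCC are hypothetical names that mirror MAKEFOURCC and the GUID layout; they are not Windows SDK APIs:

```cpp
#include <cassert>
#include <cstdint>

// Portable equivalent of the MAKEFOURCC macro: the first character
// ends up in the least significant byte.
constexpr std::uint32_t MakeFourCC(char a, char b, char c, char d)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Hypothetical mirror of the GUID layout, for illustration only.
struct GuidParts
{
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// FOURCC based subtypes share the same tail and differ only in the
// first 32 bits, which hold the FOURCC itself.
inline GuidParts SubtypeFromFourCC(std::uint32_t fourcc)
{
    GuidParts g = { fourcc, 0x0000, 0x0010,
                    { 0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71 } };
    return g;
}
```

For 'M', 'J', '2', 'C' the packed value is 0x43324A4D, so the subtype reads 43324A4D-0000-0010-8000-00AA00389B71. The same construction with 'M', 'J', 'P', 'G' gives 0x47504A4D.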

MJ2C means a JPEG2000 code stream, and it is actually a motion JPEG2000 definition where each frame consists of compressed image data. Another standard is J2K, and it is usually used for still image encoding and also contains headers.

Although JPEG2000 provides better compression ratios and better image quality, especially at lower bit rates, it is more CPU intensive than JPEG and hence less suitable for high resolution video. While researching JPEG2000 implementations, I found a project called CUJ2K - a JPEG2000 implementation based on CUDA, the GPU programming API developed by NVIDIA. Since this library was designed for still images stored on the hard drive, it takes the source and destination image paths on the command line. Using it with in-memory buffers would have required some additional work, so I decided to go with OpenJpeg; however, CUJ2K is worth looking at if you need better performance.

Filter Implementation

To implement a transform filter, you have to implement six methods:

  • Transform - receives input and output media samples.
  • CheckInputType - checks whether the input pin can connect to an upstream filter.
  • CheckTransform - checks whether a transformation is possible between input and output media types.
  • DecideBufferSize - sets the memory buffer size for the output media samples.
  • GetMediaType - returns the media type used to connect the output pin with the downstream filter.
  • SetMediaType - called when the input and output pins are successfully connected.
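
The two checks in this list, CheckInputType and CheckTransform, boil down to comparing format fields. A minimal sketch follows, using a hypothetical FrameFormat struct in place of BITMAPINFOHEADER. It assumes the encoder accepts only uncompressed 24 or 32 bit RGB input and produces MJ2C output with the same dimensions; the actual rules in the filter source may differ:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the BITMAPINFOHEADER fields
// that matter during connection negotiation.
struct FrameFormat
{
    std::int32_t  width;
    std::int32_t  height;
    std::uint16_t bitCount;
    std::uint32_t compression;   // 0 for uncompressed RGB, otherwise a FOURCC
};

const std::uint32_t kUncompressedRgb = 0;             // same value as BI_RGB
const std::uint32_t kFourccMJ2C      = 0x43324A4Du;   // MAKEFOURCC('M','J','2','C')

// CheckInputType-style test (assumed rule): uncompressed 24 or 32 bit RGB only.
inline bool AcceptsInput(const FrameFormat& in)
{
    return in.compression == kUncompressedRgb &&
           (in.bitCount == 24 || in.bitCount == 32) &&
           in.width > 0 && in.height != 0;
}

// CheckTransform-style test: the output must carry the encoder's FOURCC
// and keep the frame dimensions of the input.
inline bool CanTransform(const FrameFormat& in, const FrameFormat& out)
{
    return AcceptsInput(in) &&
           out.compression == kFourccMJ2C &&
           out.width == in.width && out.height == in.height;
}
```

In the real filter these checks return S_OK or VFW_E_TYPE_NOT_ACCEPTED instead of a bool, and they also compare the major type and subtype GUIDs.
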
C++
class CJ2kCompressor : public CTransformFilter, public IJ2KEncoder
{
public:
       DECLARE_IUNKNOWN;

       CJ2kCompressor(LPUNKNOWN pUnk, HRESULT *phr);
       virtual ~CJ2kCompressor(void);

       // CTransformFilter overrides
       virtual HRESULT Transform(IMediaSample * pIn, IMediaSample *pOut);
       virtual HRESULT CheckInputType(const CMediaType* mtIn);
       virtual HRESULT CheckTransform(const CMediaType* mtIn, 
                       const CMediaType* mtOut);
       virtual HRESULT DecideBufferSize(IMemAllocator * pAlloc, 
                       ALLOCATOR_PROPERTIES *pProp);
       virtual HRESULT GetMediaType(int iPosition, CMediaType *pMediaType);
       virtual HRESULT SetMediaType(PIN_DIRECTION direction, const CMediaType *pmt);

       static CUnknown * WINAPI CreateInstance(LPUNKNOWN pUnk, HRESULT *pHr);
       STDMETHODIMP NonDelegatingQueryInterface(REFIID riid, void ** ppv);

       // IJ2KEncoder
       STDMETHODIMP SetQuality(BYTE quality);
       STDMETHODIMP SetEncoderType(EncoderType encoderType);

private:
       VIDEOINFOHEADER m_VihIn;   
       VIDEOINFOHEADER m_VihOut; 
       CBaseCompressor* m_encoder;
};

The Transform method implementation is pretty straightforward: I get the buffer pointers from the input and output media samples and then pass them to the CBaseCompressor implementation. After that, I set the actual output media sample size and set the sync point to true since every frame is a reference frame:

C++
HRESULT CJ2kCompressor::Transform(IMediaSample* pIn, IMediaSample* pOut)
{
     HRESULT hr = S_OK;

     // Initialize everything so no indeterminate value can reach the encoder
     BYTE *pBufIn = NULL, *pBufOut = NULL;
     long sizeIn = 0;
     DWORD sizeOut = 0;

     hr = pIn->GetPointer(&pBufIn);
     if(FAILED(hr))
     {
        return hr;
     }

     sizeIn = pIn->GetActualDataLength();

     hr = pOut->GetPointer(&pBufOut);
     if(FAILED(hr))
     {
        return hr;
     }
 
     hr = m_encoder->Compress(pBufIn, sizeIn, pBufOut, &sizeOut);

     if(FAILED(hr))
     {
        return hr;
     }

     hr = pOut->SetActualDataLength(sizeOut);

     if(FAILED(hr))
     {
        return hr;
     }

     hr = pOut->SetSyncPoint(TRUE);

     return hr;
}
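
The contract between Transform and the compressor is simple: the codec writes into a buffer it does not own and reports how many bytes it produced. The sketch below shows that contract with a toy run-length encoder swapped in for a real JPEG codec. RleCompress is a hypothetical function, and treating outputSize as an in/out parameter (capacity on entry, bytes written on exit) is an assumption of this sketch, not something the CBaseCompressor interface guarantees:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy run-length encoder standing in for a real codec. On entry *outputSize
// holds the capacity of pOutput; on success it receives the number of bytes
// written. Returns false, leaving *outputSize untouched, when the output
// buffer is too small.
inline bool RleCompress(const std::uint8_t* pInput, std::size_t inputSize,
                        std::uint8_t* pOutput, std::size_t* outputSize)
{
    assert(pInput && pOutput && outputSize);
    const std::size_t capacity = *outputSize;
    std::size_t written = 0;
    std::size_t i = 0;

    while (i < inputSize)
    {
        // Measure the run of identical bytes starting at i (at most 255 long)
        const std::uint8_t value = pInput[i];
        std::size_t run = 1;
        while (i + run < inputSize && pInput[i + run] == value && run < 255)
            ++run;

        if (written + 2 > capacity)
            return false;               // would overrun the output sample

        pOutput[written++] = static_cast<std::uint8_t>(run);
        pOutput[written++] = value;
        i += run;
    }

    *outputSize = written;
    return true;
}
```

The size reported by the codec is what ends up in SetActualDataLength, so a downstream filter reads only the bytes that were actually produced, not the whole allocated buffer.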

Filter Registration

Since this filter is a video encoder, it should be registered in the video compressor filters category, and this is done using the IFilterMapper2 interface:

C++
STDAPI RegisterFilters( BOOL bRegister )
{
   HRESULT hr = NOERROR;
   WCHAR achFileName[MAX_PATH];
   ASSERT(g_hInst != 0);

   // Query the wide-character module path directly; no ANSI round trip is needed
   if( 0 == GetModuleFileNameW(g_hInst, achFileName, NUMELMS(achFileName))) 
   {
          return AmHresultFromWin32(GetLastError());
   }

   hr = CoInitialize(0);
   if(bRegister)
   {
          hr = AMovieSetupRegisterServer(CLSID_Jpeg2000Encoder, 
                  J2K_FILTER_NAME, achFileName, L"Both", L"InprocServer32");
   }

   if( SUCCEEDED(hr) )
   {
      IFilterMapper2 *fm = 0;
      hr = CoCreateInstance( CLSID_FilterMapper2, NULL, 
              CLSCTX_INPROC_SERVER, IID_IFilterMapper2, (void **)&fm);

      if( SUCCEEDED(hr) )
      {
         if(bRegister)
         {
           REGFILTER2 rf2;
           rf2.dwVersion = 1;
           rf2.dwMerit = MERIT_DO_NOT_USE;
           rf2.cPins = 2;
           rf2.rgPins = psudPins;
           // No device moniker is needed here, so pass NULL; a non-NULL out pointer
           // would hand back a moniker that the caller has to release
           hr = fm->RegisterFilter(CLSID_Jpeg2000Encoder, J2K_FILTER_NAME, 
                NULL, &CLSID_VideoCompressorCategory, NULL, &rf2);
         }
         else
         {
            hr = fm->UnregisterFilter(&CLSID_VideoCompressorCategory, 0, 
                                      CLSID_Jpeg2000Encoder);
         }
      }

      if(fm)
         fm->Release();
   }

   if( SUCCEEDED(hr) && !bRegister )
   {
          hr = AMovieSetupUnregisterServer( CLSID_Jpeg2000Encoder );
   }

   CoFreeUnusedLibraries();
   CoUninitialize();
   return hr;
}

STDAPI DllRegisterServer() 
{
       return RegisterFilters(TRUE);
}

STDAPI DllUnregisterServer()
{
       return RegisterFilters(FALSE);
}

References

  1. Programming DirectShow for Digital Video and TV
  2. Writing transform filters
  3. OpenJpeg library
  4. Tony Lin JPEG codec

History

  • 23.2.2011
    • Initial version
  • 13.3.2011
    • Changed source to use smart pointers
    • Fixed SetQuality implementation

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Israel
