First of all, a "sequential order" makes no sense here; the order is just a matter of practical and tactical considerations. Besides, we have no idea of your current knowledge level and skills. You can learn all of this in parallel, in chunks, taking some dependencies into account. But learning those dependencies is itself part of the general learning.
You need to learn the concepts of media file containers and codecs:
http://en.wikipedia.org/wiki/Container_format_%28digital%29,
http://en.wikipedia.org/wiki/Comparison_of_container_formats,
http://en.wikipedia.org/wiki/Codec,
http://en.wikipedia.org/wiki/List_of_codecs.
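To make the container concept concrete, here is a minimal sketch (in Python, purely for illustration; the same logic ports directly to C#) that guesses the container format from the first bytes of a file. The function name `sniff_container` and the returned labels are my own assumptions; only the signature bytes come from the formats themselves, and real players check much more than this.

```python
def sniff_container(header: bytes) -> str:
    """Guess a container format from the first bytes of a file.

    Hypothetical helper for illustration: it only looks at well-known
    signature bytes and knows nothing about the codecs inside.
    """
    # RIFF files carry a four-byte form type at offset 8 (WAVE, AVI, ...).
    if len(header) >= 12 and header[:4] == b"RIFF":
        form = header[8:12]
        if form == b"WAVE":
            return "wav"
        if form == b"AVI ":
            return "avi"
        return "riff (unknown form)"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:4] == b"fLaC":
        return "flac"
    # Matroska and WebM both start with the EBML header ID.
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "matroska/webm"
    # MP4-family files normally start with a box whose type, at offset 4, is "ftyp".
    if len(header) >= 8 and header[4:8] == b"ftyp":
        return "mp4"
    return "unknown"
```

Note that the result says nothing about the codec: an "avi" or "matroska/webm" file can hold streams compressed with many different codecs, which is exactly why the two concepts have to be learned separately.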
You can take different approaches to implementing codecs and supporting different containers. You can learn how codecs are registered in the system of your choice. Note that you did not even say which system you want to target (strictly speaking, C# does not necessarily mean Windows). As an alternative, you can implement a system of your own that does not depend on system codec registration, or that implements its own registration or plug-in architecture. This is the way taken by such products as FFmpeg, libavcodec and VLC; see, for example:
http://en.wikipedia.org/wiki/Ffmpeg,
http://ffmpeg.org/,
http://en.wikipedia.org/wiki/Libavcodec,
http://libav.org/,
http://en.wikipedia.org/wiki/VLC_media_player,
https://videolan.org/vlc/.
The products referenced above are open source, so you can look at their code to see what's involved. That can also give you a good idea of what you need to learn.
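The "own registration or plug-in architecture" idea can be sketched in a few lines. This is not how FFmpeg or VLC actually do it; it is a minimal illustration in Python, and every name in it (`register_decoder`, `find_decoder`, the codec id `"pcm_s16le"` used as a key) is an assumption made up for the example.

```python
from typing import Callable, Dict

# A decoder takes one compressed packet and returns raw samples.
Decoder = Callable[[bytes], bytes]

# The player's private registry: codec id -> decoder function.
_DECODERS: Dict[str, Decoder] = {}


def register_decoder(codec_id: str) -> Callable[[Decoder], Decoder]:
    """Decorator that adds a decoder to the registry under codec_id."""
    def wrap(fn: Decoder) -> Decoder:
        _DECODERS[codec_id] = fn
        return fn
    return wrap


def find_decoder(codec_id: str) -> Decoder:
    """Look up a decoder; fail clearly if no plug-in provides it."""
    try:
        return _DECODERS[codec_id]
    except KeyError:
        raise LookupError(f"no decoder registered for {codec_id!r}") from None


@register_decoder("pcm_s16le")
def decode_pcm(packet: bytes) -> bytes:
    # Uncompressed PCM needs no decoding: the packet already holds the samples.
    return packet
```

The point of the design is that the player core only calls `find_decoder` with the codec id read from the container; new codecs are added by registering more functions, without touching the core and without relying on anything installed in the OS.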
So, once you know the basics mentioned above, you need to pick some containers and publicly available compression standards, learn them, and try to implement them at least at the parsing level. By that time, you should have a pretty good idea of how hard or easy this work is.
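As an idea of what "the parsing level" means for one of the simplest containers, here is a sketch that walks the chunks of a RIFF/WAVE file and reads the basic fields of its "fmt " chunk. It is an illustration under assumptions, not a complete parser: the function names are hypothetical, the whole file is assumed to be in memory, and extended format chunks are ignored.

```python
import struct
from typing import Dict, Iterator, Tuple


def iter_riff_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (chunk_id, chunk_body) for each top-level chunk of a RIFF file."""
    if len(data) < 12 or data[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    # The size field counts everything after the first 8 bytes.
    riff_size = struct.unpack_from("<I", data, 4)[0]
    end = min(len(data), 8 + riff_size)
    pos = 12  # skip "RIFF", the size, and the form type (e.g. "WAVE")
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        body_start = pos + 8
        yield chunk_id, data[body_start:body_start + size]
        # Chunk bodies are padded to an even number of bytes.
        pos = body_start + size + (size & 1)


def parse_wav_format(data: bytes) -> Dict[str, int]:
    """Return the basic audio parameters stored in the 'fmt ' chunk."""
    for chunk_id, body in iter_riff_chunks(data):
        if chunk_id == b"fmt " and len(body) >= 16:
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", body, 0
            )
            return {
                "format_tag": tag,       # 1 means uncompressed PCM
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
    raise ValueError("no 'fmt ' chunk found")
```

Even this tiny case shows the typical concerns: byte order, size fields, padding, and truncated input. Containers such as MP4 or Matroska follow the same pattern of nested, size-prefixed elements, only with far more element types.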
In parallel, you need to learn the audio and video output subsystems of the OS you want to target. For Windows, this also includes DirectX. You may need to learn WPF if you want to use it for the UI on Windows. You need to decide which OS APIs to use and develop some working prototypes to make sure you get enough performance and reliability. On Windows, you would also need to learn MMSystem and, in particular, multimedia timers.
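One thing such a prototype should check is timing. The sketch below is not tied to any OS API; it uses Python's `time.monotonic` and `time.sleep` as stand-ins for whatever timer the target system offers (on Windows, a multimedia timer). The names `frame_deadline` and `play` and the `present` callback are hypothetical. It shows one common approach: compute every frame's deadline from the start time instead of sleeping a fixed interval, so that rendering time does not accumulate as drift.

```python
import time
from typing import Callable


def frame_deadline(start: float, frame_index: int, fps: float) -> float:
    """Time at which frame number frame_index should be shown."""
    return start + frame_index / fps


def play(
    frame_count: int,
    fps: float,
    present: Callable[[int], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Present frames at a steady rate, measured against a monotonic clock."""
    start = clock()
    for i in range(frame_count):
        # Wait only for the time remaining until this frame's deadline;
        # if rendering ran late, do not sleep at all.
        delay = frame_deadline(start, i, fps) - clock()
        if delay > 0:
            sleep(delay)
        present(i)
```

A prototype like this, with `present` replaced by real rendering code, quickly shows whether the chosen API and timer resolution are good enough; plain sleep calls are often too coarse, which is why the multimedia timers are worth learning.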
And I did not even count general-purpose programming topics, such as threading and UI development with some UI library.
On top of that, keep in mind that players can also play streaming media from a network. So, eventually you will need streaming protocols and a good deal of networking:
http://en.wikipedia.org/wiki/Media_streaming.
This is a very preliminary and probably incomplete plan. You will probably need to add a lot more detail to it, which is only possible once you have done a big part of the learning. Well, as always…
Good luck,
—SA