A solution to this problem is not going to be pretty, even if we pretend that the reference counting solution that you suggested is going to work as expected (you would need to use something similar to
InterlockedIncrement[
^] to avoid additional locking around the code that increments and decrements the
waiting
variable). The problem is that even if you are certain that no other thread is holding the lock, and also that nobody is waiting on it, you cannot be sure that there are no threads that are about to grab that lock. Additionally, other threads would need to check if the lock has been destroyed before acquiring it - an operation that needs to be synchronized, too.
Generally, I avoid creating and destroying locks dynamically. I create all locks that I need in a constructor, and destroy them in the destructor. For more dynamic situations requiring synchronization I use interlocked instructions to avoid locking altogether.