|
|
|
|
|
This is now solved.
First - it was a core switching issue. The code to confine the thread to a single core has to be called in advance - I think possibly the thread has to sleep first before the system can assign it to a new core; at any rate, it doesn't work to set the affinity immediately before every time check and release it again afterwards. There was also a bug in my code that checked whether the time was going backwards: it forgot to check that the previous time was recorded for the same thread - dTimeWas should have been a thread local variable.
But - even after fixing all that, and tied to a single core, the timing was still inaccurate. It was monotonic but inaccurately timed, the reported time in ms sometimes passed more quickly than real time and sometimes passed more slowly.
I could check this by making a real time audio recording of the notes played by my app - which, according to the high performance counter, were played at equally spaced 100 ms intervals - but the actual recorded times were offset by as much as 30 ms from the previous note. One recording had an 80 ms note followed by a 110 ms note, when the high precision timer said they were all exactly 100 ms to within sub-millisecond precision.
Finally I fixed it by looking up the interrupt timer, which is available to every user mode process as a volatile area of shared memory, in a structure called KUSER_SHARED_DATA at the same location in the address space of every process.
This timer is highly accurate - not just sub-millisecond but well sub-microsecond, on my laptop anyway. It also records the time correctly. And there is almost no overhead involved in reading it, as it is only a memory lookup, just like accessing any other area of memory.
Details here:
QueryPerformanceCounter-inaccurate-timing - SOLVED[^]
|
|
|
|
|
Hi there,
I'm using the high performance counter for timing musical notes - and have run into a problem. It doesn't time them accurately. What should be a regular rhythm is irregular.
When I debug to find out what is happening, the numbers returned by QueryPerformanceCounter(..) sometimes change direction - in the code below, if I put a breakpoint on
if (ddwt < ddwt_was)
    ddwt = ddwt;
then it gets triggered.
As you can see, I have requested that the thread run on core 1 of a dual core machine. I'm testing it on a laptop with two processors and two cores in each processor.
Is SetThreadAffinityMask(..) perhaps not enough to ensure that the thread runs on a single core? Or is there some other reason for it?
Is there any way to fix it so I can get sub-millisecond timing working accurately on any computer?
Thanks for your help.
LARGE_INTEGER HPT_PerformanceCount, HPT_PerformanceFrequency;
int found_HighPerformanceTimer_capabilities;
BOOL bHasHighPerformanceTimer;

double HighPerformanceTimer(void)
{
    /* Pin the thread to one core while reading the counter */
    DWORD threadAffMask = SetThreadAffinityMask(GetCurrentThread(), 1);
    DWORD threadAffMaskNew = 0;
    if (found_HighPerformanceTimer_capabilities == 0)
    {
        bHasHighPerformanceTimer = QueryPerformanceFrequency(&HPT_PerformanceFrequency);
        found_HighPerformanceTimer_capabilities = 1;
    }
    if (HPT_PerformanceFrequency.QuadPart == 0)
    {
        /* No high performance counter - fall back to timeGetTime() */
        SetThreadAffinityMask(GetCurrentThread(), threadAffMask);
        return timeGetTime();
    }
    QueryPerformanceCounter(&HPT_PerformanceCount);
    threadAffMaskNew = SetThreadAffinityMask(GetCurrentThread(), threadAffMask);
    {
        __int64 count = (__int64)HPT_PerformanceCount.QuadPart;
        __int64 freq = (__int64)HPT_PerformanceFrequency.QuadPart;
        double dcount = (double)count;
        double dfreq = (double)freq;
        double ddwt = dcount * 1000.0 / dfreq; /* counter converted to ms */
#ifdef _DEBUG
        /* Debug check: does the reported time ever run backwards? */
        static double ddwt_was;
        if (ddwt_was != 0)
            if (ddwt < ddwt_was)
                ddwt = ddwt; /* breakpoint here fires when time reverses */
        ddwt_was = ddwt;
#endif
        return ddwt;
    }
}
|
|
|
|
|
|
Just a hunch, but I believe it might be processor related.
Out-of-Order eXecution (OOX)
Code entering the pipeline isn't always executed sequentially - most modern CPUs do this.
"It's true that hard work never killed anyone. But I figure, why take the chance." - Ronald Reagan
That's what machines are for.
Got a problem?
Sleep on it.
|
|
|
|
|
Hi, sorry I didn't reply - I've got involved in a long discussion with some friends on Facebook and have been doing many tests, but nothing is clear yet; it is quite puzzling.
But - some things I have found out.
First - you sometimes get really big errors; the largest I got was as much as 15 ms, which seems to rule out things like OOX?
Also - it seems to be something to do with which cores the thread runs on - and it seems that SetThreadAffinityMask is not quite doing what I expect it to do.
I've tested:
RDTSC
QueryPerformanceCounter
GetSystemTimeAsFileTime
timeGetTime
GetTickCount
All of them run backwards sometimes; typically you get glitches like that every second or so on the higher resolution timers (HPC and RDTSC), and somewhat less often, but still frequently, on the other timers.
If I use SetThreadAffinityMask before every single time check, that doesn't fix the issue.
I think that maybe after you use SetThreadAffinityMask(..) then the thread or process has to sleep before the OS assigns it to the desired core.
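A side note on SetThreadAffinityMask while on the topic: its second argument is a bit mask rather than a core index, so the value 1 in the code above actually selects the first logical processor (processor 0). A tiny sketch of the mask arithmetic, in C:

```c
/* Affinity masks are bit masks: bit n of the mask selects logical
   processor n. So SetThreadAffinityMask(thread, 1) pins the thread
   to processor 0, and processor 1 would need the mask value 2. */
static unsigned long long core_mask(int processor)
{
    return 1ULL << processor;
}
```

To allow processors 0 and 1 together you would OR the masks: core_mask(0) | core_mask(1) == 3.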
SOLUTION ATTEMPT - TIE RHYTHM THREAD TO A SINGLE CORE
So - I set the thread affinity for the entire rhythm playing thread to core 1.
When I do that then again all the timers are monotonic.
So it rather looks as if it is a cpu core issue.
BTW my laptop has Turbo Boost, so I wondered if it was something to do with that. I ran the Turbo Boost monitor and I can see the clock rate going up and down a lot - from 2.7 to 2.8 GHz, say. I don't know if it is related.
Anyway - this still doesn't seem to have solved the issue.
NOTES ARE STILL NOT PLAYED TO SUB MILLISECOND PRECISION - ERRORS OF UP TO 20 MS
I tested this by recording to audio as the notes are played.
One test I did just now has notes that, according to my program and the Windows timers, were sent at 100 ms intervals with a maximum error of 27.576 µs (this is with the program running at realtime base priority, and with the rhythm playing thread at time critical on top of that, so nothing at all should be interrupting it).
But it was audibly well out - and when I look at the audio recording, I had for instance one note of 80 ms immediately followed by another note of 110 ms: a variation in the timing of about 140% from one note to the next, for 100 ms notes.
That's using QueryPerformanceCounter localized to a single core - it is monotonic now, but doesn't seem to be accurate.
It might alternatively be some delay in the synth, but it seems unlikely that any synth could be so badly programmed that it causes delays of as much as 30 ms - though I can do more tests by trying different synths.
It is unlikely to be the midi relaying causing these delays, as I am using a virtual cable to do the relaying, and I can relay about 10 midi notes per ms around a loopback and back to my program again.
Also, I don't see how my app could be introducing these sorts of errors - I have simplified the code so that the note playing path is very lightweight.
Basically it calculates the notes to play well before they are needed, then sits in a busy wait at realtime priority for a few ms until the note is ready to be sent out via midi out. So there are just a few lines of code at the moment the note is played: it exits the busy wait, checks the time (which it reports as the time the note was sent), and then, in the next line of code, sends it using midiOutShortMsg.
HARDWARE CLOCKS
Anyway, I have also researched and found that there are actually two hardware clocks on the processor: the real time clock (over 1 MHz, so it would achieve microsecond precision) and the HPET timer. Windows by default doesn't seem to use either of them for its timers, which might be the reason for the performance issues.
You can force Windows to use the HPET timer by using
bcdedit /set useplatformclock true and rebooting (and also setting HPET in the BIOS).
(Some people find this causes other performance issues, so I'm not sure if it is a good solution.)
But I don't seem to have any way to enable HPET in the BIOS on this computer, so it might not have it.
I haven't tested this yet to see if it makes a difference.
OTHER WAYS TO ACCESS THE RTC
I think you can access the real time clock in kernel mode when writing a driver. I'm not sure, but KeQueryPerformanceCounter might access the RTC (or the HPET). Windows CE has OEMGetRealTime, but that doesn't seem to be available in Windows 7.
In Windows 8 there is GetSystemTimePreciseAsFileTime, which, at least according to its description, seems to use one of the hardware timers, because it claims less than microsecond precision - it could be using the HPET, but it could also be using the RTC, which would (just) give sufficient precision too.
I only have Windows 8 within a virtual machine, however (it seems unlikely to work well there), and I haven't tested that.
THE WAY AHEAD
If I can't solve this, the only solution I can think of is to create my own sample player - and that would work, because with audio streaming you can simply count the number of samples played to work out the time, which is guaranteed to be accurate so long as the audio plays without breaking up. It is for a metronome, so a reasonable sample player for that purpose - playing non-melodic percussion - doesn't seem too tricky to do.
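The sample-counting idea works because the arithmetic is exact: with streamed audio, elapsed time is just the number of samples played divided by the sample rate, so the clock can never drift as long as playback doesn't glitch. A minimal sketch in C (the 44100 Hz rate in the note below is an assumption; any fixed rate works):

```c
/* Elapsed time in ms implied by a count of audio samples played at a
   fixed sample rate - exact as long as playback never breaks up. */
static double samples_to_ms(long long samples_played, double sample_rate_hz)
{
    return samples_played * 1000.0 / sample_rate_hz;
}

/* Number of samples a note of the given length should occupy. */
static long long ms_to_samples(double ms, double sample_rate_hz)
{
    return (long long)(ms * sample_rate_hz / 1000.0 + 0.5);
}
```

At 44100 Hz, a 100 ms metronome click is exactly 4410 samples.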
Then, presumably, it will get cleared up for anyone running it on Windows 8 on native hardware with GetSystemTimePreciseAsFileTime(), though I haven't tested that yet.
ANY THOUGHTS ANYONE?
So anyway that's where it is at now. Once more really interested to know if anyone has any other thoughts about this or other avenues to explore.
There are so many tests you can do - and I can't seem to find any detailed online account of this, just some intriguing but incomplete forum discussions.
Thanks!
|
|
|
|
|
If you're seeing 15 ms then I'm pretty sure that's the thread 'quantum' of Windows scheduling. It used to be 10 ms on Windows NT 4.0.
I've encountered this problem too. The best you can do is not bother with the high perf counters, as they're still at the mercy of the scheduler. All I do is take the average of about 3-5 readings, but the trick seems to be to put the thread to sleep until the next exact second (1000 - current_millisecond) and repeat, with timing durations less than the scheduling quantum - which means it will take 3-4 seconds to get your average. Put the thread into realtime priority, which should reduce preemption (if you're confident that your timing code is free of bugs).
sleep until msec:000, timing loop < 15ms
Repeat as many times as you need to; the CPU scaling of some processors creates problems with the initial result, so be prepared to discard the first and second results, or ramp the CPU up to 100%.
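The averaging scheme suggested here might be sketched as below - gather several readings, discard the first couple (the ones distorted by CPU scaling ramp-up), and average the rest. The function only does the arithmetic; gathering the readings on second boundaries, as described above, is left to the caller, and the discard count of 2 follows the advice in this post:

```c
/* Average an array of timing readings, skipping the first `discard`
   of them - CPU frequency scaling tends to distort the early ones. */
static double average_readings(const double *readings, int n, int discard)
{
    double sum = 0.0;
    int kept = 0;
    int i;
    for (i = discard; i < n; i++) {
        sum += readings[i];
        kept++;
    }
    return kept > 0 ? sum / kept : 0.0;
}
```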
If it's any consolation, it's the same problem with any pre-emptive OS - but it is also disappointing that multi-GHz machines can't give you a reliable reading from within a modern OS. Linux calculates its timing before it boots.
"It's true that hard work never killed anyone. But I figure, why take the chance." - Ronald Reagan
That's what machines are for.
Got a problem?
Sleep on it.
modified 19-Mar-13 7:12am.
|
|
|
|
|
Thanks - that does explain one glitch I got with GetTickCount and timeGetTime.
With the HPC though, after dealing with the core changing issues, it wasn't doing any sudden big jumps - just measuring the time wrong.
I'm doing this to measure musical notes. But I've found a solution at last.
The key was the discovery of yet another MS Windows timing routine, KeQueryUnbiasedInterruptTime, which looked like the type of precise timing I need but is unfortunately only accessible in kernel mode:
KeQueryUnbiasedInterruptTime[^]
However, I then found that there is a little known structure called KUSER_SHARED_DATA, which is mapped into the address space of every user mode process at the address 0x7FFE0000 - a system wide shared memory location. It is volatile, and is continuously updated with the kernel interrupt time. This means that to check the kernel interrupt time, all you need to do is look at the correct address in your own process address space.
This is apparently driven by a hardware timer, and not affected by changes in the cycle rate of the processor cores. It might be using either the RTC or the HPET - I don't know which. And since it is just a memory lookup, there is far less overhead involved than with any of the other timers used in Windows.
Combining this with the code I already have - which sleeps until slightly short of the desired time, then busy waits in a time checking loop at realtime / time critical priority until the time is reached - you get sub-microsecond, "sample perfect" timing of midi in Windows.
Here is the code:
typedef struct _KSYSTEM_TIME
{
    UINT32 LowPart;
    INT32  High1Time;
    INT32  High2Time;
} KSYSTEM_TIME, *PKSYSTEM_TIME;

typedef struct _KUSER_SHARED_DATA
{
    volatile ULONG        TickCountLow;
    UINT32                TickCountMultiplier;
    volatile KSYSTEM_TIME InterruptTime;
    volatile KSYSTEM_TIME SystemTime;
    volatile KSYSTEM_TIME TimeZoneBias;
} KUSER_SHARED_DATA, *PKUSER_SHARED_DATA;

#define MM_SHARED_USER_DATA_VA 0x7FFE0000
#define USER_SHARED_DATA ((KUSER_SHARED_DATA * const)MM_SHARED_USER_DATA_VA)

double dInterruptTimer(void)
{
    union
    {
        KSYSTEM_TIME SysTime;
        __int64      CurrTime;
    } ts;
    /* Read the interrupt time straight out of shared memory */
    ts.SysTime.High1Time = USER_SHARED_DATA->InterruptTime.High1Time;
    ts.SysTime.LowPart   = USER_SHARED_DATA->InterruptTime.LowPart;
    return ts.CurrTime / (double)10000; /* 100 ns units -> ms */
}
I got this by modifying source code available here
_glue.c[^]
which is part of Microsoft's "Invisible Computing" real time operating system, also known as MMLite: Microsoft Invisible Computing[^]
So it is documented Microsoft code, not just a hack using undocumented structures. It requires Windows NT or later.
This is an actual recording made in real time using my program to play the notes via midi on the Microsoft GS Wavetable synth:
This is a test of the use of the Windows interrupt timer to time notes in Bounce Metronome.[^]
Here is a screen shot of the recording where you can see the sample precise alignment of the notes. You can tell it is sample precise because all the details of the waveform are exactly the same - all the small irregularities, which would look slightly different if you moved the waveform by a sample or so and then looked at it zoomed out like this.
screen shot of recording with sample precise timing[^]
This also seems like a great way to do performance testing, without all the averaging we are used to, and with almost no overhead.
|
|
|
|
|
Glad you've finally solved it.
I was going to suggest the multimedia timers as another option. Kernel mode stuff is a bit beyond me, to be quite honest.
I'd still like to find out why the timing routines sometimes appear to go backwards. It's not just Windows; I've had similar problems with the Java VM.
Thanks for the invisible computing links, I'm always interested in Microsoft Research projects with accompanying source code.
"It's true that hard work never killed anyone. But I figure, why take the chance." - Ronald Reagan
That's what machines are for.
Got a problem?
Sleep on it.
|
|
|
|
|
Yes, I've tried the multimedia timers. The problem is that on Windows they aren't quite good enough for musicians, frankly - they don't quite let you achieve the 1 ms precision that musicians require. There is a Microsoft article about that here too:
Guidelines For Providing Multimedia Timer Support[^]
So, anyway, it turns out it is actually only partly solved. I had the HPET enabled as well - the HPET described in that article, which I think is a normal feature of modern chips.
To enable it you open up an admin level command prompt and type:
bcdedit /set useplatformclock true
then you need to reboot.
This requires Vista or later.
You know it is enabled if QueryPerformanceFrequency gives you a frequency in the range of 14+ MHz. You might possibly need to enable it in the BIOS as well.
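That check can be wrapped up as a small heuristic. The 14 MHz threshold below reflects the HPET's 14.318 MHz base rate mentioned here; the 3.579545 MHz figure in the comment is the ACPI PM timer rate that QueryPerformanceFrequency often reports otherwise - treat the exact cut-off as an assumption:

```c
/* Heuristic: QueryPerformanceFrequency reports roughly 14.318 MHz when
   Windows is using the HPET as its performance counter source, versus
   e.g. 3579545 Hz when it is using the ACPI PM timer. */
static int looks_like_hpet(long long qpf_hz)
{
    return qpf_hz >= 14000000LL;
}
```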
To disable you do the same and type
bcdedit /deletevalue useplatformclock
and reboot.
It's disabled by default, which seems surprising since it improves multimedia performance - or should. But from the forums it seems some users have issues with it enabled, including reduced performance in some games, mouse pointer "ghosting", and occasional freezes. So Microsoft might have decided it is more trouble than it is worth to enable by default.
So - with HPET enabled and using the interrupt timer I get this perfect timing.
Actually, it turns out the interrupt timer changes only about once every ms on my computer and stays steady between updates. So I suppose on some other computers it might change even less often.
So although it is a very fast lookup, you probably wouldn't use it for code performance testing. I'm planning to use it most of the time, but to use QueryPerformanceCounter(..) for the last couple of ms of the loop, just to time fractional ms increments if required (or for longer, if it turns out to have larger increments than 1 ms).
Also, a bug fix for the previous code - it seems you have to do this every time you check it:
for (;;)
{
    ts.SysTime.High1Time = USER_SHARED_DATA->InterruptTime.High1Time;
    ts.SysTime.LowPart   = USER_SHARED_DATA->InterruptTime.LowPart;
    /* Only accept the read once High1Time matches High2Time -
       otherwise the kernel updated the value mid-read, so retry. */
    if (ts.SysTime.High1Time == USER_SHARED_DATA->InterruptTime.High2Time)
        break;
    ntests++; /* counts how often a torn read forced a retry */
}
On the backwards timers, there is one thing to be careful about - this caught me out. If you have a multi-threaded app with different threads reading the time simultaneously, the time might seem to run backwards just because you are comparing the previously recorded time of one thread with the current time of the current thread. I fixed that by using thread local variables to hold the previously recorded time.
It turns out that that was the reason I thought the time was going backwards - not multiple cores or anything, just a bug in the code for testing whether the time was monotonic in a multi-threaded app.
Sorry about that!
Anyway, I still have these timing issues for users who run my program on a computer with the OS set not to use the HPET - as it is by default. So what I'd really like to know is: is there any way to access the HPET when Windows isn't using it for timing itself? Is there some assembly language way, for instance, to read its registers even though Windows itself isn't using them?
I don't really think there is, or surely someone would have posted a way to do it by now and everyone would be doing it - but you never know.
|
|
|
|
|
Oh - I'm still getting those reversed times:
static Spec_Thread double ddwt_was; /* Spec_Thread: thread local storage */
if (ddwt_was != 0)
    if (ddwt < ddwt_was)
        ddwt = ddwt; /* breakpoint here still fires */
ddwt_was = ddwt;

ddwt     22589372.894629
ddwt_was 22589372.895075
And this is fixed using:
{
    DWORD threadAffMask = SetThreadAffinityMask(GetCurrentThread(), 1);
    QueryPerformanceCounter(&HPT_PerformanceCount);
    SetThreadAffinityMask(GetCurrentThread(), threadAffMask);
}
So it seems that code actually does work - you don't need to pin the entire thread, just set the affinity around every call, as in the example code.
Sorry for the confusion.
So it seems, here anyway, that you can get time reversals if you let the time be measured on any core.
You avoid them if you access the interrupt timer via KUSER_SHARED_DATA, though on my computer that doesn't have the same resolution as QueryPerformanceCounter (it only changes every 1 ms - though it seems likely that it reports the exact time at the moment it changes, so you could time an exact ms by just waiting in a busy loop for it to change).
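That "wait for it to change" trick can be sketched abstractly: poll the coarse timer until its value ticks over, and treat that instant as an exact edge. Here I simulate the ~1 ms interrupt time with a counter that advances every 1000 reads, purely to show the loop structure - a real version would read USER_SHARED_DATA->InterruptTime instead:

```c
/* Simulated coarse timer: advances one tick per 1000 reads, standing
   in for the interrupt time that updates only about once per ms. */
static long long sim_polls = 0;
static long long read_coarse_timer(void)
{
    return sim_polls++ / 1000;
}

/* Busy wait until the coarse timer ticks over; the returned value is
   then known to be exact at the moment of return (a timer edge). */
static long long wait_for_edge(void)
{
    long long start = read_coarse_timer();
    long long now;
    do {
        now = read_coarse_timer();
    } while (now == start);
    return now;
}
```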
But to get the clocks running at a constant rate you need to force the OS to use the HPET. There doesn't seem to be any way for a program to access the HPET if Windows isn't using it - as far as I can see, anyway.
HPET is guaranteed, on Intel machines anyway, to be accurate to 0.05% over 1 ms:
http://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/software-developers-hpet-spec-1-0a.pdf[^]
|
|
|
|
|
I don't know if you're aware that the HPET implementations on some motherboards are flaky - in particular AMD chipsets. You may also have problems with boards where SMI incorrectly generates too many interrupts.
Take a look at the foot of the Wiki page for High Performance Event Timer
Linus Torvalds' rant:
HPET grumbles
windows/hardware/gg463347.aspx
Lastly, some source code you might wish to take a look at ($0.00)
zip
I did wonder if you might want to look at the source code for VirtualBox, which has options to enable HPET features for VMs.
--* OFF TOPIC & Previously touched upon *--
There have been problems with multi-core timer issues, and apparently there is a serialized CPU instruction to stop out-of-order execution affecting timer readings: RDTSC (not serialized), RDTSCP (newer, serialized).
"It's true that hard work never killed anyone. But I figure, why take the chance." - Ronald Reagan
That's what machines are for.
Got a problem?
Sleep on it.
|
|
|
|
|
Oh right - maybe that is why some users have issues enabling HPET while others find it works just fine for them.
When it comes to musicians - if this is indeed the best way to achieve sub-millisecond precision on Windows with truly exact timing - then when buying new computers they might well choose one that can have the HPET enabled without causing problems. Musicians often exchange notes and ask each other which makes of computer work best in their experience for music making - normally ones that score well for silence, latency and DPC. This seems like another thing to add to the list.
It is hard to overstress how important it is for musicians to be able to play midi notes and record them with sub-millisecond precision - and at the very least, not with multi-millisecond errors.
I just checked it, and I think from this post that RDTSCP is just like RDTSC except that it also tells you the processor id.
Example assembly code here The Terran Comedy[^]
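A sketch of reading the counter both ways, as I understand the difference: RDTSCP returns the processor id through its aux output and also waits for earlier instructions to finish, so the reading can't be hoisted ahead of the code it is meant to time. The intrinsics below are the GCC/Clang x86 ones from x86intrin.h; on non-x86 builds the sketch falls back to a dummy monotonic counter just so it compiles:

```c
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Plain RDTSC: fast, but the CPU may reorder it. */
static unsigned long long read_tsc(void)
{
    return __rdtsc();
}

/* RDTSCP: partially serialized, and reports the core it ran on. */
static unsigned long long read_tscp(unsigned int *core_id)
{
    return __rdtscp(core_id);
}
#else
/* Fallback for non-x86 builds: a dummy monotonic counter. */
static unsigned long long fake_tsc = 0;
static unsigned long long read_tsc(void) { return ++fake_tsc; }
static unsigned long long read_tscp(unsigned int *core_id)
{
    *core_id = 0;
    return ++fake_tsc;
}
#endif
```

Comparing core_id across two readings also tells you whether the thread migrated between them.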
But so long as you don't mind forcing your thread onto core 1 whenever it checks the time, SetThreadAffinityMask seems the way to go. I was wrong earlier when I said the thread seemed to need to sleep first before the call takes effect - that was the bug in my own code.
It seems to work instantly somehow - does this mean the scheduler moves the thread to the other core immediately?
I suspect the call itself simply causes the scheduler to reschedule the thread onto an allowed core right away, rather than waiting for the next quantum.
|
|
|
|
|
Hi
Please, please, somebody post a comprehensive series on Win32 API programming with Visual C++ 2012.
I need it urgently, since without that knowledge I am having problems in my programming career.
Please make it simple and cover all topics, from easy to intermediate to advanced.
regards
|
|
|
|
|
naseer861 wrote: post comprehensive series in win32 API
In a forum posting? Are you serious? Do you even realize how huge the Win32 API is?
|
|
|
|
|
Dude, I am not talking about the whole huge API -
at least start from the basics.
Some things are understood as obvious.
|
|
|
|
|
(Something is probably lost in translation.)
You are asking for a C++ Win32 guide that covers ALL topics, from easy to intermediate to advanced?
What did your own research on the subject return? Did you go to a library? What did Google/Bing return?
Nihil obstat
|
|
|
|
|
|
Need a free image library that can handle png xresolution and yresolution.
I need it to be compatible with Visual Studio 6.0.
I'm grateful for your help.
Sincerely
Andla
|
|
|
|
|
|
This[^] one is pretty good ... hope it helps ...
|
|
|
|
|
Hi,
I have narrowed down my CFileDialog problem: it has something to do with MSCOWRKS.DLL.
Does anyone know the purpose of this DLL? Can it be removed, and is there a way of removing it from my MFC C++ app?
Thanks
|
|
|
|
|