If you're a Windows programmer, you may be familiar with a routine exposed by the Win32 API called GetTickCount() or GetTickCount64(). This function is exposed to the user through kernel32 module and it basically returns the amount of milliseconds since the system has booted.
As the description of the function tells, the function can be handy for timings, animations, transitions, basically every action where the time is crucial. However, how is this function actually implemented? Let's find out!
But first, let's dive into a kernel structure that contains some crucial data that GetTickCount() uses.
KUSER_SHARED_DATA
As you may know, accessing data that is located inside kernel address space from user space cannot be done so easily. For these reasons there's a predefined KUSER_SHARED_DATA struct. As the name suggests, its main purpose is to share global kernel data with user-space programs.
At some point, kernel allocates a single page at a fixed location dedicated for this data only. And thus, applications from user space can locate it easily. On 32-bit Windows the location is 0x7FFE0000. This means, that a perfectly valid code can be:
auto kusd = (KUSER_SHARED_DATA* const)0x7FFE0000;
And then, believe me or not however, following code outputs 'C:\Windows':
wprintf((wchar_t*)(0x7FFE0030));
Why? It's simple! If you look at the structure declaration, you can see that there's a NtSystemRoot array of type WCHAR.
typedef struct _KUSER_SHARED_DATA {
...
WCHAR NtSystemRoot[ 260 ]; // At offset 0x30
...
} KUSER_SHARED_DATA, *PKUSER_SHARED_DATA;
As stated before, the base address of this page (or struct) is 0x7FFE0000 and plus the 0x30 offset of the NtSystemRoot field we get the address on which the field data lies, i.e. pointer to NtSystemRoot. And since the base address never changes, we can access this fixed location anytime we want. Simple as that!
Now, we need to know that this structure exists because it contains two more members that are crucial for the correct functioning of GetTickCount().
TickCount & TickCountMultiplier
So these two, TickCount and TickCountMultiplier are the only things that GetTickCount() needs in order to work properly, that's it. Tick count and its multiplier are inside the KUSER_SHARED_DATA structure at offsets 0x320 and 0x4. So, what are they about and what is their purpose?
As the name of TickCountMultiplier suggests, it is a multiplier which is used to multiply TickCount that is constantly incrementing by a rate of 100ns-units based increment based on hardware clock. More on that later.
If we dive into the kernel itself, the 64-bit kernel32.dll module, and we see the pseudo code for GetTickCountKernel32() function i.e. in IDA, we can see following:
DWORD __stdcall GetTickCountKernel32()
{
return MEMORY[0x7FFE0320] * (uint64_t)MEMORY[0x7FFE0004] >> 24;
}
Things start to make sense. Remember when we were talking about offsets of TickCount and TickCountMultiplier? They're here!
MEMORY[0x7FFE0320] * (uint64_t)MEMORY[0x7FFE0004] >> 24;
And thus, these two are being multiplied together as we mentioned before. And the code above is identical to:
kusd->TickCountQuad * (uint64_t)kusd->TickCountMultiplier >> 24;
Why shifting to the right by 24?
The TickCountMultiplier is a 32-bit wide value. It is composed out of 24-bit wide fraction and 8-bit integer part. This is done because as you may know, in kernel-mode, there aren't any ordinary floating-point numbers, nor operations with them, done on the FPU. Basically the FPU is disabled completely.
However, the multiplier is casted to a uint64_t and the result of the multiplication of these two is also a 64-bit wide value. But still, after these two values are multiplied together, the fraction remains. In order to get rid of the 24-bit fraction, we shift it off. Then we're left with 32-bit wide integer part.
Since the returned value is only 32-bit wide, the timer eventually runs out after a while. To be more exact, after 2^32 milliseconds, i.e. 49.71 days. Then the function starts to count from zero again. And of course, in order to prevent such overflow, you can always use GetTickCount64(), which takes much longer to overflow.
Custom implementation
If we'd like to implement the function for ourself, that'd be pretty simple, right? We wouldn't do much more work than copying the pseudo code into our code and bang! We have a fully working GetTickCount() implementation in our code!
auto TickCountPtr = (uint64_t* const)(0x7FFE0320); // TickCountQuad
auto TickCountMultiplierPtr = (uint32_t* const)(0x7FFE0004); // TickCountMultiplier
// Custom implementation for GetTickCount() function.
uint32_t GetTickCountImpl()
{
return *TickCountPtr * (uint64_t)(*TickCountMultiplierPtr) >> 24;
}
Simple as that! And now, if we test our implementation against original GetTickCount function we get following results:
while (true)
{
printf("%10lu %10lu\n", GetTickCount(), GetTickCountImpl());
Sleep(1);
}
// Output:
137151593 137151593
137151609 137151609
137151625 137151625
137151640 137151640
137151656 137151656
137151671 137151671
137151687 137151687
137151703 137151703
137151718 137151718
... // It's the same every time!
So, this is the implementation of the function. Only one line long implementation! Now sure, one line function doesn't seem much complicated but, the logic why this even work lies even deeper inside the kernel.
The code that is responsible for actually setting the tick count and its multiplier lies very deep inside the kernel core... inside the ntoskrnl.exe.
Further implementation details
In the second part of the article I'd like to focus on the internal implementation of TickCount and TickCountMultiplier, how they're actually calculated or what they're.
TickCountMultiplier
Let's start with TickCountMultiplier. So, this field from KUSER_SHARED_DATA is constant, it doesn't change, ever. And because of that, and due to optimizations, it is pre-calculated inside the kernel at startup. To be more exact, inside InitBootProcessor() routine that is called at the very beginning of windows startup inside ntoskrnl.exe:
...
ExpTickCountMultiplier = ExComputeTickCountMultiplier(KeMaximumIncrement);
SharedUserData->TickCountMultiplier = ExpTickCountMultiplier;
...
As we can see, it is all happening inside ExComputeTickCountMultiplier() routine. The KeMaximumIncrement argument passed is architecture-dependent but in general, it is a clock increment value in 100ns units. Nonetheless, the TickCount clock basically represents 10 100ns clock increments (1ms / 100ns = 10 increments).
ExComputeTickCountMultiplier()
Anyway, let's dive into ExComputeTickCountMultiplier() function. The function itself is simple and its body can be divided into three groups:
let's now cover these three steps in detail. Starting with a general description of what happens with KeMaximumIncrement as input.
And also, it is important to note that the KeMaximumIncrement is a 32-bit integer that has an integer and a fraction part. The integer part covers 8 bits and the fraction part remaining 24 bits.
Step 1
As the first step, the integer part is computed by dividing KeMaximumIncrement by 10,000 (100ns * 10,000ns is 1ms). Using this formula, it is clear that we're getting the number of milliseconds, rather than nanoseconds. Let's store that into variable IntegerPart.
ULONG IntegerPart = KeMaximumIncrement / (10 * 1000);
Step 2
As the second step, the remainder is computed. Remember that the KeMaximumIncrement is a 'floating-point' value with base and fraction. We will store it inside variable Remainder.
Next up is a algorithm that calculates the binary fraction from the Remainder. The rules are as following. Multiply the fractioned remainder by two (left shift) and if the result is bigger than one, store 1 and subtract 1 from the result, store 0 otherwise. Repeat 24 times in this case, since we're calculating up to 24-bit precision. The result of stored ones and zeros then pack together into one sequence.
As a demonstration, let's take decimal number 6789 as Remainder and make fraction from it - 0.6789 using the same algorithm.
0.6789 * 2 = 1.3578 1 |
0.3578 * 2 = 0.7156 0 |
0.7156 * 2 = 1.4312 1 |
0.4312 * 2 = 0.8624 0 |
0.8624 * 2 = 1.7248 1 |
0.7248 * 2 = 1.4496 1 |
0.4496 * 2 = 0.8992 0 |
0.8992 * 2 = 1.7984 1 |
0.7984 * 2 = 1.5968 1 V
24 times total...
The result would be:
0.6789 in decimal is 0.101011011... in binary
The C equivalent would then be:
ULONG FractionPart = 0;
for (int n = 0; n < 24; n++)
{
FractionPart <<= 1; // Move to next bit (* 2)
Remainder <<= 1;
// Now, we aren't checking if the value is bigger
// than 1, but rather if it is bigger than 10k.
if (Remainder >= (10 * 1000))
{
Remainder -= (10 * 1000); // Subtract 10k instead of 1..
FractionPart |= 1; // Add 1 bit. Otherwise none (0) would be added.
}
}
Now we have the binary fraction completely calculated and we're near the end! Now the only thing that remains is to add the fraction part to the integer part.
Part 3
As stated before, the only thing left is to combine the fraction part together with the integer part:
ULONG Result = (IntegerPart << 24) | FractionPart;
And this is it! This is how the multiplier is calculated. In summary, a KeMaximumIncrement is supplied. It is a clock increment value in 100ns units. This value is then converted to the integer millisecond part. Then, the 24-bit binary fraction is computed for every bit. In the end, those two values are combined together to form the final multiplier. Simple as that!
TickCount
The tick count is, as the name says, basically a tick count. A 'tick' is an update in internal system time code that is represented in 100 nanosecond units. This increment is basically multiplied by a constant (or a multiplier, e.g. TickCountMultiplier) that converts it to a millisecond tick count.
The tick count is incremented inside KiUpdateSystemTime() function. And to be honest, there's nothing much to say about this. It is just a variable that is constantly incrementing each 100ns tick.
Conclusion
So, as you can see, the implementation of GetTickCount() is pretty simple if you know, how it works. I wrote this article because I found the ínternal implementation of the function very interesting. The way it works and how it works. Also, the way that the function works by simply multiplying two values together, forming a 1ms tick timer.