HKMP Server Performance and Multithreading
Added 2022-10-01 12:57:59 +0000 UTC
It has been a while, but there is finally a new release for HKMP and some major developments. This post is a long overdue update on the recent development of HKMP and will mainly cover the interesting journey of performance improvements.
In the Patreon posts, I have talked a lot about networking. And while you can take the discussions and explanations at face value, there had not been a good display of the capabilities of the networking protocol yet. That is, until YouTuber fireb0rn decided to host an HKMP Tag game that was open to the public. This event happened about a month ago now and gave great insight into the limits of HKMP and, by extension, the networking. Of course, such a large event is not without hiccups (and a decent number of issues came up), but it did show that the networking protocol had yet to reach its limit.
Let me give some backstory for the people who did not participate in this event (which I assume is most of you). fireb0rn had created a mod pack for players to use that contained HKMP, HKMP Tag, HKMP ModDiff, and some others. Most notably, HKMP ModDiff made sure that no player had any mods other than those in the mod pack, to ensure an even playing field (i.e. no cheating). The mod pack also contained the infected skin for players that were tagged/infected. Then, after the stream had started, fireb0rn opened the server up to the public and players started rolling in. The initial rounds had approximately 20 players. However, this number quickly skyrocketed as people figured out how to join using the mod pack. Eventually, we reached 100 players and started to see the server slowly degrading. After this, the rounds got very chaotic: players were becoming invisible, map icons were very inaccurate, and sometimes entire rooms came to a standstill. Nevertheless, in some cases, the server performed very well and many players noted that they enjoyed the event immensely.
Yet, what piqued my interest in the event was that the server's network bandwidth was easily able to handle the amount of data. To my surprise, the main bottleneck of the event was not the networking, but rather the server's performance in processing data. I mostly deduced this from the recording of the event where, on multiple occasions, the issues and bugs could be attributed to the server's inability to handle data and respond to users in time.
In the last month, I have been tackling these issues and trying to optimize the performance of the server. So let's break down the changes and give some background on why they should make the server more performant. One of the first things that comes to mind when speeding up an application is multithreading. Multithreading allows a program to execute code in parallel, maintaining different execution paths. When I started developing HKMP, I immediately started using multithreading to handle server-side client connections and to process the server state. Naively, I used a few threads here and there to share the workload of the program and thought that it would benefit performance. However, when I started researching multithreading in more detail, I quickly concluded that I did not know enough about the performance of threads to justify what I was doing. Let me explain the main idea behind multithreading and why it can aid or destroy performance.
As I said before, each thread maintains an execution path of code. It is possible to create any number of threads, but a single processor core is only able to execute the path of one thread at a time. So why is multithreading (in some cases) still a good idea? There are in essence two reasons for this. The first is that modern processors have multiple cores, and each core can physically execute the path of a thread simultaneously. The other reason is that sometimes a thread is waiting on something. Usually, the wait is caused by IO (input/output) operations. These operations are processed by the underlying operating system and thus will not take up the execution time of the thread. In this case, the language/OS can decide (while the thread is waiting) to switch execution to a different thread. In both these cases, you get more CPU time executing something if you have multiple threads.
Beware though, because increasing the number of threads does not necessarily mean more performance. The language/OS decides when to switch execution to a different thread, which is called a context switch. A context switch is costly, so you don't want it to happen too often. A good example of this is the HKMP code from the previous release. I used a thread for each client connection that would manage sending updates to that client. This thread would sleep between updates to make sure updates were sent at a fixed rate (60 updates per second). Each time a thread sleeps, it gives up the CPU, and a context switch follows so that another thread can use that CPU time. This means that I was artificially introducing context switches for no gain other than timing the updates. Each thread would sleep for 1/60 of a second and each client had such a thread. With 100 clients, that is already 6,000 sleep and wake-up cycles per second, so you can see why this is disastrous for performance as the number of clients grows.
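To put some numbers on the thread-per-client pattern, here is a rough sketch in Python (used purely for illustration; HKMP itself is not written in Python). It compares how often threads sleep and wake up per second in both designs, and shows one possible alternative where a single thread sends to every client each tick. The names `wakeups_thread_per_client`, `wakeups_shared_sender`, and `run_shared_sender` are made up for this example, and the shared sender is only one way to avoid the problem, not necessarily what HKMP ended up doing.

```python
import time

UPDATES_PER_SECOND = 60          # send rate mentioned in the post
TICK = 1 / UPDATES_PER_SECOND    # seconds between two updates


def wakeups_thread_per_client(num_clients: int) -> int:
    """Sleep/wake cycles per second when every client has its own sleeping thread."""
    return num_clients * UPDATES_PER_SECOND


def wakeups_shared_sender() -> int:
    """Sleep/wake cycles per second when one thread serves all clients."""
    return UPDATES_PER_SECOND


def run_shared_sender(clients, send, ticks: int) -> None:
    """Hypothetical alternative: one thread sends to every client once per tick."""
    next_tick = time.monotonic()
    for _ in range(ticks):
        for client in clients:
            send(client)
        # Schedule against a fixed timeline so that send time does not add drift.
        next_tick += TICK
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)    # one sleep per tick, not one per client
```

With these definitions, `wakeups_thread_per_client(100)` is 6000 while `wakeups_shared_sender()` stays at 60 no matter how many clients connect. The trade-off is that a single sender has to finish all of its sends within one tick.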
To optimize an application for performance, you want your threads to block/sleep as little as possible. Some operations (such as IO) will inevitably block, which is fine as long as you have another thread that can do work in this downtime. Additionally, the number of cores on the processor is the number of threads that can run in parallel. Now you might be thinking that this is fairly easy to keep in mind and manage, but code running concurrently comes with a few problems. If multiple threads need to read or write the same data, you need to synchronize their access. The language/OS decides when to switch threads at moments you cannot predict. It will mostly do this when it thinks the CPU time is better spent on another thread (if the current thread is blocking, for example). These unpredictable context switches make data access inconsistent, resulting in "race conditions".
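The point about blocking operations leaving room for other threads can be sketched as follows. This is a minimal Python illustration, not HKMP code: `blocking_io` is a hypothetical stand-in that simply sleeps, which, like waiting on a socket, uses no CPU time while it waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(delay: float) -> float:
    """Stand-in for a blocking network call; the thread just waits."""
    time.sleep(delay)
    return delay


def run_sequential(delays) -> float:
    """Perform every blocking call on one thread; the waits add up."""
    start = time.monotonic()
    for delay in delays:
        blocking_io(delay)
    return time.monotonic() - start


def run_threaded(delays) -> float:
    """Give each blocking call its own thread; the waits overlap."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        list(pool.map(blocking_io, delays))
    return time.monotonic() - start
```

For four calls that each block for 0.1 seconds, the sequential version takes at least 0.4 seconds, while the threaded version finishes in roughly the time of a single call. The gain comes only from overlapping waits. Threads that never block would not benefit in the same way beyond the number of available cores.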
A race condition happens when two different threads access the same data at the same time, at least one of them changes it, and the result depends on the timing. For example, if thread A only executes a section of code if variable C has value 5 and this section depends on variable C staying 5 throughout, but thread B changes variable C while thread A is executing, this may result in behavior that the programmer did not expect. The solution to these race conditions is to synchronize access to shared data. This is done on an operating system level to ensure that different threads always agree on who is allowed access. This also means that while one thread is accessing the data, another thread that wants the same data is blocked. This blocking is another reason why multiple threads can increase performance: if you have a lot of synchronization happening, another thread may be able to make use of that downtime.
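The thread A / thread B example can be written out as a small Python sketch. The class `SharedState` and the functions below are invented for illustration. The lock ensures that thread B cannot change `c` while thread A is inside the section that relies on `c` being 5.

```python
import threading


class SharedState:
    """Holds the shared variable C from the example plus the lock guarding it."""

    def __init__(self) -> None:
        self.c = 5
        self.lock = threading.Lock()


def thread_a(state: SharedState, seen: list) -> None:
    # The check and the dependent section share one lock, so c cannot change in between.
    with state.lock:
        if state.c == 5:
            for _ in range(1000):
                seen.append(state.c)   # relies on c still being 5


def thread_b(state: SharedState) -> None:
    # Blocks here for as long as thread A holds the lock.
    with state.lock:
        state.c = 7


def run_once():
    """Run both threads once and return the final value of c and what A observed."""
    state = SharedState()
    seen = []
    a = threading.Thread(target=thread_a, args=(state, seen))
    b = threading.Thread(target=thread_b, args=(state,))
    a.start()
    b.start()
    a.join()
    b.join()
    return state.c, seen
```

Whichever thread gets the lock first, thread A either skips its section entirely or sees the value 5 for all 1000 reads. Without the two `with state.lock:` lines, thread B could change `c` halfway through thread A's loop.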
All these concepts combined make multithreading an incredibly challenging topic to tackle. Using too few threads may underutilize CPU time because of various blocking operations, while using too many threads may strain the CPU with context switches. Unfortunately, there is no easy answer to how many threads would be optimal for a given application. Specifically for HKMP, network access is a blocking IO operation for which we can utilize more threads. Processing data from all clients and synchronizing this data in the server state can also lead to blocking, thus warranting more threads. But using a thread for each client connection and letting those threads artificially sleep for timing purposes is a clear misuse of threads and a waste of performance.