Performance Programming: Threads and Multi-threading
Hello and welcome back to the series. In the last post we introduced the concepts of parallelism and concurrency; if you missed it, check it out here.
Today I'll be diving into threads and threading in programming. I plan to talk about threads in general to begin with, then focus a bit more on threads in Python, and maybe mention Java in the mix as well. This is very likely going to be a mini-series inside the series.
So what exactly are threads?
“Well, they are those thin, rope like materials tailors use in sewing the most beautiful dresses you ever saw.”
However, in computer science the simplest definition of threads is:
“a small set of instructions designed to be scheduled and executed by the CPU independently of the parent process.”
Multi-threading typically falls under the concurrency paradigm.
It requires a form of scheduler, usually administered by the operating system, which determines which of the sub-processes (threads) runs at any given time, either until completion or until it reaches a halting condition.
Whenever a process starts, it begins with a single thread. Multi-threading is the ability to spin up multiple threads that run independently of each other and of the parent thread that spawned them, all within a single process. You can also spin up multiple processes and then spin up multiple threads within each of those processes; this combines multiprocessing with multi-threading, and we will explore it further in a later article, so make sure to stay through to the end of the series to hear more about it.
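To make this concrete, here is a minimal sketch in Python using the standard library's `threading` module. The `worker` function and the `task-N` names are just illustrative; the point is that one process spins up several threads that run independently of the main thread.

```python
import threading

results = []

def worker(name: str) -> None:
    # Each call to this function runs on its own thread,
    # independently of the main thread that spawned it.
    results.append((name, threading.current_thread().name))

# The process starts with one thread (the main thread);
# here we spin up three more inside the same process.
threads = [threading.Thread(target=worker, args=(f"task-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # block until every thread has finished
```

Calling `join()` at the end makes the main thread wait for its children, which is usually what you want before reading any shared results.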
Threads are started inside processes, so one process can have multiple threads running, with a scheduler in place to orchestrate which thread runs at any given time. Because the speed-up comes from overlapping waits rather than from using extra cores, even computers with only a single core can see performance boosts from the multi-threading paradigm.
Ideally, the use case for the threading paradigm is IO-bound tasks: reading a file, reading from or connecting to a database, connecting to a server, and so on.
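A quick sketch of why this pays off for IO-bound work, using `time.sleep` as a stand-in for a blocking IO call (a file read, database query, or network request would behave similarly):

```python
import threading
import time

def fake_io(seconds: float) -> None:
    # time.sleep stands in for a blocking IO call; in CPython the GIL
    # is released while a thread sleeps or waits on IO, so other
    # threads are free to run in the meantime.
    time.sleep(seconds)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# Five 0.2 s "IO waits" overlap, so the total comes out close to
# 0.2 s rather than the roughly 1.0 s a sequential version would take.
```

The same pattern applied to CPU-bound work would show little or no gain in CPython, which is why IO-bound tasks are the sweet spot for threads.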
To paint an oversimplified picture of the underlying process: when your program starts up, the parent process spins up several threads instead of one, and the first thread runs until it hits an IO-bound task or completes. The operating system typically hands control over to a different thread when the current thread:
1. Reaches a point where it has to carry out an IO-bound task,
2. Completes the task it was running, or
3. Exhausts the time limit it was given to hold control.
While that thread waits for results from the IO-bound task, another thread starts up and carries out its own tasks.
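The hand-off described above can be observed directly. In this sketch (the function names and event strings are my own, purely for illustration), one thread blocks on a simulated IO wait while a second thread does its work in the gap:

```python
import threading
import time

events = []

def io_bound() -> None:
    events.append("io: start")
    time.sleep(0.2)  # stand-in for waiting on IO; control passes elsewhere
    events.append("io: done")

def other_work() -> None:
    time.sleep(0.05)  # starts after the IO wait begins, ends before it finishes
    events.append("other: ran while io thread was waiting")

t1 = threading.Thread(target=io_bound)
t2 = threading.Thread(target=other_work)
t1.start()
t2.start()
t1.join()
t2.join()
# The event log shows the second thread's work landing between
# "io: start" and "io: done": the scheduler gave it control while
# the first thread was blocked.
```

Note that the exact interleaving is up to the scheduler; the timings here are chosen so the overlap is all but guaranteed, but in real code you should never rely on one particular ordering.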
In future posts we will consider some important concepts to keep in mind while working with threads, so stay tuned.