Video processing can be a computationally intensive task, and since computing power is often at a premium, the more we can speed up a video processing pipeline, the better. This is especially true for applications that require real-time processing, like processing a video stream from a webcam. While it’s important that the image processing portion of a video processing pipeline be streamlined, input/output (I/O) operations also tend to be a major bottleneck. One way to alleviate this is to split the computational load between multiple threads. It’s important to note that Python has something called the global interpreter lock (GIL), which prevents more than one thread from executing Python bytecode at a time. However, as the Python Wiki tells us:
…potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL … that the GIL becomes a bottleneck.
In other words, blocking I/O operations (which “block” the execution
of later code until the current code finishes), like reading or displaying video
frames, are ripe for multithreaded execution. In this post, we’ll examine
the effect of putting calls to cv2.VideoCapture.read() and cv2.imshow() in
their own dedicated threads.
All the code used in this post can be found on Github.
Measuring changes in performance
First, we must define “performance” and how we intend to evaluate
it. In a single-threaded video processing application, we might have the main
thread execute the following tasks in an infinitely looping while loop: 1) get a
frame from the webcam or video file with
cv2.VideoCapture.read(), 2) process
the frame as we need, and 3) display the processed frame on the screen with
cv2.imshow(). By moving the reading and display operations to other
threads, each iteration of the while loop should take less time to execute.
Consequently, we’ll define our performance metric as the number of
iterations of the while loop in the main thread executed per second.
To measure iterations of the main while loop executing per second, we’ll
create a class called CountsPerSec,
which can be found on Github.
We import the
datetime module to track the elapsed time. At the end of each
iteration of the while loop, we’ll call
increment() to increment the iteration
count. During each iteration, we’ll obtain the average iterations per
second for the video with a call to the class’s counts-per-second method.
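Here’s a minimal sketch of what such a class might look like. Only increment() is named above, so start() and counts_per_sec() are my assumed names for the other methods:

```python
from datetime import datetime

class CountsPerSec:
    """Tracks how many times increment() is called per elapsed second.
    Only increment() is named in the text; start() and counts_per_sec()
    are assumed names for this sketch."""

    def __init__(self):
        self._start_time = None
        self._num_occurrences = 0

    def start(self):
        self._start_time = datetime.now()
        return self  # returned so calls can be chained

    def increment(self):
        self._num_occurrences += 1

    def counts_per_sec(self):
        elapsed = (datetime.now() - self._start_time).total_seconds()
        return self._num_occurrences / elapsed if elapsed > 0 else 0.0
```

Because start() returns the instance, it can be chained at creation, e.g. cps = CountsPerSec().start().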
Performance without multithreading
Before examining the impact of multithreading, let’s look at performance
without it. Create a file named thread_demo.py or
grab it on Github and follow along.
We begin with some imports, including the
CountsPerSec class we made above. We
haven’t covered the VideoGet and
VideoShow classes yet, but these will
be used to execute the tasks of getting video frames and showing video frames,
respectively, in their own threads. A helper function
overlays text indicating the frequency of the main while loop, in iterations per
second, on a frame before displaying the frame. It accepts as arguments the
frame (a numpy array) and the iterations per second (a float), overlays the
value as text via
cv2.putText(), and returns the modified frame.
Next, we’ll define a function,
noThreading(), to get frames, compute and
overlay the iterations per second value on each frame, and display the frame.
The function takes the video source as its only argument. If given an integer,
source indicates that the video source is a webcam. 0 refers to the first
webcam, 1 would refer to a second connected webcam, and so on. If a string is
provided, it’s interpreted as the path to a video file. On lines
19-20, we create an OpenCV
VideoCapture object to grab and decode frames
from the webcam or video file, as well as a
CountsPerSec object to track the
main while loop’s performance. Line 22 begins the main while loop. On
line 23, we utilize the
read() method to get
and decode the next frame of the video stream; it returns a boolean,
indicating whether or not the frame was successfully grabbed and decoded, as
well as the frame itself in the form of a numpy array,
frame. On line 24,
we check if the frame was not successfully grabbed or if the user pressed the
“q” key to quit and exit the program. In either case, we halt
execution of the while loop with
break. Barring either condition, we continue
by simultaneously obtaining and overlaying the current “speed” of
the loop (in iterations per second) in the lower-left corner of the frame on
line 27. Finally, the frame is displayed on the screen on line 28 with
cv2.imshow(), and the iteration count is incremented on line 29.
What do the results look like for both a webcam and a video file? These are the values I got on my hardware:
Reading from a webcam, the while loop executed about 28 iterations/second. Reading from an AVI file, about 240 iterations/second. These will serve as our baseline values.
A separate thread for getting video frames
What happens if we move the task of getting frames from the webcam or video file
into a separate thread? To do this, we’ll first define a class called
VideoGet in a file named VideoGet.py,
which can be found on Github.
We import the
threading module, which will allow us to spawn new threads. In the
__init__() method, we initialize an OpenCV
VideoCapture object and
read the first frame. We also create an attribute called
stopped to act as a
flag, indicating that the thread should stop grabbing new frames. The
start() method creates and starts the thread on line 16. The thread runs
the get() function, defined on line 19. This function
continuously runs a while loop that reads a frame from the video stream and
stores it in the class instance’s
frame attribute, as long as the
stopped flag isn’t set. If a frame is not successfully read (which might
happen if the webcam is disconnected or the end of the video file is reached),
the stopped flag is set to True by calling the
stop() function.
We’re now ready to implement this class. Returning to the file
thread_demo.py, we define a function that
will use the
VideoGet object above to read video frames in a separate thread
while the main thread displays the frames.
The function is pretty similar to the
noThreading() function discussed in the
previous section, except we initialize the
VideoGet object and start the
second thread on line 37. On lines 41-43 of the main while loop, we
check to see if the user has pressed the “q” key or if the
stopped attribute has been set to True, in which case
we halt the while loop. Otherwise, the loop gets the frame currently stored in
the VideoGet object on line 45, then proceeds to process and display it as
before.
How does the function perform on a webcam video stream and on a video file?
Quite the difference compared to the single-thread case! With the frame-getting task in a separate thread, the while loop executed 545 iterations/second for a webcam and 585 iterations/second for a video file. At this point, it’s important to note that these values do not correspond to framerate or FPS. The video FPS is largely limited by the camera hardware and/or the speed with which each frame can be grabbed and decoded. The iterations per second simply show that the main while loop is able to execute more quickly when some of the video I/O is off-loaded to another thread. It demonstrates that the main thread can do more processing when it isn’t also responsible for reading frames.
A separate thread for showing video frames
To move the task of displaying video frames to a separate thread, we follow a
procedure similar to the last section and define a class called
VideoShow in a file named
VideoShow.py, which, as before,
can be found on Github. The class definition begins just like that of the
VideoGet class.
This time, the new thread calls the
show() method, defined on line 17.
Note that checking for user input, on line 20, is achieved in the separate
thread instead of the main thread, since OpenCV’s GUI functionality
doesn’t necessarily play well in multithreaded applications and I found it
didn’t work properly when placed in the main thread. Once again returning
to thread_demo.py, we define a function that uses the VideoShow class.
As before, this function is similar to the
noThreading() function, except we initialize a
VideoShow object, our “shower”
(that’s “shower” as in “something that shows,” not
“shower” as in “water and shampoo”), and start the new
thread on line 58. Line 63 checks indirectly if the user has pressed
“q” to quit the program, since the
VideoShow object is actually
checking for user input and setting its
stopped attribute to True in the event
that the user presses “q”. Line 68 sets the VideoShow object’s
frame attribute to the current frame.
And the result?
This is interesting. The webcam performs at 30 iterations/second, only slightly faster than the 28 obtained in the case of a single thread. However, the video file performs at ~400 iterations/second—faster than its single-thread counterpart (240 iterations/second) but slower than the video file with video reading in a separate thread (585 iterations/second). This suggests that there’s a fundamental difference between reading from a camera stream and from a file, and that the primary bottleneck for a camera stream is reading and decoding video frames.
Separate threads for both getting and showing video frames
Finally, we’ll implement a function in
thread_demo.py that creates a thread for getting video frames via the
VideoGet class and another thread for displaying frames via the VideoShow
class, with the main thread existing to process and pass frames between the two
threads. This function is a mixture of the previous two threaded
functions, which turns out to have a very interesting result:
This seems to be at odds with the previous conclusion, which suggested that reading frames was the primary bottleneck for the webcam. For whatever reason, the combination of putting both frame-reading and frame-display in dedicated threads bumps the performance in both cases up to a whopping ~48000 iterations/second. Not being as well-versed in multithreading as I’d like to be, I can’t quite explain this result. Regardless, it appears fairly clear that using multithreading for video I/O can free up considerable resources for performing other image processing tasks.