Most internet users have, at one point, done a live stream on a free platform such as Facebook. But have you ever wondered how it's possible to broadcast such high-quality video over the internet? Let's do the maths. An average 1080p video has a resolution of 1920×1080 pixels. At 24 bits per pixel and 30 frames per second, that comes to roughly 1.5 gigabits per second of raw data. So how can you watch this video over a download connection of only 10 Mbps, about 150 times slower? The answer is video compression.
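The arithmetic is easy to check. This short Python snippet multiplies out the figures above and compares the result with the 10 Mbps connection from the example. It ignores audio and any transport overhead.

```python
# Raw (uncompressed) bitrate of 1080p video, using the figures from the text:
# 1920x1080 pixels, 24 bits per pixel, 30 frames per second.
WIDTH, HEIGHT = 1920, 1080
BITS_PER_PIXEL = 24
FPS = 30

def raw_bitrate_bps(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Bits per second needed to send every pixel of every frame unchanged."""
    return width * height * bits_per_pixel * fps

raw = raw_bitrate_bps(WIDTH, HEIGHT, BITS_PER_PIXEL, FPS)
link = 10_000_000  # a 10 Mbps download connection

print(f"raw bitrate: {raw / 1e9:.2f} Gbps")         # raw bitrate: 1.49 Gbps
print(f"required compression: {raw / link:.0f}:1")  # required compression: 149:1
```

In other words, the video has to shrink by a factor of roughly 150 before it fits down that connection in real time.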
How Video Compression Works
If you're somewhat familiar with how video is sent over the internet, you might have heard of codecs. A codec (short for coder-decoder) is software or hardware that encodes and decodes data, including video data. Encoding compresses the video, making it much easier to transmit and store. Decoding does the opposite: it rebuilds the video once it arrives at its destination. With lossy codecs, the rebuilt video is a close approximation of the original rather than an exact copy. Codecs are used not only for video but for just about any type of signal, such as audio.
Video frames can be compressed in the same way as still images. The information that is less visible or less relevant to the viewer is thrown out, and the remaining data is stored more efficiently. Compressing a video image by image in this way is called intra-frame coding, or spatial coding.
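Here is what "throwing out less visible information" can look like in practice. The following minimal Python sketch works in the style of JPEG still-image compression. It converts an 8×8 block of pixels into frequency coefficients with a discrete cosine transform (DCT). It then divides each coefficient by a step size and rounds the result. Several details are illustrative assumptions: the 8×8 block size, the single uniform step of 16, and the sample gradient. Real codecs use their own transforms and quantisation rules.

```python
import math

N = 8  # block size; 8x8 is assumed here for illustration

def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block (list of lists of numbers)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step):
    """Divide by a uniform step and round, so small (less visible) detail becomes 0."""
    return [[round(val / step) for val in row] for row in coeffs]

# A smooth 8x8 gradient, like a patch of sky: its energy sits in a few coefficients.
block = [[100 + 4 * x + 2 * y for y in range(N)] for x in range(N)]
q = quantize(dct_2d(block), step=16)
zeros = sum(val == 0 for row in q for val in row)
print(f"{zeros} of {N * N} coefficients quantized to zero")  # 61 of 64 coefficients quantized to zero
```

In a smooth area like this, almost every coefficient rounds to zero, and zeros are cheap to store. A larger step throws away more detail and shrinks the data further, at the cost of visible quality.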
With intra-frame coding, each video frame is compressed as a separate entity, independent of the frames before and after it. This generally needs more storage space and bandwidth than the inter-frame techniques described below. In exchange, it allows more flexibility when editing, since any frame can be decoded on its own.
Intra-frame coding alone can significantly reduce file size, but video compression takes it one step further. The average video contains many consecutive frames that are nearly identical to one another. This inter-frame redundancy can be exploited to compress the video even further.
Let's start with an extreme example: suppose nothing moves in the video at all. Instead of storing every identical frame, the encoder can keep only the first frame and repeat it as many times as needed. This saves a lot of space and bandwidth.
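Here is a minimal Python sketch of that idea. It assumes each frame is a tuple of pixel rows, so that two frames can be compared directly. It also uses a hypothetical `REPEAT` marker, meaning "show the previous frame again". Real codecs signal this differently.

```python
from typing import List, Tuple, Union

# A frame is a tuple of rows of pixel values (an assumption that lets frames be
# compared with ==).
Frame = Tuple[Tuple[int, ...], ...]
REPEAT = "REPEAT"  # hypothetical token meaning "show the previous frame again"

def encode_static(frames: List[Frame]) -> List[Union[Frame, str]]:
    """Store a frame only when it differs from the one before it."""
    stream: List[Union[Frame, str]] = []
    prev = None
    for frame in frames:
        stream.append(REPEAT if frame == prev else frame)
        prev = frame
    return stream

def decode_static(stream: List[Union[Frame, str]]) -> List[Frame]:
    """Rebuild every frame, copying the last one whenever REPEAT appears."""
    frames: List[Frame] = []
    for item in stream:
        frames.append(frames[-1] if item == REPEAT else item)
    return frames

still = ((10, 10), (10, 10))
stream = encode_static([still] * 5)
print(stream)  # [((10, 10), (10, 10)), 'REPEAT', 'REPEAT', 'REPEAT', 'REPEAT']
assert decode_static(stream) == [still] * 5
```

Only one real frame is stored. The remaining four become tiny markers, no matter how large each frame is.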
Now take a more realistic example, where some parts of the image stay the same over many frames while others change. We can apply the same technique to just those parts. Each frame is divided into smaller blocks, and only the blocks that do not change are repeated.
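The block version can be sketched in Python in the same spirit. The 2×2 block size is a hypothetical choice to keep the example readable. The sketch also assumes the frame's width and height are multiples of the block size. The encoder sends only the blocks that changed, and the decoder copies everything else from the previous frame.

```python
BLOCK = 2  # hypothetical block size, kept tiny so the example is readable

def split_blocks(frame, size=BLOCK):
    """Map each block's top-left (row, col) to its pixels.
    Assumes the frame's height and width are multiples of size."""
    return {
        (r, c): [row[c:c + size] for row in frame[r:r + size]]
        for r in range(0, len(frame), size)
        for c in range(0, len(frame[0]), size)
    }

def encode_blocks(prev, cur, size=BLOCK):
    """Keep only the blocks of cur that differ from prev; the rest are repeated."""
    old, new = split_blocks(prev, size), split_blocks(cur, size)
    return {pos: blk for pos, blk in new.items() if blk != old[pos]}

def decode_blocks(prev, updates):
    """Start from a copy of the previous frame and paste in the changed blocks."""
    frame = [row[:] for row in prev]
    for (r, c), blk in updates.items():
        for i, row in enumerate(blk):
            frame[r + i][c:c + len(row)] = row
    return frame

prev = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
cur = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
updates = encode_blocks(prev, cur)
print(updates)  # {(2, 2): [[9, 9], [9, 9]]}
assert decode_blocks(prev, updates) == cur
```

Here only one of the four blocks changed, so the encoder sends a quarter of the frame.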
What if blocks do change between consecutive frames, some only a little and others a lot? In this case, the encoder looks for where each block has moved in the next frame. It searches only within a small neighbourhood around the block's original position. This technique is called block motion estimation.
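A simple way to carry out this search is exhaustive ("full search") block matching. The sketch below tries every displacement within a small window and scores each candidate by the sum of absolute differences (SAD). SAD is a common matching cost, but the choice here is an assumption, and real encoders typically use faster search strategies. The convention is also an assumption: each vector points from a block in the current frame back to its best match in the reference frame, written as `(dx, dy)`.

```python
def sad(cur, ref, by, bx, dy, dx, size):
    """Sum of absolute differences between the size x size block of cur at
    (by, bx) and the block of ref displaced by (dy, dx)."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(size)
        for j in range(size)
    )

def best_motion_vector(cur, ref, by, bx, size, search):
    """Full search: try every displacement within +/- search pixels and return
    the (dx, dy) with the lowest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            top, left = by + dy, bx + dx
            # Skip candidate blocks that would fall outside the reference frame.
            if top < 0 or left < 0 or top + size > h or left + size > w:
                continue
            cost = sad(cur, ref, by, bx, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

# A 2x2 bright square at rows 1-2, cols 1-2 of the reference frame...
ref = [[0] * 6 for _ in range(6)]
for r in (1, 2):
    for c in (1, 2):
        ref[r][c] = 9
# ...has moved down 1 and right 2 in the current frame.
cur = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4):
        cur[r][c] = 9

print(best_motion_vector(cur, ref, by=2, bx=3, size=2, search=2))  # (-2, -1)
```

The search window size is a trade-off. A larger window catches faster motion, but the number of candidates grows with the square of the window width.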
The result of this search is a motion vector for each block, describing how the block needs to be moved to match the next frame. Instead of saving every frame individually, the encoder can save a reference frame plus the motion vectors for all the blocks that make up the image. Using the vectors to move the reference frame's blocks into a prediction of the next frame is called motion compensation.
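Motion compensation itself is straightforward once the vectors are known. This Python sketch builds a predicted frame by copying, for every block, the reference block its vector points to. The vector format is an assumption: a dictionary from each block's top-left `(by, bx)` to a `(dx, dy)` offset into the reference frame.

```python
def motion_compensate(ref, vectors, size):
    """Build a predicted frame from a reference frame and per-block motion vectors.

    vectors maps each block's top-left (by, bx) in the new frame to a (dx, dy)
    offset pointing at the matching block in ref.
    """
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for (by, bx), (dx, dy) in vectors.items():
        for i in range(size):
            for j in range(size):
                pred[by + i][bx + j] = ref[by + dy + i][bx + dx + j]
    return pred

ref = [[1, 2, 0, 0],
       [3, 4, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
# The 2x2 object moved 2 pixels to the right; the other blocks point at empty areas.
vectors = {(0, 0): (2, 0), (0, 2): (-2, 0), (2, 0): (0, 0), (2, 2): (0, 0)}
print(motion_compensate(ref, vectors, size=2))
# [[0, 0, 1, 2], [0, 0, 3, 4], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Four small vectors stand in for sixteen pixel values here. With real frames and larger blocks, the saving is far greater.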
Motion compensation reduces the differences between two frames, but on its own it usually cannot recreate the next frame exactly. The encoder therefore also saves the differences between the actual frame and the motion-compensated prediction. These are called residual frames.
When a viewer starts playing the video, the decoder reconstructs each frame in three steps. It takes the reference frame, applies the motion vectors to form a prediction, and then adds the residual frame. A good prediction leaves mostly small or zero values in the residual. That is why residual frames are highly compressible and save a lot of bandwidth.
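The residual step and its reversal at the decoder are each a single element-wise operation. This Python sketch assumes the motion-compensated prediction has already been formed. It shows the encoder subtracting the prediction from the actual frame, and the decoder adding the residual back. Real codecs usually also transform and quantise the residual, so their reconstruction is approximate rather than exact as it is here.

```python
def compute_residual(cur, pred):
    """Encoder side: what motion compensation failed to predict."""
    return [[c - p for c, p in zip(row_c, row_p)] for row_c, row_p in zip(cur, pred)]

def reconstruct(pred, res):
    """Decoder side: prediction plus residual gives back the frame."""
    return [[p + r for p, r in zip(row_p, row_r)] for row_p, row_r in zip(pred, res)]

pred = [[0, 0, 1, 2],
        [0, 0, 3, 4],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]   # motion-compensated prediction (assumed already computed)
cur = [[0, 0, 1, 2],
       [0, 0, 3, 5],    # one pixel is slightly brighter than predicted
       [0, 0, 0, 0],
       [0, 0, 0, 0]]

res = compute_residual(cur, pred)
print(res)  # [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
assert reconstruct(pred, res) == cur
nonzero = sum(v != 0 for row in res for v in row)
print(f"{nonzero} of 16 residual values are non-zero")  # 1 of 16 residual values are non-zero
```

A residual that is almost entirely zeros compresses very well, which is where the bandwidth saving comes from.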
Although video compression algorithms have been around for a long time, new techniques are still being developed. Notable among them are machine learning models that aim to outperform standard block-based encoding.