Should you really be using AVIF?

A requirement of the IB Diploma Programme is a 4,000-word extended essay on a topic of your choosing. Mine? Image codecs. You can read the full document here, or read just the introduction below:

Modern displays contain millions of pixels, each of which is capable of displaying colours by varying the amount of red, green and blue light that it outputs. To store an image digitally, each pixel’s colour is represented by three numbers that correspond to an amount of red, green and blue. In a 24 bit colour system, a digital representation of a single pixel may encode each colour component with 8 successive bits. For example, a single pixel (in R→G→B order) may be stored as: 00110011 11100011 00010000.
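The 24 bit representation described above can be sketched in a few lines of Python. This is only an illustration: the function names `encode_pixel` and `decode_pixel` are hypothetical, and the component values 51, 227 and 16 are simply the decimal equivalents of the three bit groups in the example.

```python
def encode_pixel(red, green, blue):
    """Return one pixel as three 8-bit groups, in R, G, B order."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each component must fit in 8 bits (0 to 255)")
    # format(value, "08b") writes the value in binary, zero-padded to 8 digits.
    return " ".join(format(value, "08b") for value in (red, green, blue))


def decode_pixel(bits):
    """Turn three space-separated 8-bit groups back into (R, G, B) integers."""
    return tuple(int(group, 2) for group in bits.split())


print(encode_pixel(51, 227, 16))  # 00110011 11100011 00010000
```

Each component occupies exactly one byte, so an uncompressed image in this scheme needs three bytes per pixel.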

In practice, this digital representation of images is inefficient. The same (or a similar) pixel may be encoded many times in parts of an image where neighbouring pixels are strongly correlated. Image compression is the process of minimising the amount of data used to encode an image, whilst preserving its visual quality as much as possible. We may do this ‘losslessly’, without any quality degradation, or ‘lossily’, with quality degradation. This essay will focus on the lossy image compression algorithms of JFIF and AVIF.
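The redundancy described above can be illustrated with run-length encoding, a simple lossless technique that stores each run of identical pixels once, together with a count. This is a minimal sketch for intuition only. It is not the method used by JFIF or AVIF, and the function names and the sample colours are assumptions made for the example.

```python
def run_length_encode(pixels):
    """Collapse consecutive identical pixels into (pixel, count) pairs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # same colour as the previous pixel
        else:
            runs.append([pixel, 1])   # a new run starts here
    return [(pixel, count) for pixel, count in runs]


def run_length_decode(runs):
    """Expand (pixel, count) pairs back into the original pixel list."""
    return [pixel for pixel, count in runs for _ in range(count)]


# One row of an image: six identical 'sky' pixels, then two 'grass' pixels.
sky, grass = (135, 206, 235), (34, 139, 34)
row = [sky] * 6 + [grass] * 2

encoded = run_length_encode(row)
assert run_length_decode(encoded) == row  # lossless: nothing is discarded
print(len(row), "pixels stored as", len(encoded), "runs")
```

A lossy codec goes further by also discarding detail that is judged hard to see, which is why its output cannot be decoded back to the exact original.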

Lossy image compression has a twofold goal: to maximise visual quality whilst minimising file size. Achieving this goal brings several advantages, which makes the research question worthy of investigation. Images often make up the majority of a website’s total size (Lu 2019). Reducing their size can thus improve the speed at which a website loads, which has been reported to increase the number of users converting into customers on business websites (Cloudflare n.d.). Smaller images may also allow businesses to use less storage and bandwidth, saving costs. Computer science was chosen as the subject for this research question since image compression features disciplinary concepts such as algorithms, encoding and decoding.

JFIF and AVIF are two codecs worthy of consideration. JFIF is considered a de facto standard. Consequently, it must be improved upon considerably if it is to be ousted. Many standards have been created in an attempt to supersede JFIF but have failed due to: lack of industry support, in the case of JPEG XL (Sneyers 2022); providing little benefit over JFIF, in the case of WebP (Mozilla 2013); or complicated licence, royalty and patent requirements, in the case of HEIF (Chiariglione 2018). AVIF presents a greater challenge to JFIF’s omnipresent status because it is developed by an alliance of key industry players, uses advanced encoding techniques that promise to outperform JFIF, and is royalty free. This makes a comparison between JFIF and AVIF valuable to users choosing an image compression technology. Since JFIF provides the same licence benefits and has tremendous industry support, the comparison hinges on whether AVIF’s encoding techniques provide enough benefit over JFIF.

To answer the research question, JFIF and AVIF will be analysed using their respective specification documents and related literature, and compared experimentally using two metrics: quality and encode time. Previous comparisons of JFIF and AVIF have either been conducted to provide context for examining other algorithms, rather than as a focus, or been conducted in a casual manner, relying on qualitative observation or a low sample size. For example, ‘On the hunt for the best image quality per byte’ (Fronius 2020) uses a single image and qualitative observations to form a conclusion.

A combined experimental and theoretical process for evaluation was chosen because lossy image compression is an operation whose results depend on unpredictable input data. Either of the two image compression algorithms considered could be made to output images that subvert theoretical conclusions by engineering the input data to suit (or not suit) that algorithm. An experimental approach, in which results for common types of images are used to draw conclusions for ‘most’ images, is therefore critical to supplement the theoretical conclusions.