Latency explained
If you want to build high-performance systems, it's crucial to understand latency. Read on to find out how the latencies of various events translate to a more relatable, human timescale, and what you can do to improve latency.
Latency is so important to master that Gmail's creator, Paul Buchheit, even coined a rule about it: the 100ms rule.
But more on that later. Let’s first start with the basics.
What is latency?
Latency is the time it takes for a signal or data packet to travel from one point to another in a network or system. It's essentially a delay, measured in milliseconds (ms) or even microseconds (µs), and it significantly impacts system performance and user experience.
When we talk about latency, we can split it into several types:
Network Latency: The time taken for a data packet to travel across a network from sender to receiver.
Disk Latency: The time it takes for a disk to complete a read or write operation.
Processing Latency: The time it takes for a system to process a given input.
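To make this concrete, here's a minimal sketch of how you might measure processing latency in Python. The process function is a hypothetical workload; time.perf_counter is used because it's a monotonic, high-resolution clock.

```python
import time

def process(data: list[int]) -> int:
    """A hypothetical workload whose latency we want to measure."""
    return sum(x * x for x in data)

data = list(range(1_000_000))

start = time.perf_counter()
process(data)
elapsed = time.perf_counter() - start

# perf_counter is monotonic and high-resolution, so the difference
# is the processing latency of this single call.
print(f"processing latency: {elapsed * 1000:.2f} ms")
```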
Now circling back to the 100ms rule.
It states that every interaction should be quicker than 100ms.
Why?
Because 100ms is the threshold “where interactions feel instantaneous.”
But how fast is 100ms really?
The chart illustrates various computer and network events with their latencies, presented both in actual time (nanoseconds to seconds) and in a scaled, more relatable version.
While the actual latencies might seem incredibly short, the scaled version puts their relative impact into perspective.
Notes:
Actual Latency: Given in nanoseconds (ns), microseconds (µs), milliseconds (ms), or seconds (s); the real time the event takes.
Scaled Latency: A metaphorical translation to a human-understandable timescale, showing how we might perceive these latencies if they were stretched out.
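To see where a scaled column like this comes from, here's a small sketch that stretches one real nanosecond into one perceived second. The figures are the commonly cited ballpark numbers in the spirit of the well-known "latency numbers every programmer should know" list, not measurements from any specific machine.

```python
# Rescale well-known latency figures so that 1 ns of real time
# maps to 1 s of "human" time. The numbers are ballpark figures;
# real values vary by hardware and network conditions.

ACTUAL_NS = {
    "CPU cycle": 0.3,
    "L1 cache access": 1,
    "Main memory access": 100,
    "SSD random read": 150_000,
    "HDD seek": 10_000_000,
    "Round trip within a datacenter": 500_000,
    "Transatlantic round trip": 150_000_000,
}

def humanize(seconds: float) -> str:
    """Render a duration using the largest sensible unit."""
    for unit, size in (("years", 365 * 86_400), ("days", 86_400),
                       ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:,.1f} {unit}"
    return f"{seconds:,.1f} seconds"

for event, ns in ACTUAL_NS.items():
    scaled_seconds = ns  # each real nanosecond becomes one perceived second
    print(f"{event:32s} {ns:>15,.1f} ns  ->  {humanize(scaled_seconds)}")
```

On this scale, a main-memory access becomes a couple of minutes, an HDD seek becomes months, and a transatlantic round trip stretches to years.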
Key Events and Their Latencies
CPU Cycle: The basis of all computation. Lower cycle times mean faster processing, but understanding memory access latency is crucial for optimizing performance.
Cache Access: Different levels of cache have varying access speeds. Optimizing which data is stored in each can vastly improve performance.
Memory Access: Accessing RAM is significantly slower than accessing CPU cache. Knowing this helps in designing systems that minimize memory access bottlenecks.
Disk I/O: Both SSDs and HDDs have latencies orders of magnitude higher than memory access. Systems designed to minimize disk access benefit greatly.
Network Requests: The latency involved in sending data over a network can be unpredictable and varies greatly. Efficient network protocols and data caching strategies are vital.
Reboots and Timeouts: The longest latencies come from system reboots and timeouts; on the scaled timeline these stretch to hundreds or even thousands of years, underscoring the relative slowness of these operations in the digital realm.
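As a rough way to feel these gaps yourself, here's a hedged micro-benchmark sketch comparing an in-memory lookup, a disk read, and a network request. The file name and URL are placeholders, and OS page caching will flatter the disk number on repeated runs; the point is the orders-of-magnitude spread, not the exact values.

```python
import os
import time
import urllib.request

def timed(label: str, fn, repeat: int = 5) -> None:
    """Run fn several times and report the best (lowest) latency."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    print(f"{label:25s} {best * 1e6:12,.1f} µs")

data = {i: i * i for i in range(1_000_000)}   # in-memory structure

with open("scratch.bin", "wb") as f:          # scratch file for the disk test
    f.write(os.urandom(1024 * 1024))

timed("memory (dict lookup)", lambda: data[123_456])
timed("disk (1 MiB read)", lambda: open("scratch.bin", "rb").read())
timed("network (HTTP GET)",
      lambda: urllib.request.urlopen("https://example.com").read(),
      repeat=2)
```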
Why Latency Matters
Latency is crucial across all tech sectors. Here are some examples:
Cloud Computing and Data Centers
Amazon found that every 100ms delay in web page loading time can lead to a 1% loss in sales. This means that for a company like Amazon, with sales of $514 billion in 2022, a mere 100ms delay could potentially cost around $5.14 billion in sales annually.
Online Gaming
Riot Games reported that when they improved the latency for "League of Legends" players in North America by moving servers, there was a notable increase in player engagement. Players in affected regions played up to 7% more games when they experienced a 10-30ms improvement in latency.
Streaming Services
Netflix's research revealed that a delay of 2 seconds in video start-up time increased the chance of abandonment by 6%. Moreover, every additional 0.5 seconds of buffering reduces video starts on the platform by 1%.
Financial Trading
In the world of high-frequency trading, a 1 millisecond advantage in trading applications can be worth $100 million a year to a major brokerage firm, as reported by TABB Group.
Healthcare and Telemedicine
According to a study by the American Telemedicine Association, the use of telestroke services (rapid response video consultations for stroke victims) in rural areas, where quick response is critical, resulted in as much as a 25% decrease in patient disability. This is highly dependent on low-latency connections for real-time video consultations.
Automotive and Transport
For autonomous vehicles, a study suggested that reducing latency from 100ms to 10ms in vehicle communication systems can decrease collision rates by 20-30%. This could potentially save thousands of lives considering there were an estimated 38,800 traffic fatalities in the US alone in 2019.
Virtual and Augmented Reality
A study conducted by Nvidia found that reducing latency from 50ms to 20ms in VR systems significantly reduced user discomfort, with reports of nausea dropping by over 50%. This can lead to longer, more engaging VR experiences and higher user retention rates.
How can you mitigate latency?
Profiling and Monitoring: Regularly profile and monitor applications to understand where latencies are occurring.
Caching Strategies: Implement effective caching to reduce the need to access slower storage (see the caching sketch after this list).
Asynchronous Programming: Use asynchronous operations to prevent the application from blocking during long-running operations (see the asyncio sketch after this list).
Optimize Data Transfer: Minimize data sent over the network and use efficient serialization methods.
Choose the Right Tools: Use databases, libraries, and frameworks known for performance and suited for your specific needs.
Optimize Code: Ensure that your application code is efficient and not causing unnecessary delays.
Upgrade Hardware: Use faster processors, more RAM, or SSDs instead of HDDs to reduce processing and disk latency. Yes, that translates to faster resources in the cloud.
Content Delivery Networks (CDN): For web applications, use CDNs to cache content closer to users, reducing network latency.
Load Balancing: Distribute traffic evenly across servers to avoid overloading any single resource.
Network Optimization: Use network optimization techniques like compression or faster protocols to reduce data transmission times (see the compression sketch after this list).
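For the caching strategy, here's a minimal sketch using Python's functools.lru_cache; load_profile is a hypothetical stand-in for a slow database or network call.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def load_profile(user_id: int) -> dict:
    """Hypothetical slow lookup; the decorator memoizes its results."""
    time.sleep(0.05)  # simulate ~50 ms of database/network latency
    return {"id": user_id, "name": f"user-{user_id}"}

start = time.perf_counter()
load_profile(42)  # cold call: pays the full 50 ms
print(f"cache miss: {(time.perf_counter() - start) * 1000:.1f} ms")

start = time.perf_counter()
load_profile(42)  # warm call: served from memory in microseconds
print(f"cache hit:  {(time.perf_counter() - start) * 1000:.3f} ms")
```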
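For asynchronous programming, here's a minimal asyncio sketch; fetch is a placeholder for a real network call. Because the three awaits overlap, total latency is roughly the slowest call (~0.3 s) rather than the sum (~0.6 s) a sequential version would pay.

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    """Placeholder for a network call; asyncio.sleep yields to other tasks."""
    await asyncio.sleep(delay)
    return name

async def main() -> None:
    start = time.perf_counter()
    results = await asyncio.gather(   # all three "requests" run concurrently
        fetch("users", 0.3), fetch("orders", 0.2), fetch("prices", 0.1)
    )
    print(results, f"took {time.perf_counter() - start:.2f} s")

asyncio.run(main())
```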
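And for network optimization, a quick sketch of the compression trade-off using the standard gzip module; the payload is a hypothetical JSON API response.

```python
import gzip
import json

# Hypothetical payload: a repetitive JSON document, like a large API response.
payload = json.dumps([{"id": i, "status": "ok"} for i in range(10_000)]).encode()

compressed = gzip.compress(payload)
print(f"{len(payload):,} B -> {len(compressed):,} B "
      f"({len(compressed) / len(payload):.0%} of original)")
# Fewer bytes on the wire means less transmission time; the cost is
# CPU time spent compressing and decompressing on each end.
```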
Conclusion
While latency is inherent in all systems, understanding and reducing it can greatly enhance performance and user satisfaction. By recognizing the sources of latency and implementing strategies to mitigate it, you can ensure that your system remains efficient, responsive, and scalable.
Whether you're a network administrator, a software developer, or a system architect, managing latency effectively is a key part of optimizing the performance and reliability of your operations.
P.S. If you enjoyed this post, share it with your friends and colleagues.