I was at the airport recently, watching a check-in queue do what check-in queues do, which is mostly not move. There were four agents working, but each had their own little line in front of them. The lines were moving at completely different speeds. Someone who joined a different queue was checked in and gone before I had taken three steps forward. By the time I got to the counter I was curious. Why is it set up this way? Could I, the smug person standing in line with nothing to do, have designed it better?
It turns out the answer is yes, almost certainly. There is a whole field of maths called queueing theory that has been thinking about this since the early 1900s, originally for telephone exchanges, and the punchlines are surprisingly useful. This post is a short tour. By the end you’ll know why one long line is almost always better than several short ones, why a system that runs at 90% capacity is dramatically worse than one at 80%, and why a small amount of randomness in service times causes much more pain than you’d guess.
There are a few short interactive toys along the way; drag the sliders and see what happens.
Anatomy of a queue
Before we can analyse anything, we need to name the parts. Queueing theorists use a compact notation: A / S / s / K / N.
- A is the type of arrival process. How customers show up.
- S is the type of service process. How long each one takes.
- s is the number of servers.
- K is the max number of customers allowed in the system, if there’s a cap.
- N is the size of the customer population, if it’s small enough to matter.
The most famous queue, the one everyone starts with, is the M/M/1:
- Arrivals are Markovian, which is a fancy way of saying random and memoryless. Customers arrive at some average rate, but each one is independent of the last.
- Service times are also Markovian. Each customer takes a random amount of time, exponentially distributed.
- There is 1 server.
- No size limit, infinite population (those last two letters drop off when they’re infinity).
“Memoryless” deserves a footnote. If your average inter-arrival time is 4 minutes and no one has arrived for 3 minutes, your expected wait for the next person is still 4 minutes, not 1. Same goes for service. The fact that the person ahead of you has been at the counter for ages tells you nothing about how much longer they will take. This feels wrong, because in everyday life a long wait seems “due” to end, but it is exactly right when inter-arrival and service times are exponentially distributed.
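The 4-minutes-not-1 claim is easy to check empirically. Here is a minimal sketch, assuming Python's standard `random` module; the function name `mean_remaining_wait` and its parameters are made up for illustration. It draws many exponential gaps with a mean of 4 minutes, keeps only the ones that lasted longer than 3 minutes, and averages what was left of them.

```python
import random


def mean_remaining_wait(mean_gap=4.0, already_waited=3.0, samples=200_000, seed=1):
    """Average extra wait, given that nobody has arrived for `already_waited` minutes."""
    rng = random.Random(seed)
    # expovariate takes a rate, so a mean gap of 4 minutes is a rate of 1/4 per minute.
    gaps = [rng.expovariate(1.0 / mean_gap) for _ in range(samples)]
    # Keep only the gaps that outlasted the time already waited,
    # and measure how much longer each of them went on.
    leftovers = [gap - already_waited for gap in gaps if gap > already_waited]
    return sum(leftovers) / len(leftovers)


if __name__ == "__main__":
    print(f"fresh wait:        {mean_remaining_wait(already_waited=0.0):.2f} min")
    print(f"after 3 min of it: {mean_remaining_wait(already_waited=3.0):.2f} min")
```

Both numbers should land close to 4, up to sampling noise. Swap the exponential for a distribution with a hard upper limit and the second number drops, which is the "due to end" intuition from everyday life.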
Here’s an M/M/1 in motion. Try it out.
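The same queue also fits in a few lines of code. This is a rough sketch rather than a proper simulator: the name `simulate_mm1` and its defaults are assumptions, and it only tracks how long each customer waits before reaching the server. It relies on the Lindley recursion, where each customer inherits whatever backlog the previous one left behind.

```python
import random


def simulate_mm1(arrival_rate, service_rate, customers=200_000, seed=42):
    """Mean time a customer waits before service starts in a single-server queue."""
    rng = random.Random(seed)
    wait = 0.0        # waiting time of the current customer (the first one waits 0)
    total_wait = 0.0
    for _ in range(customers):
        total_wait += wait
        service = rng.expovariate(service_rate)  # this customer's service time
        gap = rng.expovariate(arrival_rate)      # time until the next customer arrives
        # Lindley recursion: the next customer waits for the leftover backlog, if any.
        wait = max(0.0, wait + service - gap)
    return total_wait / customers


if __name__ == "__main__":
    for arrival_rate in (0.5, 0.8, 0.9):
        wait = simulate_mm1(arrival_rate, service_rate=1.0)
        print(f"arrival rate {arrival_rate:.1f}: mean wait {wait:.2f}")
```

Time is in whatever unit the rates use. Nudging the arrival rate toward the service rate makes the mean wait climb quickly.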
The two knobs (and the cliff)
Every queue has two knobs. The arrival rate λ (lambda) is how fast customers show up. The service rate μ (mu) is how fast a server can clear them. The ratio of these, with the number of servers thrown in, is the most important number in queueing theory:
ρ = λ / (s · μ)
ρ is the utilisation: the fraction of your service capacity being used. If ρ = 0.5, your servers are busy half the time. If ρ = 0.9, they are slammed. If ρ ≥ 1, customers arrive at least as fast as you can clear them and the line grows without bound. (At a real airport, the line stops growing because the airport closes, or because customers give up, but the maths says: not pretty.)
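As a sanity check on the definition, here is a small sketch of the utilisation formula. The function names `utilisation` and `is_stable` are made up for this example, and the rates in the usage lines are illustrative.

```python
def utilisation(arrival_rate, service_rate, servers=1):
    """rho = lambda / (s * mu): the fraction of total service capacity in use."""
    if arrival_rate < 0 or service_rate <= 0 or servers < 1:
        raise ValueError("need arrival_rate >= 0, service_rate > 0 and servers >= 1")
    return arrival_rate / (servers * service_rate)


def is_stable(arrival_rate, service_rate, servers=1):
    """True when the queue can keep up in the long run, i.e. rho < 1."""
    return utilisation(arrival_rate, service_rate, servers) < 1.0


if __name__ == "__main__":
    # Illustrative numbers: 2.4 customers per minute, agents who each clear 1 per minute.
    for servers in (2, 3, 4):
        rho = utilisation(2.4, 1.0, servers)
        print(f"{servers} agents: rho = {rho:.2f}, stable = {is_stable(2.4, 1.0, servers)}")
```

With two agents the same arrival rate gives ρ above 1, so no amount of patience clears the line; adding a third agent is what makes it stable.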
Here’s the part that surprised me. As you push ρ toward 1, the expected number of customers in the system does not grow linearly. It grows on a curve that gets vertical very fast. The M/M/1 formula is short:
L = ρ / (1 − ρ)
Plug in some numbers. At ρ = 0.5, L = 1; at ρ = 0.8, L = 4; at ρ = 0.9, L = 9; at ρ = 0.95, L = 19. The cost of running near full capacity is brutal and almost everyone underestimates it.
A real-world consequence worth burning in: a system running at 90% feels qualitatively different from one running at 80%, even though it’s only ten percentage points more loaded. The average number of customers in the system more than doubles, from 4 to 9.
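To see the cliff for yourself, the M/M/1 formula L = ρ / (1 − ρ) can be tabulated in a few lines. A minimal sketch; the function name `mm1_customers_in_system` is an assumption for this example.

```python
def mm1_customers_in_system(rho):
    """Expected number of customers in an M/M/1 system: L = rho / (1 - rho)."""
    if not 0 <= rho < 1:
        raise ValueError("M/M/1 only has a steady state for 0 <= rho < 1")
    return rho / (1 - rho)


if __name__ == "__main__":
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"rho = {rho:.2f}  ->  L = {mm1_customers_in_system(rho):6.1f}")
```

The last step is the telling one: going from 95% to 99% utilisation adds far more customers than going from 50% to 95% did.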
One line or many?
Back to my airline counter. Four agents, four lines, all moving at different speeds because the customers in them take different amounts of time. The maths of why this setup is worse than it could be is short.
Imagine a simplified version. You have three agents and customers arriving at a total rate of 2.4 per minute. Each agent can serve 1 customer per minute. Two layouts:
- Three separate lines, three agents. Each agent is its own M/M/1, with arrival rate 0.8 and service rate 1, so ρ = 0.8 per line. Expected waiting queue per agent: 3.2; expected total per agent: 4. Across all three lines: 9.6 waiting, 12 in the system.
- One big line, three agents. Now it’s an M/M/3 with total arrival rate 2.4 and total capacity 3, so ρ = 0.8 system-wide. Expected waiting queue: about 2.6 people. Expected total in the system: about 5.0, because about 2.4 are being served on average.
Pooling the queue cuts the average number of waiting customers by a factor of about 3.7 (9.6 versus roughly 2.6), even though nothing about the agents or the customers has changed.
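The numbers for both layouts can be reproduced with the standard Erlang C formula for an M/M/s queue. This is a sketch under that assumption; the function names `mms_waiting` and `mms_in_system` are made up for the example, and with one server the formula reduces to the M/M/1 case.

```python
from math import factorial


def mms_waiting(arrival_rate, service_rate, servers):
    """Expected number of customers waiting (not yet served) in an M/M/s queue."""
    offered = arrival_rate / service_rate  # offered load a = lambda / mu
    rho = offered / servers                # utilisation
    if rho >= 1:
        raise ValueError("unstable queue: rho must be below 1")
    # Unnormalised weight of the states where at least one server is free...
    some_free = sum(offered**k / factorial(k) for k in range(servers))
    # ...and of the states where every server is busy.
    all_busy = offered**servers / (factorial(servers) * (1 - rho))
    p_wait = all_busy / (some_free + all_busy)  # Erlang C: chance an arrival must wait
    return p_wait * rho / (1 - rho)


def mms_in_system(arrival_rate, service_rate, servers):
    """Expected number waiting plus the average number being served."""
    return mms_waiting(arrival_rate, service_rate, servers) + arrival_rate / service_rate


if __name__ == "__main__":
    separate = 3 * mms_waiting(0.8, 1.0, 1)  # three independent M/M/1 lines
    pooled = mms_waiting(2.4, 1.0, 3)        # one shared line feeding three agents
    print(f"separate lines: {separate:.1f} waiting")
    print(f"one big line:   {pooled:.1f} waiting")
    print(f"ratio:          {separate / pooled:.1f}x")
```

The only difference between the two calls is how the same total load is split across servers, which is the whole point of the comparison.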
The intuition: when each agent has their own line, an agent who happens to get a string of fast customers can be sitting idle while another agent is buried, wasting capacity. With one shared line, no agent is ever idle while someone is waiting. It is the single biggest practical insight in this whole field, and it’s why every well-designed bank, every TSA checkpoint, and every In-N-Out drive-thru uses a single feeder line.
(Some airlines still do not. Now you know.)
Why TSA gets this right
The TSA checkpoint is the canonical real-world example. With separate counters, each lane behaves like its own M/M/1, statistically independent of the others. With one feeder line, the same agents become servers in a shared M/M/s queue, with the same arrival rate, the same per-agent service rate, and the same number of agents. Only the topology of the queue has changed.
Separate lines cannot match a free agent with a customer waiting in another lane. An agent who just finished a quick customer is idle until the next person joins their lane, even if someone is waiting two metres away. With one feeder, every agent who frees up takes the head of the shared queue immediately, so no one is idle while anyone is waiting. For the numbers above (λ = 2.4, μ = 1, three agents), separate lines have about 9.6 people waiting on average; one feeder has about 2.6.
Three layouts side by side: random-pick separate lines, join-the-shortest, and the single feeder.
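The three layouts can also be compared with a small simulation. This is a rough sketch, not a validated model: the function name `simulate`, the layout labels, and the defaults are assumptions, customers never switch lines once they have joined one, and ties in the join-the-shortest layout go to the first line. It converts mean wait into people waiting using Little's law (number waiting = arrival rate × mean wait).

```python
import heapq
import random
from collections import deque


def simulate(layout, arrival_rate=2.4, service_rate=1.0, servers=3,
             customers=100_000, seed=7):
    """Mean time customers wait before service under one of three layouts.

    layout: "random" (pick a line at random), "shortest" (join the shortest
    line) or "feeder" (one shared line for all agents).
    """
    if layout not in ("random", "shortest", "feeder"):
        raise ValueError(f"unknown layout: {layout}")
    rng = random.Random(seed)         # arrivals and service times, same for every layout
    picker = random.Random(seed + 1)  # separate stream for random line choices
    lines = [deque() for _ in range(servers)]  # per line: departure times still pending
    free_at = [0.0] * servers                  # feeder: when each agent next frees up
    now = 0.0
    total_wait = 0.0
    for _ in range(customers):
        now += rng.expovariate(arrival_rate)
        service = rng.expovariate(service_rate)
        if layout == "feeder":
            # The head of the shared line goes to whichever agent frees up first.
            start = max(now, heapq.heappop(free_at))
            heapq.heappush(free_at, start + service)
        else:
            for line in lines:
                while line and line[0] <= now:  # drop customers who have already left
                    line.popleft()
            line = picker.choice(lines) if layout == "random" else min(lines, key=len)
            # Service starts when the last person in the chosen line departs.
            start = line[-1] if line else now
            line.append(start + service)
        total_wait += start - now
    return total_wait / customers


if __name__ == "__main__":
    for layout in ("random", "shortest", "feeder"):
        wait = simulate(layout)
        print(f"{layout:>8}: mean wait {wait:5.2f} min, about {2.4 * wait:4.1f} people waiting")
```

Each layout sees the same arrival times and the same service times, so any difference in the output comes from the topology alone. Expect the random-pick figure to be the noisiest of the three, since queues near capacity take a long time to average out.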
Variability is the silent killer
One more insight, and it might be the most counterintuitive one. Suppose I told you that two coffee shops have the exact same average service time of 5 minutes. The first one is consistent: every customer takes between 4 and 6 minutes. The second is wild: most customers take 1 minute, but every so often someone orders a flight of pour-overs and takes 25.
Same average, and both shops can keep up with the same arrival rate in the long run. But the queues in the second shop will be enormously longer.
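The maths comes from something called the Pollaczek-Khintchine formula, which says, for a single-server queue with Poisson arrivals:
L_q = ρ² · (1 + C²) / (2 · (1 − ρ))
Here L_q is the expected number of customers waiting, and C is the coefficient of variation of the service time (its standard deviation divided by its mean), so C² is the variance term.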
Translation: the more variable your service times, the longer your queues, even at the same average rate. A purely deterministic system (everyone takes exactly the same time) has half the expected waiting queue of one with exponentially distributed service times. And the exponential is itself only “moderately” variable. With heavy-tailed service times (a few customers who take forever), it can get much worse.
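To put rough numbers on the two coffee shops, here is a sketch using the standard form of the Pollaczek-Khintchine result for a single server with Poisson arrivals, L_q = ρ² · (1 + C²) / (2 · (1 − ρ)), where C is the standard deviation of the service time divided by its mean. Several things here are assumptions rather than facts from the story: a utilisation of 0.8 for both shops, a uniform 4-to-6-minute service time for the steady shop, and a 5/6 versus 1/6 split between 1-minute and 25-minute orders for the wild shop (chosen so the mean comes out at 5 minutes). The function names are made up too.

```python
def cv_squared(mean, second_moment):
    """Squared coefficient of variation: variance divided by mean squared."""
    return (second_moment - mean**2) / mean**2


def pk_waiting(rho, cv2):
    """Pollaczek-Khintchine expected number waiting in an M/G/1 queue."""
    if not 0 <= rho < 1:
        raise ValueError("need 0 <= rho < 1 for a steady state")
    return rho**2 * (1 + cv2) / (2 * (1 - rho))


if __name__ == "__main__":
    rho = 0.8  # assumed utilisation, the same for both shops

    # Steady shop, assumed uniform on [4, 6] minutes: mean 5, variance (6 - 4)^2 / 12.
    steady = cv_squared(5.0, 5.0**2 + (6 - 4) ** 2 / 12)

    # Wild shop, assumed 1 minute with probability 5/6 and 25 minutes with probability 1/6.
    wild_mean = 1 * 5 / 6 + 25 * 1 / 6
    wild = cv_squared(wild_mean, 1**2 * 5 / 6 + 25**2 * 1 / 6)

    for name, cv2 in (("deterministic", 0.0), ("steady shop", steady),
                      ("exponential", 1.0), ("wild shop", wild)):
        print(f"{name:>13}: C^2 = {cv2:5.2f}, waiting = {pk_waiting(rho, cv2):5.2f}")
```

Under these assumptions the wild shop's queue comes out roughly four times longer than the steady shop's, with identical average service time and identical load.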
Intuitively, one slow customer creates a wake of waiting customers behind them, and that wake takes a while to clear. Then another slow one comes along and does it again. The bad events have a long memory; the good ones (fast customers) don’t help nearly as much.
Back to the airport
So back to my airline counter. Knowing what I now know, here’s what was happening:
- They were running close to capacity. The line was long, which means ρ was high. On the hockey-stick curve of L = ρ / (1 − ρ), small bumps in load translate to big bumps in wait.
- They were not pooling. Four lines, four agents. The line next to me moved faster because that customer happened to be quick. Mine moved slowly because my customer happened to be slow. With one shared line, those imbalances would have cancelled out and everyone’s wait would have been shorter on average.
- Service was variable. Some customers checked one bag and were gone. Others had three bags, a question about their seat, and a pet carrier. Variance in service time makes queues longer at the same average rate.
The fix is obvious, and it’s the thing TSA figured out years ago: one feeder line that fans out to whichever agent is free next. It costs nothing extra and it would have shortened my wait by quite a bit. The airline still hasn’t done it.
If you take one thing from this post: when you’re designing or running any system that has waiting (call centres, support tickets, kitchen tickets, hospital triage, code review), the cheapest single improvement is almost always to pool the queue. The second cheapest is to reduce variability in how long the work takes. And the third is to back off ρ from 1, because you cannot run any line at the edge of its capacity and expect it to feel anything but miserable.
Next time you’re in line for too long, you have a new hobby: working out why. You’ll be amazed how often the answer is “they didn’t pool the queue.”