DinhDev

Many developers are familiar with statements like "Node.js is single-threaded" and "it handles concurrent tasks through the event loop." However, the detailed workings of the event loop, and why Node only needs a single thread to execute JavaScript, are less widely understood. Even after studying this topic, I think many developers still struggle to explain it clearly in interviews.

That's why I've researched this topic in depth to give you as detailed an explanation as possible. In this article, we'll cover:

Understanding the Event Loop Mechanism

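The Event Loop is the heart of Node.js. In reality, Node.js isn't simply "single-threaded" as many believe. Although your JavaScript code runs on a single main thread, behind the scenes Node.js (through libuv) uses a pool of worker threads for some I/O tasks, such as file system operations and DNS lookups. Network I/O and timers, on the other hand, are handled by the event loop itself with help from the operating system.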

We'll dive deep into the Event Loop architecture and its phases: timers, pending callbacks, idle & prepare, poll, check, and close callbacks. You'll understand why Node.js can handle thousands of concurrent connections without creating a new thread for each one, as traditional thread-per-connection servers do.

The Role of libuv, Thread Pool, and epoll API

Behind Node.js's "magic" is the libuv library - a cross-platform library focused on asynchronous I/O. Libuv provides Node.js with a thread pool to handle time-consuming tasks like file reading/writing, DNS lookups, etc. Libuv also leverages the epoll API (on Linux) - a highly efficient I/O multiplexing mechanism that allows a process to monitor multiple file descriptors simultaneously.

Let's start with the initial phase.

Initial Phase

This is the first phase, occurring before the event loop initialization. This phase runs exactly once.

Node executes the entire source code sequentially from top to bottom. All callbacks from I/O or timers are registered in their respective phases and won't run immediately.

If you load modules using require(), they will be loaded synchronously in this phase.
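A minimal sketch of the initial phase, assuming a recent Node.js version and a TypeScript runner such as tsx: the timer and immediate callbacks are only registered while the top-level code runs, so the synchronous lines finish first. The `log` array exists only for this illustration.

```typescript
// All of this top-level code runs in the initial phase, before the loop starts.
const log: string[] = [];

log.push("start");

// These calls only *register* callbacks; nothing runs yet.
setTimeout(() => log.push("timeout"), 0); // will run in the timers phase
setImmediate(() => log.push("immediate")); // will run in the check phase

log.push("end");

// Still inside the initial phase: neither callback has run.
console.log(log); // [ 'start', 'end' ]
```

Once the loop starts, both callbacks run. When they are scheduled from the main module like this, the relative order of the timeout and the immediate is not guaranteed, because it depends on how quickly the loop reaches the timer phase.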

Next, the event loop is initialized, and the main thread's journey officially begins. The event loop consists of 6 main phases that repeat continuously until there is no more work to do, that is, no active handles (timers, sockets, servers) and no pending requests.
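To make the phase order concrete, here is a toy model in TypeScript. `ToyEventLoop` and its methods are hypothetical names invented for this illustration, and it is not how libuv is implemented (the real phases poll the OS, compute timeouts, and so on). It only captures the shape: each phase drains its own queue in a fixed order, and the loop keeps iterating while any queue still has work.

```typescript
type Phase = "timers" | "pending" | "idle/prepare" | "poll" | "check" | "close";

const PHASES: Phase[] = ["timers", "pending", "idle/prepare", "poll", "check", "close"];

class ToyEventLoop {
  // One FIFO queue of callbacks per phase.
  private queues: Record<Phase, Array<() => void>> = {
    timers: [],
    pending: [],
    "idle/prepare": [],
    poll: [],
    check: [],
    close: [],
  };
  readonly trace: string[] = [];

  schedule(phase: Phase, label: string, fn: () => void = () => {}): void {
    this.queues[phase].push(() => {
      this.trace.push(`${phase}:${label}`);
      fn();
    });
  }

  private hasWork(): boolean {
    return PHASES.some((phase) => this.queues[phase].length > 0);
  }

  // Runs iterations until every queue is empty; returns the iteration count.
  run(): number {
    let iterations = 0;
    while (this.hasWork()) {
      iterations++;
      for (const phase of PHASES) {
        // Take the callbacks queued when the phase starts; anything a callback
        // schedules for this same phase waits for the next iteration.
        const batch = this.queues[phase].splice(0);
        for (const cb of batch) cb();
      }
    }
    return iterations;
  }
}

const loop = new ToyEventLoop();
loop.schedule("timers", "t1");
loop.schedule("poll", "fileRead", () => loop.schedule("check", "afterRead"));
loop.schedule("close", "socketClosed");
const iterations = loop.run();
console.log(iterations, loop.trace);
```

Because the check phase comes after poll, a check callback scheduled from a poll callback still runs in the same iteration, while a check callback scheduled from another check callback waits for the next one.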

1. First Phase: Timer

As its name suggests, the timer phase executes timer callbacks (setTimeout and setInterval).

Timers don't need a separate thread. Node keeps pending timers on the main thread in a priority queue ordered by expiration time (libuv itself stores its timers in a min-heap), so the timer that will expire soonest is always at the front.

Each time the event loop reaches the timer phase, it compares the current time against that queue and runs the callbacks of every timer that has expired. If none have, it moves on to the next phase. This is why the delay you pass to setTimeout is a minimum, not an exact time: a callback can only run once the main thread gets back to the timer phase. The time left until the nearest timer expires is also what the poll phase later uses to decide how long it may wait.
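The following sketch, assuming a recent Node.js version, shows both points: timers run in order of expiration rather than registration, and a busy main thread delays them. The 50 ms busy-wait is an arbitrary value chosen for the demonstration.

```typescript
const order: string[] = [];
const start = Date.now();
let observedDelay = -1;

// Registered in a different order than they will expire.
setTimeout(() => order.push("20ms"), 20);
setTimeout(() => order.push("0ms"), 0);
setTimeout(() => {
  observedDelay = Date.now() - start;
  order.push("10ms");
}, 10);

// Keep the main thread busy for ~50 ms. No timer callback can run
// meanwhile, because the loop never gets to the timer phase.
while (Date.now() - start < 50) {
  // busy wait
}

process.on("exit", () => {
  // Expected: expiration order (0ms -> 10ms -> 20ms), and the 10 ms timer
  // reports a delay of at least 50 ms.
  console.log(order.join(" -> "), `| 10ms timer ran after ${observedDelay} ms`);
});
```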

2. Pending Callbacks Phase

This phase executes I/O callbacks that were deferred from the previous loop iteration, mostly error reports from system operations, such as:

TCP error callbacks
UDP error callbacks
Other error callbacks from the system

Example of a pending callback:

A network operation is executed in the poll phase
It fails → its callback is scheduled for the pending callbacks phase
On the next loop iteration, the pending callbacks phase picks up the error and runs the callback (to retry, for instance)
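The three steps above can be modeled as follows. This is a deliberately simplified, hypothetical sketch (`pollPhase` and `pendingPhase` are not Node or libuv APIs) that only shows the timing: a failure detected in one iteration's poll phase is handled in the next iteration's pending callbacks phase.

```typescript
type Callback = () => void;

const trace: string[] = [];
let pendingQueue: Callback[] = [];

// Poll phase of the toy model: try a network write. On failure, the error
// callback is not run immediately but deferred to the next iteration.
function pollPhase(iteration: number, writeSucceeds: boolean): void {
  trace.push(`iteration ${iteration}: poll, write ${writeSucceeds ? "ok" : "failed"}`);
  if (!writeSucceeds) {
    pendingQueue.push(() =>
      trace.push(`iteration ${iteration + 1}: pending callbacks, handle error and retry`),
    );
  }
}

// Pending callbacks phase: run everything deferred by the previous iteration.
function pendingPhase(): void {
  const batch = pendingQueue;
  pendingQueue = [];
  for (const cb of batch) cb();
}

// Iteration 1: nothing is pending yet; the write fails during poll.
pendingPhase();
pollPhase(1, false);

// Iteration 2: the deferred error callback runs first, then the retry succeeds.
pendingPhase();
pollPhase(2, true);

console.log(trace.join("\n"));
```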

3. Idle - Prepare Phase

This phase allows Node to perform some of its background work and prepare for the next very important phase - the Poll phase.

This phase can be extended with C++ addons: native code can register libuv idle or prepare handles whose callbacks run here. Indeed, Node lets us write extensions in C++.

4. Poll Phase

The Poll phase processes callbacks for I/O operations and is a crucial phase of the event loop.

The two main tasks of the poll phase include:

Polling for I/O operations:

Listening for socket connections
Receiving new connections
Reading/writing files

Executing callbacks when data is ready from I/O operations, for example:

Successfully reading a file
Successfully connecting to a server
Successfully receiving data from a connection
(And many other tasks)

Another task of the poll phase is handling dynamic imports. Imports need to read files from the system, which Node treats as I/O operations.

Node.js I/O Operations Handling Mechanism

Handling file operations

Under the hood, Node uses libuv to handle these operations. File system operations run on libuv's thread pool (4 threads by default, configurable with the UV_THREADPOOL_SIZE environment variable), so they never block the main thread.

When this thread finishes processing a file, the callback is queued, and the main thread executes it in the poll phase.

Similarly, if we create a stream to read a file, each time libuv finishes reading a chunk, it queues a callback along with that chunk, and this callback is also executed in the poll phase.
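Here is a sketch, assuming a recent Node.js version, that reads the same small file both ways. The temporary file name is a made-up example, and the 16-byte highWaterMark is chosen only to force several chunks. The synchronous line runs first; the readFile callback and the stream's chunk callbacks arrive later, once the reads done off the main thread have completed.

```typescript
import { createReadStream, readFile, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical demo file: 100 bytes, created synchronously.
const path = join(tmpdir(), `event-loop-demo-${process.pid}.txt`);
writeFileSync(path, "x".repeat(100));
process.on("exit", () => unlinkSync(path)); // clean up when the process ends

const events: string[] = [];
let streamedBytes = 0;
let chunks = 0;

// Whole-file read: the reading happens on the thread pool, and the
// callback is executed later on the main thread.
readFile(path, "utf8", (err, data) => {
  if (err) throw err;
  events.push(`readFile callback: ${data.length} bytes`);
});

// Streamed read: each chunk (at most 16 bytes here) gets its own callback.
createReadStream(path, { highWaterMark: 16 })
  .on("data", (chunk) => {
    chunks++;
    streamedBytes += chunk.length;
  })
  .on("end", () => events.push(`stream end: ${streamedBytes} bytes in ${chunks} chunks`));

// This line runs before any of the callbacks above.
events.push("main thread keeps running");
```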

Handling network operations

With ordinary blocking sockets, a read waits until data arrives on the connection. If we processed connections sequentially, one at a time, a single slow client would block all the others.

This is where Node, through libuv, relies on epoll (Linux), kqueue (macOS), and IOCP (Windows) to handle network operations asynchronously.

With epoll on Linux, Node registers each connection's file descriptor with epoll (epoll_ctl) and then waits for events with epoll_wait. When a connection has data ready, the Linux kernel reports it through epoll_wait, and Node runs that connection's callback.
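To see one thread serving several connections, here is a small sketch assuming a recent Node.js version: an echo server and three clients in the same process, all on 127.0.0.1 with an OS-assigned port (port 0). There is no thread per connection; each socket's callback runs only when libuv reports that socket as ready.

```typescript
import * as net from "node:net";

const CLIENTS = 3;
const replies: string[] = [];

// allowHalfOpen: keep our side writable after the client finishes sending.
const server = net.createServer({ allowHalfOpen: true }, (socket) => {
  let received = "";
  socket.on("data", (chunk) => {
    received += chunk.toString(); // runs only when the OS poller reports data
  });
  socket.on("end", () => socket.end(`echo:${received}`)); // reply, then close our side
});

server.listen(0, "127.0.0.1", () => {
  const { port } = server.address() as net.AddressInfo;
  let finished = 0;

  for (let i = 0; i < CLIENTS; i++) {
    // Send a message and signal that we are done writing.
    const client = net.connect(port, "127.0.0.1", () => client.end(`client${i}`));
    let reply = "";
    client.on("data", (chunk) => {
      reply += chunk.toString();
    });
    client.on("end", () => {
      replies.push(reply);
      // Once every client is served, close the server so the loop can exit.
      if (++finished === CLIENTS) server.close();
    });
  }
});

process.on("exit", () => console.log(replies.slice().sort()));
```

`allowHalfOpen: true` lets the server keep writing after the client has finished sending, which is why the reply can be sent from the server socket's 'end' handler.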

Interestingly, epoll_wait accepts a timeout. For example, it can be told to return after 1 second if no connection has data, so that Node can move on to the next phase.

In practice, Node can block in the poll phase while waiting for data from connections. How long it waits depends on what else is scheduled:

If callbacks are queued with setImmediate, Node doesn't wait at all, so the check phase can run right away.
If a timer is scheduled, Node gives epoll_wait a timeout equal to the time left until the nearest timer expires, then moves on to the next phase.
If there are neither timers nor setImmediate callbacks, Node blocks in this phase until a connection has data, a new connection arrives, etc.
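The rules above can be written as a small function. This is a simplified model with hypothetical names (`LoopState`, `pollTimeout`), not libuv's actual code, which also accounts for things like pending and closing handles. The return value plays the role of the epoll_wait timeout: 0 means don't block, -1 means block until an I/O event arrives, and any other value is a wait in milliseconds.

```typescript
interface LoopState {
  now: number; // current loop time, in ms
  nextTimerAt: number | null; // expiration time of the nearest timer, if any
  pendingImmediates: number; // callbacks queued with setImmediate
  activeHandles: number; // open sockets, servers, etc.
}

function pollTimeout(state: LoopState): number {
  // Work is already waiting in the check phase: don't block at all.
  if (state.pendingImmediates > 0) return 0;
  // A timer is scheduled: wait at most until it expires (never negative).
  if (state.nextTimerAt !== null) return Math.max(0, state.nextTimerAt - state.now);
  // Only I/O can wake us up: block until something happens, unless
  // nothing is left to wait for, in which case the loop is about to exit.
  return state.activeHandles > 0 ? -1 : 0;
}

console.log(pollTimeout({ now: 1000, nextTimerAt: 1250, pendingImmediates: 0, activeHandles: 2 })); // 250
console.log(pollTimeout({ now: 1000, nextTimerAt: 1250, pendingImmediates: 1, activeHandles: 2 })); // 0
console.log(pollTimeout({ now: 1000, nextTimerAt: null, pendingImmediates: 0, activeHandles: 1 })); // -1
```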

Summary of poll phase operation:

It's the phase where most tasks are executed
Executes I/O operations and callbacks
It calculates how long to wait before moving to the next phase

5. Check Phase

This phase always runs right after the poll phase, and it gives developers a place to run their own callbacks: everything registered with setImmediate() executes here.
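A consequence of this ordering, assuming a recent Node.js version: inside an I/O callback (which runs in the poll phase), a setImmediate callback always runs before a setTimeout(..., 0) callback, because the check phase comes next in the same iteration while timers wait for the next one. fs.stat on the current directory is used here only as a convenient I/O operation.

```typescript
import { stat } from "node:fs";

const order: string[] = [];

stat(process.cwd(), () => {
  // We are now in the poll phase.
  setTimeout(() => order.push("setTimeout"), 0); // timers phase, next iteration
  setImmediate(() => order.push("setImmediate")); // check phase, this iteration
});

process.on("exit", () => console.log(order)); // [ 'setImmediate', 'setTimeout' ]
```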

If an operation in the poll phase is too large and you want to break it into smaller pieces, you can schedule the remaining work with setImmediate() so it continues in the check phase.
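For example, a long computation can be split into slices, with setImmediate scheduling each next slice so that other callbacks get a chance to run in between. `sumInChunks` is a hypothetical helper written for this sketch, and the slice size is arbitrary.

```typescript
function sumInChunks(
  values: number[],
  chunkSize: number,
  done: (total: number, slices: number) => void,
): void {
  let index = 0;
  let total = 0;
  let slices = 0;

  function processSlice(): void {
    const end = Math.min(index + chunkSize, values.length);
    for (; index < end; index++) total += values[index];
    slices++;
    if (index < values.length) {
      // Yield: the next slice runs in a later check phase, so timers and
      // I/O callbacks can run in between.
      setImmediate(processSlice);
    } else {
      done(total, slices);
    }
  }

  processSlice();
}

const values = Array.from({ length: 1_000_000 }, (_, i) => i % 10);
let finalTotal = -1;
let finalSlices = 0;

sumInChunks(values, 100_000, (total, slices) => {
  finalTotal = total;
  finalSlices = slices;
  console.log(`total=${total} in ${slices} slices`); // total=4500000 in 10 slices
});
```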

6. Final Phase: Close Callback

This phase executes callbacks that run when a handle, such as a socket or a server, is closed. Their main purpose is to clean up resources.

A separate phase is needed for this activity because closing a connection also takes time. For example, TCP closes a connection with an exchange of FIN and ACK segments in each direction (FIN, ACK, FIN, ACK), and events along the way, such as the peer's FIN arriving as end-of-stream, are picked up in the poll phase. However, the callbacks that fire once the connection is fully closed are executed in the Close Callback phase.

The second reason is that close events have low priority, making them perfectly suitable for execution at the end of the event loop.
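In application code, this is where cleanup belongs. A minimal sketch, assuming a recent Node.js version: a server tracks open connections in a set and removes each one in the socket's 'close' handler, which fires once the socket is fully closed. `openSessions` is a hypothetical stand-in for real per-connection resources.

```typescript
import * as net from "node:net";

const openSessions = new Set<net.Socket>(); // hypothetical per-connection resources
const closeLog: string[] = [];

const server = net.createServer((socket) => {
  openSessions.add(socket);
  socket.on("close", () => {
    // The socket is fully closed: release whatever was tied to it.
    openSessions.delete(socket);
    closeLog.push("server side closed");
    server.close(); // one-connection demo: stop listening afterwards
  });
  socket.end("bye"); // send a message and start closing our side
});

server.listen(0, "127.0.0.1", () => {
  const { port } = server.address() as net.AddressInfo;
  const client = net.connect(port, "127.0.0.1");
  client.resume(); // consume incoming data so the end of the stream is reached
  client.on("close", () => closeLog.push("client side closed"));
});
```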

Summary

I hope that through this article, you now have a deeper understanding of how Node.js handles I/O operations, timers, and runs multiple tasks concurrently.

I believe there are other concepts in Node.js that you might be curious about too, such as promises, async/await, or garbage collection. I'll address these in another article, along with their positions in the event loop.

See you next time!