Async programming, or why is Node.js so popular?

Asynchronous programming is all the rage these days, and one of the most prevalent tools for it is Node.js.

But what is it, and why do we need it? First, every programming language, async or not, does its computing synchronously. Computing, the operation of generating output from a given input, is always performed synchronously. It can be spread across many functions or done all in one function; it doesn't matter. The "async" part refers to I/O. Data processing, once the data is present, is done synchronously. However, pulling data from the network, a file or a database (which usually also means the network) is an I/O operation, and this is what it's all about.
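
A minimal sketch of that split, assuming a throwaway temp file and a hypothetical `countWords` helper (neither comes from any real project): the read is the asynchronous part, and the counting is plain synchronous computation that runs once the data is present.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: pure computation, runs synchronously once data is present
function countWords (text) {
  return text.split(/\s+/).filter(Boolean).length
}

// Throwaway input file, created only to keep the sketch self-contained
const file = path.join(os.tmpdir(), `async-demo-${process.pid}.txt`)
fs.writeFileSync(file, 'data arrives later but is processed at once')

const order = []

// The I/O part: readFile returns immediately, the callback runs when data is ready
fs.readFile(file, 'utf8', (err, text) => {
  if (err) throw err
  order.push('data arrived')
  console.log('words:', countWords(text))
  fs.unlinkSync(file)
})

// This line runs before the callback above, because the read has not finished yet
order.push('read requested')
```

The program stays free to do other work between requesting the read and receiving the data, which is the whole point of the async model.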

From a single all-capable process to async programming

Very old programmers (like me) remember the "good" old DOS days. You had one process, and that process had all the system resources. There were no threads and no cores, only a single CPU (in most configurations), and there were no limitations on what your program could and couldn't access, including hardware ports. If you wrote a program, it had to scan the keyboard for user input, then go elsewhere to get input from a hardware device, process it, and go back to the user. Since the scanning took place all the time, a program like this would occupy up to 100% of the CPU while running.

Later on, around the mid-'90s, a whole new (and better) approach was employed: the system is responsible for scheduling the running jobs. The system decides how large a time slice every job gets, and switches to another job when that time slice ends.

Client-server suites back then

The client-server programs that appeared then quickly took advantage of the new model. The server listened on one thread and opened a new thread for every new connection. The OS then switched to the right thread, based on the input. This model worked very well for a few years, until the number of users grew and systems showed strange slowdowns.

The problem with the sync client/server

For most programs, the server waits for client input, then processes it and/or shares it with the other clients. The problem? Switching from one thread to another (this is called context switching) takes time. It also takes CPU power. And if you have a lot of clients (= a lot of threads), the switching might actually eat more CPU than the actual processing. So the load grew linearly up to a certain point and then far more steeply, since context switching (saving all the current thread's information and loading all the information for the next one) was now competing with the actual processing. Add limited memory to the mix, and you got paging on top of context switching. Performance? Horrible. This problem was spotted around 2001, and a few very similar solutions address it. On Linux, epoll was developed. On MS Windows, Microsoft offers I/O completion ports (IOCP). All share the same concept. Mac OS has a similar solution, but I have never used a Mac as a server, so I won't refer to that OS.

The solution

The solution was very simple. Since one core (one CPU at that time) can only perform one operation at any single point in time, having multiple threads running on it just causes context switching, which gets costly at large scale. Instead, the OS listens for I/O events and queues them. Once a thread is ready to deal with them, it pulls them from the queue. The solution was revolutionary but hard to implement. Many companies and projects continued to use the old model, even though the new model offered better scaling with no downsides as far as the computer goes: on a low load, both kinds of program operate with the same efficiency, and when the load grows, the async model has a serious advantage because almost no context switching occurs. However, besides a few high-profile services, not many users took advantage of this model. At that time only C/C++ developers could employ it, the model was not as clear and simple as the thread-per-client model, and new web tools like PHP were drawing programmers away from system programming. Later on, both C# and Java (java.nio) added support for this new model. Still, it wasn't that popular until NodeJS came along.
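
The queue idea can be sketched in a few lines. This is a toy model, not how the OS facilities are actually implemented: the `EventQueue` class and the event names are made up for illustration, and the "OS" side is simulated by calling `push` directly.

```javascript
// Toy model of "the OS queues I/O events, one thread pulls them".
// EventQueue and the event shapes are hypothetical, for illustration only.
class EventQueue {
  constructor () {
    this.events = []
    this.handlers = new Map()
  }

  // Register what to do when an event of a given type is pulled
  on (type, handler) {
    this.handlers.set(type, handler)
  }

  // In a real system the OS would do this when a socket becomes readable
  push (type, payload) {
    this.events.push({ type, payload })
  }

  // The single worker thread: pull events one by one until the queue is empty
  drain () {
    let handled = 0
    while (this.events.length > 0) {
      const { type, payload } = this.events.shift()
      const handler = this.handlers.get(type)
      if (handler) {
        handler(payload)
        handled++
      }
    }
    return handled
  }
}

const queue = new EventQueue()
const log = []
queue.on('data', (p) => log.push(`client ${p.client} sent "${p.text}"`))
queue.on('close', (p) => log.push(`client ${p.client} left`))

// Two clients, but no thread per client: their events just line up
queue.push('data', { client: 1, text: 'hi' })
queue.push('data', { client: 2, text: 'hello' })
queue.push('close', { client: 1 })

const handledCount = queue.drain()
console.log(handledCount, log)
```

Note that nothing here switches between threads. One loop handles every client in arrival order, so adding clients adds queue entries rather than threads.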

The promise of NodeJS

NodeJS brought async programming into the hands of the script programmer. Not only that, the language chosen for the job, JavaScript, was well known and async in nature. To set up a fully working (though not very functional) web server, you just need the following lines:

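const http = require('http')
const port = 3000

// Called once for every incoming HTTP request
const requestHandler = (request, response) => {
  console.log(request.url)
  response.end('Hello Node.js!')
}

const server = http.createServer(requestHandler)

// Startup failures (e.g. port already in use) arrive as an 'error' event,
// not as an argument to the listen callback
server.on('error', (err) => {
  console.log('oops... an error', err)
})

server.listen(port, () => {
  console.log(`server is listening on ${port}`)
})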

I'm not even going to try to paste the C++ code required to achieve the same task. JavaScript is not only easy, it's also the language used for client-side scripting in browsers. Not only do many programmers already know it, it also means you can now use one language to write both sides. And you don't have to handle the asynchronicity yourself: you just use callbacks (these days many syntactic-sugar solutions are available to ease the pain of callback hell). NodeJS is also single threaded, which, simply put, means you don't have to manage memory locks across multiple threads.
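
To make the "syntactic sugar" remark concrete, here is a hedged sketch of the same two-step task written twice: first with nested callbacks, then with `async`/`await`. The `fakeQuery` function is a made-up stand-in that simulates a slow I/O call with `setTimeout`; it is not a real database API.

```javascript
const { promisify } = require('util')

// Made-up stand-in for a slow I/O call (e.g. a database query).
// Follows the Node.js convention: callback(err, result)
function fakeQuery (key, callback) {
  setTimeout(() => callback(null, `value-of-${key}`), 10)
}

// Callback style: each dependent step nests one level deeper
function loadWithCallbacks (done) {
  fakeQuery('user', (err, user) => {
    if (err) return done(err)
    fakeQuery(user, (err, profile) => {
      if (err) return done(err)
      done(null, profile)
    })
  })
}

// Same logic with promises and async/await: reads top to bottom
const fakeQueryAsync = promisify(fakeQuery)

async function loadWithAwait () {
  const user = await fakeQueryAsync('user')
  return fakeQueryAsync(user)
}

loadWithCallbacks((err, profile) => console.log('callbacks:', err || profile))
loadWithAwait().then((profile) => console.log('await:', profile))
```

Both versions are equally asynchronous. The second only changes how the code reads, which matters more and more as the number of dependent steps grows.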

Other languages like Python and Ruby got their libraries as well, but NodeJS is the only widely used development tool that comes like this right out of the box.

Issues with NodeJS

Like any other tool, NodeJS is not without issues.

  1. NodeJS is single threaded. To exploit the power of multiple processing units, you must use a module, and this module forks a separate process for each processing unit. This does not perform as well as threads, and while you don't have to lock memory, you also can't share memory between the processes.
  2. Like all dynamic languages, its performance is not as good as that of its compiled counterparts. One could argue, though, that this is a small price to pay for the ease of use.
  3. Like all dynamic languages, JavaScript is not statically typed. That means a variable can change types, which can (and probably will) lead to bugs that are hard to track down. Unit tests are nice and great, but a statically typed language has a clear advantage here.
  4. Since NodeJS has its own runtime, bugs inside the runtime itself can happen.
  5. Memory consumption will be higher than in compiled languages, but, again, with the prices of cloud computing, this might not be a real issue.
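
Point 3 is easy to reproduce. In this sketch the `addTax` and `addTaxChecked` functions and their inputs are hypothetical; the string input stands for a value that arrived from a form or query string, which is a common way for such a type change to sneak in.

```javascript
// Hypothetical helper: adds a fixed tax amount to a price
function addTax (price, tax) {
  return price + tax
}

// Works as expected when both arguments are numbers
const fromCode = addTax(100, 20)

// A value read from a query string or form is a string, so + concatenates
const fromRequest = addTax('100', 20)

// One defensive fix: convert explicitly and reject anything that is not a number
function addTaxChecked (price, tax) {
  const p = Number(price)
  const t = Number(tax)
  if (Number.isNaN(p) || Number.isNaN(t)) {
    throw new TypeError('price and tax must be numeric')
  }
  return p + t
}

console.log(fromCode, fromRequest, addTaxChecked('100', 20))
```

No error is raised for the string case. The wrong value simply travels further into the program, which is why such bugs tend to surface far from where they were introduced.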
