Async programming, or: why is Node.js so popular?

Asynchronous programming is all the rage these days. One of the most prevalent tools for it is Node.js.

But what is it, and why do we need it? First, any programming language, async or not, does its computing synchronously. Computing, the operation of generating output from a given input, is always performed synchronously. It can be spread across many functions or done all in one function; it doesn't matter. The "async" part refers to I/O. Data processing, once the data is present, is done synchronously. However, pulling data from the network, a file or a database (which usually means the network) is an I/O operation, and this is what it's all about.
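
To make the split concrete, here is a minimal sketch. The file name and the countWords helper are made up for the illustration: the read is the async I/O part, and the word count is the plain synchronous computing that runs once the data has arrived.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Pure computation: runs synchronously, start to finish, once the data is in memory.
function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Hypothetical sample file, created only so the example is self-contained.
const samplePath = path.join(os.tmpdir(), 'async-io-sample.txt');
fs.writeFileSync(samplePath, 'data arrives later, processing happens now');

// I/O: the read is handed to the system, and the callback runs when the data is ready.
fs.readFile(samplePath, 'utf8', (err, text) => {
  if (err) {
    return console.error('read failed', err);
  }
  console.log('words:', countWords(text));
});

// This line runs before the callback above, because the read has not finished yet.
console.log('read requested, doing other work');
```

Running it prints the "read requested" line first and the word count second, even though the read was requested earlier in the file.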

From a single all-capable process to async programming

Very old programmers (like me) remember the "good" old DOS days. You had one process, and that process had all the system resources. There were no threads and no cores, only a single CPU (in most configurations), and there were no limitations on what your program could and couldn't access, including hardware ports. If you wrote a program, it had to scan the keyboard for user input, then go elsewhere to get input from a hardware device, process it, and go back to the user. Since the scanning took place all the time, a program like this would occupy up to 100% of the CPU while running.

Later on, around the mid-'90s, a whole new (and better) approach was employed: the system is responsible for the running jobs. The system decides what time slice every job gets, and switches to another job when that time slice ends.

Client/server suites back then

Client/server programs that appeared then quickly took advantage of the new model. The server listened on one thread, and opened a new thread for every new connection. The OS then switched to the right thread, based on the input. This model worked very well for a few years, until the number of users grew and systems showed strange slowdowns.

The problem with the sync client/server

For most programs, the server waits for client input, then processes it and/or shares it with the other clients. The problem? Switching from one thread to another (this is called context switching) takes time. It also takes CPU power. And if you have a lot of clients (= a lot of threads), the switching might actually eat more CPU than the actual processing. So the load grew linearly up to a certain point and then much more steeply, since context switching (saving all the current thread's information and loading all the information for the next one) was now competing with the actual processing. Add limited memory to the mix, and paging enters the picture as well. Performance? Horrible.

This problem was spotted around 2001, and a few very similar solutions were developed for it. On Linux, epoll was developed. On MS Windows, Microsoft developed IOCP (I/O completion ports). All share the same concept. Mac OS has a similar solution, but I have never used a Mac as a server, so I won't refer to that OS.

The solution

The solution was very simple. Since one core (a CPU, at that time) can only perform one operation at any single point in time, having multiple threads running on it just causes context switching, which is costly at large scale. Instead, the OS listens for I/O events and queues them. Once a thread is ready to deal with them, it pulls them from the queue.

The solution was revolutionary, but hard to implement. Many companies and projects continued to use the old model, even though the new model offered only advantages: better scaling and no downsides as far as the computer goes. On a low load, both kinds of program operate with the same efficiency. When the load grows, the async model has a serious advantage, as far less context switching occurs. However, besides a few high-profile services, not many users took advantage of this model. First, at that time only C/C++ developers could employ it. Second, the model was not as clear and simple as the thread-per-client model. And third, new web tools like PHP were taking programmers away from system programming. Later on, both C# and Java (java.nio) added support for the new model. Still, it wasn't that popular until NodeJS came along.
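
The queue-and-pull idea can be sketched as a toy model. This is only an illustration: the EventQueue class and its method names are made up, and the events are pushed by hand, where a real server would receive them from the OS.

```javascript
// A toy model of the idea: events are queued as they arrive, and a single
// worker pulls and handles them one at a time. In a real system the events
// come from the OS; here they are pushed by hand.
class EventQueue {
  constructor() {
    this.events = [];
    this.handlers = new Map();
  }

  // Register the function that handles a given event type.
  on(type, handler) {
    this.handlers.set(type, handler);
  }

  // Called when "I/O" completes: only records the event, does no processing.
  push(type, payload) {
    this.events.push({ type, payload });
  }

  // The single worker: drain the queue in arrival order.
  drain() {
    let handled = 0;
    while (this.events.length > 0) {
      const { type, payload } = this.events.shift();
      const handler = this.handlers.get(type);
      if (handler) {
        handler(payload);
        handled++;
      }
    }
    return handled;
  }
}

const queue = new EventQueue();
const log = [];
queue.on('data', (payload) => log.push(`client ${payload.client}: ${payload.text}`));

// Three events from two "clients", handled by one worker with no thread per client.
queue.push('data', { client: 1, text: 'hello' });
queue.push('data', { client: 2, text: 'hi' });
queue.push('data', { client: 1, text: 'bye' });

const handledCount = queue.drain();
console.log(handledCount, log);
```

Note that no client owns a thread here: each client is just data attached to an event, which is why nothing has to be switched in and out.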

The promise of NodeJS

NodeJS brought async programming into the hands of the script programmer. Not only that, the language chosen for the job was a well-known language that is async in nature: JavaScript. In order to set up a fully working (though not very functional, but still) web server, you just need the following lines:

const http = require('http')
const port = 3000

const requestHandler = (request, response) => {
  console.log(request.url)
  response.end('Hello Node.js!')
}

const server = http.createServer(requestHandler)

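// Listen failures (e.g. the port is already in use) arrive as 'error' events;
// the listen callback itself receives no arguments.
server.on('error', (err) => {
  console.log('oops... an error', err)
})

server.listen(port, () => {
  console.log(`server is listening on ${port}`)
})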

I'm not even going to try to paste the C++ code required to achieve the same task. JavaScript is not only easy, it's also the language used for client-side scripting in browsers. So not only do programmers already know it, you can now use one language to write both sides. And you don't have to handle the asynchronicity yourself: you just use callbacks (these days many syntactic-sugar solutions are available to ease the pain of callback hell). NodeJS is also single threaded; simply put, you don't have to manage memory locks across multiple threads.

Other languages like Python and Ruby got their libraries as well, but NodeJS is the only widely used development tool that comes like this right out of the box.
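
As a small illustration of callbacks and of the syntactic sugar mentioned above, here is the same file read written both ways. The file name and the two function names are made up for the example. util.promisify is a standard Node utility that turns a callback-style function into one that returns a promise.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { promisify } = require('util');

// Hypothetical sample file, created only so the example is self-contained.
const notePath = path.join(os.tmpdir(), 'callback-vs-await-sample.txt');
fs.writeFileSync(notePath, 'hello from disk');

// Callback style: the result arrives as the second argument of the callback.
function readUpperCallback(file, callback) {
  fs.readFile(file, 'utf8', (err, text) => {
    if (err) {
      return callback(err);
    }
    callback(null, text.toUpperCase());
  });
}

// async/await style: the same operation, written top to bottom.
const readFileAsync = promisify(fs.readFile);
async function readUpperAwait(file) {
  const text = await readFileAsync(file, 'utf8');
  return text.toUpperCase();
}

readUpperCallback(notePath, (err, result) => {
  if (err) {
    return console.error(err);
  }
  console.log('callback:', result);
});

readUpperAwait(notePath)
  .then((result) => console.log('await:', result))
  .catch((err) => console.error(err));
```

Both versions are equally asynchronous. The second one only reads more like ordinary sequential code, which matters once several I/O steps depend on each other.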

Issues with NodeJS

Like any other tool, NodeJS is not without issues.

  1. NodeJS is single threaded. To exploit the power of multiple processing units, you must use a module (such as the built-in cluster module), and this module forks a separate process for each processing unit. This does not perform as well as threads, and while you don't have to lock memory, you can't share memory between the processes.
  2. Like all dynamic languages, its performance is not as good as that of its compiled counterparts. One could argue, though, that this is a small price to pay for the ease of use.
  3. Like all dynamic languages, JavaScript isn't statically typed. That means a variable can change types, which can (and probably will) lead to hard-to-solve bugs. Unit tests are nice and great, but a statically typed language has a clear advantage here.
  4. Since NodeJS has its own runtime, bugs inside the runtime itself can happen.
  5. Memory consumption will be higher than with compiled languages but, again, with the prices of cloud computing, this might not be a real issue.
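
To illustrate the third point, here is a small sketch of how a value that silently changes type can slip through. The function names are made up, and the checked version is only one possible defensive approach.

```javascript
// A quantity that came from a form or a query string is a string, not a number,
// and nothing in the language stops it from reaching arithmetic code.
function addOne(quantity) {
  return quantity + 1;
}

console.log(addOne(5));   // 6
console.log(addOne('5')); // '51' : string concatenation, not addition

// One defensive option: validate and convert at the boundary.
function addOneChecked(quantity) {
  const n = Number(quantity);
  if (!Number.isFinite(n)) {
    throw new TypeError(`quantity must be numeric, got ${typeof quantity}`);
  }
  return n + 1;
}

console.log(addOneChecked('5')); // 6
```

A statically typed language would reject the second call to addOne at compile time. Here it only shows up at runtime, possibly far away from where the string entered the program.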

Migrating from Mongo DB to Dynamo DB

Mongo is great. I'm using it in more than one project, and I love it.

Is there a real reason to switch to Dynamo DB? Well, there are a few:

  1. Mongo is a memory hog. This means you have to maintain pretty big instances in order to keep it fast and happy.

  2. Servers cost money. Not only the hourly fee, but maintenance as well. A lean startup probably can't afford these, and bigger companies might want the easy scalability that comes with Dynamo. Scaling Mongo is not hard, but it’s yet another thing to do.

  3. Serverless. Using NodeJS and Serverless allows you to run a whole infrastructure without… well… an infrastructure. This is huge, as you don’t have to maintain anything.

So, how do we migrate?

  1. Indexes. While Mongo enforces a record ID, Dynamo enforces a primary key: a partition (hash) key, with the option of an additional sort (range) key. Migration here is pretty easy: create the partition key as a string, and assign it a random string upon inserting a new record:

    function createUsersTable(callback) {
        let params = {
            TableName: TABLE_NAME,
            KeySchema: [{
                AttributeName: "user_id",
                KeyType: "HASH"
            }, ],
            AttributeDefinitions: [{
                AttributeName: "user_id",
                AttributeType: "S"
            }],
            ProvisionedThroughput: {
                "ReadCapacityUnits": 5,
                "WriteCapacityUnits": 5
            }
        }
        dynamodb.createTable(params, (err, data) => {
            callback(err, data)
        });
    }

  2. Dynamo offers a schema of sorts: the key attributes declared in AttributeDefinitions must have the declared type in every record. Nothing beyond the keys is enforced, but even this much guarantees a certain consistency within the records.

  3. Dynamo requires types when storing data. But if you’re using NodeJS, you’re in luck. AWS.DynamoDB.DocumentClient will work these out for you, so the code ends up looking extremely similar to what you know and love from Mongo:

    let dynamo = new AWS.DynamoDB.DocumentClient();

    Inserting a record is just as easy:

    function addUser(callback) {
        let params = {
            TableName: TABLE_NAME
        };
        let item = {
            user_id: rand.generate(),
            username: "Bick "+rand.generate(7),
            password: rand.generate(10),
            address: {
                home: "123 wrefwre,fwref",
                work: "wre 5whbwergwregwerg"
            }
        }
        params.Item = item;
        dynamo.put(params, callback);
    }

    Where rand is a module I've used to generate random strings (see full source link at the bottom).

  4. Running locally: Using Mongo locally is easy. It’s open source, and you can just install it. Dynamo DB is proprietary software, and you can’t get a copy of it. Amazon solved this by creating a Java version of the API, backed by SQLite. You can now run this local stand-in for Dynamo DB whenever you need to test your code. The only setup you need to do is the AWS config:

    const credentials = {
        accessKeyId: "fakeAccessKey",
        secretAccessKey: "fakeSecretAccessKey",
        region: "fakeRegion",
        endpoint: "http://localhost:15000"
    };
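
Point 3 in the list above says that Dynamo requires types and that DocumentClient takes care of them. To show what that saves you, here is a simplified sketch of the wrapping involved. The function names are made up, only a handful of types are covered, and this is an illustration of the typed format, not the SDK's actual implementation.

```javascript
// Simplified sketch of the type wrapping that DocumentClient does for you.
// It covers only strings, numbers, booleans, null, arrays and plain objects.
function toAttributeValue(value) {
  if (value === null) return { NULL: true };
  if (typeof value === 'string') return { S: value };
  if (typeof value === 'number') return { N: String(value) }; // numbers travel as strings
  if (typeof value === 'boolean') return { BOOL: value };
  if (Array.isArray(value)) return { L: value.map(toAttributeValue) };
  if (typeof value === 'object') {
    const map = {};
    for (const [key, inner] of Object.entries(value)) {
      map[key] = toAttributeValue(inner);
    }
    return { M: map };
  }
  throw new TypeError(`unsupported type: ${typeof value}`);
}

// Wrap every top-level attribute of an item.
function toTypedItem(item) {
  return toAttributeValue(item).M;
}

const typedUser = toTypedItem({
  user_id: 'abc123',
  logins: 3,
  address: { home: '123 Some Street' }
});
console.log(JSON.stringify(typedUser, null, 2));
```

With DocumentClient you pass the plain object on the right-hand side of toTypedItem directly, which is why the insert code above looks so much like its Mongo equivalent.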

For instructions on installing the local version of Dynamo DB, see: https://github.com/talreg/dynamodb-node/blob/master/README.md

The full sample project can be found here: https://github.com/talreg/dynamodb-node

Static constructor in NodeJS objects

It's not hard to create an object in JavaScript, and therefore in NodeJS. However, if you're an advanced user, you probably dabble a little with static objects (factories being the definitive example). Since JavaScript doesn't really come with static constructors, we need to take advantage of the NodeJS require feature.

About require

require is NodeJS specific. The cool thing about it is that a module's code runs only once, the first time the module is required; later calls get the cached result. So any code at the top level of a module is executed only once, which makes it a perfect place for static initialization. Let's take a look at some code:
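
Before that, here is a quick way to see the run-once behavior for yourself. It is a sketch with made-up names: the settings.js module and the global counter exist only for this demonstration, and the module is written to a temp directory so the example is self-contained.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Counter the hypothetical module bumps each time its top-level code runs.
global.__settingsLoadCount = 0;

// Hypothetical module, written to a temp directory for the demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'require-once-'));
const modulePath = path.join(dir, 'settings.js');
fs.writeFileSync(modulePath, `
  // Top-level code: runs only the first time the module is loaded.
  global.__settingsLoadCount++;
  module.exports = { createdAt: Date.now() };
`);

const first = require(modulePath);
const second = require(modulePath);

console.log(global.__settingsLoadCount); // 1
console.log(first === second);           // true : both are the same cached exports object
```

The second require never touches the file's code again. It simply hands back the object produced by the first load.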

function Counter() {
    this.counter = 0;
    Counter.__counters++;
}

What we have here is a simple count of the number of objects created. However, since Counter.__counters is not defined, the increment won't work: incrementing undefined gives NaN, so the count never becomes a usable number. Sure, we can check that this variable exists inside the object constructor, and in this case that is not a big issue, but if the check is time-consuming or costly, we have a problem. Using the NodeJS feature, we can solve it easily:


// Runs once, when the module is first loaded.
Counter.__counters = 0;

function Counter() {
    this.counter = 0;
    Counter.__counters++;
}

The first line will be executed only once, which makes it a static constructor. This line can be replaced with a function call if we wish to make it neater, and the effect will remain the same.
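
For completeness, here is a sketch of the function-call variant. The name initCounterStatics is made up for the example. The function can mention Counter before Counter appears in the file because function declarations are hoisted.

```javascript
// Static initialization wrapped in a function. It is called once, at module
// load time, before any Counter object is created.
function initCounterStatics() {
  Counter.__counters = 0;
}

function Counter() {
  this.counter = 0;
  Counter.__counters++;
}

initCounterStatics();

const a = new Counter();
const b = new Counter();
console.log(Counter.__counters); // 2

module.exports = Counter;
```

Anything costly, such as reading a config file or building a lookup table, can go into the init function, and every module that requires this file will share the one initialized Counter.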