Async programming, or, why is Node.js so popular?

Asynchronous programming is all the rage these days, and one of the most prevalent tools is Node.js.

But what is it, and why do we need it? First, any programming language, async or not, does all of its computing synchronously. Computing, the act of generating output from a given input, is always performed synchronously, whether it happens in one function or is spread across many. The "async" part refers to I/O. Data processing, once the data is present, is done synchronously; pulling data from the network, from a file or from a database (which is also the network) is an I/O operation, and that is what asynchronous programming is all about.
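
To make the distinction concrete, here is a minimal Node.js sketch (the file name data.txt is just a placeholder): the read itself is asynchronous I/O, while the processing of the data, once it has arrived, runs synchronously.

const fs = require('fs')

// The I/O part: reading the file is asynchronous; the program is free
// to do other work while the OS fetches the data.
fs.readFile('data.txt', 'utf8', (err, data) => {
  if (err) {
    return console.log('read failed', err)
  }
  // The computing part: once the data is here, processing it is
  // plain synchronous code.
  const lineCount = data.split('\n').length
  console.log(`read ${lineCount} lines`)
})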

From a single all-powerful process to async programming

Very old programmers (like me) remember the "good" old DOS days. You had one process, and that process had all the system resources. There were no threads, no cores, only a single CPU (in most configurations), and there were no limitations on what your program could and couldn't access, including hardware ports. If you wrote a program, it had to scan the keyboard for user input, then go elsewhere to get input from a hardware device, process it, and present the result back to the user. Since this scanning took place all the time, such a program would occupy up to 100% of the CPU while running.

Later on, around the mid '90s, a whole new (and better) approach was employed: the operating system became responsible for the running jobs. The system decides what time slice every job gets, and switches to another job once that slice has ended.

Client/server suites back then

Client/server programs that appeared then quickly took advantage of the new model. The server listened on one thread and opened a new thread for every new connection. The OS then switched to the right thread, based on the input. This model worked very well for a few years, until the number of users grew and systems started showing strange slowdowns.

The problem with the sync client/server

In most of these programs, the server waits for client input, processes it, and/or shares it with the other clients. The problem? Switching from one thread to another (this is called context switching) takes time. It also takes CPU power. And if you have a lot of clients (= a lot of threads), the switching time might actually eat more CPU than the actual processing. So the load grew linearly up to a certain point and then exponentially, since context switching (saving all of the current thread's state and loading the state of the next one) was now competing with actual processing. Add limited memory to the mix, and you got paging on top of it. Performance? Horrible. This problem was spotted around 2001, and a few very similar solutions were developed for it. On Linux, the epoll polling mechanism was developed; on Windows, Microsoft developed IOCP; both follow the same concept. macOS has a similar solution, but I have never used a Mac as a server, so I won't refer to that OS.

The solution

The solution was very simple. Since one core (one CPU, at that time) can only perform one operation at a single point in time, having multiple threads running on it just causes context switching, which becomes costly at scale. Instead, the OS listens for I/O events and queues them; once a thread is ready to deal with them, it pulls them from the queue. The solution was revolutionary but hard to implement, and many companies and projects continued to use the old model even though the new model offered only advantages: better scaling and no downsides as far as the computer goes. Under low load both models operate with the same efficiency, and when the load grows, the async model has a serious advantage, since no context switching occurs. Still, besides a few high-profile services, not many developers took advantage of this model. At the time, only C/C++ developers could employ it, the model was not as clear and simple as the thread-per-client model, and new web tools like PHP were pulling programmers away from systems programming. Later on, both C# and Java (java.nio) added support for this new model. Still, it wasn't that popular until Node.js came along.
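
As a rough sketch of the idea (a toy model only, not how epoll or IOCP are actually driven): I/O events land in a queue, and a single thread drains that queue instead of one thread blocking per connection.

// Toy event queue: pretend the OS pushed these when sockets became readable.
const eventQueue = [
  { socket: 1, data: 'GET / HTTP/1.1' },
  { socket: 2, data: 'GET /stats HTTP/1.1' }
]

function handleEvent(event) {
  // Processing is still synchronous once the data is present.
  console.log(`socket ${event.socket}: ${event.data}`)
}

// One thread pulls events off the queue; no context switching between clients.
while (eventQueue.length > 0) {
  handleEvent(eventQueue.shift())
}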

The promise of NodeJS

Node.js brought async programming into the hands of the script programmer. Not only that, the language chosen for the job was a well-known, async-by-nature language: JavaScript. In order to set up a fully working (though not very useful) web server, you just need the following lines:

const http = require('http')
const port = 3000

const requestHandler = (request, response) => {
  console.log(request.url)
  response.end('Hello Node.js!')
}

const server = http.createServer(requestHandler)

server.listen(port, (err) => {
  if (err) {
    return console.log('oops... an error', err)
  }

  console.log(`server is listening on ${port}`)
})

I'm not even going to try to paste the C++ code required to achieve the same task. JavaScript is not only easy, it's the language used for client-side scripting in browsers. Not only do most programmers already know it, it also means you can now use one language to write both sides. And you don't have to handle the asynchronicity yourself – you just use callbacks (and these days many syntactic-sugar solutions are available to ease the pain of callback hell). Node.js is also single threaded – simply put, you don't have to manage memory locks across multiple threads.
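
As a small illustration of that syntactic sugar (a sketch only; the file name settings.json is a made-up placeholder), the same callback-based I/O can be written with promises and async/await:

const fs = require('fs').promises

// Instead of nesting callbacks, await flattens the flow while the
// underlying read is still asynchronous.
async function readSettings() {
  try {
    const raw = await fs.readFile('settings.json', 'utf8')
    return JSON.parse(raw)
  } catch (err) {
    console.log('could not read settings', err)
    return {}
  }
}

readSettings().then((settings) => console.log(settings))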

Other languages like Python and Ruby got their own libraries as well, but Node.js is the only widely used development tool that comes like this right out of the box.

Issues with Node.js

Like any other tool, Node.js is not without issues.

  1. Node.js is single threaded. To exploit the power of multiple processing units, you must use a module such as the built-in cluster module, which forks a separate process for each processing unit (see the sketch after this list). This does not perform as well as threads, and while you don't have to lock memory, you also can't share memory between the worker processes.
  2. Like all dynamic languages, performance is not as good as that of its compiled counterparts. One could argue, though, that this is a small price to pay for the ease of use.
  3. Like all dynamic languages, Node.js isn't strongly typed. That means a variable can change types, which can (and probably will) make bugs harder to track down. Unit tests are nice and great, but statically typed languages have a clear advantage here.
  4. Since Node.js has its own runtime, bugs inside the runtime itself can happen.
  5. Memory consumption will be higher than with compiled languages, but, again, given the prices of cloud computing, this might not be a real issue.
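
A minimal sketch of the workaround from point 1, using Node's built-in cluster module (the port and the handler here are arbitrary):

const cluster = require('cluster')
const http = require('http')
const os = require('os')

if (cluster.isMaster) {
  // Fork one worker process per processing unit; the workers share no memory.
  os.cpus().forEach(() => cluster.fork())
} else {
  // Each worker runs its own single-threaded event loop and server instance.
  http.createServer((request, response) => {
    response.end(`Hello from worker ${process.pid}`)
  }).listen(3000)
}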

About HTTP and the importance of Close()

Go, the so-called systems language from Google (I have checked its performance against C++; Go loses), has some very comfortable features for dealing with HTTP. When a strongly typed language is required, which these days is more often than not, I find myself drawn to it. It is simple and compiles very fast (one of the points where it beats C++ with ease). Go has both a simple HTTP client and a simple HTTP server. Both offer the same comfort level that Node.js has, with far better performance, data sharing, and actual multithreading (Go uses goroutines, but that goes beyond the scope of this post).

To make a simple GET request to a server, we can just do this:

resp, err := http.Get(address)

where resp is the response object. We can get the status code, and also the body of the response, via

resp.Body

Server side

The server side is not much harder.

First we need a function (or more than one) to serve content to the client:

func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=UTF-8")
	fmt.Fprintf(w, "Hello world!")
}

Then we just start the server with that function:

listening_port := 3000
http.HandleFunc("/", handler)
http.ListenAndServe(fmt.Sprintf(":%d", listening_port), nil)

In both cases, client and server, there is a serious bug: the body is not being closed.

Now, this code will work; the only question is for how long. Sooner or later, a socket limit error is going to show its face, and the program will crash. This error happens quite a bit. The usual suggestion is to increase the file-descriptor limit (on Linux), which will solve it for a little while, until the new limit is hit as well.

So what does the body have to do with that? Well, the body is a stream object, an io.ReadCloser to be exact, and it will not pull all the content for you and store it in a buffer – there might be a lot of it, and you might not want that. Therefore, once a body has been received, it must be closed.

defer to the rescue

Luckily enough, Go has a defer keyword, which will execute a statement for you upon function exit, relieving you of figuring out exactly where to close the Body.

Here is the corrected version of the handler:

func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=UTF-8")
	fmt.Fprintf(w, "Hello world!")
	r.Body.Close()
}

Note that here defer was not needed. You should call Body.Close on the client side as well:

resp, err := http.Get(address)
if err != nil {
	// handle the error
} else {
	defer resp.Body.Close()
}

 

Migrating from MongoDB to DynamoDB

Mongo is great. I'm using it in more than one project, and I love it.

Is there a real reason to switch to DynamoDB? Well, there are a few:

  1. Mongo is a memory hog. This means you have to maintain pretty big instances in order to keep it fast and happy.

  2. Servers cost money – not only the hourly fee, but maintenance as well. A lean startup probably can't afford these, and bigger companies might want the easy scalability that comes with Dynamo. Scaling Mongo is not hard, but it's yet another thing to do.

  3. Serverless. Using Node.js and the Serverless framework allows you to run a whole infrastructure without… well… an infrastructure. This is huge, as you don't have to maintain anything.

So, how do we migrate?

  1. Indexes. While Mongo enforces a record ID, Dynamo enforces at least one key, with the option of a second one. Migration here is pretty easy: create the main key as a string, and put a random string in it when inserting a new record:

    function createUsersTable(callback) {
        let params = {
            TableName: TABLE_NAME,
            KeySchema: [{
                AttributeName: "user_id",
                KeyType: "HASH"
            }, ],
            AttributeDefinitions: [{
                AttributeName: "user_id",
                AttributeType: "S"
            }],
            ProvisionedThroughput: {
                "ReadCapacityUnits": 5,
                "WriteCapacityUnits": 5
            }
        }
        dynamodb.createTable(params, (err, data) => {
            callback(err, data)
        });
    }

     

  2. Dynamo can offer a schema of some sort. You don't have to use it, but if you do, you can guarantee a certain consistency within the records.

  3. Dynamo requires explicit types when storing data. But if you're using Node.js, you're in luck: AWS.DynamoDB.DocumentClient will infer these for you, resulting in a manner of working that is extremely similar to what you know and love from Mongo:

    let dynamo = new AWS.DynamoDB.DocumentClient();

    and inserting a record is just as easy:

    function addUser(callback) {
        let params = {
            TableName: TABLE_NAME
        };
        let item = {
            user_id: rand.generate(),
            username: "Bick "+rand.generate(7),
            password: rand.generate(10),
            address: {
                home: "123 wrefwre,fwref",
                work: "wre 5whbwergwregwerg"
            }
        }
        params.Item = item;
        dynamo.put(params, callback);
    }

    Where rand is a module I've used to generate random strings (see full source link at the bottom).

  4. Running locally: Using Mongo locally is easy – it's open source, and you can just install it. DynamoDB is proprietary software, and you can't get a copy of it. Amazon solved this by creating a Java version of the API backed by SQLite, so you can run a local stand-in for DynamoDB when you need to test your code. The only setup you need is the AWS config:

    const credentials = {
        accessKeyId: "fakeAccessKey",
        secretAccessKey: "fakeSecretAccessKey",
        region: "fakeRegion",
        endpoint: "http://localhost:15000"
    };
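
    With that config in place, pointing the SDK at the local endpoint is just a matter of applying it before creating the client (a sketch assuming the aws-sdk v2 package):

    const AWS = require('aws-sdk');

    // Apply the fake credentials and the local endpoint defined above.
    AWS.config.update(credentials);

    // This client now talks to the local DynamoDB front instead of AWS.
    let dynamo = new AWS.DynamoDB.DocumentClient();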

     

For installation instructions for the local version of DynamoDB, see: https://github.com/talreg/dynamodb-node/blob/master/README.md

The full sample project can be found here: https://github.com/talreg/dynamodb-node

 

Installing LiteIDE for Go

The Google Go language has a nice IDE to program with: LiteIDE.

The download link is here: http://sourceforge.net/projects/liteide/

Once untarred, you might run into an issue where the program seems to start and then immediately dies (on Linux). That's Qt's doing: remove any library (in the lib folder) that has qt in its name, and you should be done.

 

Ruby on Windows with MongoDB

Setting up Ruby on Windows, like anything Windows these days, is more annoying than on other environments, especially Linux. Here are some key points I hope will save you some time:

  1. The Ruby installer for Windows is only the start. While you're on its download page, don't stop at the installer: make sure to also download the development kit (the Development Kit section) – you'll need it later on. Make sure to extract it to a simple path, e.g. c:\devkit or similar; don't use spaces or special characters.
  2. Once Ruby is installed, let's check that gem works: if you can run gem update --system without an error – great. If not, here is what you need to do: download the .pem certificate file and save it in your rubygems/ssl_certs/ folder. Now the command should run correctly.
  3. Let's update the system with

    gem update

  4. To install Mongo, let's run

    gem install mongo
    gem install bson_ext

  5. The last one installs bson as a C extension, which is much faster. Great? Sure, but it's not going to work as-is on Windows. So now what? First, let's go to the install folder of this gem (..lib/ruby/gems/[version]/gems/bson_ext[xxx]/) using cmd.
  6. Once there, open the cbson.c file located inside the ext/cbson folder. Make sure that you have a reference to winsock2 and not arpa/inet. Note that more recent versions already contain this change, so if it's there, you don't need to change it. This is how it should look:

     

    #ifdef _WIN32
    #include <winsock2.h>
    #else
    #include <arpa/inet.h>
    #include <sys/types.h>
    #endif

    Note that if your file already looks like this, your installation might actually work, so you can skip directly to the test code below.

  7. Next, you need to set up your devkit installation, so go to your devkit folder and run

    ruby dk.rb init

    This will generate the config.yml file in that folder.

  8. Edit this file, making sure that it contains the Ruby path at its end, like this: - c:/ruby. Note the spaces and the forward slash; these are not typos.
  9. Next, run

    ruby dk.rb install

  10. In your command window, still in the gem folder, run gem build bson_ext.gemspec.
  11. Move the newly built gem file to a folder of your choice.
  12. Delete the entire bson_ext gem folder.
  13. From within the folder where you've saved that gem, run:

    gem install bson_ext-1.11.1.gem --local

Starter code:

require 'rubygems'
require('mongo')
puts('testing mongo...')

If you can run this code without an error, and without a Mongo warning claiming that you are not using bson_ext, you are good to go!