Migrating from MongoDB to DynamoDB

Mongo is great. I'm using it in more than one project, and I love it.

Is there a real reason to switch to DynamoDB? Well, there are a few:

  1. Mongo is a memory hog. This means you have to maintain pretty big instances in order to keep it fast and happy.

  2. Servers cost money. Not only the hourly fee, but maintenance as well. A lean startup probably can’t afford these, and bigger companies might want the easy scalability that comes with Dynamo. Scaling Mongo is not hard, but it’s yet another thing to do.

  3. Serverless. Using Node.js and Serverless allows you to run a whole infrastructure without… well… an infrastructure. This is huge, as you don’t have to maintain anything.

So, how do we migrate?

  1. Indexes. While Mongo enforces a record ID (_id), Dynamo enforces a primary key: at least a partition (HASH) key, with the option of a second, sort (RANGE) key. Migration here is pretty easy: define the main key as a string, and set it to a random string when inserting a new record:

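    // dynamodb (an AWS.DynamoDB instance) and TABLE_NAME are assumed
    // to be defined elsewhere in the project.
    function createUsersTable(callback) {
        const params = {
            TableName: TABLE_NAME,
            // user_id is the partition (HASH) key.
            KeySchema: [{
                AttributeName: "user_id",
                KeyType: "HASH"
            }],
            // Key attributes must be declared with a type; "S" means string.
            AttributeDefinitions: [{
                AttributeName: "user_id",
                AttributeType: "S"
            }],
            ProvisionedThroughput: {
                ReadCapacityUnits: 5,
                WriteCapacityUnits: 5
            }
        };
        dynamodb.createTable(params, callback);
    }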

     

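  2. Dynamo offers a schema of sorts. The key attributes you declare in AttributeDefinitions are typed, and every record has to match them; all other attributes are free-form. You don’t have to lean on it, but where you can, it guarantees a certain consistency within the records.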

  3. Dynamo requires types when storing data. But if you’re using Node.js, you’re in luck: AWS.DynamoDB.DocumentClient will work these out for you, so the code ends up extremely similar to what you know and love from Mongo:

    let dynamo = new AWS.DynamoDB.DocumentClient();

    Inserting a record is just as easy:

    function addUser(callback) {
        let params = {
            TableName: TABLE_NAME
        };
        let item = {
            user_id: rand.generate(),
            username: "Bick "+rand.generate(7),
            password: rand.generate(10),
            address: {
                home: "123 wrefwre,fwref",
                work: "wre 5whbwergwregwerg"
            }
        }
        params.Item = item;
        dynamo.put(params, callback);
    }

    Where rand is a module I've used to generate random strings (see full source link at the bottom).

  4. Running locally: Using Mongo locally is easy. It’s open source, and you can just install it. DynamoDB is proprietary software, and you can’t get a copy of it. Amazon solved this by creating a Java version of the API backed by SQLite, so you can run a local stand-in for DynamoDB when you need to test your code. The only setup you need is the AWS config:

    const credentials = {
        accessKeyId: "fakeAccessKey",
        secretAccessKey: "fakeSecretAccessKey",
        region: "fakeRegion",
        endpoint: "http://localhost:15000"
    };
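
The "option of another one" in step 1 is Dynamo's sort (RANGE) key. Here is a minimal sketch of a helper that builds createTable parameters with or without one. buildTableParams and the created_at attribute are hypothetical names used for illustration; they are not part of the sample project:

```javascript
// Hypothetical helper: builds the params object for dynamodb.createTable().
// hashKey is required, sortKey is optional. Both are { name, type } pairs,
// where type is "S" (string), "N" (number) or "B" (binary).
function buildTableParams(tableName, hashKey, sortKey) {
    const keys = sortKey ? [hashKey, sortKey] : [hashKey];
    return {
        TableName: tableName,
        // The HASH key comes first, the optional RANGE key second.
        KeySchema: keys.map((key, i) => ({
            AttributeName: key.name,
            KeyType: i === 0 ? "HASH" : "RANGE"
        })),
        // Every attribute used in the key schema needs a declared type.
        AttributeDefinitions: keys.map((key) => ({
            AttributeName: key.name,
            AttributeType: key.type
        })),
        ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
    };
}

const params = buildTableParams(
    "users",
    { name: "user_id", type: "S" },
    { name: "created_at", type: "N" }
);
console.log(JSON.stringify(params.KeySchema));
```

Leaving out the third argument gives the same single-key table as in step 1.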

     

Installation instructions for the local version of DynamoDB: https://github.com/talreg/dynamodb-node/blob/master/README.md

The full sample project can be found here: https://github.com/talreg/dynamodb-node
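
To make step 3 more concrete: without DocumentClient, every attribute has to be wrapped in a type descriptor before it is sent to Dynamo. The sketch below is a simplified illustration of that wrapping, not the SDK's actual implementation. toAttributeValue and toItem are hypothetical names, and only strings, numbers, booleans, null, arrays and plain objects are handled:

```javascript
// Wraps a plain JavaScript value in Dynamo's typed attribute format.
function toAttributeValue(value) {
    if (value === null) return { NULL: true };
    if (typeof value === "string") return { S: value };
    // Numbers are sent as strings.
    if (typeof value === "number") return { N: String(value) };
    if (typeof value === "boolean") return { BOOL: value };
    if (Array.isArray(value)) return { L: value.map(toAttributeValue) };
    if (typeof value === "object") {
        const map = {};
        for (const [key, inner] of Object.entries(value)) {
            map[key] = toAttributeValue(inner);
        }
        return { M: map };
    }
    throw new TypeError("Unsupported value type: " + typeof value);
}

// A whole item is the content of the top-level map.
function toItem(record) {
    return toAttributeValue(record).M;
}

const item = toItem({
    user_id: "abc123",
    logins: 3,
    address: { home: "123 Main St" }
});
console.log(JSON.stringify(item));
```

This is the shape the low-level AWS.DynamoDB client expects in Item, and it is what DocumentClient saves you from writing by hand.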

 

Ruby on Windows with MongoDB

Setting up Ruby on Windows, like anything Windows these days, is more annoying than on other environments, especially Linux. Here are some key points I hope will save you some time:

  1. The Ruby installer for Windows is only the start. It's located here. While you're there, don't stop with the installer: make sure to download the development kit (Development Kit section). You'll need it later on. Make sure to extract it to a simple path, e.g. c:\devkit or similar. Don't use spaces or special characters.
  2. Once Ruby is installed, let's check that gem works: if you can run gem update --system without an error, great. If not, here is what you need to do: download the pem file here and save it in your rubygems/ssl_certs/ folder. Now the command should run correctly.
  3. Let's update the system with

    gem update

  4. To install mongo, let's run

    gem install mongo
    gem install bson_ext

  5. The last one installs bson in C, which is much faster. Great? Sure, but it's not going to work on Windows. So now what? First, let's go to the install folder of this gem (..lib/ruby/gems/[version]/gems/bson_ext[xxx]/) using cmd.
  6. Once there, open the cbson.c file located inside the ext/cbson folder. Make sure that it references winsock2 and not arpa/inet on Windows. Note that newer versions already contain this, so if it's there, you don't need to change anything. This is how it should look:

    #ifdef _WIN32
    #include <winsock2.h>
    #else
    #include <arpa/inet.h>
    #include <sys/types.h>
    #endif

    Note that if your file already contains this, your installation might actually work, so you can skip directly to the test code below.

  7. Next, you need to set up your devkit installation, so go to your devkit folder and run

    ruby dk.rb init

    This will generate the config.yml file in that folder.

  8. Edit this file, making sure that it contains the Ruby path at its end, like this: - c:/ruby. Note the dash, the space after it and the forward slash. These are not typos.
  9. Next, run

    ruby dk.rb install

  10. In your command window that is in the gem folder, run gem build bson_ext.gemspec.
  11. move the new gem c
  12. Delete the entire bson_ext gem folder.
  13. Run:

    gem install bson_ext-1.11.1.gem --local

    from within the folder where you've saved that gem.

Starter code:

require 'rubygems'
require('mongo')
puts('testing mongo...')

If you can run this code without an error or a mongo warning claiming that you are not using bson_ext, you are good to go!

 

Using MongoDB with C++

While it is very easy to connect to Mongo via Node.js, I wanted to write an article about using C++ to connect to this great DB. This is done under Ubuntu.

Ingredients (using apt-get here will do)

  1. git
  2. scons
  3. build-essential
  4. openssl
  5. libboost1.54-all-dev (or 55, or whatever the version is going to be when you read this)

Compile the driver

Don't install from the repo; the driver should be compiled, and it's actually pretty easy.

  1. Create the directory in which you want the driver to reside.
  2. Clone this repository: git clone git@github.com:mongodb/mongo-cxx-driver.git
  3. Go into the folder created by git.
  4. Run scons --prefix=$HOME/mongo-client-install --ssl install to build the target. (If you are getting a scons error, you are not in the folder, or you didn't install the entire list above.) Let me explain this command before you run it: it will build from wherever you are now and install to a new location, which is $HOME/mongo-client-install. You might want to change that later on, but for now, this is fine.
  5. You can now use your favorite IDE to create a project to use with Mongo. In your IDE, make sure you don't have any residue of old installations. If you do, remove it.
  6. Make sure your project is set up with the path to the include folder, the path to the library folder and the library you built.
  7. More libraries you will need (you might need to adjust the names/paths; under 64-bit Ubuntu this should be fine):

    1. /usr/lib/x86_64-linux-gnu/libcrypto.so
    2. /usr/lib/x86_64-linux-gnu/libssl.so
    3. /usr/lib/x86_64-linux-gnu/libboost_regex.so
  8. That's all! You are now ready to write your first program.

Test program can be found here: https://github.com/talreg/mongoclient

 

Updates:

On Ubuntu 16.04, you'll have gcc 5.x. The scons command should be: scons --ssl --prefix=/programs/mongocpp --c++11=on CCFLAGS="-Wno-unused-variable -Wno-maybe-uninitialized"

where prefix is where you want the driver files to be.

Creating a MongoDB replica set

For convenience, I’ll assume Mongo is installed on an Ubuntu 14.04 server.

Setting up a Mongo replica set is quite easy. Most of Mongo’s settings are in the configuration file (/etc/mongodb.conf).

Setup

To set up a replica set, make sure that bind_ip IS NOT set to localhost, unless you are creating the replica set on a single machine (which is a decent learning practice, but worth absolutely nothing in the real world). You need to add one more thing, which is the replSet instruction (note the exact syntax, it’s important). If you are using 2.6.1, it’s already there. You can pick any legal name, e.g. rs0.
The line should then read replSet = rs0.

This same name should be set on every server you wish to connect. There are many types of servers that can be connected to a replica set.

Make sure all your servers are up and running.

Log in to mongo and run rs.status(). You will see an error stating that the replica set is not yet initialized. Run rs.initiate() to create a new, empty one.

Next, you’ll need to set the server name. The server that you are setting up is most likely to become primary, unless you set a priority. The server name is usually the host name, and many times (when using a VPN, or when the host name is not set correctly) it’s not correct. For a test setup I recommend a VirtualBox host-only network with static IPs. So let’s say we have 3 servers, 192.168.56.101 to 103. If we are setting up the first one, we need to set its host property (rs.status() shows it as name). Then we’ll add the other servers, which will take their names from our settings, so their host names are less of an issue. To set this up, we’ll connect to the first instance with mongo and configure the replica set like so:

var conf = rs.conf();
conf.members[0].host = "192.168.56.101:27017";
rs.reconfig(conf);

What we are doing here is getting the configuration object, modifying it, and then applying it back to the replica set. Now we need to add the other members, like so:

rs.add("192.168.56.102:27017")
rs.add("192.168.56.103:27017")
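
As an alternative to rs.initiate() followed by a reconfig and several rs.add() calls, the whole member list can be passed to rs.initiate() in one go. A minimal sketch of the configuration document, where buildReplicaSetConfig is a hypothetical helper and the addresses are the ones assumed above:

```javascript
// Hypothetical helper: builds a replica set configuration document.
// setName must match the replSet value in every member's config file.
function buildReplicaSetConfig(setName, hosts) {
    return {
        _id: setName,
        members: hosts.map(function (host, index) {
            return { _id: index, host: host };
        })
    };
}

var config = buildReplicaSetConfig("rs0", [
    "192.168.56.101:27017",
    "192.168.56.102:27017",
    "192.168.56.103:27017"
]);
console.log(JSON.stringify(config, null, 2));
// In the mongo shell on the first server: rs.initiate(config)
```

Listing explicit addresses this way also sidesteps the wrong-host-name problem, since nothing is derived from the machine's host name.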

Note that in 2.6+ you might get an error when starting the replica set if the host information is not correct. To fix it, set a server FQDN:

  1. Edit /etc/hostname, to e.g. yourhost.dyndns.org
  2. Run: hostname -F /etc/hostname
  3. IP addresses are acceptable
  4. You might also want to edit /etc/hosts



Syncing the new members will take a few seconds or many minutes, depending on the amount of data and the network connectivity you have. At any time you can issue an rs.status() command to view the current status. At the end you should see N-1 members set as SECONDARY and one set as PRIMARY; usually, it will be the one that you’ve created the replica set from.
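
Reading the rs.status() output by eye gets tedious with more than a few members. The sketch below condenses it to the "one PRIMARY, N-1 SECONDARY" check described above. summarizeStatus is a hypothetical helper, and the sample object only mimics the name and stateStr fields of a real rs.status() result:

```javascript
// Summarizes the members array of an rs.status() result.
function summarizeStatus(status) {
    function namesInState(state) {
        return status.members
            .filter(function (member) { return member.stateStr === state; })
            .map(function (member) { return member.name; });
    }
    var primaries = namesInState("PRIMARY");
    var secondaries = namesInState("SECONDARY");
    return {
        primary: primaries.length === 1 ? primaries[0] : null,
        secondaries: secondaries,
        // Settled means one PRIMARY and N-1 SECONDARY members.
        settled: primaries.length === 1 &&
            secondaries.length === status.members.length - 1
    };
}

// Sample data shaped like rs.status() output (only the fields used here).
// The third member is still in STARTUP2, i.e. running its initial sync.
var sample = {
    set: "rs0",
    members: [
        { name: "192.168.56.101:27017", stateStr: "PRIMARY" },
        { name: "192.168.56.102:27017", stateStr: "SECONDARY" },
        { name: "192.168.56.103:27017", stateStr: "STARTUP2" }
    ]
};
var summary = summarizeStatus(sample);
console.log(JSON.stringify(summary));
```

In the mongo shell the function would be fed the live result instead, as in summarizeStatus(rs.status()).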

Mongo users

When managing a cluster, you should set up at least one user for it. To be able to manage all aspects of a database, you need to set the following: db.addUser({user:'user',pwd:'password',roles:["root","clusterAdmin","userAdmin","userAdminAnyDatabase","readWriteAnyDatabase"]})

Depending on the version of the server, root alone might be fine. Also, Mongo 2.6.3 states that the createUser() function should be used now. If you are looking for other roles, you can find them here.
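
For servers where createUser() is the way to go, the user document looks much the same. Here is a minimal sketch that builds it, with each role scoped explicitly to the admin database. buildAdminUser is a hypothetical helper, and the user name and password are the placeholders from above:

```javascript
// Hypothetical helper: builds the document passed to db.createUser().
// Each role is given as { role, db } and scoped to the admin database.
function buildAdminUser(user, pwd, roleNames) {
    return {
        user: user,
        pwd: pwd,
        roles: roleNames.map(function (name) {
            return { role: name, db: "admin" };
        })
    };
}

var adminUser = buildAdminUser("user", "password", ["root"]);
console.log(JSON.stringify(adminUser));
// In the mongo shell: switch to the admin database with "use admin",
// then run db.createUser(adminUser)
```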

Security

In order to secure the sessions between the replica set members, we need to set auth=true (this parameter is there, but disabled by default). If we enable this in a replica set, we also need to set up a key file with the keyFile=[path] parameter. A special note about key files: they need specific permissions, or mongo will not start. They must be fully accessible by mongo, but not by other users. The best way to achieve that is by issuing both chown and chmod commands, the first to mongodb:mongodb and the second to 700. The : in the chown command changes the group as well. Note that these settings and the same key need to be present on all the other servers, and set in their configuration files as well.

How to create a key

You must have the same key on all of your instances, and auth must be set to true on all of them. Once that is done, you will not be able to log in without your username and password.
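
The key itself is just a shared secret made of base64 characters, between 6 and 1024 of them. One way to create it is from random bytes. Here is a minimal Node sketch, assuming the 700 permission mentioned above. createKeyFile and the output path are hypothetical, and ownership still has to be changed to mongodb:mongodb separately:

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Writes a random base64 key and restricts the file to its owner.
function createKeyFile(filePath) {
    // 741 random bytes encode to 988 base64 characters,
    // inside the 6 to 1024 character range mongod accepts.
    const key = crypto.randomBytes(741).toString("base64");
    fs.writeFileSync(filePath, key);
    // chmod explicitly, since a mode given at creation is subject to umask.
    fs.chmodSync(filePath, 0o700);
    return key;
}

// Hypothetical location; copy the same file to every member afterwards.
const keyPath = path.join(os.tmpdir(), "mongodb-keyfile");
const key = createKeyFile(keyPath);
console.log(key.length);
```

Generate the key once and copy the resulting file to the other members, rather than running this on each server, since every member needs the identical key.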

Setting replica set priority (must be done on the primary).
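
Priority is set the same way as the host earlier: fetch the configuration, change a member, and apply it back. Members with a higher priority are preferred in elections; the default is 1, and 0 means the member never becomes primary. A minimal sketch, where setPriority is a hypothetical helper and the sample configuration mimics what rs.conf() returns:

```javascript
// Hypothetical helper: sets the priority of the member with the given host.
// Modifies conf in place and returns it, ready for rs.reconfig().
function setPriority(conf, host, priority) {
    var found = false;
    conf.members.forEach(function (member) {
        if (member.host === host) {
            member.priority = priority;
            found = true;
        }
    });
    if (!found) {
        throw new Error("No member with host " + host);
    }
    return conf;
}

// Sample data shaped like rs.conf() output.
var conf = {
    _id: "rs0",
    version: 1,
    members: [
        { _id: 0, host: "192.168.56.101:27017" },
        { _id: 1, host: "192.168.56.102:27017" },
        { _id: 2, host: "192.168.56.103:27017" }
    ]
};
setPriority(conf, "192.168.56.102:27017", 2);
console.log(JSON.stringify(conf.members[1]));
// In the mongo shell, on the primary:
//   rs.reconfig(setPriority(rs.conf(), "192.168.56.102:27017", 2))
```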

Testing

Well, this is pretty easy. You can shut down one of the hosts (it’s best to test the primary) and wait. In about 10 or 15 seconds, you can issue an rs.status() on any of the other instances and see that one of them is now primary. If you boot the former primary and run rs.status() on it, you’ll see that it’s now taking its place as a secondary. Magic.

If you want to run a query on one of the secondaries, use rs.slaveOk() first.

Other links:

  1. changing user roles