MongoDB Replica Set Installation on Debian Squeeze

How to set up MongoDB with replication.

Created: Thu 08 September 2011 / Last updated: Wed 26 October 2011

Why a MongoDB Replica Set?

MongoDB is a document storage server. It is very comfortable to work with because of the flexibility with which one can store, retrieve and update documents in the database while keeping relatively good performance. As with any system, one needs to be aware of its limitations to use it correctly.

A very nice feature of MongoDB is the support of replica sets. A replica set is basically a group of servers put together to replicate and serve the data. Simple replication was at the core of the design decisions of MongoDB and, as such, the setup is really easy.

The logic behind a replica set is to always have several warm copies of the data, 3 in this setup, at any time. It does not exempt you from doing backups, but it allows you to have continuity of service when one replica fails. This is good when you want to sleep at night.

Base Requirements

  • 3 servers or virtual machines with Debian Squeeze installed;
  • a private network for the MongoDB related communication;
  • a bit of relaxed time in front of you to iron out things;
  • bonus: a tool like fabric to not repeat yourself.

I like to create automated scripts to rebuild the services automatically after a failure; you should do the same. As I like Python, I am using fabric.

The setup is done with MongoDB 1.8.3 but is the same with 2.0.1. You can read the upgrade procedure from 1.8 to 2.0 near the end of this note.

Debian Squeeze Setup

The base Debian Squeeze is extremely easy to set up. You can refer to the Ganeti notes to get an idea of how the setup is done. What is important is that:

  • each VM has a public IP address and can access the Internet to easily download software and perform updates;
  • each VM has a private IP address to communicate with the other MongoDB VMs and with the administration backend.

To save IPv4 addresses in the future, non-critical VMs should get only a private IP and use a proxy to access the outside world.

MongoDB Setup

The setup is performed directly from the tarball. It may seem strange to use the tarball when Debian and Ubuntu packages are available, but I prefer to be able to run the version I want and to upgrade by simply downloading the new binaries and copying them to the right place.

The setup is simple:

  1. create the mongodb user under which MongoDB will run;
  2. create the directories to store the data and the logs;
  3. copy the configuration and the init script to the right place;
  4. install the init script.

It is maybe one of the simplest database servers to install. With a Fabric recipe, it takes me about 5 seconds per host to have MongoDB installed and configured, including the download time.

MongoDB User Creation

Under Debian, this is simply a call to adduser:

adduser --system --shell /bin/sh --gecos 'MongoDB user' --group \
    --disabled-password --home /home/mongodb mongodb

Directories to Store Data and Logs

You must not forget to change the ownership of the directories to the mongodb user or MongoDB will not be able to start:

mkdir -p /var/lib/mongodb && chown mongodb:mongodb /var/lib/mongodb
mkdir -p /var/log/mongodb && chown mongodb:mongodb /var/log/mongodb

As I extract the MongoDB archive in /home/mongodb, I need to symlink the mongod binary to have it in the path at the right place:

ln -s /home/mongodb/bin/mongod /usr/bin/mongod

Copy the Configuration and Init Script

As you noticed when creating the log and data paths, Debian does not follow the same conventions as MongoDB: the databases are stored in /var/lib/mongodb and not in /data/db. From a sysadmin point of view, it is easier to follow the Debian way for everything, so here is my base configuration. Note the bind_ip setting, which makes MongoDB listen only on the private network; the %s is a placeholder replaced by the private IP of each host when the file is deployed:

dbpath=/var/lib/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
# Fork is done by the init script
fork=false
# Enables periodic logging of CPU utilization and I/O wait
cpu = true
# Keep the HTTP interface enabled (set to true to disable it).
# It listens on the mongod port + 1000, 28017 by default.
nohttpinterface = false
rest = true
bind_ip = %s
directoryperdb = true
# in replica set configuration, specify the name of the replica set
replSet = set1
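
The %s in bind_ip has to be filled in for each host before the file is copied. A minimal Python sketch of that substitution follows, assuming the template is kept as a string next to the deployment script. The helper name render_mongodb_conf and the example address are illustrative and not part of the original recipe.

```python
# Sketch: fill the bind_ip placeholder of the configuration template.
# CONF_TEMPLATE mirrors the settings shown above (comments left out);
# render_mongodb_conf is a hypothetical helper name.
CONF_TEMPLATE = """\
dbpath=/var/lib/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
fork=false
cpu = true
nohttpinterface = false
rest = true
bind_ip = %s
directoryperdb = true
replSet = set1
"""


def render_mongodb_conf(private_ip):
    """Return the configuration text with bind_ip set to private_ip."""
    return CONF_TEMPLATE % private_ip


if __name__ == "__main__":
    # Example address only; use the private IP of the host being set up.
    print(render_mongodb_conf("192.168.1.120"))
```

The result can be written to /etc/mongodb.conf, the path the init script at the end of this note expects.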

Yes, the configuration is simple: data durability is maintained by using a replica set with 3 nodes, without journaling.

The MongoDB init script is simply copied as /etc/init.d/mongodb and installed using the new dependency based boot system. I am using a small variation of the one provided by default. The difference is that I let MongoDB control the PID file. You can get the file at the end of this document.

# insserv mongodb

You can now start MongoDB on each server and go to the next step: configuring the replica set.
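
The per-host commands used so far can be collected in one place. The sketch below is a plain-Python stand-in for a deployment recipe, not the Fabric recipe mentioned earlier. It only builds the command strings, so how they are run (Fabric, ssh, subprocess) is left to the caller. The helper name setup_commands is hypothetical. Downloading the tarball and copying the configuration and init script are left out because they depend on local file locations.

```python
# Sketch: the per-host setup steps of this note as an ordered command list.
# setup_commands is a hypothetical helper; it does not execute anything.


def setup_commands(user="mongodb", home="/home/mongodb"):
    """Return the shell commands for one host, in the order used above."""
    data_dir = "/var/lib/mongodb"
    log_dir = "/var/log/mongodb"
    return [
        # System user under which mongod runs.
        "adduser --system --shell /bin/sh --gecos 'MongoDB user' --group "
        "--disabled-password --home %s %s" % (home, user),
        # Data and log directories, owned by that user.
        "mkdir -p %s && chown %s:%s %s" % (data_dir, user, user, data_dir),
        "mkdir -p %s && chown %s:%s %s" % (log_dir, user, user, log_dir),
        # Make the extracted mongod binary available in the path.
        "ln -s %s/bin/mongod /usr/bin/mongod" % home,
        # Register the init script, then start the server.
        "insserv mongodb",
        "/etc/init.d/mongodb start",
    ]


if __name__ == "__main__":
    for command in setup_commands():
        print(command)
```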

Replica Set Configuration

First, you need to be sure that your VMs can, one way or another, resolve the host names of all the servers involved in your replica set. You can do that either with DNS entries or, less flexibly, in your /etc/hosts file. This is up to you.
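
A quick way to verify name resolution from each VM is sketched below with the Python standard library. The function name unresolved_hosts is hypothetical. The resolver is passed as a parameter only so that the function can be exercised without a network.

```python
# Sketch: list the replica set host names this machine cannot resolve.
import socket


def unresolved_hosts(hosts, resolve=socket.gethostbyname):
    """Return the host names that cannot be resolved from this machine."""
    missing = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(host)
    return missing


if __name__ == "__main__":
    members = [
        "mset1a.ipr.ceondo.net",
        "mset1b.ipr.ceondo.net",
        "mset1c.ipr.ceondo.net",
    ]
    print(unresolved_hosts(members))
```

An empty list means that DNS or /etc/hosts covers every member.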

The configuration is done by updating the admin database.

$ ./mongo  192.168.1.120/admin
MongoDB shell version: 1.8.3
connecting to: 192.168.1.120/admin

Notice that in the configuration I force mongod to bind only on the private network. This is why I need to give the IP when connecting; otherwise the shell would default to localhost.

In the configuration file, we defined the name of the set as set1, so the replica set configuration is really simple. We first create a cfg object:

> cfg = {
... _id : "set1",
... members : [
... { _id : 0, host : "mset1a.ipr.ceondo.net" },
... { _id : 1, host : "mset1b.ipr.ceondo.net" },
... { _id : 2, host : "mset1c.ipr.ceondo.net" } ] }
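
When the member list comes from a script, the same document can be generated instead of typed. This is a minimal sketch with a hypothetical helper name, replica_set_config. It prints JSON, which the mongo shell accepts as an object literal, so the output can be pasted at the prompt.

```python
# Sketch: build the replica set configuration document from a host list.
import json


def replica_set_config(set_name, hosts):
    """Build the document passed to rs.initiate() for the given hosts."""
    return {
        "_id": set_name,
        "members": [
            {"_id": index, "host": host} for index, host in enumerate(hosts)
        ],
    }


if __name__ == "__main__":
    hosts = [
        "mset1a.ipr.ceondo.net",
        "mset1b.ipr.ceondo.net",
        "mset1c.ipr.ceondo.net",
    ]
    print("cfg = %s" % json.dumps(replica_set_config("set1", hosts), indent=2))
```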

Then you initiate the replica set with this configuration. You do this on one server only; the configuration is distributed to the others automatically.

> rs.initiate(cfg)
{
    "info" : "Config now saved locally.  Should come online in about a minute.",
    "ok" : 1
}
> rs.status()
{
    "set" : "set1",
    "date" : ISODate("2011-09-08T12:59:04Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "mset1a.ipr.ceondo.net",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "optime" : {
                "t" : 1315486715000,
                "i" : 1
            },
            "optimeDate" : ISODate("2011-09-08T12:58:35Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "mset1b.ipr.ceondo.net",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 23,
            "optime" : {
                "t" : 1315486715000,
                "i" : 1
            },
            "optimeDate" : ISODate("2011-09-08T12:58:35Z"),
            "lastHeartbeat" : ISODate("2011-09-08T12:59:03Z")
        },
        {
            "_id" : 2,
            "name" : "mset1c.ipr.ceondo.net",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 21,
            "optime" : {
                "t" : 1315486715000,
                "i" : 1
            },
            "optimeDate" : ISODate("2011-09-08T12:58:35Z"),
            "lastHeartbeat" : ISODate("2011-09-08T12:59:03Z")
        }
    ],
    "ok" : 1
}
set1:PRIMARY>
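
The status document above lends itself to an automated check. The sketch below assumes the output of rs.status() has already been loaded into a Python dictionary with the field names shown above; how it is fetched (for instance with a driver running the replSetGetStatus command) is left out. The function name replica_set_problems is hypothetical.

```python
# Sketch: sanity-check a replica set status document.


def replica_set_problems(status, expected_members=3):
    """Return a list of problems found in an rs.status() document."""
    problems = []
    members = status.get("members", [])
    if len(members) != expected_members:
        problems.append(
            "expected %d members, found %d" % (expected_members, len(members))
        )
    primaries = [m for m in members if m.get("stateStr") == "PRIMARY"]
    if len(primaries) != 1:
        problems.append(
            "expected exactly one PRIMARY, found %d" % len(primaries)
        )
    for member in members:
        if member.get("health") != 1:
            problems.append("%s is unhealthy" % member["name"])
        elif member.get("stateStr") not in ("PRIMARY", "SECONDARY"):
            problems.append(
                "%s is in state %s" % (member["name"], member.get("stateStr"))
            )
    return problems
```

An empty list corresponds to the healthy state shown above: one PRIMARY, two SECONDARY members, all with health 1.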

Enjoy, your replica set is now running. It is time for you to dive into all the documentation regarding backup and monitoring on the MongoDB website.

Security Considerations

By default, MongoDB binds to all the interfaces. Do not forget to restrict it to an IP of your private network.
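
This restriction is easy to check before deploying a configuration file. The sketch below uses the standard ipaddress module and assumes the key = value format of the configuration shown earlier. The function name bind_ip_problem is hypothetical.

```python
# Sketch: check that a mongod configuration binds to a private address.
import ipaddress


def bind_ip_problem(conf_text):
    """Return a problem description, or None if bind_ip looks safe."""
    bind_ip = None
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "bind_ip":
            bind_ip = value.strip()
    if not bind_ip:
        return "bind_ip is not set, mongod listens on all interfaces"
    for part in bind_ip.split(","):
        try:
            address = ipaddress.ip_address(part.strip())
        except ValueError:
            return "bind_ip value %r is not an IP address" % part.strip()
        # 0.0.0.0 means "all interfaces" and must be rejected explicitly.
        if address.is_unspecified or not address.is_private:
            return "bind_ip %s is not a private address" % address
    return None
```

A configuration where the %s placeholder was never filled in is reported as well, since %s is not an IP address.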

Upgrading

Unless specified otherwise, the database storage format is compatible from version n to n + 0.2 (for example from 1.8 to 2.0), so the upgrade is mostly painless.

  1. Make a backup of your database;
  2. Download and extract the new version;
  3. Change the symlink to point to the new mongod;
  4. Restart the mongod processes one by one. Restarting the primary makes it fail over to a secondary, which keeps your replica set up.
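
The note does not prescribe an order for the restarts. One common choice, assumed here and not taken from the procedure above, is to restart the secondaries first and the primary last, so that only one failover happens. A small sketch that derives this order from the members list of rs.status(), with the hypothetical helper name restart_order:

```python
# Sketch: order the members for a rolling restart, primary last.
# Assumption: restarting the primary last limits the upgrade to one failover.


def restart_order(members):
    """Return member names with secondaries first and the primary last."""
    others = [m["name"] for m in members if m["stateStr"] != "PRIMARY"]
    primaries = [m["name"] for m in members if m["stateStr"] == "PRIMARY"]
    return others + primaries


if __name__ == "__main__":
    members = [
        {"name": "mset1a.ipr.ceondo.net", "stateStr": "PRIMARY"},
        {"name": "mset1b.ipr.ceondo.net", "stateStr": "SECONDARY"},
        {"name": "mset1c.ipr.ceondo.net", "stateStr": "SECONDARY"},
    ]
    for name in restart_order(members):
        print(name)
```

Waiting for each restarted member to show up as SECONDARY again in rs.status() before moving to the next one keeps a majority of the set available.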

Rotation of the Log Files

You want rotation of your log files to avoid saturation of your /var partition. These settings rotate the logs every week. Because copytruncate is used, you may lose a few log lines during rotation, but this is not really a problem, as a MongoDB cluster is normally monitored by connecting to the server itself and running stats queries.

Just put in /etc/logrotate.d/mongodb the following:

/var/log/mongodb/*.log {
       weekly
       rotate 10
       copytruncate
       delaycompress
       compress
       notifempty
       missingok
}

The log files compress very well as they contain a lot of redundant information. For example, on an idle system, I get thousands of:

Wed Oct 26 11:09:52 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Wed Oct 26 11:09:56 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Wed Oct 26 11:10:00 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Wed Oct 26 11:10:04 [snapshotthread] cpu: elapsed:4000  writelock: 0%
Wed Oct 26 11:10:08 [snapshotthread] cpu: elapsed:4000  writelock: 0%
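
To get an idea of the volume, the excerpt shows one line every 4 seconds. The short sketch below turns that into a line count and an uncompressed size, assuming the interval stays constant and every line has the length of the sample. The function name idle_log_volume is hypothetical.

```python
# Sketch: estimate the log volume produced by the periodic cpu lines.
SAMPLE_LINE = (
    "Wed Oct 26 11:09:52 [snapshotthread] cpu: elapsed:4000  writelock: 0%"
)
INTERVAL_SECONDS = 4  # spacing between the timestamps in the excerpt


def idle_log_volume(days=7):
    """Return (line count, size in bytes) written over the given days."""
    lines = days * 24 * 3600 // INTERVAL_SECONDS
    size = lines * (len(SAMPLE_LINE) + 1)  # +1 for the newline
    return lines, size


if __name__ == "__main__":
    lines, size = idle_log_volume(7)
    print("%d lines, %.1f MB per week" % (lines, size / 1e6))
```

That is 21600 lines per day and 151200 per weekly rotation on an otherwise idle server.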

MongoDB Debian Init.d File

#!/bin/sh
#
# init.d script with LSB support.
#
# Copyright (c) 2007 Javier Fernandez-Sanguino <jfs@debian.org>
#
# This is free software; you may redistribute it and/or modify
# it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2,
# or (at your option) any later version.
#
# This is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License with
# the Debian operating system, in /usr/share/common-licenses/GPL;  if
# not, write to the Free Software Foundation, Inc., 59 Temple Place,
# Suite 330, Boston, MA 02111-1307 USA
#
### BEGIN INIT INFO
# Provides:          mongodb
# Required-Start:    $network $local_fs $remote_fs
# Required-Stop:     $network $local_fs $remote_fs
# Should-Start:      $named
# Should-Stop:
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: An object/document-oriented database
# Description:       MongoDB is a high-performance, open source, schema-free 
#                    document-oriented  data store that's easy to deploy, manage
#                    and use. It's network accessible, written in C++ and offers
#                    the following features:
#                    
#                       * Collection oriented storage - easy storage of object-
#                         style data
#                       * Full index support, including on inner objects
#                       * Query profiling
#                       * Replication and fail-over support
#                       * Efficient storage of binary data including large 
#                         objects (e.g. videos)
#                       * Auto-sharding for cloud-level scalability (Q209)
#                    
#                    High performance, scalability, and reasonable depth of
#                    functionality are the goals for the project.
### END INIT INFO

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/bin/mongod
DESC=database

# Default defaults.  Can be overridden by the /etc/default/$NAME
NAME=mongodb
CONF=/etc/mongodb.conf
DATA=/var/lib/mongodb
LOGDIR=/var/log/mongodb
PIDFILE=/var/lib/mongodb/mongod.lock
LOGFILE=$LOGDIR/$NAME.log  # Server logfile
ENABLE_MONGODB=yes

# Include mongodb defaults if available
if [ -f /etc/default/$NAME ] ; then
    . /etc/default/$NAME
fi

if test ! -x $DAEMON; then
    echo "Could not find $DAEMON"
    exit 0
fi

if test "x$ENABLE_MONGODB" != "xyes"; then
    exit 0
fi

if test ! -x $DATA; then
    mkdir $DATA || exit 0
fi

. /lib/lsb/init-functions

STARTTIME=1
DIETIME=10                   # Time to wait for the server to die, in seconds
                            # If this value is set too low you might not
                            # let some servers to die gracefully and
                            # 'restart' will not work

DAEMONUSER=${DAEMONUSER:-mongodb}
DAEMON_OPTS=${DAEMON_OPTS:-"--dbpath $DATA --logpath $LOGFILE run"}
DAEMON_OPTS="$DAEMON_OPTS --config $CONF"

set -e


running_pid() {
# Check if a given process pid's cmdline matches a given name
    pid=$1
    name=$2
    [ -z "$pid" ] && return 1
    [ ! -d /proc/$pid ] &&  return 1
    cmd=`cat /proc/$pid/cmdline | tr "\000" "\n"|head -n 1 |cut -d : -f 1`
    # Is this the expected server
    [ "$cmd" != "$name" ] &&  return 1
    return 0
}

running() {
# Check if the process is running looking at /proc
# (works for all users)

    # No pidfile, probably no daemon present
    [ ! -f "$PIDFILE" ] && return 1
    pid=`cat $PIDFILE`
    running_pid $pid $DAEMON || return 1
    return 0
}

start_server() {
# Start the process using the wrapper
            start-stop-daemon --background --start --quiet --pidfile $PIDFILE \
                        --chuid $DAEMONUSER \
                        --exec $DAEMON -- $DAEMON_OPTS
            errcode=$?
    return $errcode
}

stop_server() {
# Stop the process using the wrapper
            start-stop-daemon --stop --quiet --pidfile $PIDFILE \
                        --retry 300 \
                        --user $DAEMONUSER \
                        --exec $DAEMON
            errcode=$?
    return $errcode
}

force_stop() {
# Force the process to die killing it manually
    [ ! -e "$PIDFILE" ] && return
    if running ; then
        kill -15 $pid
    # Is it really dead?
        sleep "$DIETIME"s
        if running ; then
            kill -9 $pid
            sleep "$DIETIME"s
            if running ; then
                echo "Cannot kill $NAME (pid=$pid)!"
                exit 1
            fi
        fi
    fi
    rm -f $PIDFILE
}


case "$1" in
  start)
    log_daemon_msg "Starting $DESC" "$NAME"
        # Check if it's running first
        if running ;  then
            log_progress_msg "apparently already running"
            log_end_msg 0
            exit 0
        fi
        if start_server ; then
            # NOTE: Some servers might die some time after they start,
            # this code will detect this issue if STARTTIME is set
            # to a reasonable value
            [ -n "$STARTTIME" ] && sleep $STARTTIME # Wait some time 
            if  running ;  then
                # It's ok, the server started and is running
                log_end_msg 0
            else
                # It is not running after we did start
                log_end_msg 1
            fi
        else
            # Either we could not start it
            log_end_msg 1
        fi
    ;;
  stop)
        log_daemon_msg "Stopping $DESC" "$NAME"
        if running ; then
            # Only stop the server if we see it running
            errcode=0
            stop_server || errcode=$?
            log_end_msg $errcode
        else
            # If it's not running don't do anything
            log_progress_msg "apparently not running"
            log_end_msg 0
            exit 0
        fi
        ;;
  force-stop)
        # First try to stop gracefully the program
        $0 stop
        if running; then
            # If it's still running try to kill it more forcefully
            log_daemon_msg "Stopping (force) $DESC" "$NAME"
            errcode=0
            force_stop || errcode=$?
            log_end_msg $errcode
        fi
    ;;
  restart|force-reload)
        log_daemon_msg "Restarting $DESC" "$NAME"
        errcode=0
        stop_server || errcode=$?
        # Wait some sensible amount, some server need this
        [ -n "$DIETIME" ] && sleep $DIETIME
        start_server || errcode=$?
        [ -n "$STARTTIME" ] && sleep $STARTTIME
        running || errcode=$?
        log_end_msg $errcode
    ;;
  status)

        log_daemon_msg "Checking status of $DESC" "$NAME"
        if running ;  then
            log_progress_msg "running"
            log_end_msg 0
        else
            log_progress_msg "apparently not running"
            log_end_msg 1
            exit 1
        fi
        ;;
  # MongoDB can't reload its configuration.
  reload)
        log_warning_msg "Reloading $NAME daemon: not implemented, as the daemon"
        log_warning_msg "cannot re-read the config file (use restart)."
        ;;

  *)
    N=/etc/init.d/$NAME
    echo "Usage: $N {start|stop|force-stop|restart|force-reload|status}" >&2
    exit 1
    ;;
esac

exit 0

Changelog

  • Thu 8 Sep, initial version.
  • Tue 13 Sep, added the upgrade procedure.
  • Wed 26 Oct, added the logrotate information.