MONGODB COOKBOOK PDF


Author:ROXY SZESTERNIAK
Language:English, Arabic, German
Country:Malta
Genre:Health & Fitness
Pages:657
Published (Last):14.12.2015
ISBN:489-2-51839-513-7
ePub File Size:26.38 MB
PDF File Size:13.31 MB
Distribution:Free* [*Registration Required]
Downloads:30584
Uploaded by: PENELOPE


Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults and disk reads.

Memory Usage

MongoDB uses memory-mapped files to store data. Given a data set of sufficient size, the MongoDB process will allocate all available memory on the system for its use. While this is part of the design and affords MongoDB superior performance, the memory-mapped files make it difficult to determine whether the amount of RAM is sufficient for the data set.

The memory usage metrics of the serverStatus output can provide insight into MongoDB's memory use. Check the resident memory use, and also check the amount of mapped memory. If the mapped value is greater than the amount of system memory, some operations will require disk access (page faults) to read data from virtual memory, which negatively affects performance.

Page Faults

Page faults can occur as MongoDB reads from or writes data to parts of its data files that are not currently located in physical memory.

In contrast, operating system page faults happen when physical memory is exhausted and pages of physical memory are swapped to disk. Page faults triggered by MongoDB are reported as the total number of page faults in one second. MongoDB on Windows counts both hard and soft page faults.

The MongoDB page fault counter may increase dramatically in moments of poor performance and may correlate with limited physical memory environments. Page faults also can increase while accessing much larger data sets, for example, scanning an entire collection. Limited and sporadic MongoDB page faults do not necessarily indicate a problem or a need to tune the database.

However, in aggregate, large volumes of page faults typically indicate that MongoDB is reading too much data from disk. In many situations, MongoDB's read locks will yield after a page fault to allow other processes to read and avoid blocking while waiting for the next page to read into memory.

This approach improves concurrency, and also improves overall throughput in high volume systems. Increasing the amount of RAM available to MongoDB may help reduce the frequency of page faults. If this is not possible, you may want to consider deploying a sharded cluster or adding shards to your deployment to distribute load among mongod instances.
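The memory and page-fault checks described above can be sketched in a few lines of Python. This is only an illustration working on plain dictionaries shaped like serverStatus output: the field names mem.mapped and extra_info.page_faults, the function names, and the sample numbers are assumptions made for this sketch, and mem.mapped is assumed to be reported in megabytes.

```python
def mapped_exceeds_ram(status, system_memory_mb):
    """Return True when the mapped memory is larger than the system RAM.

    `status` is a dictionary shaped like serverStatus output (assumed
    layout: status["mem"]["mapped"], in megabytes).
    """
    return status["mem"]["mapped"] > system_memory_mb


def page_faults_per_second(earlier, later, interval_seconds):
    """Average page-fault rate between two serverStatus-like samples.

    Assumes extra_info.page_faults is a counter that only grows, so the
    difference between two samples is the number of faults in between.
    """
    delta = later["extra_info"]["page_faults"] - earlier["extra_info"]["page_faults"]
    return delta / interval_seconds


# Hypothetical samples taken 60 seconds apart.
first = {"mem": {"resident": 900, "mapped": 4096}, "extra_info": {"page_faults": 1000}}
second = {"mem": {"resident": 950, "mapped": 4096}, "extra_info": {"page_faults": 1600}}

print(mapped_exceeds_ram(second, system_memory_mb=2048))
print(page_faults_per_second(first, second, 60))
```

A sustained high rate matters more than a single spike, so in practice such samples would be taken repeatedly and compared over time.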

See faq-storage-page-faults for more information.

Number of Connections

In some cases, the number of connections between the application layer and the database can overwhelm the ability of the server to handle requests. This can produce performance irregularities.

The following fields in the serverStatus document can provide insight: globalLock. If requests are high because there are numerous concurrent application requests, the database may have trouble keeping up with demand. If this is the case, then you will need to increase the capacity of your deployment. For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members.

For write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod instances. Spikes in the number of connections can also be the result of application or driver errors. All of the officially supported MongoDB drivers implement connection pooling, which allows clients to use and reuse connections more efficiently.

Extremely high numbers of connections, particularly without a corresponding workload, are often indicative of a driver or other configuration error. Unless constrained by system-wide limits, MongoDB has no limit on incoming connections.

Database Profiling

MongoDB's profiler is a database profiling system that can help identify inefficient queries and operations. The following profiling levels are available:

Level   Setting
1       On. Only includes slow operations
2       On. Includes all operations

Enable the profiler by setting the profile value using the following command in the mongo shell: db.

To set the threshold above which the profiler considers operations slow (and thus included in the level 1 profiling data), you can configure slowOpThresholdMs at runtime as an argument to the db. See the documentation of db.
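How the profiling level and the slow-operation threshold interact can be shown with a small, self-contained Python sketch. It does not talk to a server; it only models the decision described above. The function name is hypothetical, the 100 ms default is an assumption based on MongoDB's documented default, and any level other than 1 or 2 is treated as "profiler off" in this sketch.

```python
DEFAULT_SLOW_MS = 100  # assumption: MongoDB's documented default threshold


def is_profiled(level, millis, slow_ms=DEFAULT_SLOW_MS):
    """Decide whether an operation that took `millis` ms would be recorded.

    level 2 -> every operation is recorded
    level 1 -> only operations slower than the threshold are recorded
    other   -> treated as profiling disabled (sketch assumption)
    """
    if level == 2:
        return True
    if level == 1:
        return millis > slow_ms
    return False


# A 150 ms query is recorded at level 1, a 50 ms query is not.
print(is_profiled(1, 150), is_profiled(1, 50))
# Lowering the threshold to 20 ms makes the 50 ms query "slow" as well.
print(is_profiled(1, 50, slow_ms=20))
```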

Table of Contents

Preface

Chapter 1: Installing and Starting the Server
  Introduction
  Installing single node MongoDB
  Starting a single node instance using command-line options
  Single node installation of MongoDB with options from the config file
  Connecting to a single node in the Mongo shell with JavaScript
  Connecting to a single node using a Java client
  Connecting to a single node using a Python client
  Starting multiple instances as part of a replica set
  Connecting to the replica set in the shell to query and insert data
  Connecting to the replica set to query and insert data from a Java client
  Connecting to the replica set to query and insert data using a Python client
  Starting a simple sharded environment of two shards
  Connecting to a shard in the shell and performing operations

Chapter 2: Command-line Operations and Indexes
  Introduction
  Creating test data
  Performing simple querying, projections, and pagination from Mongo shell
  Updating and deleting data from the shell
  Creating index and viewing plans of queries
  Creating a background and foreground index in the shell
  Creating and understanding sparse indexes
  Expiring documents after a fixed interval using the TTL index
  Expiring documents at a given time using the TTL index

Chapter 4: Administration
  Introduction
  Renaming a collection
  Viewing collection stats
  Viewing database stats
  Manually padding a document
  The mongostat and mongotop utilities
  Getting current executing operations and killing them
  Using profiler to profile operations
  Setting up users in Mongo
  Interprocess security in Mongo
  Modifying collection behavior using the collMod command
  Setting up MongoDB as a windows service
  Replica set configurations
  Stepping down as primary from the replica set
  Exploring the local database of a replica set
  Understanding and analyzing oplogs
  Building tagged replica sets
  Configuring the default shard for non-sharded collections
  Manual split and migration of chunks
  Domain-driven sharding using tags
  Exploring the config database in a sharded setup

Preface

MongoDB has an edge over the majority of NoSQL solutions for its ease of use, high performance, and rich features.

This book provides detailed recipes that describe how to use the different features of MongoDB. The recipes cover topics ranging from setting up MongoDB, knowing its programming language API, and monitoring and administration, to some advanced topics such as cloud deployment, integration with Hadoop, and some open source and proprietary tools for MongoDB.

The recipe format presents the information in a concise, actionable form; this lets you refer to the recipe to address and know the details of just the use case in hand without going through the entire book. Chapter 1, Installing and Starting the Server, demonstrates how to start the server in the standalone mode, as a replica set, and as a shard, with the start-up options provided from the command line or a configuration file.

Chapter 2, Command-line Operations and Indexes, has simple recipes to perform CRUD operations in the Mongo shell and create various types of indexes in the shell. Though Mongo supports a vast array of languages, we will look at how to use the drivers to connect to the MongoDB server from Java and Python programs only.

This chapter also explores the MongoDB wire protocol used for communication between the server and programming language clients. Chapter 4, Administration, contains many recipes for the administration of your MongoDB deployment. This chapter covers a lot of frequently used administrative tasks such as viewing the stats of the collections and database, viewing and killing long-running operations, and other replica set and sharding-related administration.

We will look at some of the slightly advanced features such as implementing server-side scripts, geospatial search, GridFS, full text search, and how to integrate MongoDB with an external full text search engine. Chapter 6, Monitoring and Backups, tells you all about administration and some basic monitoring.

In this chapter, we will look at some recipes around monitoring and backup using MMS. Chapter 9, Open Source and Proprietary Tools, is about using frameworks and products built around MongoDB to improve a developer's productivity or about simplifying some of the day-to-day jobs using Mongo. Appendix, Concepts for Reference, gives you a bit of additional information on the write concern and read preference for reference. What you need for this book The version of MongoDB used to try out the recipes is 3.

The recipes are good for version 2.

In case of some special feature specific to version 2. Unless explicitly mentioned, all commands should be executed on Ubuntu Linux. The samples where Java programming was involved were tested and run on Java Version 1.

For MongoDB drivers, you can choose to use the latest available version. These are pretty common types of software, and their minimum versions are used across different recipes. All the recipes in this book will mention the required software to complete it and their respective versions.

Some recipes need to be tested on a Windows system, while others need Linux. This book is also for those who know the basics of MongoDB and would like to expand their knowledge. The audience of this book is expected to have at least some basic knowledge of MongoDB.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are set apart from the surrounding text, as are blocks of code, command-line input or output, and words that you see on the screen, in menus, or in dialog boxes. Warnings and important notes appear in a box.


Though it is a cakewalk to start the server with default settings for development purposes, there are numerous options available to fine-tune the start up behavior.

We will start the server as a single node and then introduce various configuration options. We will conclude this chapter by setting up a simple replica set and running a sharded cluster. So, let's get started with installing and setting up the MongoDB server in the easiest way possible for simple development purposes. This is the simplest and quickest way to start a MongoDB server, but it is seldom used for production use cases.

However, this is the most common way to start the server for development purposes. In this recipe, we will start the server without looking at a lot of other startup options.

Getting ready

We assume that you have downloaded the MongoDB binaries from the download site, extracted them, and added the resulting bin directory to the operating system's path variable.

This is not mandatory, but it really becomes convenient after doing so. The binaries can be downloaded from http: How to do it… 1.

This will be our database directory, and the mongod process (the mongo server process) needs permission to write to it.

Despite the simplicity in starting the server, there are a lot of configuration options that can be used to tune the behavior of the server on startup. Most of the default options are sensible and need not be changed.

With the default values, the server should be listening to port for new connections, and the logs will be printed out to the standard output.

See also

There are times where we would like to configure some options on server startup. In the Installing single node MongoDB recipe, we will use some more start up options.

Starting a single node instance using command-line options

In this recipe, we will see how to start a standalone single node server with some command-line options. We will see an example where we want to do the following: We will soon see what this means.

Getting ready If you have already seen and executed the Installing single node MongoDB recipe, you need not do anything different. If all these prerequisites are met, we are good for this recipe. Execute the following command: MongoDB actually supports quite a few options at startup, and we will see a list of the most common and important ones in my opinion: Option Description --help or -h This is used to print the information of various start up options available.

We will see more on this option in a later recipe. It is just a convenient way of specifying the configurations in a file rather than on the command prompt; especially when the number of options specified is more. Using a separate configuration file shared across different MongoDB instances will also ensure that all the instances are running with identical configurations. It will keep the logs less chatty and clean. We would be frequently using this option whenever we are looking to start multiple mongo servers on the same machine, for example, --port will start the server listening to port for new connections.

Remember that the value provided should be a file and not a directory where the logs will be written. The default behavior is to rename the existing log file and then create a new file for the logs of the currently started mongo instance.

Suppose that we have used the name of the log file as server. The time is GMT as against the local time. Let's assume that the current date is October 28th, and time is Note that the value should be a directory rather than the name of the file. Mongo, on startup, creates a database file of size 64 MB on bit machines. This preallocation happens for performance reasons, and the file is created with zeros written to it to fill out space on the disk.

Adding this option on startup creates a preallocated file of 16 MB only again, on a bit machine. This option also reduces the maximum size of the database and journal files. Avoid using this option for production deployments. Additionally, the file sizes double to a maximum of 2 GB by default.

If the --smallfiles option is chosen, it goes up to a maximum of MB. The value of this arg is the name of the replica set, for example, --replSet repl1.

You will learn more on this option in a later recipe where we will start a simple mongo replica set. The role of the configuration server will be made clearer when we set up a simple sharded environment in a later recipe in this chapter.

By giving this option, the server also listens to port instead of the default We will know more on this option when we start a simple sharded server. It is a capped collection where the data being written to the primary instances is stored in order to be replicated to the secondary instances. This collection resides in a database named local. On initialization of the replica set, the disk space for oplog is preallocated, and the database file for the local database is filled with zeros as placeholders.

The size of oplog is crucial because capped collections are of a fixed size and they discard the oldest documents in them on exceeding their size, thereby making space for new documents.

Having a very small oplog size can result in data being discarded before being replicated to secondary nodes. A large oplog size can result in unnecessary disk space utilization and large duration for the replica set initialization.
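The effect of an undersized oplog can be illustrated with a toy model in Python. This is a sketch only: the class and method names are hypothetical, and the log is capped by number of entries, whereas the real oplog is capped by size on disk. The idea it shows is the one described above: once the oldest entries are discarded, a secondary that has not yet read them can no longer catch up from the log alone.

```python
from collections import deque


class CappedLog:
    """Toy stand-in for a capped collection holding replication entries."""

    def __init__(self, max_entries):
        # deque(maxlen=...) silently drops the oldest item when full,
        # which mirrors how a capped collection makes room for new data.
        self._entries = deque(maxlen=max_entries)
        self._next_seq = 0  # sequence number given to the next write

    def append(self, operation):
        self._entries.append((self._next_seq, operation))
        self._next_seq += 1

    def entries_after(self, last_seq):
        """Operations newer than last_seq, or None if some were discarded."""
        if not self._entries:
            return []
        oldest_seq = self._entries[0][0]
        if last_seq + 1 < oldest_seq:
            return None  # the needed entries are gone from the log
        return [op for seq, op in self._entries if seq > last_seq]


log = CappedLog(max_entries=3)
for i in range(5):
    log.append(f"op{i}")

print(log.entries_after(1))  # secondary that applied op0 and op1
print(log.entries_after(0))  # secondary that only applied op0
```

The second call returns None because op1 has already been overwritten, which is the situation a larger oplog is meant to make less likely.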

For development purposes, when we start multiple server processes on the same host, we might want to keep the oplog size to a minimum value, quickly initiate the replica set, and use minimum disk space. The previous default storage engine is now called mmapv1. This option allows you to store each database in its own subdirectory in the aforementioned data directory.

Having such granular control allows you to have separate disks for each database. There's more… For an exhaustive list of options that are available, use the --help or -h option.
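When several of these options are combined, it can help to assemble the command line programmatically. The following Python sketch builds an argument list for mongod from a dictionary; the helper name, the paths, and the port value are hypothetical, and the option names are the ones discussed in this recipe (some of them, such as smallfiles, may not exist in later MongoDB versions).

```python
def build_mongod_command(options):
    """Turn {"port": 27000, "smallfiles": True} into a mongod argument list.

    True  -> the option is passed as a bare flag
    False/None -> the option is left out
    other -> the option is followed by its value
    """
    args = ["mongod"]
    for name, value in options.items():
        if value is True:
            args.append(f"--{name}")
        elif value is False or value is None:
            continue
        else:
            args.extend([f"--{name}", str(value)])
    return args


# Hypothetical values chosen only for illustration.
cmd = build_mongod_command({
    "dbpath": "/data/mongo/db",
    "logpath": "/logs/mongo.log",
    "port": 27000,
    "smallfiles": True,
    "oplogSize": 50,
    "fork": False,
})
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run as is, which avoids shell quoting problems with paths that contain spaces.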

This list of options is not exhaustive, and we will see some more coming up in later recipes as and when we need them. In the next recipe, we will see how to use a configuration file instead of the command-line arguments.

See also

- Single node installation of MongoDB with options from the config file, for using configuration files to provide start up options
- Starting multiple instances as part of a replica set, to start a replica set
- Starting a simple sharded environment of two shards, to set up a sharded environment

Single node installation of MongoDB with options from the config file

As we can see, providing options from the command line does the work, but it starts getting awkward as soon as the number of options that we provide increases.

We have a nice and clean alternative to provide the start up options from a configuration file rather than as command-line arguments.

Getting ready

If you have already executed the Installing single node MongoDB recipe, you need not do anything different as all the prerequisites of this recipe are the same. Create a configuration file that can have any arbitrary name.

We then edit the file and add the following lines to it: Start the mongo server using the following command: We are just providing them in a configuration file instead. If you have not visited the previous recipe, I would recommend you to do so as that is where we discussed some of the common command-line options.
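A configuration file of this kind can be sketched by converting command-line options into "name = value" lines. The Python helper below is a hypothetical illustration, not the book's listing: it assumes the simple key-value file format, strips the leading hyphens from each option name, and writes true for options that take no value. The paths and port are made-up sample values.

```python
def to_config_lines(cli_options):
    """Convert (option, value) pairs into configuration file lines.

    An option without a value (value is None) becomes "<name> = true".
    """
    lines = []
    for option, value in cli_options:
        name = option.lstrip("-")  # "--port" -> "port"
        if value is None:
            value = "true"
        lines.append(f"{name} = {value}")
    return lines


# Hypothetical options mirroring a command-line start-up.
config = to_config_lines([
    ("--port", 27000),
    ("--dbpath", "/data/mongo/db"),
    ("--logpath", "/logs/mongo.log"),
    ("--smallfiles", None),
])
print("\n".join(config))
```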

For all the properties that don't have values, for example, the smallfiles option, the value given is a Boolean value, true. If you already know what the command-line option is, then it is pretty easy to guess what the value of the property is in the file. It is almost the same as the command-line option with just the hyphen removed.

Connecting to a single node in the Mongo shell with JavaScript

This recipe is about starting the mongo shell and connecting to a MongoDB server.

Here we also demonstrate how to load JavaScript code in the shell. Though this is not always required, it is handy when we have a large block of JavaScript code with variables and functions with some business logic in them that is required to be executed from the shell frequently and we want these functions to be available in the shell always. To start a server on the localhost without much of a hassle, take a look at the first recipe, Installing single node MongoDB, and start the server.

First, we create a simple JavaScript file and call it hello. Type the following body in the hello. This can be saved at any other location too. On the command prompt, execute the following: On executing this, we should see the following printed to our console: MongoDB shell version: Test the database that the shell is connected to by typing the following command: Now, type the following command in the shell: You should get the following response: Hello Fred, how are you?

This book was written with MongoDB version 3. There is a good chance that you may be using a later version and hence see a different version number in the mongo shell. There could be multiple functions in the. On executing the mongo command without any arguments, we connect to the MongoDB server running on localhost and listen for new connections on the default port Generally speaking, the format of the command is as follows: Let's look at some example values of the db address command-line option and its interpretation: This will connect to the server running on localhost and listen for a connection on port The database connected will be mydb.

This will connect to the server running on mongo. The database connected will be the default database test. Now, there are quite a few options available on the mongo client too. We will see a few of them in the following table: Option Description --help or -h This shows help regarding the usage of various command-line options.

Providing this option ensures that the shell remains running after the JavaScript files execute. All the functions and variables defined in these. As in the preceding case, the sayHello function defined in the JavaScript file is available in the shell for invocation. If the db address is provided with the hostname, port, and database, then both the --host and --port options need not be specified.

The -u option is used to provide the username of the user to be logged in. The -p option is used to provide the password of the user to be logged in. You will repeatedly refer to this recipe while working on others, so read it very carefully.

Getting ready

The following are the prerequisites for this recipe: Version 3. Alternatively, you may choose an appropriate local repository accessible to you from your computer. Take a look at the first recipe, Installing single node MongoDB, and start the server.

Install the latest version of JDK from https: We will not be going through the steps to install JDK in this recipe, but before moving on with the next step, JDK should be present. Maven needs to be downloaded from http: We should see something similar to the following image on the download page. Choose the binaries in a.

This recipe is executed on a machine running on the Windows platform and thus these steps are for installation on Windows. Once the archive has been downloaded, we need to extract it and put the absolute path of the bin folder in the extracted archive in the operating system's path variable. Remember to set the root of your JDK as the value of this variable. All we need to do now is type mvn -version on the command prompt, and if we see the output that begins with something as follows, we have successfully set up maven: At this stage, we have maven installed, and we are now ready to create our simple project to write our first Mongo client in Java.

We start by creating a project folder. Let's say that we create a folder called Mongo Java. The root of the project folder then contains a ile called pom. Once this folder's creation is done, the folder structure should look as follows: We just have the project skeleton with us.

We shall now add some content to the pom. Not much is needed for this. We finally write our Java client that will be used to connect to the Mongo server and execute some very basic operations; among other classes, it imports BasicDBObject and MongoClient. It's now time to execute the preceding Java code. We will execute it using maven from the shell. You should be in the same directory as pom.

FirstMongoClient

How it works…

These were quite a lot of steps to follow. Let's look at some of them in more detail. Everything up to step 6 is straightforward and doesn't need any explanation. Let's look at step 7 onwards. In the pom, we defined a dependency on mongo's Java driver. It relies on the online repository, repo.

For a local repository, all we need to do is define the repositories and pluginRepositories tags in pom. For more information on maven, refer to the maven documentation at http: For the Java class, the MongoClient class is the backbone.

We first instantiate it using one of its overloaded constructors giving the server's host and port. In this case, the hostname and port were not really needed as the values provided are the default values anyway, and the no-argument constructor would have worked well too. The database is then obtained from the client using getDB and is returned as an object of the DB type.

Note that this database might not exist, yet getDB will not throw any exception. Instead, the database will get created whenever we add a new document to the collection in this database. Similarly, getCollection on the DB object will return an object of the DBCollection type representing the collection in the database. This too might not exist in the database and will get created automatically on inserting the first document.

The collection is dropped using the drop method on the DBCollection object's instance. Next, we create an instance of DBObject. This is an object that represents the document to be inserted into the collection. The concrete class used here is BasicDBObject, which is a type of LinkedHashMap, where the key is String and the value is Object.

The value can be another DBObject too, in which case, it is a document nested within another document. In our case, we have two keys, name and age, which are the field names in the document to be inserted and the values are of the String and Integer types, respectively. The append method of BasicDBObject adds a new key value pair to the BasicDBObject instance and returns the same instance, which allows us to chain the append method calls to add multiple key value pairs.

This created DBObject is then inserted into the person collection using the insert method. We then read a document back using findOne. This version of findOne doesn't accept a DBObject as a parameter (which would otherwise act as a query executed before a document is selected and returned). This is synonymous to calling findOne on the collection from the mongo shell. Finally, we simply invoke getDatabaseNames to get a list of databases' names in the server. At this point of time, we should at least have the test and local databases in the returned result.

Once all the operations are complete, we close the client. The MongoClient class is thread-safe and generally one instance is used per application. To execute the program, we use the maven's exec plugin.

On executing step 9, we should see the program's output toward the end in the console.

Connecting to a single node using a Python client

With Python's simple syntax and versatility clubbed together with MongoDB, many programmers find that this stack allows faster prototyping and reduced development cycles. This recipe needs the Python MongoDB driver, PyMongo. Install pip, install the latest PyMongo driver using pip, and then run the script. In the script, we import pymongo so that it can be used.

We instantiate pymongo.MongoClient with localhost and the server's port as the mongo server host and port, respectively. In our recipe, we used the client handler to select the database test simply by referring to it as an attribute of the client. This returns a database object even if the database does not exist. As a part of this recipe, we drop the collection through the testdb object. For this recipe, we are intentionally dropping the collection so that recurring runs will always yield one record in the collection.

Next, we instantiate a dictionary called employee with a few values such as name and age, and insert it into the collection. We then fetch one document back; this method returns the first document in the collection, depending on the order of documents stored on the disk. We also ask the client for the database names; this method returns a list of database names present on the server and may come in handy when you are trying to assert the existence of a database on the server. Finally, we close the client connection using the close method.
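The sequence of steps described in this recipe can be sketched as a single function that takes an already-connected client. This is not the book's script: the function name, the sample values, and the use of the person collection are assumptions, and the method names (drop, insert_one, find_one, list_database_names, close) are those of recent PyMongo releases, which may differ from the driver version the recipe was written against.

```python
def run_person_recipe(client):
    """Drop, fill, and read back the test.person collection.

    `client` is expected to behave like pymongo.MongoClient; it is passed
    in so the steps can be read (and tried) without a running server.
    """
    testdb = client.test        # a database object, even if "test" does not exist yet
    testdb.person.drop()        # start clean so repeated runs leave one document
    employee = {"name": "Alice", "age": 30}  # illustrative values
    testdb.person.insert_one(employee)
    first = testdb.person.find_one()             # first document in the collection
    database_names = client.list_database_names()
    client.close()
    return first, database_names
```

With PyMongo installed and a server running locally, this would be called as run_person_recipe(pymongo.MongoClient("localhost", port)), where port is the port your server listens on.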

Starting multiple instances as part of a replica set

In this recipe, we will look at starting multiple servers on the same host but as a cluster. Starting a single mongo server is enough for development purposes or non-mission-critical applications.

For crucial production deployments, we need the availability to be high, where if one server instance fails, another instance takes over and the data remains available to query, insert, or update. Clustering is an advanced concept and we won't be doing justice by covering this whole concept in one recipe. Here, we will be touching the surface and going into more detail in other recipes in the administration section later in the book.

In this recipe, we will start multiple mongo server processes on the same machine for the purpose of testing. In a production environment, they will be running on different machines or virtual machines in the same or even different data centers. As the name suggests, a replica set is a set of servers that are replicas of each other in terms of data. Looking at how they are kept in sync with each other and other internals is something we will defer to some later recipes in the administration section, but one thing to remember is that write operations will happen only on one node, which is the primary one.

All the querying also happens from the primary by default, though we may permit read operations on secondary instances explicitly. An important fact to remember is that replica sets are not meant to achieve scalability by distributing the read operations across various nodes in a replica set.

The sole objective of a replica set is to ensure high availability.

Getting ready

Though not a prerequisite, taking a look at the Starting a single node instance using command-line options recipe will definitely make things easier in case you are not aware of the various command-line options and their significance while starting a mongo server. Additionally, the necessary binaries and setups as mentioned in the single server setup must be done before we continue with this recipe.

Let's sum up what we need to do. We will start three mongod processes (mongo server instances) on our localhost. The following image will give you an idea of how the cluster would look:

On the Windows platform, you can choose the c:

Ensure that these directories have appropriate write permissions for the mongo server to write the data and logs.

Start the three servers as follows. Users on the Windows platform need to skip the --fork option as it is not supported:

Start the mongo shell and connect to any of the mongo servers running.

In this case, we connect to the first one listening to port

Try to execute an insert operation from the mongo shell after connecting to it:

More information can be found in the How it works… section.

The next step is to start configuring the replica set. We start by preparing a JSON configuration in the shell as follows:

The last step is to initiate the replica set with the preceding configuration as follows:

Execute rs.

In a few seconds, one of them should become a primary and the remaining two should become secondary. As we are starting three independent mongod services, we have three dedicated database paths on the filesystem. Similarly, we have three separate log file locations, one for each of the processes.

We then start three mongod processes with the database and log file paths specified. As this setup is for test purposes and is started on the same machine, we use the --smallfiles and --oplogSize options. As these processes are running on the same host, we also choose the ports explicitly to avoid port conflicts.

The ports that we chose here were , , and

When we start the servers on different hosts, we may or may not choose a separate port. We can very well use the default one whenever possible. The --fork option demands some explanation.

By choosing this option, we start the server as a background process from our operating system's shell and get control back in the shell, where we can then start more such mongod processes or perform other operations. In the absence of the --fork option, we cannot start more than one process per shell and would need to start the three mongod processes in three separate shells. If we take a look at the logs generated in the log directory, we should see the following lines in it:

This command-line option is just used to tell the server on startup that this process will be running as a part of a replica set.

The name of the replica set is the same as the value of this option passed on the command prompt. This also explains why the insert operation executed on one of the nodes failed before the replica set was initialized.
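The startup options discussed above can be summarized in a short Python sketch that only assembles the three command lines; it does not start any process. The base port, the directory layout, and the oplog size are placeholders chosen for illustration, since the recipe's actual values are not reproduced here, and `--replSet` is assumed to be the option that carries the replica set name. Note that `--smallfiles` applies to older MongoDB releases.

```python
REPLICA_SET = "repSetTest"       # replica set name used in this chapter
BASE_PORT = 27000                # hypothetical; the recipe's ports are not shown here
DATA_DIR = "/data/n{index}"      # hypothetical data directory layout
LOG_FILE = "/logs/n{index}.log"  # hypothetical log file layout


def mongod_command(index, fork=True):
    """Return the argument list for the index-th member (1-based)."""
    args = [
        "mongod",
        "--replSet", REPLICA_SET,              # same name on every member
        "--dbpath", DATA_DIR.format(index=index),
        "--logpath", LOG_FILE.format(index=index),
        "--port", str(BASE_PORT + index - 1),  # distinct port per process
        "--smallfiles",
        "--oplogSize", "128",                  # hypothetical size in MB
    ]
    if fork:
        args.append("--fork")                  # skip on Windows
    return args


commands = [mongod_command(i) for i in (1, 2, 3)]
for command in commands:
    print(" ".join(command))
```

Keeping the commands as argument lists makes it easy to drop `--fork` on Windows by calling `mongod_command(i, fork=False)`.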

In mongo replica sets, there can be only one primary node where all the inserting and querying happens. In the image shown, the N1 node is shown as the primary and listens to port for client connections. It is only when the primary goes down that one of the secondaries takes over and becomes the primary node.

However, it is possible to query the secondary for data as we have shown in the image; we will see how to query from a secondary instance in the next recipe. This is done by first defining a JSON object as follows:

Using localhost to refer to the host is not a very good idea and is usually discouraged; however, in this case, as we started all the processes on the same machine, it is acceptable. It is preferred that you refer to the hosts by their hostnames even if they are running on localhost.

Note that you cannot mix localhost and hostnames when referring to the instances in the same configuration. It is either the hostname or localhost. To configure the replica set, we then connect to any one of the three running mongod processes; in this case, we connect to the first one and then execute the following from the shell:

Not giving the same value would throw the following error:

It should now become a primary or secondary.
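As a rough illustration of the configuration document described above, the following Python sketch builds the structure that is passed to the shell when initiating a replica set: an `_id` holding the replica set name and a `members` list with one entry per host. The ports are hypothetical, and the localhost check is a simplification that only looks at the literal name `localhost`.

```python
def build_replica_config(set_name, hosts):
    """Build a replica set configuration document for the given hosts."""
    is_local = [host.split(":")[0] == "localhost" for host in hosts]
    # Either every member uses localhost or none does.
    if any(is_local) and not all(is_local):
        raise ValueError("do not mix localhost and hostnames in one configuration")
    return {
        "_id": set_name,  # must match the replica set name given at startup
        "members": [
            {"_id": index, "host": host} for index, host in enumerate(hosts)
        ],
    }


# Hypothetical ports, for illustration only.
config = build_replica_config(
    "repSetTest", ["localhost:27000", "localhost:27001", "localhost:27002"]
)
print(config)
```

Validating the host names before initiating the set surfaces the mixed-naming mistake early instead of at initiation time.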

The following is an example of the shell connected to a primary member of the replica set:

Replication isn't as simple as we saw here.


See the administration section for more advanced recipes on replication.

See also

If you are looking to convert a standalone instance to a replica set, then the instance with the data needs to become a primary first, and then empty secondary instances will be added, to which the data will be synchronized.

Refer to the following URL on how to perform this operation:

In this recipe, we will work with this setup by connecting to it using the mongo client application, performing queries, inserting data, and taking a look at some of the interesting aspects of a replica set from a client's perspective.

Getting ready

The prerequisite for this recipe is that the replica set should be set up and running.

Refer to the previous recipe, Starting multiple instances as part of a replica set, for details on how to start the replica set.

Execute the following command on the command prompt:

It should show the replica set's name followed by a:

In this case, if the replica set is initialized, up, and running, we should see either repSetTest:

Suppose that the first server we connected to is a secondary; we then need to find the primary.

Execute the rs. This should give us the primary server. Use the mongo shell to connect to this server. At this point, we should have two shells running, one connected to a primary and another connected to a secondary.

In the shell connected to the primary node, execute the following insert:

There is nothing special about this. We just inserted a small document in a collection that we will use for the replication test.

By executing the following query on the primary, we should get the following result:

So far, so good. Now execute the following on the console:

Execute the query that we executed in step 7 again on the shell. This should now get the results as follows:

Execute the following insert on the secondary node; it should fail with the following message:

The architecture of a Mongo replica set is made of one primary (just one, no more, no less) and multiple secondary nodes.

Note that replication is not a mechanism to distribute the read request load that enables scaling the system. Its primary intent is to ensure high availability of data. By default, we are not permitted to read data from the secondary nodes. In step 6, we simply insert data from the primary node and then execute a query to get the document that we inserted.

This is straightforward and there is nothing related to clustering here. Just note that we inserted the document from the primary and then queried it back. In the next step, we execute the same query but this time, from the secondary's shell. There might be a small lag in replicating the data, possibly due to heavy data volumes to be replicated, network latency, or hardware capacity, to name a few of the causes, and thus, querying on the secondary might not reflect the latest inserts or updates made on the primary.

However, if we are OK with it and can live with the slight lag in the data being replicated, all we need to do is enable querying on the SECONDARY node explicitly by just executing one command, rs. Once this is done, we are free to execute queries on the secondary nodes too. Finally, we try to insert the data into a collection on the secondary node. Under no circumstances is this permitted, regardless of whether we have done rs.
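From an application's point of view, the same opt-in to possibly stale reads is usually expressed through connection options rather than a shell command. The sketch below only builds a connection string, using the standard `replicaSet` and `readPreference` URI options; the hosts and ports are hypothetical, and writes still go to the primary whatever read preference is chosen.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a MongoDB connection string for a replica set."""
    options = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return "mongodb://{}/?{}".format(",".join(hosts), options)


# Hypothetical hosts; secondaryPreferred allows reads that may lag the primary.
uri = replica_set_uri(
    ["localhost:27000", "localhost:27001"],
    "repSetTest",
    read_preference="secondaryPreferred",
)
print(uri)
```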

When rs. All write operations still have to go to the primary and then flow down to the secondary. The internals of replication will be covered in a different recipe in the administration section.

See also

The next recipe, Connecting to the replica set to query and insert data from a Java client, is about connecting to a replica set from a Java client.

Connecting to the replica set to query and insert data from a Java client

In this recipe, we will demonstrate how to connect to a replica set from a Java client and how the client would automatically fail over to another node in the replica set, should a primary node fail.

As we are dealing with a Java client for replica sets, a replica set must be up and running. Refer to the Starting multiple instances as part of a replica set recipe for details on how to start the replica set.


This Java class is also available for download from the Packt website.

MongoClient; import com. ServerAddress; import java.

Connect to any of the nodes in the replica set, say to localhost:

Take a note of the primary instance in the replica set and connect to it from the shell if localhost:

Here, switch to the admin database as follows:

We now execute the preceding program from the operating system shell as follows: ReplicaSetMongoClient

Shut down the primary instance by executing the following on the mongo shell that is connected to the primary:

Watch the output on the console where the com.…ReplicaSetMongoClient class is executed using Maven.

How it works…

An interesting thing to observe is how we instantiate the MongoClient instance. It is done as follows:

This class has a lot of overloaded constructors but we choose to use the one that takes the hostname and then the port.

What we have done is provide all the server details in a replica set as a list. MongoClient is intelligent enough to figure this out and connect to the appropriate instance. The list of servers provided is called the seed list. It need not contain the entire set of servers in a replica set, though the objective is to provide as many as we can. MongoClient will figure out all the server details from the provided subset.

For example, if the replica set has five nodes but we provide only three servers, it works fine. On connecting with the provided replica set servers, the client will query them to get the replica set metadata and figure out the rest of the servers in the replica set.

In the preceding case, we instantiated the client with three instances in the replica set. If the replica set were to have five members, then instantiating the client with just three of them is still good enough and the remaining two instances will be automatically discovered.
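The idea of a seed list can be pictured with a toy model in Python. This is not how any driver is implemented; it only mimics the behaviour described above, where the first reachable seed reports the full membership. The host names and the `topology` mapping are invented for the example.

```python
def discover_members(seed_list, topology):
    """Return the full member list reported by the first reachable seed.

    topology maps each reachable host to the members it knows about;
    hosts missing from the mapping are treated as unreachable.
    """
    for host in seed_list:
        members = topology.get(host)
        if members is not None:
            return sorted(members)
    raise ConnectionError("no seed host was reachable")


# Hypothetical five-member set; node1 is down, so it is absent from topology.
all_members = ["node1:27017", "node2:27017", "node3:27017",
               "node4:27017", "node5:27017"]
topology = {host: all_members for host in all_members[1:]}

seeds = ["node1:27017", "node2:27017", "node3:27017"]
print(discover_members(seeds, topology))
```

Providing several seeds matters precisely because any single one of them may be down when the client starts.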

Next, we start the client from the command prompt using Maven. Once the client is running in the loop to find one document, we bring down the primary instance. We should see something like the following output on the console:

Server seen down: Software caused connection abort:

However, the client switched to the new primary seamlessly. Well, nearly seamlessly, as the client might have to catch an exception and retry the operation after a predetermined interval has elapsed.
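The catch-and-retry behaviour mentioned above can be sketched generically in Python. The helper below is an illustration under stated assumptions, not code from the recipe: `retry`, its parameters, and the use of the built-in `ConnectionError` are hypothetical stand-ins for whatever exception a real driver raises while a new primary is being chosen.

```python
import time


def retry(operation, attempts=5, delay_seconds=1.0, retry_on=(ConnectionError,)):
    """Call operation(), retrying after a fixed delay on the given errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on as error:
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # wait before the next attempt
    raise last_error


# Simulated query that fails twice, as if the primary had just gone down.
calls = []

def flaky_find_one():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("primary not available")
    return {"name": "test"}

print(retry(flaky_find_one, delay_seconds=0))
```

A fixed delay keeps the sketch short; the interval and the number of attempts would need tuning against how long a failover takes in a given deployment.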

Dasadia Cyrus, Nayak Amol. MongoDB Cookbook

Connecting to the replica set to query and insert data using a Python client

In this recipe, we will demonstrate how to connect to a replica set using a Python client and how the client would automatically fail over to another node in the replica set, should a primary node fail. Additionally, a replica set must be up and running. This script is also available for download from the Packt website.

MongoClient ['localhost:

Take a note of the primary instance in the replica set and connect to it from the shell, if localhost:

We now execute the preceding script from the operating system shell as follows:

Watch the output on the console where the Python script is executed.

How it works…

You will notice that, in this script, we instantiated the mongo client by giving a list of hosts instead of a single host. As of version 3. The client will attempt to connect to the first host in the list, and if successful, will be able to determine the other nodes in the replica set. Once connected, we perform normal database operations such as selecting the test database, dropping the repTest collection, and inserting a single document into the collection.

Each time, we fetch the record, display it, and sleep for three seconds. While the script is in this loop, we shut down the primary node in the replica set as mentioned in step 4. We should see an output similar to this:

Fetching record: ObjectId 'bfaafdfce 1a1' , u'name':

However, very soon, a new primary node is selected by the remaining nodes and the mongo client is able to resume the connection.

Starting a simple sharded environment of two shards

In this recipe, we will set up a simple sharded setup made up of two data shards. There will be no replication configured as this is the most basic shard setup to demonstrate the concept.

We won't be getting deep into the internals of sharding, which we will explore more in the administration section. Here is a bit of theory before we proceed. Scalability and availability are two important cornerstones to build any mission-critical application.

Availability is something that was taken care of by the replica sets, which we discussed in previous recipes in this chapter. Let's look at scalability now. Simply put, scalability is the ease with which the system can cope with increasing data and request load.

Consider an e-commerce platform. On regular days, the number of hits to the site and load is fairly modest and the system's response times and error rates are minimal. This is subjective. Now, consider the days where the system load becomes twice, thrice, or even more than that of an average day's load, say on Thanksgiving day, Christmas, and so on. If the platform is able to deliver similar levels of service on these high load days as on any other day, the system is said to have scaled up well to the sudden increase in the number of requests.

For each request hitting the website, we create a new record in the underlying data store. Suppose that each record is of bytes with an average load of three million requests per day; we will then cross the 1 TB data mark in about five years.
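The growth estimate is simple arithmetic: bytes per record times records per day gives the daily growth, and the target size divided by that gives the number of days. The sketch below shows the calculation in Python; the record size of 200 bytes is a purely hypothetical value picked for illustration, since the size is not given in the text, and 1 TB is taken as 10^12 bytes.

```python
TERABYTE = 10 ** 12  # decimal terabyte, an assumption of this sketch


def days_to_reach(target_bytes, record_bytes, requests_per_day):
    """Days until the stored data reaches target_bytes at a constant load."""
    return target_bytes / (record_bytes * requests_per_day)


# Hypothetical record size of 200 bytes at three million requests per day.
days = days_to_reach(TERABYTE, record_bytes=200, requests_per_day=3_000_000)
print(round(days / 365, 1), "years")
```

Doubling either the record size or the request rate halves the time, which is why growth in load matters as much as growth in record size.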

This data would be used for various analytics purposes and might be frequently queried. The query performance should not be drastically affected when the data size increases.

If the system is able to cope with this increasing data volume and still give decent performance comparable to performance on low data volumes, the system is said to have scaled up well.

Now that we have seen in brief what scalability is, let me tell you that sharding is a mechanism that lets a system scale to increasing demands. The crux lies in the fact that the entire data is partitioned into smaller segments and distributed across various nodes called shards. Suppose that we have a total of 10 million documents in a mongo collection. At any given point in time, a document resides on exactly one shard (which by itself will be a replica set in a production system).

However, there is some magic involved that keeps this concept hidden from the developer who is querying the collection and who gets one unified view of the collection irrespective of the number of shards. Based on the query, it is mongo that decides which shard to query for the data and returns the entire result set. With this background, let's set up a simple shard and take a closer look at it.

Getting ready

Apart from the MongoDB server already installed, there are no prerequisites from a software perspective.

We will be creating two data directories, one for each shard. There will be a directory for the data and one for logs. We start by creating directories for the logs and data.

On Windows, we can have c:

There is also a configuration server that is used in the sharded environment to store some metadata. Start the following mongod processes, one for each of the two shards and one for the configuration database, and one mongos process.

For the Windows platform, skip the --fork parameter as it is not supported.

From the command prompt, execute the following command. This should show a mongos prompt as follows:

Finally, we set up the shard. From the mongos shell, execute the following two commands:

On each addition of a shard, we should get an ok reply.

Using localhost to refer to the servers is not a recommended approach and is discouraged. The better approach would be to use hostnames even if they are local processes.

How it works…

Let's see what we did in the process. We created three directories for data (two for the shards and one for the configuration database) and one directory for logs.

We can have a shell script or batch file to create the directories as well. In fact, in large production deployments, setting up shards manually is not only time-consuming but also error-prone.

The following is an image of the shard setup that we just did:

(Figure: Shard 1, Shard 2, Config, mongos, Client 1 to Client n)

If we look at the preceding image and the servers started in step 2, we have shard servers that would store the actual data in the collections. These were the first two of the four processes that we started, listening to ports and

Next, we started a configuration server that is seen on the left side in this image.

It is the third server of the four servers started in step 2 and it listens to port for the incoming connections. The sole purpose of this database is to maintain the metadata about the shard servers.

We will see what a shard key is in the next recipe, where we play around with a sharded collection and see the shards that we have created in action. Finally, we have a mongos process. This is a lightweight process that doesn't do any persistence of data and just accepts connections from clients. This is the layer that acts as a gatekeeper and abstracts the client from the concept of shards. For now, we can view it as basically a router that consults the configuration server and takes the decision to route the client's query to the appropriate shard server for execution.

It then aggregates the results from the various shards, if applicable, and returns the result to the client. It is safe to say that no client connects directly to the configuration or shard servers; in fact, no one ideally should connect to these processes directly except for some administration operations.
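The router role described above can be pictured with a deliberately simplified Python model. It assumes that documents are assigned to shards by ranges of a key, which the text has not introduced yet, and it is a toy illustration rather than a description of how mongos works internally. The class name, the split point, and the shard names are all hypothetical.

```python
import bisect


class ToyRouter:
    """Toy range-based router: N split points define N + 1 key ranges."""

    def __init__(self, split_points, shards):
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard per key range")
        self.split_points = list(split_points)
        self.shards = list(shards)

    def route(self, key):
        """Return the single shard that owns the given key."""
        return self.shards[bisect.bisect_right(self.split_points, key)]

    def route_range(self, low, high):
        """Return every shard whose range overlaps [low, high]."""
        first = bisect.bisect_right(self.split_points, low)
        last = bisect.bisect_right(self.split_points, high)
        targets = []
        for shard in self.shards[first:last + 1]:
            if shard not in targets:
                targets.append(shard)
        return targets


# Keys below "m" live on shard1, keys from "m" upwards on shard2.
router = ToyRouter(split_points=["m"], shards=["shard1", "shard2"])
print(router.route("alice"))
print(router.route_range("a", "z"))
```

A lookup by a single key touches one shard, while a range spanning the split point has to be sent to both shards and the results merged, which mirrors the aggregation step mentioned above.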

Clients simply connect to the mongos process and execute their queries and insert or update operations. On starting up the mongos process, we provided it with the details of the configuration server. What about the two shards that would be storing the actual data?

However, the two mongod processes started as shard servers are not yet declared anywhere as shard servers in the configuration. This is exactly what we do in the final step by invoking sh. The mongos process is provided with the configuration server's details on startup. Adding shards from the shell stores this metadata about the shards in the configuration database, and the mongos processes then query this config database for the shard's information.

On executing all the steps of the recipe, we have an operational shard as follows:

The preceding image gives us an idea of how a typical shard would be in a production environment. The number of shards would not be two but many more. Additionally, each shard will be a replica set to ensure high availability. There would be three configuration servers to ensure availability of the configuration servers as well. Similarly, there will be any number of mongos processes created for a shard, listening for client connections.

In some cases, it might even be started on a client application's server.

There's more…

What good is a shard unless we put it to action and see what happens from the shell on inserting and querying the data? In the next recipe, we will make use of the shard setup here, add some data, and see it in action.

Getting ready

Obviously, we need a sharded mongo server setup up and running. See the previous recipe, Starting a simple sharded environment of two shards, for more details on how to set up a simple shard.

The mongos process, as in the previous recipe, should be listening to port number

We have got some names in a JavaScript file called names. This file needs to be downloaded from the Packt website and kept on the local filesystem. The file contains a variable called names and the value is an array with some JSON documents as the values, each one representing a person. The contents look as follows:

Start the mongo shell and connect to the default port on localhost as follows. This will ensure that the names will be available in the current shell:

Switch to the database that would be used to test the sharding; we call it shardDB:

Enable sharding at the database level as follows:

Shard a collection called person as follows:

Add the test data to the sharded collection:

Execute the following to get a query plan and the number of documents on each shard:

We downloaded a JavaScript file that defines an array of 20 people.

Each element of the array is a JSON object with the name and age attributes. We start the shell connecting to the mongos process loaded with this JavaScript file.

We then switch to shardDB, which we use for the purpose of sharding. For a collection to be sharded, the database in which it will be created needs to be enabled for sharding first.

The steps to import the data are given in the Creating test data recipe from Chapter 2, Command-line Operations and Indexes.
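The order of operations, database first and collection second, can be illustrated by building the two administrative command documents in Python. `enableSharding` and `shardCollection` are the server-side command names; the shard key on the `name` attribute is only an assumption made for this sketch, because the key used by the recipe is not shown here. Nothing is sent to a server.

```python
def enable_sharding_command(database):
    """Command document that enables sharding for a database."""
    return {"enableSharding": database}


def shard_collection_command(database, collection, key):
    """Command document that shards database.collection on the given key."""
    return {"shardCollection": "{}.{}".format(database, collection), "key": key}


# The database must be enabled before any of its collections is sharded.
commands = [
    enable_sharding_command("shardDB"),
    # Hypothetical shard key; the recipe's actual key is not reproduced here.
    shard_collection_command("shardDB", "person", {"name": 1}),
]
for command in commands:
    print(command)
```

Both documents would be run against the admin database through the mongos process, in the order listed.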

Note: Because the database profiler can negatively impact performance, only enable profiling for strategic intervals and as minimally as possible on production systems. For development purposes, when we start multiple server processes on the same host, we might want to keep the oplog size to a minimum value, quickly initiate the replica set, and use minimum disk space.

The following profiling levels are available: Level Setting 1 On. It is used to provide the or -u username of the user to be logged in. The binaries can be downloaded from http:

However, Packt Publishing cannot guarantee the accuracy of this information.