MongoDB for the uninitiated

August 21, 2010


This post is one of a series on MongoDB I’ll be writing over the next few months. All are written from the perspective of a developer new to Mongo, experimenting with its functionality whilst getting to grips with the NoSQL way of working.

SQL, NoSQL and some other acronyms

Up until a few months ago, the only database management systems I had worked with as a Web Developer were SQL-based. MySQL, MSSQL and PostgreSQL all use dialects of the same query language, SQL, and employ a relational approach. This means a database is made up of tables whose rows may be related to one another, and the columns you specify on each table strictly define what each row can contain. If you are representing a blog, you might have a table called posts and a table called comments. Posts would contain columns for ID, title, author and contents. Comments would probably have an ID, author, comment and post_id, this last being a reference to the posts table. In my simple example, to display the full blog page, you'd have to query two tables. Of course, a real blog is likely to be more complex, and will probably involve more tables. Which brings me to JOINs. A JOIN tells the database how to combine rows from two related tables, for example by matching each comment's post_id to a post's ID. Once you start involving multiple JOINs, SQL queries can quickly get out of hand, and become less efficient.
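As a rough illustration of what a JOIN computes, here is a minimal JavaScript sketch (runnable with Node.js) that treats the posts and comments tables as plain arrays of rows and matches each comment's post_id to a post's id. The sample rows and the joinPostsWithComments helper are hypothetical, and a real database uses indexes and a query planner rather than nested loops.

```javascript
// Two "tables" as arrays of rows (hypothetical sample data).
const posts = [
  { id: 1, title: "Hello world", author: "dan", contents: "First post" },
  { id: 2, title: "MongoDB", author: "dan", contents: "Second post" },
];
const comments = [
  { id: 10, author: "sam", comment: "Nice!", post_id: 1 },
  { id: 11, author: "kim", comment: "Thanks", post_id: 1 },
  { id: 12, author: "lee", comment: "More please", post_id: 2 },
];

// Inner join: one output row for every (post, comment) pair
// where the comment's post_id equals the post's id.
function joinPostsWithComments(postRows, commentRows) {
  const result = [];
  for (const post of postRows) {
    for (const c of commentRows) {
      if (c.post_id === post.id) {
        result.push({ title: post.title, commenter: c.author, comment: c.comment });
      }
    }
  }
  return result;
}

const rows = joinPostsWithComments(posts, comments);
console.log(rows);
```

Notice that the post's details are repeated in every joined row, once per comment; multiplied across several JOINs, that extra work can become part of what makes complex queries expensive.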

With the advent of far cheaper physical storage, people realised they could store far more data than had been expected in the early days of relational database design. With storing more data however, comes a need to get that data back again. Databases must be quicker to read vast volumes of data, and more flexible in what they can store. People began to look away from relational databases, and coined the catchy term ‘NoSQL’ (what is it with our industry and coining acronyms/silly words?). NoSQL shouldn’t mean replacing SQL completely, but rather using it where appropriate, and using other techniques when those are more suitable. There are a number of projects under the NoSQL umbrella, including Google’s BigTable, CouchDB and MongoDB, each with a slightly different approach, but all moving away from the relational model – good news for those who never quite understood JOINs…

MongoDB is a document-oriented database, meaning that it focuses on flexibility in the content you put into the database, rather than enforcing rigid rules on data types and relations. It's also open source, and designed to be fast and to scale out as your data grows. Developed by 10gen, Mongo is often described as a hybrid between a key-value store such as BigTable and the relational model of SQL. This doesn't mean it's some horrible lovechild of SQL and BigTable; rather, it takes some good features from each approach, adds a few uniquely Mongo touches, and wraps it all up neatly. I was introduced to Mongo at the MongoUK conference in London a few months back. Granted, it was a sales pitch, but you could see past that and recognise the benefits on offer, which I'll try to make clear (together with the downsides) in this series.

Quit your jibber-jabber

That’s quite enough spiel; now on to what Mongo can do, and how to use it. Mongo stores documents as BSON (Binary JSON), and ships with a fully fledged JavaScript shell so you can try it out. You can even try out some of its functionality live on the MongoDB website, though installing it is so little fuss that I would recommend doing that instead.

{ _id: ObjectId("4c2209fef3924d31102bd84b"), name: "Blue t-shirt", sizes: ['s', 'm', 'l', 'xl'], quantity: 4, price: 3.50, discounted: false, created: new Date("2010-05-02T19:07:43-07:00") }

That’s a document! It is roughly equivalent to a row in SQL, and there are lots of things to note in just that small fragment. First up, the _id. What you see there is a BSON ObjectId. Rather than the simple auto-incrementing numeric IDs most of us used in relational DBs, an ObjectId is generated so that it is unique without any central coordination, which makes it ready for use in sharding (more on that later), and its first four bytes record when it was created, so it has a date built in. The next property in my document is name, a simple string. Note also the quantity and price fields, which store numeric values, and discounted, which is a boolean. The sizes field is a little more interesting: it is an array. You can store any number of values in it, and you aren’t restricted to any particular type; you can mix strings, numbers, and even other arrays and embedded documents. Finally, the created field is a date, created using the JavaScript Date class. The document above is stored inside a collection (roughly the SQL equivalent of a table), but collections aren’t restricted to documents of any particular shape. I can store the document above in the same collection as the one below.

{ _id: 4, title: "Silly document", embedded: { embeddedDocumentProperty: "bazinga!" } }

Two new tricks in that document. First, I have overridden the default _id, replacing the ObjectId with a plain old number. You can do this, but you lose the benefits of ObjectIds; in a future post I’ll go into this in more depth. Secondly, I have embedded an extra document in there, with a single key/value pair. You have just as much flexibility when inserting embedded documents as with root-level documents. Embedded documents are useful because they let you build richer, self-contained documents. The most commonly used example is a blog. In SQL you might have three or four tables to define everything about a single post, but in Mongo you can embed the comments, author details, tags and categories in the post itself, as embedded documents or arrays of embedded documents. The whole post can then be read back in a single query with no JOINs, which cuts down on one of the most common bottlenecks when reading from a database.
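One of the benefits you give up by replacing the ObjectId with a number is the creation time it carries: the first four bytes (the first eight hex characters) are a big-endian count of seconds since the Unix epoch. The shell and drivers expose this through an ObjectId's getTimestamp() method; the sketch below does the same decoding by hand in plain JavaScript, using a hypothetical objectIdTimestamp helper, applied to the _id of the t-shirt document.

```javascript
// Hypothetical helper: extract the creation time embedded in an ObjectId hex string.
// Bytes 0-3 of an ObjectId are a big-endian count of seconds since 1970-01-01 UTC.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes = 8 hex chars
  return new Date(seconds * 1000);
}

// The _id from the t-shirt document.
const created = objectIdTimestamp("4c2209fef3924d31102bd84b");
console.log(created.toISOString()); // 2010-06-23T13:19:58.000Z
```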
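Here is a minimal sketch of the embedded blog post idea, written as a plain JavaScript object rather than a real database call. The field names, sample data and the renderPost helper are hypothetical; the point is that the author, tags and comments all arrive with the post, so rendering the page needs one read and no JOIN.

```javascript
// A hypothetical blog post stored as one document, with the author,
// tags and comments embedded rather than kept in separate tables.
const post = {
  _id: 1,
  title: "Hello world",
  author: { name: "Hannibal", email: "hannibal@example.com" },
  tags: ["mongodb", "nosql"],
  comments: [
    { author: "sam", comment: "Nice!" },
    { author: "kim", comment: "Thanks" },
  ],
};

// Everything needed to render the page is already in the document.
function renderPost(doc) {
  const lines = [`${doc.title} by ${doc.author.name} [${doc.tags.join(", ")}]`];
  for (const c of doc.comments) {
    lines.push(`  ${c.author}: ${c.comment}`);
  }
  return lines.join("\n");
}

console.log(renderPost(post));
```

In the shell you can also query inside embedded data with dot notation, for example db.posts.find( { "comments.author": "sam" } ) against a hypothetical posts collection.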

Find them… find them and destroy them!

To put any of those objects into the database, we use:

db.myCollection.insert( object );
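If the object you insert has no _id, the shell or driver adds an ObjectId for you before it is stored. The sketch below mimics that behaviour with a hypothetical in-memory Collection class; its id generator follows the current ObjectId layout (4-byte timestamp, 5 random bytes, 3-byte counter) and returns a hex string standing in for an ObjectId, so treat it as an illustration rather than MongoDB's implementation.

```javascript
const { randomBytes } = require("crypto");

// Hypothetical in-memory stand-in for a collection (not MongoDB's implementation).
class Collection {
  constructor() {
    this.docs = [];
    this.counter = 0;
  }

  // Like insert in the shell: a document without an _id gets a generated one.
  insert(doc) {
    const stored = { ...doc }; // copy so the caller's object is untouched
    if (stored._id === undefined) {
      stored._id = this.makeId();
    }
    this.docs.push(stored);
    return stored._id;
  }

  // ObjectId-style id as 24 hex characters:
  // 4-byte timestamp (seconds) + 5 random bytes + 3-byte counter.
  makeId() {
    const seconds = Math.floor(Date.now() / 1000);
    const ts = seconds.toString(16).padStart(8, "0");
    const random = randomBytes(5).toString("hex");
    this.counter = (this.counter + 1) % 0x1000000;
    const count = this.counter.toString(16).padStart(6, "0");
    return ts + random + count;
  }
}

const products = new Collection();
const id = products.insert({ name: "Blue t-shirt", quantity: 4 });
products.insert({ _id: 4, title: "Silly document" }); // keeps its own _id
console.log(id); // a 24-character hex string, different on every run
console.log(products.docs.length); // 2
```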

So how do you read data back out of MongoDB? Well, it depends on how you are interfacing with the database. There are official drivers for most of the mainstream programming languages (PHP, Java, Python, Ruby), plus community drivers for many more. For the sake of this article, I will stick to commands you can run in the JavaScript shell.

db.products.find( { _id: 4 } );

That little snippet should find the document with the _id 4 in the products collection of the current database. Pretty simple, eh? You can also add more criteria, skip and limit settings, and choose a subset of fields to return.
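In the shell those extras look like db.products.find( { discounted: false }, { name: 1 } ).skip(10).limit(5), where the second argument picks the fields to return. To show what each part does, here is a hypothetical in-memory find() in plain JavaScript; it only supports exact-match criteria and inclusion projections, whereas the real query language has many more operators.

```javascript
// Hypothetical in-memory find(): exact-match criteria, an optional inclusion
// projection, then skip and limit. Not MongoDB's implementation.
function find(docs, criteria = {}, projection = null, { skip = 0, limit = 0 } = {}) {
  const matched = docs.filter((doc) =>
    Object.entries(criteria).every(([key, value]) => doc[key] === value)
  );
  // As in MongoDB, a limit of 0 means "no limit".
  const page = matched.slice(skip, limit > 0 ? skip + limit : undefined);
  if (!projection) return page;
  return page.map((doc) => {
    const out = {};
    // _id is returned by default unless explicitly excluded with _id: 0.
    if (projection._id !== 0 && projection._id !== false) out._id = doc._id;
    for (const [key, include] of Object.entries(projection)) {
      if (key !== "_id" && include && key in doc) out[key] = doc[key];
    }
    return out;
  });
}

const products = [
  { _id: 1, name: "Blue t-shirt", price: 3.5, discounted: false },
  { _id: 2, name: "Red t-shirt", price: 4.0, discounted: true },
  { _id: 3, name: "Green t-shirt", price: 3.5, discounted: false },
  { _id: 4, title: "Silly document" },
];

// Equivalent in spirit to db.products.find( { _id: 4 } )
console.log(find(products, { _id: 4 }));
// The second undiscounted product, returning only _id and name
console.log(find(products, { discounted: false }, { name: 1 }, { skip: 1, limit: 1 }));
```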

db.products.update( { }, { $set: { title: "Badgers" } }, { multi: true } );

That little snippet will look at every document in the products collection (note the empty braces where my criteria would normally go) and perform a $set operation on the title property of each. The { multi: true } option matters: without it, update only modifies the first matching document. $set is one of several update operators in Mongo; it sets whichever fields it is given to the values provided, creating any that don’t exist yet. By combining these operators with criteria you can achieve a lot more in a single Mongo query than is immediately obvious. Finally, to delete things, use:

db.products.remove( { _id: 4 } );
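To make the difference between updating and removing concrete, here is a hypothetical in-memory sketch in plain JavaScript of update() with $set and of remove(). As in the shell, update touches only the first match unless multi is set, while remove deletes every matching document by default, and empty criteria match the whole collection. The sketch supports exact-match criteria and top-level fields only; the real $set also accepts dotted paths into embedded documents.

```javascript
// Hypothetical in-memory sketches of update() with $set and of remove().
// Exact-match criteria only; not MongoDB's implementation.
function matches(doc, criteria) {
  return Object.entries(criteria).every(([key, value]) => doc[key] === value);
}

// Applies $set to matching documents. Without multi, it stops after the
// first match. Returns how many documents $set was applied to.
function update(docs, criteria, { $set }, { multi = false } = {}) {
  let applied = 0;
  for (const doc of docs) {
    if (!matches(doc, criteria)) continue;
    Object.assign(doc, $set); // overwrites existing fields, creates missing ones
    applied += 1;
    if (!multi) break;
  }
  return applied;
}

// Deletes every matching document (empty criteria match all) and
// returns how many were removed.
function remove(docs, criteria) {
  const keep = docs.filter((doc) => !matches(doc, criteria));
  const removed = docs.length - keep.length;
  docs.splice(0, docs.length, ...keep); // mutate in place, like a collection
  return removed;
}

const products = [
  { _id: 1, name: "Blue t-shirt" },
  { _id: 2, name: "Red t-shirt" },
  { _id: 4, title: "Silly document" },
];

console.log(update(products, {}, { $set: { title: "Badgers" } })); // 1: first match only
console.log(update(products, {}, { $set: { title: "Badgers" } }, { multi: true })); // 3
console.log(remove(products, { _id: 4 })); // 1
console.log(products.length); // 2
```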

What, no schema? Are you crazy, fool?!

The two documents above, though very different, can be held in the same collection. This is because Mongo is schema-less, so any document can be stored alongside any other. This can speed up application development considerably: rather than setting out a concrete schema before you start, you can evolve it as you go along. It also means it’s easier to add new features later on in development. However, it isn’t all rosy. If you are working in a team rather than alone, you may find it tricky without at least a loose schema for your data. Many teams choose to agree a schema, or to use an ODM (Object Document Mapper) layer on top (more on those later).
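An agreed loose schema can be as simple as a list of expected fields and types that everyone's code checks before inserting. The sketch below is a hypothetical validate() helper in plain JavaScript, not any particular ODM's API; real ODM layers do much more, such as applying defaults and casting types.

```javascript
// Hypothetical agreed "loose schema": required fields and their expected types.
const productSchema = {
  name: "string",
  price: "number",
  sizes: "array",
};

// Returns a list of problems; an empty list means the document fits.
// Fields not mentioned in the schema are allowed, so documents stay flexible.
function validate(doc, schema) {
  const errors = [];
  for (const [field, type] of Object.entries(schema)) {
    const value = doc[field];
    const actual = Array.isArray(value) ? "array" : typeof value;
    if (value === undefined) {
      errors.push(`missing field "${field}"`);
    } else if (actual !== type) {
      errors.push(`"${field}" should be ${type}, got ${actual}`);
    }
  }
  return errors;
}

const ok = validate({ name: "Blue t-shirt", price: 3.5, sizes: ["s", "m"], discounted: false }, productSchema);
const problems = validate({ name: "Red t-shirt", price: "4.00" }, productSchema);
console.log(ok); // []
console.log(problems);
```

Extra fields such as discounted pass untouched, so the collection keeps its flexibility while the fields the team relies on are checked.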

Hell yeah, Hannibal

MongoDB is used by a lot of high profile names, both in speculative development and live production deployments. Among the bigger names are Foursquare, Bit.ly and SourceForge. These companies and more have found the speed and simplicity of Mongo of great benefit to their websites, particularly as it scales so easily as visitor numbers increase.

Maybe you can call the A-Team

If you like what you’ve read, why not head over to the MongoDB website and learn more? There is also a Google group where you can search the questions people have asked before, or post your own if you need a specific answer. Coming up next: how to use Mongo in PHP, including a light review of the various libraries you could use in your application. I’ll also be doing a post on common questions, answers and problems in Mongo.

P.S. Sorry for the various A-Team/Mr. T references. I am rather excited about the film, and I don’t care who judges me for it.


2 responses to “MongoDB for the uninitiated”

Rsramirez49 says:

August 15, 2012 at 9:31 pm

Nice post. Very informative and funny. Did you continue? How can I read more?

Daniel G Wood says:

August 18, 2012 at 7:02 pm

Sorry, I haven’t yet! Hopefully one day will return to regular blogging.