Close

How to use a SQL LIKE statement in MongoDB

Posted by: AJ Welch

For this tutorial we’ll be using the official dummy dataset, which contains numerous restaurant documents from around the New York area.

Here is an example of the basic document structure in this collection, using the .findOne() method:

> db.restaurants.findOne()
{
        "_id" : ObjectId("56c651e7d84ccfde319961af"),
        "address" : {
                "building" : "469",
                "coord" : [
                        -73.961704,
                        40.662942
                ],
                "street" : "Flatbush Avenue",
                "zipcode" : "11225"
        },
        "borough" : "Brooklyn",
        "cuisine" : "Hamburgers",
        "grades" : [
                {
                        "date" : ISODate("2014-12-30T00:00:00Z"),
                        "grade" : "A",
                        "score" : 8
                },
                {
                        "date" : ISODate("2014-07-01T00:00:00Z"),
                        "grade" : "B",
                        "score" : 23
                },
                {
                        "date" : ISODate("2013-04-30T00:00:00Z"),
                        "grade" : "A",
                        "score" : 12
                },
                {
                        "date" : ISODate("2012-05-08T00:00:00Z"),
                        "grade" : "A",
                        "score" : 12
                }
        ],
        "name" : "Wendy'S",
        "restaurant_id" : "30112340"
}

The power of Find


The most important piece of the puzzle when searching within a MongoDB collection is the simple yet flexible db.collection.find() method.

With .find(), you can easily query a collection of documents, by passing a few simple parameters, and return a cursor. A cursor is simply a result set and can be iterated through to manipulate or otherwise make use of the documents being pointed to by the cursor.

As a simple example of the .find() method in action, we’ll try to find all the restaurants in our collection that server Hamburgers as their cuisine:

>db.restaurants.find( { cuisine: "Hamburgers" } )
{ "_id" : ObjectId("56c651e7d84ccfde319961af"), "address" : { "building" : "469", "coord" : [ -73.961704, 40.662942 ], "street" : "Flatbush Avenue", "zipcode" : "11225" }, "borough" : "Brooklyn", "cuisine" : "Hamburgers", "grades" : [ { "date" : ISODate("2014-12-30T00:00:00Z"), "grade" : "A", "score" : 8 }, { "date" : ISODate("2014-07-01T00:00:00Z"), "grade" : "B", "score" : 23 }, { "date" : ISODate("2013-04-30T00:00:00Z"), "grade" : "A", "score" : 12 }, { "date" : ISODate("2012-05-08T00:00:00Z"), "grade" : "A", "score" : 12 } ], "name" : "Wendy'S", "restaurant_id" : "30112340" }
...

The result set is quite large, so a better measurement for our test examples would be to chain the .count() method onto .find() to simply see how many documents matched our query:

> db.restaurants.find( { cuisine: "Hamburgers" } ).count()
433

That’s a lot of burgers!

Searching for word similarities using Regex


Now that we are using .find() to query our collection, we can actually modify our syntax ever so slightly and begin searching for matches based on a word or phrase that may be a partial match within a given field, similar to the LIKE operator for SQL engines.

The trick is to utilize regular expressions (or regex for short), which is basically a text string that defines a search pattern. There are a number of regex engines that are written in slightly different syntax, but the fundamentals are all basically the same, and in this case, MongoDB uses the Perl Regex (PCRE) engine.

At the most basic level, a regex expression is a string (series of characters) enclosed on both sides by a single slash (/).

For example, if we want to use regex to perform the same query as above and find out how many restaurants serve Hamburgers, we can replace our string "Hamburgers" with /Hamburgers/ instead:

> db.restaurants.find( { cuisine: /Hamburgers/ } ).count()
433

Keen observers may recognize that we’ve effectively changed nothing about the actual query we’re performing – we’re still simply looking up all documents where the cuisine field is equal to the string "Hamburgers".

That said, by simply using regex instead of a normal “quoted string”, we can start to look for partial word/phrase matches instead.

For example, let’s look at the borough field to get a better idea of how this works. First we’ll notice that there are six boroughs in total within our collection:

> db.restaurants.distinct('borough')
[
        "Brooklyn",
        "Bronx",
        "Manhattan",
        "Queens",
        "Staten Island",
        "Missing"
]

Now let’s use regex to find out how many restaurants are in the Bronx borough:

> db.restaurants.find( { borough: /Bronx/ } ).count()
2338

But imagine we want to find the number of restaurants where borough starts with the first three characters "Bro". We’d modify our regex very slightly, like so:

> db.restaurants.find( { borough: /^Bro/ } ).count()
8424

We’re seeing over 6000 additional documents in this result set, which makes sense because not only are we getting results where the borough is "Bronx", but also everything for "Brooklyn" as well.

The caret character (^) specifies the location in our string which should be the beginning, so if we had a document where those three letters were in the middle of the field, we wouldn’t get a match.

As another quick example, let’s search anywhere in the field for the characters "at", which should give us results for both "Manhattan" and "Staten Island":

> db.restaurants.find( { borough: /Manhattan/ } ).count()
10259

> db.restaurants.find( { borough: /Staten Island/ } ).count()
969

> db.restaurants.find( { borough: /AT/i } ).count()
11228

Sure enough, our final query has combined the two result sets into one.

You may notice that even though our characters "AT" are uppercase in our regex string, but they are lowercase in the actual document records, we still returned results. This is because we also added the special i flag following our regex closing slash (/). This informs the regex engine that we want to the search to be case insensitive, matching regardless of upper or lowercase.