Monday, November 9, 2015

NoSQL Document DB's Joins rundown Couchbase 4.0 vs MongoDB 3.2

Hi all,

There is a lot of heat in the NoSQL (Not Only SQL) realms lately.
Especially if we take the Document databases, which are based on JSON to store data.

Little less than a week ago MongoDB came out with some significant release called in the misleading name "3.2" they've added some quite interesting features, one interesting feature is joins. That version, by no mean is a minor version.
The latter were also introduced by Couchbase in the 4.0 major release which included many new features and the most prominent is probably the N1QL language - which is basically a SQL for JSON, which released in early October 2015.

It's as SQL as it gets for NoSQL databases, it more or less a super-set and a subset of SQL as it has some features that does not apply to relational DB - such as NEST & UNNEST of documents,
As of NEST, think of it, as promoting an array inside a documents to it's own "SQL table", on which we can perform queries.

The N1QL language is just another mean of accessing and querying data from Couchbase, in addition to Key-Value system and the View mechanism (the Map-Reduce).

One of the most talked about feature is of course join.
Joining 2 or more documents to one reduces the amount of traffic on the network causing faster response times by the application consuming the data.

So while Couchbase were following the rule "use what you already know" with the N1QL,
the Mongo team suggested another approach to the join,
the went on an introduced another keyword called $lookup,
While that works perfectly - it's not neat, and you will catch some learning curve on the way to perfection, while on Couchbase you just do - SQL joins.

Both DB's join feature is available as community & enterprise editions feature.

So let's join!

In Couchbase,
Let’s use the "travel-sample" that is bundled with it,
I have a route document which looks like that:
  {
    "airline": "AF",
    "airlineid": "airline_137",
    "destinationairport": "CDG",
    "distance": 573.0051071016999,
    "equipment": "E90 AR8 E70",
    "id": 10007,
    "sourceairport": "TRN",
    "stops": 0,
    "type": "route"
  }

I would like to check, which airline corresponds to the route.
which portrait with the following document:
{
      "callsign": "AIRFRANS",
      "country": "France",
      "iata": "AF",
      "icao": "AFR",
      "id": 137,
      "name": "Air France",
      "type": "airline"
    }

Up until now, the way I could "join" those two documents, was by code in my application.
take the first document, figure out the airlineid field, then go back and get the airline document by id.
So while it's still possible to do it, N1QL introduced the concept of join. 

and that is the Query:

SELECT airline.*, route. Airline, route.airlineid, route.destinationairport, route.distance, route.equipment, route.id, route.sourceairport, route.stops, route.type
FROM `travel-sample` route
JOIN `travel-sample` airline ON KEYS route.airlineid
WHERE route.id = 10007

notes about that query above:
1)  I would recommend of not using "star" in your application, but for testing purposes only
2) notice the back tick in the bucket name, this is not an apostrophe.

of which the result is:
a merge of those two documents

  {
    "airline": "AF",
    "airlineid": "airline_137",
    "callsign": "AIRFRANS",
    "country": "France",
    "destinationairport": "CDG",
    "distance": 573.0051071016999,
    "equipment": "E90 AR8 E70",
    "iata": "AF",
    "icao": "AFR",
    "id": 10007,
    "name": "Air France",
    "sourceairport": "TRN",
    "stops": 0,
    "type": "route"
  }

or if we want all of the documents just remove the route.id.
Pure plain SQL.

we can do it programmatically, via the cbc-linq command line, or through the Query Workbench (CBQ) which is currently under developer preview (expected to be released on next Couchbase release).

In MongoDB
We can only do join programmatically with the aggregation pipeline.
Note, that in Couchbase we are joining on keys, and in Mongo on fields.

Assume the following changes:
1) We have two collections, one for routes, and one for the airlines.
2) The field "id" in the airline document is "airline_137" and not just 137, 

{
      "callsign": "AIRFRANS",
      "country": "France",
      "iata": "AF",
      "icao": "AFR",
      "id": "airline_137",
      "name": "Air France",
      "type": "airline"
    }

So the lookup will look like that: 

db.routes.aggregate([
                                    { $match: 
                                                id: 10007
                                    }},
                                    { $lookup: {
                                                from: "airlines"
                                                localField: "airlineid"
                                                foreignField: "id"
                                                as:  "combined_airline_doc"
                                    }}
                                    ]);

The table below compares the two databases join wise:
Couchbase 4.0
Mongo 3.2
Complexity
Simple (SQL joins)
Complex (new language)
Syntax
Similar to SQL joins
New $lookup keyword
Join type
Left Outer joins\inner
Left outer joins\inner
Learning curve
Flat (SQL)
Steep (new query language)
Functionality
Good
Good
Query path
Query Service, Split across the cluster, or with MDS do not load on data nodes
Primary Shard
the pipeline commands distributed workload with scatter gather (the gather on one shard)
Join on
Within or with other buckets
collections
Version
Community
Community


Limitations with MongoDB joins:
  • Only in Aggregation pipeline, programmatically.
  • Right collection for $lookup cannot be sharded (only primary shard contains the unsharded collection) – implementation limitation
  • Indexes are used only in the first state of the pipeline – before manipulation data
  • No right outer joins
Limitation with Couchbase joins:
  • No Right outer joins
  • Joins are only on the keys (as in key-value or object-id)

So this was a light roundup on the new join features on the 2 biggest document databases here.
As for the winner here in that round, it seems like Couchbase wins the trophy here, in terms of usability, testability, tools,  ease of use and distribution.


Hope you've enjoyed.
Roi.


2 comments:

  1. Also in future - there will be even more good news. The limitation on Couchbase JOIN that is listed there may be further relaxed - tbd, we are looking into that. Having said that even 4.0 offers plenty. So, hope more developers (both community and developers in enterprise) are already excited about the 4.0 and “join” our efforts.

    ReplyDelete
  2. Thank you for the info. It sounds pretty user friendly. I guess I’ll pick one up for fun. thank u.

    MongoDB Training Centers in Chenai

    ReplyDelete