Tuesday, March 8, 2016

How to index arrays in Couchbase 4.5

Hi,

So you've installed your new shiny Couchbase 4.5 (currently in dev preview stage) and found out that you now can index your favorite array in you JSON document.

But how can you do it?
I am going to demonstrate in the easiest and the quickest way how to achieve that extra crank of speed for you array entries.

I'm going to insert some documents in the following format to the default bucket
{ "name": "Some Name", "cities_traveled":["city1", "city2" ... , "cityN"] }
Next, we going to need to create a simple index for that array of cities,
and Primary index for convenience.

Primary:
create primary index `idx_prime` on `default` using GSI;
Array Index:
create index `idx_traveled` on `default` (distinct array city for city in cities_traveled end) using GSI;
 and try to select our data out using explain to verify we are using our index:
explain select * from `default` d
where any city in d.cities_traveled satisfies city="London" end;
Let's break it up a little:
  • We create our index named idx_traveled on the bucket default, and specifying that we want to iterate on every property inside an array in the property "cities_traveled" distinctly.
  • The first part of the select it's simple, select everything from our bucket, naming it d
  • In the where clause we have stated that we want at least on of our elements in the cities_traveled property to have London string in it.
  • We need to be careful, the "any city" must much to the city definition string of the index - or it won't work.
Good Query

Query
select * from `default` d
where any city in d.cities_traveled satisfies city = "London" end
The "city" after any and in the expression are exactly the same as in the index definition.

Explain

Notice the index that being used is our "idx_traveled" and the operator is IndexScan
 [  
  {  
   "#operator": "Sequence",  
   "~children": [  
    {  
     "#operator": "UnionScan",  
     "scans": [  
      {  
       "#operator": "IndexScan",  
       "index": "idx_traveled",  
       "keyspace": "default",  
       "namespace": "default",  
       "spans": [  
        {  
         "Range": {  
          "High": [  
           "\"London\""  
          ],  
          "Inclusion": 3,  
          "Low": [  
           "\"London\""  
          ]  
         }  
        }  
       ],  
       "using": "gsi"  
      }  
     ]  
    },  
    {  
     "#operator": "Parallel",  
     "~child": {  
      "#operator": "Sequence",  
      "~children": [  
       {  
        "#operator": "Fetch",  
        "as": "d",  
        "keyspace": "default",  
        "namespace": "default"  
       },  
       {  
        "#operator": "Filter",  
        "condition": "any `city` in (`d`.`cities_traveled`) satisfies (`city` = \"London\") end"  
       },  
       {  
        "#operator": "InitialProject",  
        "result_terms": [  
         {  
          "expr": "self",  
          "star": true  
         }  
        ]  
       },  
       {  
        "#operator": "FinalProject"  
       }  
      ]  
     }  
    }  
   ]  
  }  
 ]  

Bad Query

Query
select * from `default` d
where any someCity in d.cities_traveled satisfies someCity = "London" end
The "someCity" after any and in the expression are not the same as in the index definition.

Explain

Notice the index used is idx_prime and the operator is PrimaryScan
 [  
  {  
   "#operator": "Sequence",  
   "~children": [  
    {  
     "#operator": "PrimaryScan",  
     "index": "idx_prime",  
     "keyspace": "default",  
     "namespace": "default",  
     "using": "gsi"  
    },  
    {  
     "#operator": "Parallel",  
     "~child": {  
      "#operator": "Sequence",  
      "~children": [  
       {  
        "#operator": "Fetch",  
        "as": "d",  
        "keyspace": "default",  
        "namespace": "default"  
       },  
       {  
        "#operator": "Filter",  
        "condition": "any `a` in (`d`.`cities_traveled`) satisfies (`a` = \"London\") end"  
       },  
       {  
        "#operator": "InitialProject",  
        "result_terms": [  
         {  
          "expr": "self",  
          "star": true  
         }  
        ]  
       },  
       {  
        "#operator": "FinalProject"  
       }  
      ]  
     }  
    }  
   ]  
  }  
 ]  

Query result:
 <pre style="font-family:arial;font-size:12px;border:1px dashed #CCCCCC;width:99%;height:auto;overflow:auto;background:#f0f0f0;;background-image:URL(http://2.bp.blogspot.com/_z5ltvMQPaa8/SjJXr_U2YBI/AAAAAAAAAAM/46OqEP32CJ8/s320/codebg.gif);padding:0px;color:#000000;text-align:left;line-height:20px;"><code style="color:#000000;word-wrap:normal;"> &lt;pre style="font-family:arial;font-size:12px;border:1px dashed #CCCCCC;width:99%;height:auto;overflow:auto;background:#f0f0f0;;background-image:URL(http://2.bp.blogspot.com/_z5ltvMQPaa8/SjJXr_U2YBI/AAAAAAAAAAM/46OqEP32CJ8/s320/codebg.gif);padding:0px;color:#000000;text-align:left;line-height:20px;"&gt;&lt;code style="color:#000000;word-wrap:normal;"&gt; [    
  {    
   "d": {    
   "cities_traveled": [    
   "Tel-Aviv",    
   "London",    
   "New-York",    
   "San Francisco",    
   "Los-Angeles"    
   ],    
   "name": "Roi"    
   }    
  },    
  {    
   "d": {    
   "cities_traveled": [    
   "Kilmarnock",    
   "London",    
   "New-York",    
   "San Francisco",    
   "Los-Angeles"    
   ],    
   "name": "Jonny Walker"    
   }    
  }    
  ]    
  &lt;/code&gt;&lt;/pre&gt;   
 </code></pre>  

So that was a brief how-to of array indexes in Couchbase 4.5!
for more information and memory optimized indexes - check the documentation.

Thanks all!
Roi.