Monday, June 20, 2016

Advanced querying and indexing of an array inside an array in Couchbase (or... arrays part 2)

Hi All,

TL;DR- Using indexes is a must, especially when you index an array!

This is a follow-up to my previous post about array indexing in Couchbase.
In that post I only showed the simple cases: an array without any properties, and an array without any nested array.
Today we will take it to the next level: an array inside an array (a nested array), and properties inside the JSON array.

So take, for instance, this kind of document:

{
  "name": "Roi",
  "lastName": "Katz",
  "note2": "blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah blah",
  "city": "Tel Aviv",
  "note1": "blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah blah",
  "age": 50,
  "visited": [
    {
      "country": "UK",
      "cities": [
        "London",
        "Manchester",
        "Coventry"
      ]
    },
    {
      "country": "Israel",
      "cities": [
        "Kfar-Saba",
        "Tel-Aviv",
        "Jerusalem"
      ]
    }
  ]
}

We want to query the countries inside the "visited" property first, and later a city inside the inner JSON array.

So how do we query the Countries inside the array?


select * from people p
where visited is not missing
and any visit in p.visited satisfies visit.country="Israel" end;

Here p is the alias for the people bucket, and visit is the variable bound to each element of the visited array.
But while this works, we are doing it without an index (other than the PrimaryScan), and it is going to take a long time to get the results even on a fairly small dataset.
With a dataset of over 100K documents, of which only 2 are actually relevant, the query took about 2 seconds to run, as it was using the primary scan.

However, when I introduce the index, the timing is cut down to 4ms!

The index I used here to speed things up was "countries_indx"


CREATE INDEX countries_indx ON
people(distinct array visit.country for visit in visited end)

Please note again, as in my previous post, that the variable "visit" which comes after the for keyword must be exactly the same identifier as the one after the any keyword in the select query.
If they are not, you won't be able to use the index.

Here is a taste of the proper explain plan, from which you can see that the correct index is used (countries_indx with an IndexScan).


[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "DistinctScan",
          "scan": {
            "#operator": "IndexScan",
            "index": "countries_indx",
            "index_id": "f0dd08732dd1b9a2",
            "keyspace": "people",
            "namespace": "default",
            "spans": [
              {
                "Range": {
                  "High": [
                    "\"Israel\""
                  ],
                  "Inclusion": 3,
                  "Low": [
                    "\"Israel\""
                  ]
                }
              }
            ],
            "using": "gsi"
          }
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "as": "p",
                "keyspace": "people",
                "namespace": "default"
              },
              {
                "#operator": "Filter",
                "condition": "(((`p`.`visited`) is not missing) and any `visit` in (`p`.`visited`) satisfies ((`visit`.`country`) = \"Israel\") end)"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "select * from people p\nwhere visited is not missing\nand any visit in p.visited satisfies visit.country=\"Israel\" end;"
  }
]


Now let's continue to the more interesting query.
We want to query for documents which contain the city of London in their visited property.
How would we do it? By nesting array queries!


select * from people p
where visited is not missing
and any visit in p.visited satisfies 
    any city in visit.cities satisfies city = "London" end 
end;

That query needs a bit more of an explanation.
First, we add the visited is not missing expression in order to filter out every document which doesn't have that property.

Second, since we are basically searching an array of arrays, we do something similar to a nested for loop:
first we look in the visited property, and then, for each element of the outer array, we look at its cities property for a city named "London".
The visit variable in the outer loop is the same one used in the inner loop for visit.cities.
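The nested ANY ... SATISFIES is conceptually just a nested loop. Here is a plain-Java sketch of the same logic (the document is modeled with hypothetical Map/List structures, not the Couchbase SDK):

```java
import java.util.*;

public class NestedAnyDemo {
    // Conceptual equivalent of:
    //   visited is not missing
    //   and any visit in visited satisfies
    //       any city in visit.cities satisfies city = target end
    //   end
    static boolean visitedCity(List<Map<String, Object>> visited, String target) {
        if (visited == null) return false;              // "visited is not missing"
        for (Map<String, Object> visit : visited) {     // any visit in visited
            @SuppressWarnings("unchecked")
            List<String> cities = (List<String>) visit.get("cities");
            if (cities == null) continue;
            for (String city : cities) {                // any city in visit.cities
                if (city.equals(target)) return true;   // satisfies city = target
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> uk = new HashMap<>();
        uk.put("country", "UK");
        uk.put("cities", Arrays.asList("London", "Manchester", "Coventry"));
        Map<String, Object> il = new HashMap<>();
        il.put("country", "Israel");
        il.put("cities", Arrays.asList("Kfar-Saba", "Tel-Aviv", "Jerusalem"));
        List<Map<String, Object>> visited = Arrays.asList(uk, il);

        System.out.println(visitedCity(visited, "London"));  // true
        System.out.println(visitedCity(visited, "Paris"));   // false
    }
}
```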

But as before we still don't have an index for that, which yields very slow performance: roughly 1.8s-2s to execute that query.

When we add the index, we are back to the golden time of ±4ms for the exact same query.
The index is:


CREATE INDEX `cities_indx` ON 
people( distinct array 
         ( distinct array city for city in visit.cities  end)
        for `visit` in `visited` end)

Just make sure that city in the index corresponds to city in the query, and the same goes for visit.

If each city has a JSON property, you just reference "city.yourProperty" in the query.


That's all!
Hope you've enjoyed.







Tuesday, May 31, 2016

Shrinking your Couchbase memory footprint with compression

Hi,

TL;DR: Compression will decrease your memory footprint and increase your memory residency.

An important and integral part of Couchbase Server that many people forget is that Couchbase is not only a fast general-purpose document database with the advanced querying of N1QL; it is also a key/value store.

If that is the kind of operation you need, and you can "pay" in terms of being able to get the data only by its key and not being able to index things, then you might consider compressing your JSON or object and saving it as a binary document in the database instead of plain JSON.

For instance, each bucket contains exactly 1 million documents which look similar to this one:

Underscores are there in order to guarantee the size of 278 bytes.

Each document contains 56 bytes of metadata, about 15 bytes of key and 278 bytes of value in JSON format (349 bytes per document) - about 349,000,000 bytes of RAM, which is 332.8MB.
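As a quick sanity check of that arithmetic (a back-of-the-envelope sketch, using 1024-based megabytes):

```java
public class FootprintMath {
    public static void main(String[] args) {
        long metadata = 56, key = 15, value = 278;    // bytes per document
        long perDoc = metadata + key + value;         // 349 bytes
        long docs = 1_000_000;
        long totalBytes = perDoc * docs;              // 349,000,000 bytes
        double mb = totalBytes / (1024.0 * 1024.0);   // bytes -> MB
        System.out.println(perDoc);                   // 349
        System.out.printf("%.1f%n", mb);              // 332.8
    }
}
```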
We can check the amount of bytes in the memory, of active vbuckets by using the following cbstats command:

./cbstats localhost:11210 -b compressed all | grep active_itm_memory 
vb_active_itm_memory:                        349000000

A compressed document takes about 107-108 bytes, so approximately 178,000,000 bytes, or 169MB.
In cbstats:

./cbstats localhost:11210 -b compressed all | grep active_itm_memory 
vb_active_itm_memory:                        178139161

Figures in the following screenshot are slightly different, as there is some extra overhead of the Couchbase engine.
The figures here are the actual volume that the bucket takes in memory, not only the data.



We can see here that the data compressed by almost a factor of 2! (349MB vs 178MB),
meaning we reduce the amount of machines/memory needed by almost 50%.
And if you are not at a 100% residency ratio, this method will surely increase it.

So wait! If I need half of the machines (on that use case), where is the catch?

Three things you must note here:
1) As described before, you cannot index compressed documents.
2) Creating the document you want to insert takes more time.
3) Reading the document takes longer as you need to decompress.

Document creation time varies by a factor of about 6.
I've used the best compression setting, as a worst-case scenario.
From the tests I've run, the compression level in the gzip lib for Java doesn't really change much for this data, in terms of either time or footprint.
(The test machine is my laptop, so not a server-grade machine.)

Uncompressed
Generating 1M documents took: 5002ms

Compressed
Generating 1M documents took: 31731ms
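The compression level is set through the Deflater. Here is a small self-contained sketch (the payload is an assumed repetitive JSON, not the post's exact documents) showing that both BEST_SPEED and BEST_COMPRESSION shrink such a payload well below its original size:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

public class GzipLevelDemo {
    // Gzip the payload at a given Deflater level (e.g. BEST_SPEED, BEST_COMPRESSION).
    static byte[] gzip(byte[] payload, final int level) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Anonymous-subclass trick: "def" is the protected Deflater inside GZIPOutputStream.
        GZIPOutputStream gz = new GZIPOutputStream(baos) {{ def.setLevel(level); }};
        gz.write(payload);
        gz.close();
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder("{\"note\":\"");
        for (int i = 0; i < 50; i++) sb.append("blah");   // repetitive filler
        sb.append("\"}");
        byte[] raw = sb.toString().getBytes("UTF-8");

        int fast = gzip(raw, Deflater.BEST_SPEED).length;
        int best = gzip(raw, Deflater.BEST_COMPRESSION).length;
        System.out.println(raw.length + " bytes raw, " + fast + " fast, " + best + " best");
    }
}
```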

So how do you insert compressed documents?

  1. Create the stream
  2. Wrap it with BinaryDocument
  3. Insert it to Couchbase (observable)
Here is a snippet for creating a compressed binary document and adding it to a collection:

// Requires java.io.*, java.util.zip.{Deflater, GZIPOutputStream},
// plus the Couchbase Java SDK's BinaryDocument and its bundled Netty ByteBuf/Unpooled.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// Anonymous-subclass trick: "def" is the protected Deflater inside GZIPOutputStream.
OutputStream gzipOut = new GZIPOutputStream(baos) {{ def.setLevel(Deflater.BEST_SPEED); }};

ObjectOutputStream objectOut = new ObjectOutputStream(gzipOut);
objectOut.writeObject(doc.content().toString());
objectOut.close();

byte[] bytes = baos.toByteArray();
ByteBuf toWrite = Unpooled.copiedBuffer(bytes);
BinaryDocument binDoc = BinaryDocument.create(key, toWrite);
docsToInsertZipped.add(binDoc);

Observable.from(docsToInsertZipped)
          .flatMap(docBinary -> compressedBucket.async().upsert(docBinary))
          .toBlocking().subscribe();

In order to read the data:
  1. Read (get) the document
  2. Uncompress the content
  3. Convert the byte buffer to string

BinaryDocument binaryDocument = compressedBucket.get("person::0000001", BinaryDocument.class);
byte[] data = new byte[binaryDocument.content().readableBytes()];

binaryDocument.content().readBytes(data);
GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(data));
InputStreamReader reader = new InputStreamReader(gis);
BufferedReader buffered = new BufferedReader(reader);

String read;
while ((read = buffered.readLine()) != null) {
    System.out.println(read);
}


This is a much more complicated process than just getting the document,


JsonDocument document = uncompressedBucket.get("person::0000001");
System.out.println(document.content().toString());

but it is faster, as in that code you don't need to deserialize the bytes into a JsonDocument.
Take into account that you might have to deserialize it anyway, or put it behind wrappers.
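For completeness, the compress/decompress round trip can be exercised without Couchbase at all. A self-contained sketch using only the JDK (the SDK calls from the snippets above are omitted):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    static byte[] compress(String json) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(baos);
        gz.write(json.getBytes("UTF-8"));
        gz.close();                       // close() flushes the gzip trailer
        return baos.toByteArray();
    }

    static String decompress(byte[] data) throws IOException {
        GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(data));
        BufferedReader reader = new BufferedReader(new InputStreamReader(gis, "UTF-8"));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) sb.append(line);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String doc = "{\"name\":\"Roi\",\"city\":\"Tel Aviv\"}";
        System.out.println(decompress(compress(doc)).equals(doc)); // true
    }
}
```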

That's it.
Now you have another tool you might use in your toolbox.













Tuesday, March 8, 2016

How to index arrays in Couchbase 4.5

Hi,

So you've installed your new shiny Couchbase 4.5 (currently in dev preview stage) and found out that you can now index your favorite array in your JSON document.

But how can you do it?
I am going to demonstrate the easiest and quickest way to achieve that extra crank of speed for your array entries.

I'm going to insert some documents in the following format into the default bucket:
{ "name": "Some Name", "cities_traveled":["city1", "city2" ... , "cityN"] }
Next, we need to create a simple index for that array of cities,
and a primary index for convenience.

Primary:
create primary index `idx_prime` on `default` using GSI;
Array Index:
create index `idx_traveled` on `default` (distinct array city for city in cities_traveled end) using GSI;
Then we try to select our data using explain, to verify we are using our index:
explain select * from `default` d
where any city in d.cities_traveled satisfies city="London" end;
Let's break it up a little:
  • We create our index named idx_traveled on the bucket default, specifying that we want to iterate distinctly over every element inside the array in the property "cities_traveled".
  • The first part of the select is simple: select everything from our bucket, aliasing it as d.
  • In the where clause we state that we want at least one of the elements in the cities_traveled property to be the string "London".
  • We need to be careful: the variable after "any" must match the variable in the index definition - or it won't work.
Good Query

Query
select * from `default` d
where any city in d.cities_traveled satisfies city = "London" end
The "city" after any and in the expression are exactly the same as in the index definition.

Explain

Notice that the index being used is our "idx_traveled" and the operator is IndexScan.
 [  
  {  
   "#operator": "Sequence",  
   "~children": [  
    {  
     "#operator": "UnionScan",  
     "scans": [  
      {  
       "#operator": "IndexScan",  
       "index": "idx_traveled",  
       "keyspace": "default",  
       "namespace": "default",  
       "spans": [  
        {  
         "Range": {  
          "High": [  
           "\"London\""  
          ],  
          "Inclusion": 3,  
          "Low": [  
           "\"London\""  
          ]  
         }  
        }  
       ],  
       "using": "gsi"  
      }  
     ]  
    },  
    {  
     "#operator": "Parallel",  
     "~child": {  
      "#operator": "Sequence",  
      "~children": [  
       {  
        "#operator": "Fetch",  
        "as": "d",  
        "keyspace": "default",  
        "namespace": "default"  
       },  
       {  
        "#operator": "Filter",  
        "condition": "any `city` in (`d`.`cities_traveled`) satisfies (`city` = \"London\") end"  
       },  
       {  
        "#operator": "InitialProject",  
        "result_terms": [  
         {  
          "expr": "self",  
          "star": true  
         }  
        ]  
       },  
       {  
        "#operator": "FinalProject"  
       }  
      ]  
     }  
    }  
   ]  
  }  
 ]  

Bad Query

Query
select * from `default` d
where any someCity in d.cities_traveled satisfies someCity = "London" end
The "someCity" after any and in the expression are not the same as in the index definition.

Explain

Notice the index used is idx_prime and the operator is PrimaryScan
 [  
  {  
   "#operator": "Sequence",  
   "~children": [  
    {  
     "#operator": "PrimaryScan",  
     "index": "idx_prime",  
     "keyspace": "default",  
     "namespace": "default",  
     "using": "gsi"  
    },  
    {  
     "#operator": "Parallel",  
     "~child": {  
      "#operator": "Sequence",  
      "~children": [  
       {  
        "#operator": "Fetch",  
        "as": "d",  
        "keyspace": "default",  
        "namespace": "default"  
       },  
       {  
        "#operator": "Filter",  
        "condition": "any `someCity` in (`d`.`cities_traveled`) satisfies (`someCity` = \"London\") end"  
       },  
       {  
        "#operator": "InitialProject",  
        "result_terms": [  
         {  
          "expr": "self",  
          "star": true  
         }  
        ]  
       },  
       {  
        "#operator": "FinalProject"  
       }  
      ]  
     }  
    }  
   ]  
  }  
 ]  

Query result:
 [  
  {  
   "d": {  
    "cities_traveled": [  
     "Tel-Aviv",  
     "London",  
     "New-York",  
     "San Francisco",  
     "Los-Angeles"  
    ],  
    "name": "Roi"  
   }  
  },  
  {  
   "d": {  
    "cities_traveled": [  
     "Kilmarnock",  
     "London",  
     "New-York",  
     "San Francisco",  
     "Los-Angeles"  
    ],  
    "name": "Jonny Walker"  
   }  
  }  
 ]  

So that was a brief how-to of array indexes in Couchbase 4.5!
For more information, and for memory-optimized indexes, check the documentation.

Thanks all!
Roi.

Thursday, February 4, 2016

Getting started with Kafka and Couchbase as an endpoint

Hi all,
Couchbase is great as a source for Apache Kafka using the DCP connector.
However, it is also great as an endpoint for digesting data, as it is fast, memory-first and reliable storage.

In this blog post I will show you how to build a simple Java application with a producer, and a consumer which saves the published messages from Kafka into Couchbase.

I assume here that you already have a Kafka cluster (even if it's a single-node cluster).
If not, try to follow that installation guide.

This blog's environment has 4 parts:
1. Kafka producer
2. Apache Kafka queue
3. Kafka consumer
4. Couchbase server

Producer

We need the producer in order to submit messages to our queue.
In the queue, those messages are digested, and every application which subscribed to the topic can read them.
The source of our messages will be a dummy JSON file I've created using Mockaroo, which we will split and send to the queue.

Our sample JSON data looks similar to:
 {   
   "id":1,  
   "gender":"Female",  
   "first_name":"Jane",  
   "last_name":"Holmes",  
   "email":"jholmes0@myspace.com",  
   "ip_address":"230.49.112.20",  
   "city":"Houston"  
 }  

The producer code:
 import com.fasterxml.jackson.databind.JsonNode;  
 import com.fasterxml.jackson.databind.ObjectMapper;  
 import com.fasterxml.jackson.databind.node.ArrayNode;  
 import org.apache.kafka.clients.producer.KafkaProducer;  
 import org.apache.kafka.clients.producer.ProducerConfig;  
 import org.apache.kafka.clients.producer.ProducerRecord;  
 import org.apache.kafka.clients.producer.RecordMetadata;  
   
 import java.io.File;  
 import java.nio.charset.Charset;  
 import java.nio.file.Files;  
 import java.nio.file.Paths;  
 import java.util.ArrayList;  
 import java.util.HashMap;  
 import java.util.List;  
 import java.util.Map;  
 import java.util.concurrent.Future;  
   
   
 public class KafkaSimpleProducer {  
   public static void main(String[] args) throws Exception {  
     Map<String, Object> config = new HashMap<>();  
     config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  
     config.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");  
     config.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");  
     KafkaProducer<String, String> producer = new KafkaProducer<String, String>(config);  
   
     File input = new File("sampleJsonData.json");  
     byte[] encoded = Files.readAllBytes(Paths.get(input.getPath()  ));  
   
     String jsons = new String(encoded, Charset.defaultCharset());  
     System.out.println("Splitting file to jsons....");  
   
     List<String> splittedJsons = split(jsons);  

     System.out.println("Converting to JsonDocuments....");  
   
     int docCount = splittedJsons.size();  
   
     System.out.println("Number of documents is: " + docCount );  
   
     System.out.println("Starting sending msg to kafka....");  
     int count = 0;  
     for ( String doc : splittedJsons) {  
       System.out.println("sending msg...." + count);  
       ProducerRecord<String,String> record = new ProducerRecord<>( "couchbaseTopic", doc );  
       Future<RecordMetadata> meta = producer.send(record);  
       System.out.println("msg sent...." + count);  
   
       count++;  
     }  
   
     System.out.println("Total of " + count + " messages sent");  
   
     producer.close();  
   }  

   public static List<String> split(String jsonArray) throws Exception {  
     List<String> splittedJsonElements = new ArrayList<String>();  
     ObjectMapper jsonMapper = new ObjectMapper();  
     JsonNode jsonNode = jsonMapper.readTree(jsonArray);  
   
     if (jsonNode.isArray()) {  
       ArrayNode arrayNode = (ArrayNode) jsonNode;  
       for (int i = 0; i < arrayNode.size(); i++) {  
         JsonNode individualElement = arrayNode.get(i);  
         splittedJsonElements.add(individualElement.toString());  
       }  
     }  
     return splittedJsonElements;  
   }  
 }  
   


Output from the Producer App

Consumer 

This is a simple one, very straightforward: just get the messages from the queue, and use the Couchbase Java SDK to insert the documents into Couchbase. For simplicity I'll be using the sync Java SDK, but using the async one is totally possible and even recommended.

 import com.couchbase.client.java.Bucket;  
 import com.couchbase.client.java.Cluster;  
 import com.couchbase.client.java.CouchbaseCluster;  
 import com.couchbase.client.java.document.JsonDocument;  
 import com.couchbase.client.java.document.json.JsonObject;  
 import kafka.consumer.Consumer;  
 import kafka.consumer.ConsumerConfig;  
 import kafka.consumer.KafkaStream;  
 import kafka.javaapi.consumer.ConsumerConnector;  
 import kafka.message.MessageAndMetadata;  
   
 import java.util.*;  
   
 public class KafkaSimpleConsumer {  
   public static void main(String[] args) {  
   
     Properties config = new Properties();  
     config.put("zookeeper.connect", "localhost:2181");  
     config.put("zookeeper.connectiontimeout.ms", "10000");  
     config.put("group.id", "default");  
   
     ConsumerConfig consumerConfig = new kafka.consumer.ConsumerConfig(config);  
   
     ConsumerConnector consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);  
   
     Map<String, Integer> topicCountMap = new HashMap<>();  
     topicCountMap.put("couchbaseTopic", 1);  
   
     Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector.createMessageStreams(topicCountMap);  
   
     List<KafkaStream<byte[], byte[]>> streams = consumerMap.get("couchbaseTopic");  
   
     List<String> nodes = new ArrayList<>();  
     nodes.add("localhost");  
   
     Cluster cluster = CouchbaseCluster.create(nodes);  
     final Bucket bucket = cluster.openBucket("kafkaExample");  
   
     try {  
       for (final KafkaStream<byte[], byte[]> stream : streams) {  
         for (MessageAndMetadata<byte[], byte[]> msgAndMetaData : stream) {  
           String msg = convertPayloadToString(msgAndMetaData.message());  
           System.out.println(msgAndMetaData.topic() + ": " + msg);  
   
           try {  
             JsonObject doc = JsonObject.fromJson(msg);  
             String id = UUID.randomUUID().toString();  
             bucket.upsert(JsonDocument.create(id, doc));  
           } catch (Exception ex) {  
             System.out.println("Not a json object: " + ex.getMessage());  
           }  
         }  
       }  
     } catch (Exception ex) {  
       System.out.println("EXCEPTION!!!!" + ex.getMessage());  
       cluster.disconnect();  
     }  
   
     cluster.disconnect();  
   }  
   
   private static String convertPayloadToString(final byte[] message) {  
     String string = new String(message);  
     return string;  
   }  
 }  


Output from the Consumer app

Couchbase Server 

Now we can look at the result in Couchbase Server.
Look at the kafkaExample bucket - filled with 1000 documents.


Each document looks something like this:


A simple 3-part solution.
Note that in a production environment, the producer, consumer, Kafka and Couchbase will each be on one or more machines.

The full code (including Maven dependencies) is on GitHub.

Roi.

Sunday, January 3, 2016

Couchbase Mobile - Part 2 - Couchbase lite views (Indexes!)

Hi,

In part 1 we built our tiny yet cool application; we even replicated it to another Couchbase Lite.

But what now? We want to actually use it!
So how do we use a database? Well, at least for "getting" the data, you have a couple of options:
1) Get by primary key
2) Get by an index (or "selecting" it where x)

Up until now, in our simple sample app, we could only use the "primary key" to access and retrieve our data.

But it's not the only way to get your data from Couchbase Lite.
In this part we will learn the basics of Couchbase Lite indexing, AKA views.

We run our queries on the views.
So we need to:

1. Create View
2. Run Queries on the view
3. Get the results

We will build our use case of how to use "views" in Couchbase Lite.

1. Start a new WPF project.



2. Add Nuget Couchbase.Lite package


3. Copy that XAML

 <Window x:Class="CouchbaseLiteViews_Blog.MainWindow"  
     xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"  
     xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"  
     xmlns:d="http://schemas.microsoft.com/expression/blend/2008"  
     xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"  
     xmlns:local="clr-namespace:CouchbaseLiteViews_Blog"  
     mc:Ignorable="d"  
     Title="CouchbaseLite Working with views" Height="285.808" Width="525">  
   <Grid>  
     <Grid.ColumnDefinitions>  
       <ColumnDefinition Width="100"/>  
       <ColumnDefinition Width="*"/>  
     </Grid.ColumnDefinitions>  
     <Grid.RowDefinitions>  
       <RowDefinition Height="*"/>  
       <RowDefinition Height="auto"/>  
     </Grid.RowDefinitions>  
     <StackPanel Grid.Column="0" Grid.Row="0" Margin="0 10 0 0">  
       <Button Content="Insert" Click="InsertDocumentClick" />  
       <Button Content="Read" Click="GetDocumentClick" />  
       <Button Content="InsertSomeData!" Click="InsertSomeDataClick" />  
     </StackPanel>  
     <StackPanel Grid.Column="1" Grid.Row="0" Grid.ColumnSpan="2" Margin="0 10 0 0">  
       <TextBox Text="{Binding DocumentId}" Margin="1"/>  
       <TextBox Text="{Binding DocumentText}" TextWrapping="Wrap" AcceptsReturn="True" Height="190" VerticalScrollBarVisibility="Visible"/>   
     </StackPanel>  
     <StackPanel Grid.Column="0" Grid.Row="1" Grid.ColumnSpan="3" Margin="0 10 0 0" Orientation="Horizontal">  
       <Button Content="GetDocument" Width="100" Margin="1" Click="GetByCityClick"/>  
       <TextBox Text="{Binding City}" Width="100"/>  
     </StackPanel>  
   </Grid>  
 </Window>  

Which translates to:
The GUI generated from the XAML above

4. After you've got the basic UI, which you can explore later (nothing much here really), let's go to the actual code.

After we've started everything up and initialized the database, let us define our views.
In this case I've defined one view, just to show how to set things up.

     private void GenerateViews()  
     {  
       var docsByCity = _database.GetView("docs_by_city");  
       docsByCity.SetMap((doc, emit) =>  
       {  
         if (doc.ContainsKey("City"))  
         {  
           emit(doc["City"], doc);  
         }  
       }, "1");  
     }  

What you can see here is that once I retrieve a view by name from the _database, I can define a map on it;
a map is basically a projection and a filter.

In the example above, I've created a view named "docs_by_city",
assigned a delegate, checked whether a certain key ("City") exists, and then emitted it to the index.
Simple as that.
We've just created our index: for every document which contains a property named City, it emits the whole document. You can choose to emit whatever you want, depending on your app's requirements.
It can be adjusted for better performance and a smaller index size.
You can also use just about any string as your key, or compose your index from several properties to target special needs.
It's never good to store the entire document in the index, as it basically makes a copy of the document inside the index; try to keep your index as small as possible. But if you happen to need some kind of index which has the entire document as a result, for performance it's better to keep the document in the index instead of accessing the result.Document property - to save some round-tripping to the database.
The number "1" here is the version of the index. During development, if you change the map function you also need to increment that number (in case you haven't deleted the whole database), in order to rebuild the index.
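Conceptually, the map function builds a sorted (key, value) index by calling emit for every matching document. This is not the Couchbase Lite API, just a plain Java sketch of the idea (all names here are made up):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MiniView {
    // key -> list of emitted values, kept sorted by key like a view index.
    final TreeMap<String, List<Object>> index = new TreeMap<>();

    void emit(String key, Object value) {
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // "Map function": emit (City, document) for every document that has a City.
    void mapDocument(Map<String, Object> doc) {
        if (doc.containsKey("City")) {
            emit((String) doc.get("City"), doc);
        }
    }

    public static void main(String[] args) {
        MiniView view = new MiniView();
        Map<String, Object> doc = new HashMap<>();
        doc.put("City", "London");
        doc.put("name", "Roi Katz");
        view.mapDocument(doc);
        System.out.println(view.index.keySet()); // [London]
    }
}
```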
There are 2 special queries:
1. Get the count of all documents (with _database.DocumentCount)
2. Get all documents (with _database.CreateAllDocumentsQuery())
After we've defined our view (*index), we can start writing the code that uses it.

The usage is fairly simple - only 5 steps.
  1. Get the view
  2. Create a query on the view
  3. Define your criteria on the index
  4. Run it
  5. Read it
In code it looks even simpler:

     private void GetByCityClick(object sender, RoutedEventArgs e)  
     {  
       var docsByCity = _database.GetView("docs_by_city");  
       var query = docsByCity.CreateQuery();  

       query.StartKey = City;  
       query.EndKey = City;  

       var queryResults = query.Run();  
       MessageBox.Show(string.Format("{0} documents has been retrieved for that query", queryResults.Count));

       if (queryResults.Count == 0) return;  

       var documents = queryResults.Select(result => JsonConvert.SerializeObject(result.Value, Formatting.Indented)).ToArray();  
       var commaSeperaterdDocs = "[" + string.Join(",", documents) + "]";  

       DocumentText = commaSeperaterdDocs;  
     }  

I want the exact "City", so I've set the same value for both the start and end key.
I run the query and check whether there are any results.
Then I "beautify" the results (for every value) and return them as a JSON array.
Please pay attention here that I'm not using result.Document but result.Value, as using result.Document will not use the index and will go and query the database for each result.
So for performance, please use result.Key, result.Value or result.DocumentId.
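Setting StartKey and EndKey to the same value is just an inclusive range scan over the sorted index. A sketch of that idea against the same kind of sorted map (again hypothetical, not the Couchbase Lite SDK; one value per key for brevity):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RangeQueryDemo {
    // Inclusive [startKey, endKey] scan over a sorted index,
    // like setting StartKey and EndKey on a view query.
    static List<String> run(TreeMap<String, String> index, String startKey, String endKey) {
        Map<String, String> slice = index.subMap(startKey, true, endKey, true);
        return new ArrayList<>(slice.values());
    }

    public static void main(String[] args) {
        TreeMap<String, String> index = new TreeMap<>();
        index.put("London", "doc1");
        index.put("New York", "doc2");
        index.put("Tel Aviv", "doc3");
        // StartKey == EndKey gives an exact-match lookup.
        System.out.println(run(index, "London", "London")); // [doc1]
    }
}
```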
Now just add that part to generate some data...

     private void InsertSomeDataClick(object sender, RoutedEventArgs e)  
     {  
       var result = MessageBox.Show("Press Yes to insert some data (10 docs)!", "Confirm", MessageBoxButton.YesNo);  
       if (result == MessageBoxResult.Yes)  
       {  
         var count = _database.DocumentCount;  
         string[] cities = { "London", "New York", "Tel Aviv" };  
         var rnd = new Random();  
         for (int i = 0; i < 10; i++)  
         {  
           var id = "document" + (i + count);  
           var cityIndex = rnd.Next(0, 3);  
   
           var properties = new Dictionary<string, string>();  
           properties.Add("name", "Roi Katz");  
           properties.Add("City", cities[cityIndex]);  
             
           var doc = JsonConvert.SerializeObject(properties);  
           InsertDocument(id, doc);  
         }   
         MessageBox.Show("10 Records inserted");  
       }  
     }  

And we are good to go!
This is how we do a simple view!
Of course we have more to come on Couchbase Lite views; this is just the start.

Of course we do need to create the proper properties,
so for the full project, please check my GitHub page.

Roi.