Thursday, 5 July 2018

Cloudant - Fun with Indexing and Querying

So I was trying to resolve an issue for a colleague, who needed to use an $or operator.

He found that his query would take a very long time ( minutes ) and fail to return any results, searching through ~500K documents.

I tested the problem and, eventually, the solution, using my own data set: -

id,givenName,familyName
1,Maggie,Simpson
2,Lisa,Simpson
3,Bart,Simpson
4,Homer,Simpson
5,Fred,Flintstone
6,Wilma,Flintstone
7,Barney,Rubble
8,Betty,Rubble


In Cloudant, each document looks like this: -

{
  "_id": "1",
  "_rev": "1-0152a3e6867ad34da6e882a80f0fbeff",
  "id": "1",
  "givenName": "Maggie",
  "familyName": "Simpson"
}

{
  "_id": "2",
  "_rev": "1-6bbb94301323a3c3f6ff54f1c3c765e5",
  "id": "2",
  "givenName": "Lisa",
  "familyName": "Simpson"
}

etc.

So this was the query I was using: -

{
  "selector": {
     "familyName": "Simpson",
     "givenName": {
        "$or": [
           {
              "givenName": "Maggie"
           },
           {
              "givenName": "Lisa"
           }
        ]
     }
  },
  "fields": [
     "givenName",
     "familyName"
  ],
  "sort": [
     {
        "givenName": "asc"
     }
  ]
}

In my simple brain, this would return documents for Maggie and Lisa, out of the eight in my database.

I'd previously created this index: -

{
   "index": {
      "fields": [
         "givenName"
      ]
   },
   "name": "givenName-json-index",
   "type": "json"
}

When I ran my query, I got nothing back: -


apart from this statistic: 


Thankfully I found a smart person on our Cloudant Slack channel, who told me where I was going wrong: -


So I changed my query: -

{
   "selector": {
      "$or": [
         {
            "givenName": "Maggie"
         },
         {
            "givenName": "Lisa"
         }
      ]
   },
   "fields": [
      "givenName",
      "familyName"
   ],
   "sort": [
      {
         "givenName": "asc"
      }
   ]
}

and now I see data: -


Yay!



This was also very useful: -


1 comment:

Anonymous said...

A big hint sometimes is your index or query does not working with each other is either looking at response of _find in network tab of browser tools.

Also the little exclamation point to left of Execute in xx time try to make that a little prettier.

Note to self - use kubectl to query images in a pod or deployment

In both cases, we use JSON ... For a deployment, we can do this: - kubectl get deployment foobar --namespace snafu --output jsonpath="{...