Question

I'm using what seems to be a common trick for creating a join view:

// a Customer has many Orders; show them together in one view:
function(doc) {
  if (doc.Type == "customer") {
    emit([doc._id, 0], doc);
  } else if (doc.Type == "order") {
    emit([doc.customer_id, 1], doc);
  }
}

I know I can use the following query to get a single customer and all related Orders:

?startkey=["some_customer_id"]&endkey=["some_customer_id", 2]

But now I've tied my query very closely to my view code. Is there a value I can put where I put my "2" to more clearly say, "I want everything tied to this Customer"? I think I've seen

?startkey=["some_customer_id"]&endkey=["some_customer_id", {}]

But I'm not sure that {} is certain to sort after everything else.

Credit to cmlenz for the join method.

Further clarification from the CouchDB wiki page on collation:

The query startkey=["foo"]&endkey=["foo",{}] will match most array keys with "foo" in the first element, such as ["foo","bar"] and ["foo",["bar","baz"]]. However it will not match ["foo",{"an":"object"}]

So {} is late in the sort order, but definitely not last.
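To make the wiki's point concrete, here is a rough sketch of the documented collation order (null, false, true, numbers, strings, arrays, objects) as a plain JavaScript comparator. It is an approximation for illustration, not CouchDB's own code: strings are compared by code unit rather than with ICU, and the names typeRank, collate and inRange are made up for this example.

```javascript
// Simplified sketch of CouchDB view collation (an approximation, not CouchDB's code).
// Type order: null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain object
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra <= 2) return 0; // null, false, true: equal within the same rank
  if (ra === 3) return a - b;
  // Assumption: code-unit order stands in for CouchDB's ICU string collation.
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter array sorts first
  }
  // Objects: compare member by member (key, then value); fewer members sort first.
  const ea = Object.entries(a);
  const eb = Object.entries(b);
  const n = Math.min(ea.length, eb.length);
  for (let i = 0; i < n; i++) {
    const c = collate(ea[i][0], eb[i][0]) || collate(ea[i][1], eb[i][1]);
    if (c !== 0) return c;
  }
  return ea.length - eb.length;
}

// True when startkey <= key <= endkey (both ends inclusive).
function inRange(key, startkey, endkey) {
  return collate(startkey, key) <= 0 && collate(key, endkey) <= 0;
}

const start = ["foo"];
const end = ["foo", {}];
console.log(inRange(["foo", "bar"], start, end));
console.log(inRange(["foo", ["bar", "baz"]], start, end));
console.log(inRange(["foo", { an: "object" }], start, end));
```

Under these assumptions the first two keys fall inside the range and the third does not, which matches the wiki's description.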


Solution

Rather than trying to find the greatest possible value for the second element of your array key, I would suggest finding the least possible value greater than the first element: ?startkey=["some_customer_id"]&endkey=["some_customer_id\u0000"]&inclusive_end=false. With inclusive_end=false the end key itself is excluded, so the range no longer depends on what the view emits as the second element.
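View keys are passed as JSON in the query string, so the keys need to be JSON-encoded and then URL-encoded. A minimal sketch of building that query in JavaScript follows; the function name customerRangeQuery is hypothetical, and it only constructs the parameters without contacting a server.

```javascript
// Build the query string for "everything keyed under this customer".
// customerRangeQuery is a made-up helper name for this sketch.
function customerRangeQuery(customerId) {
  const params = new URLSearchParams({
    // JSON.stringify produces the JSON array form that view queries expect.
    startkey: JSON.stringify([customerId]),
    // Appending U+0000 gives the smallest string greater than the id itself.
    endkey: JSON.stringify([customerId + "\u0000"]),
    // Exclude the end key so only keys strictly below it are returned.
    inclusive_end: "false",
  });
  // URLSearchParams takes care of the percent-encoding.
  return "?" + params.toString();
}

console.log(customerRangeQuery("some_customer_id"));
```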

OTHER TIPS

I have two thoughts.

Use timestamps

Instead of using a simple 0 and 1 for their collation behavior, use the timestamp at which the record was created (assuming the records carry one), a la [doc._id, doc.created_at]. Then you could query the view with a startkey of some sufficiently early date (the epoch would probably work) and an endkey of "now", e.g. the output of date +%s. That key range should always include everything, and it has the added benefit of collating by date, which is probably what you want anyway.
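A sketch of how the timestamp variant might look. It assumes every document has a created_at field holding Unix seconds, and that orders are keyed by doc.customer_id rather than doc._id so they land next to their customer. The emit stub and the sample documents exist only so the map function can run outside CouchDB.

```javascript
// Stand-in for CouchDB's emit(), so the map function can run outside the server.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function keyed by [customer id, creation time].
// Assumption: every document carries created_at as Unix seconds.
function map(doc) {
  if (doc.Type == "customer") {
    emit([doc._id, doc.created_at], doc);
  } else if (doc.Type == "order") {
    emit([doc.customer_id, doc.created_at], doc);
  }
}

// Query parameters covering everything from the epoch until "now".
function timestampRangeQuery(customerId, nowSeconds) {
  const params = new URLSearchParams({
    startkey: JSON.stringify([customerId, 0]),
    endkey: JSON.stringify([customerId, nowSeconds]),
  });
  return "?" + params.toString();
}

// Hypothetical sample documents.
const sampleDocs = [
  { _id: "c1", Type: "customer", created_at: 1000 },
  { _id: "o1", Type: "order", customer_id: "c1", created_at: 2000 },
  { _id: "o2", Type: "order", customer_id: "c1", created_at: 1500 },
];
sampleDocs.forEach(map);

console.log(rows.map((row) => row.key));
console.log(timestampRangeQuery("c1", Math.floor(Date.now() / 1000)));
```

Note that with this key the customer document only sorts first if it was created before its orders.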

or, just don't worry about it

You could just index by the customer_id and nothing more. This would have the nice advantage of being able to query using just key=<customer_id>. Sure, the records won't be collated when they come back, but is that an issue for your application? Unless you are expecting tons of records back, it would likely be trivial to simply pluck the customer record out of the list once you have the data retrieved by your application.

For example in ruby:

# partition splits the list in two: records matching the block, then the rest
customer_records, order_records = records.partition { |record| record.type == "customer" }

Anyway, the timestamp approach is probably the more attractive answer for your case.

CouchDB is mostly written in Erlang. I don't think there is an upper limit on the size of a compound (composite) key other than system resources (e.g. a key so long that it uses all available memory). The limits of CouchDB scalability are unknown according to the CouchDB site. I would guess that you could keep adding fields to a huge composite key and the only thing that would stop you is system resources or hard limits such as maximum integer sizes on the target architecture.

Since CouchDB stores everything using JSON, it is probably limited to the largest number values allowed by the ECMAScript standard. All numbers in JavaScript are stored as IEEE 754 double-precision floating-point values. I believe a 64-bit double can represent magnitudes from about 5e-324 (the smallest positive value) up to 1.7976931348623157e+308, in either sign.
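These limits can be checked in any JavaScript runtime. The sketch below only shows the behavior of JavaScript numbers; whether CouchDB's Erlang side treats large numbers identically is not something it demonstrates.

```javascript
// Limits of an IEEE 754 double as exposed by JavaScript.
console.log(Number.MAX_VALUE);        // largest finite double
console.log(Number.MIN_VALUE);        // smallest positive double
console.log(Number.MAX_SAFE_INTEGER); // largest integer with exact neighbors (2^53 - 1)

// Integers beyond the safe range lose precision when parsed from JSON.
const parsedBig = JSON.parse("9007199254740993"); // 2^53 + 1 in the source text
console.log(parsedBig === 9007199254740992);
```

For numeric key parts such as timestamps or counters this is rarely a practical limit, but very large integer ids are safer stored as strings.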

It seems like it would be nice to have a feature where endKey could be inclusive instead of exclusive.

This should do the trick:

?startkey=["some_customer_id"]&endkey=["some_customer_id", "\uFFFF"]

This should match any second element that sorts before the string "\uFFFF": numbers such as 0 and 1, and strings that start with a lower character (practically all Unicode text).

Licensed under: CC-BY-SA with attribution