質問

My use case is as follows: I have to store and query data from multiple third-party sources. The only predefined schema knowledge I have is that it consists of Entity-Attribute-Value tuples with an additional Source component: (E, A, V, S). I do not know in advance which attributes will be present (so using the actual attributes themselves as columns is a problematic). The combination (E,A,V) must be unique, and I therefore thought using that as the composite key would be the best way to model this, so we get this:

CREATE TABLE t1 (
       E text,
       A text,
       V text,
       S text,
       PRIMARY KEY(E, A, V)
);

This will enable me to ask queries like:

  • given E, show all attributes A and values V and the sources S in which they occur
  • given E and A, give all V.

The plan is to create mirrors of this table with differently ordered compound keys so that I can also answer queries of the form:

  • given A, show all entities E and values V.
  • given S, show all E, A, V tuples.

and so on and so forth (effectively, the mirror tables play the role of indexes, and to get full indexing I will need 6 copies of effectively the same data - not sure about the scalability of that approach yet, but that's a separate question I guess).

So far so good, however, the part I'm struggling with is this: V is in fact an object in its own right, with multiple properties. If this were a relational model, I would have V be a foreign key field pointing to a relation that maps its id to, for example, a type field and a value field. But getting rid of foreign keys (and the joins that go with them) is more or less the point of the BigTable approach I guess, so I'm looking for a way to incorporate this into my table t1.

Of course, I could do something like this:

CREATE TABLE t1 (
       E text,
       A text,
       V_id text,
       S text,
       V_type text,
       V_value text,
       PRIMARY KEY(E, A, V_id)
);

But the problem I see is that this fails to capture the (inverse-)functional relation between V's id, type and value: with the above table, I could get, for example:

 E  | A  | V_id | V_type | V_value
 ---+----+------+--------+--------
 a1 | b1 |    1 | X      | foo
 a1 | b1 |    2 | X      | foo
 a1 | b2 |    1 | Y      | bar

While I want to be able to ensure that given a V_id, type and value are unique, and vice versa. I guess what I'm after is what in the older Cassandra versions would be a nested super column, but I'm trying to achieve what I need with CQL3.

I briefly looked into collection types, but that doesn't quite seem to fit my use case.

Can anyone suggest a better way to model this, keeping in mind that ideally, I'd like to be able to retrieve (E, A, V) and in as few queries as possible? Or am I just overthinking it and is the approach as I currently have it in fact fine (I can of course try and ensure uniqueness at the application level)?

役に立ちましたか?

解決

Jeen, consider using map for V. See http://www.datastax.com/docs/1.2/cql_cli/using/collections Map, list, set types were introduced to avoid situations like encoding/decoding key-value pairs into and from string/byte arrays.

他のヒント

You could denormalize t1 and create a new table where you could store attributes for a particular v_id. This way you could achieve the join equivalent of SQL. Assuming v_id is unique, your new table could have the following structure:

    CREATE TABLE t2 (
        V_id text PRIMARY KEY,
        type text,
        value text,
    );

There is a very good read on mapping SQL operations to Cassandra : http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/enter link description here

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top