Cassandra timeseries datamodel

https://stackoverflow.com/questions/17987921

04-06-2022

Question

Assume there are 10 devices (dev01, dev02, dev03, etc.).

Each device sends data at some interval, and we collect that data. Our data schema is:

 dev01      :int
 signalname :string
 signaltime :date/time[with YY-MM-DD HHMMSS.mm]
 Extradata  :String

I want to push this data into Cassandra. Which way is best to store it?

My queries are:

1. Retrieve one device's data for the current day, or for some date range.

2. Retrieve the current day's data for 5 devices.

I am not sure whether either of the following ways to store the data in Cassandra is the best model:

Standard columnfamily Name:signalname
row key                   :dev01
columnname                :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue               :Json data
columnname                :timeseries(20120801124205)[YYMMDD HHMMSS][next second data]
columnvalue               :Json data

row key               :dev02
columnname            :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue           :Json data
columnname            :timeseries(20120801124205)[YYMMDD HHMMSS][next second data]
columnvalue           :Json data

Or  

Super columnfamily   :signalname
row key              :Clientid1

supercolumnname      :dev01
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data

supercolumnname      :dev02
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data


row key              :Clientid2

supercolumnname      :dev03
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data

supercolumnname      :dev04
columnname           :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue          :Json data

Kindly help me out with this issue. Is there any other way?

Thanks&Regards, Kannadhasan

Answer

I see 3 issues with your approach, which I will address below:

super column families,
Thrift vs. CQL3,
JSON data as cell values.

Before you go ahead: the use of super column families is discouraged. Read more here. Composite keys (as described below) are the way to go.

Also, you might need to read up on CQL3, since Thrift has been a legacy API since Cassandra 1.2.

Instead of storing JSON data, you may make use of native collection data types like lists and maps. If you still want to work with JSON, there is improved JSON support in Cassandra since version 2.2.
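As a rough illustration of the collection option, the sketch below flattens a JSON payload into a string-to-string dict, which is the shape a CQL map<text, text> column expects. The helper name json_to_text_map, the table name signals, and the choice to serialize non-string values back to JSON text are assumptions for illustration, not part of the original answer. The INSERT ... JSON statement is included only as a hedged pointer to the JSON support mentioned above.

```python
import json
from typing import Dict

# Hypothetical statement for Cassandra 2.2+: the whole row is passed as one JSON
# string. The table name "signals" is made up for this sketch.
INSERT_AS_JSON = "INSERT INTO signals JSON ?"


def json_to_text_map(raw: str) -> Dict[str, str]:
    """Turn a JSON object into a dict of strings, suitable for a map<text, text> column."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object at the top level")
    # Strings are kept as-is; numbers, booleans and nested values are stored as JSON text.
    return {str(k): v if isinstance(v, str) else json.dumps(v) for k, v in obj.items()}


if __name__ == "__main__":
    print(json_to_text_map('{"unit": "C", "value": 21.5}'))
```

A map column keeps individual fields addressable in CQL, whereas a JSON string in a text column stays opaque to the database.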

In general, it is pretty straightforward to query per device and per time period:

Your row key would be the device id and the column key a timeuuid.
To avoid hot spots, you could add "bucket" counters to the row key (creating a composite row/partition key) so that writes rotate across nodes.
You can then query for time ranges if you know the row/device id.
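The points above can be sketched as follows. This is a minimal sketch, not a definitive schema: the table name signals_by_device, its column names, and the use of one bucket per calendar day are all assumptions (the answer only says "bucket" counters). The CQL is held in plain strings with ? bind markers, and running it through a driver is left out.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical table: (device_id, day) is the composite partition key,
# event_time is the clustering column that keeps rows ordered by time.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS signals_by_device (
    device_id   text,
    day         text,      -- bucket, e.g. '2012-08-01' (assumed granularity)
    event_time  timeuuid,
    signal_name text,
    extra_data  text,
    PRIMARY KEY ((device_id, day), event_time)
)
"""

# Time-range read inside one partition; minTimeuuid/maxTimeuuid are CQL functions.
SELECT_RANGE = """
SELECT event_time, signal_name, extra_data
FROM signals_by_device
WHERE device_id = ? AND day = ?
  AND event_time >= minTimeuuid(?) AND event_time <= maxTimeuuid(?)
"""


def day_buckets(start: datetime, end: datetime) -> List[str]:
    """List every day bucket (ISO date string) touched by the range start..end."""
    if end < start:
        raise ValueError("end must not be before start")
    first = start.date()
    n_days = (end.date() - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(n_days + 1)]


def range_queries(device_id: str, start: datetime, end: datetime) -> List[Tuple[str, tuple]]:
    """Build one (statement, parameters) pair per day bucket for a device."""
    return [(SELECT_RANGE, (device_id, day, start, end)) for day in day_buckets(start, end)]


if __name__ == "__main__":
    for _, params in range_queries("dev01", datetime(2012, 8, 1, 12, 42, 4), datetime(2012, 8, 3)):
        print(params[:2])
```

With this layout, "current day" data for a device is a single-partition read, and a date range becomes one query per day bucket. Data for 5 devices means repeating the same query once per device id.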

Alternatively, you could use your signal type as the row key (and a timeuuid/timestamp as the column key) if you want to query data for multiple devices (but one event type) at once. Read more on time series data in Cassandra in this blog entry.
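A sketch of this alternative layout, under stated assumptions: the table name signals_by_name, its columns, and the added day bucket in the partition key are hypothetical, chosen so that "current day" reads stay within one partition. The device id is kept as a clustering column so that each row still records which device sent it.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical table: all devices emitting the same signal type on the same day
# share one partition, ordered by event time.
CREATE_TABLE_BY_SIGNAL = """
CREATE TABLE IF NOT EXISTS signals_by_name (
    signal_name text,
    day         text,
    event_time  timeuuid,
    device_id   text,
    extra_data  text,
    PRIMARY KEY ((signal_name, day), event_time, device_id)
)
"""

SELECT_BY_SIGNAL = (
    "SELECT device_id, event_time, extra_data "
    "FROM signals_by_name WHERE signal_name = ? AND day = ?"
)


def current_day_query(signal_name: str, now: Optional[datetime] = None) -> Tuple[str, tuple]:
    """Build the (statement, parameters) pair for one signal type on the current day."""
    now = now or datetime.now(timezone.utc)
    return SELECT_BY_SIGNAL, (signal_name, now.date().isoformat())


if __name__ == "__main__":
    print(current_day_query("temperature", datetime(2012, 8, 1, 12, 42, 4)))
```

One query then returns rows for every device that sent the signal that day. The trade-off is that all devices write into the same partition for a given signal type and day, so a busy signal type may need finer buckets.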

Hope that helps!

License: CC-BY-SA with attribution

Not affiliated with StackOverflow