Question

I'm struggling to find a solution (preferably DBaaS) that I can rely on for storing and querying some 300M rows of data (roughly 100GB).

The data in question is mostly numeric. There is also one "description" column that I would want to perform full-text search on, and a couple of "category" columns used for filtering. I also want to filter/order search results in many ways (10+ different indexes).

There is no need for complex joins since the data is pretty much denormalized. The data is updated heavily: some 50M records are replaced every day.

I first tried DynamoDB, but it supports only up to 5 secondary indexes and is not capable of doing full-text search at reasonable speed. I've also considered Google's BigQuery, but it is designed for append-only data. I'm now considering Redshift, but I'm not sure how it will handle such a large number of daily updates.

Any advice would be appreciated!


Solution

I ended up storing the data in DynamoDB and doing a daily sync to Redshift. I tried Redshift with a 600M-row sample on a 4-node cluster and it runs extremely fast. It is exactly what I need.
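For the heavy daily replacements, Redshift's usual answer is the staging-table merge (delete-and-insert) pattern: load the day's export into a staging table, then swap the rows in one transaction. A minimal sketch, assuming hypothetical table, bucket, and role names:

```sql
BEGIN;

-- Staging table with the same columns as the target table
CREATE TEMP TABLE staging (LIKE records);

-- Bulk-load the day's DynamoDB export (e.g. dumped to S3) into staging
COPY staging
FROM 's3://my-bucket/daily-export/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV;

-- Delete the rows being replaced, then insert the new versions
DELETE FROM records
USING staging
WHERE records.id = staging.id;

INSERT INTO records
SELECT * FROM staging;

END;
```

Doing the delete and insert inside one transaction keeps readers from seeing a half-replaced table, and COPY from S3 is far faster than row-by-row updates at the ~50M/day scale described above.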

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange