Question

OK, so this is going to be a speculative question, mostly design-oriented and rather long. I'd grab a cuppa' covfefe if I were you.

Preface: I have been researching databases and want a really fast (like really, really fast) database (engine) with the following must-haves:

  1. ACID compliance
  2. In-memory-ish for blazing fast IO
  3. Persistent (well... duh)
  4. Scalable, as in cluster/master-slave/etc.
  5. High availability (HA)
  6. MySQL drop-in replacement
  7. Open source
  8. Should run on commodity servers (IYKWIM)

So, judging by my over-optimistic list of requirements, you'd already jump to... ummm

How to speed up mysql, slow queries

Alright, alright, jokes aside: I know that if innodb_buffer_pool_size is tuned properly, MySQL will serve most reads from memory most of the time, but I say

It ain't in-memory yo!
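Whether a tuned buffer pool really runs "off of the memory most of the time" can be measured from the ratio of InnoDB logical read requests to reads that actually had to go to disk. A minimal sketch, assuming you have already fetched the output of `SHOW GLOBAL STATUS` into a dict (the fetching is left out). `Innodb_buffer_pool_read_requests` and `Innodb_buffer_pool_reads` are real MySQL status variables; the helper name `buffer_pool_hit_ratio`, the sample values, and the 0.999 threshold are hypothetical examples.

```python
from typing import Mapping


def buffer_pool_hit_ratio(status: Mapping[str, str]) -> float:
    """Fraction of InnoDB logical reads served from the buffer pool.

    `status` maps variable names to values as returned by
    SHOW GLOBAL STATUS (values arrive as strings).
    """
    requests = int(status["Innodb_buffer_pool_read_requests"])  # logical reads
    disk_reads = int(status["Innodb_buffer_pool_reads"])  # reads that missed the pool
    if requests == 0:
        return 1.0  # nothing read yet, so nothing missed
    return 1.0 - disk_reads / requests


if __name__ == "__main__":
    # Hypothetical sample values, not measurements.
    sample = {
        "Innodb_buffer_pool_read_requests": "1000000",
        "Innodb_buffer_pool_reads": "250",
    }
    ratio = buffer_pool_hit_ratio(sample)
    print(f"hit ratio: {ratio:.5f}")
    # Arbitrary example threshold.
    print("mostly in memory" if ratio >= 0.999 else "buffer pool may be too small")
```

A ratio very close to 1 means the "not really in-memory" engine is, in practice, hardly touching disk for reads; writes still go through the redo log and data files.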

But you'd say, "Hey, it's 2k18, people might have already created some 100% in-memory DBs, right?" Umm... actually they have, but each has its own trade-offs.

  1. VoltDB Community Edition: Everything seems fine until you realise it isn't a drop-in replacement. It needs stored-procedure-ish code written in Java, which requires rewriting your whole application, or at least the DB layer/driver/etc. of your PHP app. So? DEALBREAKER!!!

  2. MemSQL: Well, this seems a pretty strong contender for our "Best Open Source In-Mem Scalable SQL ACID DB Ever" contest. Only, the MemSQL boss be like...

[Image: MemSQL Server Requirements]

Needless to say, MemSQL needs at least 4 cores and 8 gigs of RAM, and the recommendation is pretty insane at 4 cores and 32 gigs per core!!!! Also, the community version of MemSQL (which, btw, isn't fully open source! It's just free) doesn't support high availability, as that's a paid feature. Also, it's NoSQL. So? DEALBREAKER!!!

  3. All other NoSQL-ish DBs like Membase, Redis, Memcached, etc. are pretty much ruled out.

So now for my genius idea!!!

I was wondering: could we run an XtraDB/Galera cluster with all the instances running off a RAM disk, with regular snapshots?

It ticks all the checkboxes.

Just hear me out. First, addressing the elephant in the room: we know that running full MySQL DBs off RAM disks is pretty, umm... bold, to put it in the most polite of ways. So what happens if a server crashes/shuts off/etc.? We lose a node, while our DB cluster as a whole is still alive and kicking a**. All we have to implement is booting the DB up from the last snapshot and syncing it back with the cluster, which, btw, clusters are inherently pretty good at!
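The recovery step in this plan can be modelled as: restore the last snapshot, then fetch every write the node missed while it was down. Galera distinguishes an incremental state transfer (IST), replayed from a donor's write-set cache (gcache), from a full state snapshot transfer (SST), which is needed when the cache no longer covers the gap. The sketch below is a toy model under those assumptions; the `Donor` class and `rejoin` function are hypothetical and do not reflect Galera's actual protocol or API.

```python
from dataclasses import dataclass, field


@dataclass
class Donor:
    """Toy cluster node: key/value state plus a bounded cache of recent write-sets."""
    state: dict = field(default_factory=dict)
    seqno: int = 0  # sequence number of the last applied write-set
    cache: dict = field(default_factory=dict)  # seqno -> (key, value)
    cache_size: int = 3  # how many recent write-sets are kept

    def apply(self, key, value):
        self.seqno += 1
        self.state[key] = value
        self.cache[self.seqno] = (key, value)
        # Evict the write-set that just fell out of the bounded cache.
        self.cache.pop(self.seqno - self.cache_size, None)


def rejoin(snapshot_state: dict, snapshot_seqno: int, donor: Donor):
    """Bring a node restored from a snapshot back in sync with `donor`.

    Returns (state, seqno, method): "IST" if the missing write-sets were
    replayed from the donor's cache, "SST" if a full copy was needed.
    """
    missing = range(snapshot_seqno + 1, donor.seqno + 1)
    if all(s in donor.cache for s in missing):
        state = dict(snapshot_state)
        for s in missing:
            key, value = donor.cache[s]
            state[key] = value  # replay only what the node missed
        return state, donor.seqno, "IST"
    # The gap is older than the cache: copy the donor's whole state.
    return dict(donor.state), donor.seqno, "SST"


if __name__ == "__main__":
    donor = Donor()
    for i in range(5):
        donor.apply(f"k{i}", i)  # cache now holds seqnos 3, 4, 5
    print(rejoin({"k0": 0, "k1": 1, "k2": 2}, 3, donor)[2])  # IST
    print(rejoin({"k0": 0}, 1, donor)[2])  # SST
```

The design point this illustrates: a recent snapshot only saves work if the donor still caches every write since that snapshot; otherwise the node falls back to a full copy from a peer, and the snapshot adds nothing.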

OK, peeps, don't go all salty on me; if you see a flaw in my implementation, guide me.


Solution

OK, 3 Galera nodes, with:

  • Each having all disk stuff sitting in RAM disks.
  • Enough RAM to make that possible.
  • The nodes separated geographically so that a volcano, flood, etc. cannot take out all of them at once.
  • High speed network.

Notes:

  • That will allow for full and automatic recovery from one server loss.
  • The only delay during a failure is switching clients to one of the other two nodes.
  • But, you may find that the network is the slow point. (Entanglement has not been perfected yet.)
  • No snapshots needed. (You always have 2 other nodes with full copies of the data.)
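Why three nodes rather than two: a Galera cluster keeps accepting writes only in a component that still has quorum, i.e. a strict majority of the last primary component. A minimal sketch, assuming all nodes have equal weight and every missing node failed rather than leaving gracefully (Galera's real calculation also accounts for `pc.weight` and graceful departures); `has_quorum` is a hypothetical helper.

```python
def has_quorum(surviving_nodes: int, previous_primary_size: int) -> bool:
    """True if the surviving component is a strict majority of the last primary component.

    Simplified: equal node weights, and every missing node is treated as
    failed (not as having left gracefully).
    """
    return 2 * surviving_nodes > previous_primary_size


if __name__ == "__main__":
    for cluster in (2, 3):
        for lost in range(cluster):
            alive = cluster - lost
            print(f"{cluster} nodes, {lost} lost -> quorum: {has_quorum(alive, cluster)}")
```

With three nodes, losing one leaves 2 of 3, which is still a majority, so the remaining pair carries on automatically; with two nodes, losing one leaves 1 of 2, which is not.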

Other tips

(Edited to address comments by @RickJames, plus a bit more on NDB Cluster and general improvements.)

There are several issues with your plan:

  1. If you want persistence, then obviously you can't use a RAM disk for the data files.

  2. If you're planning to use a RAM disk for all the database files (table data files, log files, temporary files, ...), then it would seem you will need more than double the amount of RAM required by a memory-only database system, since you're storing both the tables themselves (in the buffer pool) and their data files (which would normally be persisted to disk) in RAM.

  3. How are you going to guarantee that the RAM disk is large enough for all possible use-cases? For example, on-disk internal temporary tables can be created when a query is too large to handle in memory. So if you use a RAM disk as your disk storage, then you risk running out of "disk", which is likely to have some detrimental effect. (And this leads me to think there are good reasons why MemSQL has such large RAM requirements ...)

    Maybe there is a way to configure your storage / RAM disk so that part of the storage is on the RAM disk (the part that is used first / preferred) and the other part is on real disk?
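The capacity problem in point 3, and the tiered idea just above, can be illustrated with a toy allocator: files go to the fastest tier that still has room and spill over to the next one, and a write fails only when every tier is full. This is a hypothetical model (the `Tier` class, `place` function, file names, and sizes are made up for illustration), not a feature of MySQL or of any particular RAM-disk tool.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One storage tier, e.g. a RAM disk or an SSD."""
    name: str
    capacity_mb: int
    used_mb: int = 0
    files: list = field(default_factory=list)

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


def place(file_name: str, size_mb: int, tiers: list) -> str:
    """Put a file on the first tier (in preference order) with enough free space."""
    for tier in tiers:
        if tier.free_mb() >= size_mb:
            tier.used_mb += size_mb
            tier.files.append(file_name)
            return tier.name
    # Every tier is full: with a RAM disk alone, this is the "disk full" case.
    raise OSError(f"no space left for {file_name} ({size_mb} MB)")


if __name__ == "__main__":
    tiers = [Tier("ramdisk", capacity_mb=1024), Tier("ssd", capacity_mb=8192)]
    print(place("ibdata1", 512, tiers))       # ramdisk
    print(place("tmp_table_1", 700, tiers))   # ssd (ramdisk has only 512 MB left)
    print(place("tmp_table_2", 400, tiers))   # ramdisk
```

With only the first tier in the list, the second call would raise instead of spilling over, which is the risk described in point 3.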

That said, there are cases where MySQL will create on-disk temporary tables even if the query could potentially have been handled in memory. See 8.4.4 Internal Temporary Table Use in MySQL for details. These are cases where a RAM disk could perhaps have been helpful. Here's a blog entry (from 2012) about how to put the MySQL tmpdir on a RAM disk.
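A rough model of two of the MySQL 5.7 rules from that section, for the MEMORY-engine case: an internal temporary table cannot stay in memory if it has a BLOB or TEXT column, and it is converted to an on-disk table once it grows beyond the smaller of `tmp_table_size` and `max_heap_table_size`. This is a simplified sketch that assumes only those two conditions (the manual lists more, and MySQL 8.0's TempTable engine uses different limits); `goes_to_disk` is a hypothetical name.

```python
def goes_to_disk(estimated_bytes: int, has_blob_or_text: bool,
                 tmp_table_size: int, max_heap_table_size: int) -> bool:
    """Simplified guess at whether an internal temporary table ends up on disk."""
    if has_blob_or_text:
        return True  # the MEMORY engine cannot store BLOB/TEXT columns
    # The effective in-memory limit is the smaller of the two variables.
    in_memory_limit = min(tmp_table_size, max_heap_table_size)
    return estimated_bytes > in_memory_limit


if __name__ == "__main__":
    MB = 1024 * 1024
    # Raising only tmp_table_size does not help while max_heap_table_size stays small.
    print(goes_to_disk(40 * MB, False, tmp_table_size=64 * MB, max_heap_table_size=16 * MB))  # True
    print(goes_to_disk(40 * MB, False, tmp_table_size=64 * MB, max_heap_table_size=64 * MB))  # False
    print(goes_to_disk(1 * MB, True, tmp_table_size=64 * MB, max_heap_table_size=64 * MB))    # True
```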

However, any RAM used for the tmpdir in a RAM-disk solution is RAM that could potentially have been used for the all-important InnoDB buffer pool. So make sure you have a large enough buffer pool for your data working set before you consider using any of your RAM on a RAM disk.
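A back-of-the-envelope version of that priority order, which also shows why point 2 above ends up at more than double the RAM: size the buffer pool for the working set first, keep headroom for the OS and per-connection buffers, and only give what is left to the RAM disk. All figures and the `ram_disk_budget_gb` helper are hypothetical; the right headroom depends on your workload.

```python
def ram_disk_budget_gb(total_ram_gb: float, working_set_gb: float,
                       os_and_connections_gb: float) -> float:
    """RAM left for a RAM disk after the buffer pool and other overhead are covered."""
    buffer_pool_gb = working_set_gb  # the buffer pool gets priority
    left = total_ram_gb - buffer_pool_gb - os_and_connections_gb
    if left < 0:
        raise ValueError("not enough RAM even for the buffer pool")
    return left


if __name__ == "__main__":
    data_gb = 50  # hypothetical: all table data is hot
    budget = ram_disk_budget_gb(total_ram_gb=128, working_set_gb=data_gb,
                                os_and_connections_gb=8)
    print(f"RAM disk budget: {budget} GB")  # 70 GB
    # The data files need roughly data_gb on the RAM disk on top of the buffer
    # pool's copy, so the server holds the data about twice (plus logs and tmp).
    print("data files fit" if budget >= data_gb else "data files do not fit")
```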

Assuming you put the tables' data files on the RAM disk and you plan to snapshot it, then you also need to take steps to ensure you're getting a point-in-time consistent backup. These steps (for example, pausing writes while the copy is taken) will make the RAM-disk setup slower.
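To see why consistency costs speed, here is a toy model: writers update two related values under a lock, and the snapshot holds the same lock while it copies, so it never captures half a transaction, but writers are blocked for the whole copy. This only illustrates the principle; in MySQL the rough equivalent would be something like `FLUSH TABLES WITH READ LOCK` (or desyncing a Galera node) around the copy. The `Store` class is hypothetical.

```python
import copy
import threading


class Store:
    """Toy database: a transfer moves money between two accounts atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self.accounts = {"a": 100, "b": 0}

    def transfer(self, amount: int) -> None:
        with self._lock:  # both updates happen as one unit
            self.accounts["a"] -= amount
            self.accounts["b"] += amount

    def snapshot(self) -> dict:
        with self._lock:  # writers wait here until the copy finishes
            return copy.deepcopy(self.accounts)


if __name__ == "__main__":
    store = Store()
    writers = [threading.Thread(target=store.transfer, args=(1,)) for _ in range(50)]
    for w in writers:
        w.start()
    snap = store.snapshot()
    for w in writers:
        w.join()
    # Whatever moment the snapshot landed on, the total is preserved.
    print(sum(snap.values()))  # 100
```

Copying a data file without such a lock is faster but can capture a state that never existed; the larger the dataset, the longer writers stay blocked.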

So alternatively, instead of using a RAM disk, you could use Galera as is, but take all possible precautions to avoid the creation of on-disk internal temporary tables. You should obviously also make sure to use SSD instead of spinning disks.
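To check whether those precautions work, you can compare how many internal temporary tables MySQL created on disk versus in total. A minimal sketch, again assuming the output of `SHOW GLOBAL STATUS` is already in a dict; `Created_tmp_disk_tables` and `Created_tmp_tables` are real status variables, while the helper name, sample values, and 5% threshold are arbitrary examples.

```python
from typing import Mapping


def disk_tmp_table_ratio(status: Mapping[str, str]) -> float:
    """Share of internal temporary tables that were created on disk."""
    total = int(status["Created_tmp_tables"])
    on_disk = int(status["Created_tmp_disk_tables"])
    return on_disk / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample values, not measurements.
    sample = {"Created_tmp_tables": "20000", "Created_tmp_disk_tables": "3000"}
    ratio = disk_tmp_table_ratio(sample)
    print(f"{ratio:.1%} of temporary tables went to disk")  # 15.0%
    if ratio > 0.05:  # arbitrary threshold
        print("look for queries producing large or BLOB/TEXT temporary tables")
```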

Another technology to consider may be MySQL NDB Cluster:

NDB Cluster is a technology that enables clustering of in-memory databases in a shared-nothing system. The shared-nothing architecture enables the system to work with very inexpensive hardware, and with a minimum of specific requirements for hardware or software. [...]
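"Shared-nothing" means each data node owns a slice of the rows and no storage is shared between nodes; redundancy comes from keeping replicas of each slice on other nodes. The sketch below shows that idea with simple hash partitioning over node groups of two replicas each. It is a simplified illustration with hypothetical node names and does not use NDB's actual hashing or partition layout.

```python
import hashlib


def partition_for(primary_key: str, num_partitions: int) -> int:
    """Map a primary key to a partition with a stable hash."""
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical layout: two node groups, each holding two replicas of its partition.
NODE_GROUPS = [["data1", "data2"], ["data3", "data4"]]


def replicas_for(primary_key: str) -> list:
    """Nodes that hold a copy of the row with this primary key."""
    return NODE_GROUPS[partition_for(primary_key, len(NODE_GROUPS))]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", replicas_for(key))
```

Any single node in each group can fail and a full copy of every row still exists, without any shared disk, which is what lets this design run on inexpensive hardware.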

NDB Cluster really is quite fast, with 200 million (NoSQL) QPS reported back in Feb 2015, see MySQL Cluster Benchmarks. What worries me about it is that they're still using those benchmarks now in 2018, as if no progress has been made since 2015. I also get the sense that, for whatever reason, NDB Cluster's popularity is fading compared to Galera and other solutions. (See e.g. the stats for the various tags here on DBA.SE.)

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange