Most efficient (or meaningful) way to index grid-based data

https://dba.stackexchange.com/questions/16086

22-10-2019
|

문제

I have a MySQL table which I expect to be in surplus of a few million rows and 99% select statements. The problem I am having is coming up with a meaningful way to determine the Primary Key. (I have provided a table dump at the bottom for reference)

For some background, I am working with a 2-D grid, whose ranges vary from approximately -800000 to +800000 in each direction. Each row is identified by its X/Z coordinate, for which each coordinate may have 1-30 associated bitstrings (type).

The concerns I have are:

1) I lack a meaningful way to Primary key this table. While I know I can create an id field with auto_increment, I know that in practice this key will never be used, as 100% of SELECTS will be in the form of:

SELECT `type`, `offset`, `bitstring` WHERE `x` = 0 AND `z` = 0;

2) I intend to index on multiple columns (the logical x/z), via:

CREATE INDEX coordinate ON bitstrings(x, z)

While I feel this appropriately addresses my real-world selects, I am constantly concerned it is insufficient indexing, based on the dozens of posts saying 'YES, YOU NEED A PK'. Is this a case where a PK can be ignored or is the arbitrary id still ultimately going to provide some behind-the-scenes optimization well worth the additional table size and column?

As a side note, I have absolutely no limitations on completely restructuring this table, if there is are any more practical, proven ways to store this sort of data.

CREATE TABLE `bitstrings` (
  `x` int(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
  `z` int(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
  `type` smallint(6) NOT NULL COMMENT 'range: 1-4096',
  `offset` smallint(6) NOT NULL COMMENT 'range: 1-65535',
  `bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192'
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;

--
-- Dumping data for table `bitstrings`
--

INSERT INTO `bitstrings` (`x`, `z`, `type`, `offset`, `bitstring`) VALUES
(0, 0, 1, 0, 0x52),
(0, 0, 2, 1878, 0x52);

해결책

Two things

Since range of x and z are +/- 1000000, use meduimint (Range of meduimint is +/- 8388608)
Since ranges of offset and type is 1-65535, use smallint unsigned

Make these chnages

CREATE TABLE `bitstrings` ( 
  `x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096', 
  `offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535', 
  `bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192'
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;

When it comes to indexes, you may have a choice

CHOICE #1 : Use a primary key of ID

Setup the table like this

CREATE TABLE `bitstrings` ( 
  `id` int unsigned not null auto_incrmenet,
  `x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096', 
  `offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535', 
  `bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192',
  primary key (id),
  key xztype (x,z,type)
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;

This could allow you to query using x and z, while retrieving all ids to have a quick reference back to the row:

SELECT id FROM bitstrings WHERE x=0 and z=0;

If you ever have to retrieve the data for a given id, you could select it as

SELECT * FROM bitstrings WHERE id = 12;

This will retrieve the specific info for whatever x, z and type are there.

CHOICE #2 : Use x,z,type as the PRIMARY KEY

Run this query

SELECT COUNT(1) rcount,x,z,type FROM bitstrings GROUP BY x,z,type HAVING COUNT(1) > 1;

If this query comes back with no rows at all, this can be your primary key

CREATE TABLE `bitstrings` ( 
  `x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096', 
  `offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535', 
  `bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192',
  primary key (x,z,type)
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;

CONCLUSION

Since all queries are based on mainly on x and z, either choice would be OK

The first choice would yield a bigger .MYI file, but two ways to gather data (grouped by x,z or granularly by id).
The second choice only gathers by x and z but a smaller .MYI file.

다른 팁

One alternative may be to combine X and Z into one value. This would limit you to queries that only exacly match one row, so it may not be suitable for you.

SELECT `type`, `offset`, `bitstring` WHERE `xz` = 250 + (800 * 1000000);

This will also put a pretty hard limit on your maximum size of your grid (million x million in my example) and you'd need a bigger type than mediumint.

Unlikely to be worth the trouble, but it could save a few bytes and cycles here and there.

Is (x,y,type) unique? (If not, then the proposed PK will not work.)

Use InnoDB to get the PK "clustered".
Use PARTITION to get a hint of two clustered indexes.
PK(x,z,...) PARTITION by RANGE(z) --yes, specifically the second field
If you need the AUTO_INCREMENT id, then also have INDEX(id) -- that is sufficient to support auto-inc. (More generally, any index starting with id.)

Depending on the properties of x and z, this may be better: PK(z,x,...), RANGE(x).

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 dba.stackexchange