Most efficient (or meaningful) way to index grid-based data
-
22-10-2019 - |
문제
I have a MySQL table which I expect to be in surplus of a few million rows and 99% select statements. The problem I am having is coming up with a meaningful way to determine the Primary Key. (I have provided a table dump at the bottom for reference)
For some background, I am working with a 2-D grid, whose ranges vary from approximately -800000 to +800000 in each direction. Each row is identified by its X/Z coordinate, for which each coordinate may have 1-30 associated bitstrings (type).
The concerns I have are:
1) I lack a meaningful way to Primary key this table. While I know I can create an id
field with auto_increment, I know that in practice this key will never be used, as 100% of SELECTS will be in the form of:
SELECT `type`, `offset`, `bitstring` WHERE `x` = 0 AND `z` = 0;
2) I intend to index on multiple columns (the logical x/z), via:
CREATE INDEX coordinate ON bitstrings(x, z)
While I feel this appropriately addresses my real-world selects, I am constantly concerned it is insufficient indexing, based on the dozens of posts saying 'YES, YOU NEED A PK'. Is this a case where a PK can be ignored or is the arbitrary id
still ultimately going to provide some behind-the-scenes optimization well worth the additional table size and column?
As a side note, I have absolutely no limitations on completely restructuring this table, if there is are any more practical, proven ways to store this sort of data.
CREATE TABLE `bitstrings` (
`x` int(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` int(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192'
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;
--
-- Dumping data for table `bitstrings`
--
INSERT INTO `bitstrings` (`x`, `z`, `type`, `offset`, `bitstring`) VALUES
(0, 0, 1, 0, 0x52),
(0, 0, 2, 1878, 0x52);
해결책
Two things
- Since range of x and z are +/- 1000000, use
meduimint
(Range of meduimint is +/- 8388608) - Since ranges of offset and type is 1-65535, use
smallint unsigned
Make these chnages
CREATE TABLE `bitstrings` (
`x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192'
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;
When it comes to indexes, you may have a choice
CHOICE #1 : Use a primary key of ID
Setup the table like this
CREATE TABLE `bitstrings` (
`id` int unsigned not null auto_incrmenet,
`x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192',
primary key (id),
key xztype (x,z,type)
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;
This could allow you to query using x and z, while retrieving all ids to have a quick reference back to the row:
SELECT id FROM bitstrings WHERE x=0 and z=0;
If you ever have to retrieve the data for a given id, you could select it as
SELECT * FROM bitstrings WHERE id = 12;
This will retrieve the specific info for whatever x, z and type are there.
CHOICE #2 : Use x,z,type as the PRIMARY KEY
Run this query
SELECT COUNT(1) rcount,x,z,type FROM bitstrings GROUP BY x,z,type HAVING COUNT(1) > 1;
If this query comes back with no rows at all, this can be your primary key
CREATE TABLE `bitstrings` (
`x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
`type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096',
`offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535',
`bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192',
primary key (x,z,type)
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;
CONCLUSION
Since all queries are based on mainly on x and z, either choice would be OK
- The first choice would yield a bigger .MYI file, but two ways to gather data (grouped by x,z or granularly by id).
- The second choice only gathers by x and z but a smaller .MYI file.
다른 팁
One alternative may be to combine X and Z into one value. This would limit you to queries that only exacly match one row, so it may not be suitable for you.
SELECT `type`, `offset`, `bitstring` WHERE `xz` = 250 + (800 * 1000000);
This will also put a pretty hard limit on your maximum size of your grid (million x million in my example) and you'd need a bigger type than mediumint.
Unlikely to be worth the trouble, but it could save a few bytes and cycles here and there.
Is (x,y,type) unique? (If not, then the proposed PK will not work.)
- Use InnoDB to get the PK "clustered".
- Use PARTITION to get a hint of two clustered indexes.
- PK(x,z,...) PARTITION by RANGE(z) --yes, specifically the second field
- If you need the AUTO_INCREMENT id, then also have INDEX(id) -- that is sufficient to support auto-inc. (More generally, any index starting with id.)
Depending on the properties of x and z, this may be better: PK(z,x,...), RANGE(x).