문제

I have a MySQL table which I expect to be in surplus of a few million rows and 99% select statements. The problem I am having is coming up with a meaningful way to determine the Primary Key. (I have provided a table dump at the bottom for reference)

For some background, I am working with a 2-D grid, whose ranges vary from approximately -800000 to +800000 in each direction. Each row is identified by its X/Z coordinate, for which each coordinate may have 1-30 associated bitstrings (type).

The concerns I have are:

1) I lack a meaningful way to Primary key this table. While I know I can create an id field with auto_increment, I know that in practice this key will never be used, as 100% of SELECTS will be in the form of:

SELECT `type`, `offset`, `bitstring` WHERE `x` = 0 AND `z` = 0;

2) I intend to index on multiple columns (the logical x/z), via:

CREATE INDEX coordinate ON bitstrings(x, z)

While I feel this appropriately addresses my real-world selects, I am constantly concerned it is insufficient indexing, based on the dozens of posts saying 'YES, YOU NEED A PK'. Is this a case where a PK can be ignored or is the arbitrary id still ultimately going to provide some behind-the-scenes optimization well worth the additional table size and column?

As a side note, I have absolutely no limitations on completely restructuring this table, if there is are any more practical, proven ways to store this sort of data.

CREATE TABLE `bitstrings` (
  `x` int(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
  `z` int(11) NOT NULL COMMENT 'roughly +/- 10^6 range',
  `type` smallint(6) NOT NULL COMMENT 'range: 1-4096',
  `offset` smallint(6) NOT NULL COMMENT 'range: 1-65535',
  `bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192'
) ENGINE=MyISAM DEFAULT CHARSET=utf-8;

--
-- Dumping data for table `bitstrings`
--

INSERT INTO `bitstrings` (`x`, `z`, `type`, `offset`, `bitstring`) VALUES
(0, 0, 1, 0, 0x52),
(0, 0, 2, 1878, 0x52);
도움이 되었습니까?

해결책

Two things

  • Since range of x and z are +/- 1000000, use meduimint (Range of meduimint is +/- 8388608)
  • Since ranges of offset and type is 1-65535, use smallint unsigned

Make these chnages

CREATE TABLE `bitstrings` ( 
  `x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096', 
  `offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535', 
  `bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192'
) ENGINE=MyISAM DEFAULT CHARSET=utf-8; 

When it comes to indexes, you may have a choice

CHOICE #1 : Use a primary key of ID

Setup the table like this

CREATE TABLE `bitstrings` ( 
  `id` int unsigned not null auto_incrmenet,
  `x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096', 
  `offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535', 
  `bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192',
  primary key (id),
  key xztype (x,z,type)
) ENGINE=MyISAM DEFAULT CHARSET=utf-8; 

This could allow you to query using x and z, while retrieving all ids to have a quick reference back to the row:

SELECT id FROM bitstrings WHERE x=0 and z=0;

If you ever have to retrieve the data for a given id, you could select it as

SELECT * FROM bitstrings WHERE id = 12;

This will retrieve the specific info for whatever x, z and type are there.

CHOICE #2 : Use x,z,type as the PRIMARY KEY

Run this query

SELECT COUNT(1) rcount,x,z,type FROM bitstrings GROUP BY x,z,type HAVING COUNT(1) > 1;

If this query comes back with no rows at all, this can be your primary key

CREATE TABLE `bitstrings` ( 
  `x` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `z` meduimint(11) NOT NULL COMMENT 'roughly +/- 10^6 range', 
  `type` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-4096', 
  `offset` smallint(6) UNSIGNED NOT NULL COMMENT 'range: 1-65535', 
  `bitstring` blob NOT NULL COMMENT 'binary data len: 1-8192',
  primary key (x,z,type)
) ENGINE=MyISAM DEFAULT CHARSET=utf-8; 

CONCLUSION

Since all queries are based on mainly on x and z, either choice would be OK

  • The first choice would yield a bigger .MYI file, but two ways to gather data (grouped by x,z or granularly by id).
  • The second choice only gathers by x and z but a smaller .MYI file.

다른 팁

One alternative may be to combine X and Z into one value. This would limit you to queries that only exacly match one row, so it may not be suitable for you.

SELECT `type`, `offset`, `bitstring` WHERE `xz` = 250 + (800 * 1000000);

This will also put a pretty hard limit on your maximum size of your grid (million x million in my example) and you'd need a bigger type than mediumint.

Unlikely to be worth the trouble, but it could save a few bytes and cycles here and there.

Is (x,y,type) unique? (If not, then the proposed PK will not work.)

  1. Use InnoDB to get the PK "clustered".
  2. Use PARTITION to get a hint of two clustered indexes.
  3. PK(x,z,...) PARTITION by RANGE(z) --yes, specifically the second field
  4. If you need the AUTO_INCREMENT id, then also have INDEX(id) -- that is sufficient to support auto-inc. (More generally, any index starting with id.)

Depending on the properties of x and z, this may be better: PK(z,x,...), RANGE(x).

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 dba.stackexchange
scroll top