Pregunta

by Random Access i do not mean selecting a random record,
Random Access is the ability to fetch all records in equal time,
the same way values are fetched from an array.
From wikipedia: http://en.wikipedia.org/wiki/Random_access

my intention is to store a very large array of strings, one that is too big for memory.
but still have the benefit or random-access to the array.

I usally use MySQL but it seems it has only B-Tree and Hash index types.

I don't see a reason why it isn't possible to implement such a thing.
The indexes will be like in array, starting from zero and incrementing by 1.

I want to simply fetch a string by its index, not get the index according to the string. The goal is to improve performance. I also cannot control the order in which the strings will be accessed, it'll be a remote DB server which will constantly receive indexes from clients and return the string for that index.

Is there a solution for this?

p.s I don't thing this is a duplicate of Random-access container that does not fit in memory?
Because in that question he has other demands except random access

¿Fue útil?

Solución

You are talking about constant time, but you mention a unique incrementing primary key.

Unless such a key is gapless, you cannot use it as an offset, so you still need some kind of structure to look up the actual offset.

Finding a record by offset isn't usually particularly useful, since you will usually want to find it by some more friendly method, which will invariably involve an index. Searching a B-Tree index is worst case O(log n), which is pretty good.

Assuming you just have an array of strings - store it in a disk file of fixed length records and use the file system to seek to your desired offset.

Then benchmark against a database lookup.

Otros consejos

Given your definition, if you just use an SSD for storing your data, it will allow for what you call random access (i.e. uniform access speed across the data set). The fact that sequential access is less expensive than random one comes from the fact that sequential access to disk is much faster than random one (and any database tries it's best to make up for this, btw).

That said, even RAM access is not uniform as sequential access is faster due to caching and NUMA. So uniform access is an illusion anyway, which begs the question, why you are so insisting of having it in the first place. I.e. what you think will go wrong when having slow random access - it might be still fast enough for your use case.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top