Reverse Byte-Order of a postgres bytea field
04-10-2020
Question
I'm currently working on a table that contains hashes stored in bytea format. Converting the hashes to hex strings, however, yields the bytes in the wrong order. Example:
SELECT encode(hash, 'hex') FROM mytable LIMIT 1;
Output: 1a6ee4de86143e81
Expected: 813e1486dee46e1a
Is there a way to reverse the order of bytes for all entries?
Solution
Here is one method of doing it; however, I would never do this myself. There is nothing wrong with storing bytes in a database's bytea column. But I wouldn't bit-wrangle in the database, and if I did, I would use:
- a C language function, or
- some fancy procedural language that didn't require me exploding the inputs into a set of bytes.
This is SQL-esque and should work. Here is what we're doing:
- Generate a set consisting of a series of offsets from 0 to (bytelength - 1).
- Map those offsets to bytes represented as strings of hex.
- String aggregate them in reverse order.
Here is an example:
CREATE TABLE foo AS SELECT '\x813e1486dee46e1a'::bytea AS bar;
SELECT bar, string_agg(lpad(to_hex(byte), 2, '0'), '') AS hash
FROM foo
CROSS JOIN LATERAL (
SELECT get_byte(bar,"offset") AS byte
FROM generate_series(0,octet_length(bar)-1) AS x("offset")
ORDER BY "offset" DESC
) AS x
GROUP BY bar;
Two notes:
- We should probably not use offset as a column name, because it's a reserved word (hence the double quotes), but you get the point.
- This assumes that your hash (bar in the above) is UNIQUE, since the query groups by it.
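As a rough cross-check outside the database, here is a plain Python sketch of the same three steps (offsets, byte-to-hex mapping, aggregation in reverse offset order). The function name reverse_hash_hex is made up for illustration. It also shows why each byte has to be rendered as exactly two hex digits: PostgreSQL's to_hex(10) returns 'a', not '0a', so an unpadded byte below 0x10 would lose a digit.

```python
def reverse_hash_hex(value: bytes) -> str:
    """Hypothetical helper mirroring the SQL steps: offsets, byte -> hex, reverse aggregation."""
    # Step 1: offsets 0 .. len-1, like generate_series(0, octet_length(bar) - 1)
    offsets = range(len(value))
    # Step 2: map each offset to its byte as a two-digit hex string
    # ("02x" keeps the leading zero, e.g. 0x0a -> "0a")
    hex_bytes = [(off, format(value[off], "02x")) for off in offsets]
    # Step 3: concatenate in descending offset order
    ordered = sorted(hex_bytes, key=lambda pair: pair[0], reverse=True)
    return "".join(h for _, h in ordered)


print(reverse_hash_hex(bytes.fromhex("813e1486dee46e1a")))
```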
Other tips
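You could treat the encoded representation as text and use a regexp to reverse it byte by byte: reverse the whole hex string, split it into two-character chunks, and reverse each chunk again to restore the digit order within each byte.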
SELECT string_agg(reverse(b[1]),'')
FROM regexp_matches(reverse(encode('STUFF','hex')),'..','g')b;
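To see why each chunk has to be reversed a second time, here is the same trick in plain Python. This is only a sketch: reverse_bytes_via_text is a made-up name, and re.findall with '..' stands in for regexp_matches(..., '..', 'g'). Reversing the whole hex string already puts the bytes in the right order, but it also swaps the two hex digits inside each byte, so every two-character chunk is flipped back.

```python
import re


def reverse_bytes_via_text(hex_str: str) -> str:
    # 1. Reverse the whole hex string: byte order is now right, but the two
    #    digits inside each byte are swapped ("a1" instead of "1a").
    flipped = hex_str[::-1]
    # 2. Split into two-character chunks, like regexp_matches(..., '..', 'g').
    pairs = re.findall("..", flipped)
    # 3. Reverse each chunk to restore its digit order, then concatenate.
    return "".join(p[::-1] for p in pairs)


print(reverse_bytes_via_text("STUFF".encode().hex()))
```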
Another (more verbose) method:
WITH bytes AS (
SELECT row_number() over() AS n, byte[1]
FROM regexp_matches( encode( 'STUFF', 'hex' ), '..', 'g' ) AS byte
), revbytes AS (
SELECT * FROM bytes ORDER BY n DESC
)
SELECT array_to_string(array_agg(byte),'')
FROM revbytes;
Sample usage:
(filip@[local:/var/run/postgresql]:5432) filip=# SELECT encode( 'STUFF', 'hex' );
encode
------------
5354554646
(1 row)
(filip@[local:/var/run/postgresql]:5432) filip=# SELECT string_agg(reverse(b[1]),'')FROM regexp_matches(reverse(encode('STUFF','hex')),'..','g')b;
string_agg
------------
4646555453
(1 row)
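The more verbose CTE variant can be mirrored in Python the same way (again a sketch; reverse_bytes_numbered is a hypothetical name): number the two-character chunks, sort them by that number in descending order, and join them. One caveat from the SQL does not carry over: row_number() over() without an ORDER BY does not formally guarantee that rows are numbered in match order, whereas a Python list keeps its order by definition.

```python
import re


def reverse_bytes_numbered(hex_str: str) -> str:
    # Like the "bytes" CTE: number each two-character chunk, 1-based like row_number().
    numbered = list(enumerate(re.findall("..", hex_str), start=1))
    # Like the "revbytes" CTE: order by n descending.
    ordered = sorted(numbered, key=lambda row: row[0], reverse=True)
    # Like array_to_string(array_agg(byte), ''): concatenate the chunks.
    return "".join(byte for _, byte in ordered)


print(reverse_bytes_numbered("5354554646"))
```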
If you just need to reverse the bytes in a bytea value, there is a (relatively) simple and fast solution using plpythonu:
create or replace function reverse_bytea(p_inp bytea) returns bytea stable language plpythonu as $$
b = bytearray()
b.extend(p_inp)
b.reverse()
return b
$$;
select encode(reverse_bytea('\x1a6ee4de86143e81'), 'hex');
----
813e1486dee46e1a
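Outside the database, the body of reverse_bytea can be exercised as ordinary Python 3, assuming (as PL/Python under Python 3 does) that a bytea arrives as bytes. The sketch keeps the bytearray steps from the function and adds a slicing version for comparison; reverse_bytea_slice is a hypothetical name, not part of the answer.

```python
def reverse_bytea(p_inp: bytes) -> bytes:
    # Same steps as the PL/Python body: copy into a mutable bytearray,
    # reverse it in place, and return it (converted back to bytes here).
    b = bytearray()
    b.extend(p_inp)
    b.reverse()
    return bytes(b)


def reverse_bytea_slice(p_inp: bytes) -> bytes:
    # Equivalent one-liner: a reversed slice builds a new bytes object directly.
    return p_inp[::-1]


print(reverse_bytea(bytes.fromhex("1a6ee4de86143e81")).hex())
```

The slice form skips the explicit copy into a bytearray, which may be slightly simpler if you rewrite the function body.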
However, I suspect that something is wrong with the data itself (the way it is stored, the way it is interpreted...).
Solutions with tools in vanilla Postgres:
I added a column bytea_reverse to both solutions. Remove it if you don't need it.
With get_byte():
SELECT t.b, text_reverse, decode(text_reverse, 'hex') AS bytea_reverse
FROM tbl t
LEFT JOIN LATERAL (
SELECT string_agg(lpad(to_hex(get_byte(b, x)), 2, '0'), '') AS text_reverse
FROM generate_series(octet_length(t.b) - 1, 0, -1) x
) x ON true;
This is similar to what @Evan provided. Most of his excellent explanation applies. But:
- Use LEFT JOIN LATERAL ... ON true or you lose rows with NULL values.
- generate_series() can provide numbers in reverse, so we do not need another ORDER BY step.
- While using a LATERAL join, aggregate in the subquery. Less error prone, easier to integrate with more complex queries, and no need to GROUP BY in the outer query.
With regexp_matches():
SELECT t.b, text_reverse, decode(text_reverse, 'hex') AS bytea_reverse
FROM tbl t
LEFT JOIN LATERAL (
SELECT string_agg(byte[1], '' ORDER BY ord DESC) AS text_reverse
FROM regexp_matches(encode(t.b, 'hex' ), '..', 'g' ) WITH ORDINALITY AS x(byte, ord)
) x ON true;
This is similar to the "verbose" variant @filiprem provided. But:
- Use LEFT JOIN LATERAL ... ON true or you lose rows with NULL values.
- Use WITH ORDINALITY to get row numbers "for free". So we need neither another subquery with row_number() nor a double reverse(). Details:
- Reverse ordering can be done in the aggregate function. (But it might be a bit faster to order in the subquery and add another subquery layer to aggregate pre-ordered rows.)
- One subquery (or two) instead of two CTEs is typically faster.
Similar question on SO:
Thanks to all the suggestions, I wrote this C-language function that works as needed:
#include "postgres.h"
#include "fmgr.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
Datum bytea_custom_reverse(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(bytea_custom_reverse);
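/*
 * Returns a copy of its bytea argument with the bytes in reverse order.
 * NULL input is not handled here, so declare the SQL function STRICT.
 */
Datum
bytea_custom_reverse(PG_FUNCTION_ARGS) {
/* Detoasted, palloc'd copy of the argument, safe to modify in place */
bytea *data = PG_GETARG_BYTEA_P_COPY(0);
unsigned char *ptr = (unsigned char *) VARDATA(data);
/* Payload length without the varlena header */
int32 dataLen = VARSIZE(data) - VARHDRSZ;
unsigned char *start, *end;
/* Swap bytes pairwise from both ends toward the middle */
for ( start = ptr, end = ptr + dataLen - 1; start < end; ++start, --end ) {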
unsigned char swap = *start;
*start = *end;
*end = swap;
}
PG_RETURN_BYTEA_P(data);
}
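The loop in the C function is the classic two-pointer, in-place reversal. A Python sketch of the same idea, with indices standing in for the start and end pointers (reverse_in_place is an illustrative name only):

```python
def reverse_in_place(data: bytearray) -> bytearray:
    # Two indices walk toward each other, swapping as they go,
    # like the start/end pointers in bytea_custom_reverse.
    start, end = 0, len(data) - 1
    while start < end:
        data[start], data[end] = data[end], data[start]
        start += 1
        end -= 1
    return data


print(reverse_in_place(bytearray.fromhex("1a6ee4de86143e81")).hex())
```

As in the C version, an empty or one-byte input comes back unchanged, because the loop condition is false from the start.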
Thanks for the help in this thread. Based on the answers, this is how I convert a bigint to a little-endian bytea, like BitConverter.GetBytes() does in C#:
with mycte as (
select int8send(394112768534335::bigint) as conversionValue
)
SELECT decode(string_agg (
(case when get_byte(conversionValue, x)<= 15 then ('0') else '' end) ||
to_hex(get_byte(conversionValue, x))
, ''), 'hex') AS nativeId_reverse
FROM mycte, generate_series(octet_length(conversionValue) - 1, 0, -1) as x;
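Why this works: int8send() emits a bigint as 8 bytes in network (big-endian) order, so reading those bytes back to front yields little-endian order. A Python sketch of the same conversion (bigint_to_le_bytes is a hypothetical name) should agree with int.to_bytes(..., "little"). Note that BitConverter.GetBytes() follows the machine's endianness, which is little-endian on most common platforms.

```python
def bigint_to_le_bytes(value: int) -> bytes:
    # Like int8send(): 8-byte, two's-complement, big-endian representation.
    big_endian = value.to_bytes(8, "big", signed=True)
    # Reading those bytes back to front gives little-endian order.
    return big_endian[::-1]


print(bigint_to_le_bytes(394112768534335).hex())
```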
To search for a value stored in PostgreSQL as a little-endian bytea, given its bigint representation:
with mycte as (
select int8send(394112768534335::bigint) as conversionValue
)
Select * FROM mycte, *SomeByteaFieldTable*
where *SomeByteaId* =
(SELECT decode(string_agg (
(case when get_byte(conversionValue, x)<= 15 then ('0') else '' end) ||
to_hex(get_byte(conversionValue, x))
, ''), 'hex') AS nativeId_reverse
FROM generate_series(octet_length(conversionValue) - 1, 0, -1) x);