Reverse Byte-Order of a postgres bytea field

https://dba.stackexchange.com/questions/156700

04-10-2020

Question

I'm currently working on a table that contains hashes, stored in bytea format. Converting the hashes to hex strings, however, yields the wrong byte order. Example:

SELECT encode(hash, 'hex') FROM mytable LIMIT 1;

Output: 1a6ee4de86143e81
Expected: 813e1486dee46e1a

Is there a way to reverse the order of bytes for all entries?

Solution

Here is one way to do it, although I would never do this myself. There is nothing wrong with storing bytes in a database's bytea column, but I wouldn't do bit wrangling in the database. If I did, I would use either

a C language function, or
some fancy procedural language that doesn't require exploding the input into a set of bytes.

The following is SQL-esque and should work. Here is what we're doing,

Generate a set of offsets from 0 to (byte length - 1).
Map those offsets to bytes represented as strings of hex.
String aggregate them in reverse order.
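
To make the three steps easier to follow outside the database, here is a minimal Python sketch of the same idea. The function name reverse_hex is hypothetical and not part of the original answer. It also makes one detail explicit: each byte has to be rendered as exactly two hex digits, otherwise bytes below 16 lose their leading zero.

```python
def reverse_hex(data: bytes) -> str:
    """Return the bytes of `data` in reverse order as a hex string."""
    # Step 1: offsets 0 .. len(data) - 1, comparable to generate_series().
    offsets = range(len(data))
    # Step 2: map each offset to its byte, rendered as a hex string.
    # The "02x" format zero-pads values below 16 to two digits.
    hex_bytes = [format(data[i], "02x") for i in offsets]
    # Step 3: join the pieces in reverse offset order.
    return "".join(reversed(hex_bytes))


result = reverse_hex(bytes.fromhex("813e1486dee46e1a"))
print(result)
```

This is only an illustration of the logic; the SQL version does the same work with generate_series(), get_byte() and string_agg().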

Here is an example,

CREATE TABLE foo AS SELECT '\x813e1486dee46e1a'::bytea AS bar;

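-- to_hex() does not zero-pad, so lpad() keeps two hex digits per byte.
-- ORDER BY inside string_agg() makes the reverse order explicit.
SELECT bar, string_agg(lpad(to_hex(byte), 2, '0'), '' ORDER BY "offset" DESC) AS hash
FROM foo
CROSS JOIN LATERAL (
  SELECT "offset", get_byte(bar, "offset") AS byte
  FROM generate_series(0, octet_length(bar) - 1) AS x("offset")
) AS x
GROUP BY bar;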

Two notes,

We should probably not use offset as a column name because it is a reserved word (hence the double quotes), but you get the point.
This assumes that your hash (bar in the example above) is UNIQUE, because the query groups by it.

Other tips

You could treat the hex-encoded representation as text and use a regular expression to reverse it byte by byte.

SELECT string_agg(reverse(b[1]),'')
FROM regexp_matches(reverse(encode('STUFF','hex')),'..','g')b;

Another (more verbose) method:

WITH bytes AS (
  SELECT row_number() over() AS n, byte[1]
  FROM regexp_matches( encode( 'STUFF', 'hex' ), '..', 'g' ) AS byte
), revbytes AS (
  SELECT * FROM bytes ORDER BY n DESC
)
SELECT array_to_string(array_agg(byte),'')
FROM revbytes;

Sample usage:

(filip@[local:/var/run/postgresql]:5432) filip=# SELECT encode( 'STUFF', 'hex' );
   encode   
------------
 5354554646
(1 row)

(filip@[local:/var/run/postgresql]:5432) filip=# SELECT string_agg(reverse(b[1]),'')FROM regexp_matches(reverse(encode('STUFF','hex')),'..','g')b;
 string_agg 
------------
 4646555453
(1 row)
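
The double-reverse trick can look puzzling at first, so here is a small Python sketch of the same logic. The name reverse_hex_pairs is made up for this illustration. Reversing the whole hex string puts the bytes in the right order but flips the two digits inside each byte; reversing every two-character chunk flips them back.

```python
import re


def reverse_hex_pairs(hex_text: str) -> str:
    """Reverse a hex string byte by byte using the double-reverse trick."""
    # Reverse the entire string: byte order is now correct,
    # but the two digits of every byte are swapped.
    reversed_text = hex_text[::-1]
    # Split into two-character chunks, like regexp_matches(..., '..', 'g').
    pairs = re.findall("..", reversed_text)
    # Reverse each chunk to restore the digit order within a byte.
    return "".join(pair[::-1] for pair in pairs)


encoded = b"STUFF".hex()
swapped = reverse_hex_pairs(encoded)
print(encoded, swapped)
```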

If you just need to reverse the bytes of a bytea value, there is a (relatively) simple and fast solution using plpythonu:

create or replace function reverse_bytea(p_inp bytea) returns bytea stable language plpythonu as $$
  b = bytearray()
  b.extend(p_inp)
  b.reverse()
  return b
$$;

select encode(reverse_bytea('\x1a6ee4de86143e81'), 'hex');
----
813e1486dee46e1a

However, I suspect that something is wrong with the data itself (the way it is stored, the way it is interpreted, ...).

Solutions with tools in vanilla Postgres:

I added a column bytea_reverse to both solutions. Remove it if you don't need it.

With get_byte():

SELECT t.b, text_reverse, decode(text_reverse, 'hex') AS bytea_reverse
FROM   tbl t
LEFT   JOIN LATERAL (
   -- to_hex() does not zero-pad, so lpad() keeps two hex digits per byte
   SELECT string_agg(lpad(to_hex(get_byte(b, x)), 2, '0'), '') AS text_reverse
   FROM   generate_series(octet_length(t.b) - 1, 0, -1) x
   ) x ON true;

This is similar to what @Evan provided. Most of his excellent explanation applies. But:

Use LEFT JOIN LATERAL ... ON true or you lose rows with NULL values.
generate_series() can provide numbers in reverse, so we do not need another ORDER BY step.
While using a LATERAL join, aggregate in the subquery. This is less error-prone, easier to integrate with more complex queries, and removes the need to GROUP BY in the outer query.

With regexp_matches():

SELECT t.b, text_reverse, decode(text_reverse, 'hex') AS bytea_reverse
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT string_agg(byte[1], '' ORDER  BY ord DESC) AS text_reverse
   FROM   regexp_matches(encode(t.b, 'hex' ), '..', 'g' ) WITH ORDINALITY AS x(byte, ord)
   ) x ON true;

This is similar to the "verbose" variant @filiprem provided. But:

Use LEFT JOIN LATERAL ... ON true or you lose rows with NULL values.
Use WITH ORDINALITY to get row numbers "for free". So we neither need another subquery with row_number() nor a double reverse(). Details:
- PostgreSQL unnest() with element number
Reverse ordering can be done in the aggregate function. (But it might be a bit faster to order in the subquery and add another subquery layer to aggregate pre-ordered rows.)
One subquery (or two) instead of two CTEs is typically faster.

Similar question on SO:

Convert bigint to bytea, but swap the byte order

Thanks to all the suggestions, I wrote this C-Language-Function that works as needed:

#include "postgres.h"
#include "fmgr.h"

#ifdef PG_MODULE_MAGIC
    PG_MODULE_MAGIC;
#endif

Datum bytea_custom_reverse(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(bytea_custom_reverse);
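Datum
bytea_custom_reverse(PG_FUNCTION_ARGS) {
  /* Work on a detoasted copy so the input datum is never modified. */
  bytea *data = PG_GETARG_BYTEA_P_COPY(0);
  unsigned char *ptr = (unsigned char *) VARDATA(data);

  /* Payload length without the varlena header. */
  int32 dataLen = VARSIZE(data) - VARHDRSZ;

  unsigned char *start, *end;

  /* Swap bytes from both ends, moving inward until the pointers meet. */
  for ( start = ptr, end = ptr + dataLen - 1; start < end; ++start, --end ) {
    unsigned char swap = *start;
    *start = *end;
    *end = swap;
  }

  PG_RETURN_BYTEA_P(data);
}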
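
For readers who want to check the swap loop without building a server extension, here is a rough Python equivalent of the same two-pointer technique. The name reverse_in_place is hypothetical; the sketch mirrors only the loop and none of the PostgreSQL macros.

```python
def reverse_in_place(buf: bytearray) -> bytearray:
    """Reverse `buf` in place by swapping bytes from both ends."""
    start, end = 0, len(buf) - 1
    # Same termination condition as the C loop: stop once the indices meet.
    while start < end:
        buf[start], buf[end] = buf[end], buf[start]
        start += 1
        end -= 1
    return buf


out = reverse_in_place(bytearray.fromhex("1a6ee4de86143e81"))
print(out.hex())
```

Empty and odd-length inputs need no special handling, because the loop condition already covers them.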

Thanks to everyone who helped in this thread. Based on the answers, this is how I chose to convert a bigint to a little-endian bytea, like BitConverter.GetBytes() does in C#:

with mycte as (
select int8send(394112768534335::bigint) as conversionValue
)
SELECT decode(string_agg (
  (case when get_byte(conversionValue, x)<= 15 then ('0')  else  '' end) ||
  to_hex(get_byte(conversionValue, x))
  , ''), 'hex') AS nativeId_reverse  
   FROM mycte,  generate_series(octet_length(conversionValue) - 1, 0, -1) as x;
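
As a way to cross-check the result outside the database, the same conversion can be sketched in Python. int8send() produces the 8 bytes in network (big-endian) order, so the little-endian form is simply those bytes reversed. The variable names below are illustrative only.

```python
value = 394112768534335

# Big-endian 8-byte form, the layout int8send() produces for a bigint.
big_endian = value.to_bytes(8, byteorder="big", signed=True)

# Little-endian 8-byte form, the layout the query above is building.
little_endian = value.to_bytes(8, byteorder="little", signed=True)

print(big_endian.hex(), little_endian.hex())
```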

To search for a value stored in PostgreSQL as a little-endian bytea by its bigint representation:

with mycte as (
select int8send(394112768534335::bigint) as conversionValue
)
Select * FROM mycte, *SomeByteaFieldTable*
where *SomeByteaId* =                                                                         
(SELECT decode(string_agg (
  (case when get_byte(conversionValue, x)<= 15 then ('0')  else  '' end) ||
  to_hex(get_byte(conversionValue, x))
  , ''), 'hex') AS nativeId_reverse  
   FROM   generate_series(octet_length(conversionValue) - 1, 0, -1) x);

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange