문제

In our organization, we handle GIS content in different file formats. I need to put these files into a PostGIS database, and that is done using ogr2ogr. The problem is, that the database is UTF8 encoded, and the files might have a different encoding.

I found descriptions of how I can specify the encoding by adding an options parameter to org2ogr, but appearantly it doesn't have an effect.

ogr2ogr -f PostgreSQL PG:"host=localhost user=username dbname=dbname \
password=password options='-c client_encoding=latin1'" sourcefile;

The error I recieve is:

ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "målsætning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe56c73
HINT: This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".

ERROR 1: ALTER TABLE "soer_vd" ADD COLUMN "påvirkning" CHAR(10)
ERROR: invalid byte sequence for encoding "UTF8": 0xe57669
HINT: This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".

ERROR 1: INSERT command for new feature failed.
ERROR: invalid byte sequence for encoding "UTF8": 0xf8
HINT: This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".

Currently, my source file is a Shape file and I'm pretty sure, that it is Latin1 encoded.

What am I doing wrong here and can you help me?

Kind regards, Casper

도움이 되었습니까?

해결책

That does sound like it would set the client encoding to LATIN1. Exactly what error do you get?

Just in case ogr2ogr doesn't pass it along properly, you can also try setting the environment variable PGCLIENTENCODING to latin1.

I suggest you double check that they are actually LATIN1. Simply running file on it will give you a good idea, assuming it's actually consistent within the file. You can also try sending it through iconv to convert it to either LATIN1 or UTF8.

다른 팁

Magnus is right and I will discuss the solution here.

I have seen the option to inform PostgreSQL about character encoding, options=’-c client_encoding=xxx’, used many places, but it does not seem to have any effect. If someone knows how this part is working, feel free to elaborate.

Magnus suggested to set the environment variable PGCLIENTENCODING to LATIN1. This can, according to a mailing list I queried, be done by modifying the call to ogr2ogr:

ogr2ogr -–config PGCLIENTENCODING LATIN1 –f PostgreSQL 
PG:”host=hostname user=username dbname=databasename password=password” inputfile

This didn’t do anything for me. What worked for me was to, before the call to ogr2ogr, to:

SET PGCLIENTENCODING=LATIN1

It would be great to hear more details from experienced users and I hope it can help others :)

Currently, OGR from GDAL does not perform any recoding of character data during translation between vector formats. The team has prepared RFC 23.1: Unicode support in OGR document which discusses support of recoding for OGR drivers. The RFC 23 was adopted and the core functionality was already released in GDAL 1.6.0. However, most of OGR drivers have not been updated, including Shapefile driver.

For the time being, I would describe OGR as encoding agnostic and ignorant. It means, OGR does take what it gets and sends it out without any processing. OGR uses char type to manipulate textual data. This is fine to handle multi-byte encoded strings (like UTF-8) - it's just a plain stream of bytes stored as array of char elements.

It is advised that developers of OGR drivers should return UTF-8 encoded strings of attribute values, however this rule has not been widely adopted across OGR drivers, thus making this functionality not end-user ready yet.

You need to write your command line like this :

PGCLIENTENCODING=LATIN1 ogr2ogr -f PostgreSQL PG:"dbname=...

On windows a command is

SET PGCLIENTENCODING=LATIN1

On linux

export PGCLIENTENCODING=LATIN1

or

PGCLIENTENCODING=LATIN1

Moreover this discussion help me:

https://gis.stackexchange.com/questions/218443/ogr2ogr-encoding-on-windows-using-os4geo-shell-with-census-data

On windows

SET PGCLIENTENCODING=LATIN1 ogr2ogr...

do not give me any error, but ogr2ogr do not works...I need to change the system variable (e.g. System--> Advanced system settings--> Environment variables -->New system variable) reboot the system and then run

ogr2ogr...

I solved this problem using this command:

pg_restore --host localhost --port 5432 --username postgres --dbname {DBNAME} --schema public --verbose "{FILE_PATH to import}"

I don't know if this is the right solution, but it worked for me.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top