C++ ctype facet for UTF-8 in MinGW

https://stackoverflow.com/questions/1354124

20-09-2019

Question

In this project, all internal strings are kept in UTF-8 encoding. The project is being ported to Linux and Windows, and it now needs to_lower functionality.

On a POSIX OS I could use std::ctype_byname("ru_RU.UTF-8"). But with g++ (Debian 4.3.4-1), ctype::tolower() does not recognize Russian UTF-8 characters (Latin text is lowercased fine).

On Windows, MinGW's standard library throws the exception "std::runtime_error: locale::facet::_S_create_c_locale name not valid" when I try to construct std::ctype_byname with the "ru_RU.UTF-8" argument.

How can I find or implement a std::ctype for UTF-8 on Windows? The project already depends on libiconv (the codecvt facet is based on it), but I don't see an obvious way to implement to_lower with it.

Solution

If all you need is to_lower for Cyrillic characters, you can write the function yourself.

АБВГДЕЖ in UTF8  D0 90 D0 91 D0 92 D0 93 D0 94 D0 95 D0 96 0A
абвгдеж in UTF8  D0 B0 D0 B1 D0 B2 D0 B3 D0 B4 D0 B5 D0 B6 0A

But don't forget that UTF8 is a multibyte encoding.
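A minimal sketch of such a hand-written function, assuming only ASCII letters and the basic Russian alphabet (U+0410 to U+042F, plus Ё) need to be lowercased and that the input is valid UTF-8. The name utf8_to_lower_ru is hypothetical and not part of any library. It works on bytes directly: every Russian capital letter starts with the lead byte D0, and only the second half of the alphabet needs its lead byte changed to D1.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: lowercases ASCII A-Z and Russian Cyrillic capitals
// in a UTF-8 string. All other bytes are copied through unchanged.
std::string utf8_to_lower_ru(const std::string& in) {
    std::string out(in);
    for (std::size_t i = 0; i < out.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(out[i]);
        if (c >= 'A' && c <= 'Z') {
            // Plain ASCII letter: single byte.
            out[i] = static_cast<char>(c + ('a' - 'A'));
        } else if (c == 0xD0 && i + 1 < out.size()) {
            // Two-byte sequence D0 xx: inspect the trail byte.
            unsigned char t = static_cast<unsigned char>(out[i + 1]);
            if (t >= 0x90 && t <= 0x9F) {
                // U+0410..U+041F -> U+0430..U+043F (D0 B0..D0 BF)
                out[i + 1] = static_cast<char>(t + 0x20);
            } else if (t >= 0xA0 && t <= 0xAF) {
                // U+0420..U+042F -> U+0440..U+044F (D1 80..D1 8F)
                out[i] = static_cast<char>(0xD1);
                out[i + 1] = static_cast<char>(t - 0x20);
            } else if (t == 0x81) {
                // U+0401 (Ё) -> U+0451 (ё, D1 91)
                out[i] = static_cast<char>(0xD1);
                out[i + 1] = static_cast<char>(0x91);
            }
        }
    }
    return out;
}
```

Scanning byte by byte is safe here because UTF-8 continuation bytes (80 to BF) can never be mistaken for an ASCII letter or for the lead byte D0. Other scripts and other Cyrillic letters are deliberately left untouched.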

Alternatively, you can convert the string from UTF8 to wchar_t (using libiconv) and use Windows-specific functions to implement to_lower.
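A rough sketch of the Windows-specific route. Note the substitution: instead of libiconv it uses the Win32 functions MultiByteToWideChar and WideCharToMultiByte for the conversion, and CharLowerBuffW (from user32) for the lowercasing. The function name to_lower_utf8_winapi is hypothetical, and the non-Windows branch is only an ASCII placeholder so that the sketch compiles elsewhere.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <vector>
#endif

// Hypothetical helper: lowercases a UTF-8 string via UTF-16 on Windows.
std::string to_lower_utf8_winapi(const std::string& in) {
    if (in.empty()) return in;
#ifdef _WIN32
    const int in_len = static_cast<int>(in.size());
    // First call asks for the required number of UTF-16 code units.
    int wlen = MultiByteToWideChar(CP_UTF8, 0, in.data(), in_len, NULL, 0);
    if (wlen <= 0) return in;  // conversion failed: return input unchanged
    std::vector<wchar_t> wide(wlen);
    MultiByteToWideChar(CP_UTF8, 0, in.data(), in_len, &wide[0], wlen);

    // Lowercase the buffer in place.
    CharLowerBuffW(&wide[0], static_cast<DWORD>(wlen));

    // Convert back to UTF-8, again sizing the buffer first.
    int len = WideCharToMultiByte(CP_UTF8, 0, &wide[0], wlen, NULL, 0, NULL, NULL);
    if (len <= 0) return in;
    std::string out(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, &wide[0], wlen, &out[0], len, NULL, NULL);
    return out;
#else
    // Placeholder for non-Windows builds: ASCII letters only.
    std::string out(in);
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (out[i] >= 'A' && out[i] <= 'Z') {
            out[i] = static_cast<char>(out[i] - 'A' + 'a');
        }
    }
    return out;
#endif
}
```

With MinGW this would need to be linked against user32 (for example with -luser32), which is an assumption about the build setup rather than something stated in the thread.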

Other tips

Try using STLport:

  Here is a description of how you can use STLport to read/write utf8 files.
utf8 is a way of encoding wide characters. As such, management of encoding in
the C++ Standard library is handled by the codecvt locale facet, which is part
of the ctype category. However, utf8 only describes how encoding must be
performed; it cannot be used to classify characters, so it is not enough info
to know how to generate the whole ctype category facets of a locale
instance.

In C++ it means that the following code will throw an exception to
signal that creation failed:

#include <locale>
// Will throw a std::runtime_error exception.
std::locale loc(".utf8");

For the same reason building a locale with the ctype facets based on
UTF8 is also wrong:

// Will throw a std::runtime_error exception:
std::locale loc(std::locale::classic(), ".utf8", std::locale::ctype);

The only solution to get a locale instance that will handle utf8 encoding
is to specifically signal that the codecvt facet should be based on utf8
encoding:

// Will succeed if there is necessary platform support.
locale loc(locale::classic(), new codecvt_byname<wchar_t, char, mbstate_t>(".utf8"));

  Once you have obtained a locale instance you can inject it in a file stream to
read/write utf8 files:

std::fstream fstr("file.utf8");
fstr.imbue(loc);

You can also access the facet directly to perform utf8 encoding/decoding operations:

typedef std::codecvt<wchar_t, char, mbstate_t> codecvt_t;
const codecvt_t& encoding = use_facet<codecvt_t>(loc);

Notes:

1. The dot ('.') is mandatory in front of utf8. This is a POSIX convention, locale
names have the following format:
language[_country[.encoding]]

Ex: 'fr_FR'
    'french'
    'ru_RU.koi8r'

2. utf8 encoding is only supported for the moment under Windows. The less common
utf7 encoding is also supported.
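As an illustration of the direct facet access mentioned in the quoted text, here is a small sketch that decodes a byte string through whatever codecvt facet a given locale carries. The helper name decode_with_facet is hypothetical. Whether the bytes are interpreted as UTF-8 depends entirely on the locale passed in; with the classic locale only plain ASCII is expected to convert.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_t;

// Hypothetical helper: converts narrow bytes to wide characters using the
// codecvt facet installed in 'loc'.
std::wstring decode_with_facet(const std::string& bytes, const std::locale& loc) {
    if (bytes.empty()) return std::wstring();
    const codecvt_t& cvt = std::use_facet<codecvt_t>(loc);
    std::mbstate_t state = std::mbstate_t();
    // One wide character per input byte is an upper bound for the output.
    std::wstring out(bytes.size(), L'\0');
    const char* from_next = 0;
    wchar_t* to_next = 0;
    std::codecvt_base::result r = cvt.in(
        state,
        bytes.data(), bytes.data() + bytes.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);
    if (r == std::codecvt_base::noconv) {
        // The facet reports that no conversion is needed: widen byte by byte.
        return std::wstring(bytes.begin(), bytes.end());
    }
    if (r != std::codecvt_base::ok) {
        throw std::runtime_error("codecvt::in failed or input was incomplete");
    }
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

To decode UTF-8 this way, the locale argument would have to be one built with a UTF-8 codecvt facet, as in the construction shown in the quoted STLport text, and that construction only succeeds where the platform supports it.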

Some STL implementations (for example STDCXX, the one from Apache) come with several locales. In other situations, however, the available locales depend on the system.

If you can use the name "ru_RU.UTF-8" on one operating system, that does not mean other systems use the same name for this locale. Debian and Windows probably have different names for it, which is why you get a runtime exception.

You need to install the locales you want on the system beforehand. Or use an STL that already ships with this locale.
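Because locale names differ between systems, one defensive approach is to probe for a name before relying on it. The sketch below is only an illustration: the helper names locale_available and first_available_locale are hypothetical, and any candidate names passed in are guesses that may or may not exist on a given machine.

```cpp
#include <cstddef>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if std::locale can be constructed from
// 'name' on this system. The constructor throws std::runtime_error otherwise.
bool locale_available(const char* name) {
    try {
        std::locale loc(name);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Hypothetical helper: returns the first usable name from 'candidates',
// or an empty string if none of them can be constructed.
std::string first_available_locale(const char* const* candidates, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (locale_available(candidates[i])) {
            return candidates[i];
        }
    }
    return std::string();
}
```

A caller could pass a list such as "ru_RU.UTF-8" followed by other spellings it wants to try, and fall back to a hand-written conversion when the result is empty.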

My two cents...

License: CC-BY-SA with attribution

Not affiliated with StackOverflow