I'm trying to figure out the best way to encode text (either an 8-bit ubyte[] or a string) to its HTML counterpart.

My proposal so far is to use a lookup table that maps the 8-bit characters with special meaning in HTML to their entities

string[256] lutLatin1ToHTML;
lutLatin1ToHTML[0x22] = "&quot;";
lutLatin1ToHTML[0x26] = "&amp;";
...

and to apply it using the function

pure string toHTML(in string src,
                   ref in string[256] lut) {
    return src.map!(a => (lut[a] ? lut[a] : new string(a))).reduce!((a, b) => a ~ b) ;
}

It almost works, except that I don't know how to create a string from a `ubyte` (the no-translation case).

I tried

writeln(new string('a'));

but it prints garbage and I don't know why.

For more details on HTML encoding see https://en.wikipedia.org/wiki/Character_entity_reference


Solution

You can make a string from a ubyte most easily by doing "" ~ b, for example:

ubyte b = 65;
string a = "" ~ b;
writeln(a); // prints A
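Applied to the lookup-table approach from the question, the same append-one-character idea could look like the sketch below. The helper `makeLut`, the four entities it fills in, and the `const(ubyte)[]` parameter are assumptions made for illustration, not the asker's final code. The pass-through branch casts to `dchar` so that Latin-1 bytes above 0x7F are encoded as valid UTF-8 instead of being appended as a lone `char`.

```d
import std.stdio;
import std.string : representation;

// Hypothetical helper: builds a table holding a few common entities.
string[256] makeLut()
{
    string[256] lut; // every entry starts out as null
    lut['"'] = "&quot;";
    lut['&'] = "&amp;";
    lut['<'] = "&lt;";
    lut['>'] = "&gt;";
    return lut;
}

// Replaces each byte that has a table entry and copies all other bytes through.
string toHTML(const(ubyte)[] src, ref const string[256] lut)
{
    string result;
    foreach (b; src)
    {
        if (lut[b] !is null)
            result ~= lut[b];        // translated case
        else
            result ~= cast(dchar) b; // no-translation case, encoded as UTF-8
    }
    return result;
}

void main()
{
    auto lut = makeLut();
    // representation views the string's bytes as immutable(ubyte)[]
    writeln(toHTML(`a < b & "c"`.representation, lut));
    // prints a &lt; b &amp; &quot;c&quot;
}
```

A plain loop is used here instead of map and reduce because repeated appending to a single string avoids allocating a new string for every input character.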

BTW, if you want to do a lot of html stuff, my dom.d and characterencodings.d might be useful: https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

It has an HTML parser, DOM manipulation functions similar to JavaScript (e.g. ele.querySelector(), getElementById, ele.innerHTML, ele.innerText, etc.), conversion from a few different character encodings, including Latin-1, and outputs ASCII-safe HTML with all special and Unicode characters properly encoded.

assert(htmlEntitiesEncode("foo < bar") == "foo &lt; bar");

stuff like that.

Other tips

In this case Adam's solution works just fine, of course. (It takes advantage of the fact that ubyte is implicitly convertible to char, which is then appended to the immutable(char)[] array for which string is an alias.)
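Spelled out step by step, that implicit conversion might look like the following sketch. The function names `byteToString` and `latin1ToString` are hypothetical. The second function is an addition beyond Adam's answer: a byte above 0x7F appended as a single `char` is not valid UTF-8 on its own, so it is cast to `dchar` and encoded on append.

```d
import std.stdio;

// Mirrors what `"" ~ b` does, one step at a time (fine for ASCII bytes).
string byteToString(ubyte b)
{
    char c = b; // implicit ubyte -> char conversion
    string s;   // string is an alias for immutable(char)[]
    s ~= c;     // appends a single UTF-8 code unit
    return s;
}

// Treats the byte as a Latin-1 code point and lets the append encode it.
string latin1ToString(ubyte b)
{
    string s;
    s ~= cast(dchar) b; // encoded to UTF-8 while appending
    return s;
}

void main()
{
    static assert(is(string == immutable(char)[]));
    writeln(byteToString(65));           // prints A
    writeln(latin1ToString(0xE9).length); // prints 2 (two UTF-8 code units)
}
```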

In general the safe way of converting types is to use std.conv.

import std.stdio, std.conv;

void main() {
    // utf-8
    char cc = 'a';
    string s1 = text(cc);
    string s2 = to!string(cc);
    writefln("%c %s %s", cc, s1, s2);

    // utf-16
    wchar wc = 'a';
    wstring s3 = wtext(wc);
    wstring s4 = to!wstring(wc);
    writefln("%c %s %s", wc, s3, s4);    

    // utf-32
    dchar dc = 'a';
    dstring s5 = dtext(dc);
    dstring s6 = to!dstring(dc); 
    writefln("%c %s %s", dc, s5, s6);

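    // ubyte is an integer type, so to!string formats its numeric value;
    // cast to char first to get the character instead
    ubyte b = 65;
    string n = to!string(b);            // "65"
    string a = to!string(cast(char) b); // "A"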
} 

NB. text() is actually intended for processing multiple arguments, but is conveniently short.
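As a small illustration of the multiple-argument use, text() can glue the pieces of a numeric character reference together. The helper name `numericEntity` is hypothetical; note that a ubyte argument is formatted as a decimal number, not as a character.

```d
import std.conv : text;
import std.stdio;

// Hypothetical helper: builds a decimal numeric character reference.
string numericEntity(ubyte code)
{
    // text() converts each argument to a string and concatenates them
    return text("&#", code, ";");
}

void main()
{
    writeln(numericEntity(38)); // prints &#38;
}
```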

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow