Why does `OpCode.Value` have the “wrong” endianness?
27-06-2021

Question
Facts:
1. The correct encoding for the CIL instruction rethrow's op-code is the two-byte sequence FE 1A.
2. OpCodes.Rethrow.Value (which has type short) has the value 0xFE1A on my little-endian machine.
3. BitConverter honours the machine's endianness when converting to/from byte sequences.
4. On my little-endian machine, BitConverter.GetBytes(OpCodes.Rethrow.Value) results in the byte sequence 1A FE.

That means serializing an OpCode.Value on a little-endian machine using BitConverter does not produce the correct encoding for the op-code; the byte order is reversed.
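The observation can be reproduced in a few lines (a minimal sketch; the commented output is what a little-endian machine produces):

```csharp
using System;
using System.Reflection.Emit;

class Program
{
    static void Main()
    {
        short value = OpCodes.Rethrow.Value;
        Console.WriteLine(value.ToString("X4"));          // FE1A

        // BitConverter honours the machine's byte order, so on a
        // little-endian machine the low byte (1A) comes out first.
        byte[] bytes = BitConverter.GetBytes(value);
        Console.WriteLine(BitConverter.ToString(bytes));  // 1A-FE, not FE-1A
    }
}
```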
Questions:
1. Is the byte ordering of OpCode.Value documented (and if so, where?), or is it an "implementation detail"?
2. Does step 4 above on a big-endian machine also result in the wrong byte ordering? That is, would OpCodes.Rethrow.Value be 0x1AFE on a big-endian machine?
Solution 2
I've reached the conclusion that serializing an op-code representation based on the OpCode.Value property, i.e.:
OpCode someOpCode = …;
byte[] someOpCodeEncoding = BitConverter.GetBytes(someOpCode.Value);
is a bad idea, but not because of the use of BitConverter.GetBytes(short), whose behaviour is well-documented. The main culprit is the OpCode.Value property, whose documentation is vague in two respects:
1. It states that this property contains "the value of the immediate operand", which may or may not refer to the op-code's encoding; that term doesn't appear anywhere in the CLI specification.
2. Even if we assume that it does in fact contain an op-code's encoding, the documentation says nothing about byte order. (Byte order comes into play when converting between byte[] and short.)
Why am I basing my argument on MSDN documentation, and not on the CLI standard? Because System.Reflection.Emit is not part of the Reflection Library as defined by the CLI standard. For this reason, I think it's fairly safe to say that the MSDN reference documentation for this namespace is as close as it gets to an official specification. (But unlike @Hans Passant's answer, I would not go one step further and claim that the reference source is in any way a specification.)
Conclusion:
There are two ways to output the op-code encoding for a given OpCode object:
1. Stay with System.Reflection.Emit functionality and use ILGenerator.Emit(someOpCode). This may be too restrictive in some situations.
2. Create your own mapping between op-code encodings (i.e. byte[] sequences) and the various OpCode objects.
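The second option can be sketched by enumerating the public static OpCode fields of the OpCodes class via reflection (OpCodeMap and Build are hypothetical names; emitting the high byte of a two-byte op-code first matches the reference source's behaviour):

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

static class OpCodeMap
{
    // Build a table from each OpCode to its byte encoding, derived from
    // its Size and Value properties rather than from BitConverter.
    public static Dictionary<OpCode, byte[]> Build()
    {
        var map = new Dictionary<OpCode, byte[]>();
        foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (field.FieldType != typeof(OpCode)) continue;
            var opCode = (OpCode)field.GetValue(null);
            ushort value = (ushort)opCode.Value;
            byte[] encoding = opCode.Size == 1
                ? new[] { (byte)value }                       // one-byte op-codes: low byte only
                : new[] { (byte)(value >> 8), (byte)value };  // two-byte op-codes: high byte first
            map[opCode] = encoding;
        }
        return map;
    }
}
```

Because the byte order is fixed explicitly by the shifts, the resulting table is the same on little- and big-endian machines.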
Other hints
The Value property looks like this in the Reference Source:
public short Value
{
get
{
if (m_size == 2)
return (short) (m_s1 << 8 | m_s2);
return (short) m_s2;
}
}
That looks entirely sane of course: m_s2 is always the least significant byte. Looking at ILGenerator:
internal void InternalEmit(OpCode opcode)
{
if (opcode.m_size == 1)
{
m_ILStream[m_length++] = opcode.m_s2;
}
else
{
m_ILStream[m_length++] = opcode.m_s1;
m_ILStream[m_length++] = opcode.m_s2;
}
UpdateStackSize(opcode, opcode.StackChange());
}
Which is what you expected: the 0xFE byte gets emitted first.
So the framework code carefully avoids taking a dependency on endianness. CIL doesn't have an endianness dependency; no variable-length data ever does. That is true for text files, UTF-8 encoding, x86 machine code instructions. And CIL. So if you convert variable-length data to a single value, as the Value property getter does, then that code inevitably makes a conversion from endianness-free data to endianness-dependent data. Which inevitably gets half of the world upset, because they think it was the wrong way around. And 100% of all programmers that run into it.
Probably the best way is to do it like the framework does and recover m_s1 and m_s2 as quickly as you can, using your own version of the Opcode type. Easy to do with:
foo.m_s1 = (byte)(opc.Value >> 8);
foo.m_s2 = (byte)(opc.Value & 0xff);
foo.m_size = opc.Size;
Which has no endian-ness dependency.
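Such a stand-in type might look like this (a sketch; MyOpCode, From, and WriteTo are made-up names, while the m_s1/m_s2/m_size fields mirror the reference source):

```csharp
using System;
using System.IO;
using System.Reflection.Emit;

// Hypothetical stand-in for OpCode that stores the encoding bytes directly,
// so serialization never depends on the machine's byte order.
struct MyOpCode
{
    public byte m_s1;   // prefix byte of a two-byte op-code (e.g. 0xFE); unused if m_size == 1
    public byte m_s2;   // always the last byte of the encoding
    public int m_size;  // 1 or 2

    public static MyOpCode From(OpCode opc) => new MyOpCode
    {
        m_s1 = (byte)(opc.Value >> 8),
        m_s2 = (byte)(opc.Value & 0xff),
        m_size = opc.Size,
    };

    public void WriteTo(Stream stream)
    {
        if (m_size == 2)
            stream.WriteByte(m_s1); // prefix byte first, matching InternalEmit
        stream.WriteByte(m_s2);
    }
}
```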
Try:
var yourStream = new MemoryStream();
var writer = new System.IO.BinaryWriter(yourStream);
writer.Write(OpCodes.Rethrow.Value);
You don't need to worry about byte order, since BinaryWriter (or BinaryReader) handles the implementation details for you: per its documentation, BinaryWriter.Write(Int16) always writes the value in little-endian format, regardless of the machine's byte order. I suspect that the reason you're getting the "wrong" byte order is that you're applying BitConverter.GetBytes to the OpCode value when it's already stored as little endian, so converting it again reverses the byte order, giving you the "wrong" result.