Purity of functions generating ByteString (or any object with ForeignPtr component)

https://stackoverflow.com/questions/8616861

28-03-2021
|

문제

Since a ByteString is a constructor with ForeignPtr:

data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length

If I have a function that returns ByteString, then given an input, say a constant Word8, the function will return a ByteString with non-deterministic ForeignPtr value - as to what that value will be is determined by the memory manager.

So, does that mean that a function that returns ByteString is not pure? That doesn't seem to be the case obviously, if you have used ByteString and Vector libraries. Surely, it would have been discussed widely if it were the case (and hopefully show up on top of google search). How is that purity enforced?

The reason for asking this question is I am curious what are the subtle points involved in using ByteString and Vector objects, from the GHC compiler perspective, given ForeignPtr member in their constructor.

해결책

There is no way to observe the value of the pointer inside the ForeignPtr from outside the Data.ByteString module; its implementation is internally impure, but externally pure, because it makes sure that the invariants required to be pure are maintained as long as you cannot see inside the ByteString constructor — which you can't, because it's not exported.

This is a common technique in Haskell: implementing something with unsafe techniques under the hood, but exposing a pure interface; you get both the performance and power unsafe techniques bring, without compromising Haskell's safety. (Of course, the implementation modules can have bugs, but do you think ByteString would be less likely to leak its abstraction if it was written in C? :))

As far as the subtle points go, if you're talking from a user's perspective, don't worry: you can use any function the ByteString and Vector libraries export without worrying, as long as they don't start with unsafe. They are both very mature and well-tested libraries, so you shouldn't run into any purity problems at all, and if you do, that's a bug in the library, and you should report it.

As far as writing your own code that provides external safety with an unsafe internal implementation, the rule is very simple: maintain referential transparency.

Taking ByteString as an example, the functions to construct ByteStrings use unsafePerformIO to allocate blocks of data, which they then mutate and put in the constructor. If we exported the constructor, then user code would be able to get at the ForeignPtr. Is this problematic? To determine whether it is, we need to find a pure function (i.e. not in IO) that lets us distinguish two ForeignPtrs allocated in this way. A quick glance at the documentation shows that there is such a function: instance Eq (ForeignPtr a) would let us distinguish these. So we must not allow user code to access the ForeignPtr. The easiest way to do this is to not export the constructor.

In summary: When you use an unsafe mechanism to implement something, verify that the impurity it introduces cannot leak outside of the module, e.g. by inspecting the values you produce with it.

As far as compiler issues go, you shouldn't really have to worry about them; while the functions are unsafe, they shouldn't allow you to do anything more dangerous, beyond violating purity, than you can do in the IO monad to start with. Generally, if you want to do something that could produce really unexpected results, you'll have to go out of your way to do so: for instance, you can use unsafeDupablePerformIO if you can deal with the possibility of two threads evaluating the same thunk of the form unsafeDupablePerformIO m simultaneously. unsafePerformIO is slightly slower than unsafeDupablePerformIO because it prevents this from happening. (Thunks in your program can be evaluated by two threads simultaneously during normal execution with GHC; this is normally not a problem, as evaluating the same pure value twice should have no adverse side-effects (by definition), but when writing unsafe code, it's something you have to take into account.)

The GHC documentation for unsafePerformIO (and unsafeDupablePerformIO, as I linked above) details some pitfalls you might run into; similarly the documentation for unsafeCoerce# (which should be used through its portable name, Unsafe.Coerce.unsafeCoerce).

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow