Question

Does anyone know which type of CPU cache behaviour or policy (e.g. uncacheable write-combining) is assigned to memory mapped file-backed regions on modern x86 systems?

Is there any way to detect which is the case, and possibly override the default behaviour?

Windows and Linux are the main operating systems of interest.

(Editor's note: the question was previously phrased as memory mapped I/O, but that phrase has a different, specific technical meaning, especially when talking about CPU caches: it refers to actual I/O devices like NICs or video cards that you talk to with loads / stores.

This question is actually about what kind of memory you get from mmap(some_fd, ...), when you don't use MAP_ANONYMOUS and it's backed by a regular file on disk.)


Solution

TL;DR: Memory mapped files use the normal Write-Back policy for the pagecache pages that they map into the address space of your process. You have to do something special and OS-specific if you ever want pages that aren't WB.
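As a rough illustration of what "file-backed mapping" means here, the sketch below maps a temporary file, modifies it with ordinary stores through the mapping, and asks the OS to write the dirty pages back. The helper names `patch_file_via_mmap` and `demo` are made up for this example. Nothing in it selects a cache policy: the mapping is plain cached memory, and Python's `mmap` module exposes no option to change that.

```python
import mmap
import os
import tempfile


def patch_file_via_mmap(path: str, payload: bytes) -> bytes:
    """Overwrite the first len(payload) bytes of `path` through a file-backed mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file. The default mapping is shared and
        # read-write, so stores land in the OS page cache for this file.
        with mmap.mmap(f.fileno(), 0) as view:
            view[: len(payload)] = payload  # ordinary stores to cached memory
            view.flush()  # ask the OS to write dirty pages back (msync / FlushViewOfFile)
    # Read the file back through the normal file API to confirm the change.
    with open(path, "rb") as f:
        return f.read()


def demo() -> bytes:
    """Create a scratch file, patch it through a mapping, and return its new contents."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello, page cache\n")
        os.close(fd)
        return patch_file_via_mmap(path, b"HELLO")
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The payload must not be longer than the file, because a mapping cannot grow the file it maps.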


The caching policy applied to a region of the address space is generally independent of the operating system and depends only on the type of device behind the page. In principle the operating system is free to apply any caching policy to any memory region, but an incorrectly assigned policy can reduce performance or break correctness entirely.

There are at least four caching policies:

  1. Full caching (write-back, aka WB). Applied to physical address space that maps main memory (RAM). Used to increase the performance of the memory subsystem. The main property of such memory is that its state can be changed only by software and can affect only software.

    Memory mapped file implementations use full caching because they are implemented entirely in software: the operating system reads a chunk of the file from disk, places it in memory, and later writes the (possibly modified) chunk back to disk. Hardware updates a "dirty" bit in the page tables to let the OS figure out what needs to be synced to disk.

  2. Write-through caching (WT). Suited to devices whose state can be changed only by software, but where a change must take effect on the device immediately. Under this policy, data written to a memory-mapped I/O device register goes to two places at once: the cache and the device. A later read is then served from the cache, without an expensive access to the device.

    This cache policy could be useful for an MMIO device that doesn't write its own memory and only reads what the CPU wrote. In practice it's rarely used for anything. GPUs aren't like that: they do write video memory, so WT is not used for video RAM. (There's no mechanism for the GPU to invalidate CPU caches of the region, because the GPU isn't part of the CPU's cache-coherency domain.)

  3. Uncacheable, write-combining (WC, aka USWC). Weakly ordered memory, typically used for mapping video RAM. Like uncacheable, except that NT stores let you efficiently write a whole cache line at once. movntdqa loads let you efficiently read whole cache lines, which you can't do any other way from WC regions. Normal loads fetch data separately for each load, even within the same line, because the region is uncacheable.

  4. Disabled caching (UC). Applied to almost all I/O devices, because a write to a memory-mapped I/O device register must take effect immediately, and a read from such a register must return the device's actual current data. If caching were applied to a memory-mapped I/O device, two problems would arise:
    1. A write to the device register would be delayed until the cache controller decided to flush the cache line holding the written data. As a result, the driver could not know when a command written to the device takes effect.
    2. A read from the device register could be cached. A subsequent read from the same register could then return outdated data from the cache instead of the actual data from the device, making it hard for the driver to observe the device's real state.

Because the mechanism by which software specifies caching policy depends only on the processor, the same approach can be used on any operating system. The simplest way is to read the CR3 register (which requires kernel privilege), use it to walk the page tables to the Page Table Entry for the address whose caching policy you want to know, and check the PCD and PWT flags. This is not the whole picture, though, because other features also affect caching: for example, caching can be disabled globally via the CD flag in CR0; see also the MTRRs and the PAT.
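The flag check described above can be sketched as a small decoder. This is only an illustration under stated assumptions: the raw PTE value is assumed to have been obtained already (for example by a kernel driver or debugger), the entry is assumed to be a 4 KiB PTE (where the PAT bit is bit 7), and the PAT MSR is assumed to hold its power-on default unless another value is passed in. The names `pat_index` and `pte_memory_type` are made up for this example, and the result is the page-table memory type only, before the MTRRs and CR0.CD are taken into account.

```python
# Memory-type encodings used in each entry of the IA32_PAT MSR.
PAT_ENCODINGS = {0x00: "UC", 0x01: "WC", 0x04: "WT", 0x05: "WP", 0x06: "WB", 0x07: "UC-"}

# Power-on default of IA32_PAT: entries 0..3 are WB, WT, UC-, UC, repeated for 4..7.
# Assumption: the OS has not reprogrammed it; pass the real MSR value otherwise.
PAT_RESET_DEFAULT = 0x0007040600070406

PTE_PWT = 1 << 3     # page-level write-through
PTE_PCD = 1 << 4     # page-level cache disable
PTE_PAT_4K = 1 << 7  # PAT bit of a 4 KiB PTE (large-page entries use a different bit)


def pat_index(pte: int) -> int:
    """Combine the PAT, PCD and PWT bits of a 4 KiB PTE into a 3-bit PAT index."""
    pwt = (pte >> 3) & 1
    pcd = (pte >> 4) & 1
    pat = (pte >> 7) & 1
    return (pat << 2) | (pcd << 1) | pwt


def pte_memory_type(pte: int, pat_msr: int = PAT_RESET_DEFAULT) -> str:
    """Look up the memory type that the PTE selects in the given PAT MSR value."""
    entry = (pat_msr >> (8 * pat_index(pte))) & 0x07
    return PAT_ENCODINGS.get(entry, "reserved")


if __name__ == "__main__":
    for name, pte in [("no flags", 0), ("PWT", PTE_PWT), ("PCD", PTE_PCD), ("PCD|PWT", PTE_PCD | PTE_PWT)]:
        print(f"{name:8} -> {pte_memory_type(pte)}")
```

With the default PAT, a PTE with none of the three bits set decodes to WB, which is what an ordinary file-backed page would be expected to show.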

Other tips

To add to ZarathustrA's existing answer: on Windows, SEC_NOCACHE turns off this caching. There is also SEC_WRITECOMBINE, but it appears to be broken: it only works with SEC_RESERVE or SEC_COMMIT, which means only with the page file, and you don't want to set SEC_WRITECOMBINE on that.

Licensed under: CC-BY-SA with attribution