123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292 |
- =============================
- NO-MMU MEMORY MAPPING SUPPORT
- =============================
- The kernel has limited support for memory mapping under no-MMU conditions, such
- as are used in uClinux environments. From the userspace point of view, memory
- mapping is made use of in conjunction with the mmap() system call, the shmat()
- call and the execve() system call. From the kernel's point of view, execve()
- mapping is actually performed by the binfmt drivers, which call back into the
- mmap() routines to do the actual work.
- Memory mapping behaviour also involves the way fork(), vfork(), clone() and
- ptrace() work. Under uClinux there is no fork(), and clone() must be supplied
- the CLONE_VM flag.
- The behaviour is similar between the MMU and no-MMU cases, but not identical;
- and it's also much more restricted in the latter case:
- (*) Anonymous mapping, MAP_PRIVATE
- In the MMU case: VM regions backed by arbitrary pages; copy-on-write
- across fork.
- In the no-MMU case: VM regions backed by arbitrary contiguous runs of
- pages.
- (*) Anonymous mapping, MAP_SHARED
- These behave very much like private mappings, except that they're
- shared across fork() or clone() without CLONE_VM in the MMU case. Since
- the no-MMU case doesn't support these, behaviour is identical to
- MAP_PRIVATE there.
- (*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE
- In the MMU case: VM regions backed by pages read from file; changes to
- the underlying file are reflected in the mapping; copied across fork.
- In the no-MMU case:
- - If one exists, the kernel will re-use an existing mapping to the
- same segment of the same file if that has compatible permissions,
- even if this was created by another process.
- - If possible, the file mapping will be directly on the backing device
- if the backing device has the BDI_CAP_MAP_DIRECT capability and
- appropriate mapping protection capabilities. Ramfs, romfs, cramfs
- and mtd might all permit this.
- - If the backing device device can't or won't permit direct sharing,
- but does have the BDI_CAP_MAP_COPY capability, then a copy of the
- appropriate bit of the file will be read into a contiguous bit of
- memory and any extraneous space beyond the EOF will be cleared
- - Writes to the file do not affect the mapping; writes to the mapping
- are visible in other processes (no MMU protection), but should not
- happen.
- (*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE
- In the MMU case: like the non-PROT_WRITE case, except that the pages in
- question get copied before the write actually happens. From that point
- on writes to the file underneath that page no longer get reflected into
- the mapping's backing pages. The page is then backed by swap instead.
- In the no-MMU case: works much like the non-PROT_WRITE case, except
- that a copy is always taken and never shared.
- (*) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
- In the MMU case: VM regions backed by pages read from file; changes to
- pages written back to file; writes to file reflected into pages backing
- mapping; shared across fork.
- In the no-MMU case: not supported.
- (*) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
- In the MMU case: As for ordinary regular files.
- In the no-MMU case: The filesystem providing the memory-backed file
- (such as ramfs or tmpfs) may choose to honour an open, truncate, mmap
- sequence by providing a contiguous sequence of pages to map. In that
- case, a shared-writable memory mapping will be possible. It will work
- as for the MMU case. If the filesystem does not provide any such
- support, then the mapping request will be denied.
- (*) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
- In the MMU case: As for ordinary regular files.
- In the no-MMU case: As for memory backed regular files, but the
- blockdev must be able to provide a contiguous run of pages without
- truncate being called. The ramdisk driver could do this if it allocated
- all its memory as a contiguous array upfront.
- (*) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
- In the MMU case: As for ordinary regular files.
- In the no-MMU case: The character device driver may choose to honour
- the mmap() by providing direct access to the underlying device if it
- provides memory or quasi-memory that can be accessed directly. Examples
- of such are frame buffers and flash devices. If the driver does not
- provide any such support, then the mapping request will be denied.
- ============================
- FURTHER NOTES ON NO-MMU MMAP
- ============================
- (*) A request for a private mapping of a file may return a buffer that is not
- page-aligned. This is because XIP may take place, and the data may not be
- paged aligned in the backing store.
- (*) A request for an anonymous mapping will always be page aligned. If
- possible the size of the request should be a power of two otherwise some
- of the space may be wasted as the kernel must allocate a power-of-2
- granule but will only discard the excess if appropriately configured as
- this has an effect on fragmentation.
- (*) The memory allocated by a request for an anonymous mapping will normally
- be cleared by the kernel before being returned in accordance with the
- Linux man pages (ver 2.22 or later).
- In the MMU case this can be achieved with reasonable performance as
- regions are backed by virtual pages, with the contents only being mapped
- to cleared physical pages when a write happens on that specific page
- (prior to which, the pages are effectively mapped to the global zero page
- from which reads can take place). This spreads out the time it takes to
- initialize the contents of a page - depending on the write-usage of the
- mapping.
- In the no-MMU case, however, anonymous mappings are backed by physical
- pages, and the entire map is cleared at allocation time. This can cause
- significant delays during a userspace malloc() as the C library does an
- anonymous mapping and the kernel then does a memset for the entire map.
- However, for memory that isn't required to be precleared - such as that
- returned by malloc() - mmap() can take a MAP_UNINITIALIZED flag to
- indicate to the kernel that it shouldn't bother clearing the memory before
- returning it. Note that CONFIG_MMAP_ALLOW_UNINITIALIZED must be enabled
- to permit this, otherwise the flag will be ignored.
- uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this
- to allocate the brk and stack region.
- (*) A list of all the private copy and anonymous mappings on the system is
- visible through /proc/maps in no-MMU mode.
- (*) A list of all the mappings in use by a process is visible through
- /proc/<pid>/maps in no-MMU mode.
- (*) Supplying MAP_FIXED or a requesting a particular mapping address will
- result in an error.
- (*) Files mapped privately usually have to have a read method provided by the
- driver or filesystem so that the contents can be read into the memory
- allocated if mmap() chooses not to map the backing device directly. An
- error will result if they don't. This is most likely to be encountered
- with character device files, pipes, fifos and sockets.
- ==========================
- INTERPROCESS SHARED MEMORY
- ==========================
- Both SYSV IPC SHM shared memory and POSIX shared memory is supported in NOMMU
- mode. The former through the usual mechanism, the latter through files created
- on ramfs or tmpfs mounts.
- =======
- FUTEXES
- =======
- Futexes are supported in NOMMU mode if the arch supports them. An error will
- be given if an address passed to the futex system call lies outside the
- mappings made by a process or if the mapping in which the address lies does not
- support futexes (such as an I/O chardev mapping).
- =============
- NO-MMU MREMAP
- =============
- The mremap() function is partially supported. It may change the size of a
- mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size
- of the mapping exceeds the size of the slab object currently occupied by the
- memory to which the mapping refers, or if a smaller slab object could be used.
- MREMAP_FIXED is not supported, though it is ignored if there's no change of
- address and the object does not need to be moved.
- Shared mappings may not be moved. Shareable mappings may not be moved either,
- even if they are not currently shared.
- The mremap() function must be given an exact match for base address and size of
- a previously mapped object. It may not be used to create holes in existing
- mappings, move parts of existing mappings or resize parts of mappings. It must
- act on a complete mapping.
- [*] Not currently supported.
- ============================================
- PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT
- ============================================
- To provide shareable character device support, a driver must provide a
- file->f_op->get_unmapped_area() operation. The mmap() routines will call this
- to get a proposed address for the mapping. This may return an error if it
- doesn't wish to honour the mapping because it's too long, at a weird offset,
- under some unsupported combination of flags or whatever.
- The driver should also provide backing device information with capabilities set
- to indicate the permitted types of mapping on such devices. The default is
- assumed to be readable and writable, not executable, and only shareable
- directly (can't be copied).
- The file->f_op->mmap() operation will be called to actually inaugurate the
- mapping. It can be rejected at that point. Returning the ENOSYS error will
- cause the mapping to be copied instead if BDI_CAP_MAP_COPY is specified.
- The vm_ops->close() routine will be invoked when the last mapping on a chardev
- is removed. An existing mapping will be shared, partially or not, if possible
- without notifying the driver.
- It is permitted also for the file->f_op->get_unmapped_area() operation to
- return -ENOSYS. This will be taken to mean that this operation just doesn't
- want to handle it, despite the fact it's got an operation. For instance, it
- might try directing the call to a secondary driver which turns out not to
- implement it. Such is the case for the framebuffer driver which attempts to
- direct the call to the device-specific driver. Under such circumstances, the
- mapping request will be rejected if BDI_CAP_MAP_COPY is not specified, and a
- copy mapped otherwise.
- IMPORTANT NOTE:
- Some types of device may present a different appearance to anyone
- looking at them in certain modes. Flash chips can be like this; for
- instance if they're in programming or erase mode, you might see the
- status reflected in the mapping, instead of the data.
- In such a case, care must be taken lest userspace see a shared or a
- private mapping showing such information when the driver is busy
- controlling the device. Remember especially: private executable
- mappings may still be mapped directly off the device under some
- circumstances!
- ==============================================
- PROVIDING SHAREABLE MEMORY-BACKED FILE SUPPORT
- ==============================================
- Provision of shared mappings on memory backed files is similar to the provision
- of support for shared mapped character devices. The main difference is that the
- filesystem providing the service will probably allocate a contiguous collection
- of pages and permit mappings to be made on that.
- It is recommended that a truncate operation applied to such a file that
- increases the file size, if that file is empty, be taken as a request to gather
- enough pages to honour a mapping. This is required to support POSIX shared
- memory.
- Memory backed devices are indicated by the mapping's backing device info having
- the memory_backed flag set.
- ========================================
- PROVIDING SHAREABLE BLOCK DEVICE SUPPORT
- ========================================
- Provision of shared mappings on block device files is exactly the same as for
- character devices. If there isn't a real device underneath, then the driver
- should allocate sufficient contiguous memory to honour any supported mapping.
- =================================
- ADJUSTING PAGE TRIMMING BEHAVIOUR
- =================================
- NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages
- when performing an allocation. This can have adverse effects on memory
- fragmentation, and as such, is left configurable. The default behaviour is to
- aggressively trim allocations and discard any excess pages back in to the page
- allocator. In order to retain finer-grained control over fragmentation, this
- behaviour can either be disabled completely, or bumped up to a higher page
- watermark where trimming begins.
- Page trimming behaviour is configurable via the sysctl `vm.nr_trim_pages'.
|