What Is Direct Memory Access (DMA) and Why Should We Know About It?
What Is Direct Memory Access (DMA) and Why Should We Know About It?
What Is Direct Memory Access (DMA) and Why Should We Know About It?
John Franco
Electrical Engineering and Computing Systems
University of Cincinnati
Introduction
What is Direct Memory Access?
Hardware mechanism that allows peripheral components
to transfer their I/O data directly to and from main memory
without the need to involve the system processor.
DMA Programming
The programming of a device's DMA controller is hardware
specific. Normally, the OS needs to have the local
device address, the physical memory address on the PC,
and the size of the memory block to transfer.
Then the register that initiates the transfer is set.
Introduction
Introduction
Elaboration
DMA transfers overcome the problem of occupying the
CPU for the entire time it's performing a transfer.
The CPU initiates the transfer, then it executes other ops
while the transfer is in progress, finally it receives an
interrupt from the DMA controller when the transfer is done
Hardware using DMA: disk drives, graphics cards,
network cards, sound cards
DMA can lead to cache coherency problems
If a CPU has a cache and external memory, then the data
the DMA controller has access to (stored in RAM) may
not be updated with the correct data stored in the cache.
Introduction
Cache Coherency Solutions
Cache-coherent systems:
external writes are signaled to the cache controller which
performs a cache invalidation for incoming DMA transfers
or cache flush for outgoing DMA transfers (done by
hardware)
Non-coherent systems:
OS ensures that the cache lines are flushed before an
outgoing DMA transfer is started and invalidated before
a memory range affected by an incoming DMA transfer
is accessed. The OS makes sure that the memory
range is not accessed by any running threads in the
meantime.
DMA Transfer
Overview (input example)
• software: asks for data (e.g. read called)
1. Device driver allocates a DMA buffer, sends signal to
device indicating where to send the data, sleeps
2. Device writes data to DMA buffer, raises interrupt when
finished
3. Interrupt handler gets data from DMA buffer,
acknowledges interrupt, awakens software to
process the data
DMA Transfer
Overview (input example)
• hardware: asynchronously pushes data to the system
May push data even if no process is listening!
1. hardware raises an interrupt to announce that new data
has arrived
2. interrupt handler allocates a buffer, tells the hardware
where to transfer the data
3. device writes the data to the buffer, raises another
interrupt when transfer is done
4. interrupt handler dispatches the new data, awakens
any relevant process, and takes care of housekeeping
DMA Transfer
Overview (input example – network transfers)
• hardware: must deal with continuous data flow
Use a circular (ring) buffer in memory shared by device
and the processor
1. incoming packet placed in next available buffer
In the ring
2. interrupt is raised by the device
3. device driver sends packet to kernel code that
will process it
4. device driver inserts a new buffer into the ring
(note: buffer allocation occurs at initialization so
the insertion of a buffer is just the selection of
an already allocated buffer for the ring)
DMA Transfer
Overview (input example – network transfers)
• pseudo code for read using circular buffer
Err_Type Procedure read(data_type data) {
if (empty_flag) return (ERR_BUF_EMPTY);
else {
memory(read_ptr) = data;
full_flag = false;
read_ptr = (((read_ptrB)+1)mod N)+B;
if (read_ptr == write_ptr) empty_flag = true;
return(ErrOK);
}
}
DMA Transfer
Graphically:
Allocating a DMA Buffer
Allocation Buffers:
contiguous buffer: contiguous block of memory allocated
scatter/gather: allocated buffer can be fragmented in the
physical memory and does not need to be allocated
contiguously. The allocated physical memory blocks
are mapped to a contiguous buffer in the calling
process's virtual address space, thus enabling easy
access to the allocated physical memory blocks.
bounce buffer: to allow devices with limited addressing
to access all of virtual address space. A bounce buffer
resides in memory low enough for a device to copy
from and write data to. It is then copied to the desired
user page in high memory
Allocating a DMA Buffer
Considerations PCI:
• Must occupy contiguous pages in physical memory
due to PCI bus requirement using physical addresses
• Note: other buses (e.g. Sbus) use virtual memory
addresses for the transfer
• The right kind of memory must be allocated since not all
memory zones are suitable: high memory may not work
with some devices because of limited address space
• Memory can be allocated at boot or runtime but drivers
can only allocate at runtime
Allocating a DMA Buffer
How to Allocate (low level):
• Use mem=xxx to reserve memory beginning at xxx
- done at boot
Then use
dmabuf = ioremap(0xFF00000, 0x100000);
to create the buffer in the driver
• Allocate from the DMA zone (16 MB)
struct page *buf;
buf = alloc_pages(__GFP_DMA, 3);
where 3 is the order and means (1 << 3) pages (from
the buddy system allocator)
• Use scatter/gather I/O if the device supports it
Allocating a DMA Buffer
How to Allocate (generic DMA layer):
• DMA based hardware uses bus addresses
- these do not match with PCI physical addresses
• Can convert at low-level but then risk inconsistency
across systems
• Also, different systems have different expectations of
cache coherency – hence allocate at generic layer
• Device may have DMA limitations – 32 bit addressing
is assumed but may be changed with
int dma_set_mask(struct device *dev, u64 mask);
(e.g. for 24 bit addressing: mask = 0xFFFFFF)
Allocating a DMA Buffer
Creating a buffer and a bus address for the device:
• If the device has an IOMMU a set of mapping registers
is provided
• A bounce buffer may be necessary – if a driver tries to
perform DMA on an address that is not reachable by dev
• Cache coherency:
copies of recently accessed memory areas are in cache
if device writes to memory, cache area is invalidated
so it will have to be paged in
If device reads data from memory, cache flushed out first
• Generic DMA layer ensures all of above are not a problem
over many architectures provided some rules are obeyed
Allocating a DMA Buffer
Rules for generic DMA layer – coherent mapping:
• Data type dma_addr_t represents a bus address
It must not be manipulated by the driver – it can be
passed to the device
• Set up the mapping (and buffer) with this:
void *dma_alloc_coherent(struct device *, size_t, dma_addr_t *, int);