Direct Memory Access
• Used to avoid programmed I/O (one byte at a time) for large data movement
• Requires a DMA controller
• Bypasses the CPU to transfer data directly between the I/O device and memory
• OS writes a DMA command block into memory
  – Source and destination addresses
  – Read or write mode
  – Count of bytes
• OS writes the location of the command block to the DMA controller
• Bus mastering by the DMA controller – it grabs the bus from the CPU
  – Cycle stealing from the CPU, but still much more efficient
• When done, the controller interrupts the CPU to signal completion
• A version that is aware of virtual addresses (DVMA) can be even more efficient
Operating System Concepts – 9th Edition 13.14 Silberschatz, Galvin and Gagne ©2013
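The command-block handshake described above can be illustrated with a short C sketch. This is a minimal, hypothetical example: the structure layout, the field encoding, and the DMA_COMMAND_BLOCK_ADDR register address are assumptions for illustration, not the interface of any particular controller.

```c
#include <stdint.h>

/* Hypothetical DMA command block the OS builds in memory
 * (fields mirror the slide: addresses, direction, byte count). */
struct dma_command_block {
    uint64_t source_addr;   /* physical source address                        */
    uint64_t dest_addr;     /* physical destination address                   */
    uint32_t byte_count;    /* number of bytes to transfer                    */
    uint32_t direction;     /* 0 = read from device, 1 = write (assumed)      */
};

/* Hypothetical memory-mapped register where the controller expects
 * the physical address of the command block. */
#define DMA_COMMAND_BLOCK_ADDR ((volatile uint64_t *)0xFEDC0000u)

static struct dma_command_block cmd;

void start_dma_read(uint64_t dev_addr, uint64_t buf_addr, uint32_t nbytes)
{
    cmd.source_addr = dev_addr;   /* device-side address         */
    cmd.dest_addr   = buf_addr;   /* memory buffer               */
    cmd.byte_count  = nbytes;
    cmd.direction   = 0;          /* read: device -> memory      */

    /* Telling the controller where the command block lives starts the
     * transfer; the controller interrupts the CPU when it finishes.  */
    *DMA_COMMAND_BLOCK_ADDR = (uint64_t)(uintptr_t)&cmd;
}
```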




Direct Memory Access
• Direct Memory Access (DMA):
  – External to the CPU
  – Uses idle bus cycles (cycle stealing)
  – Acts as a master on the bus
  – Transfers blocks of data to or from memory without CPU intervention
  – Efficient for large data transfers, e.g. from disk; cache usage allows the processor to leave enough memory bandwidth for DMA
• How does DMA work?
  – The CPU sets up the transfer, supplying the device id, memory address, and number of bytes
  – The DMA controller (DMAC) starts the access and becomes bus master
  – For a multiple-byte transfer, the DMAC increments the address
  – The DMAC interrupts the CPU upon completion
[Figure: CPU, memory, DMAC, and I/O controller (IOC) with its device on the bus.] The CPU sends a starting address, direction, and length count to the DMAC, then issues "start". The DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory. For a multiple-bus system, each bus controller often contains DMA control logic.
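The "starting address, direction, length count, then start" sequence on this slide can be sketched from the CPU's side. The register addresses, bit positions, and the interrupt-flag mechanism below are hypothetical placeholders, not a real DMAC's programming model.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical memory-mapped DMAC registers. */
#define DMAC_ADDR   (*(volatile uint32_t *)0x40001000u)  /* starting memory address */
#define DMAC_COUNT  (*(volatile uint32_t *)0x40001004u)  /* length count (bytes)    */
#define DMAC_CTRL   (*(volatile uint32_t *)0x40001008u)  /* direction + start bits  */
#define DMAC_CTRL_DIR_READ  (0u << 1)
#define DMAC_CTRL_DIR_WRITE (1u << 1)
#define DMAC_CTRL_START     (1u << 0)

static volatile bool dma_done;   /* set by the completion interrupt */

void dmac_start(uint32_t mem_addr, uint32_t nbytes, bool write_to_device)
{
    dma_done   = false;
    DMAC_ADDR  = mem_addr;
    DMAC_COUNT = nbytes;
    DMAC_CTRL  = (write_to_device ? DMAC_CTRL_DIR_WRITE : DMAC_CTRL_DIR_READ)
               | DMAC_CTRL_START;
    /* The CPU is now free to do other work; the DMAC steals bus
     * cycles and raises an interrupt when the count reaches zero. */
}

/* Installed as the DMAC's interrupt handler (mechanism is platform-specific). */
void dmac_completion_isr(void)
{
    dma_done = true;   /* wake whoever is waiting on the transfer */
}
```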




I/O Using DMA
[Figure: memory map (ROM, RAM, memory-mapped I/O) with the CPU, DMA controller, I/O interfaces, and I/O peripherals; control and data-transfer paths shown.]
• CPU sends the device name, address, length, and transfer direction to the DMA controller (via memory-mapped I/O)
• CPU issues a start command to the DMA controller
• DMA controller provides handshake signals to the I/O device and memory, including addresses
• DMA controller interrupts the processor when the transfer is complete
Spring 2016, arz CS555A – Real-Time Embedded Systems, Stevens Institute of Technology
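Because this slide emphasizes that the controller is programmed "via memory-mapped I/O", a common way to express that in C is to overlay a struct of volatile registers on the controller's base address. The register names, offsets, and base address below are assumptions for illustration only.

```c
#include <stdint.h>

/* Assumed register layout of the DMA controller, mapped into the
 * memory-mapped I/O region shown in the figure. */
struct dma_regs {
    volatile uint32_t device;     /* device name/selector                     */
    volatile uint32_t address;    /* memory address for the block             */
    volatile uint32_t length;     /* transfer length in bytes                 */
    volatile uint32_t direction;  /* 0 = device->memory, 1 = memory->device   */
    volatile uint32_t start;      /* writing 1 starts the transfer            */
};

#define DMA_BASE ((struct dma_regs *)0x4000A000u)  /* hypothetical base address */

void dma_io_start(uint32_t dev, uint32_t addr, uint32_t len, uint32_t dir)
{
    struct dma_regs *dma = DMA_BASE;

    /* Each assignment below is an ordinary store, but because the struct
     * sits in the memory-mapped I/O region it programs the controller. */
    dma->device    = dev;
    dma->address   = addr;
    dma->length    = len;
    dma->direction = dir;
    dma->start     = 1;   /* controller will interrupt when finished */
}
```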








DMA Problems
• With virtual memory systems (pages have both physical and virtual addresses):
  – Physical pages may be re-mapped to different virtual pages during a DMA operation
  – A multi-page DMA transfer cannot assume consecutive physical addresses
  Solutions:
  – Allow DMA based on virtual addresses
    • Add translation logic to the DMA controller
    • The OS allocates the virtual pages to DMA and prevents re-mapping until the DMA completes
  – Partitioned DMA
    • Break the DMA transfer into multiple DMA operations, each confined to a single page
    • The OS chains the pages together for the requester
• In cache-based systems (there can be two copies of a data item):
  – The processor might not know that the cached copy and the memory copy are different
  – Write-back caches can overwrite I/O data, or DMA can read stale data
  Solutions:
  – Route I/O activity through the cache
    • Not efficient, since I/O data usually does not exhibit temporal locality
  – The OS selectively invalidates cache blocks before an I/O read, or forces a write-back prior to an I/O write
    • Usually called cache flushing; requires hardware support
DMA provides another path to main memory, with no cache and no address translation.
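The page-pinning and cache-flushing discipline above can be written down as a short OS-side sketch. All helper functions here (pin_pages, cache_writeback, cache_invalidate, dma_transfer, virt_to_phys) are hypothetical stand-ins for whatever a real kernel provides; the point is only the ordering of the steps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical OS-internal helpers; real kernels expose equivalents
 * under different names. */
void pin_pages(void *buf, size_t len);          /* prevent re-mapping during DMA   */
void unpin_pages(void *buf, size_t len);
void cache_writeback(void *buf, size_t len);    /* flush dirty lines to memory     */
void cache_invalidate(void *buf, size_t len);   /* drop possibly stale lines       */
void dma_transfer(uintptr_t phys, size_t len, int to_device); /* assumed to block  */
uintptr_t virt_to_phys(void *va);

/* Sketch of the discipline for a buffer that fits in pinned pages. */
void dma_from_device(void *buf, size_t len)
{
    pin_pages(buf, len);              /* keep the mapping stable during DMA       */
    cache_invalidate(buf, len);       /* later loads must see the DMA'd data,
                                         not stale cached copies                  */
    dma_transfer(virt_to_phys(buf), len, 0);   /* device -> memory                */
    unpin_pages(buf, len);
}

void dma_to_device(void *buf, size_t len)
{
    pin_pages(buf, len);
    cache_writeback(buf, len);        /* the DMAC must read the newest data,
                                         not what is still only in the cache      */
    dma_transfer(virt_to_phys(buf), len, 1);   /* memory -> device                */
    unpin_pages(buf, len);
}
```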




Direct Memory Access
• Used to avoid programmed I/O for large data movement
• Requires a DMA controller
• Bypasses the CPU to transfer data directly between the I/O device and memory
Operating System Concepts with Java – 8th Edition 12.32 Silberschatz, Galvin and Gagne ©2009








DMA Transfer Cycle Times
• The DMA controller requires 1 or 2 MCLK clock cycles to synchronize before each single transfer, or before a complete block or burst-block transfer
• Each byte/word transfer requires 2 MCLK cycles after synchronization, plus one cycle of wait time after the transfer
• DMA cycle time depends on the MSP430 operating mode and clock-system setup (transfers use MCLK)
  – If the MCLK source is active but the CPU is off, the DMA controller uses the MCLK source for each transfer without re-enabling the CPU
  – If the MCLK source is off, the DMA controller temporarily restarts MCLK, sourced from DCOCLK, for the single transfer or the complete block or burst-block transfer; the CPU remains off, and after the transfer completes, MCLK is turned off
CPE 323
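Using only the figures quoted on this slide, a rough cycle estimate for a block transfer can be written down. The formula below is an approximation derived from the slide's numbers (and assumes the one wait cycle applies per transfer), not a figure from the MSP430 user's guide, so treat the constants as assumptions.

```c
/* Rough cycle estimate for an N-element DMA block transfer, using the
 * numbers on this slide:
 *   - up to 2 MCLK cycles of synchronization before the block
 *   - 2 MCLK cycles per byte/word transferred
 *   - 1 wait cycle assumed after each transfer
 */
unsigned long dma_block_cycles_estimate(unsigned long n_transfers)
{
    const unsigned long sync_cycles     = 2;  /* worst case: "1 or 2 MCLK" */
    const unsigned long cycles_per_xfer = 2;
    const unsigned long wait_after_xfer = 1;

    return sync_cycles + n_transfers * (cycles_per_xfer + wait_after_xfer);
}

/* Example: a 64-word block comes out to roughly 2 + 64 * 3 = 194 MCLK cycles. */
```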




Direct Memory Access I/O (DMA)
CPU overhead is high with fast devices. DMA reduces the CPU overhead of initiating and monitoring individual data transfers between a device and main memory.
[Figure: CPU, main memory, device, and a controller containing an opcode register, operand registers, a busy flag, a status register, a data buffer, and control logic; the numbered steps below refer to this diagram.]
1. CPU writes the operands required for the input operation into the operand registers
2. CPU writes the opcode for the input operation; the controller executes it and the busy flag is set
3. Data is transferred from the device to the data buffer
4. Controller copies data between main memory and the data buffer (repeated as needed)
5. After the operation completes, the controller resets the busy flag to 0 and sends an interrupt to the CPU
6. CPU reads the status register to check for successful operation
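The CPU's part of this six-step sequence (steps 1, 2, and 6, with steps 3-5 done by the controller) might look like the sketch below. The register addresses, opcode value, and status encoding are hypothetical; a real driver would also do useful work instead of spinning while waiting for the completion interrupt.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed register layout for the controller on this slide. */
#define CTRL_OPERAND0 (*(volatile uint32_t *)0x50000000u)  /* e.g. memory address */
#define CTRL_OPERAND1 (*(volatile uint32_t *)0x50000004u)  /* e.g. byte count     */
#define CTRL_OPCODE   (*(volatile uint32_t *)0x50000008u)
#define CTRL_STATUS   (*(volatile uint32_t *)0x5000000Cu)
#define OPCODE_INPUT  0x01u
#define STATUS_OK     0x00u

static volatile bool ctrl_irq_seen;   /* set by the controller's interrupt (step 5) */

int dma_input(uint32_t mem_addr, uint32_t nbytes)
{
    ctrl_irq_seen = false;

    CTRL_OPERAND0 = mem_addr;        /* step 1: write operands                       */
    CTRL_OPERAND1 = nbytes;
    CTRL_OPCODE   = OPCODE_INPUT;    /* step 2: write opcode; controller goes busy   */

    /* Steps 3-5 happen inside the controller; the CPU could do other work here. */
    while (!ctrl_irq_seen) {
        /* wait for the completion interrupt */
    }

    return (CTRL_STATUS == STATUS_OK) ? 0 : -1;   /* step 6: check the status register */
}

void ctrl_isr(void)   /* invoked on the controller's completion interrupt */
{
    ctrl_irq_seen = true;
}
```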




CS 286 Computer Organization and Architecture
(B) By a DMA controller on a device controller board ("On-Card DMA") (from memory to an I/O device)
• The CPU sends a DMA command (for data transmission to the I/O device) to the DMA controller on the device controller for the NIC
• The DMA controller reads one byte from memory
• The DMA controller sends that one byte to the I/O device
• The two steps above (read from memory, send to the device) are repeated for as many bytes as there are in the packet
• Once all the bytes in a packet have been copied to the buffer, the OS can issue a "Disk Write" command
I/O_1/021
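The byte-by-byte copy that the on-card DMA engine performs can be modeled in C for clarity. In reality this loop is implemented in hardware on the device controller board; the helper functions are hypothetical names for the bus read and device write operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical primitives of the on-card DMA engine. */
uint8_t bus_read_byte(uint32_t mem_addr);   /* read one byte from main memory  */
void    device_write_byte(uint8_t b);       /* hand one byte to the I/O device */

void on_card_dma_tx(uint32_t packet_addr, size_t packet_len)
{
    for (size_t i = 0; i < packet_len; i++) {
        uint8_t b = bus_read_byte(packet_addr + i);  /* step: read one byte from memory   */
        device_write_byte(b);                        /* step: send that byte to the device */
    }
    /* Once the whole packet has been copied into the device buffer,
     * the controller signals completion so the OS can issue its next command. */
}
```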




Computer-System Operation
• I/O devices and the CPU can execute concurrently
• Each device controller is in charge of a particular device type
• Each device controller has a local buffer
• The CPU moves data from/to main memory to/from the local buffers
• I/O is from the device to the local buffer of the controller
• The device controller informs the CPU that it has finished its operation by causing an interrupt
Operating System Concepts with Java – 8th Edition 1.9 Silberschatz, Galvin and Gagne ©2009




[Figure 7.4: Three Techniques for Input of a Block of Data – flowcharts comparing (a) programmed I/O, (b) interrupt-driven I/O, and (c) direct memory access. In programmed I/O the CPU issues a read command, then repeatedly checks device status, reads each word from the I/O module, and writes it into memory until the block is done. In interrupt-driven I/O the CPU issues the read command, does something else, and moves one word into memory per interrupt. With DMA the CPU issues a single read-block command to the DMA module, does something else, and handles a single interrupt when the DMA module reports that the whole block has been transferred.]
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.




HDFS (Hadoop Distributed File System) is a distributed file system for commodity hardware. Its differences from other distributed file systems are few but significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project and is part of Apache Hadoop Core: http://hadoop.apache.org/core/

2.1. Hardware Failure: Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data. The fact that there are many components, each with a non-trivial probability of failure, means that some component of HDFS is always non-functional. Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.

2.2. Streaming Data Access: Applications that run on HDFS need streaming access to their data sets. They are not general-purpose applications that typically run on general-purpose file systems. HDFS is designed more for batch processing than for interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications targeted at HDFS; POSIX semantics in a few key areas have been traded to increase data throughput rates.

2.3. Large Data Sets: Applications on HDFS have large data sets, typically gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It provides high aggregate data bandwidth and scales to hundreds of nodes in a single cluster. It supports on the order of ten million files in a single instance.

2.4. Simple Coherency Model: HDFS applications need a write-once-read-many access model for files. A file, once created, written, and closed, need not be changed. This assumption simplifies data coherency issues and enables high-throughput data access. A Map/Reduce application or a web crawler application fits perfectly with this model. There is a plan to support appending writes to files in the future. [Write once, read many, at the file level.]

2.5. "Moving Computation is Cheaper than Moving Data": A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. It minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.

2.6. Portability Across Heterogeneous Hardware and Software Platforms: HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.

3. NameNode and DataNodes: HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks stored in a set of DataNodes.
The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.

The NameNode and DataNode are pieces of software designed to run on commodity machines, which typically run the GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Use of the highly portable Java language means that HDFS can be deployed on a wide range of machines. A typical deployment has a dedicated machine that runs only the NameNode software. Each of the other machines in the cluster runs one instance of the DataNode software. The architecture does not preclude running multiple DataNodes on the same machine, but in a real deployment that is rarely the case. The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the NameNode.

4. The File System Namespace: HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside these directories. The file system namespace hierarchy is similar to most other existing file systems; one can create and remove files, move a file from one directory to another, or rename a file. HDFS does not yet implement user quotas or access permissions. HDFS does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features. The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.

5. Data Replication: HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.




CS 286 Computer Organization and Architecture
(A) By the DMA controller on the system board ("Centralized DMA") (from memory to an I/O device)
• The CPU sends a DMA command (for data transmission to the I/O device) to the DMA controller:
  - Destination device address (I/O port #)
  - Number of bytes to be transferred
  - Beginning memory address
• The DMA controller reads one byte from memory
• The DMA controller sends that one byte to the I/O device
• The two steps above (read from memory, send to the device) are repeated for as many bytes as there are in the packet
• Once all the bytes in a packet have been copied to the buffer, the OS can issue a "Disk Write" command
I/O_1/019
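The three command fields listed on this slide can be packaged as a struct that the CPU hands to the system-board DMA controller. The struct itself, the port-write helper, and the port number are illustrative assumptions, not a real chipset interface.

```c
#include <stdint.h>

/* The command fields named on the slide. */
struct centralized_dma_cmd {
    uint16_t dest_port;    /* destination device address (I/O port #) */
    uint32_t byte_count;   /* number of bytes to be transferred       */
    uint32_t mem_addr;     /* beginning memory address                */
};

/* Hypothetical helper that writes one byte to an I/O port. */
void port_write8(uint16_t port, uint8_t value);

#define DMA_CMD_PORT 0x00C0u   /* assumed command port of the system-board DMAC */

/* Push the command to the centralized DMA controller one byte at a time. */
void issue_dma_command(const struct centralized_dma_cmd *cmd)
{
    const uint8_t *p = (const uint8_t *)cmd;
    for (unsigned i = 0; i < sizeof *cmd; i++) {
        port_write8(DMA_CMD_PORT, p[i]);
    }
    /* The controller then copies the packet from memory to the device
     * one byte at a time, as described in the steps above. */
}
```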




C:\UMBC\331\java>java envSnoop
-- listing properties --
java.specification.name=Java Platform API Specification
awt.toolkit=sun.awt.windows.WToolkit
java.version=1.2
java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment
user.timezone=America/New_York
java.specification.version=1.2
java.vm.vendor=Sun Microsystems Inc.
user.home=C:\WINDOWS
java.vm.specification.version=1.0
os.arch=x86
java.awt.fonts=
java.vendor.url=http://java.sun.com/
user.region=US
file.encoding.pkg=sun.io
java.home=C:\JDK1.2\JRE
java.class.path=C:\Program Files\PhotoDeluxe 2.0\Adob...
line.separator=
java.ext.dirs=C:\JDK1.2\JRE\lib\ext
java.io.tmpdir=C:\WINDOWS\TEMP\
os.name=Windows 95
java.vendor=Sun Microsystems Inc.
java.awt.printerjob=sun.awt.windows.WPrinterJob
java.library.path=C:\JDK1.2\BIN;.;C:\WINDOWS\SYSTEM;C:\...
java.vm.specification.vendor=Sun Microsystems Inc.
sun.io.unicode.encoding=UnicodeLittle
file.encoding=Cp1252
java.specification.vendor=Sun Microsystems Inc.
user.language=en
user.name=nicholas
java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...
java.vm.name=Classic VM
java.class.version=46.0
java.vm.specification.name=Java Virtual Machine Specification
sun.boot.library.path=C:\JDK1.2\JRE\bin
os.version=4.10
java.vm.version=1.2
java.vm.info=build JDK-1.2-V, native threads, symcjit
java.compiler=symcjit
path.separator=;
file.separator=\
user.dir=C:\UMBC\331\java
sun.boot.class.path=C:\JDK1.2\JRE\lib\rt.jar;C:\JDK1.2\JR...
user.name=nicholas
user.home=C:\WINDOWS
C:\UMBC\331\java>








[Flowchart: placement pathways from Advisement and Assessment through developmental math (DMA) into college math courses and programs – a Vocational/Diploma path (DMA 010-030, MAT 110) leading to diploma, vocational, and AAS programs such as Construction Tech, Computer Tech, and Electronics Tech; a Tech path (DMA 010-050, STEM; MAT 121, 122, 223) leading to AAS programs such as Engineering Tech and Architectural Tech; a Calculus path (DMA 010-080, STEM; MAT 171, 172, 271, 272, 273, 285, 263, or Statistics MAT 152) leading to AS degrees in Engineering, Science, and Mathematics, some requiring higher math courses; and a Quantitative path (DMA 010-050; Quantitative Literacy MAT 143 or Statistics MAT 152) leading to AA/AAS degrees and programs such as Business Transfer, Behavioral & Social Science, Communication, Health Science, Education, and Public Service.]




Parallel ATA or EIDE Drive Standards
• Transferring data between the hard drive and memory
  – Direct memory access (DMA) transfer mode
    • Transfers data directly from the drive to memory without involving the CPU
    • Seven DMA modes
  – Programmed Input/Output (PIO) transfer mode
    • Involves the CPU; slower than DMA mode
    • Five PIO modes are used by hard drives
  – Ultra DMA
    • Data is transferred twice for each clock beat, at the beginning and again at the end
A+ Guide to Managing & Maintaining Your PC, 8th Edition © Cengage Learning 2014
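The Ultra DMA point above (two transfers per clock beat) is just a doubling of the burst rate, which the small calculation below makes concrete. The bus width and strobe frequency used here are illustrative assumptions, not the parameters of a specific Ultra DMA mode.

```c
#include <stdio.h>

int main(void)
{
    const double bus_bytes = 2.0;    /* 16-bit parallel ATA data bus            */
    const double clock_mhz = 25.0;   /* assumed strobe frequency in MHz         */

    const double single_mb = bus_bytes * clock_mhz;   /* one transfer per beat  */
    const double double_mb = single_mb * 2.0;         /* both edges (Ultra DMA) */

    printf("single-edge: %.0f MB/s, double-edge (Ultra DMA): %.0f MB/s\n",
           single_mb, double_mb);
    return 0;
}
```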




DMA Controller Introduction
• A direct memory access (DMA) controller transfers data from one address to another, without CPU intervention, across the entire address range
  – Move data from the ADC12 conversion memory to RAM
  – Move data from RAM to the DAC12
• Devices that contain a DMA controller may have one, two, or three DMA channels available
• Using the DMA controller
  – Can increase the throughput of peripheral modules
  – Can reduce system power consumption by allowing the CPU to remain in a low-power mode, without having to wake up to move data to or from a peripheral
CPE 323
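A minimal sketch of configuring one MSP430 DMA channel for a memory-to-memory block transfer is shown below. Register and bit names follow the msp430.h convention for devices that include the DMA module; the buffers and sizes are made up, and device-specific details such as the ADC12 or DAC12 trigger selections are deliberately omitted, so treat this as an illustration rather than a drop-in driver.

```c
#include <msp430.h>
#include <stdint.h>

#define NSAMPLES 16
static uint16_t src[NSAMPLES];   /* e.g. a block of conversion results copied from ADC12 memory */
static uint16_t dst[NSAMPLES];   /* destination buffer in RAM                                    */

void dma_copy_block(void)
{
    DMACTL0 = DMA0TSEL_0;              /* channel 0: software trigger via the DMAREQ bit */

    DMA0SA  = (uintptr_t)src;          /* source address                                  */
    DMA0DA  = (uintptr_t)dst;          /* destination address                             */
    DMA0SZ  = NSAMPLES;                /* number of word transfers                        */

    DMA0CTL = DMADT_1                  /* block transfer mode                             */
            | DMASRCINCR_3             /* increment the source address                    */
            | DMADSTINCR_3             /* increment the destination address               */
            | DMAEN;                   /* enable the channel                              */

    DMA0CTL |= DMAREQ;                 /* start the block transfer; with a peripheral
                                          trigger the CPU could instead stay in a
                                          low-power mode while the DMA moves the data    */
}
```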