Overview of Mass Storage Structure (Cont.)

Magnetic tape
• Was early secondary-storage medium
• Relatively permanent and holds large quantities of data
• Access time slow
  • Random access ~1000 times slower than disk
• Mainly used for backup, storage of infrequently-used data, transfer medium between systems
• Kept in spool and wound or rewound past read-write head
• Once data under head, transfer rates comparable to disk
• 20-200 GB typical storage

Operating System Concepts with Java – 8th Edition 12.6 Silberschatz, Galvin and Gagne ©2009




Magnetic Tape
• Was early secondary-storage medium
• Evolved from open spools to cartridges
• Relatively permanent and holds large quantities of data
• Access time slow
  • Random access ~1000 times slower than disk
• Mainly used for backup, storage of infrequently-used data, transfer medium between systems
• Kept in spool and wound or rewound past read-write head
• Once data under head, transfer rates comparable to disk
  • 140 MB/sec and greater
• 200 GB to 1.5 TB typical storage
• Common technologies are LTO-{3,4,5} and T10000

Operating System Concepts Essentials – 8th Edition 11.9 Silberschatz, Galvin and Gagne ©2011
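At these densities the streaming numbers are worth sanity-checking: even at a disk-like sustained rate, reading a whole cartridge end to end takes hours. A back-of-the-envelope sketch in Java, assuming the slide's top figures (1.5 TB capacity, 140 MB/sec, decimal units):

public class TapeStreamTime {
    public static void main(String[] args) {
        double capacityBytes = 1.5e12;      // 1.5 TB cartridge (decimal TB, per the slide)
        double rateBytesPerSec = 140e6;     // 140 MB/sec sustained transfer rate
        double seconds = capacityBytes / rateBytesPerSec;
        System.out.printf("Full-cartridge read: %.0f s (~%.1f hours)%n",
                          seconds, seconds / 3600.0);
        // Prints roughly 10714 s, i.e. about 3 hours: streaming a full tape is
        // fast per byte, but repositioning (random access) still dominates any
        // workload that touches scattered blocks.
    }
}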








Producer Consumer Synchronized Circular Buffer

Produced 1 into cell 0    write 1 read 0   buffer: 1 -1 -1 -1 -1
Produced 2 into cell 1    write 2 read 0   buffer: 1  2 -1 -1 -1
Consumed 1 from cell 0    write 2 read 1   buffer: 1  2 -1 -1 -1
Produced 3 into cell 2    write 3 read 1   buffer: 1  2  3 -1 -1
Produced 4 into cell 3    write 4 read 1   buffer: 1  2  3  4 -1
Produced 5 into cell 4    write 0 read 1   buffer: 1  2  3  4  5
Produced 6 into cell 0    write 1 read 1   buffer: 6  2  3  4  5   BUFFER FULL
WAITING TO PRODUCE 7
Consumed 2 from cell 1    write 1 read 2   buffer: 6  2  3  4  5
Produced 7 into cell 1    write 2 read 2   buffer: 6  7  3  4  5   BUFFER FULL
WAITING TO PRODUCE 8
Consumed 3 from cell 2    write 2 read 3   buffer: 6  7  3  4  5
Produced 8 into cell 2    write 3 read 3   buffer: 6  7  8  4  5   BUFFER FULL
WAITING TO PRODUCE 9
Consumed 4 from cell 3    write 3 read 4   buffer: 6  7  8  4  5
Produced 9 into cell 3    write 4 read 4   buffer: 6  7  8  9  5   BUFFER FULL
WAITING TO PRODUCE 10
Consumed 5 from cell 4    write 4 read 0   buffer: 6  7  8  9  5
Produced 10 into cell 4   write 0 read 0   buffer: 6  7  8  9 10   BUFFER FULL
ProduceInteger finished producing values
Terminating ProduceInteger
Consumed 6 from cell 0    write 0 read 1   buffer: 6  7  8  9 10
Consumed 7 from cell 1    write 0 read 2   buffer: 6  7  8  9 10
Consumed 8 from cell 2    write 0 read 3   buffer: 6  7  8  9 10
Consumed 9 from cell 3    write 0 read 4   buffer: 6  7  8  9 10
Consumed 10 from cell 4   write 0 read 0   buffer: 6  7  8  9 10   BUFFER EMPTY
ConsumeInteger retrieved values totaling: 55
Terminating ConsumeInteger

Ref: http://userhome.brooklyn.cuny.edu/irudowdky/OperatingSystems.htm & Silberschatz, Gagne, & Galvin, Operating System Concepts, 7th ed., Wiley (ch. 1-3)
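The trace above is the output of a synchronized circular-buffer program; the program itself is not reproduced on this page. Below is a minimal Java sketch of the same idea, reconstructed from the trace: a 5-cell buffer guarded by wait/notifyAll. The names ProduceInteger/ConsumeInteger and the -1 empty marker come from the output; everything else is an assumption, not the original course code.

public class CircularBuffer {
    private final int[] buffer = {-1, -1, -1, -1, -1}; // 5 cells, -1 marks "empty"
    private int writeIndex = 0, readIndex = 0, occupied = 0;

    public synchronized void put(int value) throws InterruptedException {
        while (occupied == buffer.length) {            // BUFFER FULL: block the producer
            System.out.println("BUFFER FULL WAITING TO PRODUCE " + value);
            wait();
        }
        buffer[writeIndex] = value;
        System.out.println("Produced " + value + " into cell " + writeIndex);
        writeIndex = (writeIndex + 1) % buffer.length; // wrap around: circular buffer
        occupied++;
        notifyAll();                                   // wake a waiting consumer
    }

    public synchronized int get() throws InterruptedException {
        while (occupied == 0) {                        // BUFFER EMPTY: block the consumer
            wait();
        }
        int value = buffer[readIndex];
        System.out.println("Consumed " + value + " from cell " + readIndex);
        readIndex = (readIndex + 1) % buffer.length;
        occupied--;
        notifyAll();                                   // wake a waiting producer
        return value;
    }

    public static void main(String[] args) {
        CircularBuffer shared = new CircularBuffer();
        Thread producer = new Thread(() -> {           // ProduceInteger: produces 1..10
            try {
                for (int i = 1; i <= 10; i++) shared.put(i);
                System.out.println("ProduceInteger finished producing values");
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread consumer = new Thread(() -> {           // ConsumeInteger: sums what it reads
            try {
                int total = 0;
                for (int i = 0; i < 10; i++) total += shared.get();
                System.out.println("ConsumeInteger retrieved values totaling: " + total);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();
        consumer.start();
    }
}

The while-loop around each wait() is the essential detail: a woken thread must re-check its condition before proceeding, which is exactly why the trace shows repeated "BUFFER FULL WAITING TO PRODUCE" lines.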




Tapes
• Compared to a disk, a tape is less expensive and holds more data, but random access is much slower.
• Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data, holding huge volumes of data.
• Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library
  • stacker – library that holds a few tapes
  • silo – library that holds thousands of tapes
• A disk-resident file can be archived to tape for low-cost storage; the computer can stage it back into disk storage for active use.

Operating System Concepts Essentials – 8th Edition 11.44 Silberschatz, Galvin and Gagne ©2011




HDFS (Hadoop Distributed File System) is a distributed file system for commodity hardware. Its differences from other distributed file systems are few but significant: HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware, it provides high-throughput access to application data, and it is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project and is part of Apache Hadoop Core (http://hadoop.apache.org/core/).

2.1. Hardware Failure: Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data. The fact that there are many components, each with a non-trivial probability of failure, means that some component of HDFS is always non-functional. Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.

2.2. Streaming Data Access: Applications that run on HDFS need streaming access to their data sets. They are not the general-purpose applications that typically run on general-purpose file systems. HDFS is designed more for batch processing than for interactive use by users; the emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications targeted at HDFS, so POSIX semantics in a few key areas have been traded away to increase data throughput rates.

2.3. Large Data Sets: Applications on HDFS have large data sets, typically gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It provides high aggregate data bandwidth, scales to hundreds of nodes in a single cluster, and supports ~10 million files in a single instance.

2.4. Simple Coherency Model: HDFS applications need a write-once-read-many access model for files. A file, once created, written, and closed, need not be changed. This assumption simplifies data-coherency issues and enables high-throughput data access. A Map/Reduce application or a web-crawler application fits perfectly with this model. There is a plan to support appending writes to files in the future [write once, read many, at the file level].

2.5. "Moving Computation is Cheaper than Moving Data": A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge: it minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located than to move the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.

2.6. Portability Across Heterogeneous Hardware and Software Platforms: HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.

3. NameNode and DataNodes: HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes.

The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients; they also perform block creation, deletion, and replication upon instruction from the NameNode.

The NameNode and DataNode are pieces of software designed to run on commodity machines, which typically run the GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Use of the highly portable Java language means that HDFS can be deployed on a wide range of machines. A typical deployment has a dedicated machine that runs only the NameNode software; each of the other machines in the cluster runs one instance of the DataNode software. The architecture does not preclude running multiple DataNodes on the same machine, but in a real deployment that is rarely the case. The existence of a single NameNode in a cluster greatly simplifies the architecture of the system: the NameNode is the arbitrator and repository for all HDFS metadata, and the system is designed in such a way that user data never flows through the NameNode.

4. The File System Namespace: HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside these directories. The file system namespace hierarchy is similar to most other existing file systems: one can create and remove files, move a file from one directory to another, or rename a file. HDFS does not yet implement user quotas or access permissions, and it does not support hard links or soft links; however, the HDFS architecture does not preclude implementing these features. The NameNode maintains the file system namespace, and any change to the namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.

5. Data Replication: HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file: an application can specify the number of replicas of a file, and the replication factor can be specified at file creation time and changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster: receipt of a Heartbeat implies that the DataNode is functioning properly, and a Blockreport contains a list of all blocks on a DataNode.
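Because HDFS is built in Java, clients normally go through the org.apache.hadoop.fs.FileSystem API. A minimal sketch of the write-once-read-many flow described above (the path, file contents, and replication factor of 3 are illustrative assumptions, not taken from this text):

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteOnceReadMany {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);     // NameNode resolved from fs.defaultFS

        Path path = new Path("/tmp/example.txt"); // hypothetical path

        // Write once: the client streams block data to DataNodes; only the
        // metadata (namespace entry, block locations) involves the NameNode.
        try (FSDataOutputStream out = fs.create(path)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Replication factor is per file and can be changed after creation.
        fs.setReplication(path, (short) 3);

        // Read many: any client can re-read the closed, immutable file.
        byte[] buf = new byte[32];
        try (FSDataInputStream in = fs.open(path)) {
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}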




We have laid out (somewhat arbitrarily) 16 different combinations of order quantities for the two products (B2:Q3). Each of the columns from B to Q represents a model of one order-quantity strategy: every pairing of a Product 1 order in {700, 800, 900, 1000} with a Product 2 order in {900, 1000, 1100, 1200}.

Rows of the model (per strategy column): Strategy, Product 1 ordered, Product 2 ordered, Demand 1, Demand 2, units of each product sold at full price, units of each product sold at the refund price, full-price revenue, refund revenue, order cost, and profit.

Formulas (shown for strategy column B):
  1 sold full price:      =MIN(B2,B4)
  2 sold full price:      =MIN(B3,B5)
  1 sold at refund price: =MAX(0,B2-B4)
  2 sold at refund price: =MAX(0,B3-B5)
  Full-price revenue:     =SUMPRODUCT(B6:B7,$B21:$B22)
  Refund revenue:         =SUMPRODUCT(B8:B9,$D21:$D22)
  Order cost:             =SUMPRODUCT(B2:B3,$C21:$C22)
  Profit:                 =B10+B11-B12

Parameters (identical for both products): unit price $10.00, unit cost $7.50, unit refund value $2.50. Demand means are 1000 (Product 1) and 1200 (Product 2), with standard deviations 250 and 350 and correlation -0.3.

Example (strategy 1, against demands of 1000 and 1200): order 700 and 900, sell all of both at full price, refund nothing; full-price revenue $16,000, order cost $12,000, profit $4,000. Across strategies 1-16 the profits range from $4,000 up to $5,500 (strategy 16: order 1000 and 1200).

Decision Models -- Prof. Juran 27
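The per-strategy profit calculation is easy to check outside the spreadsheet. A minimal Java sketch of the same formulas (MIN for full-price sales, MAX(0, ...) for leftover units, SUMPRODUCT-style revenue and cost), using strategy 1's numbers as the worked example:

public class OrderStrategyProfit {
    static final double PRICE = 10.00, COST = 7.50, REFUND = 2.50; // same for both products

    // Mirrors the spreadsheet row by row.
    static double profit(int ordered1, int ordered2, int demand1, int demand2) {
        int sold1 = Math.min(ordered1, demand1);                  // =MIN(B2,B4)
        int sold2 = Math.min(ordered2, demand2);                  // =MIN(B3,B5)
        int refunded1 = Math.max(0, ordered1 - demand1);          // =MAX(0,B2-B4)
        int refunded2 = Math.max(0, ordered2 - demand2);          // =MAX(0,B3-B5)
        double fullPriceRevenue = PRICE * (sold1 + sold2);        // =SUMPRODUCT(B6:B7,$B21:$B22)
        double refundRevenue = REFUND * (refunded1 + refunded2);  // =SUMPRODUCT(B8:B9,$D21:$D22)
        double orderCost = COST * (ordered1 + ordered2);          // =SUMPRODUCT(B2:B3,$C21:$C22)
        return fullPriceRevenue + refundRevenue - orderCost;      // =B10+B11-B12
    }

    public static void main(String[] args) {
        // Strategy 1: order 700 and 900 against demands 1000 and 1200 -> $4,000.
        System.out.printf("Strategy 1 profit: $%,.0f%n", profit(700, 900, 1000, 1200));
    }
}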




Speed (Cont.)
• Access latency – amount of time needed to locate data
• Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; < 35 milliseconds
• Access on tape requires winding the tape reels until the selected block reaches the tape head; tens or hundreds of seconds
• Generally say that random access within a tape cartridge is about a thousand times slower than random access on disk (e.g., 1000 × 35 ms ≈ 35 seconds, consistent with the winding times above)
• The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives
• A removable library is best devoted to the storage of infrequently used data, because the library can only satisfy a relatively small number of I/O requests per hour

Operating System Concepts Essentials – 8th Edition 11.51 Silberschatz, Galvin and Gagne ©2011




Magnetic Tape
• Early secondary-storage medium of choice
• Persistent, inexpensive, and has large data capacity
• Very slow access due to sequential nature
• Used for backup and for storing infrequently-used data
• Kept on spools
• Transfer rates comparable to disk once the read-write head is positioned at the data
• 20-200 GB are typical storage capacities




Overview of Mass Storage Structure
• Magnetic disks provide bulk of secondary storage of modern computers
  • Drives rotate at 60 to 200 times per second
  • Transfer rate is rate at which data flow between drive and computer
  • Positioning time (random-access time) is time to move disk arm to desired cylinder (seek time) and time for desired sector to rotate under the disk head (rotational latency)
  • Head crash results from disk head making contact with the disk surface
    • That's bad
• Disks can be removable
• Drive attached to computer via I/O bus
  • Busses vary, including EIDE, ATA, SATA, USB, Fibre Channel, SCSI
  • Host controller in computer uses bus to talk to disk controller built into drive or storage array

Operating System Concepts with Java – 8th Edition 12.4 Silberschatz, Galvin and Gagne ©2009
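The rotation speeds above pin down the rotational-latency term of positioning time: on average the desired sector is half a revolution away from the head, so average rotational latency is 0.5 / (rotations per second). A minimal Java sketch of that arithmetic, using the slide's range endpoints:

public class RotationalLatency {
    // Average rotational latency: the desired sector is, on average, half a
    // revolution away, so latency = 0.5 / (rotations per second).
    static double avgLatencyMillis(double rotationsPerSecond) {
        return 0.5 / rotationsPerSecond * 1000.0;
    }

    public static void main(String[] args) {
        for (double rps : new double[] {60, 200}) { // the slide's range endpoints
            System.out.printf("%.0f rot/s (%.0f RPM): avg rotational latency %.2f ms%n",
                              rps, rps * 60, avgLatencyMillis(rps));
        }
        // 60 rot/s (3600 RPM) -> 8.33 ms; 200 rot/s (12000 RPM) -> 2.50 ms.
        // Seek time adds to this to give the total positioning time.
    }
}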




Not All 20 Point Fonts Are Equal
[Slide demo: the same "Can You Read" sample lines (labeled A through R) repeated at 20-, 16-, 14-, and 12-point sizes so their readability can be compared side by side.]
My students tell me that they like the readability of the Arial font. I never use fonts smaller than 20 point for lecture.




Overview of Mass Storage Structure
• Magnetic disks provide bulk of secondary storage of modern computers
  • Drives rotate at 60 to 250 times per second
  • Transfer rate is rate at which data flow between drive and computer
  • Positioning time (random-access time) is time to move disk arm to desired cylinder (seek time) and time for desired sector to rotate under the disk head (rotational latency)
  • Head crash results from disk head making contact with the disk surface
    • That's bad
• Disks can be removable
• Drive attached to computer via I/O bus
  • Busses vary, including EIDE, ATA, SATA, USB, Fibre Channel, SCSI, SAS, Firewire
  • Host controller in computer uses bus to talk to disk controller built into drive or storage array

Operating System Concepts Essentials – 8th Edition 11.4 Silberschatz, Galvin and Gagne ©2011



