Functions are not Methods. Methods are not Functions.




Conference session schedule. Rooms: Barbie Tootle, Hayes Cape, Cartoon Room I, Cartoon Room II, Suzanne M. Scharer, Rosa M. Ailabouni (sessions in each time slot are listed in room order).

Monday Aug 5th, 2013, 10:10-11:50
- A1L-A Analog Circuits I. Chr: Ming Gu, Shantanu Chakrabartty. Track: Analog and Mixed Signal Integrated Circuits
- A1L-B Low Power Digital Circuit Design Techniques. Chr: Joanne Degroat. Track: Digital Integrated Circuits, SoC and NoC
- A1L-C Student Contest I. Chr: Mohammed Ismail. Track: INVITED ONLY
- A1L-D Design and Analysis for Power Systems and Power Electronics. Chr: Hoi Lee, Ayman Fayed. Track: Power Systems and Power Electronics
- A1L-E Design and Analysis of Linear and Non-Linear Systems. Chr: Samuel Palermo. Track: Linear and Non-linear Circuits and Systems
- A1L-F Emerging Technologies. Chr: Khaled Salama. Track: Emerging Technologies

Monday Aug 5th, 2013, 13:10-14:50
- A2L-A Analog Circuits II. Chr: Ming Gu, Shantanu Chakrabartty. Track: Analog and Mixed Signal Integrated Circuits
- A2L-B Low Power VLSI Design Methodology. Chr: Genevieve Sapijaszko. Track: Digital Integrated Circuits, SoC and NoC
- A2L-C Student Contest II. Chr: Sleiman Bou-Sleiman. Track: INVITED ONLY
- A2L-D Power Management and Energy Harvesting. Chr: Ayman Fayed, Hoi Lee. Track: Power Management and Energy Harvesting
- A2L-E Oscillators and Chaotic Systems. Chr: Samuel Palermo, Warsame Ali. Track: Linear and Non-linear Circuits and Systems
- A2L-F Bioengineering Systems. Chr: Khaled Salama. Track: Bioengineering Systems and Bio Chips

Monday Aug 5th, 2013, 16:00-17:40
- A4L-A Analog Design Techniques I. Chr: Dong Ha. Track: Analog and Mixed Signal Integrated Circuits
- A4L-B Imaging and Wireless Sensors. Chr: Igor Filanovsky. Track: Analog and Mixed Signal Integrated Circuits
- A4L-C Special Session: Characterization of Nano Materials and Circuits. Chr: Nayla El-Kork. Track: SPECIAL SESSION
- A4L-D Special Session: Power Management and Energy Harvesting. Chr: Paul Furth. Track: SPECIAL SESSION
- A4L-E Communication and Signal Processing Circuits. Chr: Samuel Palermo. Track: Linear and Non-linear Circuits and Systems
- A4L-F Sensing and Measurement of Biological Signals. Chr: Hoda Abdel-Aty-Zohdy. Track: Bioengineering Systems and Bio Chips

Tuesday Aug 6th, 2013, 10:10-11:50
- B2L-A Analog Design Techniques II. Chr: Valencia Koomson. Track: Analog and Mixed Signal Integrated Circuits
- B2L-B VLSI Design Reliability. Chr: Shantanu Chakrabartty, Gursharan Reehal. Track: Digital Integrated Circuits, SoC and NoC
- B2L-C Delta-Sigma Modulators. Chr: Vishal Saxena. Track: Analog and Mixed Signal Integrated Circuits
- B2L-D Special Session: University and Industry Training in the Art of Electronics. Chr: Steven Bibyk. Track: SPECIAL SESSION
- B2L-E Radio Frequency Integrated Circuits. Chr: Nathan Neihart, Mona Hella. Track: RFICs, Microwave, and Optical Systems
- B2L-F Bio-inspired Green Technologies. Chr: Hoda Abdel-Aty-Zohdy. Track: Bio-inspired Green Technologies

Tuesday Aug 6th, 2013, 13:10-14:50
- B3L-A Analog Design Techniques III. Chr: Valencia Koomson. Track: Analog and Mixed Signal Integrated Circuits
- B3L-B VLSI Design, Routing, and Testing. Chr: Nader Rafla. Track: Programmable Logic, VLSI, CAD and Layout
- B3L-C Special Session: High-Precision and High-Speed Data Converters I. Chr: Samuel Palermo. Track: SPECIAL SESSION
- B3L-D Special Session: Advancing the Frontiers of Solar Energy. Chr: Michael Soderstrand. Track: SPECIAL SESSION
- B3L-E RF/Optical Devices and Circuits. Chr: Mona Hella, Nathan Neihart. Track: RFICs, Microwave, and Optical Systems
- B3L-F Carbon Nanotube-based Sensors and Beyond. Chr: Nayla El-Kork. Track: Nanoelectronics and Nanotechnology

Tuesday Aug 6th, 2013, 16:00-17:40
- B5L-A Nyquist-Rate Data Converters. Chr: Vishal Saxena. Track: Analog and Mixed Signal Integrated Circuits
- B5L-B Digital Circuits. Chr: Nader Rafla. Track: Programmable Logic, VLSI, CAD and Layout
- B5L-C Special Session: High-Precision and High-Speed Data Converters II. Chr: Samuel Palermo. Track: SPECIAL SESSION
- B5L-D Special Session: RF-FPGA Circuits and Systems for Enhancing Access to Radio Spectrum (CAS-EARS). Chr: Arjuna Madanayake, Vijay Devabhaktuni. Track: SPECIAL SESSION
- B5L-E Analog and RF Circuit Techniques. Chr: Igor Filanovsky. Track: Analog and Mixed Signal Integrated Circuits
- B5L-F Memristors, DG-MOSFETs and Graphene FETs. Chr: Reyad El-Khazali. Track: Nanoelectronics and Nanotechnology

Wednesday Aug 7th, 2013, 10:10-11:50
- C2L-A Phase Locked Loops. Chr: Chung-Chih Hung. Track: Analog and Mixed Signal Integrated Circuits
- C2L-B Computer Arithmetic and Cryptography. Chr: George Purdy. Track: Programmable Logic, VLSI, CAD and Layout
- C2L-C Special Session: Reversible Computing. Chr: Himanshu Thapliyal. Track: SPECIAL SESSION
- C2L-D Special Session: Self-healing and Self-Adaptive Circuits and Systems. Chr: Abhilash Goyal, Abhijit Chatterjee. Track: SPECIAL SESSION
- C2L-E Digital Signal Processing-Media and Control. Chr: Wasfy Mikhael, Steven Bibyk. Track: Digital Signal Processing
- C2L-F Advances in Communications and Wireless Systems. Chr: Sami Muhaidat. Track: Communication and Wireless Systems

Wednesday Aug 7th, 2013, 13:10-14:50
- C3L-A SAR Analog-to-Digital Converters. Chr: Vishal Saxena. Track: Analog and Mixed Signal Integrated Circuits
- C3L-B Real Time Systems. Chr: Brian Dupaix, Abhilash Goyal. Track: System Architectures
- C3L-C Image Processing and Interpretation. Chr: Annajirao Garimella. Track: Image Processing and Multimedia Systems
- C3L-D Special Session: Verification and Trusted Mixed Signal Electronics Development. Chr: Greg Creech, Steven Bibyk. Track: SPECIAL SESSION
- C3L-E Digital Signal Processing I. Chr: Ying Liu. Track: Digital Signal Processing
- C3L-F Wireless Systems I. Chr: Sami Muhaidat. Track: Communication and Wireless Systems

Wednesday Aug 7th, 2013, 16:00-17:40
- C5L-A Wireless Systems II. Chr: Sami Muhaidat. Track: Communication and Wireless Systems
- C5L-B System Architectures. Chr: Swarup Bhunia, Abhilash Goyal. Track: System Architectures
- C5L-C Image Embedding Compression and Analysis. Chr: Annajirao Garimella. Track: Image Processing and Multimedia Systems
- C5L-D Low Power Datapath Design. Chr: Wasfy Mikhael. Track: Digital Integrated Circuits, SoC and NoC
- C5L-E Digital Signal Processing II. Chr: Moataz AbdelWahab. Track: Digital Signal Processing
- C5L-F Advances in Control Systems, Mechatronics, and Robotics. Chr: Charna Parkey, Genevieve Sapijaszko. Track: Control Systems, Mechatronics, and Robotics




Blocks on Blocks on Blocks! Dynamically Rearranging Synteny Blocks in Comparative Genomes. Nick Egan’s Final Project Presentation for BIO 131 Intro to Computational Biology, taught by Anna Ritz.




HDFS (Hadoop Distributed File System) is a distributed file system designed to run on commodity hardware. Its differences from other distributed file systems are few but significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project and is part of Apache Hadoop Core (http://hadoop.apache.org/core/).

2.1. Hardware Failure: Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system’s data. The fact that there are many components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.

2.2. Streaming Data Access: Applications that run on HDFS need streaming access to their data sets. They are not general-purpose applications that typically run on general-purpose file systems. HDFS is designed more for batch processing than for interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications targeted for HDFS. POSIX semantics in a few key areas have been traded to increase data throughput rates.

2.3. Large Data Sets: Applications on HDFS have large data sets, typically gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It provides high aggregate data bandwidth and scales to hundreds of nodes in a single cluster. It supports roughly 10 million files in a single instance.

2.4. Simple Coherency Model: HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high-throughput data access. A Map/Reduce application or a web crawler application fits perfectly with this model. There is a plan to support appending writes to files in the future (write-once-read-many at the file level).

2.5. “Moving Computation is Cheaper than Moving Data”: A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. Executing computation near the data minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located than to move the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.

2.6. Portability Across Heterogeneous Hardware and Software Platforms: HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.

3. NameNode and DataNodes: HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks stored in a set of DataNodes.

The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.

The NameNode and DataNode are pieces of software designed to run on commodity machines, which typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the highly portable Java language means that HDFS can be deployed on a wide range of machines. A typical deployment has a dedicated machine that runs only the NameNode software. Each of the other machines in the cluster runs one instance of the DataNode software. The architecture does not preclude running multiple DataNodes on the same machine, but in a real deployment that is rarely the case. The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the NameNode.

4. The File System Namespace: HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside these directories. The file system namespace hierarchy is similar to most other existing file systems; one can create and remove files, move a file from one directory to another, or rename a file. HDFS does not yet implement user quotas or access permissions. HDFS does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features. The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.

5. Data Replication: HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.
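To make the client's view of this concrete, here is a minimal sketch (not from the HDFS guide itself) that creates a write-once file with an explicit per-file replication factor through Hadoop's Java FileSystem API; the NameNode address hdfs://namenode:8020 is a placeholder for a real cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client talks to the single NameNode for all namespace operations;
        // the data itself flows to DataNodes, never through the NameNode.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path p = new Path("/demo/data.txt");            // hypothetical path
        short replication = 3;                          // per-file setting
        try (FSDataOutputStream out = fs.create(p, replication)) {
            out.writeUTF("write-once data");
        }
        // The replication factor can also be changed after creation;
        // the NameNode then schedules re-replication as needed.
        fs.setReplication(p, (short) 2);
    }
}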




Kernel Data Structures
• Kernel keeps state info for I/O components, including open file tables, network connections, character device state
• Many, many complex data structures to track buffers, memory allocation, “dirty” blocks
• Some use object-oriented methods and message passing to implement I/O
  - Windows uses message passing
  - Message with I/O information passed from user mode into kernel
  - Message modified as it flows through to device driver and back to process
  - Pros / cons?
Operating System Concepts – 9th Edition 13.33 Silberschatz, Galvin and Gagne ©2013
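The message-passing flow in the last bullets can be sketched abstractly. The queue-based types below are invented for illustration and are not the Windows I/O manager API; they only show a request message crossing into the "kernel", being modified by a "driver", and returning to the process.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessagePassingIO {
    record IoMessage(String op, byte[] payload, int status) {}

    public static void main(String[] args) throws Exception {
        BlockingQueue<IoMessage> toKernel = new ArrayBlockingQueue<>(16);
        BlockingQueue<IoMessage> toUser = new ArrayBlockingQueue<>(16);

        // Device-driver side: take the message, service it, send it back modified.
        Thread driver = new Thread(() -> {
            try {
                IoMessage req = toKernel.take();
                toUser.put(new IoMessage(req.op(), "device data".getBytes(), 0));
            } catch (InterruptedException ignored) { }
        });
        driver.start();

        // User-mode side: pass an I/O request message into the kernel...
        toKernel.put(new IoMessage("read", new byte[0], -1));
        // ...and receive the completed, modified message back.
        IoMessage reply = toUser.take();
        System.out.println(new String(reply.payload()) + ", status=" + reply.status());
    }
}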




Kernel Data Structures
• Kernel keeps state info for I/O components, including open file tables, network connections, character device state
• Many, many complex data structures to track buffers, memory allocation, “dirty” blocks
• Some use object-oriented methods and message passing to implement I/O
Operating System Concepts with Java – 8th Edition 12.48 Silberschatz, Galvin and Gagne ©2009




Kernel Data Structures
• Kernel keeps state info for I/O components, including open file tables, network connections, character device state
• Many, many complex data structures to track buffers, memory allocation, “dirty” blocks
• Some use object-oriented methods and message passing to implement I/O
Operating System Concepts – 8th Edition 13.28 Silberschatz, Galvin and Gagne ©2009




6. The Persistence of File System Metadata: The HDFS namespace is stored by the NameNode. The NameNode uses a transaction log called the EditLog to persistently record every change that occurs to file system metadata. For example, creating a new file in HDFS causes the NameNode to insert a record into the EditLog indicating this. Similarly, changing the replication factor of a file causes a new record to be inserted into the EditLog. The NameNode uses a file in its local host OS file system to store the EditLog. The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called the FsImage. The FsImage is stored as a file in the NameNode’s local file system too.

The NameNode keeps an image of the entire file system namespace and file Blockmap in memory. This key metadata item is designed to be compact, such that a NameNode with 4 GB of RAM is plenty to support a huge number of files and directories. When the NameNode starts up, it reads the FsImage and EditLog from disk, applies all the transactions from the EditLog to the in-memory representation of the FsImage, and flushes out this new version into a new FsImage on disk. It can then truncate the old EditLog because its transactions have been applied to the persistent FsImage. This process is called a checkpoint. In the current implementation, a checkpoint only occurs when the NameNode starts up. Work is in progress to support periodic checkpointing in the near future.

The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge about HDFS files. It stores each block of HDFS data in a separate file in its local file system. The DataNode does not create all files in the same directory. Instead, it uses a heuristic to determine the optimal number of files per directory and creates subdirectories appropriately. It is not optimal to create all local files in the same directory because the local file system might not be able to efficiently support a huge number of files in a single directory. When a DataNode starts up, it scans through its local file system, generates a list of all HDFS data blocks that correspond to each of these local files, and sends this report to the NameNode: this is the Blockreport.

7. The Communication Protocols: All HDFS communication protocols are layered on top of the TCP/IP protocol. A client establishes a connection to a configurable TCP port on the NameNode machine and talks the ClientProtocol with the NameNode. The DataNodes talk to the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) abstraction wraps both the ClientProtocol and the DataNode Protocol. By design, the NameNode never initiates any RPCs. Instead, it only responds to RPC requests issued by DataNodes or clients.

8. Robustness: The primary objective of HDFS is to store data reliably even in the presence of failures. The three common types of failures are NameNode failures, DataNode failures, and network partitions.

8.1. Data Disk Failure, Heartbeats and Re-Replication: Each DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new I/O requests to them. Any data that was registered to a dead DataNode is no longer available to HDFS. DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise for many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.

8.2. Cluster Rebalancing: The HDFS architecture is compatible with data rebalancing schemes. A scheme might automatically move data from one DataNode to another if the free space on a DataNode falls below a certain threshold. In the event of a sudden high demand for a particular file, a scheme might dynamically create additional replicas and rebalance other data in the cluster. These types of data rebalancing schemes are not yet implemented.

8.3. Data Integrity: It is possible that a block of data fetched from a DataNode arrives corrupted. This corruption can occur because of faults in a storage device, network faults, or buggy software. The HDFS client software implements checksum checking on the contents of HDFS files. When a client creates an HDFS file, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace. When a client retrieves file contents, it verifies that the data it received from each DataNode matches the checksum stored in the associated checksum file. If not, the client can opt to retrieve that block from another DataNode that has a replica of that block.
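The client-side verification in 8.3 amounts to comparing a stored digest against a recomputed one. A minimal sketch of the idea follows, assuming CRC32 as the checksum; HDFS's actual checksum format and chunk size are configurable and not specified here.

import java.util.zip.CRC32;

public class BlockVerify {
    // Compute a CRC32 digest over a block's contents (illustrative choice).
    static long checksum(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block);
        return crc.getValue();
    }

    // Return the first replica whose content matches the expected checksum,
    // mirroring the fall-back-to-another-DataNode behavior described above.
    static byte[] readVerified(byte[][] replicas, long expected) {
        for (byte[] replica : replicas) {
            if (checksum(replica) == expected) {
                return replica;           // good copy
            }
            // corrupted replica: try the next DataNode's copy
        }
        throw new IllegalStateException("all replicas failed checksum");
    }
}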




Comparison with other methods

Recently, Tjong and Zhou (2007) developed a neural network method for predicting DNA-binding sites. In their method, for each surface residue, the PSSM and solvent accessibilities of the residue and its 14 neighbors were used as input to a neural network in the form of vectors. In their publication, Tjong and Zhou showed that their method achieved better performance than other previously published methods. In the current study, the 13 test proteins were obtained from the study of Tjong and Zhou. Thus, we can compare the method proposed in the current study with Tjong and Zhou’s neural network method using the 13 proteins.

Figure 1. Tradeoff between coverage and accuracy

In their publication, Tjong and Zhou also used coverage and accuracy to evaluate the predictions. However, they defined accuracy using a loosened criterion of “true positive”, such that if a predicted interface residue is within the four nearest neighbors of an actual interface residue, it is counted as a true positive. Here, in the comparison of the two methods, the strict definition of true positive is used, i.e., a predicted interface residue is counted as a true positive only when it is a true interface residue. The original data were obtained from Table 1 of Tjong and Zhou (2007), and the accuracy of the neural network method was recalculated using this strict definition (Table 3). The coverage of the neural network method was taken directly from Tjong and Zhou (2007).

For each protein, Tjong and Zhou’s method reported one coverage and one accuracy. In contrast, the method proposed in this study allows users to trade off between coverage and accuracy based on their actual needs. For the purpose of comparison, for each test protein, top-ranking patches are added to the set of predicted interface residues one by one, in decreasing order of rank, until the coverage is the same as or higher than the coverage that the neural network method achieved on that protein. Then the coverage and accuracy of the two methods are compared. On a test protein, method A is better than method B if accuracy(A) > accuracy(B) and coverage(A) ≥ coverage(B).

Table 3 shows that the graph kernel method proposed in this study achieves better results than the neural network method on 7 proteins (in bold font in Table 3). On 4 proteins (shown in gray shading in Table 3), the neural network method is better than the graph kernel method. On the remaining 2 proteins (in italic font in Table 3), no conclusion can be drawn because the two conditions, accuracy(A) > accuracy(B) and coverage(A) ≥ coverage(B), never become true at the same time: when coverage(graph kernel) > coverage(neural network), we have accuracy(graph kernel) < accuracy(neural network), and when coverage(graph kernel) < coverage(neural network), we have accuracy(graph kernel) > accuracy(neural network). Note that the coverage of the graph kernel method increases in a discontinuous fashion as more patches are used to predict DNA-binding sites. On these two proteins, we were not able to reach a point where the two methods have identical coverage. Given this situation, we consider that the two methods tie on these 2 proteins. Thus, these comparisons show that the graph kernel method achieves better results than the neural network method on 7 of the 13 proteins (in bold font in Table 3) and ties with it on the remaining 2 (in italic font in Table 3). When averaged over the 13 proteins, the coverage and accuracy of the graph kernel method are 59% and 64%, respectively.

It is worth pointing out that, in the current study, the predictions are made using protein structures that are unbound with DNA. In contrast, the data we obtained from Tjong and Zhou’s study were obtained using protein structures bound with DNA. In their study, Tjong and Zhou showed that when unbound structures were used, the average coverage decreased by 6.3% and the average accuracy by 4.7% for the 14 proteins (the data for each protein were not shown).
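Under the strict definition used above, coverage and accuracy reduce to recall and precision over interface residues. A small sketch of that computation follows; the class and method names are invented for illustration, since the text defines the quantities only implicitly.

import java.util.Set;

public class StrictMetrics {
    // coverage = true positives / actual interface residues (recall)
    // accuracy = true positives / predicted interface residues (precision)
    static double[] coverageAndAccuracy(Set<Integer> predicted, Set<Integer> actual) {
        long tp = predicted.stream().filter(actual::contains).count();
        double coverage = (double) tp / actual.size();
        double accuracy = (double) tp / predicted.size();
        return new double[] { coverage, accuracy };
    }
}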




C:\UMBC\331\java>java envSnoop
-- listing properties --
java.specification.name=Java Platform API Specification
awt.toolkit=sun.awt.windows.WToolkit
java.version=1.2
java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment
user.timezone=America/New_York
java.specification.version=1.2
java.vm.vendor=Sun Microsystems Inc.
user.home=C:\WINDOWS
java.vm.specification.version=1.0
os.arch=x86
java.awt.fonts=
java.vendor.url=http://java.sun.com/
user.region=US
file.encoding.pkg=sun.io
java.home=C:\JDK1.2\JRE
java.class.path=C:\Program Files\PhotoDeluxe 2.0\Adob...
line.separator=
java.ext.dirs=C:\JDK1.2\JRE\lib\ext
java.io.tmpdir=C:\WINDOWS\TEMP\
os.name=Windows 95
java.vendor=Sun Microsystems Inc.
java.awt.printerjob=sun.awt.windows.WPrinterJob
java.library.path=C:\JDK1.2\BIN;.;C:\WINDOWS\SYSTEM;C:\...
java.vm.specification.vendor=Sun Microsystems Inc.
sun.io.unicode.encoding=UnicodeLittle
file.encoding=Cp1252
java.specification.vendor=Sun Microsystems Inc.
user.language=en
user.name=nicholas
java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...
java.vm.name=Classic VM
java.class.version=46.0
java.vm.specification.name=Java Virtual Machine Specification
sun.boot.library.path=C:\JDK1.2\JRE\bin
os.version=4.10
java.vm.version=1.2
java.vm.info=build JDK-1.2-V, native threads, symcjit
java.compiler=symcjit
path.separator=;
file.separator=\
user.dir=C:\UMBC\331\java
sun.boot.class.path=C:\JDK1.2\JRE\lib\rt.jar;C:\JDK1.2\JR...

C:\UMBC\331\java>
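The envSnoop source is not shown on the slide, but a program of roughly this shape would produce the listing above: java.util.Properties.list() prints the "-- listing properties --" header and truncates long values with "..." exactly as seen.

// Assumed equivalent of the envSnoop class run above (the real source
// is not shown on the slide).
public class envSnoop {
    public static void main(String[] args) {
        // Dump all JVM system properties as key=value lines.
        System.getProperties().list(System.out);
    }
}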








Event Reconstruction - Reconstruction of deleted files.
• In most file systems, file deletion does not erase the information stored in the file. Instead, the file entry and the data blocks used by the file are marked as unallocated, so that they can be reused later for another file. Thus, unless the data blocks and the deleted file entry have been reallocated to another file, the deleted file can usually be recovered by restoring its file entry and data blocks to active status.
• Even if the file entry and some of the data blocks have been re-allocated, it may still be possible to reconstruct parts of the file. The lazarus tool, for example, uses several heuristics to find and piece together blocks that (could have) once belonged to a file. Lazarus uses heuristics about file systems and common file formats:
  - In most file systems, a file begins at the beginning of a disk block;
  - Most file systems write a file into contiguous blocks, if possible;
  - Most file formats have a distinguishing pattern of bytes near the beginning of the file;
  - For most file formats, the same type of data is stored in all blocks of a file.
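The "distinguishing pattern of bytes" heuristic is easy to sketch. The fragment below is illustrative only (it is not lazarus): it scans a raw image for blocks that begin with the JPEG signature, where the 4 KB block size and the choice of JPEG are assumptions for the example.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CarveScan {
    static final int BLOCK = 4096;  // assumed block size

    public static void main(String[] args) throws IOException {
        // args[0]: path to a raw disk image
        byte[] disk = Files.readAllBytes(Paths.get(args[0]));
        for (int off = 0; off + 3 <= disk.length; off += BLOCK) {
            // JPEG files begin with the bytes FF D8 FF
            if ((disk[off] & 0xFF) == 0xFF && (disk[off + 1] & 0xFF) == 0xD8
                    && (disk[off + 2] & 0xFF) == 0xFF) {
                System.out.printf("possible JPEG header at block %d%n", off / BLOCK);
            }
        }
    }
}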




The Linux Ext2fs File System
• Ext2fs uses a mechanism similar to that of BSD Fast File System (ffs) for locating data blocks belonging to a specific file
• The main differences between ext2fs and ffs concern their disk allocation policies
  - In ffs, the disk is allocated to files in blocks of 8Kb, with blocks being subdivided into fragments of 1Kb to store small files or partially filled blocks at the end of a file
  - Ext2fs does not use fragments; it performs its allocations in smaller units
    • The default block size on ext2fs is 1Kb, although 2Kb and 4Kb blocks are also supported
  - Ext2fs uses allocation policies designed to place logically adjacent blocks of a file into physically adjacent blocks on disk, so that it can submit an I/O request for several disk blocks as a single operation
Operating System Concepts – 8th Edition 21.50 Silberschatz, Galvin and Gagne ©2009
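A small worked example of the two allocation schemes (the 10,000-byte file size is invented for illustration; the 8Kb block, 1Kb fragment, and 1Kb block figures come from the slide). With 1Kb fragments the tail waste happens to be identical; the practical difference between the schemes lies in how each allocator keeps a file's blocks physically adjacent.

public class AllocDemo {
    public static void main(String[] args) {
        final int FFS_BLOCK = 8 * 1024, FFS_FRAG = 1024, EXT2_BLOCK = 1024;
        long fileSize = 10_000;  // hypothetical file

        // ffs: whole 8Kb blocks, then 1Kb fragments for the tail
        long fullBlocks = fileSize / FFS_BLOCK;          // 1
        long tail = fileSize % FFS_BLOCK;                // 1808 bytes
        long frags = (tail + FFS_FRAG - 1) / FFS_FRAG;   // 2 fragments
        long ffsBytes = fullBlocks * FFS_BLOCK + frags * FFS_FRAG;

        // ext2fs: uniform 1Kb blocks, no fragments
        long ext2Blocks = (fileSize + EXT2_BLOCK - 1) / EXT2_BLOCK;  // 10 blocks

        System.out.println("ffs allocates:  " + ffsBytes + " bytes");            // 10240
        System.out.println("ext2 allocates: " + ext2Blocks * EXT2_BLOCK + " bytes"); // 10240
    }
}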




Character Devices
• A device driver which does not offer random access to fixed blocks of data
• A character device driver must register a set of functions which implement the driver’s various file I/O operations
• The kernel performs almost no preprocessing of a file read or write request to a character device, but simply passes on the request to the device
• The main exception to this rule is the special subset of character device drivers which implement terminal devices, for which the kernel maintains a standard interface
Operating System Concepts – 8th Edition 21.56 Silberschatz, Galvin and Gagne ©2009




A list of the different modes of opening a file:

wb+  Opens a file for both writing and reading in binary format. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

a    Opens a file for appending. The file pointer is at the end of the file if the file exists; that is, the file is in the append mode. If the file does not exist, it creates a new file for writing.

ab   Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists; that is, the file is in the append mode. If the file does not exist, it creates a new file for writing.

a+   Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.

ab+  Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
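For comparison, a rough Java analogue of the binary append mode "ab" is sketched below. This is only a loose mapping: Java NIO open options approximate these mode strings, and Java disallows READ together with APPEND on one channel, so "a+"/"ab+" have no direct single-channel equivalent.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendModes {
    public static void main(String[] args) throws IOException {
        Path p = Path.of("log.bin");  // hypothetical file
        // Like "ab": create the file if it does not exist, write in binary,
        // with every write positioned at the current end of the file.
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
        }
    }
}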




8.4. Metadata Disk Failure: The FsImage and EditLog are central data structures. A corruption of these files can cause the HDFS instance to be non-functional. For this reason, the NameNode can be configured to support maintaining multiple copies of the FsImage and EditLog. Any update to either the FsImage or EditLog causes each of the FsImages and EditLogs to get updated synchronously. This synchronous updating of multiple copies of the FsImage and EditLog may degrade the rate of namespace transactions per second that a NameNode can support. However, this degradation is acceptable because even though HDFS applications are very data intensive in nature, they are not metadata intensive. When a NameNode restarts, it selects the latest consistent FsImage and EditLog to use. The NameNode machine is a single point of failure for an HDFS cluster. If the NameNode machine fails, manual intervention is necessary. Currently, automatic restart and failover of the NameNode software to another machine is not supported.

8.5. Snapshots: Snapshots support storing a copy of data at a particular instant of time. One usage of the snapshot feature may be to roll back a corrupted HDFS instance to a previously known good point in time. HDFS does not currently support snapshots but will in a future release.

9. Data Organization

9.1. Data Blocks: HDFS is designed to support very large files. Applications that are compatible with HDFS are those that deal with large data sets. These applications write their data only once but read it one or more times, and they require these reads to be satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files. A typical block size used by HDFS is 64 MB. Thus, an HDFS file is chopped up into 64 MB chunks, and if possible, each chunk will reside on a different DataNode.

9.2. Staging: A client request to create a file does not reach the NameNode immediately. Instead, the HDFS client initially caches the file data in a temporary local file. Application writes are transparently redirected to this temporary local file. When the local file accumulates more than one HDFS block of data, the client contacts the NameNode. The NameNode inserts the file name into the file system hierarchy and allocates a data block for it. The NameNode responds to the client request with the identity of the DataNode and the destination data block. Then the client flushes the block of data from the local temporary file to the specified DataNode. When a file is closed, the remaining un-flushed data in the temporary local file is transferred to the DataNode. The client then tells the NameNode that the file is closed. At this point, the NameNode commits the file creation operation into a persistent store. If the NameNode dies before the file is closed, the file is lost.

The above approach was adopted after careful consideration of the target applications that run on HDFS. These applications need streaming writes to files. If a client wrote to a remote file directly without any client-side buffering, the network speed and congestion in the network would impact throughput considerably. This approach is not without precedent: earlier distributed file systems, e.g. AFS, have used client-side caching to improve performance. A POSIX requirement has been relaxed to achieve higher performance of data uploads.

9.3. Replication Pipelining: When a client is writing data to an HDFS file, its data is first written to a local file, as explained in the previous section. Suppose the HDFS file has a replication factor of three. When the local file accumulates a full block of user data, the client retrieves a list of DataNodes from the NameNode. This list contains the DataNodes that will host a replica of that block. The client then flushes the data block to the first DataNode. The first DataNode starts receiving the data in small portions (4 KB), writes each portion to its local repository, and transfers that portion to the second DataNode in the list. The second DataNode, in turn, starts receiving each portion of the data block, writes that portion to its repository, and then flushes that portion to the third DataNode. Finally, the third DataNode writes the data to its local repository. Thus, a DataNode can be receiving data from the previous one in the pipeline while forwarding data to the next one in the pipeline: the data is pipelined from one DataNode to the next.
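A minimal sketch of that forwarding loop follows. Plain Java streams stand in for the DataNode-to-DataNode connections, which is an assumption; real HDFS uses its own block-transfer protocol over TCP.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class PipelineNode {
    static final int PORTION = 4 * 1024;  // 4 KB portion size from the text

    // Receive a block from upstream, persist each portion locally, and
    // relay it downstream at the same time (null downstream = last node).
    static void relay(InputStream upstream, OutputStream localRepo,
                      OutputStream downstream) throws IOException {
        byte[] buf = new byte[PORTION];
        int n;
        while ((n = upstream.read(buf)) != -1) {
            localRepo.write(buf, 0, n);       // write portion to local repository
            if (downstream != null) {
                downstream.write(buf, 0, n);  // forward portion to next DataNode
            }
        }
        localRepo.flush();
        if (downstream != null) downstream.flush();
    }
}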




Array Lesson 2 Outline
1. Array Lesson 2 Outline
2. Reading Array Values Using for Loop #1
3. Reading Array Values Using for Loop #2
4. for Loop: Like Many Statements #1
5. for Loop: Like Many Statements #2
6. for Loop: Like Many Statements #3
7. Reading Array on One Line of Input #1
8. Reading Array on One Line of Input #2
9. Reading Array on One Line of Input #3
10. Aside: Why Named Constants Are Good
11. Named Constants as Loop Bounds #1
12. Named Constants as Loop Bounds #2
13. Computing with Arrays #1
14. Computing with Arrays #2
15. Computing with Arrays #3
16. Computing with Arrays #4
17. Computing with Arrays #5
18. Static Memory Allocation
19. Static Memory Allocation Example #1
20. Static Memory Allocation Example #2
21. Static Sometimes Not Good Enough #1
22. Static Sometimes Not Good Enough #2
23. Static Sometimes Not Good Enough #3
24. Static Sometimes Not Good Enough #4
25. Static Memory Allocation Can Be Wasteful
26. Dynamic Memory Allocation #1
27. Dynamic Memory Allocation #2
28. Dynamic Memory Allocation #3
29. Dynamic Memory Allocation #4
30. Dynamic Memory Deallocation
31. Dynamic Memory Allocation Example #1
32. Dynamic Memory Allocation Example #2
33. Dynamic Memory Allocation Example #3
34. Exercise: mean #1
35. Exercise: mean #2
Array Lesson 2 CS1313 Spring 2019




The Linux ext3 File System (Cont.)
• The main differences between ext2fs and FFS concern their disk allocation policies
  - In ffs, the disk is allocated to files in blocks of 8Kb, with blocks being subdivided into fragments of 1Kb to store small files or partially filled blocks at the end of a file
  - ext3 does not use fragments; it performs its allocations in smaller units
    • The default block size on ext3 varies as a function of the total size of the file system, with support for 1, 2, 4 and 8 KB blocks
  - ext3 uses cluster allocation policies designed to place logically adjacent blocks of a file into physically adjacent blocks on disk, so that it can submit an I/O request for several disk blocks as a single operation on a block group
    • Maintains a bit map of free blocks in a block group; searches for a free byte to allocate at least 8 blocks at a time
Operating System Concepts Essentials – 2nd Edition 15.49 Silberschatz, Galvin and Gagne ©2013




Character Devices
• A device driver which does not offer random access to fixed blocks of data
• A character device driver must register a set of functions which implement the driver’s various file I/O operations
• The kernel performs almost no preprocessing of a file read or write request to a character device, but simply passes on the request to the device
• The main exception to this rule is the special subset of character device drivers which implement terminal devices, for which the kernel maintains a standard interface
Operating System Concepts Essentials – 2nd Edition 15.56 Silberschatz, Galvin and Gagne ©2013




File Systems
• To the user, Linux’s file system appears as a hierarchical directory tree obeying UNIX semantics
• Internally, the kernel hides implementation details and manages the multiple different file systems via an abstraction layer, that is, the virtual file system (VFS)
• The Linux VFS is designed around object-oriented principles and is composed of two components:
  - A set of definitions that define what a file object is allowed to look like
    • The inode-object and the file-object structures represent individual files
    • The file-system object represents an entire file system
  - A layer of software to manipulate those objects
Operating System Concepts – 8th Edition 21.49 Silberschatz, Galvin and Gagne ©2009




The Linux Ext2fs File System
• Ext2fs uses a mechanism similar to that of BSD Fast File System (ffs) for locating data blocks belonging to a specific file.
• The main differences between ext2fs and ffs concern their disk allocation policies.
  - In ffs, the disk is allocated to files in blocks of 8Kb, with blocks being subdivided into fragments of 1Kb to store small files or partially filled blocks at the end of a file.
  - Ext2fs does not use fragments; it performs its allocations in smaller units. The default block size on ext2fs is 1Kb, although 2Kb and 4Kb blocks are also supported.
  - Ext2fs uses allocation policies designed to place logically adjacent blocks of a file into physically adjacent blocks on disk, so that it can submit an I/O request for several disk blocks as a single operation.
Operating System Concepts 20.42 Silberschatz, Galvin and Gagne 2002




Character Devices
• A device driver which does not offer random access to fixed blocks of data.
• A character device driver must register a set of functions which implement the driver’s various file I/O operations.
• The kernel performs almost no preprocessing of a file read or write request to a character device, but simply passes on the request to the device.
• The main exception to this rule is the special subset of character device drivers which implement terminal devices, for which the kernel maintains a standard interface.
Operating System Concepts 20.48 Silberschatz, Galvin and Gagne 2002




File Systems
• To the user, Linux’s file system appears as a hierarchical directory tree obeying UNIX semantics
• Internally, the kernel hides implementation details and manages the multiple different file systems via an abstraction layer, that is, the virtual file system (VFS)
• The Linux VFS is designed around object-oriented principles and is composed of four components:
  - A set of definitions that define what a file object is allowed to look like
  - The inode object structure represents an individual file
  - The file object represents an open file
  - The superblock object represents an entire file system
  - A dentry object represents an individual directory entry
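Rendered in Java purely for illustration (a conceptual sketch, not Linux kernel source, with all type names invented): the fixed definitions become interfaces, and each file system supplies its own implementations behind them, so the layer above manipulates every file system identically.

interface FileOps {                      // what a file object must look like
    int read(byte[] buf, long offset);
    int write(byte[] buf, long offset);
}

interface SuperBlock {                   // represents an entire mounted file system
    long rootInode();
    FileOps open(long inode);            // hand back a file object for an inode
}

// One concrete file system plugs its own logic in behind the same definitions.
class Ext2LikeSuperBlock implements SuperBlock {
    public long rootInode() { return 2; }          // ext2 convention: root is inode 2
    public FileOps open(long inode) {
        return new FileOps() {
            public int read(byte[] buf, long offset)  { /* map blocks, copy out */ return 0; }
            public int write(byte[] buf, long offset) { /* allocate blocks, copy in */ return 0; }
        };
    }
}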




File System Types
• Operating systems have multiple file system types
  - One or more general-purpose (for storing user files)
  - One or more special-purpose, e.g.:
    • tmpfs: “temporary” file system in volatile main memory; contents erased if the system reboots or crashes
    • objfs: a “virtual” file system (essentially an interface to the kernel that looks like a file system) that gives debuggers access to kernel symbols
    • ctfs: a virtual file system that maintains “contract” information to manage which processes start when the system boots and must continue to run during operation
    • lofs: a “loop back” file system that allows one file system to be accessed in place of another one
    • procfs: a virtual file system that presents information on all processes as a file system
Operating System Concepts Essentials – 8th Edition 9.22 Silberschatz, Galvin and Gagne ©2011