Slide #1.

Chapter 16-17 File Structures, Hashing, Indexing, and Physical Database Design 1
More slides like this


Slide #2.

Oracle SQL*Loader • http://www.oracle.com/technetwork/database/enterprise-ed ition/sql-loader-overview-095816.html 2
More slides like this


Slide #3.

Storage • Primary storage (main memory) – Can be operated on directly by computer CPU small, fast • Secondary storage – http://en.wikipedia.org/wiki/Hard_disk – Can not be operated on directly by computer CPU – Magnetic disks, optical disks, tapes, etc. – Larger capacities, inexpensive, slower than main memory 3
More slides like this


Slide #4.

Table 16.1 Types of Storage with Capacity, Access Time, Max Bandwidth (Transfer Speed), and Commodity Cost
More slides like this


Slide #5.

Table 16.2 Specifications of Typical High-End Enterprise Disks from Seagate (a) Seagate Enterprise Performance 10 K HDD 1200 GB continued on next slide
More slides like this


Slide #6.

Storage capacity units • • • • Kilobytes – 1000 bytes Megabytes – 1 million bytes Gigabytes (Gbytes) – 1 billion bytes Terabytes – 1000 gigabytes 6
More slides like this


Slide #7.

Memory Hierarchies and Storage Devices • Primary storage – Cache (static RAM)– most expensive, fast, used by CPU to speed up execution programs http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=cache - Main memory (dynamic RAM) – work area for CPU 7
More slides like this


Slide #8.

Secondary storage (Mass storage) – CD-ROM – Tapes – Disks Main memory database: entire database is stored in main memory 8
More slides like this


Slide #9.

Figure 16.1 (a) A single-sided disk with read/write hardware. (b) A disk pack with read/write
More slides like this


Slide #10.

Figure 16.2 Different sector organizations on disk. (a) Sectors subtending a fixed angle. (b) Sectors maintaining a uniform recording density.
More slides like this


Slide #11.

Tracks The part of a disk which passes under one read/write head while the head is stationary. The number of tracks on a disk surface therefore corresponds to the number of different radial positions of the head(s). The collection of all tracks on all surfaces at a given radial position is known a cylinder and each track is divided into sectors. 11
More slides like this


Slide #12.

Cylinder • The set of tracks on a multi-headed disk that may be accessed without head movement. That is, the collection of disk tracks which are the same distance from the spindle about which the disks rotate. 12
More slides like this


Slide #13.

Sector • one sector lies within a continuous range of rotational angle of the disk 13
More slides like this


Slide #14.

Data transfer between main memory and disks (in blocks) Hardware Address of a block – Surface number – Track number – Block number • Time requires – Seek time – Rotational delay time (latency) – Block transfer time 14
More slides like this


Slide #15.

Table 16.2 (continued) Specifications of Typical HighEnd Enterprise Disks from Seagate (a) Seagate Enterprise Performance 10 K HDD - 1200 GB continued on next slide
More slides like this


Slide #16.

Table 16.2 (continued) Specifications of Typical High-End Enterprise Disks from Seagate (a) Seagate Enterprise Performance 10 K HDD - 1200 GB
More slides like this


Slide #17.

17
More slides like this


Slide #18.

Figure 16.3 Interleaved concurrency versus parallel execution. 18
More slides like this


Slide #19.

Figure 16.4 Use of two buffers, A and B, for reading from disk. 19
More slides like this


Slide #20.

Figure 16.5 Three record storage formats. (a) A fixed-length record with six fields and size of 71 bytes. (b) A record with two variable-length fields and three fixed-length fields. (c) A variablefield record with three types of separator characters. 20
More slides like this


Slide #21.

Figure 16.6 Types of record organization. (a) Unspanned. (b) Spanned.
More slides like this


Slide #22.

Figure 16.7 Some blocks of an ordered (sequential) file of EMPLOYEE records with Name as the ordering key field.
More slides like this


Slide #23.

Table 16.3 Average Access Times for a File of b Blocks under Basic File Organizations
More slides like this


Slide #24.

File organization • Heap file (unordered file) place new records in no order at the end of the file • Sorted file ( sequential file) keeps the records ordered by the value of a particular file • Hashed file Uses hash function applied to a field (hash key) to determine a record’s placement on disk • B-trees, B+ trees – use tree structure 24
More slides like this


Slide #25.

Hashing techniques Static hashing – hash address space is fixed Extendible hashing Linear hashing 25
More slides like this


Slide #26.

Hashing algorithm 26
More slides like this


Slide #27.

Hash Table (Wikipedia) http://en.wikipedia.org/wiki/Hash_table 27
More slides like this


Slide #28.

28
More slides like this


Slide #29.

Figure 16.8 Internal hashing data structures. (a) Array of M positions for use in internal hashing. (b) Collision resolution by chaining records.
More slides like this


Slide #30.

Figure 16.9 Matching bucket numbers to disk block addresses.
More slides like this


Slide #31.

Figure 16.10 Handling overflow for buckets by chaining.
More slides like this


Slide #32.

Figure 16.11 Structure of the extendible hashing scheme.
More slides like this


Slide #33.

33
More slides like this


Slide #34.

Figure 16.11 Structure of the extendible hashing scheme.
More slides like this


Slide #35.

Figure 16.12 Structure of the dynamic hashing scheme.
More slides like this


Slide #36.

Figure 16.13 Striping of data across multiple disks. (a) Bit-level striping across four disks. (b) Block-level striping across four disks.
More slides like this


Slide #37.

Figure 16.14 Some popular levels of RAID. (a) RAID level 1: Mirroring of data on two disks. (b) RAID level 5: Striping of data with distributed parity across four disks.
More slides like this


Slide #38.

38
More slides like this


Slide #39.

• A search tree of order p is a tree such that each node contains at most p - 1 search values and p pointers in the order < P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq >, where q 1 p; each Pi is a pointer to a child node (or a null pointer); and each Ki is a search value from some ordered set of values. 39
More slides like this


Slide #40.

40
More slides like this


Slide #41.

41
More slides like this


Slide #42.

B tree of order p 1. Each internal node in the B-tree is of the form , P2, , ..., , Pq> where q 1 p. Each Pi is a tree pointer—a pointer to another node in the B-tree. Each Pri is a data pointer —a pointer to the record whose search key field value is equal to Ki (or to the data file block containing that record). 2. Within each node, K1
More slides like this


Slide #43.

EXAMPLE 5: Suppose that the search field of Example 4 is a nonordering key field, and we construct a B-tree on this field. Assume that each node of the B-tree is 69 percent full. Each node, on the average, will have p * 0.69 = 23 * 0.69 or approximately 16 pointers and, hence, 15 search key field values. The average fan-out fo =16. We can start at the root and see how many values and pointers can exist, on the average, at each subsequent level: Root: Level 1: Level 2: Level 3: Level 4: 1 node 16 nodes 256 nodes 4096 nodes 65536 nodes 15 entries 240 entries 3840 entries 61,440 entries 983,040 entries 16 pointers 256 pointers 4096 pointers 65536 pointers 43
More slides like this


Slide #44.

B+ Trees The structure of the internal nodes of a B+tree of order p is as follows: 1. Each internal node is of the form where q 1 p and each Pi is a tree pointer. 2. Within each internal node, K1 < K2 < ...
More slides like this


Slide #45.

The structure of the leaf nodes of a B+-tree of order p (Figure 14.11b) is as follows: 1. Each leaf node is of the form < , , ..., , Pnext> 2. 3. 4. 5. Where q 1 p, each Pri is a data pointer, and Pnext points to the next leaf node of the B+-tree. Within each leaf node, K1 < K2 < ... < Kq-1, q 1 p. Each Pri is a data pointer that points to the record whose search field value is Ki or to a file block containing the record (or to a block of record pointers that point to records whose search field value is K i if the search field is not a key). Each leaf node has at least (p/2) values. All leaf nodes are at the same level. 45
More slides like this


Slide #46.

46
More slides like this


Slide #47.

47
More slides like this


Slide #48.

EXAMPLE 7: Suppose that we construct a B+-tree on the field of Example 6. To calculate the approximate number of entries of the B+-tree, we assume that each node is 69 percent full. On the average, each internal node will have 34 * 0.69 or approximately 23 pointers, and hence 22 values. Each leaf node, on the average, will hold 0.69 * pleaf = 0.69 * 31 or approximately 21 data record pointers. A B+-tree will have the following average number of entries at each level: Root: Level 1: Level 2: Level 3: Level 4: 1 node 23 nodes 529 nodes 12,167nodes 279,841 nodes 22 entries 506 entries 11,638 entries 255,507entries 5,876,661entries 23 pointers 529 pointers 12,167 pointers 279,841 pointers 48
More slides like this


Slide #49.

49
More slides like this


Slide #50.

50
More slides like this


Slide #51.

51
More slides like this


Slide #52.

52
More slides like this


2019 slides.show. All Rights Reserved