More Sort Names • Bubble sort · Cocktail sort · Odd-even sort · Comb sort · Gnome sort · Quicksort · Selection sort · Heapsort · Smoothsort · Cartesian tree sort · Tournament sort · Cycle sort · Insertion sort · Shell sort · Tree sort · Library sort · Patience sorting · Monkey-puzzle sort · Merge sort · Polyphase merge sort · Strand sort · American flag sort · Bead sort · Bucket sort · Burstsort · Counting sort · Pigeonhole sort · Proxmap sort · Radix sort · Flashsort · Bitonic sorter · Batcher odd-even mergesort · Timsort · Introsort · Spreadsort · UnShuffle sort · JSort · Spaghetti sort · Pancake sort




4.7. Summary
• Sorting and duplicate removal are expressed with ORDER BY and DISTINCT in SQL
• Parallel algorithms for database sorting: parallel merge-all sort, parallel binary-merge sort, parallel redistribution binary-merge sort, parallel redistribution merge-all sort, and parallel partitioned sort
• Cost models for each parallel sort algorithm; buffer size
• The parallel redistribution algorithms are prone to processing skew
• If the processing skew degree is high, use parallel redistribution merge-all sort. If both data skew and processing skew degrees are high, or there is no skew, use parallel partitioned sort
D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008




[Figure: sort.grid — the input channel array in[] feeds a cascade of even.odd exchange stages producing the sorted output out[]. Mar 18, 2019, Copyright P.H. Welch]




Task 2: Implement the following sorting algorithms: selection sort, insertion sort, quick sort, and merge sort, and compare them by real running time.
Requirements
• Find the running time as follows:
  check the current time t1;
  repeat the following steps 1000 times {
    randomly generate 10000 integers and save them into array A;
    sort the integers in A using selection sort (or insertion sort, quick sort, merge sort);
  }
  check the current time t2;
  running time of selection sort (or insertion sort, quick sort, merge sort) := t2 – t1;
• Implement the algorithms using C, C++, C#, or any other programming language.
Submission: Project description, Algorithms, Algorithm analysis, Experiment output, Code.




What is the parallel time complexity of Odd-Even Transposition Sort with N numbers and P processors, where each processor handles N/P numbers in the compare-and-exchange operation? Answer:




A Distributed Memory Implementation
• Scatter the data among available processors
• Locally sort N/P items on each processor
• Even Passes
  – Even processors, p, exchange data with processor p-1.
  – Processors p and p-1 perform a partial merge in which p extracts the upper half and p-1 extracts the lower half.
• Exchanging Data: MPI_Sendrecv




7.7. Comparative Analysis (cont'd)
• Parallel One-Index Search
• Processor involvement, index traversal, and record loading
• Shaded cells show more expensive operations in comparison with others within the same operation

                       NRI Schemes                    PRI Schemes                    FRI Schemes
                       NRI-1     NRI-2     NRI-3      PRI-1     PRI-2     PRI-3      FRI-1     FRI-3
Processor Involvement  Selected  All       Selected   Selected  Selected  Selected   Selected  Selected
Index Traversal        Local     Local     Local      Local     Remote    Local      Local     Local
Record Loading         Local     Local     Remote     Local     Local     Remote     Local     Remote

Figure 7.24. A Comparative Table for Parallel One-Index Selection Query Processing
D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008




One Parallel Iteration

Distributed Memory
• Odd Processors:
  sendRecv(pr data, pr-1 data); mergeHigh(pr data, pr-1 data)
  if (r <= P-2) { sendRecv(pr data, pr+1 data); mergeLow(pr data, pr+1 data) }
• Even Processors:
  sendRecv(pr data, pr+1 data); mergeLow(pr data, pr+1 data)
  if (r >= 1) { sendRecv(pr data, pr-1 data); mergeHigh(pr data, pr-1 data) }

Shared Memory
• Odd Processors:
  mergeLow(pr data, pr-1 data); Barrier
  if (r <= P-2) mergeHigh(pr data, pr+1 data)
  Barrier
• Even Processors:
  mergeHigh(pr data, pr+1 data); Barrier
  if (r >= 1) mergeLow(pr data, pr-1 data)
  Barrier

Notation: r = processor rank, P = number of processors, pr data is the block of data belonging to processor r
Note: P/2 iterations are necessary to complete the sort




Counting Sort
Works on primitive fixed point types: int, char, long, etc.
Assumption: the data entries contain a fixed number of values
1. Master scatters the data among the processors
2. In parallel, each processor counts the total occurrences for each of its N/P data points
3. Processors perform a collective reduce-sum operation
4. Processors perform an all-to-all collective prefix-sum operation
5. In parallel, each processor stores V/P data items appropriately in the output array, where V is the number of unique values
6. Sorted data is gathered at the master processor
Note: Counting sort is used to reduce the memory needed for radix sort




Local Synchronization
Synchronize with neighbors before proceeding
• Even Processors
  Send null message to processor i-1
  Receive null message from processor i-1
  Send null message to processor i+1
  Receive null message from processor i+1
• Odd Processors
  Receive null message from processor i+1
  Send null message to processor i+1
  Receive null message from processor i-1
  Send null message to processor i-1
• Notes:
  – Local synchronization is an incomplete barrier: processors exit after receiving messages from their neighbors
  – Reminder: deadlock can occur with incorrect message-passing orders. MPI_Sendrecv() and MPI_Sendrecv_replace() are deadlock-free




1. Describe each of the five types of neuroglial cells.
2. Describe the groupings of neurons by structural differences and by functional differences.
3. Compare and contrast resting potential and action potential.
4. How does a nerve impulse travel the length of a nerve? How is this process different in myelinated fibers vs. unmyelinated fibers?
5. Describe the process of synaptic transmission.
6. Compare and contrast excitatory and inhibitory actions by neurons.
7. What are neurotransmitters? Where are they synthesized and stored? How do they act?
8. Compare and contrast facilitation, convergence, and divergence.
9. What is a reflex arc? List the steps involved.
10. What are meninges? Describe the three types found in the skull.
11. Compare and contrast white matter and gray matter.
12. Describe each of the functional divisions of the brain.
13. What is hemisphere dominance? What functions are performed by the dominant hemisphere? By the nondominant hemisphere?
14. Compare and contrast the sympathetic and parasympathetic nervous systems.
15. List the 5 general types of receptors and state what each detects.
16. What is sensory adaptation and why is it important?
17. What is referred pain and why does it occur?
18. Compare and contrast acute and chronic pain.
19. Compare and contrast the olfactory and taste senses.
20. Describe the process of hearing a sound, beginning with the sound entering the external auditory meatus.
21. Compare and contrast static and dynamic equilibrium.
22. Compare and contrast rods and cones.
23. Compare and contrast steroid and non-steroid hormones.
24. Give a detailed example of a negative feedback mechanism involving hormones.
25. Describe the three basic types of blood cells.
26. What is hemoglobin? How is it broken down?
27. What are the 5 types of leukocytes and how are they characterized?
28. Describe blood plasma and list its major constituents.
29. Describe the stages of hemostasis.
30. Why is AB blood the universal acceptor? Why is O blood the universal donor? What would happen if AB type blood were transfused into a patient with type O blood? (Be specific.)
31. Compare and contrast the pulmonary circuit and systemic circuit.
32. Explain the steps of the cardiac cycle.
33. What is an electrocardiogram and what does each point in the graph represent?
34. Compare and contrast arteries, capillaries, and veins.
35. Describe the exchange of substances between capillaries and surrounding tissues.
36. Describe the factors that affect arterial blood pressure.




Intel MMX ISA Extension (Table 25.1)

Class       Instruction                    Vector    Op type         Function or results
Copy        Register copy                  32 bits                   Integer register → MMX register
            Parallel pack                  4, 2      Saturate        Convert to narrower elements
            Parallel unpack low            8, 4, 2                   Merge lower halves of 2 vectors
            Parallel unpack high           8, 4, 2                   Merge upper halves of 2 vectors
Arithmetic  Parallel add                   8, 4, 2   Wrap/Saturate#  Add; inhibit carry at boundaries
            Parallel subtract              8, 4, 2   Wrap/Saturate#  Subtract with carry inhibition
            Parallel multiply low          4                         Multiply, keep the 4 low halves
            Parallel multiply high         4                         Multiply, keep the 4 high halves
            Parallel multiply-add          4                         Multiply, add adjacent products*
            Parallel compare equal         8, 4, 2                   All 1s where equal, else all 0s
            Parallel compare greater       8, 4, 2                   All 1s where greater, else all 0s
Shift       Parallel left shift logical    4, 2, 1                   Shift left, respect boundaries
            Parallel right shift logical   4, 2, 1                   Shift right, respect boundaries
            Parallel right shift arith     4, 2                      Arith shift within each (half)word

Computer Architecture, Advanced Architectures




Counting Sort
Works on primitive fixed point types: int, char, long, etc.
1. Master scatters the data among the processors
2. In parallel, each processor counts the total occurrences for each of its N/P data points
3. Processors perform a collective sum operation
4. Processors perform an all-to-all collective prefix-sum operation
5. In parallel, each processor stores its N/P data items appropriately in the output array
6. Sorted data is gathered at the master processor
Note: This logic can be repeated to implement a radix sort




Stencil Pragma
Replaced with code to do:
1. The array given as an argument to the stencil pragma is broadcast to all available processors.
2. A loop is created to iterate max_iteration times. Within that loop, code is inserted to perform the following steps:
   a. Each processor (except the last one) sends its last row to the processor with rank one more than its own.
   b. Each processor (except the first one) receives that last row from the processor with rank one less than its own.
   c. Each processor (except the first one) sends its first row to the processor with rank one less than its own.
   d. Each processor (except the last one) receives that first row from the processor with rank one more than its own.
   e. Each processor iterates through the values of the rows for which it is responsible and uses the function provided to compute the next value.
3. The data is gathered back to the root processor (rank 0).




How to gather/scatter efficiently (q = # of processors)
• If not already known, identify a minimal spanning tree (MST) rooted at the processor to which data is to be gathered. This is done as follows:
  – The root sends a message to each neighbor.
  – Each non-root processor waits for a message. The first message to arrive identifies the processor's parent. Upon receipt, the processor sends a message to each neighbor identifying its parent.
  – Each processor receives the messages described above. If A receives a message from B identifying A as the parent of B, then A knows B is A's child.
  – Advanced techniques show this takes O(q) time.
• Performing the gather: in parallel, each processor sends data to its parent processor in the MST until each value reaches the root processor. This takes Θ(q) time.
• Thus, a gather operation takes Θ(q) time.
• To scatter efficiently, reverse the direction of data flow for a gather operation: Θ(q) time.




Sorting a Table on Multiple Fields Using the Custom Sort Command
• With a cell in the table active, click the Sort & Filter button (Home tab | Editing group) to display the Sort & Filter menu
• Click Custom Sort on the Sort & Filter menu to display the Sort dialog box
• Click the Sort by box arrow to display the field names in the table
• Click the first field on which to sort to select the first sort level
• Select the desired options for Sort On and Order
• Click the Add Level button to add a new sort level, and then repeat the previous two steps
• Click the OK button to sort the table
Creating, Sorting, and Querying a Table




Getting each processor the m-1 characters of T that follow the processor's last character of T (case 1):
Suppose processors holding consecutive segments of T are adjacent (this is possible for linear arrays, meshes using snake-like order for processors, and hypercubes; not for trees, etc.). Then:
• In parallel, each odd-numbered processor Pi gets the first m-1 characters of T that are stored in Pi+1. This takes Θ(m) time via direct communication (since these processors are adjacent).
• Similarly, in parallel, each even-numbered processor Pi gets the first m-1 characters of T that are stored in Pi+1. This takes Θ(m) time via direct communication.
• Thus, the total time for this process is Θ(m).




submit script run in scr/mpp-mpred-3.2.0 also produces these subdirs in mpp-mpred-3.2.0:

drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:15 a.1.1
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:15 a.1.2
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:15 a.1.4
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:15 a.1.7
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:15 a.1.9
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:15 a.2.1
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:15 a.2.2
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:16 a.2.4
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:16 a.2.7
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:16 a.2.9
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:16 a.4.1
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:16 a.4.2
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:16 a.4.4
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:16 a.4.7
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:17 a.4.9
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:17 a.7.1
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:17 a.7.2
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:17 a.7.4
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:17 a.7.7
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:17 a.7.9
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:17 a.9.1
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:18 a.9.2
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:18 a.9.4
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:18 a.9.7
drwxr-xr-x 2 perrizo faculty 4096 Nov  3 10:18 a.9.9

and e.g., a.9.9 contains:

-rw-r--r-- 1 perrizo faculty  7441 Nov  3 10:17 a.9.9.config
-rw-r--r-- 1 perrizo faculty  5191 Nov  3 10:18 hi-a.9.9.txt
-rw-r--r-- 1 perrizo faculty  1808 Nov  3 10:18 hi-a.9.9.txt.answers
-rw-r--r-- 1 perrizo faculty  1465 Nov  3 10:18 lo-a.9.9.txt
-rw-r--r-- 1 perrizo faculty   688 Nov  3 10:18 lo-a.9.9.txt.answers
-rw-r--r-- 1 perrizo faculty  4330 Nov  3 10:18 p95test.txt.predictions
-rw-r--r-- 1 perrizo faculty 46147 Nov  3 10:18 p95test.txt.rmse

p95test.txt.rmse:

Movie: 12641:
0: Answer: 1 Prediction: 1.22 Error: 0.04840
1: Answer: 4 Prediction: 3.65 Error: 0.12250
2: Answer: 2 Prediction: 2.55 Error: 0.30250
3: Answer: 4 Prediction: 4.04 Error: 0.00160
4: Answer: 2 Prediction: 1.85 Error: 0.02250
Sum: 0.49750 Total: 5 RMSE: 0.315436
Running RMSE: 0.315436 / 5 predictions
Movie: 12502:
0: Answer: 4 Prediction: 4.71 Error: 0.50410
1: Answer: 5 Prediction: 3.54 Error: 2.13160
2: Answer: 5 Prediction: 3.87 Error: 1.27690
3: Answer: 3 Prediction: 3.33 Error: 0.10890
4: Answer: 2 Prediction: 2.97 Error: 0.94090
Sum: 4.96240 Total: 5 RMSE: 0.996233
Running RMSE: 0.738911 / 10 predictions
...
Movie: 10811:
0: Answer: 5 Prediction: 4.05 Error: 0.90250
1: Answer: 3 Prediction: 3.49 Error: 0.24010
2: Answer: 4 Prediction: 3.94 Error: 0.00360
3: Answer: 3 Prediction: 3.39 Error: 0.15210
Sum: 1.29830 Total: 4 RMSE: 0.569715
Running RMSE: 0.964397 / 743 predictions
Movie: 12069:
0: Answer: 4 Prediction: 3.20 Error: 0.64000
1: Answer: 3 Prediction: 3.48 Error: 0.23040
Sum: 0.87040 Total: 2 RMSE: 0.659697
Prediction summary: Sum: 691.90610 Total: 745 RMSE: 0.963708

p95test.txt.predictions:

12641: 1.22 3.65 2.55 4.04 1.85
12502: 4.71 3.54 3.87 3.33 2.97
...
10811: 4.05 3.49 3.94 3.39
12069: 3.20 3.48




4.3. Parallel External Sort
• Parallel Merge-All Sort
• Parallel Binary-Merge Sort
• Parallel Redistribution Binary-Merge Sort
• Parallel Redistribution Merge-All Sort
• Parallel Partitioned Sort
D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008