## Slide #1.

More slides like this

## Slide #2.

More slides like this

## Slide #3.

Issues for Display Ads Lack of focus on newspaper Ad blocking Website can use an inverted index of word Ranking ads Position of the ad in a list
More slides like this

## Slide #4.

On-line and Off-line Algorithms On-line Off-line Can get parts or all of its input data while running. All the data needed by the algorithm is presented initially. When all the information for the algorithm is not available before the algorithm makes a decision. Algorithm can access data in any order. Once the algorithm is finished it produces an answer
More slides like this

## Slide #5.

Greedy Algorithm and the Competitive Ratio Greedy Algorithm Many on-line algorithms are greedy algorithms. Greedy algorithms make their decision in response to each input element by maximizing some function of the input element and the past. Competitive Ratio On-line Result Off-line Result When comparing on-line and offline algorithms, we can expect that there will be some constant c less than 1 such that on any input, the result of an on-line algorithm is at least c times the result of the optimum off-line algorithm.
More slides like this

## Slide #6.

The Matching Problem Definition: a large number of advertisers want to deliver multiple messages to a large number of consumers. Matching: subset of the edges such that no node is an end of two or more edges. Perfect Matching: when every node appears in the matching. Search queries were introduced by Overture in 2000 -> Google AdWords Advertisers ”bid” on search keywords -> When someone searches for that keyword, the highest bidder’s ad is shown -> Advertiser is charged only if the ad is clicked on. Ads: 1, 2, 3, 4,… Bi-partite Users / search queries: a, graph b, c, d,…
More slides like this

## Slide #7.

Maximal Matching Greedy algorithm: We consider the edges in whatever order they are given. When we consider (x, y), add this edge to the matching if neither x nor y are ends of any edge selected for the matching so far. Perfect Match: Ford-Fulkerson, Augmenting Path, and Hopcroft-Karp algorithms. {(1,a), (2,b), (3,d)} {(1,c), (2,b), (3,d), (4,a)}
More slides like this

## Slide #8.

Fundamental Problem of Search Advertising The Adwords Problem - Is how to pick which ads are displayed on a search page when a user types in a query. The ideal off-line algorithm uses all of the budgets money so its competitive ratio = 1 The order of the ad’s must be determined by an on-line algorithm that must consider: 1. The limited number of advertisement slots for any given query. 2. The budget on how much a company will pay for all the uses of their add. 3. The total amount expected to be made by the ad not which
More slides like this

## Slide #9.

Solutions with example Assume two bidders A1 and A2, two possible queries X or Y, bids of 1 or 0, budgets B = 2, and a query pattern of X1X2Y1Y2. If A1 bids 1 on X. A2 bids 1 on X and Y. X1→ A2. Greedy Algorithm - considers price of bid. X2→A2, Y → nothing. Competitive Ratio = ½ Balance Algorithm - considers price of bid and remaining budget. X2→A1, Y1→A2, Y2→nothing. Competitive Ratio = ¾ Generalised form - advertiser → Ai, bid → xi, unspent fraction of Ai’s budget → fi then ψi=xi(1-e-fi). Query is assigned to Ai such that ψi is a maximum.
More slides like this

## Slide #10.

Matching Bids and Search Queries If a search query occurs having exactly that set of words in some order, then the bid is said to match the query, and it becomes a candidate for selection. We can avoid having to deal with word order by storing all sets of words representing a bid in alphabetic order. The list of words in sorted order forms the hash-key for the bid, and these bids may be stored in a hash table used as an index. Search queries also have their words sorted prior to lookup. When we hash the sorted list, we find in the hash table all the bids for exactly that set of words. They can be retrieved quickly, since we have only to look at the contents of that bucket. If there are a million advertisers, each bidding on 100 queries, and the record of the bid requires 100 bytes, then we require ten gigabytes of main memory, which is well within the limits of what is
More slides like this

## Slide #11.

More Complex Matching Problems We can still maintain a hash-table index for the bids, but the number of subsets of words in a hundred-word email is much too large to look up all the sets, or even all the small sets of about three words. They all involve standing queries that users post to a site, expecting the site to notify them whenever something matching the query becomes available at the site. EX: 1. Twitter allows one to follow all the “tweets” of a given person. However, it is feasible to allow users to specify a set of words, such as “ipod free music” 2. Online news sites often allow users to select from among certain keywords or phrases, e.g., “healthcare” and receive alerts whenever a new news article contains that word or consecutive sequence of words.
More slides like this

## Slide #12.

A Matching Algorithm for Documents and Bids A bid is a (typically small) set of words. A document is a larger set of words, such as an email, tweet, or news article. We assume there are many bids, perhaps on the order of a hundred million or a billion. We shall, as before, represent a bid by its words listed in some order. There are two new elements in the representation. First, we shall include a status with each list of words. The status is an integer indicating how many of the first words on the list have been matched by the current document. When a bid is stored in the index, its status is always 0(zero). Second, while the order of words could be lexicographic, we can lower the amount of work by ordering words by the rarest-first.
More slides like this

## Slide #13.

Bid Example For each word w, in the sorted order: 1. Using w as the hash-key for the table of partially matched bids, find those bids having w as key. 2. For each such bid b, if w is the last word of b, move b to the table of matched bids. 3. If w is not the last word of b, add 1 to b’s status, and rehash b using the word whose position is one more than the new status, as the hash-key. 4. Using w as the hash key for the table of all bids, find those bids for which w is their first word in the sorted order. 5. For each such bid b, if there is only one word on its list, copy it to the table of matched bids. 6. If b consists of more than one word, add it, with status 1, to the
More slides like this

## Slide #14.

MapReduce Solution to Graph Model The MapReduce solution to the graph model is used to decrease the communication and the movement of large data The solution utilized a modified join, semi-join before processing. A semijoin removed all R’s that do not have a corresponding b value in S for a dataset R(a,b) and S(b,c) If a is really large, this will reduce the computation of the MapReduce
More slides like this

## Slide #15.

The Bulk-Synchronous Solution to Graph Model Like the MapReduce solution, the Bulk-Synchronous Solution to the Graph Model reduces the amount of communications between nodes by using a semijoin but it also is done in parallel In the Bulk-Synchronous solution to the Graph Model for a set R(a, b) and S(b, Creates a graph node for each tuple Create a graph node for each value of b For each tuple, connect/send messages to the corresponding value of b Send messages if the b node has at least one S node and R node. After this, it will do a MapReduce like process for all non-dangling tuples.. Some examples of Bulk-Synchronous Systems are: Pregel - Google’s Giraph - opensource Pregel build on Hadoop GraphX - for sparks GraphLab - is built to handle higher degree nodes
More slides like this

## Slide #16.

Video
More slides like this

## Slide #17.

Thank You!
More slides like this