Abstract
1. Introduction
The search process begins with a query: an internet user enters a phrase into the search engine. Search engines use proprietary approaches to analyze ads and the corresponding bids to determine which ads are returned in response to a specific search query. The internet user reviews the returned ads and decides whether or not to click on one or more links, based on their appropriateness from the user's perspective. A searcher who clicks on a link leaves the search engine and enters the advertiser's website. A searcher who decides not to click can either exit the search engine or enter a new query. Advertisers are only charged if an internet user clicks on one of their ads, with most ad auctions operating under some form of a quality-adjusted second-price auction. We focus our analysis on hospitality-related searches because, as discussed in [1], over 80% of all online travel-related purchases are preceded by some form of search, with the average user performing over 20 searches per purchase.
Previous research in search engine advertising is divided into two main areas: analytical research and empirical research. Analytical approaches focus on analyzing the bidding mechanism, the resulting ad position and equilibrium requirements. For example, Aggarwal et al. examine the ranking problem of paid advertisements and the associated pricing mechanism in a setting where advertisers specify a bid and express their preference for positions in the list of advertisements [2]. They show that the auction has an envy-free or symmetric Nash equilibrium and indicate the equilibrium conditions. Varian analyzes the equilibrium of a game based on the advertisement auction used by Google and Yahoo [3].
Since the value of a link is due to consumers clicking on the links and making purchases, it is natural to assume that consumer behavior is affected by the process by which links are displayed. Based on this assumption, Chen and He integrate consumer behavior into their analytical model [4]. They study a product differentiation model where consumers can learn about advertisers' products through costly search. They assume that, before their search, consumers are uncertain about the desirability and valuation of an advertiser's product. In equilibrium, they show that advertisers bid more for placement when the product is more relevant for a given keyword. Similarly, in [5], Athey and Ellison theoretically analyze several search formats, illustrating how the design of the advertising auction marketplace affects overall welfare.
In [6], Katona and Sarvary model search engine advertising by focusing on the interaction between organic search and paid search, and the inherent differences in click-through rates between advertisers. They find that both of these characteristics have a significant effect on advertiser bidding behavior and the equilibrium price on the paid links.
In [7], Ghose and Yang empirically analyze keyword performance using a simultaneous equations approach, estimating a hierarchical Bayesian model via MCMC methods. As a system, they consider internet users, advertisers, and the search engine. They find that the monetary value of a click is not uniform across all positions owing to rank-dependent conversion rates (highest at the top and decreasing with rank). Work by Yang and Ghose considers the situation in which the same firm simultaneously appears in both organic search and paid search [8]. They focus on understanding the relationship between the presence of organic listings and the click-through rates of paid search advertisements. By building a Bayesian estimation model, they find that click-through on organic search has a positive interdependence with click-through on paid search. Similarly, Rutz and Bucklin build a dynamic linear model to observe consumer search behavior, investigating the relationship between generic search and branded search [9]. They find that generic search activity positively affects future branded search activity through awareness of relevance, which means generic search has a spillover effect on branded search. However, branded search does not affect generic search, which means this spillover effect is asymmetric.
In this paper we build on this literature with a focus on paid search. Using a unique data set, we jointly model ad rank (as controlled by the search engine) and user click-through as impacted by advertiser behavior. Ad rank is a function of the ad as well as how much the advertiser bids. We model click-through as a function of the ad (and its quality) as well as its rank and the type of search. We differentiate branded from generic search and illustrate the bidding strategies for advertisers as a function of search and keyword specifics. In the following section we discuss our data, followed by our modeling approach and results, and then summarize with bidding strategies and implications.
2. Data
Unlike prior pay-per-click research, we use neither aggregated data nor data from a single advertiser; rather, we have impression and click data for all advertisers for a series of consumer searches. Our dataset includes impression and click records for sponsored search engine advertising in the Chinese hospitality industry. The dataset, collected in January 2012 by a leading Chinese search engine, contains 1,440,660 impressions generating 183,654 clicks, for an average click-through rate of 12.75% across 62,253 keywords from 1,037 advertisers. Advertisers have known quality scores (ad quality) as determined by the search engine. While the search engine does not disclose exactly how ad quality is determined, it asserts that it is a function of the ad's landing page, its click-through rate and the relevance of the ad to the search (basically how well it matches the keywords). As a result, ad quality varies not only across advertisers but also across ads for an individual advertiser, as an advertiser will have different ad quality scores for different ads (as a function of the landing page, click-through, search terms, etc.). For example, the 1,037 advertisers in our sample had an average coefficient of variation of ad quality (standard deviation of ad quality divided by average ad quality) of 0.305, indicating considerable variance in ad quality scores for individual advertisers. Unlike Google, Yahoo and Bing, which display sponsored search on the left hand side of the search results and sometimes at the top of the search results, the search engine in question embeds its sponsored search within the general search results. Embedding sponsored search within organic results leads to elevated click-through rates (CTRs). We summarize CTR behavior in a series of tables.
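The summary figures quoted above can be checked with a few lines of arithmetic. The sketch below recomputes the average CTR and the impressions per keyword from the reported counts, and shows how a coefficient of variation would be computed for one advertiser. The list `example_scores` is purely hypothetical and is not taken from the dataset, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

# Counts reported in the text.
impressions = 1_440_660
clicks = 183_654
keywords = 62_253

ctr = clicks / impressions
impressions_per_keyword = impressions / keywords


def coefficient_of_variation(scores):
    """Standard deviation of the scores divided by their mean.

    Assumption: population standard deviation (pstdev); the paper does not
    state which estimator it used.
    """
    return statistics.pstdev(scores) / statistics.mean(scores)


# Hypothetical ad quality scores for the ads of a single advertiser.
example_scores = [2.0, 3.0, 4.0, 3.0]
cv = coefficient_of_variation(example_scores)

print(f"average CTR: {ctr:.2%}")
print(f"impressions per keyword: {impressions_per_keyword:.1f}")
print(f"example coefficient of variation: {cv:.3f}")
```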
Table I summarizes CTRs, cost-per-click (CPC), bids and page quality for advertisers as a function of display position. As can be seen from the table, the search engine places higher quality ads near the top, with these advertisers paying for that higher position with elevated bids. It is important to note that CPCs are less than bids, reflecting the second-price nature of the auction. Advertisers jockey for higher ad placement because CTRs decrease as rank increases.
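To make the gap between bids and CPCs concrete, here is a minimal sketch of one common textbook form of a quality-adjusted second-price auction. It is an assumption for illustration only: the search engine's actual ranking and pricing rule is proprietary and is not described in the text. In this form, ads are ordered by bid times quality, and each advertiser pays the smallest amount that would still keep its score at or above that of the next ad. The function name `run_auction` and the example ads are hypothetical.

```python
def run_auction(ads):
    """Rank ads by bid * quality and price each click at the next ad's score.

    ads: list of (name, bid, quality) tuples.
    Returns a list of (name, position, cpc) tuples, best position first.
    Assumption: no reserve price, so the last ad pays 0.
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    results = []
    for pos, (name, bid, quality) in enumerate(ranked):
        if pos + 1 < len(ranked):
            _, next_bid, next_quality = ranked[pos + 1]
            # Lowest payment that keeps this ad's score at or above the next ad's score.
            cpc = next_bid * next_quality / quality
        else:
            cpc = 0.0
        results.append((name, pos + 1, cpc))
    return results


# Hypothetical ads: (name, bid, quality).
ads = [("A", 2.0, 5.0), ("B", 4.0, 2.0), ("C", 1.0, 3.0)]
results = run_auction(ads)
for name, position, cpc in results:
    print(f"{name}: position {position}, CPC {cpc:.2f}")
```

Under this rule a high-quality ad can win the top slot with a lower bid than a low-quality competitor, and no advertiser pays more than its own bid.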

Table II separates advertisers into two categories: intermediaries (third parties) and hotel companies. Hotel companies are individual hotels, hotel management companies, brands and ownership groups, while third parties are predominantly intermediaries who attempt to sell hotel rooms (for a commission) on behalf of hotel companies. As illustrated in Table II, the third parties have elevated bids, producing higher CPCs while generating higher CTRs.
Tables III and IV highlight the impacts of keyword type (branded versus generic, as searched by users) and keyword match choice (exact, phrase and broad, as specified by advertisers). The CTRs summarized in Tables III and IV suggest the search engine will probably give preference (in position) to match types with higher CTRs. Similarly, the impact of keyword type on CTRs indicates a preference for branded searches by advertisers. Table V further refines CTRs by advertiser, search and match type.
3. Model setup and estimation
Paid search advertisers need to effectively estimate CTRs, or the probability of a click, generated by queries to determine how much they should bid on a keyword and what types of keywords they should consider. The difficulty advertisers face in modeling click behavior stems from the search engine's active management of ad position in an effort to maximize its revenue. To account for advertiser, search engine, and user behavior, we build a joint probability model to estimate the probability of an internet user click. We then use this modeling framework to shed some light on aggregate-level advertiser bidding strategy and keyword design strategy. We utilize a binary logit model to estimate the click behavior of consumers given the search results displayed, followed by an ordered-logit model to estimate the ranking decisions of the search engine, and then combine the two into a joint estimation.




3.1 The click function
An internet user presented with query results faces a series of click/no-click decisions for the set of ads returned by the search engine. If we assume an internet user's click decision is not impacted by prior click decisions, we can use a simple binary logit choice model for this click decision. From an advertiser's standpoint, we do not necessarily have information (ad quality, ranking performance, and bidding strategy) on other co-listed advertisements, precluding us from building a multinomial logit (MNL) model. Furthermore, given diverse internet user search phrases, it is even more difficult for an advertiser to determine exactly which advertisers are listed for each display. Therefore, we argue that a binary logit model is more appropriate than an MNL model.
If we model an internet user's click utility as $U_i = V_i + \varepsilon_i$, where $V_i$ is the deterministic portion of the utility and $\varepsilon_i$ is the error term, which follows the i.i.d. Type I extreme value distribution, then the binary logit choice model of a click is given as:
$$P(c_i = 1) = \frac{\exp(V_i)}{1 + \exp(V_i)} \qquad (1)$$
Let $P(c_i = 1)$ be the probability that a user clicks on the ad displayed for query $i$, with the consumer's utility or propensity to click, $V_i$, modeled as a function of attributes of the search and the advertisement. One of the issues in modeling click behavior is the dependence of CTRs upon search-engine-controlled ad position, as shown in Table I, where CTRs decrease dramatically with lower ad position. Table VI summarizes two binary logistic regression models of user click behavior: one with Rank as an attribute and one without. As illustrated in Table I, CTRs decrease with ad position or rank, but rank is outside the control of the advertiser and as such makes little sense to include in a model used to determine optimal keyword bidding strategies.
Unfortunately, as rank appears to be a strong determinant of user click behavior, its exclusion from the CTR model results in biased parameter estimates. For example, with Rank removed, Accurate keyword matches result in lower utility (and lower CTR) than Phrase matches (0.2832 < 0.324), whereas the appropriate relative values of the parameter estimates result with Rank in the model (0.4441 > 0.3839). The estimated impact of Bid is also affected by the exclusion of Rank. This is best illustrated by the odds ratio of Bid, where the odds ratio is exp(parameter estimate). With Rank excluded, the odds ratio increases to 1.0890 from 1.0186, owing to the correlation between Bid and Rank (-0.35), a direct result of the search engine using Bid as a key driver of ad display position. It is this need to include ad rank in our modeling of CTR, combined with the search engine's control of rank, that leads us to jointly model click and rank: the probability of ad rank becomes the output of an ordered logit model and the input into the binary logit model. In essence, we account for the importance of Rank by recognizing that it is controlled by the search engine but influenced by advertiser behavior.
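The binary logit click model described in this subsection can be sketched in a few lines. The example below uses the Rank and Hotel coefficients reported later in Section 3.5; the intercept is hypothetical, because the text does not report one, and the remaining attributes of the fitted model are omitted, so the printed probabilities are illustrative only.

```python
import math

RANK_COEF = -0.375387   # Rank coefficient reported in Section 3.5
HOTEL_COEF = -0.591567  # Hotel indicator coefficient reported in Section 3.5
INTERCEPT = -1.0        # hypothetical: the intercept is not given in the text


def click_probability(utility):
    """Binary logit: P(click) = exp(V) / (1 + exp(V))."""
    return 1.0 / (1.0 + math.exp(-utility))


def click_utility(rank, is_hotel):
    """Deterministic utility V using only the Rank and Hotel attributes."""
    return INTERCEPT + RANK_COEF * rank + HOTEL_COEF * (1 if is_hotel else 0)


for rank in (1, 5, 10):
    p = click_probability(click_utility(rank, is_hotel=False))
    print(f"rank {rank}: P(click) = {p:.3f}")
```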

3.2 The rank function
When an advertiser's ad link is chosen to be displayed in the search results, the search engine assigns it a slot, its ranking position within the paid ads display. In this allocation process, search engines do not assign rankings to paid ad links at random; instead, they prioritize positions so as to better match user queries and to generate revenue from user clicks. Without knowing the exact method behind the search engine's ranking algorithm, we build an ordered-logit model to estimate the probability of paid ad rank.
From the advertiser's standpoint, since they do not know the full algorithm that search engines use for ranking assignment, there is an unobserved index that reflects the ranking situation. We denote this unobserved variable as $R_i^*$. As the search engine is most likely deploying some sort of quality-adjusted second-price auction, we consider that the two major factors that influence the rank are most likely ad quality and bidding price, with other factors (keyword match) potentially also factoring into the search engine's algorithm. The rank function of a vector of attributes $Z_i$ is given as:
$$R_i^* = \gamma' Z_i + \eta_i,$$
where $\eta_i$ is a logistic error term. Thus, the ordered-logit model is developed in Equation 2:
$$P(r_i = k) = \Lambda(\mu_k - \gamma' Z_i) - \Lambda(\mu_{k-1} - \gamma' Z_i), \qquad \Lambda(x) = \frac{\exp(x)}{1 + \exp(x)} \qquad (2)$$
where $r_i$ represents the rank of a paid advertisement and $\mu_0 = -\infty < \mu_1 < \dots < \mu_{K-1} < \mu_K = +\infty$ is a set of cut-offs. In these functions, $i = 1, \dots, N$ denotes the query, $N$ represents the total number of internet user queries, and $k = 1, \dots, K$ represents the position of the paid ad when it is displayed.
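As an illustration of the ordered-logit rank model, the sketch below computes the probability of each position from a rank index and a set of ascending cut-offs, using the standard convention that P(rank <= k) is the logistic CDF evaluated at (cut-off k minus index). The sign convention, the cut-off values and the index value are all assumptions for illustration; the text does not report them.

```python
import math


def logistic_cdf(x):
    """Logistic CDF: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def rank_probabilities(index, cutoffs):
    """Return [P(rank = 1), ..., P(rank = K)] for K = len(cutoffs) + 1.

    index: deterministic part of the latent rank variable.
    cutoffs: ascending cut-off values separating adjacent positions.
    """
    cumulative = [logistic_cdf(c - index) for c in cutoffs] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs


# Hypothetical cut-offs for four positions and a hypothetical index of 0.
probs = rank_probabilities(index=0.0, cutoffs=[-1.0, 0.0, 1.0])
for k, p in enumerate(probs, start=1):
    print(f"P(rank = {k}) = {p:.4f}")
```

Because the cut-offs are ascending, each probability is positive and the probabilities sum to one.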
3.3 The joint probability function
Owing to the exogenous (to the advertiser) nature of rank, combined with its impact upon CTRs, we develop a joint probability model to more accurately model CTRs, using the binary logit model for CTRs, conditioned on ad rank (and other attributes), in concert with the ordered logit model for ad rank. In the conditional probability of click function, the rank or position of the ad is one of the key factors driving click behavior, as evidenced by the decreasing CTRs with increasing rank illustrated in Table I.
In the rank function, we consider factors observable by the search engine that it may use in its allotment procedure. Assuming the decision made by the search engine on ranking allocation is independent of the current click decision made by internet users, we can generate the joint probability of a click at a given rank by multiplying the two logit functions:
$$P(c_i = 1, r_i = k) = P(c_i = 1 \mid r_i = k)\, P(r_i = k),$$
in which $P(c_i = 1 \mid r_i = k)$ is the binary logit click probability conditional on rank and $P(r_i = k)$ is the ordered-logit rank probability.
3.4 Estimation of the joint probability function
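We use traditional Maximum Likelihood Estimation (MLE) to estimate the joint probability model. Let $c_i$ equal 1 if the ad displayed for query $i$ is clicked and 0 otherwise, let $k_i$ be its observed position, and write $p_i = P(c_i = 1 \mid r_i = k_i)$ and $q_i = P(r_i = k_i)$. The likelihood, obtained by multiplying over each internet user's click decision, is:
$$L = \prod_{i=1}^{N} p_i^{c_i} (1 - p_i)^{1 - c_i}\, q_i$$
Resulting in log-likelihood,
$$\ln L = \sum_{i=1}^{N} \left[ c_i \ln p_i + (1 - c_i) \ln(1 - p_i) + \ln q_i \right]$$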
We can then maximize this log-likelihood function over the parameters to determine parameter estimates. Owing to the complexity of the log-likelihood function, we maximize it numerically using the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) routine in R. We can approximate standard errors for the parameter estimates by inverting the approximated Hessian from the BFGS routine, with the square roots of the diagonal elements of the inverted Hessian being the standard errors of the estimates.
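The objective handed to the optimizer can be sketched as follows. This is a simplified illustration, not the authors' R code: it assumes a click utility with only an intercept and Rank, a rank index driven only by Bid, the ordered-logit convention P(rank <= k) = logistic(cut-off k minus index), and a handful of made-up observations. All parameter names, values and data are hypothetical.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def rank_probability(rank, index, cutoffs):
    """P(rank = k) under an ordered logit with ascending cutoffs."""
    upper = 1.0 if rank > len(cutoffs) else logistic_cdf(cutoffs[rank - 1] - index)
    lower = 0.0 if rank == 1 else logistic_cdf(cutoffs[rank - 2] - index)
    return upper - lower


def joint_log_likelihood(params, cutoffs, observations):
    """Sum over impressions of log[P(click outcome | rank) * P(rank)].

    params: dict with hypothetical keys 'intercept', 'rank', 'bid'.
    observations: iterable of (clicked, rank, bid) tuples.
    """
    total = 0.0
    for clicked, rank, bid in observations:
        p_click = logistic_cdf(params["intercept"] + params["rank"] * rank)
        p_rank = rank_probability(rank, params["bid"] * bid, cutoffs)
        p_outcome = p_click if clicked else 1.0 - p_click
        total += math.log(p_outcome) + math.log(p_rank)
    return total


# Made-up impressions: (clicked, rank, bid). Higher bids sit in better positions.
observations = [(1, 1, 3.0), (0, 2, 2.0), (0, 3, 1.0), (1, 1, 2.5)]
cutoffs = [-2.0, 0.0]  # hypothetical cut-offs for three positions

good = {"intercept": -1.0, "rank": -0.4, "bid": -0.5}
bad = {"intercept": -1.0, "rank": -0.4, "bid": 0.5}
ll_good = joint_log_likelihood(good, cutoffs, observations)
ll_bad = joint_log_likelihood(bad, cutoffs, observations)
print(f"log-likelihood (bid coefficient -0.5): {ll_good:.3f}")
print(f"log-likelihood (bid coefficient +0.5): {ll_bad:.3f}")
```

A function of this shape, negated, is what a quasi-Newton routine such as BFGS would minimize. Under the sign convention assumed here, a negative Bid coefficient moves probability toward position 1, which matches the toy data better and so yields the higher log-likelihood.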
3.5 Model fit and parameter estimates
We develop a consumer click-through model based upon the earlier descriptive statistics. Our independent variables include Rank (values 1 through 10), an indicator for whether the advertiser is a hotel (versus an intermediary), and indicators for branded keywords: Branded Self is 1 for a branded search for the advertiser in question, Branded Other is 1 for a branded search for another advertiser, and both indicators are 0 for a generic search. As summarized in Table VII, and as anticipated from Tables II–IV, utility (and hence CTRs) are lower for Hotels. The Hotel dummy variable coefficient of −0.591567 results in an odds ratio of 0.553459 (exp(−0.591567)), indicating that, all else being equal, Hotels have odds of a click 0.553459 times those of an intermediary. Similarly, the −0.375387 coefficient estimate for Rank results in an odds ratio of 0.68702 (exp(−0.375387)), indicating that as an ad moves down the search results (e.g. from 1 to 2 or from 5 to 6), each position lost reduces the odds of being clicked by about 31.3% (1 − 0.68702 = 0.31298). The Rank model illustrates that higher quality landing pages (as measured by AdQuality) lead to higher placement within the ad list, as do increased Bids. Keyword match types are modeled with two indicators, with both indicators being 0 for broad matches; rank utility increases with more focused matches, higher for phrase matches and slightly higher still for accurate matches.
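The odds-ratio arithmetic above can be reproduced directly from the two reported coefficients. The sketch also shows how the per-position effect compounds over several positions, which follows from the logit form; the five-position example is an illustration rather than a figure from the text.

```python
import math

hotel_coef = -0.591567  # reported Hotel coefficient
rank_coef = -0.375387   # reported Rank coefficient

hotel_odds_ratio = math.exp(hotel_coef)
rank_odds_ratio = math.exp(rank_coef)

# Share of the odds lost for each one-position drop in rank.
odds_reduction_per_position = 1.0 - rank_odds_ratio

# Odds ratios multiply, so dropping five positions scales the odds by ratio ** 5.
five_position_odds_ratio = rank_odds_ratio ** 5

print(f"Hotel odds ratio: {hotel_odds_ratio:.6f}")
print(f"Rank odds ratio: {rank_odds_ratio:.5f}")
print(f"Odds reduction per position: {odds_reduction_per_position:.5f}")
print(f"Odds ratio after five positions: {five_position_odds_ratio:.4f}")
```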
We use the Likelihood Ratio (LR) Statistic as a measure of model fit, with where with P-value .
Table VIII summarizes average CTRs as calculated using parameter estimates from the model. Comparison with Table V indicates that modeled average CTRs are very similar to those calculated from the raw data, with the exception of accurate self-branded keywords for intermediaries. The difference in these averages most likely stems from the limited number of these very specific searches (i.e. consumers searching for Expedia.com versus simply typing the Expedia.com URL). In the following section we show how the model results can be used to illustrate differences in search strategies and advertiser quality.
4. Discussion
Our joint rank and click models allow the untangling of advertiser characteristics, search specifics and advertiser behavior (bid and match types) to determine their individual impacts upon CTR. As the search engine controls rank, modeled CTRs are conditional upon rank. We can create unconditional CTR estimates by summing over the discrete potential ad ranks:
$$P(c = 1) = \sum_{k=1}^{10} P(c = 1 \mid r = k)\, P(r = k)$$
Figure 1 displays unconditional CTR as a function of advertiser Bid and ad quality for a generic intermediary search with a broad match. For the Bid series, average ad quality is used, and the average bid is used for the ad quality series. Figure 2 shows CTRs as a function of advertiser bids, again for intermediaries with broad-matched generic searches, for ad qualities of 0.1, 1, 5 and 10 in four different series.
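The calculation behind curves such as those in Figures 1 and 2 can be sketched as a probability-weighted sum over the ten positions. In the example below only the Rank coefficient is taken from the text; the click intercept, the cut-offs, the Bid coefficient and the ordered-logit sign convention (P(rank <= k) = logistic(cut-off k minus index)) are hypothetical, so the printed values will not match the figures.

```python
import math

RANK_COEF = -0.375387  # Rank coefficient reported in Section 3.5
CLICK_INTERCEPT = -1.0  # hypothetical
BID_COEF = -0.5         # hypothetical; negative moves mass toward position 1 here
CUTOFFS = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical, 10 ranks


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def rank_probabilities(index, cutoffs):
    """Ordered-logit probabilities for positions 1..len(cutoffs) + 1."""
    cumulative = [logistic_cdf(c - index) for c in cutoffs] + [1.0]
    probs, previous = [], 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs


def unconditional_ctr(bid):
    """Sum over ranks of P(click | rank = k) * P(rank = k) for a given bid."""
    probs = rank_probabilities(BID_COEF * bid, CUTOFFS)
    return sum(
        logistic_cdf(CLICK_INTERCEPT + RANK_COEF * k) * p_k
        for k, p_k in enumerate(probs, start=1)
    )


ctr_low_bid = unconditional_ctr(1.0)
ctr_high_bid = unconditional_ctr(4.0)
print(f"unconditional CTR at bid 1.0: {ctr_low_bid:.4f}")
print(f"unconditional CTR at bid 4.0: {ctr_high_bid:.4f}")
```

With these illustrative parameters, a higher bid shifts probability toward the top positions and therefore raises the unconditional CTR, which is the qualitative pattern the discussion relies on.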



Fig. 1.

Fig. 2.
These unconditional CTRs allow advertisers to estimate the financial impact of ad quality. For instance, from Figure 2, a CTR of 0.12 would require a bid of $4.10 from an advertiser with ad quality of 0.1, $3.50 with quality of 1 and only $0.30 with quality of 5, as the poorer quality advertiser needs to bid more to improve rank (and CTR), whereas the higher quality advertiser naturally receives a better position (at a reduced cost). The corresponding bids for a hotel advertiser are $7.70, $7.10 and $3.90 for qualities of 0.1, 1 and 5 respectively. The search engine is in essence maximizing its expected revenue, only placing poorer quality ads near the top of the sort if it is appropriately compensated. These differences in costs decrease with more refined matches and decrease (increase) with more (less) specific keywords, i.e. they are lower for accurate matches of branded keywords but higher for branded keywords of competing advertisers.
To our knowledge, ours is the first paper to use disaggregated data at the user query level, i.e. we have full details on all displayed advertisers resulting from a user search. This unique data allows us to measure the impacts of advertiser behavior while controlling for the search engine, which is trying to maximize its (expected) revenue. As illustrated in Table VI, not controlling for search-engine-controlled rank results in biased parameter estimates for advertisers. At present our model is not keyword specific, but rather keyword-type specific. Our framework could easily be extended to the individual keyword level if a larger data set were available with more observations across individual keywords, as our sample of 1,440,660 impressions across 62,253 keywords results in only a little more than 23 impressions per keyword.
References
- [1] C. Anderson, “Search, OTAs, and online booking: An expanded analysis of the billboard effect,” Cornell Hospitality Report, vol. 11, no. 8, 2011.
- [2] G. Aggarwal, J. Feldman, and S. Muthukrishnan, “Bidding to the top: VCG and equilibria of position-based auctions,” in Approximation and Online Algorithms. Springer, 2007, pp. 15–28.
- [3] H. R. Varian, “Position auctions,” International Journal of Industrial Organization, vol. 25, no. 6, pp. 1163–1178, 2007.
- [4] Y. Chen and C. He, “Paid placement: Advertising and search on the internet,” The Economic Journal, vol. 121, no. 556, pp. F309–F328, 2011.
- [5] S. Athey and G. Ellison, “Position auctions with consumer search,” The Quarterly Journal of Economics, vol. 126, no. 3, pp. 1213–1270, 2011.
- [6] Z. Katona and M. Sarvary, “The race for sponsored links: Bidding patterns for search advertising,” Marketing Science, vol. 29, no. 2, pp. 199–215, 2010.
- [7] A. Ghose and S. Yang, “An empirical analysis of search engine advertising: Sponsored search in electronic markets,” Management Science, vol. 55, no. 10, pp. 1605–1622, 2009.
- [8] S. Yang and A. Ghose, “Analyzing the relationship between organic and sponsored search advertising: Positive, negative, or zero interdependence?” Marketing Science, vol. 29, no. 4, pp. 602–623, 2010.
- [9] O. Rutz and R. E. Bucklin, “From generic to branded: A model of spillover dynamics in paid search advertising,” Available at SSRN 1024766, 2008.