in the interference graph. It was hoped that, by including this term in the pricing formula, the auction would be able to offer higher prices to, and buy the rights of, stations that pose particularly difficult problems by interfering with many other stations.
high probability. But how can we know the distribution and how can such
an algorithm be found?
The FCC auction used a feasibility checker developed by a team of Auctionomics researchers at the University of British Columbia, led by Professor Kevin Leyton-Brown. There were many steps in the development, as reported by Newman, Fréchette, and Leyton-Brown (forthcoming), but here we emphasize the role of machine learning. Auctionomics’ goal was to be able to solve 99 percent of the problem instances in one minute or less.
The development effort began by simulating the planned auction to generate feasibility problems like those that might be encountered in a real auction. Running many simulations generated about 1.4 million problem instances that could be used for training and testing a feasibility-checking algorithm. The first step of the analysis was to formulate the problems as mixed-integer programs and test standard commercial software—CPLEX and Gurobi—to see how close those could come to meeting the performance objectives. The answer was: not close. Using a 100-second cutoff, Gurobi could solve only about 10 percent of the problems and CPLEX only about 25 percent. These results were not nearly good enough for decent performance in a real-time auction.
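To make the formulation concrete, here is a minimal sketch of a repacking feasibility check written as a mixed-integer program with Gurobi's Python API. It is not the FCC's actual formulation: the inputs (`stations`, `domain`, and a set of co-channel `interference` triples) are illustrative, and real interference constraints are richer.

```python
import gurobipy as gp
from gurobipy import GRB

def repack_feasible(stations, domain, interference, time_limit=100):
    """domain[s]: channels station s may use; interference: set of
    (s, t, c) triples meaning s and t cannot both be put on channel c."""
    m = gp.Model("repacking")
    m.Params.TimeLimit = time_limit   # mirror the 100-second cutoff
    m.Params.OutputFlag = 0
    # x[s, c] = 1 if station s is assigned channel c
    x = m.addVars([(s, c) for s in stations for c in domain[s]],
                  vtype=GRB.BINARY, name="x")
    # each station receives exactly one channel from its domain
    m.addConstrs((x.sum(s, "*") == 1 for s in stations), name="assign")
    # interfering stations cannot share a blocked channel
    m.addConstrs((x[s, c] + x[t, c] <= 1
                  for (s, t, c) in interference
                  if (s, c) in x and (t, c) in x), name="interference")
    m.optimize()
    if m.Status == GRB.OPTIMAL:
        return True        # a feasible repacking exists
    if m.Status == GRB.INFEASIBLE:
        return False       # provably no repacking
    return None            # timed out: unresolved within the cutoff
```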
Next, the same problems were formulated as satisfiability problems and tested using seventeen research solvers that had participated in recent SAT-solving tournaments. These were better, but none could solve as many as two-thirds of the problems within the same 100-second cutoff. The goal remained 99 percent in sixty seconds.
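For comparison, the same check can be encoded as a satisfiability problem. The sketch below uses the python-sat library with a Glucose solver as a stand-in for the tournament solvers the team tested; the encoding (one Boolean variable per station-channel pair) is the textbook one, not necessarily the one used in the auction.

```python
from pysat.solvers import Glucose3

def repack_feasible_sat(stations, domain, interference):
    var = {}
    def v(s, c):  # one Boolean variable per (station, channel) pair
        return var.setdefault((s, c), len(var) + 1)

    solver = Glucose3()
    for s in stations:
        chans = list(domain[s])
        # at least one channel per station ...
        solver.add_clause([v(s, c) for c in chans])
        # ... and at most one (pairwise "at most one" encoding)
        for i in range(len(chans)):
            for j in range(i + 1, len(chans)):
                solver.add_clause([-v(s, chans[i]), -v(s, chans[j])])
    # interfering stations may not share channel c
    for (s, t, c) in interference:
        if c in domain[s] and c in domain[t]:
            solver.add_clause([-v(s, c), -v(t, c)])
    return solver.solve()  # True iff some assignment satisfies all clauses
```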
The next step was to use automated algorithm configuration, a procedure developed by Hutter, Hoos, and Leyton-Brown (2011) and applied in this setting by Leyton-Brown and his students at the University of British Columbia. The idea is to start with a highly parameterized algorithm for solving satisfiability problems4 and to train a random forest model of the algorithm’s performance, given the parameters. To do that, we first ran simulated auctions with what we regarded as plausible behavior by the bidders to generate a large data set of representative problems. Then, we solved those problems using a variety of different parameter settings to determine the distribution of solution times for each vector of parameters. This generated a data set with parameters and performance measures. Two of the most interesting performance characteristics were the median run time and the fraction of instances solved within one minute. Then, using a Bayesian model, we incorporated uncertainty: the experimenter “believes” that the actual performance is normally distributed with a mean determined by the random forest and a variance that depends on the distance of the parameter vector from the nearest points in the data set. Next, the system identifies the parameter vector that maximizes the expected improvement in performance, given the mean and variance of the prior and the performance of the best-known parameter vector. Finally, the system tests the actual performance for the identified parameters and adds that as an observation to the data set. Proceeding iteratively, the system identifies more parameters to test, investigates them, and adds them to the data to improve the model accuracy until the time budget is exhausted.

4. There are no known algorithms for NP-complete problems that are guaranteed to be fast, so the best existing algorithms are all heuristics. These algorithms weight various characteristics of the problem to decide about such things as the order in which to check different branches of a search tree. These weights are among the parameters that can be set and adapted to work well for a particular class of problems, such as those that arise in the incentive auction application. The particular software algorithm that we used was CLASP, which had more than 100 exposed parameters that could be modified.
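A schematic sketch of the configuration loop just described, under simplifying assumptions: the random forest supplies the predicted runtime, uncertainty is proxied by distance to the nearest evaluated configuration, and the next configuration to test maximizes expected improvement. `run_solver` (which would time the configured solver on training instances) and `sample_config` (which returns a random parameter vector as a NumPy array) are hypothetical stand-ins, not the actual SMAC implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

def expected_improvement(mu, sigma, best):
    # EI for minimization: expected amount by which a candidate
    # beats the best runtime observed so far
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def configure(run_solver, sample_config, n_iters=100):
    # seed the model with a few random configurations
    X = [sample_config() for _ in range(10)]
    y = [run_solver(theta) for theta in X]
    for _ in range(n_iters):
        model = RandomForestRegressor(n_estimators=100).fit(X, y)
        cands = np.array([sample_config() for _ in range(1000)])
        mu = model.predict(cands)                 # predicted runtimes
        # proxy for uncertainty: distance to nearest evaluated config
        dists = np.min(np.linalg.norm(
            cands[:, None, :] - np.array(X)[None, :, :], axis=2), axis=1)
        ei = expected_improvement(mu, dists, min(y))
        theta = cands[int(np.argmax(ei))]         # most promising candidate
        X.append(theta)                           # test it for real and
        y.append(run_solver(theta))               # add it to the data set
    return X[int(np.argmin(y))]                   # best configuration found
```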
Eventually, this machine-learning method leads to diminishing returns to the time invested. One can then create a new data set from the instances on which the parameterized algorithm was “slow,” for example, taking more than fifteen seconds to solve. Training a new algorithm on those instances and running the two parameterized algorithms in parallel led to dramatic further improvements in performance.
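In code, the parallel-portfolio idea amounts to racing the two configured solvers and taking whichever finishes first. A minimal sketch, with `solve_general` and `solve_slow` as placeholders for the two configurations:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def portfolio_solve(instance, timeout=60):
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(solve_general, instance),  # main configuration
               pool.submit(solve_slow, instance)]     # "slow"-instance configuration
    done, _ = wait(futures, timeout=timeout, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # let the slower run finish in the background
    for f in done:
        return f.result()      # answer from whichever solver finished first
    return None                # neither solver finished within the budget
```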
For the actual auction, several other problem-specific tricks were also applied to contribute to the speed-up. For example, to some extent it proved possible to decompose the full problem into smaller problems, to reuse old solutions as starting points for a search, to store partial solutions that might help guide solutions of further problems, and so on. In the end, the full set of techniques and tricks resulted in a very fast feasibility checker that solved all but a tiny fraction of the relevant problems within the allotted time.
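As a toy illustration of the solution-reuse trick, one might first test cached assignments from earlier problems before invoking the solver at all; `check_assignment` and `solve` are hypothetical helpers, not the checker's actual interface.

```python
solution_cache = []  # channel assignments that solved earlier problems

def feasible_with_cache(instance):
    # an assignment found for an earlier, related problem may already
    # satisfy the new one, so try the cached assignments first
    for assignment in solution_cache:
        if check_assignment(instance, assignment):
            return True
    result = solve(instance)           # fall back to the full solver
    if result is not None:
        solution_cache.append(result)  # remember it for later problems
        return True
    return False
```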
23.3 Using AI to Promote Trust in Online Marketplaces
Online marketplaces such as eBay, Taobao, Airbnb, and many others have grown dramatically since their inception just over two decades ago, providing businesses and individuals with previously unavailable opportunities to purchase or profit from online trading. Wholesalers and retailers can market their goods or get rid of excess inventory; consumers can easily search marketplaces for whatever is on their mind, alleviating the need for businesses to invest in their own e-commerce websites; individuals transform items they no longer use into cash; and more recently, the so-called “gig economy” comprises marketplaces that allow individuals to share their time or assets across different productive activities and earn extra income.
The amazing success of online marketplaces was not fully anticipated,
primarily because of the hazards of anonymous trade and asymmetric infor-
mation. Namely, how can strangers who have never transacted with one
another, and who may be thousands of miles apart, be willing to trust each
other? Trust on both sides of the market is essential for parties to be willing
to transact and for a marketplace to succeed. The early success of eBay is
often attributed to the innovation of introducing its famous feedback and
reputation mechanism, which was adopted in one form or another by practi-
cally every other marketplace that came after eBay. These online feedback
and reputation mechanisms provide a modern- day version of more ancient
reputation mechanisms used in the physical marketplaces that were the
medieval trade fairs of Europe (see Milgrom, North, and Weingast 1990).
Still, recent studies have shown that online reputation measures of marketplace sellers, which are based on buyer-generated feedback, don’t accurately reflect their actual performance. Indeed, a growing literature has shown that user-generated feedback mechanisms are often biased, suffer from “grade inflation,” and can be prone to manipulation by sellers.5 For example, the average percent positive for sellers on eBay is about 99.4 percent, with a median of 100 percent. This makes it challenging to interpret the true levels of satisfaction on online marketplaces.

5. On bias and grade inflation see, for example, Nosko and Tadelis (2015), Zervas, Proserpio, and Byers (2015), and Filippas, Horton, and Golden (2017). On seller manipulation of feedback scores see, for example, Mayzlin, Dover, and Chevalier (2014) and Xu et al. (2015).
A natural question emerges: Can online marketplaces use the treasure trove of data they collect to measure the quality of a transaction and predict which sellers will provide better service to their buyers? It has become widely known that all online marketplaces, as well as other web-based services, collect vast amounts of data as part of the process of trade. Some refer to this as the “exhaust data” generated by the millions of transactions, searches, and browsing sessions that occur on these marketplaces daily. By leveraging these data, marketplaces can create an environment that promotes trust, not unlike the ways in which institutions that emerged in the medieval trade fairs of Europe helped foster trust. The scope for market design goes far beyond mainstream applications like setting rules of bidding and reserve prices for auctions or designing tiers of services, and in our view, includes the design of mechanisms that help foster trust in marketplaces. What follows are two examples from recent research that show some of the many ways that marketplaces can apply AI to the data they generate to help create more trust and better experiences for their customers.
23.3.1 Using AI to Assess the Quality of Sellers
One of the ways that online marketplaces help participants build trust is by letting them communicate through online messaging platforms. For example, on eBay buyers can contact sellers to ask them questions about their products, which may be particularly useful for used or unique products for which buyers may want to get more refined information than is listed. Similarly, Airbnb allows potential renters to send messages to hosts and ask questions about the property that may not be answered in the original listing.
Using Natural Language Processing (NLP), a mature area in AI, marketplaces can mine the data generated by these messages in order to better predict the kinds of features that customers value. However, there may also be subtler ways to apply AI to manage the quality of marketplaces. The messaging platforms are not restricted to pretransaction inquiries, but also allow the parties to send messages to each other after the transaction has been completed. An obvious question then emerges: How could a marketplace analyze the messages sent between buyers and sellers after the transaction to infer something about the quality of the transaction that feedback doesn’t seem to capture?
This question was posed and answered in a recent paper by Masterov, Mayer, and Tadelis (2015) using internal data from eBay’s marketplace. The analysis they performed was divided into two stages. In the first stage, the goal was to see if NLP can identify transactions that went bad when there was an independent indication that the buyer was unhappy. To do this, they collected internal data from transactions in which messages were sent from the buyer to the seller after the transaction was completed, and matched it with another internal data source that recorded actions by buyers indicating that the buyer had a poor experience with the transaction. Actions that indicate an unhappy buyer include claiming that the item was not received, claiming that the item was significantly not as described, or leaving negative or neutral feedback, to name a few.
The simple NLP approach they use creates a “poor-experience” indicator as the target (dependent variable) that the machine-learning model will try to predict, and uses the messages’ content as the independent variables. In its simplest form and as a proof of concept, a regular expression search was used that included a standard list of negative words such as “annoyed,” “dissatisfied,” “damaged,” or “negative feedback” to identify a message as negative. If none of the designated terms appeared, then the message was considered neutral. Using this classification, they grouped transactions into three distinct types: (a) no posttransaction messages from buyer to seller, (b) one or more negative messages, or (c) one or more neutral messages with no negative messages.
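As a rough illustration, the classification just described can be reproduced in a few lines; the word list below is a stand-in for the paper's actual list.

```python
import re

# illustrative word list; the paper's actual list is longer
NEGATIVE = re.compile(
    r"annoyed|dissatisfied|damaged|negative feedback", re.IGNORECASE)

def classify(messages):
    """Classify one transaction's post-transaction buyer messages."""
    if not messages:
        return "no_messages"                      # type (a)
    if any(NEGATIVE.search(m) for m in messages):
        return "negative"                         # type (b)
    return "neutral"                              # type (c)
```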
Figure 23.2, which appears in Masterov, Mayer, and Tadelis (2015), describes the distribution of transactions with the different message classifications together with their association with poor experiences. The x-axis of figure 23.2 shows that approximately 85 percent of transactions fall into the benign first category of no posttransaction messages. Buyers sent at least one message in the remaining 15 percent of all transactions, evenly split between negative and neutral messages. The y-axis shows the poor-experience rate for each message type. When no messages are exchanged, only 4 percent of buyers report a poor experience. Whenever a neutral message is sent, the rate of poor experiences jumps to 13 percent, and if the message’s content was negative, over one-third of buyers express a poor experience.
Fig. 23.2 Message content and poor experiences on eBay
Source: Masterov et al. 2015. ©2015 Association for Computing Machinery, Inc. Reprinted by permission. https://doi.org/10.1145/2764468.2764499.

In the second stage of the analysis, Masterov, Mayer, and Tadelis (2015)
used the fact that negative messages are associated with poor experiences
to construct a novel measure of seller quality based on the idea that sellers
who receive a higher frequency of negative messages are worse sellers. For
example, imagine that seller A and seller B both sold 100 items and that seller
A had five transactions with at least one negative message, while seller B had
eight such transactions. The implied quality score of seller A is then 0.05
while that of seller B is 0.08, and the premise is that seller B is a worse seller
than seller A. Masterov, Mayer, and Tadelis (2015) show that the relation-
ship between this ratio, which is calculated for every seller at any point in
time using aggregated negative messages from past sales, and the likelihood
that a current transaction will result in a poor experience, is monotonically
increasing.
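The implied quality score is straightforward to compute from transaction records. A minimal sketch, with illustrative field names:

```python
from collections import defaultdict

def quality_scores(transactions):
    """transactions: records with illustrative fields `seller` and
    `has_negative_message`; returns the share of each seller's past
    sales that drew at least one negative message (higher = worse)."""
    sold = defaultdict(int)
    negative = defaultdict(int)
    for t in transactions:
        sold[t["seller"]] += 1
        negative[t["seller"]] += t["has_negative_message"]
    # e.g., seller A: 5/100 = 0.05; seller B: 8/100 = 0.08 (worse)
    return {s: negative[s] / sold[s] for s in sold}
```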
This simple exercise is a proof of concept that shows that by using the message data and a simple natural language processing AI procedure, they were able to better predict which sellers will create poor experiences than one can infer from the very inflated feedback data. eBay is not unique in allowing the parties to exchange messages, and the lessons from this research are easily generalizable to other marketplaces. The key is that there is information in the communication between market participants, and past communication can help identify and predict the sellers or products that will cause buyers poor experiences and negatively impact the overall trust in the marketplace.
23.3.2 Using AI to Create a Market for Feedback
Aside from the fact that feedback is often inflated as described earlier, another problem with feedback is that many buyers choose not to leave feedback at all. In fact, through the lens of mainstream economic theory, it is surprising that a significant fraction of online consumers leave feedback. After all, it is a selfless act that requires time, and it creates a classic free-rider problem. Furthermore, because potential buyers are attracted to buy from sellers or products that already have an established good track record, this creates a “cold-start” problem: new sellers (or products) with no feedback will face a barrier to entry in that buyers will be hesitant to give them a fair shot. How could we solve these free-rider and cold-start problems?
These questions were analyzed in a recent paper by Li, Tadelis, and Zhou (2016) using a unique and novel implementation of a market for feedback on the huge Chinese marketplace Taobao, where sellers were allowed to pay buyers to leave them feedback. Naturally, one may be concerned about allowing sellers to pay for feedback, as it seems like a practice in which they will pay only for good feedback and suppress any bad feedback, which would not add any value in promoting trust. However, Taobao implemented a clever use of NLP to solve this problem: it is the platform, using an NLP AI model, that decides whether feedback is relevant, not the seller who pays for the feedback. Hence, the reward to the buyer for leaving feedback was actually managed by the marketplace, and was handed out for informative feedback rather than for positive feedback.
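The paper does not disclose Taobao's actual model, but a simple stand-in conveys the idea: a text classifier trained on hand-labeled examples scores feedback for informativeness, and the platform pays the rebate whenever the text is predicted informative, whether or not it is positive. The bag-of-words pipeline below is an illustrative sketch, not the production system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_informativeness_model(texts, labels):
    """texts: feedback strings; labels: 1 = informative, 0 = not
    (hand-labeled training examples are assumed to exist)."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    return model.fit(texts, labels)

def rebate_eligible(model, feedback_text, threshold=0.5):
    # pay for predicted-informative feedback, positive or negative alike
    return model.predict_proba([feedback_text])[0, 1] >= threshold
```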
Specifically, in March 2012, Taobao launched a “Rebate-for-Feedback”