Airbnb had a large presence at the 2024 KDD conference hosted in Barcelona, Spain. Our Data Scientists and Engineers presented on topics like Deep Learning & Search Ranking, Online Experimentation & Measurement, Product Quality & Customer Journey, and Two-sided Marketplaces. This blog post summarizes our contributions to KDD for 2024 and provides access to the academic papers presented during the conference.
Authors: Huiji Gao, Peter Coles, Carolina Barcenas, Sanjeev Katariya
KDD (Knowledge Discovery and Data Mining) is one of the most prestigious global conferences in data mining and machine learning. Hosted annually by a special interest group of the Association for Computing Machinery (ACM), it’s where attendees learn about some of the most ground-breaking AI developments in data mining, machine learning, knowledge discovery, and large-scale data analytics.
This year, the 30th KDD conference was held in Barcelona, Spain, attracting thousands of researchers and scientists from academia and industry. Various companies contributed to and attended the conference, including Google, Meta, Apple, Amazon, Airbnb, Pinterest, LinkedIn, Booking.com, Expedia, and ByteDance. There were 151 Applied Data Science (ADS) track papers and 411 Research track papers accepted, along with 34 tutorials and 30 workshops.
Airbnb had a significant presence at KDD 2024 with three full ADS track papers (acceptance rate under 20%), one workshop, and seven workshop papers and invited talks accepted into the main conference proceedings. The topics of our work spanned Deep Learning & Search Ranking, Online Experimentation & Measurement, Causal Inference & Machine Learning, and Two-sided Marketplaces.
In this blog post, we’ll summarize our teams’ contributions and share highlights from an exciting week-long conference with research and industry talks, workshops, panel discussions, and more.
Intelligent search ranking, the process of accurately matching a guest with a listing based on their preferences, the listing’s features, and additional search context, remains a nuanced challenge that researchers are constantly trying to solve.
Making optimal guest-host matches has remained a challenge in a two-sided marketplace for a number of reasons: the timespan of guest searches (ranging between days and weeks), unpredictable host behavior and ratings (the potential for hosts to cancel a booking or receive low ratings), and limited understanding of guest preferences across multiple interfaces. We published several papers addressing the challenge of search ranking as part of our presence at KDD.
Learning to Rank for Maps at Airbnb
Airbnb brings together hosts who rent listings to prospective guests from around the globe. Results from a guest’s search for listings are displayed primarily through two interfaces: (1) as a list of rectangular cards that contain the listing image, price, rating, and other details, referred to as list-results, and (2) as oval pins on a map showing the listing price, called map-results. Both of these interfaces, since their inception, have used the same ranking algorithm that orders listings by their booking probabilities and selects the top listings for display.
However, some of the basic assumptions underlying ranking were built for a world where search results are presented as lists, and they simply break down for map-results. In this work, we rebuilt ranking for maps by revising the mathematical foundations of how users interact with map search results. Our iterative and experiment-driven approach led us through a path full of twists and turns, ending in a unified theory for the two interfaces.
Our journey shows how assumptions taken for granted when designing machine learning algorithms may not apply equally across all user interfaces, and how they can be adapted. The net impact was one of the largest improvements in user experience for Airbnb, which we discuss as a series of experimental validations. The work introduced in this paper is merely the beginning of future exciting research projects, such as making learning to rank unbiased for map-results and demarcating the map pins to direct the user’s attention towards more relevant ones.
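The baseline that both interfaces originally shared, ordering listings by booking probability and keeping the top results, can be sketched in a few lines. This is only an illustration of that starting point, not the map-specific ranking developed in the paper; the `Listing` class, its field names, and the probabilities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    listing_id: str
    booking_probability: float  # hypothetical model output in [0, 1]


def top_listings(listings, k):
    """Order listings by predicted booking probability and keep the top k."""
    ranked = sorted(listings, key=lambda item: item.booking_probability, reverse=True)
    return ranked[:k]


# Toy candidates with made-up probabilities.
candidates = [Listing("a", 0.02), Listing("b", 0.10), Listing("c", 0.05)]
shown = top_listings(candidates, k=2)
shown_ids = [item.listing_id for item in shown]
```

A list view reads these results top to bottom, whereas a map shows all selected pins at once, which is one way to see why assumptions tied to list position may not carry over.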
Multi-objective Learning to Rank by Model Distillation
In online marketplaces, the objective of search ranking is not only to optimize purchase or conversion rate (the primary objective), but also the purchase outcomes (secondary objectives), e.g. order cancellation, review rating, customer service inquiries, and long-term platform growth. To balance these primary and secondary objectives, several multi-objective learning to rank approaches have been widely studied.
Traditional approaches in industrial search and recommender systems encounter challenges such as expensive parameter tuning that leads to sub-optimal solutions, imbalanced data sparsity issues, and a lack of compatibility with ad-hoc objectives. In this work, we propose a distillation-based ranking solution for multi-objective ranking, which optimizes the end-to-end ranking system at Airbnb across multiple ranking models on different objectives, along with various considerations to optimize training and serving efficiency to meet industry standards.
Compared with traditional approaches, the proposed solution not only increases the primary objective of conversion by a large margin, but also satisfies the secondary objective constraints while improving model stability. Furthermore, we demonstrated that the proposed system could be further simplified by model self-distillation. We also ran additional simulations to show that this approach could help us efficiently inject ad-hoc non-differentiable business objectives into the ranking system, while enabling us to balance our optimization objectives.
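To make the idea of distillation across objectives more concrete, here is a minimal sketch under strong assumptions: per-objective teacher models each emit a probability, those probabilities are blended into one soft label with a weighted average, and a single student is trained against that label with cross-entropy. The blending rule, the objective names, and the weights are all hypothetical; the paper describes the actual method.

```python
import math


def soft_label(teacher_probs, weights):
    """Blend per-objective teacher probabilities into one soft target."""
    total = sum(weights.values())
    return sum(weights[name] * teacher_probs[name] for name in weights) / total


def distillation_loss(student_prob, target, eps=1e-12):
    """Binary cross-entropy of the student's prediction against a soft target."""
    p = min(max(student_prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Hypothetical teacher outputs for one (guest, listing) pair.
teachers = {"booking": 0.30, "no_cancellation": 0.90, "good_review": 0.80}
weights = {"booking": 0.6, "no_cancellation": 0.2, "good_review": 0.2}

target = soft_label(teachers, weights)
loss_when_matched = distillation_loss(target, target)
loss_when_off = distillation_loss(0.9, target)
```

Because the student only ever sees the blended label, changing the balance between objectives means changing the weights rather than retuning several models at serving time.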
Online experimentation (e.g., A/B testing) is a common way for organizations like Airbnb to make data-driven decisions. But high variance is frequently a challenge. For example, it’s hard to prove that a change in our search UX will drive value because bookings can be infrequent and depend on a large number of interactions over a long period of time.
Metric Decomposition in A/B Tests
More than a decade ago, CUPED (Controlled Experiments Utilizing Pre-Experiment Data) mainstreamed the idea of variance reduction leveraging pre-experiment covariates. Since its introduction, it has been implemented, extended, and modernized by major online experimentation platforms. Despite the wide adoption, it is known by practitioners that the variance reduction rate from CUPED, utilizing pre-experimental data, varies case by case and has a theoretical limit. In theory, CUPED can be extended to augment a treatment effect estimator utilizing in-experiment data, but practical guidance on how to construct such an augmentation is lacking.
In this work, we fill this gap by proposing a new direction for sensitivity improvement via treatment effect augmentation, whereby a target metric of interest is decomposed into two or more components in an attempt to isolate those with high signal and low noise from those with low signal and high noise. We show through theory, simulation, and empirical examples that if such a decomposition exists (or can be engineered), sensitivity may be increased via approximately null augmentation (in a frequentist setting) and reduced posterior variance (in a Bayesian setting).
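For reference, the classic CUPED adjustment mentioned above replaces each outcome Y with Y - theta * (X - mean(X)), where X is a pre-experiment covariate and theta = cov(X, Y) / var(X). The sketch below applies it to synthetic data; the data-generating numbers are assumptions chosen only to make the variance reduction visible.

```python
import random
from statistics import mean, pvariance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x))."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)
    theta = cov_xy / pvariance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


random.seed(0)
# Synthetic pre-experiment covariate and a correlated in-experiment metric.
pre = [random.gauss(10, 3) for _ in range(2000)]
post = [0.8 * p + random.gauss(0, 1) for p in pre]

adjusted = cuped_adjust(post, pre)
variance_before = pvariance(post)
variance_after = pvariance(adjusted)
```

The adjustment leaves the mean unchanged and removes the share of variance explained by the covariate, which is why its benefit is capped by how well pre-experiment data predicts the metric. In an actual experiment, theta is typically estimated on data pooled across treatment and control.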
We provide three real world applications demonstrating different flavors of metric decomposition. These applications illustrate the gain in agility that metric decomposition yields relative to an un-decomposed analysis, indicating both empirically and theoretically the value of this practice in both frequentist and Bayesian settings. An important extension to this work would be to next consider sample size determination in both the frequentist and Bayesian contexts; while a boost in sensitivity typically means less data is required for a given analysis, a methodology that determines the smallest sample size required to control various operating characteristics in this context would be of practical value.
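A toy simulation can illustrate why decomposition may help. It assumes a metric that is the sum of a low-noise component the treatment moves and a high-noise component the treatment does not move, and compares the spread of the effect estimate with and without the noisy component. This is a deliberately simplified reading of the idea, with made-up distributions, and not the estimator from the paper.

```python
import random
from statistics import mean, pvariance

random.seed(1)


def simulate_once(n=200, effect=0.2):
    """One A/B test where the metric is signal + noise and only signal moves."""
    def arm(shift):
        signal = [random.gauss(shift, 1) for _ in range(n)]
        noise = [random.gauss(0, 5) for _ in range(n)]  # unaffected by treatment
        return signal, noise

    c_signal, c_noise = arm(0.0)
    t_signal, t_noise = arm(effect)
    full = (mean(t_signal) + mean(t_noise)) - (mean(c_signal) + mean(c_noise))
    decomposed = mean(t_signal) - mean(c_signal)  # drop the null component
    return full, decomposed


runs = [simulate_once() for _ in range(100)]
var_full = pvariance([full for full, _ in runs])
var_decomposed = pvariance([decomposed for _, decomposed in runs])
```

Both estimators target the same effect here because the dropped component has zero true effect; the gain comes entirely from not carrying its noise.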
Airbnb employees hosted a workshop on Two-sided Marketplace Optimization: Search, Pricing, Matching & Growth. This workshop brought practitioners of two-sided marketplaces together and discussed the evolution of content ranking, recommendation systems, and data mining when solving for producers and consumers on these platforms.
Two-sided marketplaces have recently emerged as viable business models for many real-world applications. They model transactions as a network with two distinct types of participants: one type to represent the supply and another the demand of a specific good. Traditionally, research related to online marketplaces focused on how to better satisfy demand. But with two-sided marketplaces, there is more nuance at play. Modern global examples, like Airbnb, operate platforms where users provide services; users may be hosts or guests. Such platforms must develop models that address all their users’ needs and goals at scale. Machine learning-powered methods and algorithms are essential in every aspect of such complex, internet-scale, two-sided marketplaces.
Airbnb is a community based on connection and belonging: we strive to connect people and places. Our contributions to this workshop showcase the work we’re doing to support this mission by optimizing guest experiences, finding equilibrium points for listing prices, reducing the incidence of poor interactions (and customer support costs as a side effect), detecting when operational staff should follow up on activity at scale, and more.
Guest Intention Modeling for Personalization
Airbnb has transformed the way people travel by offering unique and personalized stays in destinations worldwide. To provide a seamless and tailored experience, understanding user intent plays an important role.
However, limited user data and unpredictable guest behavior can make it difficult to understand the essential intent of guests toward hosts’ listings. Our work shows how we approach this challenging problem. We describe how we apply a deep learning approach to predict difficult-to-infer details of a user’s travel plan, such as the next destination and travel dates. The framework analyzes high-level information from users’ in-app browsing history, booking history, search queries, and other engagement signals, and produces multiple user intent signals.
Marketing emails, flexible travel search (e.g., for “Europe in the summer”), and recommendations on the app home page are three guest interactions that benefit from correct intention modeling. Hosts also benefit, since a clear understanding of guest demand can help them optimize listings to increase satisfaction and bookings.
Hosts can find it difficult to correctly price their listings in two-sided marketplaces serviced by end users. Most hosts are not professional hospitality workers, and would benefit from access to data and advice on how guests see their listings and how they compare to other listings in their neighborhood. We constantly look for ways to provide guidance on how hosts can optimally price their listings. The same information can then be used to help guests find their ideal stay.
In our paper, we presented an example of how this problem can be solved in general.
As illustrated above, both demand and supply change over time, influencing the equilibrium price for a property at a specific point. A historical optimum (such as A above) needs to be adjusted to find the current optimum (point C). It is difficult to run experiments, since any large-scale experiment we might run will cause the environment to change in complex ways. We tackle this problem by combining economic modeling with causal inference techniques. We segment guests and estimate how price-sensitive each segment is, then refine these estimates with empirical data from small targeted experiments and larger-scale natural ones. Hosts can then use the models’ output to make informed tradeoffs between higher occupancy and higher nightly rates.
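The tradeoff between occupancy and nightly rate can be sketched with a textbook constant-elasticity demand curve per guest segment. Everything in this example is an assumption for illustration: the functional form, the two segments, their shares and elasticities, and the price grid. The paper’s models combine economic modeling with causal inference and are not reproduced here.

```python
def occupancy(price, base_price, base_occupancy, elasticity):
    """Constant-elasticity demand, capped at full occupancy."""
    return min(1.0, base_occupancy * (price / base_price) ** elasticity)


def expected_nightly_revenue(price, segments, base_price):
    """Revenue summed over guest segments, weighted by each segment's demand share."""
    return sum(
        seg["share"]
        * price
        * occupancy(price, base_price, seg["base_occupancy"], seg["elasticity"])
        for seg in segments
    )


# Hypothetical segments: one price-sensitive, one less so.
segments = [
    {"share": 0.7, "base_occupancy": 0.6, "elasticity": -1.8},
    {"share": 0.3, "base_occupancy": 0.5, "elasticity": -0.6},
]

prices = range(60, 201, 10)
best_price = max(prices, key=lambda p: expected_nightly_revenue(p, segments, base_price=100))
revenue_at_base = expected_nightly_revenue(100, segments, base_price=100)
revenue_at_best = expected_nightly_revenue(best_price, segments, base_price=100)
```

Re-estimating the elasticities as supply and demand shift moves the best price, which mirrors the adjustment from a historical optimum to a current one described above.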
Listing Embedding for Host-side Products
In order to facilitate the matching of listings and guests, Airbnb provides numerous services to both hosts and guests. Many of these tools are based on the ability to compare listings, i.e. finding similar listings or listings that may be viewed as equivalent substitutes. Our work presents a study on the application and learning of listing embeddings in Airbnb’s two-sided marketplace. Specifically, we discuss the architecture and training of a neural network embedding model using guest-side engagement data, which is then applied to host-side product surfaces. We address the key technical challenges we encountered, including the formulation of negative training examples, correction of training data sampling bias, and the scaling and speeding up of training with the help of in-model caching. Additionally, we discuss our comprehensive approach to evaluation, which ranges from in-batch metrics and vocabulary-based evaluation to the properties of similar listings. Finally, we share our insights from utilizing listing embeddings in Airbnb products, such as host calendar similar listings.
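Once listings have embeddings, finding similar listings usually reduces to a nearest-neighbor lookup. The sketch below uses cosine similarity over tiny hand-written vectors; the listing names, the three-dimensional vectors, and the choice of cosine similarity are assumptions for illustration, not details taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similar_listings(query_id, embeddings, k=2):
    """Return the ids of the k listings closest to query_id."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), listing_id)
        for listing_id, vector in embeddings.items()
        if listing_id != query_id
    ]
    return [listing_id for _, listing_id in sorted(scored, reverse=True)[:k]]


# Hypothetical embeddings; real ones would come from a trained model.
embeddings = {
    "loft_downtown": [0.9, 0.1, 0.0],
    "studio_downtown": [0.8, 0.2, 0.1],
    "beach_house": [0.1, 0.9, 0.3],
    "cabin_mountain": [0.0, 0.3, 0.9],
}
neighbors = similar_listings("loft_downtown", embeddings, k=2)
```

At production scale an exhaustive scan like this would normally be replaced by an approximate nearest-neighbor index.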
Customer Support Optimization in Search Ranking
As of the date of the paper, Airbnb had more than 7.7 million listings from more than 5 million hosts worldwide. Airbnb is investing both in rapid growth and in making sure that the booking experience is great for hosts and guests. It would, however, be ideal to avoid poor experiences in the first place. Our work highlights how we prevent poor experiences without significantly reducing growth.
We use the mass of accumulated support data at Airbnb to model the probability that, if the current user were to book a listing, they would require customer support (CS). Our model discovered several features about the searcher, home, and host that accurately predict CS requirements. For example, same-day bookings tend to require more support, and a responsive host tends to reduce support needs. So, if a guest chooses a same-day booking, matching them with a highly responsive host can lead to a better experience overall. We incorporate the output of our CS model in search result rankings; properties will sometimes rank lower if we predict that a booking will lead to a negative experience.
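One simple way to picture how a predicted support probability could influence ranking is to discount each listing’s booking score by it. The multiplicative form, the `penalty` parameter, and the numbers below are hypothetical; the paper describes how the CS model output is actually incorporated.

```python
def adjusted_score(booking_prob, cs_prob, penalty=0.5):
    """Discount the booking probability by the predicted chance of needing support."""
    return booking_prob * (1.0 - penalty * cs_prob)


# Hypothetical candidates: (listing id, booking probability, CS probability).
candidates = [
    ("A", 0.10, 0.40),  # likely to book, but likely to need support
    ("B", 0.09, 0.05),
    ("C", 0.05, 0.01),
]

ranked = sorted(
    candidates,
    key=lambda row: adjusted_score(row[1], row[2]),
    reverse=True,
)
ranked_ids = [listing_id for listing_id, _, _ in ranked]
```

With these numbers listing B overtakes A despite a slightly lower booking probability, while C stays last, so the penalty reorders close calls without overriding large differences in conversion.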
LLM Pretraining using Activity Logs
It is often important to follow up with users after they’ve had a long sequence of interactions with a two-sided marketplace to help ensure that their experiences are of high quality. When user interactions meet certain business criteria, operations agents create tickets to follow up with them. For example, user retention and reactivation agents might review user activity logs and decide to follow up with the user, to encourage them to re-engage with the platform.
We propose transforming structured data (activity logs) into a more manageable text format and then leveraging modern language models (i.e., BERT) to pretrain a large language model based on user activities. We then fine-tuned the model using historical data about which users were followed up with and checked its predictions. Our work demonstrates that the large language model trained on pre-processed activity can successfully identify when a user should be followed up with, at an experimentally significant rate. Our preliminary results suggest that our framework may outperform by 80% the average precision of a similar model that was designed relying heavily on feature engineering.
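The first step, turning structured activity logs into text that a language model can consume, might look like the sketch below. The event fields, the phrase template, and the `" ; "` delimiter are all assumptions made for illustration; the pretraining and fine-tuning steps are not shown.

```python
def event_to_text(event):
    """Render one structured log event as a short English phrase."""
    return f"{event['action']} {event['target']} on {event['date']}"


def log_to_text(events):
    """Serialize a user's events, oldest first, into one text sequence."""
    ordered = sorted(events, key=lambda e: e["date"])
    return " ; ".join(event_to_text(e) for e in ordered)


# Hypothetical activity log for one user.
events = [
    {"date": "2024-03-02", "action": "searched", "target": "Lisbon"},
    {"date": "2024-03-01", "action": "viewed", "target": "listing 123"},
]
text = log_to_text(events)
```

Keeping the serialization plain and chronological lets a general-purpose tokenizer handle it, which is part of the appeal over hand-built features.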
Typically, product quality is evaluated based on structured data. Customer ratings, types of support issues, resolution times, and other factors are used as a proxy for how someone booking on Airbnb might value a listing. This kind of data has limitations: more popular listings have more data, users often don’t leave feedback, and feedback is usually biased towards the positive (users with negative experiences tend to churn and not give feedback).
In the Workshop on Causal Inference and Machine Learning in Practice, we highlighted an example of how we push the boundaries of product quality assessment methods and applications, blending traditional causal inference with cutting-edge machine learning research. In our work “Understanding Product Quality with Unstructured Data: An Application of LLMs and Embeddings at Airbnb”, we presented how an approach based on text embeddings and LLMs can be combined with approaches based on structured data to significantly improve product quality evaluations. We generate text embeddings on a mix of listing and review texts, then cluster the embeddings based on rebooking and churn rates. Once we have clear clusters, we extract keywords from the original data and use these keywords to calculate a listing quality score based on similarity to the keyword list.
In addition, we were invited to give a talk on Quality Foundations at Airbnb, at KDD’s 3rd Workshop on End-End Customer Journey Optimization. It is often hard to differentiate the quality of customer experiences using simple review ratings, in part due to the tightness of their distribution. In this talk, we presented an alternative notion of quality based on customer revealed preference: did a customer return to use the platform again after their experience? We describe how a metric, Guest Return Propensity (GRP), leverages this concept and can differentiate quality, capture platform externalities, and predict future returns.
In practice, this measure may not be suited to many common business use cases due to its lagging nature and an inability to easily explain why it has changed. We describe a quality measurement system that builds on the conceptual foundation of GRP by modeling it as an outcome of upstream realized quality signals. These signals, from sources like reviews and customer support, are weighted by their impact on return propensity and mapped to a quality taxonomy to aid in explainability. The resulting score is capable of finely differentiating the quality of customer experiences, aiding tradeoff decisions, and providing timely insights.
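The structure described above, signals weighted by their impact on return propensity and grouped under a taxonomy, can be sketched as a small additive score. The signal names, categories, weights, baseline, and the additive form are all hypothetical and chosen only to show how a per-category breakdown supports explainability.

```python
from collections import defaultdict

# Hypothetical signals: name -> (taxonomy category, weight on return propensity).
SIGNAL_WEIGHTS = {
    "review_mentions_unclean": ("cleanliness", -0.06),
    "support_ticket_checkin": ("check-in", -0.04),
    "review_praises_host": ("hospitality", 0.03),
}


def quality_score(observed_signals, baseline=0.50):
    """Return (score, per-category contributions) for one stay."""
    by_category = defaultdict(float)
    for name in observed_signals:
        category, weight = SIGNAL_WEIGHTS[name]
        by_category[category] += weight
    return baseline + sum(by_category.values()), dict(by_category)


score, breakdown = quality_score(["review_mentions_unclean", "review_praises_host"])
```

Because the signals arrive soon after a stay, a score built this way can be read immediately, and the breakdown shows which category moved it.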
The 2024 edition of KDD was an amazing opportunity for data scientists and machine learning engineers from across the globe, and from industry, government, and academia, to connect and exchange learnings and discoveries. We were honored to have the opportunity to share some of our knowledge and techniques, generalizing what we have been learning when we apply machine learning to problems we see at Airbnb. We continue to focus on improving our customers’ experience and growing our business, and the information we’ve shared has been crucial to our success. We’re excited to continue learning from peers and contribute our work back to our community. We eagerly await advancements and improvements that may come about as others build upon the work we’ve shared.
Below, you’ll find a complete list of the talks and papers shared in this article along with the team members who contributed. If this type of work interests you, we encourage you to apply for an open position today.
Learning to Rank for Maps at Airbnb (link)
Authors: Malay Haldar, Hongwei Zhang, Kedar Bellare, Sherry Chen, Soumyadip Banerjee, Xiaotang Wang, Mustafa Abdool, Huiji Gao, Pavan Tapadia, Liwei He, Sanjeev Katariya
Multi-objective Learning to Rank by Model Distillation (link)
Authors: Jie Tang, Huiji Gao, Liwei He, Sanjeev Katariya
Metric Decomposition in A/B Tests (link)
Authors: Alex Deng (former employee at Airbnb), Luke Hagar (University of Waterloo), Nathaniel T. Stevens (University of Waterloo), Tatiana Xifara (Airbnb), Amit Gandhi (University of Pennsylvania)
Understanding Guest Preferences and Optimizing Two-sided Marketplaces: Airbnb as an Example (link)
Authors: Yufei Wu, Daniel Schmierer
Predicting Potential Customer Support Needs and Optimizing Search Ranking in a Two-Sided Marketplace (link)
Authors: Do-kyum Kim, Han Zhao, Huiji Gao, Liwei He, Malay Haldar, Sanjeev Katariya
Understanding User Booking Intent at Airbnb (link)
Authors: Xiaowei Liu, Weiwei Guo, Jie Tang, Sherry Chen, Huiji Gao, Liwei He, Pavan Tapadia, Sanjeev Katariya
Can Language Models Accelerate Prototyping for Non-Language Data? Classification & Summarization of Activity Logs as Text (link)
Authors: José González-Brenes
Learning and Applying Airbnb Listing Embeddings in Two-Sided Marketplace (link)
Authors: Siarhei Bykau, Dekun Zou
Understanding Product Quality with Unstructured Data: An Application of LLMs and Embeddings at Airbnb (link)
Authors: Jikun Zhu, Zhiying Gu, Brad Li, Linsha Chen
Invited Talk: Quality Foundations at Airbnb
Speakers: Peter Coles, Mike Egesdal