By: Alexandre Salama, Tim Abraham
At Airbnb, our supply comes from hosts who decide to list their spaces on our platform. Unlike traditional hotels, these spaces are not all interchangeable units in a building that are available to book year-round. Our hosts are people, with different earnings objectives and schedule constraints, leading to different levels of availability to host. Understanding these differences is a key input into how we develop our products, campaigns, and operations.
Over the years, we’ve created various ways to measure host availability, developing “features” that capture different aspects of how and when listings are available. However, these features provide an incomplete picture when viewed in isolation. For example, a ~30% availability rate could indicate two very different scenarios: a host who only accepts bookings on weekends, or a host whose listing is only available during a specific season, such as summer.
This is where segmentation comes in.
By combining multiple features, segmentation allows us to create discrete categories that represent the different availability patterns of hosts.
However, traditional segmentation methodologies, such as “RFM” (Recency, Frequency, Monetary), are focused on customer value rather than calendar dynamics, and are often limited to one-off analyses on small datasets. In contrast, we need an approach that can handle calendar data and daily inference for millions of listings.
To address the above challenges, this blog post explores how Airbnb used segmentation to better understand host behavior at scale. By enriching availability data with novel features and applying machine learning techniques, we developed a practical and scalable approach to segment availability for millions of listings daily.
Consider Alice and Max, two hosts with identical 2-bedroom apartments on Airbnb. However, Alice only lists her property in the summer, while Max has it available year-round, reflecting two distinct hosting styles.
Alice’s seasonal availability suggests that she might live in the property most of the time, only renting it out during the summer months. Airbnb can support her with seasonal pricing recommendations, onboarding guides for occasional hosts, and settings suggestions.
Conversely, Max’s full-time availability indicates a more professional hosting style, likely his primary earnings source. Airbnb can provide him with advanced booking analytics, tools for managing multiple reservations, and guidance on earnings and tax implications.
How can we create a dataset that captures these important differences in hosting behavior?
Availability Rate
A first step is to capture the host’s “intention to be available” on a specific night. Availability can be analyzed from either a backward-looking (in the past) or a forward-looking (in the future) perspective. For simplicity, this post focuses on backward-looking availability, as it reflects the final state of a calendar after all changes in inventory, bookings and cancellations have occurred. Forward-looking availability is not as straightforward because changes can still happen between the analysis date and the future dates being analyzed.
We consider both:
- Nights Vacant: nights when the listing was listed as available for booking on Airbnb, and remained vacant.
- Nights Booked: nights when the listing was listed as available for booking on Airbnb, and was later booked on Airbnb.
Consequently, we can calculate the corresponding Nights Intended to be Available, or Nights Available, for the 365-day look-back period as the sum of Nights Vacant and Nights Booked. We then divide it by 365 to obtain the corresponding Availability Rate.
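As a minimal sketch of the calculation described above, the function below adds Nights Vacant and Nights Booked and divides by the look-back window. The function and parameter names are illustrative assumptions, not names from Airbnb's codebase.

```python
LOOKBACK_NIGHTS = 365  # look-back window stated in the post


def availability_rate(nights_vacant, nights_booked, lookback_nights=LOOKBACK_NIGHTS):
    """Return Nights Available / look-back nights (hypothetical helper)."""
    nights_available = nights_vacant + nights_booked
    if not 0 <= nights_available <= lookback_nights:
        raise ValueError("nights available must lie within the look-back window")
    return nights_available / lookback_nights


# A listing that was vacant for 60 nights and booked for 50 nights
rate = availability_rate(nights_vacant=60, nights_booked=50)
print(f"{rate:.1%}")
```

Both vacant and booked nights count toward the numerator, since each reflects a night the host intended to make available.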
Looking at the distribution of availability rates across listings, we observe:
- A substantial proportion of listings has little-to-no availability (~0% availability rate).
- Conversely, a significant proportion of listings has near-full availability (~100% availability rate).
- Between these extremes, a significant set of listings emerges without strong breakpoints.
How can we further differentiate the listings that fall in the middle range?
Streakiness
For listings that are not at either end of the spectrum, availability rate on its own is insufficient for capturing the nuances of how a listing is made available throughout the month. Consider Listings A and B, which both have a 50% availability rate in a given month.
Though these listings have distinct availability patterns, they each have the identical availability price (50%)!
Listing A’s concentrated, block-like availability may lend itself to recommendations for weekly stay discounts, or advice for hosts who are away for a longer stretch. Such guidance may not be suitable for Listing B.
To capture this difference, we introduce “Streakiness”. In the example above, Listing A had 1 long streak of availability which was interrupted on night 16, while Listing B had 8 short streaks of availability, each lasting 2 nights before a 2-night break.
We define a streak as a consecutive sequence of availability with a minimum of 2 consecutive nights, followed by a subsequent period of at least 2 consecutive nights of unavailability, as described in the diagram below. Note that we initially considered using a single night of availability/unavailability as a threshold but found it to be a less reliable signal of the consistency that streakiness aims to measure.
This leads us to the corresponding Streakiness feature, computed as the ratio of Streaks divided by the number of Nights Available (computed in the previous section). At this point, we now have two relatively orthogonal features for our analysis: availability rate and streakiness.
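The streak definition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Airbnb's implementation: the calendar is a list of booleans (True means available), the helper names are hypothetical, and a qualifying run that reaches the end of the window is counted as a streak even though no break follows it. The post does not specify that edge case.

```python
from itertools import groupby


def count_streaks(calendar, min_on=2, min_off=2):
    """Count runs of >= min_on available nights followed by >= min_off unavailable nights."""
    # Collapse the calendar into runs: [(is_available, run_length), ...]
    runs = [(available, sum(1 for _ in group)) for available, group in groupby(calendar)]
    streaks = 0
    for i, (available, length) in enumerate(runs):
        if not available or length < min_on:
            continue
        is_last_run = i == len(runs) - 1
        # Assumption: a run that ends with the window still counts as a streak.
        if is_last_run or runs[i + 1][1] >= min_off:
            streaks += 1
    return streaks


def streakiness(calendar):
    """Streaks divided by Nights Available; 0.0 when the listing was never available."""
    nights_available = sum(calendar)
    if nights_available == 0:
        return 0.0
    return count_streaks(calendar) / nights_available


listing_a = [True] * 15 + [False] * 15                     # one block, interrupted on night 16
listing_b = ([True] * 2 + [False] * 2) * 7 + [True] * 2    # 2 nights on, 2 nights off

print(count_streaks(listing_a))  # 1
print(count_streaks(listing_b))  # 8
print(streakiness(listing_b))    # 0.5
```

Listing A's single streak over 15 available nights yields a low ratio, while Listing B's many short streaks yield a high one, which is the contrast the feature is meant to surface.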
Seasonality
We found that while availability and streakiness provide a solid basis for measuring volume and consistency, they do not capture a calendar’s “compactness”, in other words, its seasonality. For example, consider Listings C and D, which both have around 15% availability and 14 streaks:
- Listing C concentrates its availability within a narrower block of time (summer season); see the first calendar below.
- Listing D distributes its availability more evenly across multiple quarters; see the second calendar below.
Seasonality plays a crucial role in Airbnb’s business, as guest demand and host availability fluctuate with changes in seasonal appeal, holidays, and local events. Given this, we propose to create a Quarters with at Least One Night of Availability feature.
Additionally, we create a Maximum Consecutive Months feature which captures streakiness at a yearly scale, highlighting the longest continuous period a listing is available. Together, these features give clearer insight into seasonal patterns.
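The two seasonality features might be computed along the following lines. This sketch makes assumptions the post does not state: quarters are calendar quarters (counted by quarter-of-year, so the result stays between 0 and 4 even when the look-back window spans two calendar years), and a month counts toward Maximum Consecutive Months if it has at least one available night. The function name and dictionary keys are hypothetical.

```python
from datetime import date, timedelta


def seasonality_features(available_dates):
    """available_dates: iterable of datetime.date on which the listing was available."""
    months = sorted({(d.year, d.month) for d in available_dates})
    # Assumption: calendar quarters, identified by quarter-of-year only.
    quarters = {(month - 1) // 3 + 1 for _, month in months}

    # Longest run of consecutive calendar months with any availability.
    longest = current = 0
    previous_index = None
    for year, month in months:
        index = year * 12 + month  # months are consecutive when indexes differ by 1
        if previous_index is not None and index == previous_index + 1:
            current += 1
        else:
            current = 1
        longest = max(longest, current)
        previous_index = index

    return {
        "quarters_with_availability": len(quarters),
        "max_consecutive_months": longest,
    }


# Listing C style: availability concentrated in the summer
summer = [date(2023, 6, 10) + timedelta(days=i) for i in range(72)]
# Listing D style: one available night in each quarter
spread = [date(2023, 1, 15), date(2023, 4, 15), date(2023, 7, 15), date(2023, 10, 15)]

print(seasonality_features(summer))
print(seasonality_features(spread))
```

The summer-only calendar touches fewer quarters but forms a longer unbroken run of months, whereas the spread-out calendar touches every quarter without ever forming a run.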
Final dataset
The final feature set consists of all listings that were listed on the platform as of a broad set of dates. For each listing, we calculate the features we’ve designed in the previous sections. Then, we take a large, random sample across these dates. Finally, we scale the numerical features to ensure they are on a comparable scale.
We can now apply a K-means clustering algorithm to identify segments, testing models with K values from 2 to 10. Using the elbow plot to find the optimal number of clusters, we select 8 clusters as the best representation of our data.
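The scaling and K-means sweep can be sketched as follows. The post does not name a library or a scaling method, so scikit-learn and `StandardScaler` are assumptions here, and the feature matrix is synthetic random data standing in for the four features. The inertia values collected per K are what an elbow plot would display.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_listings = 1000

# Synthetic stand-in for the sampled listings (not real data). Columns:
# availability rate, streakiness, quarters with availability, max consecutive months.
features = np.column_stack([
    rng.random(n_listings),
    rng.random(n_listings) * 0.5,
    rng.integers(0, 5, n_listings),
    rng.integers(0, 13, n_listings),
])

# Assumption: standardization (zero mean, unit variance) as the scaling step.
scaled = StandardScaler().fit_transform(features)

# Fit one model per candidate K and record the within-cluster sum of squares.
inertia_by_k = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    inertia_by_k[k] = model.inertia_

for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 1))
```

Plotting `inertia_by_k` against K and looking for the point where the curve flattens is the usual way to read an elbow plot; on real data the post reports that this led to 8 clusters.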
We now have our clusters, but they don’t have names yet. Our cluster naming process involves several steps:
- Checking the distribution of each feature by cluster to identify strong differences (e.g., “cluster 1 has the highest availability rate”)
- Randomly sampling listings from every cluster and visualizing their calendars
- Iterating on naming with a cross-functional internal working group
The output of this process is summarized in the table below, while the following diagram displays a “typical” calendar for each cluster.
Since we are measuring a latent attribute (underlying host behavior patterns that don’t have “ground truth” labels), there is no perfectly accurate way to validate our segmentation. However, we can use various methodologies to ensure that it “makes sense” from a business perspective, and reliably reflects real-life host behaviors.
We do so in three steps:
- A/B Testing
- Correlates of Availability Segments
- User Experience (UX) Research
A/B Testing
In an A/B test, we assessed how the different segments previously used a feature that encouraged hosts to complete “recommended actions” (e.g., letting guests book their home last-minute) so they could earn a monetary incentive.
We show the usage of the feature by each segment below. These results align with our intuition: hosts who use Airbnb for special occasions or rarely may not be interested in following recommendations, even when incentivized. Similarly, “Always On” hosts, who are already highly engaged and proactive in managing their listings, might prefer to rely on their own strategies rather than follow Airbnb’s suggestions. Hosts who fall somewhere in between, with moderate levels of engagement, might be the ideal target for incentives, as they are likely open to adjustments that could improve their performance.
(“CI” = Confidence Interval)
Correlates of Availability Segments
We also validate our clusters by checking correlations with known attributes. For instance, we confirm that “Always On” listings are more likely to be managed by professionals, or that “Short Seasonal” listings are more common in ski or beach destinations.
Additionally, we know it is common to observe an increase in the number of listings around big events. As expected, we observe a rise in “Event Motivated” listings leading up to and during major event periods, reflecting hosts’ responsiveness to increased demand.
UX Research
Finally, we know the UX Research team conducts host surveys to create qualitative personas, which we compare against our clusters to ensure they align with real-world behavior. For instance, we verify whether segments with high weekend availability match hosts who self-report preferring weekend rentals.
Now, we have to scale this segmentation to all our listings.
To achieve this, we use a decision tree algorithm. We train a model using our 4 features, with cluster labels from our K-means model as outputs. We also perform a train-test split to make sure the model accurately predicts each cluster.
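A minimal sketch of this step, assuming scikit-learn (the post does not name a library) and synthetic data in place of the four scaled features. The tree depth, the 80/20 split, and the variable names are illustrative choices, not values from the post.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((2000, 4))  # synthetic stand-in for the four scaled features

# Cluster labels from K-means become the target the tree learns to reproduce.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

# Hold out 20% of listings to check how faithfully the tree mimics the clusters.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=0, stratify=labels
)

# Assumption: a shallow tree keeps the resulting rules readable.
clf = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Per-cluster metrics (for example scikit-learn's `classification_report`) would be a natural follow-up, since the goal stated above is accurate prediction of each cluster rather than only overall accuracy.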
This new model provides a simple, interpretable set of if-else rules to classify listings into clusters. Using the decision tree structure, we translate the model’s logic into a SQL query by converting the decision tree’s “IF” conditions into “CASE WHEN” statements. This integration allows the model to be propagated in our data warehouse.
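One way such a translation could look, assuming a scikit-learn `DecisionTreeClassifier`: walk the fitted tree and emit one `WHEN` branch per leaf. The helper `tree_to_case_when`, the column names, the output alias `availability_segment`, and the toy training data are all hypothetical; the post does not show its actual query.

```python
from sklearn.tree import DecisionTreeClassifier


def tree_to_case_when(clf, feature_names):
    """Render a fitted decision tree as a SQL CASE expression (illustrative helper)."""
    tree = clf.tree_
    branches = []

    def walk(node, conditions):
        if tree.children_left[node] == -1:  # -1 marks a leaf in scikit-learn
            label = clf.classes_[tree.value[node][0].argmax()]
            predicate = " AND ".join(conditions) or "TRUE"
            branches.append(f"  WHEN {predicate} THEN '{label}'")
            return
        name = feature_names[tree.feature[node]]
        threshold = float(tree.threshold[node])
        # scikit-learn sends a row left when feature <= threshold, right otherwise.
        walk(tree.children_left[node], conditions + [f"{name} <= {threshold!r}"])
        walk(tree.children_right[node], conditions + [f"{name} > {threshold!r}"])

    walk(0, [])
    return "CASE\n" + "\n".join(branches) + "\nEND AS availability_segment"


# Toy data: two features and two of the segment names mentioned in the post.
X = [[0.95, 4], [0.99, 4], [0.20, 1], [0.15, 2]]
y = ["Always On", "Always On", "Short Seasonal", "Short Seasonal"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

sql = tree_to_case_when(clf, ["availability_rate", "quarters_with_availability"])
print(sql)
```

Because each leaf becomes an independent, mutually exclusive condition, the order of the `WHEN` branches does not affect the result, and the expression can be dropped into a `SELECT` over a table that holds the feature columns.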
At Airbnb, various teams leverage these segments: product teams to inform strategy and analyze heterogeneous treatment effects in A/B tests, marketing teams for targeted messaging, and UX research teams for insights into hosts’ motivations.
For instance, we revealed an opportunity to boost Instant Book adoption among “Event Motivated” hosts, who may occasionally list their primary home and prefer manual guest screening. Adding an option for hosts to only accept guests with a certain rating could make Instant Book more appealing to them, offering a balance between host control and booking efficiency.
Initially designed for listing availability data, this segmentation methodology has also been adapted to host activity data. We developed a second segmentation focused on days with “host engagement” (e.g., adjusting prices, updating policies, revising listing descriptions) to differentiate occasional “Settings Tinkerers” from frequent “Settings Optimizers.”
This approach can be adapted to other industries where understanding temporal engagement is essential, for instance, to distinguish:
- Social Media: casual lurkers vs. active content creators
- Ridesharing: occasional drivers throughout peak demand vs. full-time drivers
- Streaming Services: nighttime streamers vs. continuous streamers
- E-commerce: sales/holidays enthusiasts vs. year-round shoppers
This blog post was a collaborative effort, with significant contributions from Tim Abraham, the main co-author. We’d also like to acknowledge the invaluable support of team members from multiple organizations, including (but not limited to) Regina Wu, Maggie Jarley, and Peter Coles.