
It is common to have many doubts when dealing with the real estate market. When looking for a house, we usually ask ourselves “How much should that house be worth?” or “Is there a neighborhood similar to the one I like?” Conversely, when selling a house, apart from the unavoidable question “At what price should I list it?”, we might ask ourselves “Why can’t I sell my apartment?” or “How can I speed up the sale of my house?”. To answer all these questions, in the idealista Data Crew team we strive to improve our products and create new ones through advanced use of data.


Today we share our experience with a project that tries to answer some of the questions faced by those with the hard task of selling a house. Selling a home is not easy. First we need an idea of its value, which is difficult to estimate without information about the prices of similar homes. Moreover, if we want to sell quickly, we have to set a relatively low price to make the listing attractive. To strike the right balance between asking price and marketing time, we need to know how long homes take to sell on average, and also how much each feature of the ad (in particular its asking price) affects the marketing time.


Many of the problems we face as data scientists are addressed with regression or classification models. Here we make an exception and apply survival analysis, a branch of statistics that models the time until an event occurs. Although the technique originated in medical research, survival analysis is now used in many areas: for instance, manufacturing companies use it to predict the lifetime of their machines, and sales departments use it to predict customer churn.
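What sets survival analysis apart from plain regression is that it handles censored observations: an ad that is still online when the data is extracted has not left the market yet, so we only know that its marketing time is at least as long as what we have observed so far. The Kaplan-Meier estimator, a standard non-parametric survival estimator, shows how such ads still contribute information. The sketch below is purely illustrative; the toy durations and the function name `kaplan_meier` are assumptions, not part of the idealista pipeline.

```python
from collections import Counter


def kaplan_meier(durations, sold):
    """Kaplan-Meier estimate of S(t) = P(ad still on the market after t days).

    durations: days each ad was observed on the market
    sold: True if the ad left the market (event), False if censored (still online)
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    events = Counter(t for t, e in zip(durations, sold) if e)
    survival = 1.0
    curve = []
    for t in sorted(events):
        # Risk set: ads still on the market at time t. An ad censored at t
        # is counted as at risk at t (the usual convention).
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - events[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: two ads are censored (still listed at 20 and 50 days).
durations = [10, 20, 20, 35, 50, 60]
sold = [True, True, False, True, False, True]
for t, s in kaplan_meier(durations, sold):
    print(t, round(s, 3))
# 10 0.833
# 20 0.667
# 35 0.444
# 60 0.0
```

Simply dropping the censored ads, or treating them as if they had sold on the day of extraction, would bias the estimated marketing time downwards; the risk-set construction avoids both mistakes.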


At idealista we have data on millions of ads for properties for sale and for rent in Spain, Italy, and Portugal. We analyze the life of each ad through all its phases, from market entry to exit. Each ad has an expected time on the market (its “life expectancy”) that depends on factors affecting its probability of being sold. Some of these factors depend on the characteristics of the property, others on market conditions. Our goal is to identify which of these factors matter in determining the likelihood of a sale. We use a standard Cox proportional hazards model, which, thanks to its simplicity and the interpretability of its results, is usually considered the workhorse of survival analysis.
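In the Cox model, the hazard of an ad (the instantaneous rate at which it leaves the market, given that it is still listed) is h(t | x) = h0(t) · exp(β·x): a baseline hazard h0(t), left unspecified and shared by all ads, scaled by a factor that depends only on the ad's features x. The coefficients β are estimated by maximizing the partial likelihood, in which h0(t) cancels out. As a minimal sketch, assuming a single hypothetical 0/1 covariate (`price_cut`), toy data, and no tied event times, a Newton-Raphson fit of the partial likelihood in plain Python could look like this; a real model would use many covariates and, most likely, an established survival library.

```python
import math


def cox_score(beta, durations, sold, x):
    """First and second derivative of the Cox partial log-likelihood
    for a single covariate, assuming no tied event times."""
    grad, hess = 0.0, 0.0
    for t_i, e_i, x_i in zip(durations, sold, x):
        if not e_i:
            continue  # censored ads contribute only through the risk sets
        # Risk set: covariates of every ad still on the market at time t_i.
        risk = [x_j for t_j, x_j in zip(durations, x) if t_j >= t_i]
        w = [math.exp(beta * x_j) for x_j in risk]
        s0 = sum(w)
        mean = sum(wj * x_j for wj, x_j in zip(w, risk)) / s0
        second = sum(wj * x_j ** 2 for wj, x_j in zip(w, risk)) / s0
        grad += x_i - mean            # observed minus risk-weighted expected covariate
        hess -= second - mean ** 2    # minus the risk-weighted variance
    return grad, hess


def fit_cox_1d(durations, sold, x, iterations=25):
    """Maximize the partial log-likelihood with Newton-Raphson steps."""
    beta = 0.0
    for _ in range(iterations):
        grad, hess = cox_score(beta, durations, sold, x)
        beta -= grad / hess  # hess < 0: the partial log-likelihood is concave
    return beta


# Hypothetical toy data: days on market, whether the ad sold (False = still
# listed, i.e. censored), and a 0/1 flag assumed to mean "asking price was cut".
durations = [5, 8, 12, 15, 20, 25, 30, 40]
sold = [True, True, True, False, True, True, True, False]
price_cut = [1, 0, 1, 1, 0, 1, 0, 0]

beta = fit_cox_1d(durations, sold, price_cut)
print(f"beta = {beta:.3f}, hazard ratio = {math.exp(beta):.2f}")
```

The fitted exp(β) is read as a hazard ratio: a value above 1 means that, in this toy data, ads with the flag set leave the market faster at every point in time, which is exactly the kind of interpretable statement that makes the Cox model attractive.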


We feed the model with information about the most recent ads published on the idealista portal. The model learns the key characteristics that make a listing attractive to the market. With this knowledge, the model estimates the expected marketing time and the sale and rental probabilities of any ad currently listed on the portal. In addition, the model makes it possible to simulate “what if?” scenarios in which, for instance, we ask how the marketing time would change if the property were renovated, the asking price reduced, and so on…
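Under the proportional hazards assumption, a “what if?” question amounts to changing the feature vector x: the survival curve of the modified ad becomes S(t | x) = S0(t) ^ exp(β·x), where S0(t) is the baseline survival curve (the curve for x = 0). A minimal sketch, assuming a made-up baseline curve and a hypothetical coefficient for the percentage cut in asking price (neither comes from idealista's model), shows how a price reduction shifts the sale probability and the median marketing time:

```python
import math

# Hypothetical baseline survival S0(t): probability that a reference ad
# (no price cut) is still on the market after t days. Made-up numbers.
BASELINE = {30: 0.85, 60: 0.65, 90: 0.50, 120: 0.38, 180: 0.22}

# Hypothetical coefficient: change in log-hazard per 1% cut in asking price.
BETA_PRICE_CUT = 0.04


def survival(t, price_cut_pct):
    """S(t | x) = S0(t) ** exp(beta * x) under proportional hazards."""
    return BASELINE[t] ** math.exp(BETA_PRICE_CUT * price_cut_pct)


def median_time(price_cut_pct):
    """First tabulated day by which at least half of such ads have left the market."""
    for t in sorted(BASELINE):
        if survival(t, price_cut_pct) <= 0.5:
            return t
    return None  # median not reached within the tabulated horizon


for cut in (0, 10, 20):
    p_sale_90 = 1 - survival(90, cut)
    print(f"price cut {cut:>2}%: P(sold within 90 days) = {p_sale_90:.2f}, "
          f"median time on market ~ {median_time(cut)} days")
```

Because the scenario only changes the exponent, no refitting is needed: the same fitted coefficients and baseline curve can score any combination of features, although the answers are only as good as the fitted model.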


What do we learn from all this? That there is life beyond regression and classification models: survival analysis is one example. Although this technique comes from the medical field, quite far from the digital and real estate sectors, it yields very interesting results. For this reason, in the idealista Data Crew team we are always open to learning from what is being done with data in other sectors, no matter how different they are from ours.