Optimization of real-world outdoor campaign allocations

In this paper, we investigate the outdoor campaign allocation problem (OCAP), which asks for the distribution of campaign items to billboards under a number of constraints. The problem becomes particularly challenging for a metropolitan city with a large number of billboards. We propose a genetic algorithm-based method to allocate campaign items effectively, and we compare our results with those of nonlinear integer programming and greedy approaches. Real-world data sets are collected from billboards located in İstanbul, together with the price class ratio constraints of the billboards and the budgets of the given campaigns. The methods are evaluated in terms of the efficiency of the constructed plans and the time needed to construct them. The results reveal that the genetic algorithm-based approach gives close to optimal results in the shortest scheduling time for the OCAP, and it scales linearly with increasing data sizes.


Introduction
Out-of-home advertising is one of the oldest communication tools for reaching business and consumer markets, with various advantages over other advertisement media, which are easier to ignore, zap, or click away from. Printed or digital outdoor advertisement signs are commonly used by organizations to communicate their campaigns outdoors. These can be displayed on billboards, street furniture, mobile vehicles, etc. [1].
In the present study, we particularly focus on printed billboard advertising and the allocation of advertisement campaign posters to billboards. The methods that we present are also applicable to all forms of outdoor advertising formats.
Billboard advertising is one of the oldest methods of out-of-home media. It appears in various forms such as citylight posters, megalights, electric poles, shopping mall kiosks, and bus stop posters. This form is mostly seen in city centers. For instance, bus stop posters are displayed at bus stops that are visible to people waiting for a bus. Billboards are usually located in heavy traffic areas such as high-visible highways, main arteries and intersections. Billboard ads with high visibility have a significant impact on traffic vehicles.
Repetition is an important tool in out-of-home advertising [2]; that is, the higher the number of campaign items displayed, the higher the number of customers reached. On the other hand, the display locations of campaigns are even more crucial than how frequently they are displayed. Hence, it is of great importance to allocate the advertisement items of a campaign by considering the geographic concentration and the demographic profiles of the target audience. In the manual allocation process, an operator selects the best combination of addresses to ensure that the requested campaign reaches the right audience. The following considerations are taken into account: How many people can be reached by distributing items to different product types? Which media platforms and product types should be allocated? Does the distribution of the campaign items match the profit from the assigned addresses?
Considering this information, an operator has to allocate items for at least 30 customers per day on average. Moreover, for each campaign request sent by the sales representative, the agency may request a revision to an existing allocation on average 30 times a day due to changes in location, product type, and period.
In the meantime, 80% of the incoming campaign requests are planned as tentative options, while only 20% of them turn into actual sales. Indeed, since even the status of a sale may change, a continuous revision process may be necessary until all campaign requests are finalized. This entails a great workload for the operators.
In the present study, automated campaign-billboard assignment approaches are investigated, and the proposed methods are applied and analyzed on real-world data sets. The OCAP may be addressed with linear integer programming frameworks to obtain optimal results. Greedy algorithms [5], which do not guarantee optimal solutions but give acceptable ones, or heuristic methods can also be used to solve the problem. We propose a genetic algorithm (GA)-based method that provides close to optimal solutions within the shortest allocation time. We also provide a comparison of our method with manual planning, nonlinear integer programming, and greedy approaches. Detailed analyses of these methods are presented for various real-world data sets. Our main contribution is solving the OCAP, a problem with a real-world domain, with a GA-based methodology. The data set that we use consists of real customer requests from the year 2018 in İstanbul, and the proposed method can generate solutions for these requests within the shortest allocation time.
This paper is organized as follows. First, the problem is introduced and the literature on this problem is presented. Then the problem is formulated in a formal manner. This section is followed by the presentation of the methods to handle the corresponding problem: manual planning, integer programming (IP) formulation, greedy algorithm-based solution, and GA-based solution. Then an experimental evaluation is presented on the real-world data. Last, the paper is concluded with possible future directions.

Related work
Media allocations that we are interested in herein are usually represented as matching problems. The stable marriage problem [6], student-dormitory assignment problem [7,8], kidney transfer assignment problem [9], worker-staff assignment problem [10,11], hospitals/residents problem [12], and college admissions problem [6] are examples of well-known matching problems relevant to our studies for the OCAP.
An IP model combined with natural heuristics may also be used to handle multiperiod variants of the problem.
Adany et al. [13] present a solution for a media allocation problem and compare the results with those of a heuristic method and a deterministic method. In another work, a display ad allocation problem is handled by using a linear programming (LP) formulation, where online bidding algorithms are proposed [14]. Measurement of reaching the audience or audience frequency is also a key problem of outdoor advertising and marketing, and it is handled in Lichtenthal et al. [1]. Trajectories of the billboards and the campaigns to be assigned to them may also be taken into account in order to maximize the influence of an advert [15] by comparing a greedy algorithm, a partition-based framework, and a LazyProbe method. In another work, data collected from the mobile phones of users are used to construct an optimal assignment for the digital advert-billboard assignment problem [16].
The OCAP is formulated as an optimization problem based on the needs of the software company CPM. The problem most similar to it is the media planning problem, where the objective is to decide on the best advertisement media for a company. This is a complex problem considering its parameters, such as media revenue, limited area inventory, demographic characteristics, different target groups, customer preferences, and competition [5,17]. Although there are many media channels for reaching customers, similar problems with similar parameters are faced in every channel. TV advertisements are common media tools, and Cosgun et al. [18] developed a mathematical model for the so-called advertisement reservation problem, using a mixed integer LP approach to decide how customers' television advertisements are reserved. Similarly, Zhang [19] proposes a mathematical model to solve the television advertisement allocation problem using a two-step hierarchical approach.
The advertisements are allocated based on auctions or contracts, and supply/demand constraints are taken into account to determine the best solutions. Our inspiration for the OCAP representation comes from the media planning domain. The media planning problem has previously been represented as a weapon target assignment (WTA) problem, and solutions have been generated with various optimization methods [24]. This problem aims to find a feasible assignment between weapons and targets that maximizes the expected damage, and it is known to be NP-complete. It can be solved with an IP approach; however, as the problem size grows, it is better handled by a GA-based solution [25]. Although the OCAP differs from this problem, we are inspired by the methodology presented for the media planning problem in these works. Our formulation and solution target the OCAP but can be extended to solve other optimization problems as well.

Outdoor campaign allocation problem
The main objective of the present work is to allocate advertisement campaign items to the existing sets of billboards distributed and located in different parts of the city. The following definitions are needed to present our formulation to solve the OCAP.
• Billboard: A billboard is an outdoor structure where posters are displayed. Each billboard may have a different number of faces to display posters in a time-sharing manner.
• Address: An address of billboards represents an area where a group of billboards are located.
• Campaign: A campaign corresponds to an advertisement decision for a company for promotion and marketing.
• Campaign poster: A campaign poster is an illustration that is displayed on a billboard. Campaign posters are the tools to promote a campaign.
Based on these definitions, the OCAP can be formulated as the following matching (M) problem. Given a bipartite graph G(V, E) whose nodes are divided into two sides, one side involving the campaigns of each company i (c_i ∈ C) and the other side involving the billboards (bl_j ∈ B), the matching must satisfy a certain set of criteria. Each campaign c_i calls for a number (v_i) of posters (each symbolized as p_ik) to be allocated to the billboards for advertisement, all with the same unit price u_ci.
Billboards are located in groups at certain addresses (a_l ∈ A, bl_j ∈ B). Addresses may have different visibility scores (r_tm), depending on their classes (t_m ∈ T). In our particular case study for CPM, ∥T∥ = 3.
The class of an address and its score depend on its location (the higher the value, the better the class of the address).
These scores are mostly determined by an impact measure called the visibility adjusted contact (VAC) value for outdoor advertising. This value is an indicator of the coverage or reach of a campaign at the corresponding address, and it depends on the locations, orientations, and other physical characteristics of the displays (e.g., their lighting), the volume of traffic (targeting either cars or pedestrians) at that particular location, and the visibility of other surrounding items.
The matching problem asks for an allocation of campaign posters to billboards. Each poster p_ik is matched with a different billboard bl_j; however, multiple copies of a poster can be displayed at the same address. An illustration of an example graph is given in Figure 2. In this figure, the leftmost nodes are the campaigns, associated with their given numbers of posters, while the rightmost nodes are the addresses with their given numbers of billboards. Note that the numbers of posters and billboards are chosen as small values at random for simplicity.
Considering the defined variables, there are a number of constraints that need to be satisfied during the allocation process. First, each campaign has an exact copy of posters, and all of them should be distributed to the billboards in the city. Since each address is limited to display only a given number of posters, this capacity cannot be exceeded.
According to the needs of the companies, some soft constraints may exist over the distribution of the posters to the addresses. One common constraint involves obeying the given distribution ratios for address classes (d_tm). Let v_tm,i denote the total number of posters of campaign c_i assigned to the billboards of class t_m. In our particular case study, the campaign posters are distributed to addresses in even numbers of copies (as multiples of 2) to increase their visibility through repetition. Given these constraints, the main objective can be summarized as maximizing the overall profit of the company that performs the allocation, considering the unit prices u_ci (Equation 2).
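To make the objective concrete, the following is a minimal sketch of a profit computation of this form. It assumes, as one plausible reading of Equation 2, that each placement is weighted by its class score r_tm normalized by the sum of all class scores; the function and variable names are ours, not the paper's.

```python
def objective(unit_price, counts_by_class, class_scores):
    """Hypothetical sketch of the OCAP objective (in the spirit of Eq. 2).

    unit_price[i]       -- unit price u_ci of campaign i
    counts_by_class[i]  -- dict mapping class t_m -> number of posters of
                           campaign i placed on billboards of that class
    class_scores[m]     -- visibility score r_tm of class t_m

    The class weight r_tm / sum(r) lies in [0, 1], so placements on
    higher-visibility classes contribute more to the profit.
    """
    total_score = sum(class_scores.values())
    profit = 0.0
    for i, counts in counts_by_class.items():
        for m, v in counts.items():
            profit += unit_price[i] * v * (class_scores[m] / total_score)
    return profit
```

With three classes scored 1, 2, and 3, placing two copies of a campaign with unit price 10 entirely on class-3 billboards yields 10 * 2 * (3/6) = 10.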

Solution methods for the OCAP
In this section, three methods for solving the OCAP are presented. These methods comprise: (i) a nonlinear IP-based optimization method, (ii) a greedy approach, and (iii) a GA-based optimization method.

Nonlinear IP formulation
The OCAP can be solved by using a nonlinear IP formulation. The main aim is to optimize the allocation of the campaign posters to the billboards, as presented in Equation 2. We use a generalized reduced gradient (GRG) algorithm as the nonlinear IP solver. The objective function in Equation 2 depends on the distribution of the campaign posters and the billboard classes that they are assigned to: the fraction inside the summation returns a weight between 0 and 1 depending on the visibility of the location, so a high value shows that the majority of the campaign posters are assigned to billboards with a high class value (t_m). Note that the r_tm value is predefined for each billboard class.
In order to formulate the presented problem as a nonlinear IP model, the given constraints are encoded. Each address has a capacity in terms of the campaign posters to be displayed; this capacity constraint is formalized in Equation 3 for each billboard bl_j at address a_l. All posters of a campaign must be allocated considering the number of copies demanded; Equation 4 represents this constraint for each campaign c_i. Note that multiple copies of campaign posters may be assigned to multiple addresses. Each campaign poster must be distributed to a billboard at an address exactly once (Equation 5), and each distinct billboard face at an address must contain exactly one poster (Equation 6). While assigning campaign posters to the billboards, the class distribution rates for each class type (d_tm) must be satisfied for each campaign. Reaching the exact rates is not always possible, but they can at least be approximated; Equation 7 formalizes this condition.
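As an illustration, the hard constraints in Equations 3-6 can be checked on a candidate assignment roughly as follows. The data shapes (dicts keyed by poster, address, and campaign) are our own illustrative assumptions, not the paper's encoding.

```python
from collections import Counter

def satisfies_constraints(assignment, capacity, demand):
    """Sketch of the hard OCAP constraint checks (assumed data shapes).

    assignment -- dict mapping poster (i, k) -> (address l, face j)
    capacity   -- dict mapping address l -> number of billboard faces
    demand     -- dict mapping campaign i -> required number of copies v_i
    """
    # Eq. 5/6: each poster sits on exactly one face, each face holds
    # at most one poster (no duplicate (address, face) pairs).
    faces = list(assignment.values())
    if len(faces) != len(set(faces)):
        return False

    # Eq. 3: the per-address capacity must not be exceeded.
    per_address = Counter(l for (l, _) in faces)
    if any(per_address[l] > capacity.get(l, 0) for l in per_address):
        return False

    # Eq. 4: every demanded copy of every campaign is placed.
    placed = Counter(i for (i, _) in assignment)
    return all(placed[i] == v for i, v in demand.items())
```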

Greedy algorithm-based solution
Greedy approaches constitute intuitive solutions for optimization problems. A greedy solution is computed iteratively by repeating the same greedy step. The following issues are taken into consideration in the design of the greedy algorithm for the OCAP. First, campaign posters with higher unit prices should be allocated to higher class addresses. Second, to spread out the campaign promotion as widely as possible, the posters must be distributed over a wide area, particularly to the best class addresses; this holds for all campaigns. Moreover, the distribution of posters to address classes must satisfy the given target distribution rates. Although a greedy approach provides a low-cost solution, it does not guarantee an optimal one. The proposed greedy approach applies the following main steps:
1. Sorting the campaigns: In order to give priority to the campaigns with higher unit prices, the campaigns are sorted by unit price in descending order.
2. Sorting the addresses: Addresses are sorted by class in descending order.
3. Calculating distribution rates: In order to maintain the distribution rate of each class for each campaign, a list is constructed that holds the amounts required to fulfill these rates.
4. Assigning campaign posters to the billboards: Using the address list constructed in the previous step, each campaign poster is assigned to a billboard at each step of the algorithm.

GA-based solution
GAs are evolutionary optimization methods that iterate over a number of candidate solutions to find a result, without any guarantee of optimality. A population of individuals (chromosomes) is maintained to represent a set of possible solutions to the problem. Individuals are made up of genes, each of which corresponds to part of the solution. Nature-inspired genetic operators, namely selection, crossover, and mutation, are used to iterate the search process over these individuals. Selection chooses the individuals that undergo crossover and mutation. Crossover exchanges a block of genes between two individuals. Mutation randomly changes a gene's value, altering the representation of the individual.
In the proposed GA-based algorithm for solving the OCAP, each individual contains a block of genes for every campaign. Since a campaign is supposed to be displayed on many billboards, one gene is reserved per requested display; for example, if a campaign is supposed to be displayed 10 times, then 10 sequential genes are reserved for that particular campaign. Each billboard (including its faces) has a distinct ID that represents it, and each gene holds such an ID.
Example individual solutions are illustrated in Figure 3. We need to allocate 3 campaigns with different numbers of face requirements (for displaying the corresponding campaign posters): 2, 4, and 4, respectively. Assume that there are 2 addresses with different numbers of billboards, 4 and 6, in this example. A sample solution thus includes 10 genes in total. The first two genes of the individual are reserved for the first campaign, the next four genes for the second campaign, and so on. Note that the gene clusters for each campaign are shown in different colors in the figure. Each integer in a gene denotes the billboard ID at the corresponding address. For example, for the first individual in the figure, the first campaign's posters are allocated to the first address, appearing on its second and fourth billboards. The second campaign's first poster is allocated to the sixth billboard (with ID 10) at the second address.
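A minimal sketch of this encoding follows, assuming that faces are numbered 1..n and that an initial individual simply draws distinct face IDs at random; the names and layout details are ours, not the paper's.

```python
import random

def random_individual(copies_per_campaign, n_faces):
    """Sketch of the chromosome encoding: one gene block per campaign,
    one gene per poster copy, each gene holding a distinct billboard-face
    ID (IDs and block layout assumed from the Figure 3 example)."""
    total = sum(copies_per_campaign)
    assert total <= n_faces, "not enough billboard faces"
    # Draw distinct face IDs so that no face carries two posters.
    genes = random.sample(range(1, n_faces + 1), total)
    # Block boundaries let us slice out each campaign's placements.
    blocks, start = [], 0
    for c in copies_per_campaign:
        blocks.append(genes[start:start + c])
        start += c
    return genes, blocks
```

For the example in the text (campaigns needing 2, 4, and 4 faces over 10 billboards), this produces a 10-gene individual split into blocks of 2, 4, and 4 genes.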
After the initial population is constructed randomly, at each iteration of the GA, parent selection, crossover, and mutation operators are applied sequentially. Partially mapped crossover (PMX) is used as the crossover method in order to preserve the feasibility of an individual, since random exchanges may lead to infeasible solutions (e.g., a particular copy of a campaign poster can only be published on a single billboard, and a billboard can only contain a single poster). Figure 3 illustrates the PMX operation on a randomly selected pair of individuals. In this crossover method, two points on the individuals are randomly selected, and the genes between them are exchanged. In the illustration, the 5th to 7th genes are selected for the crossover operation. Since every campaign poster must be assigned to a unique billboard ID, a repair procedure may be needed to fix the infeasibility of an individual. In this example, the genes labeled with 3, 8 and 1, 2 cause violations and need to be repaired; the fixed genes are marked with an asterisk. These genes must be swapped because, after the partial exchange, the individuals become infeasible due to duplicate assignments. The genes to repair are determined with a map over the genes exchanged during the crossover operation. Swap mutation is used as the mutation operator: two genes are selected at random and interchanged. Since the swap operation only changes the order, this mutation procedure does not violate feasibility. An example mutation operation is illustrated in Figure 4, where the genes marked with asterisks denote the interchanged ones. Roulette wheel selection is applied to select the new generation from the current population.
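The PMX-with-repair and swap mutation operators described above can be sketched as follows. This is the textbook PMX repair via the segment mapping, which we assume matches the paper's repair procedure; the cut points are passed in explicitly for clarity.

```python
import random

def pmx(parent1, parent2, lo, hi):
    """Partially mapped crossover with repair (sketch). Indices lo..hi-1
    mark the exchanged segment; duplicates outside the segment are mapped
    back so every face ID stays unique within each child."""
    def make_child(p_main, p_other):
        child = list(p_main)
        child[lo:hi] = p_other[lo:hi]
        segment = p_other[lo:hi]
        mapping = dict(zip(segment, p_main[lo:hi]))
        for idx in list(range(lo)) + list(range(hi, len(child))):
            gene = child[idx]
            while gene in segment:        # repair duplicated IDs
                gene = mapping[gene]
            child[idx] = gene
        return child
    return make_child(parent1, parent2), make_child(parent2, parent1)

def swap_mutation(individual):
    """Swap mutation: interchange two random genes. A pure reordering,
    so feasibility (distinct face IDs) is preserved."""
    i, j = random.sample(range(len(individual)), 2)
    individual = list(individual)
    individual[i], individual[j] = individual[j], individual[i]
    return individual
```

Both children of `pmx` are permutations of the parents' gene sets, so no billboard face ever carries two posters after crossover.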

This overall process is summarized in Algorithm 1. The GA starts by generating the initial population randomly. Then, for each generation, two pairs of individuals are selected from the existing generation to undergo crossover. After the crossover is applied to the pairs, each individual is considered for mutation. The fitness calculation of an individual is presented in Equation 8, where f_pen is the penalty coefficient. The first part corresponds to the sum, over each campaign, of the unit prices multiplied by the class coefficients. The subtracted part is a penalty that accounts for the difference between the actual and predefined distribution rates for each campaign. Based on this computation, the fittest individual is transferred directly to the next generation (without undergoing the selection procedure), while the other individuals pass to the next generation through binary roulette wheel selection based on the fitnesses of the candidates; that is, each individual has a survival chance proportional to its fitness. The remaining spots in the new generation are filled with randomly constructed individuals.
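A hedged sketch of a fitness of the form described for Equation 8, i.e. a reward term minus f_pen times the deviation from the target class distribution rates; the exact deviation measure (absolute difference of shares) and the data shapes are our assumptions.

```python
def fitness(individual, blocks, face_class, unit_price, class_scores,
            target_rates, f_pen=6000.0):
    """Sketch of a fitness in the spirit of Equation 8 (assumed form).

    blocks      -- list of (start, end) gene ranges, one per campaign
    face_class  -- dict mapping face ID -> billboard class
    """
    total_score = sum(class_scores.values())
    value, penalty = 0.0, 0.0
    for i, (start, end) in enumerate(blocks):
        classes = [face_class[g] for g in individual[start:end]]
        n = len(classes)
        for t, score in class_scores.items():
            v_t = classes.count(t)
            # Reward: unit price weighted by the normalized class score.
            value += unit_price[i] * v_t * (score / total_score)
            # Penalty: deviation of the realized share from d_tm.
            penalty += abs(v_t / n - target_rates.get(t, 0.0))
    return value - f_pen * penalty
```

With this form, a solution that matches the target rates exactly incurs zero penalty, while softening f_pen (e.g. to 100, as in the experiments) lets higher-reward but rate-violating allocations win.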

Experiments
In this section, we present the real-world data set used in the present study and the results of the experiments.

Experimental setup
The real-world data set is directly taken from real allocation instances provided by CPM. The company has a wide product range and provides services in almost every province of Turkey. The product range includes city light posters (CLPs), billboards, electric poles, tower boards, mega boards, megalights, metro boards, poster boards, roller towers, and bus stations. The instances are collected from real campaign allocations in 2018. The company has a total of nearly 40,000 ad units. In our study, we focus on the allocation of billboards in İstanbul as a case study. There are 913 billboards in total, with a capacity of 4690 advertising faces. Of this capacity, only 3044 faces are used for the pilot application.
The largest data set, with 3044 campaign posters, spans 18 different sector types with different numbers of campaigns. The main sectors in this large data set are as follows: real estate, fair, retail, durable consumption, education, finance, accessories, IT, social, tourism, automotive, textile, food, sports centers/equipment, media, and municipality. To evaluate the scalability of the algorithms, three smaller data set groups (each including four randomly constructed data sets) of various sizes are built at random from the real data as follows: DS_1: 5 different addresses and 5 different campaigns (data set IDs 1-4); DS_2: 10 different addresses and 10 different campaigns (data set IDs 5-8); and DS_3: 20 different addresses and 10 different campaigns (data set IDs 9-12). Note that the largest data set, which includes the entire data, is denoted with data set ID 13. Figure 5 presents a statistical analysis of the data used in the experiments. Each subfigure presents the characteristics of the data for each campaign in the related data set in terms of quantity and unit price. Each bar set enumerated with a number on the x axis denotes a distinct campaign in the related data set. The left y axis represents the unit price, while the right one denotes the quantity for the corresponding campaign. Figure 5a illustrates the statistics for the data set that includes 5 addresses and 5 campaigns, Figure 5b shows the one that includes 10 addresses and 10 campaigns, and Figure 5c corresponds to the data set with 20 addresses and 10 campaigns. Note that each campaign has multiple copies of posters to be assigned to the billboards, and each address has several billboards with a number of faces. In total, 322 distinct addresses and 41 campaigns exist in our data set. Moreover, each address has a different number of billboards, and each campaign has various copies to be published.
As seen from the subfigures, each data set has a different scale considering the unit prices and quantities for the campaigns. In the following section, analyses with the proposed methods on these data sets are presented.

Manual planning process
The motivation behind our work is to automate the allocation phase, which was previously done manually by operators. In the natural course of manual allocation, the operator has to pay attention to the demanded quantities and the occupancy rates of the products installed in the field. While operating on the 5-address data set, it is not challenging to distribute the demanded number of posters homogeneously, owing to the low numbers of installed products and clients. However, since the demanded campaign poster quantities and the occupancy rates of the installed products are not taken into consideration in the first distribution stage, the constraints may be violated. Consequently, after the distribution of the campaign posters is complete, an improvement pass is made to resolve these violations.
In the 10-address data set, due to the random order of the address levels and the increasing number of campaigns, the operator needs to focus on combinations of bookings while allocating. One reason the allocations become difficult is that it is hard to place the items while simultaneously paying attention to the number of installed products at the addresses and to homogeneous distribution quantities. As a result, the allocations for 10 campaigns can be distributed according to the address classes, but the number of installed products at each address and the quantities required by other campaigns may lead to incomplete or excessive planning. Likewise, by the nature of media planning, backtracking may be required for each distributed campaign poster, and even reallocation may be necessary. This again harms both time management and allocation quality.

In the 20-address data set, more care is necessary while allocating, since the number of addresses increases. In terms of homogeneous distribution, it is even more challenging to reach an effective distribution for each customer and to track the quantities remaining at the addresses. In some cases, while matching the distribution rates of the customers, the remaining quantities at the addresses may go unnoticed and overplanning may occur. Taking all of this into account, the allocation of a campaign may result in ineffective time management.

In summary, for each data set it can be challenging to track the slots, to control the homogeneous distribution of the requested quantities for the campaigns, to assign them to the right locations, and to complete the process while considering the needs of new campaigns.

Experimental results
In this subsection, we present experimental results of the described methods on the presented data sets. The GA-based method is run 500 times for each data set with the following parameters: 500 generations, a crossover rate of 0.7, a mutation rate of 0.1, and a penalty coefficient (f_pen) of 6000. Table 1 presents the performance analysis of the methods. The columns give the data set ID, the numbers of billboards (addresses) and campaigns that the corresponding data set includes, and the results of the methods. Each row shows the results for the corresponding data set in terms of the quality of the solution (best fitness), the average deviation from the billboard class distribution rates, and the time the related method needs to produce the solution. In addition to the data sets described in Figure 5, the last data set row presents the results for a larger data set that includes 322 addresses and 41 campaigns to be assigned. The last row reports the average deviation from the expected billboard class distribution rates for each method. As can be seen from Table 1, for the small data set group, all of the methods perform similarly. Note that the manual planning method corresponds to the solution in which a human operator assigns the campaigns to the billboards. It can also be inferred from the table that the manual planning and IP methods require more time than the other methods. As the data set size grows, the GA-based solution achieves close to optimal results in terms of best fitness. In some cases, the heuristic methods (the GA-based and greedy methods) appear to find solutions better than the optimal ones found by the IP method; however, in these cases there is a small amount of violation (of the soft constraints) in the average deviations of the class distributions, which is acceptable to the company (less than 15%).
It has also been observed that the results of the greedy method have the highest deviation rates. As the numbers of campaigns and billboards grow, the manual planning and IP-based methods cannot even come up with a result. For example, the last data set row reports the results for the largest data set, with 322 billboards and 41 campaigns; with the manual planning and IP-based methods, it is not possible to generate a solution within an acceptable period of time for this data set. On the other hand, the greedy algorithm and the GA-based methods can still provide solutions. Overall, the GA-based solution generates acceptable solutions within reasonable time and is therefore more efficient than the others. Tables 2-4 present an analysis of the parameters of the GA-based solution for each data set group, by size. This analysis is based on varying crossover rates and penalty coefficients f_pen; the mutation rate is fixed to 0.1 throughout. In these tables, the keyword Total denotes the total numbers of campaign posters and billboards (counting the multiple faces of the billboards and the copies of the campaign posters). The columns of each table present results for the data sets, where each row corresponds to a distinct setting of the GA. According to Table 2, almost all of the crossover settings give similar solutions for both penalty coefficients. For the medium-sized data set group (Table 3), a crossover rate of 0.5 provides better solutions in general for f_pen = 6000. For the larger data sets (Table 4), crossover rates of 0.5 and 0.7 are sufficient for finding the best solutions when the penalty coefficient is high. Moreover, for the last and largest data set, a crossover rate of 0.7 gives the best result.
For Tables 2-4, as the penalty coefficient decreases to 100 (in other words, when the class distribution rate constraints are softened), the best fitness values increase because deviations from the class distribution rates are tolerated. For instance, the average deviation from the class distribution rate (d_tm) of the solutions increases from 0.05 (when f_pen = 6000) to 0.21 (when f_pen = 100) for the small-sized data set group, from 0.06 to 0.17 for the medium-sized data sets, and from 0.05 to 0.17 for the larger data sets. Figure 6 illustrates the evolution of the individuals in a population across the GA's generations for the largest data set (a total of 3044 campaign posters). In the plot, the x axis represents the number of generations, while the y axis presents the fitness values. The blue line denotes the fitness of the best individual found so far, while the orange line presents the average fitness of the individuals. As seen from the plot, the solution is generally reached at approximately 350 generations. Note that the average fitness of the population does not converge to the best fitness value; this is because, during the reproduction of the population with binary roulette wheel selection, only one individual of each pair (the better one) survives, and the empty spots left by the individuals that do not survive are filled with randomly constructed individuals in the new generation.

Conclusion and future work
In this paper, the OCAP is investigated and analyzed. The problem is represented with a GA formulation and tested on various sized data sets that include real-world data gathered from İstanbul. The proposed method is also compared with manual planning-based, IP-based, and greedy algorithm-based solutions. The results indicate that the GA-based method generates close to optimal solutions. Furthermore, when the data size grows, the GA-based solution can handle the problem faster within a few seconds compared to the other methods. Future work includes applying machine learning methods to consider the features of the campaigns to associate with the locations of the billboards while achieving the allocation task.