banner
二階堂春希

春希のブログ

山雨欲来风满楼,故攻八面以铸无双。 孤战非所望,俗安不可期。
tg_channel
telegram
twitter
github

Mathematical Modeling of Airport User Package Selection

Some time ago, an airport owner contacted me, hoping I could provide mathematical assistance to help them calculate what packages users would choose. I replied that I needed the necessary data to establish a mathematical model. However, they refused to provide me with the data, possibly because they had not collected it or found it troublesome to anonymize.

After negotiations, I established a model for them that did not require data but could be adjusted based on data, and I also provided some cost-reduction solutions; they offered me a higher fee.

Although I strongly suspected that it would take a long time for them to recoup the costs of hiring me through the profits gained from this model, I still went ahead and did it. Moreover, they allowed me to open-source part of this model.

Their airport uses PHP, but for ease of calculation, I used PyTorch. I think they might still need to find someone who knows Python to run this model.

Model Parameters#

Assuming the probability density function of the user's budget distribution is b(x1)b(x_1), the probability density function of traffic demand is t(x2)t(x_2), and the probability density function of node quality demand is q(x3)q(x_3).

Assuming the probability density function of the user's emphasis on the above three factors is a(y1,y2)a(y_1,y_2), defined as:

  • y1(0,1)y_1\in(0,1), y2(0,1)y_2\in(0,1), y1+y2<1y_1+y_2<1, otherwise a(y1,y2)=0a(y_1,y_2)=0
  • y1y_1 is the weight of the budget, y2y_2 is the weight of traffic, and 1y1y21-y_1-y_2 is the weight of quality.

Each package has three attributes: price, traffic, and quality. It is assumed that when users consider which package to buy, they will make decisions according to the Analytic Hierarchy Process. The parameters of bb, tt, and qq need parameter estimation.

Let P=(pr1,pr2,pr3,,prn)P=(p_{r1},p_{r2},p_{r3},\cdots,p_{rn})', T=(t1,t2,t3,,tn)T=(t_1,t_2,t_3,\cdots,t_n)', Q=(q1,q2,q3,,qn)Q=(q_1,q_2,q_3,\cdots,q_n)' be the product array vectors, representing the prices, traffic, and quality of the products, respectively. The product array vector is determined by the product properties and does not require parameter estimation.

import torch

# Define the price, traffic, and quality of the packages
packageNumber = 3
packagePrice = [1, 2, 3]
packageTrafficLimit = [1, 2, 3]
packageQuality = [1, 2, 3]

Quantification of Analytic Hierarchy Process#

For budget satisfaction, it can be defined as f(pr,x1)=10prx1f(p_r,x_1)=10^{-\frac{p_r}{x_1}}, where prp_r is the package price and x1x_1 is the budget.

For traffic satisfaction, it can be defined as g(t,x2)=4tx234g(t,x_2)=\sqrt{\displaystyle\frac{4t}{x_2}-\frac{3}{4}} (defined as 0 when the expression under the square root is below 0), where tt is the package traffic and x2x_2 is the package demand.

For node quality satisfaction, it can be defined as h(q,x3)=qx33h(q,x_3)=\sqrt[3]{\displaystyle\frac{q}{x_3}}, where qq is the node quality quantification index and x3x_3 is the node quality demand.

Users will choose satisfaction as follows:

y110prx1+y24tx234+y3qx33y_1\cdot10^{\displaystyle-\frac{p_r}{x_1}}+y_2\sqrt{\displaystyle\frac{4t}{x_2}-\frac{3}{4}} + y_3\sqrt[3]{\displaystyle\frac{q}{x_3}}

The maximum package

Moreover, if the satisfaction for the package is less than 0.1, then they will not choose this package.

In vector form, it can be expressed as:

SATISFY=[f(P,x1),g(T,x2),h(Q,x3)]Y\text{SATISFY} = [f(P,x_1),g(T,x_2),h(Q,x_3)]'Y
SELECT={maxSATISFYPACKAGE, SATISFY>0.1, SATISFY0.1SELECT=\begin{cases} \max_{\text{SATISFY}} \text{PACKAGE},\ \text{SATISFY}>0.1\\ \varnothing,\ \text{SATISFY}\leq 0.1 \end{cases}

The corresponding Python code is:

# Define the mapping from user parameters to package satisfaction
def user_satisfaction(vector_x, vector_y, pr, t, q):
    # Ensure X and Y are torch tensors
    vector_x = torch.tensor(vector_x, dtype=torch.float32)
    vector_y = torch.tensor(vector_y, dtype=torch.float32)
    
    term1 = vector_y[:, 0] * 10 ** (-pr / vector_x[:, 0])
    term2 = vector_y[:, 1] * torch.sqrt(4 * t / vector_x[:, 1] - 0.75)
    term3 = vector_y[:, 2] * (q / vector_x[:, 2]) ** (1 / 3)
    
    return term1 + term2 + term3

User Modeling#

User Budget Distribution#

The budget can be considered to roughly follow a Pareto distribution, assuming the initial parameter is x1Par(1,0.6)x_1\sim Par(1,0.6), that is,

b(x1)=0.6x11.6(x1>1)b(x_1) = \frac{0.6}{x_1^{1.6}} (x_1>1)

where 0.6 needs parameter estimation to obtain an accurate value.

The Pareto distribution is often used to describe unequal distributions, such as wealth or income, and the corresponding budget. In the real world, wealth distribution is often highly unequal, with a few individuals holding most of the wealth. Therefore, using a Pareto distribution to reflect user budgets should be appropriate.

Additionally, the Pareto distribution is closely related to the so-called "80/20 rule," which states that 80% of effects (such as consumption) come from 20% of causes (such as consumers). This phenomenon is common in many economic models, where a small number of consumers contribute to a large portion of consumption expenditure.

Among the two parameters of the Pareto distribution, the first parameter is the cutoff value, and when x1<1x_1<1, b(x1)=0b(x_1)=0. The cutoff value cannot be parameter estimated, but assuming it to be 1 is appropriate.

Traffic Demand Distribution#

Traffic demand can be considered to approximately follow a normal distribution, with parameters x2N(130,302)x_2\sim N(130,30^2), that is,

t(x2)=1402πe12(x212040)2t(x_2)=\frac{1}{40\sqrt{2\pi}}e^{-\displaystyle\frac{1}{2}\left(\frac{x_2-120}{40}\right)^2}

Both of these parameters require parameter estimation.

Quality Demand Distribution#

If the quantification of quality is 1101\sim10, where 1 represents a completely basic airport and 10 represents a fully IPLC multi-entry intelligent parsing airport (performance close to a game accelerator), then quality demand can also be considered to follow a Pareto distribution, with parameters x3Par(1,2)x_3\sim Par(1,2), that is,

q(x3)=2x32q(x_3)=\frac{2}{x_3^2}

This Pareto distribution also only requires parameter estimation for the second parameter.

Weight Distribution#

Note that the distribution of y1y_1, y2y_2, and y3y_3 is actually uniformly placed on an equilateral triangle. This equilateral triangle is the base of a tetrahedron. The three sides of this tetrahedron are right triangles with legs of length 1.

The three vertices of the equilateral triangle represent the cases where one weight is 1 and the other two weights are 0.

When using (y1,y2)(y_1,y_2) to represent, the three cases correspond to (0,0)(0,0), (1,0)(1,0), (0,1)(0,1). To make y1,y2,y3y_1,y_2,y_3 form a more uniform equilateral triangle, this area needs to be linearly transformed to (3+1,0)(-\sqrt{3}+1,0), (31,1)(\sqrt{3}-1,1), (31,1)(\sqrt{3}-1,-1). This linear transformation can be easily derived.

First, align (3+1,0)(-\sqrt{3}+1,0) with (0,0)(0,0), then the other two points are (3,1)(\sqrt{3},1), (3,1)(\sqrt{3},-1). Thus, the transformation matrix is:

[3311][1001]1=[3311]\begin{bmatrix} \sqrt{3} & \sqrt{3} \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^{-1}= \begin{bmatrix} \sqrt{3} & \sqrt{3} \\ 1 & -1 \end{bmatrix}

The translation vector is (1,0)(-1,0), that is,

[y1y2]=[3311][y1y2]+[10]\begin{bmatrix} y_1' \\ y_2' \end{bmatrix}= \begin{bmatrix} \sqrt{3} & \sqrt{3} \\ 1 & -1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}+ \begin{bmatrix} -1 \\ 0 \end{bmatrix}

However, regarding the emphasis on the three factors, it can be assumed that the average person does not care about node quality at all and only considers node price and traffic. Therefore, a truncated bivariate normal distribution can be chosen. The parameters before truncation are:

(y1,y2)N([1/21/2],[1/9001/9])(y_1,y_2)\sim \mathcal{N}\left( \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix}, \begin{bmatrix} 1/9 && 0 \\ 0 && 1/9 \end{bmatrix} \right)

That is, the probability density function is:

a(y1,y2)=9πe92[(x11/2)2+(x21/2)2]a(y_1,y_2)= \frac{9}{\pi}e^{\displaystyle -\frac{9}{2}\left[(x_1-1/2)^2+(x_2-1/2)^2\right]}

These two parameters should also not require parameter estimation.

from torch.distributions import Pareto, Normal
from torch.distributions.multivariate_normal import MultivariateNormal

# Define user budget distribution
user_budget_distribution = Pareto(1, 0.6)
# Define user traffic demand distribution
user_traffic_demand_distribution = Normal(130, 30)
# Define user quality demand distribution
user_quality_demand_distribution = Pareto(1, 2)
# Define user weight distribution
user_weight_mean = torch.tensor([0.5, 0.5])
user_weight_covariance = torch.tensor([[1 / 9, 0], [0, 1 / 9]])
user_weight_distribution = MultivariateNormal(user_weight_mean, user_weight_covariance)

# Generate samples
sample_size = 10000     # Number of samples
user_budget_sample = user_budget_distribution.sample((sample_size,))
user_traffic_demand_sample = user_traffic_demand_distribution.sample((sample_size,))
user_quality_demand_sample = user_quality_demand_distribution.sample((sample_size,))
user_weight_sample = user_weight_distribution.sample((sample_size,))

# Simulate user package selection
y_1 = user_weight_sample[:, 0]
y_2 = user_weight_sample[:, 1]
y_3 = 1 - y_1 - y_2
user_sample = torch.stack([user_budget_sample, user_traffic_demand_sample, user_quality_demand_sample], dim=1)
user_weight_sample = torch.stack([y_1, y_2, y_3], dim=1)
user_satisfaction_of_packages = []
for i in range(packageNumber):
    pr = packagePrice[i]
    t = packageTrafficLimit[i]
    q = packageQuality[i]
    satisfactions = user_satisfaction(user_sample, user_weight_sample, pr, t, q)
    user_satisfaction_of_packages.append(satisfactions)
    
user_satisfaction_of_packages = torch.stack(user_satisfaction_of_packages, dim=1)
max_values, max_indices = torch.max(user_satisfaction_of_packages, dim=1)
indices = torch.where(max_values < 0.1, torch.tensor(-1), max_indices)

Parameter Estimation#

In summary, the parameters that need to be estimated are:

  • The kk parameter of the user's budget Pareto distribution: kbk_b
  • The mean and variance of the traffic demand distribution: μt\mu_t, σt2\sigma^2_t
  • The kk parameter of the quality demand Pareto distribution: kqk_q

It seems that these parameters can be estimated through neural networks, but I haven't researched it, and the initial parameters are not unusable.

Minimizing Costs#

Parameter Definition#

Assuming the user's node preference is:

R=[r1r2rn]R = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}

and stipulating that i=1nri=1\sum_{i=1}^n r_i = 1.

The actual traffic used by users at each node is:

U=[u1u2un]=uRU= \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = uR

where uu is the total traffic actually used by the user.

Here it is assumed that users will not switch to less preferred nodes just because they are close to exceeding their traffic limit.

Ignoring the user capacity limits of each node, it is assumed that users will exhaust the node's traffic before overloading it due to requests. Thus, the marginal costs of each node are:

C=[c1c2cn]C=\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}

Then the cost is CU=uCRC'U=uC'R.

Here it is assumed that the marginal cost is constant, but generally, airports experience economies of scale (marginal costs decrease with usage).

If the price is prp_r, then the marginal profit is pruCRp_r-uC'R.

Thus, the average profit per package is:

rˉ=pruˉCR\bar{r} = p_r - \bar{u}C'R

Overselling Problem#

To determine the marginal cost of node traffic, it is necessary to calculate the overselling ratio, since cn=ctn/Rosc_n=c_{tn}/R_{os}, where ctnc_{tn} is the actual marginal cost of traffic, and RosR_{os} is the overselling ratio.

To ensure that availability does not decline due to excessive overselling, define RamaxRa_{max} as the overselling availability, meaning: there is a RamaxRa_{max} probability that availability will not be compromised due to overselling.

Here, the hyperparameter is defined as Ramax=1104=99.99%Ra_{max}=1-10^{-4}=99.99\%.

For packages g1,g2,,gng_1,g_2,\cdots,g_n, let the probability density function of the total traffic actually used by users be fn(u)f_{n}(u), then for a certain node, the probability density function of the traffic used is rnfn(urn)r_nf_n(ur_n).

Let the actual traffic of the node be trt_r, then to satisfy availability, it is required that:

Pr{u<tr}RamaxPr\{u<t_r\} \geq Ra_{max}

That is,

n0trrnfn(urn)durn=n0tr/rnfn(u)duRamax\sum_n\int_0^{t_r} r_nf_n(ur_n) dur_n=\sum_n\int_0^{t_r/r_n}f_n(u)du\geq Ra_{max}

For example, assuming there is only one package, and the limit of this package is 150G, for the Japan node, the multiplier is 1, and the user's preference level is 0.5, the traffic used follows a normal distribution uN(80,100)u\sim N(80,100), then trt_r should be 58.59558.595, and the overselling ratio is 2.54232.5423.

The traffic used by users is roughly the same as the user demand traffic obtained from previous parameter estimation, but there is some error. This is because users often overestimate the traffic they need. However, assuming that the actual traffic used by users equals user traffic demand may be feasible, as leaving a little redundancy in traffic is not a big problem.

Because

Ros=trnrntR_{os}=\frac{t_r}{\sum_n r_nt}

And trt_r can be solved from the equation, thus RosR_{os} can be obtained through data analysis.

Maximum Expected Profit Problem#

The maximum expected profit problem is to maximize:

nfnrnˉ=fn(pr,nunˉCR)\sum_n f_n\bar{r_n} =f_n\left( p_{r,n} - \bar{u_n}C'R\right)

(where fnf_n is the selection rate).

Since CC and RR are essentially constant quantities, and after estimating the user's demand distribution, fnf_n only depends on the attributes of the packages, and unˉ\bar{u_n} is also stable, the maximum profit problem is essentially about determining the attributes of the packages.

If it is assumed that the quality attribute of the package is constant, then the only variables are the prices and traffic of each package.

Thus, the method to calculate the maximum expected profit is:

  • Statistically obtain CRC'R
  • Determine the quality of each package
  • Determine the price or traffic of each package, setting the other as a variable.
  • Perform gradient descent on the variable to maximize the above sum
    • fnf_n can be obtained through the mathematical model with users, using the Monte Carlo method
    • prp_r is the price, which is the input variable
    • unu_n is related to user modeling.
Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.