Python Probability Distribution Models

Technology, 2023-12-25

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

In a perfectly predictable world, we would have exact information about the future growth rates and cash flows of a business, which would lead to a single accurate intrinsic value of the company. In reality, things are more uncertain. While some analysts give out exact stock price targets, various inputs into financial models are actually randomly distributed variables. The resulting complexity aggravates the challenge of finding an adequate company value. While there is no longer a “single version of the truth” when inputs are random, if our inputs obey certain rules, it is possible to find a range of values that is likely to contain the true mean value of our company. In writing the following model, I drew on insights provided by NYU professor Aswath Damodaran in his paper on using probabilistic approaches to conduct company valuation [1].

I. Statistics Detour

Let’s assume that we only have four random variables as inputs to our stock valuation model: revenue CAGR, EBIT margin, the Weighted Average Cost of Capital (WACC), and the long-term growth rate of the company (let’s say Microsoft). We can further assume that each of these variables is Normally distributed around a certain mean value. This entails that the distribution of these variables follows a set of rules pertaining, for instance, to their spread.

Given these random variables, there is some “true” mean company value, which we try to determine. This true mean value is the mean of infinitely many intrinsic value forecasts based on our random input variables. As it is impossible to find this exact population mean, we need to limit ourselves to creating a single sample of values and drawing inferences about the population mean from this sample.

I.1 Sampling Distributions

In order to understand how a sample behaves relative to the overall population of company values, it makes sense to first envision a hypothetical scenario in which we know the true mean company value and its standard deviation. If we now draw repeated samples from our population of company values, the means of these samples will follow a sampling distribution, which can be approximated using the Normal distribution. The mean of this distribution is the true mean company value, and its standard deviation (the standard error) is the standard deviation of the company values divided by the square root of the number of observations in the sample. Using this standard error, we can make predictions regarding the probability of the occurrence of certain events. For instance, we know that around 68% of sample means will fall within one standard error of the population mean, around 95% will fall within two, and so on.

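The behavior of the sampling distribution can be checked with a small simulation. The sketch below uses only the standard library, and the population mean of 100, standard deviation of 20, and sample size of 25 are made-up illustration values rather than figures from the valuation model.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
POP_MEAN = 100.0
POP_SD = 20.0
SAMPLE_SIZE = 25
N_SAMPLES = 4000

rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Draw n values from the population and return their mean."""
    draws = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return sum(draws) / n


sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

# Theory: the sample means spread with sd = POP_SD / sqrt(n).
theoretical_se = POP_SD / SAMPLE_SIZE ** 0.5

# Observed spread of the simulated sample means.
grand_mean = sum(sample_means) / N_SAMPLES
observed_se = (sum((m - grand_mean) ** 2 for m in sample_means) / (N_SAMPLES - 1)) ** 0.5

# Share of sample means within one and two standard errors of the true mean.
within_1 = sum(abs(m - POP_MEAN) <= theoretical_se for m in sample_means) / N_SAMPLES
within_2 = sum(abs(m - POP_MEAN) <= 2 * theoretical_se for m in sample_means) / N_SAMPLES

print(f"theoretical SE: {theoretical_se:.2f}, observed SE: {observed_se:.2f}")
print(f"within 1 SE: {within_1:.1%}, within 2 SE: {within_2:.1%}")
```

The observed spread of the sample means should land close to the theoretical standard error, and the two shares should come out near 68% and 95%.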

I.2 Drawing Inferences

So how does this help us? If we know the standard deviation of the sampling distribution, we can generate a sample and draw inferences about the true population mean. For instance, if we take the mean of our sample and create a range that extends two standard errors from that mean to either side, we can be roughly 95% confident that the true mean value will lie in this interval. The only problem is that we do not know the population standard deviation.

It turns out that we can take the sample standard deviation as an approximation for the true population value and account for this by extending the width of our confidence interval (by using t-values). If a sufficiently large sample is drawn from the population, this increases the accuracy of our population standard deviation estimate and we can therefore use z-scores instead.

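The effect of swapping the population standard deviation for the sample standard deviation can be illustrated with a coverage experiment. This is a minimal sketch with made-up population values and a deliberately small sample of five; the critical values 1.960 (Normal) and 2.776 (Student's t with four degrees of freedom) are standard table values hard-coded to keep the example dependency-free.

```python
import random

rng = random.Random(7)

TRUE_MEAN = 50.0   # hypothetical population mean
TRUE_SD = 10.0     # hypothetical population standard deviation
N = 5              # deliberately small sample
TRIALS = 20000

Z_95 = 1.960       # two-sided 95% critical value, standard Normal
T_95_DF4 = 2.776   # two-sided 95% critical value, Student's t, N - 1 = 4 df


def mean_and_standard_error(sample):
    """Return the sample mean and its estimated standard error."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, (variance / n) ** 0.5


z_hits = 0
t_hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean, se = mean_and_standard_error(sample)
    # Count how often each interval contains the true mean.
    z_hits += abs(mean - TRUE_MEAN) <= Z_95 * se
    t_hits += abs(mean - TRUE_MEAN) <= T_95_DF4 * se

z_coverage = z_hits / TRIALS
t_coverage = t_hits / TRIALS
print(f"z-interval coverage: {z_coverage:.1%}")
print(f"t-interval coverage: {t_coverage:.1%}")
```

With so few observations, the z-based interval should cover the true mean noticeably less often than 95%, while the wider t-based interval should come close to the nominal level. With 1,000 observations, as in the simulation later in the article, the two critical values are nearly identical.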

Essentially, what all this means is that we can forecast our company value a certain number of times, which will lead to a sample from the population of company values. Then, we can create a 95% confidence interval for the true mean value to get an indication of where the true mean value (based on the four random input variables) will likely lie.

II. Implementation in Python

While the underlying theory is somewhat complicated, the implementation of the model with Python is actually rather straightforward, as is explained in the subsequent sections.

II.1 Calculating the Fair Value of a Company

Using the yahoo_fin library, it is possible to get financial data on publicly traded companies. We can use this information to derive mean values for the Normally distributed variables used in the following simulations. In each iteration of the simulation, we forecast one specific company value, based on the values of the random variables. The code I used to get this value is quite similar to the one I wrote some time ago (except that yahoo_fin freed me from having to scrape all the financial data manually). As the focus here is on drawing inferences from sample data generated by a Monte Carlo simulation, I will not go into detail regarding the code and the simplifying assumptions underlying it. All that is important is that the following block of code will (hopefully) return a somewhat accurate estimate of the intrinsic value of a company, given specific inputs for company revenue CAGR, EBIT margins, WACC, and long-term growth rates.

from yahoo_fin import stock_info as si
from matplotlib import pyplot as plt
import pandas_datareader as dr
import numpy as np
import pandas as pd

'''----// General input variables //----'''
company_ticker = 'MSFT'
market_risk_premium = 0.059
debt_return = 0.01
long_term_growth = 0.01
tax_rate = 0.3
iterations = 1000

'''----// Get financial information from yahoo finance //----'''
income_statement_df = si.get_income_statement(company_ticker)
pars_df = income_statement_df.loc[['totalRevenue', 'ebit']]
# Reverse the column order (assumed newest-first from yahoo_fin) so the oldest year comes first
input_df = pars_df.iloc[:, ::-1]

'''----// Calculate average revenue CAGR & EBIT margin //----'''
def get_cagr(past_revs):
    # Growth from the oldest to the latest of four annual revenue figures.
    # The exponent 1/4 is kept from the original (three growth periods would imply 1/3).
    CAGR = (past_revs.iloc[0,3]/past_revs.iloc[0,0])**(1/4)-1
    return(CAGR)

def get_average_margin(past_ebit):
    # Average of EBIT / revenue over all available years
    margin = 0
    margin_lst = []
    for i in range(len(past_ebit.columns)):
        margin = past_ebit.iloc[1,i]/past_ebit.iloc[0,i]
        margin_lst.append(margin)
    return(sum(margin_lst)/len(margin_lst))

mean_cagr = get_cagr(input_df)
mean_margin = get_average_margin(input_df)

'''----// Create forecast function through which random variables will flow //----'''
def get_forecast(input_df, cagr, margin, long_term_growth):
    # Five explicit EBIT forecasts followed by one terminal-year EBIT
    forecast_lst = []
    for i in range(6):
        if i < 5:
            forecast_lst.append(input_df.iloc[0,3]*(1+cagr)**(i+1)*margin)
        else:
            forecast_lst.append(input_df.iloc[0,3]*(1+cagr)**(i)*(1+long_term_growth)*margin)
    return forecast_lst

'''----// Get WACC and net debt //----'''
def get_wacc(company_ticker, market_risk_premium, debt_return, tax_rate):
    # Risk-free rate: latest 10-year Treasury yield quote
    risk_free_rate_df = dr.DataReader('^TNX', 'yahoo')
    risk_free_rate = (risk_free_rate_df.iloc[len(risk_free_rate_df)-1,5])/100
    # Cost of equity via CAPM (the original hard-coded 'msft' here)
    equity_beta = si.get_quote_table(company_ticker)['Beta (5Y Monthly)']
    equity_return = risk_free_rate+equity_beta*(market_risk_premium)
    balance_sheet_df = si.get_balance_sheet(company_ticker)
    short_term_debt_series = balance_sheet_df.loc['shortLongTermDebt']
    long_term_debt_series = balance_sheet_df.loc['longTermDebt']
    cash_series = balance_sheet_df.loc['cash']
    net_debt = short_term_debt_series.iloc[0] + long_term_debt_series.iloc[0] - cash_series.iloc[0]
    # Convert a market cap string such as '1.554T' or '388.1B' into an integer
    market_cap_str = si.get_quote_table(company_ticker)['Market Cap']
    market_cap_lst = market_cap_str.split('.')
    if market_cap_str[len(market_cap_str)-1] == 'T':
        market_cap_length = len(market_cap_lst[1])-1
        market_cap_lst[1] = market_cap_lst[1].replace('T',(12-market_cap_length)*'0')
        market_cap = int(''.join(market_cap_lst))
    if market_cap_str[len(market_cap_str)-1] == 'B':
        market_cap_length = len(market_cap_lst[1])-1
        market_cap_lst[1] = market_cap_lst[1].replace('B',(9-market_cap_length)*'0')
        market_cap = int(''.join(market_cap_lst))
    company_value = market_cap + net_debt
    WACC = market_cap/company_value * equity_return + net_debt/company_value * debt_return * (1-tax_rate)
    return WACC

def get_net_debt():
    balance_sheet_df = si.get_balance_sheet(company_ticker)
    short_term_debt_series = balance_sheet_df.loc['shortLongTermDebt']
    long_term_debt_series = balance_sheet_df.loc['longTermDebt']
    cash_series = balance_sheet_df.loc['cash']
    return short_term_debt_series.iloc[0] + long_term_debt_series.iloc[0] - cash_series.iloc[0]

mean_wacc = get_wacc(company_ticker, market_risk_premium, debt_return, tax_rate)
net_debt = get_net_debt()

'''----// Discount EBIT figures to arrive at the PV of the firm's cash flows //----'''
def discount(forecast, discount_rate, long_term_rate):
    discount_lst = []
    for x,i in enumerate(forecast):
        if x < 5:
            discount_lst.append(i/(1+discount_rate)**(x+1))
        else:
            # Terminal value, discounted back five years
            discount_lst.append(i/(discount_rate-long_term_rate)*(1/(1+discount_rate)**5))
    return sum(discount_lst)

# Single valuation using the mean inputs (the random draws are introduced in II.2)
forecast = get_forecast(input_df, mean_cagr, mean_margin, long_term_growth)
present_value = discount(forecast, mean_wacc, long_term_growth)-net_debt
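Because the code above depends on live Yahoo Finance data, the core arithmetic can also be tried offline. The following is a minimal sketch that mirrors the logic of get_forecast and discount with made-up inputs; the function names forecast_ebit and present_value, the revenue of 100, and all rates are hypothetical and chosen only for illustration.

```python
def forecast_ebit(last_revenue, cagr, margin, long_term_growth, years=5):
    """Return `years` explicit EBIT forecasts plus one terminal-year EBIT."""
    ebit = [last_revenue * (1 + cagr) ** (t + 1) * margin for t in range(years)]
    terminal = last_revenue * (1 + cagr) ** years * (1 + long_term_growth) * margin
    return ebit + [terminal]


def present_value(forecast, discount_rate, long_term_rate):
    """Discount the explicit years and add a growing-perpetuity terminal value."""
    *explicit, terminal = forecast
    years = len(explicit)
    pv = sum(cf / (1 + discount_rate) ** (t + 1) for t, cf in enumerate(explicit))
    terminal_value = terminal / (discount_rate - long_term_rate)
    return pv + terminal_value / (1 + discount_rate) ** years


# Hypothetical inputs for illustration only.
forecast = forecast_ebit(last_revenue=100.0, cagr=0.05, margin=0.30, long_term_growth=0.01)
firm_value = present_value(forecast, discount_rate=0.08, long_term_rate=0.01)
net_debt = 0.0  # assumed to be zero in this toy example
equity_value = firm_value - net_debt

print(round(equity_value, 2))
```

Note how sensitive the result is to the gap between the discount rate and the long-term growth rate: the terminal value divides by that difference, so small changes in either input move the valuation considerably.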

The idea is to forecast company values repeatedly in a loop and to store the resulting model outputs in a list that is later used to determine the sample mean and standard deviation.

II.2 Generating Normally Distributed Random Variables in Python

I arrived at the mean CAGR, EBIT margin, and WACC figures by assuming that past values of these variables were accurate predictors of future values. The long-term growth rate of the company is more difficult to determine and should be entered on a case-by-case basis. The same is true for the standard deviation of each of the four variables. Given a mean value and standard deviation for each of the four variables, it is easy to get draws from a Normal distribution with numpy. In fact, we only need one line of code after importing the library.

cagr = np.random.normal(mean_cagr, 0.01)

Doing the same for the EBIT margin, WACC, and long-term growth rates, we can use the resulting figures in the calculation of the company value. Doing this once entails drawing a single value from the population of company values. Using a loop, we can repeat the process several times (1,000 in this case) and store the resulting company values in a list.

'''----// Run simulation //----'''
hist_lst = []
for i in range(iterations):
    # Draw one value for each of the four random inputs
    cagr = np.random.normal(mean_cagr, 0.01)
    margin = np.random.normal(mean_margin, 0.005)
    long_term_rate = np.random.normal(long_term_growth, 0.001)
    discount_rate = np.random.normal(mean_wacc, 0.001)
    forecast = get_forecast(input_df, cagr, margin, long_term_rate)
    hist_lst.append(discount(forecast, discount_rate, long_term_rate)-net_debt)
hist_array = np.array(hist_lst)
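The same loop pattern can be run without any market data. The sketch below is an assumption-laden stand-in: toy_equity_value is a hypothetical helper that reproduces the forecast-and-discount arithmetic, the mean inputs are made up, and the standard library's random.gauss replaces np.random.normal so that the example has no dependencies. Only the standard deviations are taken from the loop above.

```python
import random


def toy_equity_value(cagr, margin, discount_rate, long_term_rate,
                     last_revenue=100.0, net_debt=0.0, years=5):
    """Five-year EBIT forecast plus terminal value, discounted to today."""
    value = 0.0
    for t in range(1, years + 1):
        ebit = last_revenue * (1 + cagr) ** t * margin
        value += ebit / (1 + discount_rate) ** t
    terminal_ebit = last_revenue * (1 + cagr) ** years * (1 + long_term_rate) * margin
    terminal_value = terminal_ebit / (discount_rate - long_term_rate)
    value += terminal_value / (1 + discount_rate) ** years
    return value - net_debt


rng = random.Random(2023)  # seeded for repeatability
iterations = 1000

# Hypothetical means; the standard deviations follow the article's choices.
values = []
for _ in range(iterations):
    cagr = rng.gauss(0.05, 0.01)
    margin = rng.gauss(0.30, 0.005)
    long_term_rate = rng.gauss(0.01, 0.001)
    discount_rate = rng.gauss(0.08, 0.001)
    values.append(toy_equity_value(cagr, margin, discount_rate, long_term_rate))

sample_mean = sum(values) / len(values)
print(len(values), round(min(values), 1), round(max(values), 1))
```

Seeding the generator is a design choice made here for repeatability; the original loop is unseeded and therefore yields a different sample on every run.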

Using numpy, we can easily find the mean and standard deviation of this list. Subsequently, the upper and lower bound of our 95% confidence interval can be calculated.

mean = hist_array.mean()
standard_error = hist_array.std()/(iterations**(1/2))
lower_bound = mean-1.96*standard_error
upper_bound = mean+1.96*standard_error
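The interval calculation can be wrapped in a small reusable function. This is a sketch under stated assumptions: mean_confidence_interval is a hypothetical helper, the input values are randomly generated stand-ins for hist_array, and the sample standard deviation uses n - 1 in the denominator, whereas NumPy's std() defaults to n. At 1,000 observations the difference is negligible.

```python
import random
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (lower, mean, upper) of a z-based interval for the mean."""
    n = len(values)
    mean = sum(values) / n
    standard_error = statistics.stdev(values) / n ** 0.5  # n - 1 denominator
    return mean - z * standard_error, mean, mean + z * standard_error


# Made-up equity values standing in for hist_array (mean 100, sd 5).
rng = random.Random(0)
sample_values = [rng.gauss(100.0, 5.0) for _ in range(1000)]

lower, mean, upper = mean_confidence_interval(sample_values)
print(f"95% CI for the mean: [{lower:.2f}, {upper:.2f}] around {mean:.2f}")
```

The z argument could be swapped for a t-value when the sample is small, in line with the discussion in section I.2.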

II.3 Graphing the Output

Using matplotlib, we can also display the sample data graphically. This can help understand the distribution of the sample data better and also allows us to verify the assumption of Normality underlying the inferences drawn.

plt.hist(hist_array, bins=50, align='mid', color='steelblue', edgecolor='black')
plt.title('Sample Distribution ' + company_ticker, {'fontname':'Calibri'})
plt.xlabel('Equity Value in $', {'fontname':'Calibri'})
plt.ylabel('Frequency', {'fontname':'Calibri'})
plt.show()

Figure 1: Histogram of the distribution of forecasted company values. As the data is approximately Normally distributed, we are able to compute confidence intervals as described above.
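A histogram is a visual check; a rough numeric companion is to compare the sample against the 68% and 95% rules and to look at its skewness. The sketch below is not part of the original model: normality_summary is a hypothetical helper, and both data sets are generated for illustration (one Normal, one deliberately skewed for contrast).

```python
import random
import statistics


def normality_summary(values):
    """Return skewness and the shares of values within 1 and 2 sd of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = statistics.pstdev(values)
    skewness = sum(((v - mean) / sd) ** 3 for v in values) / n
    within_1 = sum(abs(v - mean) <= sd for v in values) / n
    within_2 = sum(abs(v - mean) <= 2 * sd for v in values) / n
    return {"skewness": skewness, "within_1_sd": within_1, "within_2_sd": within_2}


rng = random.Random(11)

# Stand-in for hist_array: roughly Normal simulated equity values.
normal_like = [rng.gauss(1500.0, 60.0) for _ in range(5000)]
# A clearly skewed sample for contrast.
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

normal_summary = normality_summary(normal_like)
skewed_summary = normality_summary(skewed)
print(normal_summary)
print(skewed_summary)
```

For approximately Normal data, the skewness should be near zero and the two shares near 0.68 and 0.95. Large deviations suggest that the z-based interval deserves more caution.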

II.4 Model Performance

To get an impression of how the model was performing, I computed confidence intervals for the true mean value of different companies and subsequently compared the intervals with actual company market caps. The results look as follows.

Figure 2: Model output for AAPL (actual mkt. cap = $1,941B) and MSFT (actual mkt. cap = $1,554B)

Figure 3: Model output for WMT (actual mkt. cap = $388B) and PG (actual mkt. cap = $340B)

Figure 4: Model output for NKE (actual mkt. cap = $179B) and MRK (actual mkt. cap = $211B)

When comparing the confidence intervals to the actual market caps of the companies, it appears as if the model is somewhat off. While the program is certainly not a perfect representation of the real world, we must still take into account that the confidence intervals provide a range for the true mean value of the company and not for a single point estimate.

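The distinction between an interval for the mean and the spread of individual forecasts can be made concrete. In the following sketch, simulate is a hypothetical stand-in for the valuation loop that draws made-up equity values, and the two helper functions compare the width of the confidence interval for the mean with the width of the range containing the middle 95% of individual values.

```python
import random
import statistics

rng = random.Random(5)


def simulate(iterations, mean=1500.0, sd=60.0):
    """Stand-in for the valuation loop: draw hypothetical equity values."""
    return [rng.gauss(mean, sd) for _ in range(iterations)]


def ci_width(values, z=1.96):
    """Width of the z-based confidence interval for the mean."""
    return 2 * z * statistics.stdev(values) / len(values) ** 0.5


def central_95_width(values):
    """Width of the range holding roughly the middle 95% of individual values."""
    ordered = sorted(values)
    lo = ordered[int(0.025 * len(ordered))]
    hi = ordered[int(0.975 * len(ordered)) - 1]
    return hi - lo


results = {}
for n in (100, 1000, 10000):
    values = simulate(n)
    results[n] = (ci_width(values), central_95_width(values))
    print(n, round(results[n][0], 1), round(results[n][1], 1))
```

The confidence interval for the mean narrows as the number of iterations grows, while the spread of individual simulated values stays roughly the same. A market cap that falls outside the narrow interval for the mean can therefore still sit well inside the range of individual model outcomes.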

III. Concluding Remarks

Like the previous model I built to calculate the fair value of a company, this program also makes several simplifying assumptions. For instance, I implicitly assume that the past values of the four random input variables are adequate approximations of their population mean values going forward. However, this assumption is potentially unwarranted, and even allowing for some randomness in these input variables will not solve that issue. Despite this, I believe that incorporating randomness into the model takes it one step closer to becoming an adequate representation of real-world dynamics.

III.1 Disclaimer

The model and code are simply an exercise in applying Python programming to company valuation. Therefore, the code should obviously not be used to make investment decisions. Further, information pulled from Yahoo Finance should not be used for any commercial purposes.

III.2 Final Code

The following is the overall code needed to run the simulations and draw inferences from the resulting sample data. All that is required for it to work is Python (I used 3.7.8) and several packages, namely yahoo_fin, matplotlib, pandas-datareader, numpy, and pandas.

from yahoo_fin import stock_info as si
from matplotlib import pyplot as plt
import pandas_datareader as dr
import numpy as np
import pandas as pd

'''----// General input variables //----'''
company_ticker = 'MSFT'
market_risk_premium = 0.059
debt_return = 0.01
long_term_growth = 0.01
tax_rate = 0.3
iterations = 1000

'''----// Get financial information from yahoo finance //----'''
income_statement_df = si.get_income_statement(company_ticker)
pars_df = income_statement_df.loc[['totalRevenue', 'ebit']]
# Reverse the column order (assumed newest-first from yahoo_fin) so the oldest year comes first
input_df = pars_df.iloc[:, ::-1]

'''----// Calculate average revenue CAGR & EBIT margin //----'''
def get_cagr(past_revs):
    # The exponent 1/4 is kept from the original (three growth periods would imply 1/3)
    CAGR = (past_revs.iloc[0,3]/past_revs.iloc[0,0])**(1/4)-1
    return(CAGR)

def get_average_margin(past_ebit):
    margin = 0
    margin_lst = []
    for i in range(len(past_ebit.columns)):
        margin = past_ebit.iloc[1,i]/past_ebit.iloc[0,i]
        margin_lst.append(margin)
    return(sum(margin_lst)/len(margin_lst))

mean_cagr = get_cagr(input_df)
mean_margin = get_average_margin(input_df)

'''----// Create forecast function through which random variables will flow //----'''
def get_forecast(input_df, cagr, margin, long_term_growth):
    forecast_lst = []
    for i in range(6):
        if i < 5:
            forecast_lst.append(input_df.iloc[0,3]*(1+cagr)**(i+1)*margin)
        else:
            forecast_lst.append(input_df.iloc[0,3]*(1+cagr)**(i)*(1+long_term_growth)*margin)
    return forecast_lst

'''----// Get WACC and net debt //----'''
def get_wacc(company_ticker, market_risk_premium, debt_return, tax_rate):
    risk_free_rate_df = dr.DataReader('^TNX', 'yahoo')
    risk_free_rate = (risk_free_rate_df.iloc[len(risk_free_rate_df)-1,5])/100
    # The original hard-coded 'msft' here; use the configured ticker instead
    equity_beta = si.get_quote_table(company_ticker)['Beta (5Y Monthly)']
    equity_return = risk_free_rate+equity_beta*(market_risk_premium)
    balance_sheet_df = si.get_balance_sheet(company_ticker)
    short_term_debt_series = balance_sheet_df.loc['shortLongTermDebt']
    long_term_debt_series = balance_sheet_df.loc['longTermDebt']
    cash_series = balance_sheet_df.loc['cash']
    net_debt = short_term_debt_series.iloc[0] + long_term_debt_series.iloc[0] - cash_series.iloc[0]
    # Convert a market cap string such as '1.554T' or '388.1B' into an integer
    market_cap_str = si.get_quote_table(company_ticker)['Market Cap']
    market_cap_lst = market_cap_str.split('.')
    if market_cap_str[len(market_cap_str)-1] == 'T':
        market_cap_length = len(market_cap_lst[1])-1
        market_cap_lst[1] = market_cap_lst[1].replace('T',(12-market_cap_length)*'0')
        market_cap = int(''.join(market_cap_lst))
    if market_cap_str[len(market_cap_str)-1] == 'B':
        market_cap_length = len(market_cap_lst[1])-1
        market_cap_lst[1] = market_cap_lst[1].replace('B',(9-market_cap_length)*'0')
        market_cap = int(''.join(market_cap_lst))
    company_value = market_cap + net_debt
    WACC = market_cap/company_value * equity_return + net_debt/company_value * debt_return * (1-tax_rate)
    return WACC

def get_net_debt():
    balance_sheet_df = si.get_balance_sheet(company_ticker)
    short_term_debt_series = balance_sheet_df.loc['shortLongTermDebt']
    long_term_debt_series = balance_sheet_df.loc['longTermDebt']
    cash_series = balance_sheet_df.loc['cash']
    return short_term_debt_series.iloc[0] + long_term_debt_series.iloc[0] - cash_series.iloc[0]

mean_wacc = get_wacc(company_ticker, market_risk_premium, debt_return, tax_rate)
net_debt = get_net_debt()

'''----// Discount EBIT figures to arrive at the PV of the firm's cash flows //----'''
def discount(forecast, discount_rate, long_term_rate):
    discount_lst = []
    for x,i in enumerate(forecast):
        if x < 5:
            discount_lst.append(i/(1+discount_rate)**(x+1))
        else:
            # Terminal value, discounted back five years
            discount_lst.append(i/(discount_rate-long_term_rate)*(1/(1+discount_rate)**5))
    return sum(discount_lst)

'''----// Run simulation and plot distribution of model forecasts //----'''
hist_lst = []
for i in range(iterations):
    cagr = np.random.normal(mean_cagr, 0.01)
    margin = np.random.normal(mean_margin, 0.005)
    long_term_rate = np.random.normal(long_term_growth, 0.001)
    discount_rate = np.random.normal(mean_wacc, 0.001)
    forecast = get_forecast(input_df, cagr, margin, long_term_rate)
    hist_lst.append(discount(forecast, discount_rate, long_term_rate)-net_debt)
hist_array = np.array(hist_lst)

plt.hist(hist_array, bins=50, align='mid', color='steelblue', edgecolor='black')
plt.title('Sample Distribution ' + company_ticker, {'fontname':'Calibri'})
plt.xlabel('Equity Value in $', {'fontname':'Calibri'})
plt.ylabel('Frequency', {'fontname':'Calibri'})
plt.show()

# 95% confidence interval for the mean of the simulated equity values
mean = hist_array.mean()
standard_error = hist_array.std()/(iterations**(1/2))
lower_bound = mean-1.96*standard_error
upper_bound = mean+1.96*standard_error
print(lower_bound)
print(upper_bound)

Translated from: https://towardsdatascience.com/company-valuation-using-probabilistic-models-with-python-712e325964b7
