Data Science and Machine Learning

Technology 2022-07-31

Probability

Life often confronts us with uncertainty, whether in a roll of the dice, a stock price, or the winner of the Champions League. Suppose I have a coin and I am going to flip it. How likely is it to come up heads, tails, or even land on its edge? By instinct, we say that landing on its edge is the least likely outcome of our experiment. But how can we represent such uncertainty in numbers? This is where probability comes into play.

Probability is a mathematical tool that helps quantify the uncertainty of events, so that we know what is likely to happen and what is not. It is a measurement of how strongly we believe things about the world.

Sample Space

Whenever we think about probability, we keep a set of possible outcomes in mind. This set is called the sample space and is denoted by the capital letter S.

For example, in tossing a coin, the sample space is S = {head, tail}, as these are the only outcomes that can occur when we toss the coin. For tossing two coins, the corresponding sample space is {(head, head), (head, tail), (tail, head), (tail, tail)}, commonly written {HH, HT, TH, TT}.
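
The two-coin sample space can also be enumerated mechanically. The following is a minimal sketch in Python; the names `coin` and `two_coins` are illustrative choices, not something the text defines.

```python
from itertools import product

# One coin: the two mutually exclusive outcomes.
coin = ("H", "T")

# Sample space for tossing two coins: every ordered pair of single-coin outcomes.
two_coins = ["".join(pair) for pair in product(coin, repeat=2)]

print(two_coins)  # ['HH', 'HT', 'TH', 'TT']
```

Changing `repeat` to 3 would list the 8 outcomes for three coins in the same way.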

Conditions for a Sample Space

Elements of a sample space must be mutually exclusive, meaning that two or more elements cannot occur at the same time. We do not expect both head and tail at the same time when we toss a coin, so they qualify as elements of the sample space.

P(S) = 1, i.e., if A is the entire sample space S, then P(A) = 1: the probabilities of all possible outcomes must add up to 1. In the coin toss example, the probability of getting either head or tail is equal to 1.

The sample space shouldn't contain irrelevant information. For instance, for the trial of tossing a coin, we could use the sample space S1 = {H, T}, where H stands for heads and T for tails. Another sample space could be S2 = {H and R, H and NR, T and R, T and NR}, where R stands for rain and NR for no rain. S1 is clearly a better choice than S2, as we do not care about how the weather affects the toss of a coin.

Events

A subset of the sample space is called an event, and events are what probabilities are assigned to; the probability of an event A is denoted by P(A). If we roll a die and let S = {1,2,3,4,5,6}, then A = {1,2,3,6} and B = {4,5,6} are examples of events. In plain words, event A simply means rolling the die and getting 1, 2, 3, or 6. We say event A is True only when the experiment leads to an outcome that is in the event; otherwise it is False. For example, rolling a 4 makes event A False and event B True. Similarly, rolling a 6 makes both events A and B True.
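
Since events are just subsets of the sample space, they map naturally onto Python sets. The sketch below is only an illustration; the helper name `is_true` is a hypothetical choice.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {1, 2, 3, 6}         # event A
B = {4, 5, 6}            # event B

def is_true(event, outcome):
    """An event is True when the observed outcome is one of its elements."""
    return outcome in event

# Both events are subsets of the sample space.
print(A <= S, B <= S)

# Rolling a 4 makes A False and B True; rolling a 6 makes both True.
for outcome in (4, 6):
    print(outcome, is_true(A, outcome), is_true(B, outcome))
```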

For any event A, the probability of its occurrence (or of it being True) is denoted by P(A). The value of P(A) ranges between 0 and 1. A value closer to 0 means event A is less likely to happen, and a value closer to 1 means it is more likely to happen. P(A) = 1 means we are certain that event A will be True, while P(A) = 0 means we are certain it will be False.

Calculating Probabilities

As Long-Run Relative Frequency (counting events)

Probability is just the math of proportions. Two sets of outcomes matter when calculating those proportions. The first is the set of all possible outcomes of an experiment. The second is the set of outcomes you are interested in. Given these two sets, all we care about is the ratio of the number of outcomes we are interested in to the total number of possible outcomes.

Now, let's ask a question: what is the probability of getting at least one head when tossing two coins? The set of outcomes we are interested in, in this case, is {(heads, heads), (heads, tails), (tails, heads)}. As you can see, the set of outcomes we care about has 3 elements, and there are 4 possible pairs we could get. This means that P(at least one heads) = 3/4.
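
The same count can be reproduced in a few lines of Python. This is a sketch that assumes all four outcomes are equally likely; the helper `probability` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """Ratio of outcomes of interest to all equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

# All ordered outcomes of tossing two coins: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

# Outcomes of interest: at least one head among the two tosses.
at_least_one_head = [outcome for outcome in sample_space if "H" in outcome]

print(probability(at_least_one_head, sample_space))  # 3/4
```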

Note: It is intuitive to think of probability as the long-run relative frequency of each possible outcome. For example, saying that the probability of heads when tossing a coin is 50% means that if we toss the coin many, many times, we expect the relative frequency of heads to be very close to 0.5, even though we might get 5 heads in a row in our first 5 tosses.
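
A quick simulation makes the long-run idea concrete. The sketch below assumes a fair coin modeled with Python's pseudo-random generator; the function name and the fixed seed are illustrative choices, and the exact frequencies printed depend on that seed.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses fair coin flips and return the share that came up heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# Short runs can be far from 0.5; long runs settle close to it.
for n in (5, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```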

Probability as belief

Counting events is useful for physical objects, but it is not so great for the vast majority of real-life probability questions, such as tomorrow's weather or the winner of a presidential election, because such experiments cannot be repeated many times. In such cases, we think of probability as a measure of belief. Assigning a probability to a belief is like deciding how much you would bet on it. To calculate it, we need to establish how many times more strongly we believe in one hypothesis than in another.

For example, suppose you are chatting about football with your friend. Your friend asks if you think Ronaldo will score in the upcoming game. Looking at his form, you are pretty confident he will score. You decide to put your money where your mouth is and tell your friend: “You give me $20 if he scores, and I'll give you $100 if he doesn't.” You believe that Ronaldo not scoring is so unlikely that you'll give your friend $100 if you are wrong and only get $20 from him if you are right.

We can express your bet in terms of odds as “100 to 20”, i.e., O(H) = 5. Odds are a common way to represent beliefs as the ratio of how much you would be willing to pay if you were wrong about the outcome of an event to how much you would want to receive for being correct. We can write this as the ratio of your belief that Ronaldo scores, P(A), to your belief in your friend's hypothesis that he does not score, P(B), like so:

O(H) = P(A) / P(B) = 100 / 20 = 5

From the ratio of these two beliefs, we can see that your belief in the hypothesis that Ronaldo will score is 5 times stronger than your belief in your friend's hypothesis. We can use this fact to work out the exact probability of your hypothesis using some high school algebra:

P(A) = 5 × P(B)

As there are only two possibilities, Ronaldo either scoring or not scoring, we can write:

P(A) + P(B) = 1, so P(B) = 1 − P(A)

Substituting this into the first equation gives P(A) = 5 × (1 − P(A)), that is, 6 × P(A) = 5, and therefore P(A) = 5/6.

Now you have a nice, clearly defined value between 0 and 1 to assign as a concrete, quantitative probability to your belief. We can generalize this process of converting odds to probability using the following equation:

P(H) = O(H) / (1 + O(H))

With O(H) = 5, this gives P(H) = 5/6.
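
The conversion from odds to probability is a one-line function. The sketch below uses a hypothetical helper name, `odds_to_probability`, and exact fractions so that the Ronaldo bet comes out as 5/6 rather than a rounded decimal.

```python
from fractions import Fraction

def odds_to_probability(odds):
    """Convert odds O(H) in favor of a hypothesis into P(H) = O(H) / (1 + O(H))."""
    odds = Fraction(odds)
    return odds / (1 + odds)

# The "100 to 20" bet from the example corresponds to odds of 5.
print(odds_to_probability(Fraction(100, 20)))  # 5/6

# Even odds (1 to 1) correspond to a probability of one half.
print(odds_to_probability(1))  # 1/2
```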

You would likely take a billion to 1 bet that the sun will rise tomorrow, but you might take much lower odds for tomorrow's weather.

Mutually Exclusive Events

Two events are said to be mutually exclusive if they cannot be True at the same time. In rolling a die, the events A = {1,2,3} and B = {4,5,6} are mutually exclusive since they cannot happen at the same time, while C = {3,4,5} and D = {4,5,6} are not, as an outcome of either 4 or 5 makes both events C and D True.
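
With events represented as sets, mutual exclusivity is simply the absence of shared outcomes. This is a minimal sketch; the function name `mutually_exclusive` is an illustrative choice.

```python
def mutually_exclusive(event_a, event_b):
    """Two events are mutually exclusive when they share no outcome."""
    return event_a.isdisjoint(event_b)

A, B = {1, 2, 3}, {4, 5, 6}
C, D = {3, 4, 5}, {4, 5, 6}

print(mutually_exclusive(A, B))  # True
print(mutually_exclusive(C, D))  # False
print(sorted(C & D))             # [4, 5], the outcomes that make both C and D True
```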

Conditional Probability

Independent events are often not what we meet in real life. For example, you doing great in an internship and you being offered a job at the company are not independent events. If you do well in an internship, you are far more likely to get a job than you would be otherwise.

If A and B are two events, then the symbol P(A|B) denotes the probability that A is True given that event B is already True; it is called the conditional probability of A given B. Knowing that event B is True does not, in general, tell us for certain whether event A is True, but it can change the probability of A being True. Conditional probability shows how information can change our beliefs. P(A|B) stands for the fraction of the time that A occurs once we know that B occurs.

For example, let S = {1,2,3,4,5,6}, A = {1,2,3,5}, and B = {3,4}. Then P(A) = 2/3 and P(B) = 1/3. But if we are given that B is True, then the outcome of the experiment must be either 3 or 4, each with a probability of 50%. Therefore P(A|B) = 1/2, since there is a 50% chance that the outcome is 3, which is the only one of the two that makes event A True.
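
This calculation can be checked by counting. The sketch below assumes equally likely outcomes; the helper `conditional_probability` is a hypothetical name used only for illustration.

```python
from fractions import Fraction

def conditional_probability(a, b):
    """P(a | b) for equally likely outcomes: the share of b's outcomes that are also in a."""
    return Fraction(len(a & b), len(b))

S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 5}
B = {3, 4}

print(Fraction(len(A), len(S)))       # P(A) = 2/3
print(Fraction(len(B), len(S)))       # P(B) = 1/3
print(conditional_probability(A, B))  # P(A|B) = 1/2
```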

Similarly, suppose P(A = girl) = 1/2 and P(B = pregnant) = 1/3. Once we are given that event B is already True, P(A|B) becomes essentially certain, with a probability of 1, as there is practically no way a boy can be pregnant.

One thing to keep in mind is that P(A|B) is not equal to P(B|A). P(pregnant|girl) is much, much lower than P(girl|pregnant), for obvious reasons.

Suppose your friend rolls a die in secret and tells you that the outcome is odd (call this event A). With this information, what is the probability that the outcome is prime (call this event B)? Without any prior information, our sample space is S = {1,2,3,4,5,6}. Once we know that the outcome is odd, the set of possible outcomes shrinks to A = {1,3,5}. The prime outcomes form the event B = {2,3,5}. Here we have to find the fraction of the time that B occurs once we know that A occurs.

By the definition of probability, P(B|A) = count of outcomes we are interested in / count of possible outcomes. In our case, the count of outcomes we are interested in is the number of elements that make both events A and B True, say n(A and B), and the count of possible outcomes is the number of elements in event A, say n(A). In the example above, n(A) = 3 and n(A and B) = 2, since 3 and 5 make both A and B True, so P(B|A) = n(A and B) / n(A) = 2/3.

Dividing the numerator and the denominator of the right-hand side by the total number of elements in the sample space without prior information, say n(S), we get:

P(B|A) = (n(A and B) / n(S)) / (n(A) / n(S)) = P(A and B) / P(A)
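
Both routes to the conditional probability, counting inside the reduced sample space and dividing P(A and B) by P(A), can be compared directly. This is a sketch under the assumption of a fair die, with B taken as the prime outcomes {2, 3, 5}; the helper `p` is a hypothetical name.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}   # the outcome is odd
B = {2, 3, 5}   # the outcome is prime

def p(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

# Route 1: count within the reduced sample space A.
by_counting = Fraction(len(A & B), len(A))

# Route 2: P(B|A) = P(A and B) / P(A).
by_formula = p(A & B) / p(A)

print(by_counting, by_formula)  # 2/3 2/3
```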

Independent Events

If A and B are two independent events, then knowing that event B is True does not affect the probability of event A being True, i.e., the events are unrelated. For example, consider two events: the winner of the 2020 US presidential election and the number of claps on this Medium post. They can be considered independent events, as one of them being True has no effect on the probability of the other.

Combining probabilities with AND

If Events are mutually exclusive

For mutually exclusive events, P(A and B) = 0 by definition. For example, when tossing a single coin, the probability of getting both heads and tails is 0, as they cannot happen at the same time.

If Events are independent (Product rule of Probability)

In probability, we use AND to talk about the probability of combined events, for example the probability of a) rolling a 6 on a die AND flipping heads, or b) it raining AND a meeting being canceled.

Suppose we want to know the probability of getting heads in a coin flip AND rolling a 6 on a die. Let's imagine these events happening in sequence. When we flip the coin, we have two possible outcomes, heads and tails. For each of these, there are 6 possible outcomes of the die, as depicted in the following figure.

Fig. 1: Visualizing the possible outcomes from a coin toss and the roll of a die

From the figure above, it is clear that there are 12 possible outcomes, and only one of them is the outcome we are interested in. So, by the definition of probability, P(Head, Six) = 1/12, which is equal to P(Head) × P(Six) = 1/2 × 1/6. So rather than counting all possible outcomes, we can multiply the probabilities of the events we care about as we follow along the branches. This illustrates the product rule for combining the probabilities of independent events with AND:

P(A and B) = P(A) × P(B)
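
The product rule can be verified by listing all 12 coin-and-die outcomes. The following sketch assumes a fair coin and a fair die; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every (coin, die) pair: 2 coin outcomes times 6 die outcomes.
outcomes = list(product(("Heads", "Tails"), range(1, 7)))

# Counting route: only (Heads, 6) satisfies both conditions.
favorable = [o for o in outcomes if o == ("Heads", 6)]
p_by_counting = Fraction(len(favorable), len(outcomes))

# Product-rule route: P(Head) x P(Six).
p_by_product = Fraction(1, 2) * Fraction(1, 6)

print(len(outcomes), p_by_counting, p_by_product)  # 12 1/12 1/12
```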

If events are dependent

From the conditional probability example, applying the product rule above as if the events were independent would give:

P(male and pregnant) = P(male) × P(pregnant) = 1/2 × 1/3 = 1/6
P(female and pregnant) = P(female) × P(pregnant) = 1/2 × 1/3 = 1/6

This cannot be true, because the two probabilities come out the same. We know that, while the probability of picking a male or a female is the same, if we pick a female, the probability that she is pregnant is much, much higher than for a male. So the true probability of finding a male who is pregnant is the probability of picking a male multiplied by the probability that he is pregnant given that he is male. Mathematically, we can write this as:

P(male and pregnant) = P(male) × P(pregnant | male)

We can generalize this solution to rewrite our product rule as follows:

P(A and B) = P(A) × P(B | A)

This definition works for independent events as well, because for independent events P(B | A) = P(B).
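
To see the general product rule at work on dependent events, the sketch below reuses the die example with A as "odd" and B as "prime". The helper `p` and its `given` parameter are hypothetical names, and the outcomes are assumed equally likely.

```python
from fractions import Fraction

S = set(range(1, 7))
A = {1, 3, 5}   # the outcome is odd
B = {2, 3, 5}   # the outcome is prime

def p(event, given=S):
    """P(event | given) by counting; with the default it is the plain P(event)."""
    return Fraction(len(event & given), len(given))

p_a_and_b = p(A & B)            # direct count: 2 outcomes out of 6
general = p(A) * p(B, given=A)  # general rule: P(A) x P(B | A)
naive = p(A) * p(B)             # valid only if A and B were independent

print(p_a_and_b, general, naive)  # 1/3 1/3 1/4
```

The naive product misses the true value here because knowing the outcome is odd changes the probability that it is prime from 1/2 to 2/3.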

Combining probabilities with OR

If Events are mutually exclusive

What is the probability of getting heads or tails on a coin toss?

It is intuitive to add the probabilities of these events together: P(heads) + P(tails) = 1/2 + 1/2 = 1. We know this works because heads and tails are the only possible outcomes, they cannot happen at the same time, and the probabilities of all possible outcomes must add up to 1. Similarly, what is the probability of getting a 1 or a 6 when rolling a die? We know that the probability of rolling a 1 is 1/6, and the same is true for rolling a 6. So we can perform the same operation, adding the two probabilities, and see that the combined probability of rolling either a 1 OR a 6 is 2/6, or 1/3.
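
For mutually exclusive events, adding probabilities agrees with counting the union of the two events. A minimal sketch, assuming a fair die and illustrative variable names:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
one, six = {1}, {6}

# The events share no outcome, so their probabilities can simply be added.
p_sum = Fraction(len(one), len(S)) + Fraction(len(six), len(S))

# Counting the union directly gives the same answer.
p_union = Fraction(len(one | six), len(S))

print(one.isdisjoint(six), p_sum, p_union)  # True 1/3 1/3
```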

If Events are not mutually exclusive (Sum rule of Probability)

Let's look at the example of the probability of flipping heads or rolling a 6 on a die. We might assume that adding the probabilities of the events works in this case as well.

The problem is that these events are not mutually exclusive: both can happen at the same time. Simply adding the probabilities does not work for events that are not mutually exclusive, because doing so double-counts the outcomes where both things happen. From Figure 1 we can see that six outcomes satisfy the condition of flipping heads, namely (Heads, 1), (Heads, 2), ..., (Heads, 6), and two satisfy the condition of rolling a 6, namely (Heads, 6) and (Tails, 6). We might be tempted to say that eight outcomes represent getting either heads or a 6. However, we would be double-counting, because (Heads, 6) appears in both lists, so there are really only seven such outcomes out of 12. To get the correct probability, we must add the individual probabilities and then subtract the probability of both events occurring, which leads to the sum rule of probability:

P(A or B) = P(A) + P(B) − P(A and B)

For this example, P(heads or 6) = 1/2 + 1/6 − 1/12 = 7/12.
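
The double-counting argument can be checked by enumerating the 12 outcomes. The sketch below assumes a fair coin and die; the helper `p`, which takes a predicate over (coin, die) pairs, is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(("Heads", "Tails"), range(1, 7)))

def p(predicate):
    """Share of the 12 equally likely outcomes that satisfy the predicate."""
    return Fraction(sum(1 for o in outcomes if predicate(o)), len(outcomes))

p_heads = p(lambda o: o[0] == "Heads")                 # 6 of 12
p_six = p(lambda o: o[1] == 6)                         # 2 of 12
p_both = p(lambda o: o[0] == "Heads" and o[1] == 6)    # 1 of 12
p_either = p(lambda o: o[0] == "Heads" or o[1] == 6)   # 7 of 12

# Sum rule: subtract the overlap that plain addition counts twice.
print(p_either, p_heads + p_six - p_both)  # 7/12 7/12
```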

Translated from: https://medium.com/swlh/probability-for-machine-learning-and-data-science-cccd4f4f1df1
