The Pyramid Principle Applied to Classification Algorithms
Minto's Pyramid Principle aims to help us organize and remember ideas better, with a pyramidal, hierarchical structure. If you have to answer the question "What are the different classification algorithms?", you may think of Random Forest, KNN, Naive Bayes, Logistic Regression, and so on. But what are the relationships between them, and how can we give a structured answer?
Let's apply Minto's Pyramid Principle.
There are 3 basic ways of classifying data into categories (a short code sketch follows this list):
Linear classifiers, with a linear decision boundary. More precisely, the decision boundary is a hyperplane.
Nearest Neighbors, which classifies by analyzing the k nearest observations (after calculating the distance between the new observation and all the training data).
Decision trees, which construct (hyper)rectangles to group observations by minimizing Gini impurity or entropy.
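As a minimal sketch (assuming scikit-learn and its built-in iris toy dataset, which are not part of the original article), one representative of each of the three families can be instantiated like this:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data, only to show that the three families share the same fit/predict interface.
X, y = load_iris(return_X_y=True)

classifiers = {
    "linear (hyperplane boundary)": LogisticRegression(max_iter=1000),
    "nearest neighbors (distance-based)": KNeighborsClassifier(n_neighbors=5),
    "decision tree (hyper-rectangles, Gini)": DecisionTreeClassifier(criterion="gini"),
}

for name, clf in classifiers.items():
    clf.fit(X, y)
    print(name, clf.score(X, y))  # training accuracy, just to confirm each model runs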
Constructing a hyperplane to separate the space in two: this is the principle of linear classifiers. Now, how do we construct this hyperplane? There are several different ways.
Let's first consider the binary classification case (all three classifiers below are sketched in code after the list):
Logistic Regression uses the logistic function to map the data to a probability between 0 and 1. The threshold of 0.5 gives us the decision boundary, which is a hyperplane.
SVM finds the hyperplane by maximizing the soft margin between the two classes.
Linear Discriminant Analysis assumes that the data of the two classes follow multivariate normal distributions. Then, using Bayes' theorem, it calculates the posterior probability of each class. In the end, the most probable class wins! And by looking at which class wins at each point of the space, we get a hyperplane (under one condition: the covariance matrix is assumed to be the same for all classes).
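A minimal sketch (assuming scikit-learn and a made-up blob dataset) showing that all three binary classifiers end up with a hyperplane, exposed through coef_ (the normal vector w) and intercept_ (the bias b):

from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

for name, clf in [("Logistic Regression", LogisticRegression()),
                  ("SVM (linear kernel)", SVC(kernel="linear")),
                  ("LDA", LinearDiscriminantAnalysis())]:
    clf.fit(X, y)
    # Each classifier defines the hyperplane w.x + b = 0, found in its own way.
    print(name, clf.coef_, clf.intercept_)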
Now, for multiclass classification:
For SVM, we usually use one-vs-rest, so the basic principle of the algorithm does not change. For Logistic Regression, we generalize the logistic function into the softmax function, and the classifier is then called a softmax classifier. For LDA, by construction, we can handle multiclass classification by calculating a multivariate distribution for each class. The relationship between SVM and Logistic Regression can also be explained by the difference in their loss functions: the logistic loss for Logistic Regression, and the hinge loss for SVM.
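As a small plain-numpy illustration (with made-up scores, not fitted ones), the softmax function generalizes the logistic function; with two classes it reduces to the sigmoid and its 0.5 threshold:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    e = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return e / e.sum()

# Binary case: one linear score, thresholded at 0.5.
print(sigmoid(1.2))  # probability of class 1

# Multiclass case: one linear score per class, the largest probability wins.
probs = softmax(np.array([2.0, 1.0, -0.5]))
print(probs, probs.sum(), int(np.argmax(probs)))

# With 2 classes, softmax([z, 0]) gives the same probability as sigmoid(z).
print(softmax(np.array([1.2, 0.0]))[0])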
We can also relate all these linear classifiers to KNN by considering the notion of distance each of them uses. For a new observation, the principle of KNN is to analyze its nearest neighbors; we don't actually need any model. That is why we say that KNN is an instance-based classifier.
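A minimal plain-numpy sketch of this instance-based idea (toy training points made up for illustration): there is nothing to fit, we just store the data and compute distances at prediction time.

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance between the new observation and every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]      # indices of the k nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]         # majority vote among the neighbors

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # predicts class 1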
For the others, we build a model, and in the case of linear classifiers this model is a separating hyperplane. So how do we classify a new observation? We measure the distance of the new observation from the hyperplane.
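As an illustration (plain numpy, with hypothetical weights w and bias b rather than fitted ones), the sign of w·x + b tells us on which side of the hyperplane the new observation falls, and dividing by ||w|| gives its actual distance; scikit-learn's linear classifiers expose the raw quantity w·x + b through decision_function.

import numpy as np

w = np.array([1.5, -2.0])   # hypothetical normal vector of the hyperplane
b = 0.25                    # hypothetical bias

x_new = np.array([0.8, 0.3])
score = w @ x_new + b                         # what decision_function would return
signed_distance = score / np.linalg.norm(w)   # signed distance to the hyperplane
predicted_class = int(score >= 0)             # the side of the hyperplane gives the class
print(score, signed_distance, predicted_class)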
For KNN, the distance of a new observation from the training data is the Euclidean distance.
For Linear Discriminant Analysis, the distance considered is the Mahalanobis distance of a new observation from the centers of the different classes.
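A short plain-numpy sketch of the Mahalanobis distance (class center and covariance matrix made up for illustration; LDA uses one covariance matrix shared by all classes):

import numpy as np

def mahalanobis(x, center, cov):
    diff = x - center
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

center = np.array([0.0, 0.0])              # hypothetical center of one class
cov = np.array([[2.0, 0.3], [0.3, 1.0]])   # hypothetical shared covariance matrix
x_new = np.array([1.0, 1.0])

print(mahalanobis(x_new, center, cov))        # distance used by LDA
print(float(np.linalg.norm(x_new - center)))  # Euclidean distance, for comparison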
For SVM, we consider the distance from the hyperplane constructed by maximizing the soft margin. For Logistic Regression, the distance results from the logistic function transformation.

A single decision tree is not very efficient in practice. But based on this principle of constructing trees, we can build many of them. Ensemble methods generally use decision trees as their base learners. The different ways of creating a forest (composed of trees) give us different algorithms (sketched in code after the list):
Bagging: in theory, we can bootstrap-aggregate any algorithm; in practice, it is mostly done with decision trees.
Random Forest: besides bagging, we also randomly select the variables considered at each node. This principle gives us Random Forest (which is also a trademark).
Gradient Boosting: instead of averaging all the trees, gradient boosting aggregates trees fitted on the residuals, adding them step by step.
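A minimal sketch (assuming scikit-learn and its built-in breast cancer toy dataset) comparing the three tree-based ensembles; the exact scores are not the point, only that all three build many trees from the same principle:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = {
    # BaggingClassifier uses a decision tree as its default base estimator.
    "bagging of trees": BaggingClassifier(n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())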
By construction, KNN and tree-based models are non-linear classifiers. If the data is not linearly separable, they will perform better than linear classifiers.
Now, what can the linear classifiers do about this? Several enhancements are possible.
For LDA, the covariance matrix is assumed to be the same for all classes. But if we estimate a different covariance matrix for each class, the decision boundary is no longer linear, and we get Quadratic Discriminant Analysis. And if we also assume that the variables are independent, we get (Gaussian) Naive Bayes.
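A quick sketch (assuming scikit-learn and the iris toy dataset) of the three Gaussian-based classifiers side by side, obtained by relaxing or strengthening the assumptions on the covariance matrix:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

models = {
    "LDA (shared covariance, linear boundary)": LinearDiscriminantAnalysis(),
    "QDA (one covariance per class, quadratic boundary)": QuadraticDiscriminantAnalysis(),
    "Gaussian Naive Bayes (independent variables)": GaussianNB(),
}

for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())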
For SVM, the kernel trick is used to map the data into a higher-dimensional space, where it becomes linearly separable.
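A small sketch (assuming scikit-learn) on data that is clearly not linearly separable, two concentric circles: a linear kernel fails, while the RBF kernel implicitly maps the data to a higher-dimensional space where a separating hyperplane exists.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: no hyperplane in the original 2D space can separate them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)  # kernel trick: implicit higher-dimensional mapping

print("linear kernel:", linear_svm.score(X, y))  # close to chance level
print("rbf kernel:", rbf_svm.score(X, y))        # close to 1.0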
The same kernel trick can be used for Logistic Regression as well. With Logistic Regression, we can also stack it several times to get a Neural Network. If this is not clear to you, please read Visualize How a Neural Network Works from Scratch.
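A minimal sketch of this last idea (assuming scikit-learn, on the same concentric-circles data): a small MLP with logistic activations can be read as logistic units stacked on top of each other, which makes the decision boundary non-linear.

from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

logreg = LogisticRegression().fit(X, y)
# One hidden layer of 8 logistic units feeding a logistic output layer.
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="logistic",
                    solver="lbfgs", max_iter=2000, random_state=0).fit(X, y)

print("logistic regression:", logreg.score(X, y))  # linear boundary, near chance level
print("stacked logistic units:", mlp.score(X, y))  # non-linear boundary, much better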
So finally, we have this pyramid of the main classifiers we use for classification tasks in machine learning. On a green background, we have the linear classifiers; on a blue background, the non-linear classifiers.
Do you find this pyramid useful? If you think I should mention other algorithms, please leave a comment.
Translated from: https://medium.com/towards-artificial-intelligence/the-pyramid-principle-applied-to-classification-algorithms-b8118e14f405