Supervised Learning Algorithms

Technology, 2025-03-20

Hmmm… Algorithms, huh!!!

As I pledged in my last article, this one is about algorithms.

Here I am, buddies.

Algorithms are the core of building machine learning models. Here I describe most of the algorithms used for supervised learning, to give you an intuitive understanding of where to use each one and where not to.

By the end of this article, you will be comfortable with these algorithms at an intuitive level.

CAVEAT: I AM NOT DESCRIBING THE MATHS BEHIND THESE ALGORITHMS, ONLY HOW THEY WORK AND WHERE TO USE THEM.

So, folks, here we go.

1. NAIVE BAYES

Naive Bayes is a family of classification algorithms based on Bayes' theorem, and it is one of the foundational algorithms to know in machine learning.

Advantages:

It handles large datasets very well: training only needs per-class counts (or simple per-class statistics) of each feature, computed in a single pass over the data, and it often generalizes well on such data.

It is applied mostly to classification problems, e.g. spam detection, spam filtering, sentiment analysis, fraud detection, recommendation engines, etc.

Disadvantages:

It is "naive": it assumes the features are independent of each other given the class, so it ignores the order in ordered data, such as word order in text learning or the sequence of prices in stock price prediction. (Still, it is often preferred for its speed and ease of use.)
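To make the spam-filtering use case concrete, here is a minimal multinomial Naive Bayes sketch in plain Python. It is an illustration under assumptions, not a reference implementation: the toy messages and the function names `train_nb` and `predict_nb` are hypothetical, and add-one (Laplace) smoothing is just one common choice. Training is nothing more than counting words per class, which is why Naive Bayes copes well with large datasets, and each message is treated as a bag of words, so word order is thrown away.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and words per class in a single pass."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()  # bag of words: word order is discarded here
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict_nb(doc, class_counts, word_counts, vocab):
    """Return the class with the highest log posterior (Laplace add-one smoothing)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            # The "naive" step: each word adds its own independent log-likelihood term
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data
docs = [
    "win money now",
    "cheap money offer",
    "win a free prize now",
    "meeting schedule for monday",
    "project report attached",
    "lunch meeting tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_nb(docs, labels)
print(predict_nb("free money prize", *model))        # spam
print(predict_nb("monday project meeting", *model))  # ham
```

Shuffling the words of a message never changes the prediction, which is the order-blindness mentioned above. In practice you would likely use a library implementation such as scikit-learn's `MultinomialNB` instead of hand-rolling this.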

2. LOGISTIC REGRESSION

Despite its name, logistic regression is not a regression algorithm but a classification algorithm. It is a linear model and one of the simplest classification algorithms.

Pros:

It is simple and interpretable.

It works best for linear data, i.e. when the classes we are trying to predict don't overlap much and are linearly separable (a straight line or hyperplane can split them).

Cons:

When the boundary between classes is non-linear, it performs poorly unless you engineer non-linear features by hand.

It can't handle complex problems on its own.
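Here is a minimal logistic regression sketch in plain Python, trained by batch gradient descent on the log-loss. The function names, learning rate, epoch count and both toy datasets are assumptions for illustration only. The second dataset is XOR, where no straight line separates the classes: with these symmetric points the gradient is exactly zero from the start, so the model stays stuck at a probability of 0.5 for every point, a small picture of the linear-boundary limitation listed in the cons.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Probability that x belongs to class 1: sigmoid of a linear score."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(xi, w, b) - yi  # derivative of log-loss w.r.t. the score
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical, linearly separable data: class 0 lower-left, class 1 upper-right
X = [[-2, -1], [-1, -2], [-2, -2], [1, 2], [2, 1], [2, 2]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(round(predict_proba([-1.5, -1.5], w, b)), round(predict_proba([1.5, 1.5], w, b)))  # 0 1

# XOR: no straight line separates the two classes
X_xor = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_xor = [0, 1, 1, 0]
w_xor, b_xor = train_logistic(X_xor, y_xor)
print([predict_proba(x, w_xor, b_xor) for x in X_xor])  # [0.5, 0.5, 0.5, 0.5]
```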

3. LINEAR REGRESSION

Linear regression is also a linear model, but it is used for regression problems (predicting a continuous value).

Advantages:

It is also simple and interpretable, and it is hard to overfit when there are many more observations than features.

It works best when the relationship between the input and output variables is linear.

Disadvantages:

It underfits the data when the relationship between input and output is nonlinear, i.e. it cannot model nonlinear data accurately.

It also can't model complex relationships.
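A minimal sketch of one-feature linear regression using the closed-form ordinary least squares formulas (slope = covariance / variance of x). The function names and data are hypothetical. The R² score reports how much of the variation in y the line explains: it is 1.0 on truly linear data, but on the U-shaped data y = x² the best straight line is flat and explains nothing, which is underfitting in its simplest form.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line (1.0 = perfect fit)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [-3, -2, -1, 0, 1, 2, 3]

# Linear relationship: y = 2x + 1
linear_ys = [2 * x + 1 for x in xs]
slope, intercept = fit_line(xs, linear_ys)
print(slope, intercept, r_squared(xs, linear_ys, slope, intercept))  # 2.0 1.0 1.0

# Nonlinear relationship: y = x^2
quad_ys = [x ** 2 for x in xs]
slope, intercept = fit_line(xs, quad_ys)
print(slope, r_squared(xs, quad_ys, slope, intercept))  # 0.0 0.0 -> the line underfits
```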

4. K-NEAREST NEIGHBORS (KNN)

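K-nearest neighbors predicts the output for a new point from the K training points closest to it: a majority vote of their labels for classification, or the average of their values for regression. Because it assumes no particular shape for the boundary, it can model non-linear as well as linear data, and it is used for both regression and classification problems.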

Advantages:

Although it is simple and interpretable, it is highly flexible and can learn complex, non-linear relationships, and there is no real training step, since it simply stores the training data.

It is used in recommender systems, like those at Netflix, Spotify, etc.

Disadvantages:

It doesn't work well as the number of observations and features grows: every prediction compares the new point against all stored points, which gets slow on large datasets, and distances become less informative when there are many features.
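A minimal k-nearest-neighbors classifier in plain Python, using Euclidean distance (`math.dist`, available since Python 3.8) and a majority vote. The choice k = 3, the function name and the "inner"/"outer" toy data are assumptions for illustration; for regression you would average the neighbors' values instead of voting. The two classes form a blob and a surrounding ring, which no straight line can separate, yet KNN handles them without any training step.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Every prediction measures the distance to ALL stored points,
    # which is why KNN gets slow as the dataset grows.
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical data: an "inner" blob surrounded by an "outer" ring (not linearly separable)
train_X = [
    (0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5),
    (3, 0), (0, 3), (-3, 0), (0, -3), (2, 2), (-2, 2), (2, -2), (-2, -2),
]
train_y = ["inner"] * 5 + ["outer"] * 8

print(knn_predict(train_X, train_y, (0.2, 0.1)))    # inner
print(knn_predict(train_X, train_y, (2.5, 0.5)))    # outer
print(knn_predict(train_X, train_y, (-2.5, -0.5)))  # outer
```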

5. SUPPORT VECTOR MACHINES (SVM)

SVMs are highly flexible algorithms that draw a separating line (in general, a hyperplane) between classes, placed so that the margin to the nearest points of each class is as wide as possible. They can be used for both regression and classification.

Advantages:

It can handle complex datasets as well.

It works for nonlinear data too, by using a kernel (for example the RBF kernel) to draw curved boundaries.

Disadvantages:

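It is sensitive to noise, e.g. overlapping classes or mislabeled points near the boundary.

It doesn't scale well to large datasets, because training time grows quickly with the number of samples.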
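A minimal linear SVM sketch in plain Python, trained by full-batch subgradient descent on the regularized hinge loss rather than with the specialized solvers that SVM libraries use. The hyperparameters `lam`, `lr` and `epochs`, the function names and the toy data are all assumptions. After training, the points whose margin y·(w·x + b) is close to 1 act as support vectors: they alone pin down the boundary. This covers only the linear case; for nonlinear data you would use a kernel, for example scikit-learn's `SVC(kernel="rbf")`.

```python
def decision(x, w, b):
    """Signed score: positive on one side of the hyperplane w.x + b = 0, negative on the other."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b))), labels y_i in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(xi, w, b) < 1:  # inside the margin or misclassified: hinge is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical, linearly separable data
X = [[2, 2], [2, 3], [3, 2], [-2, -2], [-2, -3], [-3, -2]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)

# Points with margin close to 1 behave as support vectors (the 1.1 cutoff is an arbitrary tolerance)
support_vectors = [x for x, yi in zip(X, y) if yi * decision(x, w, b) < 1.1]
print(support_vectors)  # [[2, 2], [-2, -2]]
print(1 if decision([1, 1], w, b) > 0 else -1)  # 1
```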

6. TREE-BASED METHODS

Tree-based methods are among the most effective algorithms for solving extremely complex problem domains. They work for both classification and regression problems.

There are many tree-based methods:

1. Decision trees
2. Bagging
3. Random forests
4. Boosting (Gradient Boosting, AdaBoost, XGBoost)

Advantages:

These methods are among the best choices for supervised learning on prediction problems.

They handle complex relationships well, and many implementations also handle missing data and categorical features directly.

Disadvantages:

Ensembles of many trees (bagging, random forests, boosting) are difficult to interpret, unlike a single small decision tree, and they can take a long time to train.
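A minimal sketch of the first method in that list, a single classification tree, in plain Python. It greedily picks the feature/threshold split with the lowest weighted Gini impurity and recurses; the function names, the `max_depth` default and the toy data are hypothetical. Bagging and random forests would train many such trees on bootstrap samples and combine their votes, and boosting would add trees one after another to correct earlier errors; neither is shown here. Note how a single small tree prints as readable if/else rules.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when all labels are the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Try every feature/threshold pair; return (impurity, feature, threshold) of the best one."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, f, threshold)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree recursively; a leaf is simply the majority label."""
    split = best_split(X, y)
    if depth == max_depth or gini(y) == 0.0 or split is None or split[0] >= gini(y):
        return Counter(y).most_common(1)[0][0]
    _, f, threshold = split
    left_idx = [i for i, row in enumerate(X) if row[f] <= threshold]
    right_idx = [i for i, row in enumerate(X) if row[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, max_depth),
    }


def predict_tree(node, x):
    """Follow the splits from the root down to a leaf."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


# Hypothetical data: "yes" only when both features are large
X = [[2, 3], [3, 8], [7, 2], [8, 8], [6, 7], [1, 1], [9, 6], [4, 9]]
y = ["no", "no", "no", "yes", "yes", "no", "yes", "no"]
tree = build_tree(X, y)
print(tree)
# {'feature': 0, 'threshold': 4, 'left': 'no', 'right': {'feature': 1, 'threshold': 2, 'left': 'no', 'right': 'yes'}}
print(predict_tree(tree, [7, 7]), predict_tree(tree, [2, 9]))  # yes no
```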

7. NEURAL NETWORKS

Neural networks are the state-of-the-art technique for even the most complex problems out there. They fall under deep learning, which offers the most complex, yet often the most effective, models for handling cumbersome problems and getting the best metrics. Because these methods are genuinely complex, we should first try the simpler models above before getting our hands dirty with neural networks.
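A neural network stacks simple units so that later layers can combine what earlier layers compute. The sketch below, in plain Python with hypothetical function and variable names, hard-codes the weights of a tiny 2-2-1 network instead of training them, so the result is deterministic; real networks learn their weights from data with backpropagation, usually through a framework such as PyTorch or JAX. With these weights the network computes XOR, a pattern no single straight line can separate, which is exactly where linear models such as logistic regression break down.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """One hidden layer of sigmoid units followed by a single sigmoid output unit."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)


# Hand-picked (not trained) weights:
#   hidden unit 1 ~ OR(x1, x2), hidden unit 2 ~ AND(x1, x2),
#   output ~ OR and not AND, which is XOR.
hidden_weights = [[20.0, 20.0], [20.0, 20.0]]
hidden_biases = [-10.0, -30.0]
out_weights = [20.0, -20.0]
out_bias = -10.0

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
print([round(forward(x, hidden_weights, hidden_biases, out_weights, out_bias)) for x in X])  # [0, 1, 1, 0]
```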

Phew🥱… finally the article is over, but not the learning process. I have given you a basic, intuitive understanding of these machine learning algorithms so that you can grasp them with ease. Next, it's up to you to get more adept at these topics.

I guess you picked up a few concepts about these algorithms from this article. I hold my pen here. Oops, I mean I take my hands off my keyboard😂😂.

Anyway…

Thank you. And yeah, be happy and don't worry. Just take one small step at a time and you will reach the summit in a jiffy.

Source: https://medium.com/analytics-vidhya/supervised-learning-algorithms-ad934e0b1834
