How to Get Started with Data Science

    Tech, 2022-08-01

    As the name suggests, data science is all about data: processing it with scientific methods, algorithms, and so on. It draws on many concepts and theories from Statistics, Probability, Advanced Calculus, Computer Science, Information Science, and related fields. Before we move on, there are some buzzwords that people often compare with data science, such as Machine Learning, Deep Learning, and Artificial Intelligence. To tell these concepts apart, refer to this blog by towardsdatascience:

    Where is it applied?

    Data science practices are used at many big tech companies such as Netflix, Amazon, and Google. For example, when you open Netflix, you get movie or web series recommendations based on what you have watched earlier on your account; Amazon does the same for product recommendations. The healthcare sector has also benefited greatly from data science applications in medical image analysis, drug development, and more. Even the speech and face recognition systems in the mobile phones and laptops we use daily are applications of data science. Many of our daily-life problems can be solved using data science.

    Andrew McAfee

    How to get started with learning?

    It’s pretty simple. As we discussed at the start, data science involves a lot of mathematics, so first get familiar with some maths concepts such as Probability, Statistics, Linear Algebra, and Calculus.

    After that, pick a programming language. There’s always a debate between Python and R, because these are the two languages most data scientists use to build models and algorithms. I prefer Python because I started with it, and it’s a straightforward language to understand and begin with, even if you are new to programming. Make yourself entirely comfortable with the language you have chosen. If you choose Python, get familiar with libraries such as NumPy, pandas, Matplotlib, and sklearn (scikit-learn), because you will use them a lot once you start learning data science.

    Learn these topics first, before hopping onto the algorithms and concepts of data science:

    Web Scraping (Scrapy)

    Data Acquisition (BeautifulSoup)

    Data Visualization (one of the most important skills in all of data science; it can be done with pandas, Matplotlib, and many other libraries)
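
    To make the scraping and data acquisition items a little more concrete, here is a minimal sketch of pulling links out of a page. The tools named above are Scrapy and BeautifulSoup; this sketch instead uses `html.parser` from Python's standard library so that it runs without installing anything. The page content in `SAMPLE_HTML` and the names `LinkCollector` and `extract_links` are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h1>Movies</h1>
  <a href="/movie/1">First Movie</a>
  <a href="/movie/2">Second Movie</a>
  <p>No link here</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every (href, text) pair found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
for href, text in links:
    print(href, text)
```

    BeautifulSoup and Scrapy wrap this kind of parsing in a much friendlier interface, so treat the sketch as a look under the hood rather than a replacement for them.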

    After this, start learning machine learning algorithms, in the following order:

    之后,按以下顺序开始学习机器学习算法:

    1. Linear and Logistic Regression (including Locally Weighted Regression, LOWESS)

    2. K-Nearest Neighbours and clustering algorithms

    3. Naive Bayes Classifier, Gaussian Naive Bayes (try to learn this in depth)

    4. Decision Trees and Random Forest classifiers
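
    As a rough illustration of the first two entries in the list above, here is a sketch of one-feature linear regression (closed-form least squares) and a k-nearest-neighbours classifier written in plain Python. The function names `fit_line` and `knn_predict` and all of the data are made up for this example; in practice you would usually reach for sklearn once you understand what it is doing.

```python
import math
from collections import Counter


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up data: points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])

# Made-up 2-D points in two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, (4.5, 5.2), k=3)

print(slope, intercept, prediction)
```

    Note that `math.dist` needs Python 3.8 or newer.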

    Then comes Deep Learning. Follow this order:

    1. Natural Language Pre-Processing (Markov chains, TF-IDF, Bag of Words, NLTK, N-grams)

    2. Neural Networks (Perceptron and Multi-Layer Perceptron)

    3. Convolutional Neural Networks and Transfer Learning (it helps to read some CNN case studies)

    4. Recurrent Neural Networks

    5. Word Embeddings (Word2Vec, GloVe vectors)

    6. Generative Adversarial Networks

    7. Learn how to work with TensorFlow and Keras
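
    The perceptron from the list above is small enough to write by hand, which is a nice way to see what frameworks like TensorFlow and Keras automate. The sketch below trains a single perceptron on the logical AND function using the classic perceptron learning rule. The function names, the learning rate, and the number of epochs are choices made for this example only.

```python
def train_perceptron(samples, targets, epochs=20, learning_rate=1.0):
    """Perceptron learning rule for inputs with 0/1 targets."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, targets):
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            predicted = 1 if activation > 0 else 0
            # error is 0 when the prediction is right, so nothing changes.
            error = target - predicted
            weights = [
                w + learning_rate * error * x for w, x in zip(weights, inputs)
            ]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Step activation: output 1 when the weighted sum is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Truth table for logical AND, which is linearly separable.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

weights, bias = train_perceptron(and_inputs, and_targets)
outputs = [predict(weights, bias, x) for x in and_inputs]
print(outputs)
```

    A single perceptron can only learn linearly separable functions like AND; something like XOR needs the multi-layer perceptron mentioned in the same list item.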

    Once you have covered all these topics, you have come a long way. Kudos to you!

    What’s next?

    Once you are done with the learning part, you can do three things to put your skills to use and sharpen them even more.

    1. Internships

    You can look for internships on Internshala (not a promotion), or, if you are in college, you can ask professors working in this field to take you on for a research internship (this is especially beneficial for students who are interested in research or plan to pursue a Master’s).

    2. Projects

    Look for some cool projects on the Internet and start building them. There are many blogs that walk through individual projects with explanations, which will help you get started. Even better, if you have an idea of your own beyond the projects you find online, work on it and try collaborating on it with others. The benefit of collaborating on a project is that you get to learn from your partners: their skills, their mindset for approaching different problems, and so on.

    3. Hackathons and Kaggle

    Participate in hackathons, both online and offline, on platforms such as HackerRank and HackerEarth. You can also take part in competitions on Kaggle, which is one of the best websites for competing and for finding clean datasets for your projects. If you are unable to score well in the competitions, don’t lose hope. With time, you will get the hang of how to approach any problem.

    A Bit of Advice for Beginners:

    It’s essential to learn the maths behind any ML/DL algorithm. Most people don’t bother to understand it and implement these algorithms using sklearn only, which is not a good practice. Learning the maths behind an algorithm makes it much clearer what is going on inside it, and gives you a good idea of where to apply which algorithm or concept. Also, try to get more hands-on experience in data science; don’t just dive into the theory. This is one of the mistakes many people make these days. For excellent hands-on learning, refer to this book: Hands-On Machine Learning with Scikit-Learn and TensorFlow, published by O’Reilly.
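
    To show what "the maths behind the algorithm" can look like in practice, here is a sketch of logistic regression fitted by plain gradient descent, with no sklearn involved. The model is p(y=1 | x) = sigmoid(w*x + b), and each step moves w and b against the gradient of the average log-loss. The data, the learning rate, the step count, and the function names are all assumptions made for this example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, steps=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log-loss:
        #   d/dw = mean((prediction - label) * x)
        #   d/db = mean(prediction - label)
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up data: hours studied vs. passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(hours, passed)
p_low = sigmoid(w * 1.0 + b)   # estimated pass probability after 1 hour
p_high = sigmoid(w * 4.0 + b)  # estimated pass probability after 4 hours
print(round(p_low, 3), round(p_high, 3))
```

    Once the two gradient lines make sense, sklearn's `LogisticRegression` stops being a black box: it solves the same kind of problem with a more robust optimizer and regularization added.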

    When you do any project, upload it to GitHub so you can collaborate with other developers or data science enthusiasts. Collaborating with others is always a good way to keep learning, and GitHub gives you a platform for it. You can also contribute to other people’s projects on GitHub, which is often known as open-source contribution. As a beginner in data science you don’t need to get involved in open source, but it will come in handy when you work for a company; it’s optional, so do it whenever you want. If you want to know more about the wonders of the open-source community, refer to this blog:

    Lastly, if you are thinking of taking a course from any online coaching platform, don’t take it only to get certified; do it to gain knowledge and sharpen your skills.

    Wrapping Up

    I have mentioned the things that will help you get started with data science. I hope this article helped you understand what data science is and how to get started with it. Signing off: learn from your mistakes, never lose hope, and keep on learning.

    Translated from: https://medium.com/@vermashivam0606/how-to-get-started-with-data-science-ea0643c5ee68
