Decision Trees

    Technology, 2022-08-19

    Using a decision tree

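    #%%
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn import datasets
    import matplotlib.pyplot as plt
    # IPython magic: only valid in a Jupyter / interactive cell
    %matplotlib inline
    from sklearn import tree
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    #%%
    # Load iris as (features, labels); recent scikit-learn requires return_X_y as a keyword
    X, y = datasets.load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1024)
    # criterion="entropy" chooses splits by information gain
    clf = DecisionTreeClassifier(criterion="entropy")
    clf.fit(X_train, y_train)
    y_ = clf.predict(X_test)
    # Fraction of test samples classified correctly
    accuracy_score(y_test, y_)
    #%%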

    Steps for using sklearn:
    1. Data cleaning
    2. Feature engineering
    3. Training the model
    4. Tuning the model's parameters
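The four steps above can be sketched as a single scikit-learn pipeline. This is only an illustration under assumptions that are not in the original post: the imputer and scaler stand in for "data cleaning" and "feature engineering" (iris has no missing values, and trees do not actually need scaling), and the parameter grid is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1024, stratify=y
)

pipe = Pipeline([
    # 1. Data cleaning: fill missing values (a no-op on iris, shown for illustration)
    ("clean", SimpleImputer(strategy="median")),
    # 2. Feature engineering: placeholder transform; trees are insensitive to scaling
    ("features", StandardScaler()),
    # 3. Model to train
    ("tree", DecisionTreeClassifier(criterion="entropy", random_state=0)),
])

# 4. Parameter tuning: 5-fold cross-validated grid search (grid values are assumptions)
param_grid = {
    "tree__max_depth": [2, 3, 4, None],
    "tree__min_samples_leaf": [1, 2, 5],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, test_accuracy)
```

Keeping the preprocessing inside the pipeline means each cross-validation fold refits it on its own training portion, so nothing from the validation fold leaks into the tuning step.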

    Random Forests

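    A random forest is built from many decision trees (an ensemble method). The "random" part comes from random sampling: each tree sees a different subsample.
    In a random forest, every tree in the ensemble is built from a sample drawn with replacement from the training set (a bootstrap sample).
    In addition, when a node is split during tree construction, the best split (the one with the largest information gain) is searched for either among all input features or among a random subset of them, whose size is set by max_features.
    These sources of randomness exist to reduce the variance of the forest estimator. A single decision tree usually has high variance and tends to overfit. Averaging the predictions of many trees lets some of their errors cancel out.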
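The two sources of randomness described above can be sketched by hand with plain decision trees. This is a simplified illustration, not how RandomForestClassifier is implemented internally; the tree count, seeds, and the choice of the wine dataset are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

rng = np.random.default_rng(0)
n_trees = 50  # assumed ensemble size
n_samples = X_train.shape[0]

trees = []
for i in range(n_trees):
    # Randomness 1: bootstrap sample, drawn with replacement from the training set
    idx = rng.integers(0, n_samples, size=n_samples)
    # Randomness 2: each split considers only a random subset of sqrt(n_features) features
    t = DecisionTreeClassifier(criterion="entropy", max_features="sqrt", random_state=i)
    t.fit(X_train[idx], y_train[idx])
    trees.append(t)

# Average the per-tree class probabilities, then pick the most probable class.
# This assumes every bootstrap sample contains all three classes, which is
# overwhelmingly likely at this sample size.
proba = np.mean([t.predict_proba(X_test) for t in trees], axis=0)
y_pred = proba.argmax(axis=1)

forest_accuracy = np.mean(y_pred == y_test)
single_tree_accuracy = np.mean(trees[0].predict(X_test) == y_test)
print(single_tree_accuracy, forest_accuracy)
```

Comparing the two printed accuracies over several seeds gives a rough feel for how averaging stabilizes the predictions of individual trees.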

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier
    from sklearn import datasets
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Load the wine dataset (a dict-like Bunch); a bare name displays it in a notebook
    wine = datasets.load_wine()
    wine
    X = wine['data']
    y = wine['target']
    X.shape
    # No random_state is set, so the split and the score vary between runs
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)
    y_ = clf.predict(X_test)
    # Fraction of test samples classified correctly
    accuracy_score(y_test, y_)