Classification Based on Neural Networks
Every day we hear many different sounds; they are part of our lives. Humans can tell sounds apart easily, but how cool would it be if a computer could also classify sounds into categories?
In this blog post, we'll learn techniques for classifying urban sounds into categories using machine learning with neural networks. The dataset comes from an Analytics Vidhya competition called Urban Sound. It contains 8732 labelled sound excerpts of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music.
I will use the Python librosa library to extract numerical features from the audio clips and then use those features to train a neural network model.
First, let us import all the required libraries.

    import IPython.display as ipd
    import os
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import librosa
    from tqdm import tqdm
    from sklearn.preprocessing import StandardScaler
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Activation
    from keras.optimizers import Adam

The dataset is available on Google Drive; it can be downloaded from here.
The dataset contains Train and Test folders that hold the sound excerpts, plus train.csv and test.csv, which hold the label of each excerpt. I will use only the Train folder for training, validation, and testing; it contains 5435 labelled sounds.
Now let's read train.csv, which contains the labels for the sound excerpts.

    data = pd.read_csv('/content/drive/MyDrive/colab_notebook/train.csv')
    data.head()  # show the first rows of the dataset

Let's listen to a random sound from the dataset.

    ipd.Audio('/content/drive/My Drive/colab_notebook/Train/123.wav')

Now, the main step is to extract features from the dataset. For this, I will use the librosa library, which works well with audio files.
Using librosa, I will extract four features from each audio file: Mel-frequency cepstral coefficients (MFCCs), tonnetz (tonal centroid features), a mel-scaled spectrogram, and a chromagram computed from the waveform.
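Each of these librosa features comes back as a 2-D array of shape (n_coefficients, n_frames), and clips of different lengths produce different frame counts. A common way to get one fixed-length vector per clip is to average over the time axis. The sketch below illustrates this with random stand-in arrays instead of real audio; the shapes and the helper name `summarize` are assumptions made for illustration only.

```python
import numpy as np

def summarize(feature_matrix):
    """Average a (n_coefficients, n_frames) matrix over time.

    Transposing to (n_frames, n_coefficients) and taking the mean over
    axis 0 yields one value per coefficient, whatever the clip length.
    """
    return np.mean(feature_matrix.T, axis=0)

rng = np.random.default_rng(0)
short_clip = rng.normal(size=(40, 87))    # stand-in: 40 coefficients, 87 frames
long_clip = rng.normal(size=(40, 173))    # same coefficients, more frames

short_vec = summarize(short_clip)
long_vec = summarize(long_clip)
print(short_vec.shape, long_vec.shape)    # (40,) (40,)
```

Both clips end up as vectors of the same length, which is what a dense network with a fixed input size needs.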
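
    mfc = []   # MFCCs
    chr = []   # chromagram (note: this name shadows Python's built-in chr)
    me = []    # mel-scaled spectrogram
    ton = []   # tonnetz
    lab = []   # class labels

    for i in tqdm(range(len(data))):
        f_name = '/content/drive/My Drive/colab_notebook/Train/' + str(data.ID[i]) + '.wav'
        X, s_rate = librosa.load(f_name, res_type='kaiser_fast')

        # Each feature is averaged over time to get one fixed-length vector per clip.
        # n_mfcc=40 is set so the four features add up to the 186 values used below
        # (128 mel + 40 MFCC + 6 tonnetz + 12 chroma); librosa's default is 20.
        mf = np.mean(librosa.feature.mfcc(y=X, sr=s_rate, n_mfcc=40).T, axis=0)
        mfc.append(mf)
        l = data.Class[i]
        lab.append(l)
        try:
            t = np.mean(librosa.feature.tonnetz(
                y=librosa.effects.harmonic(X), sr=s_rate).T, axis=0)
            ton.append(t)
        except Exception:
            # A failure here leaves ton shorter than the other lists.
            print(f_name)
        m = np.mean(librosa.feature.melspectrogram(y=X, sr=s_rate).T, axis=0)
        me.append(m)
        s = np.abs(librosa.stft(X))
        c = np.mean(librosa.feature.chroma_stft(S=s, sr=s_rate).T, axis=0)
        chr.append(c)

I now have 186 features for each audio file, along with its label.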
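As a quick sanity check on the feature count, here is one breakdown that adds up to 186. It assumes librosa's defaults of 128 mel bands and 12 chroma bins, the 6 tonnetz dimensions, and 40 MFCCs. The 40 is an assumption made so that the total matches; librosa's default of 20 MFCCs would give 166 instead.

```python
# Per-clip vector lengths after averaging over time (assumed values, see above).
feature_sizes = {
    "mel_spectrogram": 128,  # librosa default n_mels
    "mfcc": 40,              # assumed; librosa's default n_mfcc is 20
    "tonnetz": 6,            # tonal centroid dimensions
    "chroma": 12,            # librosa default n_chroma
}

total = sum(feature_sizes.values())
print(total)  # 186
```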
After extracting the features from the audio files, save them, because the extraction takes a long time.

    mfcc = pd.DataFrame(mfc)
    mfcc.to_csv('/content/drive/My Drive/colab_notebook/mfc.csv', index=False)
    chrr = pd.DataFrame(chr)
    chrr.to_csv('/content/drive/My Drive/colab_notebook/chr.csv', index=False)
    mee = pd.DataFrame(me)
    mee.to_csv('/content/drive/My Drive/colab_notebook/me.csv', index=False)
    tonn = pd.DataFrame(ton)
    tonn.to_csv('/content/drive/My Drive/colab_notebook/ton.csv', index=False)
    la = pd.DataFrame(lab)
    la.to_csv('/content/drive/My Drive/colab_notebook/labels.csv', index=False)

Concatenate the features into one array so that it can be passed to the model.

    features = []
    for i in range(len(ton)):
        # One row per clip: mel spectrogram, MFCCs, tonnetz, then chromagram.
        features.append(np.concatenate((me[i], mfc[i], ton[i], chr[i]), axis=0))

Encode the labels so that the model can understand them.
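Encoding here means one-hot encoding: each class name becomes a vector with a single 1 in the position of that class. The following is a minimal pure-Python sketch of the idea on a few made-up labels; the variable names are illustrative and are not used elsewhere in this post.

```python
labels = ["siren", "dog_bark", "siren", "drilling"]   # made-up sample labels

classes = sorted(set(labels))                 # fixed, alphabetical class order
index_of = {name: i for i, name in enumerate(classes)}

# One row per label, one column per class, with a single 1.0 per row.
one_hot = [
    [1.0 if index_of[name] == col else 0.0 for col in range(len(classes))]
    for name in labels
]

for name, row in zip(labels, one_hot):
    print(name, row)
```

`pd.get_dummies` builds the same kind of table, with one column per class name, which is why the column index can later be used to map predictions back to names.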

    la = pd.get_dummies(lab, dtype=float)  # float columns so Keras receives numeric targets
    label_columns = la.columns             # class names, one per column
    target = la.to_numpy()                 # convert the labels to a NumPy array

Now normalize the features so that gradient descent can converge more quickly.
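Normalizing here means standardizing: each feature column is shifted to zero mean and scaled to unit variance. As a rough sketch of what `StandardScaler` computes, the snippet below does the same with plain NumPy on a tiny made-up matrix. It also fits the statistics on the training rows only and reuses them for a held-out row, which is a common precaution against leakage rather than what the code in this post does.

```python
import numpy as np

train = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])     # made-up training rows
held_out = np.array([[2.0, 800.0]])  # made-up validation/test row

mean = train.mean(axis=0)            # per-column mean
std = train.std(axis=0)              # per-column population std (ddof=0)

train_scaled = (train - mean) / std
held_out_scaled = (held_out - mean) / std   # reuse the training statistics

print(train_scaled.mean(axis=0))
print(train_scaled.std(axis=0))
print(held_out_scaled)
```

After scaling, both columns have comparable ranges even though the raw values differ by two orders of magnitude.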

    tran = StandardScaler()
    features_train = tran.fit_transform(features)

Now I will create the training, validation, and test datasets.
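This post splits the data by slicing the arrays in their stored order. If the rows happened to be grouped by class or by recording, a shuffled split would be a safer variation. The sketch below shows one way to do that with a fixed seed; the helper name `shuffled_split` and the sizes in the usage example are illustrative assumptions, not part of the original code.

```python
import numpy as np

def shuffled_split(n_samples, n_train, n_val, seed=0):
    """Return disjoint index arrays for train, validation, and test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)          # random ordering of row indices
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]          # whatever is left over
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = shuffled_split(n_samples=100, n_train=80, n_val=15)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 15 5
```

The same index arrays would then be used to select rows from both the feature matrix and the label matrix, so that features and labels stay aligned.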

    feat_train = features_train[:4434]
    target_train = target[:4434]
    # Despite their names, y_train and y_val hold the validation features and labels.
    y_train = features_train[4434:5330]
    y_val = target[4434:5330]
    test_data = features_train[5330:]
    test_label = np.array(lab[5330:])  # class names for the test rows (lab is a plain list)
    print("Training", feat_train.shape)
    print(target_train.shape)
    print("Validation", y_train.shape)
    print(y_val.shape)
    print("Test", test_data.shape)
    print(test_label.shape)

The next step is to build the model.

    model = Sequential()
    model.add(Dense(186, input_shape=(186,), activation='relu'))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.6))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

This is the final model that will be used for training.
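To get a feel for the size of this network, the trainable parameters of each Dense layer can be counted by hand: inputs times units for the weights, plus one bias per unit. Dropout layers add no parameters. The sketch below does that arithmetic for the layer sizes above; the helper name `dense_params` is made up for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [186, 186, 256, 128, 10]   # input width followed by each Dense layer

per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
print(per_layer)       # [34782, 47872, 32896, 1290]
print(sum(per_layer))  # 116840
```

Calling `model.summary()` on the Keras model should report the same per-layer counts.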

    history = model.fit(feat_train, target_train, batch_size=64, epochs=30,
                        validation_data=(y_train, y_val))

The model trains for 30 epochs with a batch size of 64.
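With 4434 training samples and a batch size of 64, each epoch consists of a fixed number of gradient updates. The arithmetic is sketched below; the last batch of each epoch is smaller than 64 because 4434 is not a multiple of 64.

```python
import math

n_train = 4434     # rows in the training slice
batch_size = 64
epochs = 30

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)            # 70
print(last_batch)                 # 18
print(steps_per_epoch * epochs)   # 2100
```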
After training, the model reaches a validation accuracy of 92%.
Now let's see how the model performs on the test dataset.
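
    # predict_classes was removed from recent Keras releases; taking the argmax
    # over the softmax outputs gives the same class indices.
    predict = np.argmax(model.predict(test_data), axis=1)

This returns class indices; the next step maps them back to class names.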
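The softmax layer outputs one probability per class for each clip, and the predicted class is the position of the largest one. Here is a small sketch of that step on made-up probabilities; the three class names and the numbers are invented for illustration.

```python
import numpy as np

class_names = np.array(["dog_bark", "drilling", "siren"])   # invented subset

# One row per clip, one column per class; each row sums to 1.
probabilities = np.array([
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
    [0.60, 0.30, 0.10],
])

indices = np.argmax(probabilities, axis=1)   # position of the largest value per row
predicted = class_names[indices]             # look up the name for each index

print(indices)     # [1 2 0]
print(predicted)   # ['drilling' 'siren' 'dog_bark']
```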

    prediction = []
    for i in predict:
        j = label_columns[i]  # class name for the predicted index
        prediction.append(j)

The prediction list has 104 test labels; now count how many were predicted correctly.

    k = 0
    for i, j in zip(test_label, prediction):
        if i == j:
            k = k + 1

Out of 104 labels, the model correctly predicted 94, which is very good.
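The count above translates directly into an accuracy figure, and the same loop can be extended to show which classes get confused. The sketch below uses a handful of made-up labels; `true_labels` and `predicted_labels` stand in for `test_label` and `prediction`.

```python
from collections import Counter

true_labels = ["siren", "siren", "dog_bark", "drilling", "dog_bark"]       # made up
predicted_labels = ["siren", "drilling", "dog_bark", "drilling", "siren"]  # made up

correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
accuracy = correct / len(true_labels)

# Count each (true, predicted) pair that disagrees.
mistakes = Counter(
    (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
)

print(correct, accuracy)   # 3 0.6
print(mistakes)
```

For the numbers reported here, 94 correct out of 104 corresponds to an accuracy of roughly 90%.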
In this blog post, we discussed how to extract features from audio files with the librosa library and then build a model that classifies those files into different classes.
All the code in this article is available at this GitHub link:
Translated from: https://towardsdatascience.com/urban-sound-classification-using-neural-networks-9b6fcd8a9150