The previous article covered the basics of using CNNs; this one applies them to a real project, drawing on the book Deep Learning with Python.

This time we use Kaggle's Dogs vs. Cats dataset, which can be downloaded from https://www.kaggle.com/c/dogs-vs-cats.

The training set contains 12,500 images each of cats and dogs. First we split it into train, validation, and test sets, with 10,000, 1,250, and 1,250 images per class respectively, and then sort them into separate folders.

First, import the file-handling utilities:

import os, shutil
# directory of the original, unzipped training set
origin_dir = '../input/dogs-vs-cats/train'
# root directory of the new dataset
base_dir = '../input/dogs-vs-cats-new'

# create the root directory of the new dataset
os.mkdir(base_dir)

# create a directory for each class in each split
train_dir = os.path.join(base_dir, 'train')
os.makedirs(train_dir)

val_dir = os.path.join(base_dir, 'val')
os.makedirs(val_dir)

test_dir = os.path.join(base_dir, 'test')
os.makedirs(test_dir)

train_cats_dir = os.path.join(train_dir,'cats')
os.mkdir(train_cats_dir)

train_dogs_dir = os.path.join(train_dir,'dogs')
os.mkdir(train_dogs_dir)

val_cats_dir = os.path.join(val_dir,'cats')
os.mkdir(val_cats_dir)

val_dogs_dir = os.path.join(val_dir,'dogs')
os.mkdir(val_dogs_dir)

test_cats_dir = os.path.join(test_dir,'cats')
os.mkdir(test_cats_dir)

test_dogs_dir = os.path.join(test_dir,'dogs')
os.mkdir(test_dogs_dir)

# copy the images into the corresponding folders
fnames = ['cat.{}.jpg'.format(i) for i in range(10000)]
for fname in fnames:
    src = os.path.join(origin_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src,dst)

fnames = ['dog.{}.jpg'.format(i) for i in range(10000)]
for fname in fnames:
    src = os.path.join(origin_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src,dst)

fnames = ['cat.{}.jpg'.format(i) for i in range(10000, 11250)]
for fname in fnames:
    src = os.path.join(origin_dir, fname)
    dst = os.path.join(val_cats_dir, fname)
    shutil.copyfile(src,dst)

fnames = ['dog.{}.jpg'.format(i) for i in range(10000, 11250)]
for fname in fnames:
    src = os.path.join(origin_dir, fname)
    dst = os.path.join(val_dogs_dir, fname)
    shutil.copyfile(src,dst)

fnames = ['cat.{}.jpg'.format(i) for i in range(11250, 12500)]
for fname in fnames:
    src = os.path.join(origin_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src,dst)

fnames = ['dog.{}.jpg'.format(i) for i in range(11250, 12500)]
for fname in fnames:
    src = os.path.join(origin_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src,dst)
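The copy loops above carve each class's index range 0-12499 into three contiguous chunks; as a small sanity-check sketch (the helper name is mine, not from the original code):

```python
def split_indices(total=12500, train=10000, val=1250):
    """Contiguous per-class index ranges for the train/val/test splits."""
    return (range(0, train),
            range(train, train + val),
            range(train + val, total))

tr, va, te = split_indices()
print(len(tr), len(va), len(te))  # 10000 1250 1250
```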

Building the Model

Next we build the CNN: four convolutional layers, each followed by max-pooling, then a fully connected hidden layer of 512 units. Since this is a binary classification problem, the output layer uses a sigmoid activation to squash the output into [0, 1].

from keras import layers, models, optimizers
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                       input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.summary()


model.compile(loss='binary_crossentropy',
             optimizer=optimizers.RMSprop(lr=1e-4),
             metrics=['acc'])
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 6272)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               3211776   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________
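As a sanity check, the output shapes in the summary can be reproduced with simple arithmetic: a 3x3 'valid' convolution shrinks each spatial dimension by 2, and 2x2 max-pooling halves it (integer division). A quick sketch:

```python
def conv_out(n, k=3):
    # 'valid' convolution: output size = n - k + 1
    return n - k + 1

def pool_out(n, p=2):
    # 2x2 max-pooling with stride 2: output size = floor(n / p)
    return n // p

n = 150
for _ in range(4):          # four conv + pool stages
    n = pool_out(conv_out(n))

print(n)                    # 7, matching the (None, 7, 7, 128) row
print(n * n * 128)          # 6272, the Flatten output in the summary
```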

Data Preprocessing

The preprocessing steps are:

(1) read the image files;

(2) decode the JPEG content into RGB grids of pixels;

(3) convert these pixel grids into floating-point tensors;

(4) rescale the pixel values from 0-255 into the [0, 1] interval.

These steps may look tedious, but Keras automates them. The keras.preprocessing.image module provides image-processing utilities, including the ImageDataGenerator class, which quickly builds Python generators that turn image files into preprocessed batches of tensors.
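As a rough illustration of steps (3) and (4) (the file reading and JPEG decoding of steps (1)-(2) are handled by Keras), here is what the tensor conversion amounts to, sketched with NumPy on a fake decoded image:

```python
import numpy as np

# pretend this is a decoded 150x150 RGB pixel grid (steps 1-2)
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(150, 150, 3), dtype=np.uint8)

# step 3: convert to a floating-point tensor
x = pixels.astype('float32')

# step 4: rescale pixel values from 0-255 into [0, 1]
x /= 255.0
```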

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)  # multiply every pixel value by 1/255
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(150, 150),  # resize all images to 150x150
        batch_size=50,
        class_mode='binary')  # only two classes, so binary labels: 0 and 1

val_generator = test_datagen.flow_from_directory(
        val_dir,
        target_size=(150, 150),
        batch_size=50,
        class_mode='binary')

Training

For training we can use fit_generator, which takes the following arguments:

  • train_generator: a Python generator that keeps yielding batches of training data and their targets
  • steps_per_epoch: how many batches make up one epoch; here batch_size=50 and there are 20,000 training images, so 20000/50 = 400
  • validation_data: another generator, yielding batches of validation data and their targets
  • validation_steps: the counterpart of steps_per_epoch for the validation set; here 2500/50 = 50
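The two step counts are just the dataset size divided by the batch size (rounded up when it does not divide evenly); as a quick sketch:

```python
import math

train_images, val_images, batch_size = 20000, 2500, 50

steps_per_epoch = math.ceil(train_images / batch_size)
validation_steps = math.ceil(val_images / batch_size)

print(steps_per_epoch, validation_steps)  # 400 50
```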
history = model.fit_generator(
        train_generator,
        steps_per_epoch=400,
        epochs=30,
        validation_data=val_generator,
        validation_steps=50)
Epoch 1/30
400/400 [==============================] - 24s 60ms/step - loss: 0.3115 - acc: 0.8621 - val_loss: 0.3415 - val_acc: 0.8400
Epoch 2/30
400/400 [==============================] - 23s 58ms/step - loss: 0.2866 - acc: 0.8780 - val_loss: 0.3496 - val_acc: 0.8462
...
Epoch 30/30
400/400 [==============================] - 24s 59ms/step - loss: 0.0218 - acc: 0.9929 - val_loss: 0.4752 - val_acc: 0.8838

We can save the model to disk:

model.save('cat_and_dogs.h5')

Visualization

Let's first look at how the accuracy on the training and validation sets evolves:

import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.plot(acc, 'g', label='training acc')
plt.plot(val_acc, 'r', label='validation acc')
plt.title("Training and Validation accuracy")
plt.legend()

[Figure: training and validation accuracy]

Then the loss:

plt.figure()

plt.plot(loss, 'g', label='training loss')
plt.plot(val_loss, 'r', label='validation loss')
plt.title("Training and Validation loss")
plt.legend()

[Figure: training and validation loss]

Clearly, the model keeps improving on the training set while the validation metrics barely change, i.e. it is overfitting. One remedy is to add more training samples, which we can approximate with data augmentation.

Data Augmentation

Data augmentation generates additional samples from the existing data: by randomly rotating, scaling, shearing, and flipping the existing images, the model gets to see more varied inputs, which helps it avoid overfitting.

The main parameters are roughly as follows:

  • rotation_range: the range, in degrees, within which to randomly rotate images
  • width_shift_range and height_shift_range: fractions of the total width/height within which to randomly shift images horizontally or vertically
  • shear_range: the angle range for random shear transformations
  • zoom_range: the range for randomly zooming images
  • horizontal_flip: randomly flip images horizontally
  • fill_mode: how to fill pixels newly created by a rotation or a width/height shift
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

Let's see what the augmented images look like:

from keras.preprocessing import image

fnames = [os.path.join(train_cats_dir, fname)
          for fname in os.listdir(train_cats_dir)]
img_path = fnames[3]
img = image.load_img(img_path, target_size=(150, 150))
x = image.img_to_array(img)

x = x.reshape((1, ) + x.shape)

i = 0
for batch in datagen.flow(x, batch_size=1):
    plt.subplot(2,2,i + 1)
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break
plt.show()

[Figure: four augmented versions of the same cat image]
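To make the transforms less of a black box, here is a hand-rolled NumPy sketch of two of them, a horizontal flip and a wrap-around width shift (Keras instead fills revealed pixels according to fill_mode, so this is only an approximation):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((4, 6, 3))            # toy image: height 4, width 6, RGB

# horizontal_flip: mirror the width axis
flipped = img[:, ::-1, :]

# width shift by 20% of the width, wrapping pixels around
shift = int(0.2 * img.shape[1])        # 1 pixel here
shifted = np.roll(img, shift, axis=1)
```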

Training Again

This time we rebuild the model, and besides data augmentation we also apply BatchNormalization and Dropout to suppress overfitting further.

For BatchNormalization and other normalization schemes, see my earlier article: an introduction to the various kinds of Normalization and how they differ.
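Dropout, for reference, randomly zeroes a fraction of activations during training; below is a minimal sketch of the usual "inverted dropout" (my own illustration, not Keras internals):

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.ones((2, 8))                     # a batch of activations
rate = 0.5                              # same rate as layers.Dropout(0.5)

# zero each unit with probability `rate`, and scale survivors by
# 1/(1-rate) so the expected activation is unchanged
mask = rng.random(x.shape) >= rate
y = x * mask / (1.0 - rate)
```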

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.BatchNormalization())

model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.BatchNormalization())

model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.BatchNormalization())

model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.BatchNormalization())

model.add(layers.Flatten())
model.add(layers.Dropout(0.5))

model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
             optimizer=optimizers.RMSprop(lr=1e-4),
             metrics=['acc'])

Set up the data generators again:

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Note: never apply data augmentation to the validation set,
# otherwise performance would be measured on augmented images
# and could be misleading
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

val_generator = test_datagen.flow_from_directory(
    val_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

Start training:

history = model.fit_generator(
    train_generator,
    steps_per_epoch=400,  # with batch_size=32 this covers 12,800 of the 20,000 training images per epoch
    epochs=50,
    validation_data=val_generator,
    validation_steps=50)
Epoch 1/50
400/400 [==============================] - 87s 218ms/step - loss: 0.7360 - acc: 0.5990 - val_loss: 0.6284 - val_acc: 0.6725
Epoch 2/50
400/400 [==============================] - 100s 251ms/step - loss: 0.6832 - acc: 0.6328 - val_loss: 0.6177 - val_acc: 0.6698
...
Epoch 50/50
400/400 [==============================] - 65s 162ms/step - loss: 0.3750 - acc: 0.8309 - val_loss: 0.3060 - val_acc: 0.8769

Let's look at the accuracy and loss curves again:

import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.plot(acc, 'g', label='training acc')
plt.plot(val_acc, 'r', label='validation acc')
plt.title("Training and Validation accuracy")
plt.legend()

[Figure: training and validation accuracy]

plt.figure()

plt.plot(loss, 'g', label='training loss')
plt.plot(val_loss, 'r', label='validation loss')
plt.title("Training and Validation loss")
plt.legend()

This time the model improves on both the training and the validation set, with no obvious overfitting.

Last modified: June 1, 2021, 2:15 PM