AlexNet Notes


Paper reading (image classification, 2023 November 24). From the abstract: “The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective.”

ReLU Nonlinearity

Training on Multiple GPUs

How the network is split across GPUs, and the details of training on multiple GPUs.

Local Response Normalization (no longer used in practice)

Overall Architecture

Reducing Overfitting

Data Augmentation

PCA-based jittering of the RGB channel intensities, plus cropping 224×224 patches from the input images.
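A minimal sketch of this kind of augmentation, assuming precomputed RGB eigenvalues/eigenvectors; the numbers below are values commonly quoted for ImageNet and are placeholders here, not taken from these notes.

```python
# AlexNet-style augmentation: random 224x224 crop + horizontal flip, followed by
# PCA ("fancy PCA") color jittering on the RGB channels.
import torch
import torchvision.transforms as T

crop_and_flip = T.Compose([
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),                 # (3, 224, 224), values in [0, 1]
])

# Placeholder statistics (commonly cited ImageNet values; treat as assumptions):
eigvals = torch.tensor([0.2175, 0.0188, 0.0045])
eigvecs = torch.tensor([[-0.5675,  0.7192,  0.4009],
                        [-0.5808, -0.0045, -0.8140],
                        [-0.5836, -0.6948,  0.4203]])

def pca_color_jitter(img, sigma=0.1):
    """Add alpha_i * lambda_i * p_i to every pixel's RGB value, one alpha draw per image."""
    alpha = torch.randn(3) * sigma
    rgb_shift = eigvecs @ (alpha * eigvals)   # shape (3,)
    return img + rgb_shift.view(3, 1, 1)
```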

Dropout(50%)

Details of learning

Training uses SGD (although it is less stable than adaptive methods, the noise in its updates turns out to help generalization, and it has since become the mainstream choice).

\[v_{i+1} := 0.9 \cdot v_i - 0.0005 \cdot \epsilon \cdot w_i - \epsilon \cdot \left\langle \frac{\partial L}{\partial w}\bigg|_{w_i} \right\rangle_{D_i} \\ w_{i+1} := w_i + v_{i+1}\] Here 0.9 is the momentum coefficient (useful when the optimization surface is very rough, so the optimizer does not get trapped in small dips), 0.0005 is the weight decay, \(\epsilon\) is the learning rate, and \(\langle \cdot \rangle_{D_i}\) denotes the average over the i-th mini-batch.
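A hand-rolled sketch of this update rule (momentum 0.9, weight decay 0.0005); it corresponds only roughly to torch.optim.SGD(params, lr=eps, momentum=0.9, weight_decay=5e-4), since PyTorch folds weight decay into the gradient before the momentum step.

```python
import torch

def sgd_step(w, v, grad, eps=0.01, momentum=0.9, weight_decay=0.0005):
    """v <- 0.9*v - 0.0005*eps*w - eps*grad;  w <- w + v"""
    v_new = momentum * v - weight_decay * eps * w - eps * grad
    return w + v_new, v_new

# toy usage; `grad` stands in for the mini-batch average <dL/dw>_{D_i}
w, v = torch.randn(5), torch.zeros(5)
grad = torch.randn(5)
w, v = sgd_step(w, v, grad)
```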

Initialization: weights drawn from a Gaussian with mean 0 and standard deviation 0.01 (BERT uses 0.02).
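A minimal sketch of this initialization for a PyTorch module. Note that the paper also sets the biases of some layers to 1 (and others to 0); a constant 0 is used here purely for brevity.

```python
import torch.nn as nn

def init_weights(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)  # zero-mean Gaussian, std 0.01
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.0)

# usage: net.apply(init_weights)
```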

The learning rate was initialized at 0.01 and reduced three times prior to termination (divided by 10 whenever the validation error stopped improving).

Nowadays a cosine learning-rate schedule is often used instead (it decays more smoothly).
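A minimal sketch of both schedules in PyTorch; `optimizer` and `num_epochs` are assumed to exist (e.g. from the training code later in these notes).

```python
import torch

# paper-style: divide the LR by 10 when the validation metric stops improving
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1)

# modern alternative: smooth cosine decay over the whole run
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    ...                        # train one epoch, compute val_loss
    # plateau.step(val_loss)   # use one scheduler or the other, not both
    cosine.step()
```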

Supplementary notes

Activation functions

Normalization: common normalization methods in deep learning — BN, LN, IN, GN (Tencent Cloud developer community, tencent.com)

Data Augmentation: [PyTorch implementations of common data-augmentation methods, with source code (CSDN)](https://blog.csdn.net/qq_42589613/article/details/141367923)

Dropout: an in-depth look at Dropout — basic introduction, model description, theory, implementation, and variants (CSDN)

SGD: Optimizers (1) — how SGD works, with a PyTorch code walkthrough (CSDN)

Weight decay (the L2 regularization term): the weight_decay parameter from beginner to expert (CSDN)

Network overview:

- Winner of the ImageNet 2012 competition; it marked the start of the deep-learning (DNN) revolution.
- 5 convolutional layers + 3 fully-connected layers; 60M parameters and 650K neurons.
- The network is split into 2 groups across 2 GPUs (3 GB each, limited by the hardware of the time); training took about a week, with roughly a 50x speedup from the GPU implementation.
- New techniques it introduced: ReLU as the non-linear activation; max pooling; dropout regularization to prevent overfitting, applied in the decision-making FC layers.
- Model diagram: the network is split across two GPUs (figure not reproduced here).

The input image size should in theory be 227×227×3 (a 227×227 RGB image).
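As a sanity check, using the usual convolution output-size formula \((W - K + 2P)/S + 1\), the first layer (11×11 kernel, stride 4, no padding) gives

\[\frac{227 - 11 + 2\cdot 0}{4} + 1 = 55,\]

i.e. the 55×55×96 feature map of the first convolutional layer; with a 224×224 input this division would not be an integer, which is why 227 is usually quoted even though the paper states 224.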

Structure of each layer:

Here LRN stands for Local Response Normalization; for a detailed explanation see: http://blog.csdn.net/hduxiejun/article/details/70570086
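For reference, PyTorch ships nn.LocalResponseNorm; a minimal sketch below uses the hyperparameters reported in the AlexNet paper (size=5, alpha=1e-4, beta=0.75, k=2).

```python
import torch
import torch.nn as nn

# Normalizes each activation by the summed squares of its neighbors across channels.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.randn(1, 96, 55, 55)   # e.g. the output of the first conv layer
y = lrn(x)                       # same shape, normalized across neighboring channels
```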

AlexNet architecture explained in detail (including per-layer dimension calculations) with a PyTorch implementation (CSDN)

AlexNet implementation: PyTorch — implementing AlexNet, with complete code (CSDN)

Loading the data

We use the Fashion-MNIST dataset. When reading the data we add an extra step that enlarges the image height and width to the 224×224 that AlexNet expects. This is done with a torchvision.transforms.Resize instance: we apply Resize before the ToTensor instance and chain the two transforms with a Compose instance.

```python
import sys
import torch
import torchvision

def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    # Multiple DataLoader worker processes are problematic on Windows
    if sys.platform.startswith('win'):
        num_workers = 0
    else:
        num_workers = 4

    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    transform = torchvision.transforms.Compose(trans)

    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter

batch_size = 128
train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)
```

Building the model

```python
class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 96, 11, 4),   # in_channels, out_channels, kernel_size, stride
            nn.ReLU(),
            nn.MaxPool2d(3, 2),        # kernel_size, stride
            # Smaller kernel; padding of 2 keeps the input and output height/width equal,
            # and the number of output channels is increased.
            nn.Conv2d(96, 256, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
            # Three consecutive conv layers with even smaller kernels. Except for the
            # last one, the number of output channels is increased further.
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        )
        self.fc = nn.Sequential(
            nn.Linear(256*5*5, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 10),
        )

    def forward(self, img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output
```

Loss function

We use the cross-entropy loss.

```python
loss = torch.nn.CrossEntropyLoss()
```

Optimizer

We use the Adam algorithm.

```python
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
```

Complete code

```python
import time
import sys

import torch
from torch import nn, optim
import torchvision

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    if sys.platform.startswith('win'):
        num_workers = 0
    else:
        num_workers = 4

    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    transform = torchvision.transforms.Compose(trans)

    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter

batch_size = 128
train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)

class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 96, 11, 4),   # in_channels, out_channels, kernel_size, stride
            nn.ReLU(),
            nn.MaxPool2d(3, 2),        # kernel_size, stride
            # Smaller kernel; padding of 2 keeps the height/width unchanged,
            # and the number of output channels is increased.
            nn.Conv2d(96, 256, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
            # Three consecutive conv layers with even smaller kernels. Except for
            # the last one, the number of output channels is increased further.
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        )
        self.fc = nn.Sequential(
            nn.Linear(256*5*5, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 10),
        )

    def forward(self, img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output

net = AlexNet()

def evaluate_accuracy(data_iter, net, device=None):
    if device is None and isinstance(net, torch.nn.Module):
        # If no device is given, use the device of net's parameters.
        device = list(net.parameters())[0].device
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        for X, y in data_iter:
            net.eval()  # evaluation mode: turns off dropout
            acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
            net.train()  # switch back to training mode
            n += y.shape[0]
    return acc_sum / n

def train(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print("training on ", device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))

lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
train(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)
```

Implementing AlexNet in PyTorch (with complete code) (CSDN)

AlexNet Summary

ResNet's innovations focus on alleviating the vanishing-gradient problem, making it possible to build very deep networks and improve performance. VGGNet adopts a relatively simple structure and achieves good performance by stacking convolutional layers with small kernels. AlexNet is the pioneer of deep learning: it demonstrated the potential of deep convolutional neural networks, and its contributions include multi-GPU training, the ReLU activation, and dropout regularization.

Discussion: the fairness of neural networks?
