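VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
arXiv:1409.1556v6 [cs.CV] 10 Apr 2015
Karen Simonyan & Andrew Zisserman
Visual Geometry Group, Department of Engineering Science, University of Oxford

Table 1: ConvNet configurations (shown in columns). The depth of the configurations increases from the left (A) to the right (E), as more layers are added (the added layers are shown in bold). The convolutional layer parameters are denoted as "conv⟨receptive field size⟩-⟨number of channels⟩". The ReLU activation function is not shown for brevity.

VGG16 = configuration D

conv3-64: kernel size 3; the convolution maps the features to 64 channels, i.e. it extracts features along 64 dimensions.
conv3-64 (repeated): same as above.
maxpool: max pooling shrinks the feature map (a 2×2 window with stride 2 halves its height and width).
conv3-128: kernel size 3; the convolution maps the features to 128 channels, i.e. it extracts features along 128 dimensions.

Overall, VGG is designed for large-scale image recognition. From A to E the networks get deeper (11 to 19 weight layers) and the parameter count grows, which gives the deeper configurations more capacity to learn from large amounts of image data.

Counting layers:
Each convolutional layer and each fully connected layer counts as one layer; maxpool, BN, ReLU and the like are not counted as layers but as components.

conv1-512 vs conv3-512: the 1 and the 3 are the kernel sizes.
conv1-512: kernel_size=1, stride=1, padding=0
conv3-512: kernel_size=3, stride=1, padding=1
Both settings keep the spatial size of the feature map unchanged; only the number of channels changes.

Channel transformation: in_channels and out_channels control how the features are transformed.
The kernel controls the sliding window, which is governed by three parameters: kernel_size, stride and padding.

Other
LRN (Local Response Normalisation) is a normalization component with a role similar to BN, but it never caught on; in the paper, adding LRN to configuration A (A-LRN) did not improve performance.
VGG16 and VGG19 are the most commonly used variants.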
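The layer-counting rule and the conv1/conv3 settings above can be checked with a small pure-Python sketch. The configuration lists are a transcription of Table 1 in the table's own `conv<kernel>-<channels>` notation (A-LRN is omitted because it only adds an LRN layer to A), and the helpers `conv_out`, `stages`, `count_weight_layers` and `trace_size` are hypothetical names introduced for illustration. It assumes 224×224 inputs, 2×2 max pooling with stride 2, and three fully connected layers in every configuration.

```python
def conv_out(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis for a conv or pooling window."""
    return (size + 2 * padding - kernel_size) // stride + 1


def stages(*blocks):
    """Join conv blocks, appending a 2x2 maxpool after each block."""
    layers = []
    for block in blocks:
        layers += block + ["maxpool"]
    return layers


# Table 1 columns in "conv<kernel>-<channels>" notation (assumed transcription, A-LRN omitted)
CONFIGS = {
    "A": stages(["conv3-64"], ["conv3-128"], ["conv3-256"] * 2,
                ["conv3-512"] * 2, ["conv3-512"] * 2),
    "B": stages(["conv3-64"] * 2, ["conv3-128"] * 2, ["conv3-256"] * 2,
                ["conv3-512"] * 2, ["conv3-512"] * 2),
    "C": stages(["conv3-64"] * 2, ["conv3-128"] * 2,
                ["conv3-256"] * 2 + ["conv1-256"],
                ["conv3-512"] * 2 + ["conv1-512"],
                ["conv3-512"] * 2 + ["conv1-512"]),
    "D": stages(["conv3-64"] * 2, ["conv3-128"] * 2, ["conv3-256"] * 3,
                ["conv3-512"] * 3, ["conv3-512"] * 3),
    "E": stages(["conv3-64"] * 2, ["conv3-128"] * 2, ["conv3-256"] * 4,
                ["conv3-512"] * 4, ["conv3-512"] * 4),
}


def count_weight_layers(cfg) -> int:
    """Conv layers count as layers, maxpool does not; every config adds 3 FC layers."""
    return sum(layer.startswith("conv") for layer in cfg) + 3


def trace_size(cfg, size: int = 224) -> int:
    """Follow the spatial size of a size x size input through the conv stack."""
    for layer in cfg:
        if layer == "maxpool":
            size = conv_out(size, kernel_size=2, stride=2)
        else:
            k = int(layer[4])  # "conv3-64" -> 3, "conv1-512" -> 1
            # conv3 uses padding=1, conv1 uses padding=0, both with stride=1
            size = conv_out(size, kernel_size=k, stride=1, padding=k // 2)
    return size


for name, cfg in CONFIGS.items():
    print(name, count_weight_layers(cfg), trace_size(cfg))
# prints, one per line: A 11 7, B 13 7, C 16 7, D 16 7, E 19 7
```

Every configuration ends with a 7×7 map because convolutions preserve the size and each of the five poolings halves it (224 → 112 → 56 → 28 → 14 → 7).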
import torch
from torch import nn


class VggNet(nn.Module):
    """
    Custom VGG network: configuration E (VGG19) with BatchNorm after every convolution
    """
    def __init__(self):
        super().__init__()
        # feature extraction
        self.features = nn.Sequential(
            # stage1
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=64),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=64),
            nn.ReLU(),
            # maxpool: halves height and width
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # stage2
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=128),
            nn.ReLU(),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=128),
            nn.ReLU(),
            # maxpool
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # stage3
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=256),
            nn.ReLU(),
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=256),
            nn.ReLU(),
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=256),
            nn.ReLU(),
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=256),
            nn.ReLU(),
            # maxpool
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # stage4
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=512),
            nn.ReLU(),
            # maxpool
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            # stage5
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(num_features=512),
            nn.ReLU(),
            # maxpool
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
        )
        # unify the shape: always output a 7x7 feature map
        self.avgpool = nn.AdaptiveAvgPool2d(output_size=(7, 7))
        # classification
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Flatten(),
            nn.Linear(in_features=25088, out_features=4096),  # 512 * 7 * 7 = 25088
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=4096, out_features=1000)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        o = self.classifier(x)
        return o


imgs = torch.randn(3, 3, 224, 224)  # a batch of 3 RGB images, 224x224
model = VggNet()
print(model(imgs).shape)  # torch.Size([3, 1000])

# Parameter count
print(sum(x.numel() for x in model.parameters()))  # 143678248
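The parameter count above (143678248) can be reproduced by hand. The sketch below is a pure-Python tally that assumes the layer layout of `VggNet` (3×3 convolutions with bias, BatchNorm with a learnable scale and shift, and the three fully connected layers); the function names and the `E_CHANNELS`/`D_CHANNELS` lists are hypothetical helpers, not part of any library. BatchNorm's running mean and variance are buffers, so `model.parameters()` does not count them, and neither does this tally.

```python
def conv3_params(c_in: int, c_out: int) -> int:
    # 3x3 kernel weights plus one bias per output channel
    return 3 * 3 * c_in * c_out + c_out


def bn_params(c: int) -> int:
    # learnable weight (gamma) and bias (beta); running stats are buffers
    return 2 * c


def linear_params(n_in: int, n_out: int) -> int:
    # weight matrix plus bias vector
    return n_in * n_out + n_out


# Output channels of each conv3 layer, in order (hypothetical helper lists)
E_CHANNELS = [64] * 2 + [128] * 2 + [256] * 4 + [512] * 8  # VGG19, as in VggNet
D_CHANNELS = [64] * 2 + [128] * 2 + [256] * 3 + [512] * 6  # VGG16


def vgg_param_breakdown(channels, use_bn=True, num_classes=1000):
    """Return (feature-extractor params, classifier params)."""
    features, c_in = 0, 3
    for c_out in channels:
        features += conv3_params(c_in, c_out)
        if use_bn:
            features += bn_params(c_out)
        c_in = c_out
    classifier = (linear_params(512 * 7 * 7, 4096)
                  + linear_params(4096, 4096)
                  + linear_params(4096, num_classes))
    return features, classifier


feat, cls = vgg_param_breakdown(E_CHANNELS)
print(feat, cls, feat + cls)                          # 20035392 123642856 143678248
print(f"classifier share: {cls / (feat + cls):.1%}")  # classifier share: 86.1%
```

Roughly 86% of the parameters sit in the three fully connected layers, and dropping BatchNorm (`use_bn=False`) gives 143667240 for VGG19 and 138357544 for VGG16.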
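Main pipeline
Feature extraction by convolution: convolve again and again, apply the activation, then pool.
Unify the shape:
# unify the shape
self.avgpool = nn.AdaptiveAvgPool2d(output_size=(7, 7))
The final feature map is resized to 7×7, so the first fully connected layer receives 512 × 7 × 7 = 25088 inputs even for different input sizes.
Classification with fully connected layers.

Dropout
The fully connected layers are very large (about 124M of the model's roughly 144M parameters) and overfit easily; dropout weakens this overfitting.

Information volume
VGG offers a hint: the more information the data to be processed carries, the more layers the network should have.

Structure order
The BN variants of VGG (like the code above) follow the order conv → BN → ReLU → maxpool; the original VGG paper predates BatchNorm and uses conv → ReLU → maxpool.
In my own view (a personal opinion only), conv → maxpool → ReLU → BN matches the meaning of each layer better.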
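Two points from the notes above can be checked quickly in PyTorch. This is a sketch on random tensors, with arbitrary sizes chosen for illustration: `AdaptiveAvgPool2d((7, 7))` unifies the shape for different feature-map sizes, and ReLU commutes with 2×2 max pooling (ReLU is monotonic non-decreasing, so the max of clipped values equals the clipped max), whereas BatchNorm in training mode does not, because the batch statistics it computes depend on whether it sees the full or the pooled feature map.

```python
import torch
from torch import nn

torch.manual_seed(0)

# 1) AdaptiveAvgPool2d maps different feature-map sizes to 7x7,
#    so the flattened vector always has 512 * 7 * 7 = 25088 entries.
avgpool = nn.AdaptiveAvgPool2d(output_size=(7, 7))
flatten = nn.Flatten()
for h, w in [(7, 7), (10, 10), (15, 20)]:
    fmap = torch.randn(1, 512, h, w)
    print((h, w), "->", flatten(avgpool(fmap)).shape)  # torch.Size([1, 25088])

# 2) ReLU and 2x2 max pooling commute: swapping them gives identical outputs.
x = torch.randn(2, 8, 16, 16)
relu = nn.ReLU()
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(torch.equal(pool(relu(x)), relu(pool(x))))  # True

# 3) BatchNorm (training mode, batch statistics) does not commute with pooling:
#    the pooled map has a different mean and variance than the full map.
bn = nn.BatchNorm2d(num_features=8)
print(torch.allclose(pool(bn(x)), bn(pool(x))))  # False
```

So swapping ReLU and max pooling is harmless (and applying ReLU after pooling touches 4× fewer values), while moving BN after pooling changes the statistics it normalises with. This check concerns training mode; in eval mode BN uses fixed running statistics instead.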
|
|