
Deep Residual Learning for Image Recognition: A Visual Review of ResNet

Hugh Q Lee 2025. 3. 25. 20:56

This post reviews the main concepts of "Deep Residual Learning for Image Recognition," the paper that introduced ResNet, from the perspective of parameter counts (params#), using code and visualizations.

Figure 3. Example network architectures for ImageNet (He et al., 2016).

$$\text{conv\_params} = \text{input\_ch} \times \text{output\_ch} \times \text{kernel\_w} \times \text{kernel\_h} + \underbrace{\text{output\_ch}}_{\text{bias}}$$
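As a quick sanity check on this formula, the count can be compared against what PyTorch itself reports for a single convolution. This is a minimal sketch; the `conv_params` helper is only for illustration and is not part of the paper or the code below.

import torch.nn as nn

def conv_params(in_ch, out_ch, k_w, k_h):
    # weights (in_ch * out_ch * k_w * k_h) plus one bias per output channel
    return in_ch * out_ch * k_w * k_h + out_ch

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7)
print(conv_params(3, 64, 7, 7))                       # 9472
print(sum(p.numel() for p in conv.parameters()))      # 9472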

1. The First Layer

| Comparison | VGG's | ResNet's |
| --- | --- | --- |
| params# | 260,160 | 9,472 |

# VGG

import torch.nn as nn

module = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding='same'),
    nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding='same'),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding='same'),
    nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding='same'),
    nn.MaxPool2d(kernel_size=2, stride=2)
    )
    
>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
            Conv2d-2         [-1, 64, 224, 224]          36,928
         MaxPool2d-3         [-1, 64, 112, 112]               0
            Conv2d-4        [-1, 128, 112, 112]          73,856
            Conv2d-5        [-1, 128, 112, 112]         147,584
         MaxPool2d-6          [-1, 128, 56, 56]               0
================================================================
Total params: 260,160
# ResNet

import torch.nn as nn

module = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3),  # the paper's 7x7, 64, stride-2 stem convolution
    nn.MaxPool2d(kernel_size=2, stride=2)  # the paper uses a 3x3 max pool with stride 2; pooling adds no parameters either way
    )
    
>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,472
         MaxPool2d-2           [-1, 64, 56, 56]               0
================================================================
Total params: 9,472
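
The layer-by-layer tables above appear to be output from the torchsummary package; a minimal sketch of how to reproduce them, assuming that package is installed (`pip install torchsummary`):

import torch.nn as nn
from torchsummary import summary

module = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3),
    nn.MaxPool2d(kernel_size=2, stride=2)
    )

# print the table for an ImageNet-sized 3x224x224 input; device="cpu" avoids the CUDA default
summary(module, (3, 224, 224), device="cpu")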

(Left) VGG's vs. (right) ResNet's first layers, down to an output size of (56, 56).

$$ \begin{aligned} & \textbf{VGG's parameters:} \\[0.5em] 260,160 = &\; (3 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 128 \times 3 \times 3 + 128) \\ &+ (128 \times 128 \times 3 \times 3 + 128) \end{aligned} $$ $$ \begin{aligned} & \textbf{ResNet's parameters:} \\[0.5em] 9,472 = &\; (3 \times 64 \times 7 \times 7 + 64) \end{aligned} $$

Not only does ResNet outperform VGG on performance metrics such as accuracy, but thanks to its bottleneck structure it also has fewer parameters and less computation, enabling more efficient calculation and, as a result, faster speed.

2. Bottleneck Layer

Strictly speaking, the 34-layer architecture shown in Figure 3 only illustrates the residual shortcut (`x + y`) and does not use bottleneck blocks; however, since the bottleneck makes the difference in params# more pronounced, it is applied in the explanation below.

| Comparison | Plain (ConvBlock, no shortcut) | BottleneckBlock (solid line) | BottleneckBlock (dotted line) |
| --- | --- | --- | --- |
| params# | 73,856 | 45,248 | 57,728 |
| Residual | X | O | O |
| Bottleneck | X | O (X in the paper) | O (X in the paper) |
| Channel change | 64 -> 64 | 64 -> 64 | 64 -> 128 |

import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(ConvBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.conv2 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x
        
module = ConvBlock(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding='same')

>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 56, 56]          36,928
            Conv2d-2           [-1, 64, 56, 56]          36,928
================================================================
Total params: 73,856
import torch.nn as nn

class BottleneckBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(BottleneckBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0)
        self.conv2 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.conv3 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        # identity (solid-line) shortcut: add the unchanged input back to the conv output
        y = self.conv1(x)
        y = self.conv2(y)
        y = self.conv3(y)
        return x + y
        
module = BottleneckBlock(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding='same')

>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 56, 56]           4,160
            Conv2d-2           [-1, 64, 56, 56]          36,928
            Conv2d-3           [-1, 64, 56, 56]           4,160
================================================================
Total params: 45,248

(Left) plain vs. (right) bottleneck filter comparison; the difference in volume roughly corresponds to the difference in params#.

$$ \begin{aligned} \textbf{plain's parameters:} \\[0.5em] 73,856 = &\; (64 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 64 \times 3 \times 3 + 64) \end{aligned} $$ $$ \begin{aligned} \textbf{bottleneck's parameters:} \\[0.5em] 45,248 = &\; (64 \times 64 \times 1 \times 1 + 64) \\ &+ (64 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 64 \times 1 \times 1 + 64) \end{aligned} $$
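
As a quick usage check of the solid-line block (a sketch, using the class defined above), a dummy 64-channel feature map keeps its shape through the block, and the parameter count matches the summary:

import torch

module = BottleneckBlock(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding='same')
x = torch.randn(1, 64, 56, 56)

print(module(x).shape)                              # torch.Size([1, 64, 56, 56])
print(sum(p.numel() for p in module.parameters()))  # 45248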

A shortcut drawn as a dotted line indicates a case where the number of channels changes (input 64 -> output 128).

Because the input and output dimensions differ, the input `x` is first projected with a `conv_shortcut` that matches the number of channels.

import torch.nn as nn

class BottleneckBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(BottleneckBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0)
        self.conv2 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.conv3 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels*2, kernel_size=1, stride=1, padding=0)
        self.conv_shortcut = nn.Conv2d(in_channels=in_channels, out_channels=out_channels*2, kernel_size=1, stride=stride, padding=0)

    def forward(self, x):
        y = self.conv1(x)
        y = self.conv2(y)
        y = self.conv3(y)

        # project the input so its channel count and stride match y before the addition
        x = self.conv_shortcut(x)
        return x + y
        
module = BottleneckBlock(in_channels=64, out_channels=64, kernel_size=3, stride=2, padding=1)

>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 56, 56]           4,160
            Conv2d-2           [-1, 64, 28, 28]          36,928
            Conv2d-3          [-1, 128, 28, 28]           8,320
            Conv2d-4          [-1, 128, 28, 28]           8,320
================================================================
Total params: 57,728

Feature dimensions inside the BottleneckBlock when the dimensions must change.

$$ \begin{aligned} \textbf{bottleneck's parameters:} \\[0.5em] 57,728 = &\; (64 \times 64 \times 1 \times 1 + 64) \\ &+ (64 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 128 \times 1 \times 1 + 128) \\ &+ (64 \times 128 \times 1 \times 1 + 128) \end{aligned} $$
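
And the same kind of check for the dotted-line block (a sketch, using the class defined above): the spatial size is halved, the channels are doubled, and the projected shortcut keeps `x + y` valid:

import torch

module = BottleneckBlock(in_channels=64, out_channels=64, kernel_size=3, stride=2, padding=1)
x = torch.randn(1, 64, 56, 56)

print(module(x).shape)                              # torch.Size([1, 128, 28, 28])
print(sum(p.numel() for p in module.parameters()))  # 57728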
