
Deep Residual Learning for Image Recognition: A Visual Review of ResNet

Hugh Q Lee 2025. 3. 25. 20:56

This post reviews the main concepts of "Deep Residual Learning for Image Recognition," the paper that introduced ResNet, from the perspective of parameter counts (params#), using code and visualizations.

Figure 3. Example network architectures for ImageNet (He et al., 2016).

$$\text{conv\_params} = \text{input\_ch} \times \text{output\_ch} \times \text{kernel\_w} \times \text{kernel\_h} + \underbrace{\text{output\_ch}}_{\text{bias}}$$
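As a quick sanity check on this formula, the count can be compared against what PyTorch itself reports for a single convolution. This is a minimal sketch; the `conv_params` helper is only for illustration and is not part of the paper or the code below.

import torch.nn as nn

def conv_params(in_ch, out_ch, k_w, k_h):
    # weights (in_ch * out_ch * k_w * k_h) plus one bias per output channel
    return in_ch * out_ch * k_w * k_h + out_ch

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7)
print(conv_params(3, 64, 7, 7))                       # 9472
print(sum(p.numel() for p in conv.parameters()))      # 9472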

1. The First Layer

| Comparison | VGG's | ResNet's |
| --- | --- | --- |
| params# | 260,160 | 9,472 |

# VGG

import torch.nn as nn

module = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding='same'),
    nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding='same'),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding='same'),
    nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding='same'),
    nn.MaxPool2d(kernel_size=2, stride=2)
    )
    
>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
            Conv2d-2         [-1, 64, 224, 224]          36,928
         MaxPool2d-3         [-1, 64, 112, 112]               0
            Conv2d-4        [-1, 128, 112, 112]          73,856
            Conv2d-5        [-1, 128, 112, 112]         147,584
         MaxPool2d-6          [-1, 128, 56, 56]               0
================================================================
Total params: 260,160
# ResNet

import torch.nn as nn

module = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3),  # the paper's 7x7, 64, stride-2 stem convolution
    nn.MaxPool2d(kernel_size=2, stride=2)  # the paper uses a 3x3 max pool with stride 2; pooling adds no parameters either way
    )
    
>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,472
         MaxPool2d-2           [-1, 64, 56, 56]               0
================================================================
Total params: 9,472
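
The layer-by-layer tables above appear to be output from the torchsummary package; a minimal sketch of how to reproduce them, assuming that package is installed (`pip install torchsummary`):

import torch.nn as nn
from torchsummary import summary

module = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3),
    nn.MaxPool2d(kernel_size=2, stride=2)
    )

# print the table for an ImageNet-sized 3x224x224 input; device="cpu" avoids the CUDA default
summary(module, (3, 224, 224), device="cpu")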

(Left) VGG's vs. (right) ResNet's first layers, down to an output size of (56, 56).

$$ \begin{aligned} & \textbf{VGG's parameters:} \\[0.5em] 260,160 = &\; (3 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 128 \times 3 \times 3 + 128) \\ &+ (128 \times 128 \times 3 \times 3 + 128) \end{aligned} $$ $$ \begin{aligned} & \textbf{ResNet's parameters:} \\[0.5em] 9,472 = &\; (3 \times 64 \times 7 \times 7 + 64) \end{aligned} $$

Not only does ResNet outperform VGG on performance metrics such as accuracy, but thanks to its bottleneck structure it also has fewer parameters and less computation, enabling more efficient calculation and, as a result, faster speed.

2. Bottleneck Layer

Strictly speaking, the 34-layer architecture shown in Figure 3 only illustrates the residual shortcut (`x + y`) and does not use bottleneck blocks; however, since the bottleneck makes the difference in params# more pronounced, it is applied in the explanation below.

| Comparison | Plain (ConvBlock, no shortcut) | BottleneckBlock (solid line) | BottleneckBlock (dotted line) |
| --- | --- | --- | --- |
| params# | 73,856 | 45,248 | 57,728 |
| Residual | X | O | O |
| Bottleneck | X | O (X in the paper) | O (X in the paper) |
| Channel change | 64 -> 64 | 64 -> 64 | 64 -> 128 |

import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(ConvBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.conv2 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x
        
module = ConvBlock(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding='same')

>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 56, 56]          36,928
            Conv2d-2           [-1, 64, 56, 56]          36,928
================================================================
Total params: 73,856
import torch.nn as nn

class BottleneckBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(BottleneckBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0)
        self.conv2 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.conv3 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        # identity (solid-line) shortcut: add the unchanged input back to the conv output
        y = self.conv1(x)
        y = self.conv2(y)
        y = self.conv3(y)
        return x + y
        
module = BottleneckBlock(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding='same')

>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 56, 56]           4,160
            Conv2d-2           [-1, 64, 56, 56]          36,928
            Conv2d-3           [-1, 64, 56, 56]           4,160
================================================================
Total params: 45,248

(Left) plain vs. (right) bottleneck filter comparison; the difference in volume roughly corresponds to the difference in params#.

$$ \begin{aligned} \textbf{plain's parameters:} \\[0.5em] 73,856 = &\; (64 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 64 \times 3 \times 3 + 64) \end{aligned} $$ $$ \begin{aligned} \textbf{bottleneck's parameters:} \\[0.5em] 45,248 = &\; (64 \times 64 \times 1 \times 1 + 64) \\ &+ (64 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 64 \times 1 \times 1 + 64) \end{aligned} $$
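
As a quick usage check of the solid-line block (a sketch, using the class defined above), a dummy 64-channel feature map keeps its shape through the block, and the parameter count matches the summary:

import torch

module = BottleneckBlock(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding='same')
x = torch.randn(1, 64, 56, 56)

print(module(x).shape)                              # torch.Size([1, 64, 56, 56])
print(sum(p.numel() for p in module.parameters()))  # 45248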

A shortcut drawn as a dotted line indicates a case where the number of channels changes (input 64 -> output 128).

Because the input and output dimensions differ, the input `x` is first projected with a `conv_shortcut` that matches the number of channels.

import torch.nn as nn

class BottleneckBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(BottleneckBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0)
        self.conv2 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.conv3 = nn.Conv2d(in_channels=out_channels, out_channels=out_channels*2, kernel_size=1, stride=1, padding=0)
        self.conv_shortcut = nn.Conv2d(in_channels=in_channels, out_channels=out_channels*2, kernel_size=1, stride=stride, padding=0)

    def forward(self, x):
        y = self.conv1(x)
        y = self.conv2(y)
        y = self.conv3(y)

        # project the input so its channel count and stride match y before the addition
        x = self.conv_shortcut(x)
        return x + y
        
module = BottleneckBlock(in_channels=64, out_channels=64, kernel_size=3, stride=2, padding=1)

>
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 56, 56]           4,160
            Conv2d-2           [-1, 64, 28, 28]          36,928
            Conv2d-3          [-1, 128, 28, 28]           8,320
            Conv2d-4          [-1, 128, 28, 28]           8,320
================================================================
Total params: 57,728

Feature dimensions inside the BottleneckBlock when the dimensions must change.

$$ \begin{aligned} \textbf{bottleneck's parameters:} \\[0.5em] 57,728 = &\; (64 \times 64 \times 1 \times 1 + 64) \\ &+ (64 \times 64 \times 3 \times 3 + 64) \\ &+ (64 \times 128 \times 1 \times 1 + 128) \\ &+ (64 \times 128 \times 1 \times 1 + 128) \end{aligned} $$
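
And the same kind of check for the dotted-line block (a sketch, using the class defined above): the spatial size is halved, the channels are doubled, and the projected shortcut keeps `x + y` valid:

import torch

module = BottleneckBlock(in_channels=64, out_channels=64, kernel_size=3, stride=2, padding=1)
x = torch.randn(1, 64, 56, 56)

print(module(x).shape)                              # torch.Size([1, 128, 28, 28])
print(sum(p.numel() for p in module.parameters()))  # 57728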
