[{"content":"Welcome To My Blog This is a picture I took when I traveled back to Beijing!\n","date":"2024-12-21T18:38:04-08:00","image":"https://7490c39e.personal-blog-6c9.pages.dev/p/welcome-to-my-blog/cover_hu9309997041864395908.jpg","permalink":"https://7490c39e.personal-blog-6c9.pages.dev/p/welcome-to-my-blog/","title":"Welcome To My Blog"},{"content":"Hetzner Cloud \u0026amp; Storage Box: A Technical Review Hetzner is a frequent recommendation for hosting applications or setting up a high-performance \u0026ldquo;homelab\u0026rdquo; in the cloud. It is known for an exceptional price-to-performance ratio and a solid reputation for reliability. In this review, I evaluate the Hetzner Storage Box (BX11, $4/month) and the Cloud Server (CX33, $6.59/month), focusing on real-world bandwidth, latency, and hardware performance.\nCompute \u0026amp; Disk Performance Using the Yet-Another-Bench-Script (YABS) on a CX33 instance (4 vCPU AMD EPYC, 8GB RAM), I verified the machine\u0026rsquo;s baseline capabilities.\nDisk I/O: The NVMe storage is excellent, reaching 3.27 GB/s of combined throughput in the 1m-block fio test (50/50 mixed read/write) and about 49k IOPS at 4k blocks, which makes it well suited to database-heavy applications. Compute: A Geekbench 6 score of 1268 (single-core) and 3133 (multi-core) suggests it can comfortably handle modern web stacks and media transcoding (e.g., Jellyfin). Network (Global): Intra-European throughput is very high (up to 13.4 Gbps to Amsterdam), while trans-oceanic speeds to Los Angeles and Singapore stay roughly in the 0.5-1.4 Gbps range. 
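Why does round-trip time matter so much for the long-haul results? A single TCP stream can have at most one window of unacknowledged data in flight per round trip, so keeping a path full needs a window of at least bandwidth × RTT (the bandwidth-delay product). The sketch below only illustrates that arithmetic and is not a tool used in this review; it assumes the ~183 ms Nuremberg to Los Angeles ping reported by YABS, and the 4 MiB window is a hypothetical value chosen for illustration.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    # Bandwidth-delay product: bytes that must be in flight to keep the link full
    return bandwidth_bps * rtt_s / 8


def max_stream_bps(window_bytes, rtt_s):
    # Upper bound on a single TCP stream's throughput for a given window and RTT
    return window_bytes * 8 / rtt_s


rtt = 0.183  # Nuremberg to Los Angeles ping from the YABS run (183 ms)
print(f'Window needed for 1 Gbps: {bdp_bytes(1e9, rtt) / 1e6:.1f} MB')

window = 4 * 1024 * 1024  # hypothetical 4 MiB window, for illustration only
print(f'A 4 MiB window caps one stream at ~{max_stream_bps(window, rtt) / 1e6:.0f} Mbps')
```

On this path, filling 1 Gbps with one stream needs a window of about 22.9 MB, while a 4 MiB window caps a single stream at roughly 183 Mbps. Running several streams in parallel (for example, rclone's --transfers) spreads that requirement across connections, which is one reason parallel transfers help on high-latency links.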
# ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## # # Yet-Another-Bench-Script # # v2025-04-20 # # https://github.com/masonr/yet-another-bench-script # # ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## # Sun Feb 8 11:10:15 PM UTC 2026 Basic System Information: --------------------------------- Uptime : 0 days, 1 hours, 59 minutes Processor : AMD EPYC-Rome Processor CPU cores : 4 @ 2445.406 MHz AES-NI : ✔ Enabled VM-x/AMD-V : ❌ Disabled RAM : 7.6 GiB Swap : 512.0 MiB Disk : 75.0 GiB Distro : Ubuntu 24.04.3 LTS Kernel : 6.8.0-100-generic VM Type : KVM IPv4/IPv6 : ✔ Online / ✔ Online IPv6 Network Information: --------------------------------- ISP : Hetzner Online GmbH ASN : AS24940 Hetzner Online GmbH Host : Hetzner Location : Nuremberg, Bavaria (BY) Country : Germany fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/sda1): --------------------------------- Block Size | 4k (IOPS) | 64k (IOPS) ------ | --- ---- | ---- ---- Read | 98.10 MB/s (24.5k) | 1.02 GB/s (15.9k) Write | 98.36 MB/s (24.5k) | 1.02 GB/s (16.0k) Total | 196.47 MB/s (49.1k) | 2.04 GB/s (31.9k) | | Block Size | 512k (IOPS) | 1m (IOPS) ------ | --- ---- | ---- ---- Read | 1.58 GB/s (3.1k) | 1.58 GB/s (1.5k) Write | 1.67 GB/s (3.2k) | 1.69 GB/s (1.6k) Total | 3.25 GB/s (6.3k) | 3.27 GB/s (3.2k) iperf3 Network Speed Tests (IPv4): --------------------------------- Provider | Location (Link) | Send Speed | Recv Speed | Ping ----- | ----- | ---- | ---- | ---- Clouvider | London, UK (10G) | 6.87 Gbits/sec | 6.79 Gbits/sec | 18.8 ms Eranium | Amsterdam, NL (100G) | 13.4 Gbits/sec | 3.46 Gbits/sec | 9.97 ms Uztelecom | Tashkent, UZ (10G) | 1.68 Gbits/sec | 1.29 Gbits/sec | 97.0 ms Leaseweb | Singapore, SG (10G) | 905 Mbits/sec | 562 Mbits/sec | 169 ms Clouvider | Los Angeles, CA, US 
(10G) | 847 Mbits/sec | 954 Mbits/sec | 183 ms Leaseweb | NYC, NY, US (10G) | 1.73 Gbits/sec | 2.22 Gbits/sec | 101 ms Edgoo | Sao Paulo, BR (1G) | 837 Mbits/sec | 1.31 Gbits/sec | 202 ms iperf3 Network Speed Tests (IPv6): --------------------------------- Provider | Location (Link) | Send Speed | Recv Speed | Ping ----- | ----- | ---- | ---- | ---- Clouvider | London, UK (10G) | 6.17 Gbits/sec | 7.40 Gbits/sec | 18.6 ms Eranium | Amsterdam, NL (100G) | 13.7 Gbits/sec | 3.58 Gbits/sec | 9.74 ms Uztelecom | Tashkent, UZ (10G) | 1.74 Gbits/sec | 1.32 Gbits/sec | 96.6 ms Leaseweb | Singapore, SG (10G) | 913 Mbits/sec | 1.38 Gbits/sec | 169 ms Clouvider | Los Angeles, CA, US (10G) | 833 Mbits/sec | 1.11 Gbits/sec | 183 ms Leaseweb | NYC, NY, US (10G) | 1.71 Gbits/sec | 2.37 Gbits/sec | 97.0 ms Edgoo | Sao Paulo, BR (1G) | 704 Mbits/sec | 1.11 Gbits/sec | 202 ms Geekbench 6 Benchmark Test: --------------------------------- Test | Value | Single Core | 1268 Multi Core | 3133 Full Test | https://browser.geekbench.com/v6/cpu/16486805 Network Throughput \u0026amp; Latency A key part of this test was comparing the connection between a private LA server and the Hetzner (Nuremberg) instance.\nInternal Throughput: CX33 to Storage Box (BX11) Cross-Continental Sync: LA to Germany Iperf3 Network Benchmarks 1. Internal Throughput: CX33 to Storage Box (BX11) To test the internal \u0026ldquo;backplane\u0026rdquo; speed between the Cloud instance (Nuremberg) and the Storage Box, I used the dd utility. This measures raw disk and network performance without the overhead of complex sync logic.\nLocal Write \u0026amp; Read Performance\nSequential Write: 62.4 MB/s (approx. 500 Mbps)\nSequential Read: 544 MB/s (approx. 4.3 Gbps)\nThe read speeds are impressive: for media streaming (e.g., Jellyfin), the Storage Box can easily saturate a 1 Gbps link. 
However, write speeds are significantly slower, likely due to the mechanical HDD architecture and RAID overhead of the Storage Box backend.\n# Sequential Write Test (to Storage Box) dd if=/dev/zero of=/mnt/storagebox/testfile bs=1G count=1 oflag=dsync status=progress # Result: 1.1 GB copied in 17.2s (62.4 MB/s) # Sequential Read Test (from Storage Box) dd if=/mnt/storagebox/testfile of=/dev/null bs=1G count=1 status=progress # Result: 1.1 GB copied in 1.97s (544 MB/s) 2. Cross-Continental Sync: LA to Germany Moving 500GB+ of data from a private Los Angeles server to Hetzner (Germany) revealed the limitations of standard sync tools over high-latency paths.\nRclone with parallel transfers (--transfers 3) averaged 30-32 MiB/s when transferring video files.\nrclone copy ./videos hetzner-storage:home/ --progress --transfers 3 --checkers 4 --buffer-size 16M --retries 10 --low-level-retries 20 --timeout 5m --contimeout 15s --stats 10s --log-file rclone-hetzner-copy.log --log-level INFO 3. Iperf3 Network Benchmark I ran bidirectional iperf3 tests to locate where packet loss occurs.\nConnection Path Direction Bitrate (Avg) Retransmissions (Retr) Congestion Window (Cwnd) LA Private ↔ LA Public Upload 176 Mbits/sec 3 771 KBytes Download 507 Mbits/sec 52 - Hetzner ↔ LA Public Upload 107 Mbits/sec 0 5.57 MBytes Download 122 Mbits/sec 0 - LA Private ↔ Hetzner Upload 94.4 Mbits/sec 605 2.34 MBytes Download 110 Mbits/sec 0 - Performance Summary LA Private ↔ LA Public: This link shows the highest raw capacity (507 Mbps download). The low latency of being in the same city allows for high burst speeds, though the 52 retransmissions suggest some minor line noise or congestion. Hetzner ↔ LA Public: A very \u0026ldquo;clean\u0026rdquo; transatlantic route. Despite the physical distance, 0 retransmissions and a large 5.57 MB congestion window suggest a stable, high-quality peering connection. 
LA Private ↔ Hetzner: This is the weakest link. The 605 retransmissions on the upload side point to significant packet loss, likely caused by a poor routing path between my LA provider and Hetzner\u0026rsquo;s German data center. Speed from the private LA server to the public LA iperf3 server\niperf3 -c la.speedtest.clouvider.net -p 5200 --bidir -i 10 [ ID][Role] Interval Transfer Bitrate Retr Cwnd [ 5][TX-C] 0.00-10.00 sec 209 MBytes 176 Mbits/sec 3 771 KBytes [ 7][RX-C] 0.00-10.00 sec 604 MBytes 507 Mbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-10.00 sec 209 MBytes 176 Mbits/sec 3 sender [ 5][TX-C] 0.00-10.01 sec 206 MBytes 173 Mbits/sec receiver [ 7][RX-C] 0.00-10.00 sec 607 MBytes 510 Mbits/sec 52 sender [ 7][RX-C] 0.00-10.01 sec 604 MBytes 506 Mbits/sec receiver Speed from Hetzner to the public LA iperf3 server\niperf3 -c la.speedtest.clouvider.net -p 5200 --bidir [ ID][Role] Interval Transfer Bitrate Retr Cwnd [ 5][TX-C] 0.00-10.01 sec 128 MBytes 107 Mbits/sec 0 5.57 MBytes [ 7][RX-C] 0.00-10.01 sec 145 MBytes 122 Mbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-10.01 sec 128 MBytes 107 Mbits/sec 0 sender [ 5][TX-C] 0.00-10.19 sec 127 MBytes 105 Mbits/sec receiver [ 7][RX-C] 0.00-10.01 sec 149 MBytes 125 Mbits/sec 0 sender [ 7][RX-C] 0.00-10.19 sec 145 MBytes 120 Mbits/sec receiver Speed from the private LA server to the Hetzner server\n[ ID][Role] Interval Transfer Bitrate Retr Cwnd [ 5][TX-C] 0.00-10.00 sec 112 MBytes 94.4 Mbits/sec 605 2.34 MBytes [ 7][RX-C] 0.00-10.00 sec 131 MBytes 110 Mbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-10.00 sec 112 MBytes 94.4 Mbits/sec 605 sender [ 5][TX-C] 0.00-10.17 sec 111 MBytes 91.6 Mbits/sec receiver [ 7][RX-C] 0.00-10.00 sec 133 
MBytes 112 Mbits/sec 0 sender [ 7][RX-C] 0.00-10.17 sec 131 MBytes 108 Mbits/sec receiver IP Quality and Service Access I ran an IP integrity check using the IPQuality script to evaluate the reputation and \u0026ldquo;reachability\u0026rdquo; of the Hetzner CX33\u0026rsquo;s IP addresses.\nMetric IPv4 Status IPv6 Status Verdict Risk Score 3 - 6 (Low) 0 - 6 (Extremely Low) Safe IP Type Native / Data Center Native / Data Center Clean IPQS Fraud Score 75 (Suspicious) 75 (Suspicious) Caution Port 25 (Mail) Blocked (In/Out) Blocked (In/Out) Restricted Blacklist Status 0 Listings 0 Listings Excellent Service Unlock Status:\n✅ Unlocked: Netflix (DE), YouTube, TikTok (IPv4), ChatGPT. ❌ Blocked: Disney+, Reddit. Service Unlock Status Hetzner\u0026rsquo;s German IPs are surprisingly \u0026ldquo;clean\u0026rdquo; for a major data center provider, unlocking most high-demand services:\nStreaming \u0026amp; AI: Netflix (DE), YouTube, and ChatGPT all work natively.\nSocial \u0026amp; Regional: TikTok works on IPv4, but Disney+ and Reddit remain blocked, which is common for data center ranges.\nMail Hosting: Port 25 is blocked, both inbound and outbound. If you plan to run a mail server, you will need an external relay such as SendGrid or Amazon SES.\nWhile most databases (e.g., AbuseIPDB, Scamalytics) rate this IP as low risk, IPQS flagged it at 75 (Suspicious). This is likely a \u0026ldquo;false positive\u0026rdquo; that is common for Hetzner, because its ranges are heavily used by developers and VPN providers. For most web applications and scraping tasks, the low scores from the other four databases are a better indicator of health.\nFinal Verdict The Hetzner CX33 offers excellent value, delivering fast NVMe storage and robust CPU performance for everything from web apps to mid-range development environments. In contrast, the BX11 Storage Box serves as a cost-effective \u0026ldquo;warm\u0026rdquo; or \u0026ldquo;cold\u0026rdquo; storage tier. 
While its sequential read speeds are strong, its write performance is modest, and the high latency of cross-continental links calls for parallel, retry-tolerant tools like rclone to keep transfers stable.\nEncoding Note: A notable quirk of the Storage Box is its lack of native support for Chinese (CJK) characters in its internal environment. In the backend, non-ASCII filenames often render as raw escape sequences (slashes and numbers), and because you lack administrative terminal access within the Storage Box, you cannot install the locales required to fix this. However, this is purely a backend display issue; when the drive is mounted on an external Linux system with the iocharset=utf8 option, the characters display and behave correctly.\nTips:\nWhen transferring files with rclone, use 3-8 parallel transfers (--transfers) to saturate the bandwidth. Pass iocharset=utf8 when mounting CIFS/SMB; otherwise Chinese characters may not display correctly, because the default character set can be iso8859-1. Enable BBR for high-latency connections. ","date":"2026-02-08T14:25:40-08:00","image":"https://7490c39e.personal-blog-6c9.pages.dev/p/hetzner-review/dashboard_hu5199872190773257268.png","permalink":"https://7490c39e.personal-blog-6c9.pages.dev/p/hetzner-review/","title":"Hetzner Review"},{"content":"Install Docker Official Docker install page\nWe will use Docker\u0026rsquo;s convenience install script.\nLinux\ncurl -fsSL https://get.docker.com -o get-docker.sh sudo sh get-docker.sh Mac\nbrew install docker Note that this Homebrew formula installs only the Docker CLI; to actually run containers you also need a Docker engine, such as Docker Desktop.\n\u0026#x1f4a1; If you don\u0026rsquo;t have Homebrew installed, you can install it by running the following command in your macOS terminal:\n/bin/bash -c \u0026#34;$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\u0026#34; Windows\nFor Windows, Docker provides a desktop version that simplifies the installation process.\nGo to the Docker Desktop download page. Download the Docker Desktop installer for Windows. 
Run the installer and follow the on-screen instructions. Once installed, Docker Desktop should start automatically, and you can verify the installation by opening a terminal and running:\ndocker --version Fix Docker Permission Denied If you encounter a \u0026ldquo;Permission Denied\u0026rdquo; error when running Docker commands, it is usually because your user cannot access the Docker daemon socket (/var/run/docker.sock), which by default is restricted to root and members of the docker group. To let a non-root user run Docker, follow these steps:\nReferences: StackOverflow Docker Documentation\nAdd your user to the docker group (groupadd simply reports an error if the group already exists): sudo groupadd docker sudo usermod -aG docker $USER Apply the new group membership in the current shell (or log out and back in): newgrp docker Restart Docker if the error persists: sudo systemctl restart docker This will allow you to run Docker commands without using sudo. Be aware that membership in the docker group grants root-level privileges on the host.\nRoot If you need to run Docker as the root user, you can do so by prefixing Docker commands with sudo. However, this method is not recommended for security reasons.\nRootless If you prefer to use Docker without root privileges, you can follow the official guide to set up Docker in rootless mode. In this mode the Docker daemon itself runs as an unprivileged user, so neither sudo nor docker group membership is needed, which offers a more secure environment. To learn more, check out the Rootless Docker documentation.\nDocker Usage Example Once Docker is installed and running, you can start using it to create and manage containers. Here\u0026rsquo;s an example of how to run a simple Docker container:\nPull an image from Docker Hub: docker pull hello-world Run the container: docker run hello-world This will download the hello-world image from Docker Hub (if not already present) and run it inside a container. 
The output will display a welcome message confirming that Docker is working.\nList running containers: docker ps List all containers (including stopped ones): docker ps -a Stop a running container: docker stop \u0026lt;container_id\u0026gt; Replace \u0026lt;container_id\u0026gt; with the actual ID of the container you want to stop, which you can find by running docker ps.\nDocker Mirror for Users in China If you\u0026rsquo;re located in mainland China, you may have difficulty pulling Docker images because Docker Hub is blocked there. To work around this, you can configure Docker to use a mirror.\nTencent Docker Mirror (Tencent VPS only) Tencent offers a reliable Docker mirror, reachable only from Tencent Cloud servers. To use it, follow these steps:\nEdit or create the Docker daemon configuration file at /etc/docker/daemon.json: sudo nano /etc/docker/daemon.json Add the following configuration to use Tencent\u0026rsquo;s mirror: { \u0026#34;registry-mirrors\u0026#34;: [\u0026#34;https://mirror.ccs.tencentyun.com\u0026#34;] } Save the file and restart Docker: sudo systemctl restart docker Now, Docker will use the Tencent mirror when pulling images, improving speed and reliability for users in China.\nAdditional Mirrors If you\u0026rsquo;d like more options, here are some other Docker mirrors:\nAliyun (Alibaba Cloud) Docker Mirror: { \u0026#34;registry-mirrors\u0026#34;: [\u0026#34;https://registry.cn-hangzhou.aliyuncs.com\u0026#34;] } On Mac Reference: Stackoverflow Docker With Docker Desktop on Mac, the daemon configuration is stored at the following path (it can also be edited in the Docker Engine section of Docker Desktop\u0026rsquo;s settings):\n~/.docker/daemon.json\nYou can add other mirrors there. Most of the mirrors below are run by third parties and come and go frequently, so check that a mirror is still online before relying on it:\n{ \u0026#34;registry-mirrors\u0026#34;: [ \u0026#34;https://dockerpull.org\u0026#34;, \u0026#34;https://docker.1panel.dev\u0026#34;, \u0026#34;https://docker.foreverlink.love\u0026#34;, \u0026#34;https://docker.fxxk.dedyn.io\u0026#34;, 
\u0026#34;https://docker.xn--6oq72ry9d5zx.cn\u0026#34;, \u0026#34;https://docker.zhai.cm\u0026#34;, \u0026#34;https://docker.5z5f.com\u0026#34;, \u0026#34;https://a.ussh.net\u0026#34;, \u0026#34;https://docker.cloudlayer.icu\u0026#34;, \u0026#34;https://hub.littlediary.cn\u0026#34;, \u0026#34;https://hub.crdz.gq\u0026#34;, \u0026#34;https://docker.unsee.tech\u0026#34;, \u0026#34;https://docker.kejilion.pro\u0026#34;, \u0026#34;https://registry.dockermirror.com\u0026#34;, \u0026#34;https://hub.rat.dev\u0026#34;, \u0026#34;https://dhub.kubesre.xyz\u0026#34;, \u0026#34;https://docker.nastool.de\u0026#34;, \u0026#34;https://docker.udayun.com\u0026#34;, \u0026#34;https://docker.rainbond.cc\u0026#34;, \u0026#34;https://hub.geekery.cn\u0026#34;, \u0026#34;https://docker.1panelproxy.com\u0026#34;, \u0026#34;https://atomhub.openatom.cn\u0026#34;, \u0026#34;https://docker.m.daocloud.io\u0026#34;, \u0026#34;https://docker.1ms.run\u0026#34;, \u0026#34;https://docker.linkedbus.com\u0026#34;, \u0026#34;https://dytt.online\u0026#34;, \u0026#34;https://func.ink\u0026#34;, \u0026#34;https://lispy.org\u0026#34;, \u0026#34;https://docker.xiaogenban1993.com\u0026#34; ] } ","date":"2025-02-12T19:50:34+08:00","image":"https://7490c39e.personal-blog-6c9.pages.dev/p/docker-setup/image2_hu100910552199442870.png","permalink":"https://7490c39e.personal-blog-6c9.pages.dev/p/docker-setup/","title":"Docker Setup"},{"content":"from __future__ import print_function import matplotlib.pyplot as plt import numpy as np import pandas as pd import torch import torch.nn as nn import torch.nn.functional as F import torch.optim as optim from linformer import Linformer from sklearn.model_selection import train_test_split from torch.optim.lr_scheduler import StepLR from torch.utils.data import DataLoader, Dataset from torchvision import datasets, transforms from tqdm.notebook import tqdm import torchvision from vit_pytorch.efficient import ViT 
print(f\u0026#34;Is CUDA supported by this system? {torch.cuda.is_available()}\u0026#34;) print(f\u0026#34;CUDA version: {torch.version.cuda}\u0026#34;) # Storing ID of current CUDA device cuda_id = torch.cuda.current_device() print(f\u0026#34;ID of current CUDA device: {cuda_id}\u0026#34;) print(f\u0026#34;Name of current CUDA device: {torch.cuda.get_device_name(cuda_id)}\u0026#34;) Is CUDA supported by this system? True CUDA version: 11.7 ID of current CUDA device: 0 Name of current CUDA device: NVIDIA GeForce RTX 4090 Preprocessing Load Data Here we load the CIFAR100 dataset using torchvision\u0026rsquo;s built-in dataset class.\nbatchSize = 128 # Each sample in the original data is a tuple (PIL Image, class label) # train_split = torchvision.datasets.CIFAR100(\u0026#39;./cifar-100\u0026#39;, train=True,download=True, transform = transforms.Compose([transforms.ToTensor()])) # test_split = torchvision.datasets.CIFAR100(\u0026#39;./cifar-100\u0026#39;, train=False,download=True, transform = transforms.Compose([transforms.ToTensor()])) train_split = torchvision.datasets.CIFAR100(\u0026#39;./cifar-100\u0026#39;, train=True, download=True) test_split = torchvision.datasets.CIFAR100(\u0026#39;./cifar-100\u0026#39;, train=False, download=True) Files already downloaded and verified Files already downloaded and verified train_split[0] (\u0026lt;PIL.Image.Image image mode=RGB size=32x32\u0026gt;, 19) Each element in the train and test split is a tuple of a PIL image (no transform has been applied yet) and its class label. 
Here\u0026rsquo;s a list that maps the numeric class labels (the list indices) to text labels (torchvision also exposes the class names as train_split.classes).\ntextLabel = [ \u0026#39;apple\u0026#39;, # id 0 \u0026#39;aquarium_fish\u0026#39;, \u0026#39;baby\u0026#39;, \u0026#39;bear\u0026#39;, \u0026#39;beaver\u0026#39;, \u0026#39;bed\u0026#39;, \u0026#39;bee\u0026#39;, \u0026#39;beetle\u0026#39;, \u0026#39;bicycle\u0026#39;, \u0026#39;bottle\u0026#39;, \u0026#39;bowl\u0026#39;, \u0026#39;boy\u0026#39;, \u0026#39;bridge\u0026#39;, \u0026#39;bus\u0026#39;, \u0026#39;butterfly\u0026#39;, \u0026#39;camel\u0026#39;, \u0026#39;can\u0026#39;, \u0026#39;castle\u0026#39;, \u0026#39;caterpillar\u0026#39;, \u0026#39;cattle\u0026#39;, \u0026#39;chair\u0026#39;, \u0026#39;chimpanzee\u0026#39;, \u0026#39;clock\u0026#39;, \u0026#39;cloud\u0026#39;, \u0026#39;cockroach\u0026#39;, \u0026#39;couch\u0026#39;, \u0026#39;crab\u0026#39;, \u0026#39;crocodile\u0026#39;, \u0026#39;cup\u0026#39;, \u0026#39;dinosaur\u0026#39;, \u0026#39;dolphin\u0026#39;, \u0026#39;elephant\u0026#39;, \u0026#39;flatfish\u0026#39;, \u0026#39;forest\u0026#39;, \u0026#39;fox\u0026#39;, \u0026#39;girl\u0026#39;, \u0026#39;hamster\u0026#39;, \u0026#39;house\u0026#39;, \u0026#39;kangaroo\u0026#39;, \u0026#39;computer_keyboard\u0026#39;, \u0026#39;lamp\u0026#39;, \u0026#39;lawn_mower\u0026#39;, \u0026#39;leopard\u0026#39;, \u0026#39;lion\u0026#39;, \u0026#39;lizard\u0026#39;, \u0026#39;lobster\u0026#39;, \u0026#39;man\u0026#39;, \u0026#39;maple_tree\u0026#39;, \u0026#39;motorcycle\u0026#39;, \u0026#39;mountain\u0026#39;, \u0026#39;mouse\u0026#39;, \u0026#39;mushroom\u0026#39;, \u0026#39;oak_tree\u0026#39;, \u0026#39;orange\u0026#39;, \u0026#39;orchid\u0026#39;, \u0026#39;otter\u0026#39;, 
\u0026#39;palm_tree\u0026#39;, \u0026#39;pear\u0026#39;, \u0026#39;pickup_truck\u0026#39;, \u0026#39;pine_tree\u0026#39;, \u0026#39;plain\u0026#39;, \u0026#39;plate\u0026#39;, \u0026#39;poppy\u0026#39;, \u0026#39;porcupine\u0026#39;, \u0026#39;possum\u0026#39;, \u0026#39;rabbit\u0026#39;, \u0026#39;raccoon\u0026#39;, \u0026#39;ray\u0026#39;, \u0026#39;road\u0026#39;, \u0026#39;rocket\u0026#39;, \u0026#39;rose\u0026#39;, \u0026#39;sea\u0026#39;, \u0026#39;seal\u0026#39;, \u0026#39;shark\u0026#39;, \u0026#39;shrew\u0026#39;, \u0026#39;skunk\u0026#39;, \u0026#39;skyscraper\u0026#39;, \u0026#39;snail\u0026#39;, \u0026#39;snake\u0026#39;, \u0026#39;spider\u0026#39;, \u0026#39;squirrel\u0026#39;, \u0026#39;streetcar\u0026#39;, \u0026#39;sunflower\u0026#39;, \u0026#39;sweet_pepper\u0026#39;, \u0026#39;table\u0026#39;, \u0026#39;tank\u0026#39;, \u0026#39;telephone\u0026#39;, \u0026#39;television\u0026#39;, \u0026#39;tiger\u0026#39;, \u0026#39;tractor\u0026#39;, \u0026#39;train\u0026#39;, \u0026#39;trout\u0026#39;, \u0026#39;tulip\u0026#39;, \u0026#39;turtle\u0026#39;, \u0026#39;wardrobe\u0026#39;, \u0026#39;whale\u0026#39;, \u0026#39;willow_tree\u0026#39;, \u0026#39;wolf\u0026#39;, \u0026#39;woman\u0026#39;, \u0026#39;worm\u0026#39;, ] Inspect Data Plot nine random CIFAR100 images using matplotlib\nrandom_idx = np.random.randint(0, len(train_split), size=9) fig, axes = plt.subplots(3, 3, figsize=(16, 12)) for idx, ax in enumerate(axes.ravel()): randIndex = random_idx[idx] ax.set_title(\u0026#39;The label is: \u0026#39; + textLabel[train_split[randIndex][1]]) ax.imshow(train_split[randIndex][0]) print(train_split) print(test_split) Dataset CIFAR100 Number of datapoints: 50000 Root location: ./cifar-100 Split: Train Dataset CIFAR100 Number of datapoints: 10000 Root location: ./cifar-100 Split: Test Split We first do an 80/20 train/validation split of the original training set, stratified by label so that every class keeps the same proportion in both parts.\nlabels = [train_split[i][1] for i in range(len(train_split))] train_list, valid_list = 
train_test_split(train_split, test_size=0.2, shuffle=True, stratify=labels) Next we inspect the class distribution of each split: the x axis is the numeric class label and the y axis is the count for that class.\nimport plotly.express as px x = [train_list[i][1] for i in range(len(train_list))] fig = px.histogram(x) fig.update_layout(title=\u0026#34;Train list\u0026#34;,bargap=0.2) fig.show() x = [valid_list[i][1] for i in range(len(valid_list))] fig = px.histogram(x) fig.update_layout(title=\u0026#34;Valid list\u0026#34;,bargap=0.2) fig.show() x = [test_split[i][1] for i in range(len(test_split))] fig = px.histogram(x) fig.update_layout(title=\u0026#34;Test list\u0026#34;,bargap=0.2) fig.show() print(f\u0026#34;Train Data: {len(train_list)}\u0026#34;) print(f\u0026#34;Validation Data: {len(valid_list)}\u0026#34;) print(f\u0026#34;Test Data: {len(test_split)}\u0026#34;) Train Data: 40000 Validation Data: 10000 Test Data: 10000 Dataset Loading and Augmentations Here we define the data augmentations and create data loaders for each data split. Augmentation is applied to the training split only; validation and test images are just converted to tensors.\nfrom torchvision.transforms.autoaugment import AutoAugmentPolicy all_transforms = transforms.Compose([ transforms.RandomHorizontalFlip(), transforms.RandomVerticalFlip(), transforms.RandomRotation(degrees=(0, 180)), transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0), # all factors are 0, so this ColorJitter currently has no effect transforms.AutoAugment(AutoAugmentPolicy.CIFAR10), transforms.ToTensor(), # transforms.RandomErasing(), ]) val_transforms = transforms.Compose( [ transforms.ToTensor(), ] ) test_transforms = transforms.Compose( [ transforms.ToTensor(), ] ) Here we define our own Dataset class that applies a transform to each (image, label) pair.\nclass CIFAR100Dataset(Dataset): def __init__(self, rawData, transform=None): self.rawData = rawData self.transform = transform def __len__(self): self.dataSize = 
len(self.rawData) return self.dataSize def __getitem__(self, idx): rawData = self.rawData[idx] img = rawData[0] # Apply the transform only if one was given img_transformed = self.transform(img) if self.transform else img label = rawData[1] return img_transformed, label train_list_transformed = CIFAR100Dataset(train_list, transform=all_transforms) valid_list_transformed = CIFAR100Dataset(valid_list, transform=val_transforms) test_split_transformed = CIFAR100Dataset(test_split, transform=test_transforms) Inspect the transformed data random_idx = np.random.randint(0, len(train_list_transformed), size=9) fig, axes = plt.subplots(3, 3, figsize=(16, 12)) fig.suptitle(\u0026#34;Transformed Images\u0026#34;, fontsize=14) for idx, ax in enumerate(axes.ravel()): randIndex = random_idx[idx] ax.set_title(\u0026#39;The label is: \u0026#39; + textLabel[train_list_transformed[randIndex][1]]) ax.imshow(transforms.ToPILImage()(train_list_transformed[randIndex][0])) # 8x8-pixel patches for visualization only; indexing the dataset again re-applies the random augmentations PATCH_SIZE = 8 PATCH_NUM = int(32 / PATCH_SIZE) patches = train_list_transformed[random_idx[0]][0].unfold(1, PATCH_SIZE, PATCH_SIZE).unfold(2, PATCH_SIZE, PATCH_SIZE) fig, ax = plt.subplots(PATCH_NUM, PATCH_NUM) fig.suptitle(\u0026#34;Patched Transformed Images\u0026#34;, fontsize=14) for i in range(PATCH_NUM): for j in range(PATCH_NUM): sub_img = patches[:, i, j] ax[i][j].imshow(torchvision.transforms.functional.to_pil_image(sub_img)) ax[i][j].axis(\u0026#39;off\u0026#39;) patches = patches.reshape(3, -1, PATCH_SIZE, PATCH_SIZE) patches.transpose_(0, 1) fig, ax = plt.subplots(1, PATCH_NUM*PATCH_NUM, figsize=(20, 20)) for i in range(PATCH_NUM**2): ax[i].imshow(torchvision.transforms.functional.to_pil_image(patches[i])) ax[i].axis(\u0026#39;off\u0026#39;) fig, ax = plt.subplots(1, 4) for i in range(4): ax[i].imshow(torchvision.transforms.functional.to_pil_image(patches[i])) ax[i].axis(\u0026#39;off\u0026#39;) train_loader = torch.utils.data.DataLoader(train_list_transformed, 
batch_size=batchSize, shuffle=True) valid_loader = torch.utils.data.DataLoader(valid_list_transformed, batch_size=batchSize, shuffle=False) test_loader = torch.utils.data.DataLoader(test_split_transformed, batch_size=batchSize, shuffle=False) print(len(train_list), len(train_loader)) 40000 313 print(len(valid_list), len(valid_loader)) 10000 79 Here we inspect one transformed sample.\ntrain_list_transformed[0] (tensor([[[0.5176, 0.5882, 0.7255, ..., 0.3451, 0.3882, 0.4431], [0.5255, 0.6431, 0.9882, ..., 0.3059, 0.5255, 0.4706], [0.5255, 0.7255, 0.7333, ..., 0.4000, 0.6000, 0.4353], ..., [0.9882, 0.8784, 0.6980, ..., 0.1529, 0.1725, 0.2784], [0.9804, 0.8627, 0.6784, ..., 0.2000, 0.2706, 0.4431], [0.9882, 0.8784, 0.7176, ..., 0.2510, 0.4078, 0.5608]], [[0.4431, 0.5255, 0.6627, ..., 0.3529, 0.4000, 0.4510], [0.4627, 0.5804, 0.6980, ..., 0.3176, 0.5333, 0.4784], [0.4510, 0.6627, 0.6706, ..., 0.4078, 0.6078, 0.4431], ..., [0.6980, 0.9333, 0.7451, ..., 0.1804, 0.2000, 0.3059], [0.7176, 0.9176, 0.7333, ..., 0.2235, 0.2980, 0.4706], [0.6980, 0.9333, 0.7725, ..., 0.2784, 0.4431, 0.6000]], [[0.0745, 0.1255, 0.2902, ..., 0.3255, 0.3922, 0.4667], [0.0863, 0.2000, 0.3412, ..., 0.2784, 0.5804, 0.5059], [0.0863, 0.3137, 0.3020, ..., 0.4157, 0.6824, 0.4549], ..., [0.3137, 0.4275, 1.0000, ..., 0.2510, 0.2784, 0.4275], [0.3412, 0.4549, 0.9725, ..., 0.3137, 0.4039, 0.6431], [0.3255, 0.4392, 0.6431, ..., 0.3922, 0.5686, 0.7686]]]), 67) train_list_transformed[0][0].shape ch, seqDim, _ = train_list_transformed[0][0].shape print(ch, seqDim) print(train_list_transformed[0][0].shape) print(len(train_list_transformed)) 3 32 torch.Size([3, 32, 32]) 40000 Efficient Attention We split each 32x32 CIFAR100 image into an 8x8 grid of patches, i.e. 64 patches of 4x4 pixels each (patch_size=4 in the model below). Note: a patch size that is too large may make the model fail on objects with complex features.\nHere we use Linformer, from the paper Linformer: Self-Attention with Linear Complexity by Sinong Wang et al. 
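The core idea of Linformer is to project the keys and values along the sequence dimension, from n tokens down to k rows, so the attention matrix is n × k instead of n × n. The sketch below is our own single-head NumPy illustration of that idea, not lucidrains' implementation: the projection matrices are random here (Linformer learns them), and the sizes assume this notebook's 65 tokens, k=64, and a per-head dimension of 256 / 8 = 32 (assuming the default dim_head = dim // heads).

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def linformer_attention(q, k, v, proj_k, proj_v):
    # q, k, v: (n, d) token matrices; proj_k, proj_v: (k_dim, n) projections
    k_low = proj_k @ k  # (k_dim, d): keys compressed along the sequence axis
    v_low = proj_v @ v  # (k_dim, d): values compressed the same way
    scores = q @ k_low.T / np.sqrt(q.shape[-1])  # (n, k_dim) instead of (n, n)
    return softmax(scores) @ v_low  # (n, d)


rng = np.random.default_rng(0)
n, d, k_dim = 65, 32, 64  # 64 patches + 1 CLS token, head dim 32, k=64
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
proj_k = rng.standard_normal((k_dim, n)) / np.sqrt(n)
proj_v = rng.standard_normal((k_dim, n)) / np.sqrt(n)

out = linformer_attention(q, k, v, proj_k, proj_v)
print(out.shape)
```

With only 65 tokens and k=64, the attention matrix shrinks from 65 × 65 to just 65 × 64, so the savings in this notebook are small; Linformer's linear-complexity advantage mainly pays off on much longer sequences.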
The implementation of this transformer is provided by lucidrains.\ndim: the model (token embedding) dimension; by default each head uses dim // heads k: the number of rows the keys/values are projected down to along the sequence dimension heads: number of attention heads dropout: the dropout rate for the linear layers depth: number of transformer blocks seq_len: the length of the input sequence (number of patches + 1 CLS token) efficient_transformer = Linformer( dim=256, seq_len=64+1, # 8x8 grid of patches + 1 cls-token depth=4, heads=8, k=64, dropout = 0.1 ) device = \u0026#39;cuda\u0026#39; Construct the ViT model using the transformer defined above. The implementation of the model is provided by lucidrains\u0026rsquo;s vit-pytorch.\ndim: embedding dimension of each patch token (must match the Linformer dim) patch_size: side length of each square patch, in pixels image_size: side length of the (square) input image num_classes: number of classes to predict channels: number of color channels model = ViT( dim=256, image_size=32, # 32 pixel by 32 pixel image patch_size=4, # 4x4-pixel patches, (32 / 4) ** 2 = 64 patches in total num_classes=100, transformer=efficient_transformer, channels=3 ).to(device) After experimenting with several optimizers, we chose SGD with momentum for this task: Adam and RMSprop did not perform as well. We also added a basic learning-rate scheduler to prevent overshooting, because in our earlier experiments validation accuracy seemed to plateau around 30%. 
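StepLR multiplies the learning rate by gamma once every step_size calls to scheduler.step(), so when it is stepped once per epoch the rate in effect during epoch e is lr × gamma^(e // step_size). The plain-Python sketch below evaluates that closed form; step_lr is a hypothetical helper written for illustration, not part of PyTorch, and it assumes the lr=5e-3, step_size=10, gamma=0.8 values configured in this notebook's optimizer cell.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.8):
    # Learning rate in effect during a 0-indexed epoch, assuming one
    # scheduler.step() call at the end of every epoch
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 9, 10, 50, 199):
    print(epoch, step_lr(5e-3, epoch))
```

With these settings the rate stays at 5e-3 for the first 10 epochs, drops to 4e-3 at epoch 10, and reaches roughly 7.2e-5 by epoch 199.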
The scheduler multiplies the learning rate by 0.8 every 10 epochs.\n# loss function criterion = nn.CrossEntropyLoss() # optimizer lr = 5e-3 optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.99) # weight_decay=0.01 # optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=5e-5) # weight_decay=0.01 scheduler = StepLR(optimizer, step_size=10, gamma=0.8) records = [] # A variable to record data for each epoch so we can save the data in a csv later # model = torch.load(\u0026#39;./argumentModel\u0026#39;, map_location=device) # Waking from suspend sometimes causes the NVIDIA driver to fail; this command removes and re-adds the nvidia_uvm module to fix it # !sudo modprobe -r nvidia_uvm \u0026amp;\u0026amp; sudo modprobe nvidia_uvm We use mixed-precision training (torch.autocast with a GradScaler) to speed up training.\nscaler = torch.cuda.amp.GradScaler(enabled=True) for epoch in range(200): model.train() # re-enable dropout after the previous epoch\u0026rsquo;s validation pass epoch_loss = 0 epoch_accuracy = 0 for data, label in tqdm(train_loader): data = data.to(device) label = label.to(device) with torch.autocast(device_type=\u0026#39;cuda\u0026#39;, dtype=torch.float16): output = model(data) assert output.dtype is torch.float16 loss = criterion(output, label) assert loss.dtype is torch.float32 scaler.scale(loss).backward() # loss.backward() scaler.step(optimizer) # optimizer.step() scaler.update() optimizer.zero_grad() acc = (output.argmax(dim=1) == label).float().mean() epoch_accuracy += acc / len(train_loader) epoch_loss += loss.detach() / len(train_loader) # detach so the running sum does not keep autograd graphs alive model.eval() # disable dropout for validation with torch.no_grad(): epoch_val_accuracy = 0 epoch_val_loss = 0 for data, label in valid_loader: data = data.to(device) label = label.to(device) val_output = model(data) val_loss = criterion(val_output, label) acc = (val_output.argmax(dim=1) == label).float().mean() epoch_val_accuracy += acc / 
len(valid_loader) epoch_val_loss += val_loss / len(valid_loader) records.append([int(epoch+1), epoch_loss.detach().cpu().numpy(), epoch_accuracy.detach().cpu().numpy(), epoch_val_loss.detach().cpu().numpy(), epoch_val_accuracy.detach().cpu().numpy()]) print( f\u0026#34;Epoch : {epoch+1} - loss : {epoch_loss:.4f} - acc: {epoch_accuracy:.4f} - val_loss : {epoch_val_loss:.4f} - val_acc: {epoch_val_accuracy:.4f}\\n\u0026#34; ) pytorch_total_params = sum(p.numel() for p in model.parameters()) \u0026#34;Total Model parameters: \u0026#34; + str(pytorch_total_params) 'Total Model parameters: 3245508' We save the model, and write the training data for each epoch to a CSV file.\nimport os # Remove epoch use index instead def saveModel(modelName): torch.save(model, \u0026#39;./\u0026#39; + modelName) df = pd.DataFrame(np.array(records), columns=[\u0026#39;epoch\u0026#39;, \u0026#39;epoch_loss\u0026#39;, \u0026#39;epoch_accuracy\u0026#39;, \u0026#39;epoch_val_loss\u0026#39;, \u0026#39;epoch_val_accuracy\u0026#39;]) csv_path = \u0026#39;./\u0026#39; + modelName + \u0026#39;.csv\u0026#39; # append to the CSV, writing the header only when the file does not exist yet df.to_csv(csv_path, mode=\u0026#39;a\u0026#39;, index=False, header=not os.path.exists(csv_path)) return df saveModel(\u0026#39;argumentModel3m\u0026#39;) epoch epoch_loss epoch_accuracy epoch_val_loss epoch_val_accuracy 0 1.0 4.733254 0.011282 4.483023 0.020965 1 2.0 4.468264 0.025909 4.345340 0.039953 2 3.0 4.409954 0.033072 4.277212 0.046183 3 4.0 4.363261 0.037640 4.194314 0.049150 4 5.0 4.269330 0.052741 4.093012 0.071104 ... ... ... ... ... ... 
367 168.0 0.801882 0.782972 3.416599 0.398141 368 169.0 0.807127 0.779703 3.434716 0.401800 369 170.0 0.790040 0.789462 3.495779 0.403085 370 171.0 0.802609 0.784545 3.469838 0.404371 371 172.0 0.815568 0.778655 3.411168 0.407437 372 rows × 5 columns\ndef getModelCSV(modelName): return pd.read_csv(\u0026#39;./\u0026#39; + modelName + \u0026#39;.csv\u0026#39;) import hvplot.pandas def getModelPlot(modelName): return getModelCSV(modelName).hvplot(title=f\u0026#39;{modelName}\u0026#39;, xlabel=\u0026#39;epoch\u0026#39;, ylabel=\u0026#39;loss / accuracy\u0026#39;, use_index=True, y=[\u0026#39;epoch_loss\u0026#39;, \u0026#39;epoch_accuracy\u0026#39;, \u0026#39;epoch_val_loss\u0026#39;, \u0026#39;epoch_val_accuracy\u0026#39;], kind=\u0026#39;line\u0026#39;) getModelPlot(\u0026#39;argumentModel3m\u0026#39;) Output Samples random_idx = np.random.randint(0, len(valid_list_transformed), size=9) fig, axes = plt.subplots(3, 3, figsize=(16, 12)) model.eval() for idx, ax in enumerate(axes.ravel()): randIndex = random_idx[idx] # input tensor must be [batch size, channels, h, w] with torch.no_grad(): predictLabel = model(valid_list_transformed[randIndex][0].unsqueeze(0).to(device)).argmax(dim=1).item() trueLabel = valid_list_transformed[randIndex][1] ax.set_title(\u0026#39;Prediction: \u0026#39; + textLabel[predictLabel] + \u0026#39;\\n\u0026#39; + \u0026#39;True Label: \u0026#39; + textLabel[trueLabel]) ax.imshow(transforms.ToPILImage()(valid_list_transformed[randIndex][0])) Future Work The large patch size may be why the model fails on complex shapes, but the model still successfully captures common patterns in simple objects, as shown above. 
We could improve predictions on complex images by implementing a Compact Convolutional Transformer, or by using the approach from Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition, which dynamically switches to finer patches for harder images.\n","date":"2024-12-22T14:33:00-08:00","image":"https://7490c39e.personal-blog-6c9.pages.dev/p/vit-example-on-cifar/cifarvit_copy_files/cifarvit_copy_10_0_hu12580334287679304493.png","permalink":"https://7490c39e.personal-blog-6c9.pages.dev/p/vit-example-on-cifar/","title":"Vit Example on CIFAR"}]