| Year | Paper | Details |
|---|---|---|
| Vision | | |
| 2014 | VAE (Kingma and Welling) | [✓] Training on MNIST<br>[✓] Visualizing Encoder output<br>[✓] Visualizing Decoder output<br>[✓] Reconstructing image |
| 2015 | CAM (Zhou et al.) | [✓] Applying GoogLeNet<br>[✓] Generating 'Class Activation Map'<br>[✓] Generating bounding box |
| 2016 | Gatys et al. | [✓] Experimenting on input image size<br>[✓] Experimenting on VGGNet-19 with Batch normalization<br>[✓] Applying VGGNet-19 |
| | YOLO (Redmon et al.) | [✓] Model architecture<br>[✓] Visualizing ground truth on grid<br>[✓] Visualizing model output<br>[✓] Visualizing class probability map<br>[ㅤ] Loss function<br>[ㅤ] Training on VOC 2012 |
| | DCGAN (Radford et al.) | [✓] Training on CelebA at 64 × 64<br>[✓] Sampling<br>[✓] Interpolating in latent space<br>[ㅤ] Training on CelebA at 32 × 32 |
| | Noroozi et al. | [✓] Model architecture<br>[✓] Chromatic aberration<br>[✓] Permutation set |
| | Zhang et al. | [✓] Visualizing empirical probability distribution<br>[ㅤ] Model architecture<br>[ㅤ] Loss function<br>[ㅤ] Training |
| 2014, 2017 | Conditional GAN (Mirza et al.), WGAN-GP (Gulrajani et al.) | [✓] Training on MNIST |
| 2017, 2016 | VQ-VAE (Oord et al.), PixelCNN (Oord et al.) | [✓] Training on Fashion MNIST<br>[✓] Training on CIFAR-10<br>[✓] Sampling |
| 2017 | Pix2Pix (Isola et al.) | [✓] Experimenting on image mean and std<br>[✓] Experimenting on `nn.InstanceNorm2d()`<br>[✓] Training on Google Maps<br>[✓] Training on Facades<br>[ㅤ] Higher-resolution input image |
| | CycleGAN (Zhu et al.) | [✓] Experimenting on random image pairing<br>[✓] Experimenting on LSGANs<br>[✓] Training on monet2photo<br>[✓] Training on vangogh2photo<br>[✓] Training on cezanne2photo<br>[✓] Training on ukiyoe2photo<br>[✓] Training on horse2zebra<br>[✓] Training on summer2winter_yosemite |
| 2018 | PGGAN (Karras et al.) | [✓] Experimenting on image mean and std<br>[✓] Training on CelebA-HQ at 512 × 512<br>[✓] Sampling |
| | DeepLabv3 (Chen et al.) | [✓] Training on VOC 2012<br>[✓] Predicting on VOC 2012 validation set<br>[✓] Average mIoU<br>[✓] Visualizing model output |
| | RotNet (Gidaris et al.) | [✓] Visualizing Attention map |
| | StarGAN (Yunjey Choi et al.) | [✓] Model architecture |
| 2020 | STEFANN (Roy et al.) | [✓] FANnet architecture<br>[✓] Colornet architecture<br>[✓] Training FANnet on Google Fonts<br>[✓] Custom Google Fonts dataset<br>[✓] Average SSIM<br>[ㅤ] Training Colornet |
| | DDPM (Ho et al.) | [✓] Training on CelebA at 32 × 32<br>[✓] Training on CelebA at 64 × 64<br>[✓] Visualizing denoising process<br>[✓] Sampling using linear interpolation<br>[✓] Sampling using coarse-to-fine interpolation |
| | DDIM (Song et al.) | [✓] Normal sampling<br>[✓] Sampling using spherical linear interpolation<br>[✓] Sampling using grid interpolation<br>[✓] Truncated normal |
| | ViT (Dosovitskiy et al.) | [✓] Training on CIFAR-10<br>[✓] Training on CIFAR-100<br>[✓] Visualizing Attention map using Attention Roll-out<br>[✓] Visualizing position embedding similarity<br>[✓] Interpolating position embedding<br>[✓] CutOut<br>[✓] CutMix<br>[✓] Hide-and-Seek |
| | SimCLR (Chen et al.) | [✓] Normalized temperature-scaled cross entropy loss<br>[✓] Data augmentation<br>[✓] Pixel intensity histogram |
| | DETR (Carion et al.) | [✓] Model architecture<br>[ㅤ] Bipartite matching & loss<br>[ㅤ] Batch normalization freezing<br>[ㅤ] Training on COCO 2017 |
| 2021 | Improved DDPM (Nichol and Dhariwal) | [✓] Cosine diffusion schedule |
| | Classifier-Guidance (Dhariwal and Nichol) | [✓] Training on CIFAR-10<br>[ㅤ] AdaGN<br>[ㅤ] BigGAN Upsample/Downsample<br>[ㅤ] Improved DDPM sampling<br>[ㅤ] Conditional/Unconditional models<br>[ㅤ] Super-resolution model<br>[ㅤ] Interpolation |
| | ILVR (Choi et al.) | [✓] Sampling using single reference<br>[✓] Sampling using various downsampling factors<br>[✓] Sampling using various conditioning ranges |
| | SDEdit (Meng et al.) | [✓] User input stroke simulation<br>[✓] Applying CelebA at 64 × 64<br>[ㅤ] Total repeats<br>[ㅤ] VE SDEdit<br>[ㅤ] Sampling from scribble<br>[ㅤ] Image editing only on masked regions |
| | MAE (He et al.) | [✓] Model architecture for self-supervised pre-training<br>[✓] Model architecture for classification<br>[ㅤ] Self-supervised pre-training on ImageNet-1K<br>[ㅤ] Fine-tuning on ImageNet-1K<br>[ㅤ] Linear probing |
| | Copy-Paste (Ghiasi et al.) | [✓] COCO dataset processing<br>[✓] Large-scale jittering<br>[✓] Copy-Paste (within mini-batch)<br>[✓] Visualizing data<br>[ㅤ] Gaussian filter |
| | ViViT (Arnab et al.) | [✓] 'Spatio-temporal attention' architecture<br>[✓] 'Factorised encoder' architecture<br>[✓] 'Factorised self-attention' architecture |
| 2022 | CFG (Ho et al.) | |
| Language | | |
| 2017 | Transformer (Vaswani et al.) | [✓] Model architecture<br>[✓] Visualizing position encoding |
| 2019 | BERT (Devlin et al.) | [✓] Model architecture<br>[✓] Masked language modeling<br>[✓] BookCorpus data processing<br>[✓] SQuAD data processing<br>[✓] SWAG data processing |
| | Sentence-BERT (Reimers et al.) | [✓] Classification loss<br>[✓] Regression loss<br>[✓] Contrastive loss<br>[✓] STSb data processing<br>[✓] WikiSection data processing<br>[ㅤ] NLI data processing |
| | RoBERTa (Liu et al.) | [✓] BookCorpus data processing<br>[✓] Masked language modeling<br>[ㅤ] BookCorpus data processing ('SEGMENT-PAIR' + NSP)<br>[ㅤ] BookCorpus data processing ('SENTENCE-PAIR' + NSP)<br>[✓] BookCorpus data processing ('FULL-SENTENCES')<br>[ㅤ] BookCorpus data processing ('DOC-SENTENCES') |
| 2021 | Swin Transformer (Liu et al.) | [✓] Patch partition<br>[✓] Patch merging<br>[✓] Relative position bias<br>[✓] Feature map padding<br>[✓] Self-attention in non-overlapping windows<br>[ㅤ] Shifted window-based self-attention |
| 2024 | RoPE (Su et al.) | [✓] Rotary Positional Embedding |
| Vision-Language | | |
| 2021 | CLIP (Radford et al.) | [✓] Training on Flickr8k + Flickr30k<br>[✓] Zero-shot classification on ImageNet1k (mini)<br>[✓] Linear classification on ImageNet1k (mini) |
- Seoul, Republic of Korea
Pinned repositories:
- train_easyocr: Fine-tuning 'EasyOCR' on the '공공행정문서 OCR' (Public Administrative Document OCR) dataset provided by 'AI-Hub'.