{
“nbformat”: 4,
“nbformat_minor”: 0,
“metadata”: {
“colab”: {
“private_outputs”: true,
“provenance”: [],
“collapsed_sections”: [
“1tthw0YaispD”
],
“machine_shape”: “hm”,
“include_colab_link”: true
},
“kernelspec”: {
“name”: “python3”,
“display_name”: “Python 3”
},
“language_info”: {
“name”: “python”
},
“accelerator”: “GPU”
},
“cells”: [
{
“cell_type”: “markdown”,
“metadata”: {
“id”: “view-in-github”,
“colab_type”: “text”
},
“source”: [
"<a href="https://colab.research.google.com/github/isaacandy/DVCR-i-AI-Artist/blob/main/create_realistic_ai_generated_images_with_dvc_ri.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>"
]
},
{
“cell_type”: “markdown”,
“metadata”: {
“id”: “clJsMT0Eqizk”
},
“source”: [
"# Create Text-to-Image AI-Generated Images with VQGAN + CLIP\n",
“\n”,
“by Isaac Andy. \n”,
“\n”,
“This notebook allows you to create realistic AI generated images with as few clicks as possible for free! No coding or machine learning knowledge required!\n”,
“\n”,
“This notebook is forked with significant usability and technical optimizations from the original Colab notebook by @ak92501 which includes an implementation of VQGAN + CLIP w/ Pooling. The Notebook was originally made by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings). The original BigGAN+CLIP method was by https://twitter.com/advadnoun. Added some explanations and modifications by Eleiber#8347, pooling trick by Crimeacs#8222 (https://twitter.com/EarthML1). For more elaborate customization, see the original notebook or Zoetrope 5 by @classpectanon.\n”,
“\n”,
“To get started:\n”,
“\n”,
“1. Copy this notebook to your Google Drive to keep it and save your changes. (File -> Save a Copy in Drive)\n”,
“2. Run the cells below by clicking the Play button on the left of the cell (also visible when mousing-over the cell)\n”,
“\n”,
"_Last Updated: Aug 22nd, 2021_\n"
]
},
{
“cell_type”: “markdown”,
“metadata”: {
“id”: “CppIQlPhhwhs”
},
“source”: [
“## Setup”
]
},
{
“cell_type”: “code”,
“metadata”: {
“id”: “TkUfzT60ZZ9q”,
“cellView”: “form”
},
“source”: [
“#@title Check GPU\n”,
"#@markdown Run this cell to see which GPU the Colab notebook is running on. Ideally, it's not a K80, which is the slowest one.\n",
“\n”,
“!nvidia-smi”
],
“execution_count”: null,
“outputs”: []
},
{
“cell_type”: “code”,
“metadata”: {
“id”: “VA1PHoJrRiK9”,
“cellView”: “form”
},
“source”: [
“#@title Download Models and Install/Load Packages (may take a few minutes)\n”,
“\n”,
“!git clone https://github.com/openai/CLIP\n”,
“!git clone https://github.com/CompVis/taming-transformers.git\n”,
“!git clone https://github.com/minimaxir/icon-image.git\n”,
“!pip install Pillow numpy fire icon_font_to_png\n”,
“!pip install ftfy regex tqdm omegaconf pytorch-lightning\n”,
“!pip install kornia\n”,
“!pip install imageio-ffmpeg \n”,
“!pip install einops\n”,
"!pip install imageio\n",
“!mkdir steps\n”,
“\n”,
“print("Downloading ImageNet 16384")\n”,
“\n”,
"!curl -L -o vqgan_imagenet_f16_16384.ckpt -C - 'https://heibox.uni-heidelberg.de/f/867b05fc8c4841768640/?dl=1'\n",
"!curl -L -o vqgan_imagenet_f16_16384.yaml -C - 'https://heibox.uni-heidelberg.de/f/274fb24ed38341bfa753/?dl=1'\n",
“\n”,
“import argparse\n”,
“import math\n”,
“from pathlib import Path\n”,
“import sys\n”,
“\n”,
"sys.path.insert(1, '/content/taming-transformers')\n",
"sys.path.insert(1, '/content/icon-image')\n",
“\n”,
“from icon_image import gen_icon\n”,
“from IPython import display\n”,
“from base64 import b64encode\n”,
“from omegaconf import OmegaConf\n”,
“from PIL import Image\n”,
“from PIL.PngImagePlugin import PngInfo\n”,
“from taming.models import cond_transformer, vqgan\n”,
“import taming.modules \n”,
“import torch\n”,
“from torch import nn, optim\n”,
“from torch.nn import functional as F\n”,
“from torchvision import transforms\n”,
“from torchvision.transforms import functional as TF\n”,
“from torch.optim.lr_scheduler import StepLR\n”,
“from tqdm.notebook import tqdm\n”,
“from shutil import move\n”,
“import os\n”,
“\n”,
“from CLIP import clip\n”,
“import kornia.augmentation as K\n”,
“import numpy as np\n”,
“import imageio\n”,
“from PIL import ImageFile, Image\n”,
“ImageFile.LOAD_TRUNCATED_IMAGES = True\n”,
“\n”,
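"# Windowed-sinc (Lanczos) helpers: sinc() is the normalized sinc function and lanczos()\n",
"# builds a normalized Lanczos kernel, used by resample() below for anti-aliased downscaling.\n",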
“def sinc(x):\n”,
“ return torch.where(x != 0, torch.sin(math.pi * x) / (math.pi * x), x.new_ones([]))\n”,
“\n”,
“\n”,
“def lanczos(x, a):\n”,
“ cond = torch.logical_and(-a < x, x < a)\n”,
“ out = torch.where(cond, sinc(x) * sinc(x/a), x.new_zeros([]))\n”,
“ return out / out.sum()\n”,
“\n”,
“\n”,
“def ramp(ratio, width):\n”,
“ n = math.ceil(width / ratio + 1)\n”,
“ out = torch.empty([n])\n”,
“ cur = 0\n”,
“ for i in range(out.shape[0]):\n”,
“ out[i] = cur\n”,
“ cur += ratio\n”,
“ return torch.cat([-out[1:].flip([0]), out])[1:-1]\n”,
“\n”,
“\n”,
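"# Resample an NCHW tensor to `size`: when downscaling, first blur with a separable\n",
"# Lanczos kernel (height, then width) to reduce aliasing, then interpolate bicubically.\n",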
“def resample(input, size, align_corners=True):\n”,
“ n, c, h, w = input.shape\n”,
“ dh, dw = size\n”,
“\n”,
“ input = input.view([n * c, 1, h, w])\n”,
“\n”,
“ if dh < h:\n”,
“ kernel_h = lanczos(ramp(dh / h, 2), 2).to(input.device, input.dtype)\n”,
“ pad_h = (kernel_h.shape[0] - 1) // 2\n”,
" input = F.pad(input, (0, 0, pad_h, pad_h), 'reflect')\n",
“ input = F.conv2d(input, kernel_h[None, None, :, None])\n”,
“\n”,
“ if dw < w:\n”,
“ kernel_w = lanczos(ramp(dw / w, 2), 2).to(input.device, input.dtype)\n”,
“ pad_w = (kernel_w.shape[0] - 1) // 2\n”,
" input = F.pad(input, (pad_w, pad_w, 0, 0), 'reflect')\n",
“ input = F.conv2d(input, kernel_w[None, None, None, :])\n”,
“\n”,
“ input = input.view([n, c, h, w])\n”,
" return F.interpolate(input, size, mode='bicubic', align_corners=align_corners)\n",
“\n”,
“\n”,
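"# Straight-through gradient trick: forward() returns x_forward, but gradients are\n",
"# routed to x_backward (summed to its shape) during the backward pass.\n",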
“class ReplaceGrad(torch.autograd.Function):\n”,
“ @staticmethod\n”,
“ def forward(ctx, x_forward, x_backward):\n”,
“ ctx.shape = x_backward.shape\n”,
“ return x_forward\n”,
“\n”,
“ @staticmethod\n”,
“ def backward(ctx, grad_in):\n”,
“ return None, grad_in.sum_to_size(ctx.shape)\n”,
“\n”,
“\n”,
“replace_grad = ReplaceGrad.apply\n”,
“\n”,
“\n”,
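"# Clamp values to [min, max] in the forward pass; in the backward pass, keep only the\n",
"# gradient components that would move out-of-range values back toward the allowed range.\n",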
“class ClampWithGrad(torch.autograd.Function):\n”,
“ @staticmethod\n”,
“ def forward(ctx, input, min, max):\n”,
“ ctx.min = min\n”,
“ ctx.max = max\n”,
“ ctx.save_for_backward(input)\n”,
“ return input.clamp(min, max)\n”,
“\n”,
“ @staticmethod\n”,
“ def backward(ctx, grad_in):\n”,
“ input, = ctx.saved_tensors\n”,
“ return grad_in * (grad_in * (input - input.clamp(ctx.min, ctx.max)) >= 0), None, None\n”,
“\n”,
“\n”,
“clamp_with_grad = ClampWithGrad.apply\n”,
“\n”,
“\n”,
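"# Snap each latent vector to its nearest codebook entry (squared Euclidean distance),\n",
"# while letting gradients flow straight through to x via replace_grad.\n",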
“def vector_quantize(x, codebook):\n”,
“ d = x.pow(2).sum(dim=-1, keepdim=True) + codebook.pow(2).sum(dim=1) - 2 * x @ codebook.T\n”,
“ indices = d.argmin(-1)\n”,
“ x_q = F.one_hot(indices, codebook.shape[0]).to(d.dtype) @ codebook\n”,
“ return replace_grad(x_q, x)\n”,
“\n”,
“\n”,
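"# A text or image prompt encoded with CLIP: forward() measures how far the input\n",
"# embeddings are from the prompt embedding on the unit sphere, scaled by `weight`\n",
"# (negative weights push the image away from the prompt).\n",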
“class Prompt(nn.Module):\n”,
" def __init__(self, embed, weight=1., stop=float('-inf')):\n",
" super().__init__()\n",
" self.register_buffer('embed', embed)\n",
" self.register_buffer('weight', torch.as_tensor(weight))\n",
" self.register_buffer('stop', torch.as_tensor(stop))\n",
“\n”,
“ def forward(self, input):\n”,
“ input_normed = F.normalize(input.unsqueeze(1), dim=2)\n”,
“ embed_normed = F.normalize(self.embed.unsqueeze(0), dim=2)\n”,
“ dists = input_normed.sub(embed_normed).norm(dim=2).div(2).arcsin().pow(2).mul(2)\n”,
“ dists = dists * self.weight.sign()\n”,
“ return self.weight.abs() * replace_grad(dists, torch.maximum(dists, self.stop)).mean()\n”,
“\n”,
“\n”,
“def parse_prompt(prompt):\n”,
" vals = prompt.rsplit(':', 2)\n",
" vals = vals + ['', '1', '-inf'][len(vals):]\n",
“ return vals[0], float(vals[1]), float(vals[2])\n”,
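"# e.g. parse_prompt('apple') -> ('apple', 1.0, -inf)\n",
"# parse_prompt('apple:3') -> ('apple', 3.0, -inf)\n",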
“\n”,
“\n”,
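"# Produce `cutn` cutouts of the image for CLIP: each is the average of adaptive average\n",
"# and max pooling down to cut_size, followed by random affine/perspective/color-jitter/\n",
"# erasing augmentations plus a small amount of noise.\n",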
“class MakeCutouts(nn.Module):\n”,
" def __init__(self, cut_size, cutn, cut_pow=1.):\n",
" super().__init__()\n",
“ self.cut_size = cut_size\n”,
“ self.cutn = cutn\n”,
“ self.cut_pow = cut_pow\n”,
“\n”,
“ self.augs = nn.Sequential(\n”,
“ # K.RandomHorizontalFlip(p=0.5),\n”,
“ # K.RandomVerticalFlip(p=0.5),\n”,
“ # K.RandomSolarize(0.01, 0.01, p=0.7),\n”,
“ # K.RandomSharpness(0.3,p=0.4),\n”,
" # K.RandomResizedCrop(size=(self.cut_size,self.cut_size), scale=(0.1,1), ratio=(0.75,1.333), cropping_mode='resample', p=0.5),\n",
" # K.RandomCrop(size=(self.cut_size,self.cut_size), p=0.5),\n",
" K.RandomAffine(degrees=15, translate=0.1, p=0.7, padding_mode='border'),\n",
“ K.RandomPerspective(0.7,p=0.7),\n”,
“ K.ColorJitter(hue=0.1, saturation=0.1, p=0.7),\n”,
“ K.RandomErasing((.1, .4), (.3, 1/.3), same_on_batch=True, p=0.7),\n”,
“ \n”,
“)\n”,
“ self.noise_fac = 0.1\n”,
“ self.av_pool = nn.AdaptiveAvgPool2d((self.cut_size, self.cut_size))\n”,
“ self.max_pool = nn.AdaptiveMaxPool2d((self.cut_size, self.cut_size))\n”,
“\n”,
“ def forward(self, input):\n”,
“ sideY, sideX = input.shape[2:4]\n”,
“ max_size = min(sideX, sideY)\n”,
“ min_size = min(sideX, sideY, self.cut_size)\n”,
“ cutouts = []\n”,
“ \n”,
“ for _ in range(self.cutn):\n”,
“\n”,
" # size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)\n",
“ # offsetx = torch.randint(0, sideX - size + 1, ())\n”,
“ # offsety = torch.randint(0, sideY - size + 1, ())\n”,
“ # cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n”,
“ # cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))\n”,
“\n”,
“ # cutout = transforms.Resize(size=(self.cut_size, self.cut_size))(input)\n”,
“ \n”,
“ cutout = (self.av_pool(input) + self.max_pool(input))/2\n”,
“ cutouts.append(cutout)\n”,
“ batch = self.augs(torch.cat(cutouts, dim=0))\n”,
“ if self.noise_fac:\n”,
“ facs = batch.new_empty([self.cutn, 1, 1, 1]).uniform_(0, self.noise_fac)\n”,
“ batch = batch + facs * torch.randn_like(batch)\n”,
“ return batch\n”,
“\n”,
“\n”,
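"# Load a taming-transformers VQGAN (or Gumbel/Net2Net variant) from its OmegaConf\n",
"# config and checkpoint, freeze its weights, and drop the training loss module.\n",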
“def load_vqgan_model(config_path, checkpoint_path):\n”,
“ config = OmegaConf.load(config_path)\n”,
" if config.model.target == 'taming.models.vqgan.VQModel':\n",
" model = vqgan.VQModel(**config.model.params)\n",
" model.eval().requires_grad_(False)\n",
" model.init_from_ckpt(checkpoint_path)\n",
" elif config.model.target == 'taming.models.vqgan.GumbelVQ':\n",
" model = vqgan.GumbelVQ(**config.model.params)\n",
" model.eval().requires_grad_(False)\n",
" model.init_from_ckpt(checkpoint_path)\n",
" elif config.model.target == 'taming.models.cond_transformer.Net2NetTransformer':\n",
" parent_model = cond_transformer.Net2NetTransformer(**config.model.params)\n",
" parent_model.eval().requires_grad_(False)\n",
" parent_model.init_from_ckpt(checkpoint_path)\n",
" model = parent_model.first_stage_model\n",
" else:\n",
" raise ValueError(f'unknown model type: {config.model.target}')\n",
“ del model.loss\n”,
“ return model\n”,
“\n”,
“\n”,
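"# Resize an image to roughly out_size's pixel area while preserving its aspect ratio,\n",
"# using Lanczos resampling.\n",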
“def resize_image(image, out_size):\n”,
“ ratio = image.size[0] / image.size[1]\n”,
“ area = min(image.size[0] * image.size[1], out_size[0] * out_size[1])\n”,
" size = round((area * ratio)**0.5), round((area / ratio)**0.5)\n",
“ return image.resize(size, Image.LANCZOS)\n”,
“\n”
],
“execution_count”: null,
“outputs”: []
},
{
“cell_type”: “markdown”,
“metadata”: {
“id”: “1tthw0YaispD”
},
“source”: [
“## Icon Background (Optional)\n”,
“\n”,
"A surprisingly effective trick to improve the generation quality of images, if you have a specific outcome in mind, is to generate an icon to serve as an initial image to start generation and/or as an image to target during generation. You can select any of the free Font Awesome icons. Just click on an icon you want to get its `icon_name`, such as `fas fa-robot`, then use it in the next cell to generate an icon image that helps steer the AI image generation.\n",
“\n”,
"See the icon-image GitHub repository (https://github.com/minimaxir/icon-image) for more information on configuration.\n",
“\n”
]
},
{
“cell_type”: “code”,
“metadata”: {
“id”: “qxrUUDzpshPn”,
“cellView”: “form”
},
“source”: [
“icon_name = "fas fa-tv" #@param {type:"string"}\n”,
“bg_width = 600 #@param {type:"integer"}\n”,
“bg_height = 600 #@param {type:"integer"}\n”,
“icon_size = 500 #@param {type:"integer"}\n”,
“icon_color = "black" #@param {type:"string"}\n”,
“bg_color = "white" #@param {type:"string"}\n”,
“icon_opacity = 0.8 #@param {type:"slider", min:0, max:1, step:0.1}\n”,
“bg_noise_opacity = 0.5 #@param {type:"slider", min:0, max:1, step:0.1}\n”,
“align = "center" #@param ["center", "left", "right", "top", "bottom"]\n”,
“\n”,
“icon_config = {\n”,
“ "icon_name": icon_name,\n”,
“ "bg_width": bg_width,\n”,
“ "bg_height": bg_height,\n”,
“ "icon_size": icon_size,\n”,
“ "icon_color": icon_color,\n”,
“ "bg_color": bg_color,\n”,
“ "icon_opacity": icon_opacity,\n”,
“ "bg_noise_opacity": bg_noise_opacity,\n”,
“ "align": align,\n”,
“ "seed": 42\n”,
“}\n”,
“\n”,
“try:\n”,
" for filename in ['fa-brands-400.ttf', 'fa-regular-400.ttf', 'fa-solid-900.ttf', 'fontawesome.min.css']:\n",
" move(os.path.join("/content", 'icon-image', filename), os.path.join("/content", filename))\n",
“except FileNotFoundError:\n”,
“ pass\n”,
“\n”,
"gen_icon(**icon_config)\n",
"display.display(display.Image('icon.png'))"
],
“execution_count”: null,
“outputs”: []
},
{
“cell_type”: “markdown”,
“metadata”: {
“id”: “p0qN8T1EzPn7”
},
“source”: [
“## AI Image Generation Settings\n”,
“\n”,
“The following cell allows you to set the training parameters for image generation:\n”,
“\n”,
“### Generation Settings\n”,
“\n”,
“- texts: The text prompt(s) you want the AI to generate an image from.\n”,
“ - You can include multiple prompts by separating them with a |, and the AI will attempt to optimize for all prompts simultaneously, e.g. apple | painting of a calm sunset\n”,
“ - You can apply a weight to each prompt by appending a :{weight} to each prompt, and the AI will attempt to favor prompts with a higher weight proportionally more, e.g. apple:3 | painting of a calm sunset\n”,
" - You can apply a negative weight to get the *opposite* of what the text is, which can result in chaos (in the case of a portrait of Elon Musk:3 | 3d rendering in unreal engine:-1, what is the opposite of a 3d rendering? Only one way to find out!).\n",
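" - For example, a watercolor of a lighthouse:2 | photorealistic:1 | text:-0.5 weights the watercolor prompt twice as heavily as the photorealistic prompt and steers the image away from rendering literal text.\n",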
“\n”,
“- width, height: Width and height of the image in pixels. Smaller images generate faster but are less detailed.\n”,
" - Going much above the default 600x600px size may cause the GPU to run out of memory.\n",
“ - For 4:3 images, I recommend 640x480; for 16:9 images, I recommend 640x360.\n”,
“\n”,
“- init_image: The initial image filename for starting the generation and finetuning. You can upload an image by opening the Colab Notebook sidebar, clicking the Folder icon, and uploading an image to the top level.\n”,
“ - If not specified, generation will start with a solid color.\n”,
“ - The image will be resized to the specified width/height.\n”,
“ - init_image_icon will use the icon specified in the previous cell as the init_image.\n”,
“\n”,
“- target_images: The target image filename(s) for the generation to target. \n”,
“ - You can use multiple images as noted in the texts section. It’s strongly recommended to tweak weights of both text prompts and image prompts if doing so.\n”,
“ - target_image_icon will use the icon specified in the previous cell as the target_image.\n”,
“\n”,
“### Training Settings\n”,
“\n”,
“- learning_rate: Learning rate for the model which controls the speed in which the model optimizes for the prompts. If too high, model can diverge; if too low, model may not train.\n”,
" - ~0.2 is recommended if training without an init_image; ~0.1 is recommended if using one.\n",
“\n”,
“- max_steps: Number of steps for training the model; the more steps, the better the generation.\n”,
“\n”,
“- images_interval: Number of steps for the training to check in and output an image of what is trained so far.\n”
]
},
{
“cell_type”: “code”,
“metadata”: {
“id”: “Pf8a78a2WKoU”,
“cellView”: “form”
},
“source”: [
“# Fixed parameters\n”,
“icon_path = "icon.png"\n”,
“model_name = "vqgan_imagenet_f16_16384"\n”,
“seed = 42\n”,
“\n”,
“texts = "iZND powered by GPT-3 technology logo" #@param {type:"string"}\n”,
“width = 600 #@param {type:"integer"}\n”,
“height = 600 #@param {type:"integer"}\n”,
“init_image = "" #@param {type:"string"}\n”,
“init_image_icon = False #@param {type:"boolean"}\n”,
“if init_image_icon:\n”,
“ assert os.path.exists(icon_path), "No icon has been generated from the previous cell"\n”,
“ init_image = icon_path\n”,
“\n”,
“target_images = "" #@param {type:"string"}\n”,
“target_image_icon = False #@param {type:"boolean"}\n”,
“if target_image_icon:\n”,
“ assert os.path.exists(icon_path), "No icon has been generated from the previous cell"\n”,
“ target_images = icon_path\n”,
“\n”,
"#@markdown ---\n",
“learning_rate = 0.24 #@param {type:"slider", min:0.00, max:0.30, step:0.01}\n”,
"max_steps = 400 #@param {type:"integer"}\n",
"images_interval = 100 #@param {type:"integer"}\n",
“\n”,
“gen_config = {\n”,
“ "texts": texts,\n”,
“ "width": width,\n”,
“ "height": height,\n”,
“ "init_image": "