
MATHs in Deforum v05

This guide is an introduction to some of the parameters within Deforum that can be
controlled and altered using math expressions and functions. It aims to assist intermediate
and advanced users of animation scheduling by providing examples and descriptions of
use cases. MATH functions are not required to use Deforum’s parameters; they serve as a
dynamic tool for finer manipulation of values during an animation.

The specific tool documentation that has been added to Deforum V05 can be found here:
NumExpr 2.0 User Guide — numexpr 2.6.3.dev0 documentation

1.)
Parameters that can be altered using MATH:
In Deforum, any parameter that accepts a string format of instructions (type = `string`) can
be altered using a math expression, a schedule, or a combination of both. These
parameters are typically denoted as 0:(0), where the number before the colon is the frame,
and the number in parentheses is the value to be enforced at that frame. In the
example of 0:(0), the render will reference frame 0 and assign 0.0 as its value indefinitely
unless instructed otherwise.
Parameters that are controlled by strings are as follows: angle, zoom,
translation_x/y/z, rotation_3d_x/y/z, perspective flip (theta, phi, gamma, fv), noise_schedule,
strength_schedule, and contrast_schedule.
Scheduled values will “tween” linearly between two instructional elements in a
string. In the example of 0:(-2), 100:(4), the render will start at frame 0 with a value of -2
and increase that value over time, reaching 4 by frame 100. At frame 50 of that render, we
would observe a value of 1.0 being enforced, since the midpoint between frame 0 and
frame 100 falls on the line drawn between the two values.
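
As a rough illustration, this linear tweening can be modeled in Python (a minimal sketch, not Deforum's actual parser):

```python
def linear_tween(frame, key_a=(0, -2.0), key_b=(100, 4.0)):
    # Linearly interpolate a scheduled value between two keyframes,
    # e.g. the schedule "0:(-2), 100:(4)" described above.
    (fa, va), (fb, vb) = key_a, key_b
    if frame <= fa:
        return va
    if frame >= fb:
        return vb
    return va + (vb - va) * (frame - fa) / (fb - fa)

print(linear_tween(50))  # 1.0, the midpoint value noted above
```
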
When using math expressions, however, the “tweening” instead follows the curve
drawn by the expression between elements of the string.
Consider the following example: 0:(sin(t)), 100:(4). The function at frame 0 in this case is a
sine wave, where “t” represents the frame number. The value at frame 0 will start by
calculating sin(t) to produce its initial value, fluctuating quickly to cause peaks and
valleys, while it gradually settles toward a constant value of 4 by frame 100. We observe
the sine wave starting at full strength, then finally losing all amplitude 100 frames
later - a “ripple” effect.
If a math expression is used as the sole element of a string, it will
calculate and produce its value for as long as it is defined, without interruption. If at any
point a parameter falls out of the range of acceptable values, the render will adhere to the
next available calculation of that function (e.g. a value that approaches infinity, or is
asymptotic or undefined). This can sometimes be a desired effect, if a “pulsing” or
“sawtooth” pattern is to be achieved.

2.)
How MATH expressions affect the animation:
Many combinations and complex functions can be expressed during an animation
schedule to achieve patterns and motion that would otherwise take extremely long strings
of manual information to achieve. Consider a sine function, where previously, we would
have to enter in each frame’s respective value to simulate a waving pattern. The longer
our animation, the more frame instructions we’d have to manually enter. Now, with MATH
functions, we can populate a never-ending list of instructions simply contained in one
expression. The method that we use is to reference the variable “t”. When we use that
variable in our math statements, a calculation is performed such that “t” = the current
frame number. Since the frame number steadily increases in increments of +1, we can
now define an “x axis”. With that in place, we can use “t” to alter the value along
the “y axis” in sequence. As frames (time) progress forward, the MATHs performed on “t”
allow us to control exactly what value is enforced at each snapshot in time. In the
default notebook of deforumV05, the “translation_x” schedule is defined as:
0:(10*sin(2*3.14*t/10)) We can see “t” along with a sine wave (sin) being performed. This
will cause the image to translate left and right over time. We will examine in more detail
how this function works.
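
Under the hood, these expression strings are evaluated by NumExpr (linked above). A minimal sketch of evaluating the default translation_x expression per frame (Deforum's own parsing adds keyframe handling on top of this):

```python
import numexpr

expr = "10*sin(2*3.14*t/10)"  # the default translation_x expression
for t in range(6):
    # numexpr exposes "t" to the expression via local_dict
    value = numexpr.evaluate(expr, local_dict={"t": t})
    print(t, float(value))
```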

3.)
Anatomy of a simple MATH expression:
We saw the expression 0:(10*sin(2*3.14*t/10)) being used in the default notebook of
Deforum v0.5. Let’s observe how it is “driving” our parameter. When we use the simplest
of math expressions, 0:(t), we define the value at any frame to be equal to its frame
number. However, this value will soon rise beyond any recommended range for the
animation parameters. At frame 0, we start at 0; by frame 1, we’re at 1; and by frame 200,
we’re at 200 - so on and so forth. So a method of “containing” this value must be
expressed somehow, so as to prevent the number from flying off toward infinity. The two
best methods are sine/cosine functions and modulus operations (more info on modulus
later).
So, in our example, we can see a “sin( )” being used. If we were to take the sine of
our frame number, or “sin(t)”, we’d generate a wave shape. The value would swing up and
down quickly as each frame was calculated.
While this does keep our value from ever increasing - it is not enough to control our
parameter in a realistic way. A simple sine wave is too fast, shallow and rapid. So our
example performs more math. We see that a familiar value, 3.14 (an approximation of pi),
is multiplied by “t”; sin(3.14*t) completes a cycle every 2 frames. Our example goes
further and multiplies by 2 as well: taking sin(2*3.14*t) yields a wave with a period of 1
frame and an amplitude of 1 (it peaks and valleys between -1 and 1). All that is left
is to add math that controls how high the value should bounce (amplitude), and how
often (frequency). So our example finally multiplies the whole expression by 10, and also
divides “t” by 10. This results in a wave that alternates between +10 and -10 and repeats
every 10 frames.


But what if we wanted even MORE control? We notice our example has the
property of always passing through 0 as its baseline - but what if we wanted the baseline
to be at -3? We just need to take the whole expression and subtract 3 from it, and our
new baseline is established: 0:(10*sin(2*3.14*t/10)-3). Now our wave bounces between 7
and -13, keeping its amplitude and frequency intact. More functionality can be added as
we build our expression, including exponents, cosine properties, and negative amplitudes.

A recap of our example’s anatomy: 0:(10*sin(2*3.14*t/10))


0: = the current frame instruction
10 = the amplitude or “height”
t = frame count
10 = the period or “wavelength” (frames per cycle)
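
A quick way to sanity-check this anatomy is to evaluate the expression directly; a minimal Python sketch with the amplitude, period, and baseline broken out as parameters:

```python
from math import sin

def wave(t, amplitude=10, period=10, baseline=0):
    # 10*sin(2*3.14*t/10) generalized:
    # amplitude * sin(2*3.14*t/period) + baseline
    return amplitude * sin(2 * 3.14 * t / period) + baseline

# Peaks near +10 around frames 2-3, valleys near -10 around frames 7-8,
# and (approximately) back at the 0 baseline every 5 frames:
for t in (0, 2, 5, 7, 10):
    print(t, round(wave(t), 2))
```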

4.)
Advanced expressions to drive your parameters:
When constructing a complex schedule of effects for your animation, more control and
special techniques will yield a better dynamic result. Let’s examine a specific use case.
The artist wants to use a constant value of 0.8 as their strength schedule. However, they
wish they could have more detail appear in their animation. A value of 0.45 is great for
adding new enriched content to a scene, but it allows very little coherency. The artist
decides to introduce the value of 0.45 only periodically, about every 25
frames, and keep it at 0.8 for the rest of the sequence. How should the artist express this
using MATHs?
Let’s observe the following solution, then discuss.
0:(-0.35*(cos(3.141*t/25)**100)+0.8)
A massively powerful function, with a simple elegance to it. Our artist uses this function
to achieve the desired result. At frame 0 and all frames after, a value is
being calculated. In this expression we select a cosine function (cos) so that our
wave has small periodic dips instead of peaks. The double asterisk acts as an
exponent and raises the cosine to the 100th power, tightening the dips into small
indents along the timeline. The addition of +0.8 sets the baseline at 0.8, which the artist
agreed was desirable for the animation. The leading -0.35 sets the depth of each dip,
taking the value from the established 0.8 baseline down to 0.45 as expected. An
approximation of pi is used again (3.141) to align the cycle to the frame count, and t is
divided by 25 so the dip occurs only at frames that are multiples of 25. Our
artist has achieved the schedule using one expression that will be calculated for the
duration of the animation frames.
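
To verify the dip behavior, the expression can be evaluated at a few frames (a minimal sketch using Python's math module; NumExpr accepts the same ** exponent syntax):

```python
from math import cos

def strength(t):
    # -0.35*(cos(3.141*t/25)**100)+0.8, the schedule from above
    return -0.35 * (cos(3.141 * t / 25) ** 100) + 0.8

for t in (0, 5, 12, 24, 25, 26, 50):
    print(t, round(strength(t), 3))
# Frames at multiples of 25 dip to ~0.45 (with narrow shoulders a frame
# or two wide); everything in between holds steady at ~0.8.
```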

Remember that expressions can be changed along the schedule to “tween” along the
frames.
0:(10*sin(2*3.14*t/10)), 50:(20*sin(4*3.14*t/40)), 100:(cos(t/10)) is an acceptable format.
Another useful tool is the modulus operator. Represented by “%”, it is typically used to
calculate the remainder of a division. In Deforum, we use modulus on the “t” frame count
as a repeating limiter. Consider the following syntax:
translate_3D_z: 0:(0.375*(t%5)+15)
If “t” is the frame count, it would increase indefinitely; however, in our example, we’ve set
the modulus to 5. This means that as the frame count rises (0, 1, 2, 3, 4, 5, 6, 7, 8… etc.),
the value of t%5 will repeat the sequence 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1… etc., without ever
reaching 5. This graphically produces a sawtooth wave. In order to bend the “blades” of
the sawtooth to stretch over time, we multiply by 0.375. This acts as the slope of each
ramp. A multiplier of 1 would yield a 45° line. Higher multipliers steepen each ramp
further, while numbers closer to 0 lay the line near flat. Since we’re controlling the Z
translate in 3D mode, we want our baseline to be at 15, hence its addition at the end of
the syntax. The overall effect of this parameter causes our animation to consistently zoom
forward, yet with pulses, similar to the perspective of nodding your head to music while
riding in a car.
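
A minimal sketch of the sawtooth this produces:

```python
def translate_z(t):
    # 0.375*(t%5)+15, the schedule from above
    return 0.375 * (t % 5) + 15

print([translate_z(t) for t in range(12)])
# [15.0, 15.375, 15.75, 16.125, 16.5, 15.0, 15.375, ...] - a ramp that
# climbs for 5 frames, then snaps back to the 15.0 baseline.
```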

Many more clever approaches can be used to create elaborate functions and animations,
as well as simply shortening the instruction of long frame counts. Tools such as graphing
calculators can help you envision what a function will look like over time. Substitute x
for t, so Deforum’s 0:(10*sin(2*3.14*t/10)) becomes y = 10*sin(2*3.14*x/10), where “x” is
the frame number and “y” is the value enforced at that frame.
This calculator can be used to plot similar functions, though some syntax may vary:
Desmos | Graphing Calculator

We encourage users to share their experiences with formulas and expressions, since
there will be endless discoveries in how MATHs can work in unique applications.

DEFORUM STABLE DIFFUSION


WORK IN PROGRESS… Always changing, updating, and fixing…

***Official Notebook Link

CURRENT DEFORUM UPDATE STATUS


Deforum Automatic 1111 Extension is recommended (link)

WHAT IS IT?
Deforum Stable Diffusion (Design Forum) builds upon Stability
AI’s Stable Diffusion model and adds a lot of additional functionality
not seen in Stability’s default notebook. Since Stability AI (blog
post) has released this model for free and commercial use, a lot
of amazing new notebooks have come out that push this technology
further.
Deforum Stable Diffusion (DSD) as of this writing has additional
features such as animation in the form of 2D and 3D, Video Init, and
a few other masking options.

It’s magic. And also, free. (!)

Burning Man Steampunk

The image above was created with DSD using just the text prompt:

“ultra detailed portrait Burning Man festival in the Black Rock Desert,
steampunk burning man artwork, night time with fractal clouds,
volumetric lighting, cinematic portrait”

Here is a bit more info on what’s going on:

“The model itself builds upon the work of the team at CompVis and
Runway in their widely used latent diffusion model combined with
insights from the conditional diffusion models by our lead
generative AI developer Katherine Crowson, Dall-E 2 by Open
AI, Imagen by Google Brain and many others. We are delighted that
AI media generation is a cooperative field and hope it can continue
this way to bring the gift of creativity to all.

The core dataset was trained on LAION-Aesthetics, a soon to be
released subset of LAION 5B. LAION-Aesthetics was created with a
new CLIP-based model that filtered LAION-5B based on how
“beautiful” an image was, building on ratings from the alpha testers
of Stable Diffusion. LAION-Aesthetics will be released with other
subsets in the coming days on https://laion.ai.” – Source Stability.ai

This quick user guide is intended as a LITE reference for different
aspects and items found within the Deforum notebook.

It is intended for version 0.4, which was released 9/11/2022

Most documentation has been updated to reflect changes in
version 0.5, released on October 1st, 2022.

IMPORTANT

This document is created from additional resources and should be
used as a reference only. Some things may become outdated or
change, and we will make our best effort to keep things updated
and add new things as they come.
****Please note this document also shares a lot of information from
the Disco Diffusion Document that I was kindly allowed to use and
reuse for this document by Chris Allen (@zippy731 on twitter)

GETTING STARTED
Launch the Google Colab Notebook here.
Deforum Stable Diffusion (DSD) (currently version 0.4) is
intimidating and inscrutable at first. Just take it in small steps and
you’ll make progress.

BEFORE YOU START

This guide assumes you understand the basics of accessing and


running a notebook using Google’s Colab service. If you don’t, please
check the appendix for some recommended resources to get that
understanding.

DSD does not come with the stable diffusion model ready to
download and you will have to do this process manually.

DOWNLOADING STABLE DIFFUSION MODEL

You will need to create an account on Hugging Face first; after
that, you can download the model.

1. Go here to download the 1.4 model (current version at the time
of writing)
2. Click Accept
3. On the next page you will see a link to download sd-v1-4.ckpt
4. Download that to your local hard drive.
The next step is to upload this model to your Google Drive
folder. Since you are running Google Colab, I’m going to assume
you know you have a Google Drive for file storage.

1. Go to drive.google.com when you’re logged in to your Google
Account
2. Go into your AI/models folder.
1. If you do not have an AI folder, create one
2. Inside the AI folder, create a models folder
3. These folder names are case sensitive
3. Drag the sd-v1-4.ckpt to this folder and wait for it to upload to
your Google Drive.
This should complete the first step of getting ready to fully run the
DSD Notebook.

**This may change from time to time so please look at the notebook
on how to properly get the model.

CONFIRM COLAB IS WORKING

When you launch the DSD notebook in Colab, it’s already set up with
defaults that will generate a lighthouse image like the one above.
Before changing any of the settings, you should just run
all (Runtime\Run all) to confirm everything’s working. Colab will
prompt you to authorize connecting to your google drive, and you
should approve this for DSD to work properly.

Afterward, DSD will spend a few minutes setting up the environment,
and will eventually display a diffusion image being generated at the
very bottom of the notebook. Once you’ve confirmed that all of this
is working, you can interrupt the program (Runtime\Interrupt
Execution) whenever you like.

QUICK START

Using Default Settings

After the initial setup, you can start creating your own
images! There are many options, but if you want to just type
phrases and use the default settings to generate images:
Deforum Stable Diffusion Prompts
• Initialize the DSD environment with run all, as described just
above. Interrupt the execution.
• Scroll to the bottom of the notebook to the Prompts section
near the very bottom of the notebook. Take careful note of the
syntax of the example that’s already there. Replace the
sentences with your own text prompt.
• Remember prompts are for single images and you can do one
after the other and animation_prompts are for animation mode
which we will get into later.
• Click the run button for the prompts cell. This will update the
text prompt for the next run.
• Click the run button next to ‘Do the Run!’
• Watch the magic happen as it tries to build images based upon
your prompts.
New to v0.5 are prompt weights and the ability to use a prompt from
a model checkpoint. These are the examples taken from the
notebook.

• “a nousr robot, trending on Artstation”
o use “nousr robot” with the robot diffusion model (see
model_checkpoint setting)
• “touhou 1girl komeiji_koishi portrait, green hair”
o waifu diffusion prompts can use danbooru tag
groups (see model_checkpoint in the notebook)
• “this prompt has weights if prompt weighting enabled:2 can
also do negative:-2”
o (see prompt_weighting in the run section below)
TEXT PROMPTS
Prompt Engineering

This is the main event. Typing in words and getting back
pictures. It’s why we’re all here, right?

In DSD, prompts are set at the very bottom of the notebook. Prompts
can be a few words, a long sentence, or a few sentences. Writing
prompts is an art in and of itself that won’t be covered here, but the
DSD prompts section has some examples including the formatting
required.

THOUGHT PROCESS

A phrase, sentence, or string of words and phrases describing what
the image should look like. The words will be analyzed by the AI and
will guide the diffusion process toward the image(s) you describe.
These can include commas and weights to adjust the relative
importance of each element. E.g. “ultra detailed portrait of a wise
blonde beautiful female elven warrior goddess, dark, piercing glowing
eyes, gentle expression, elegant clothing, photorealistic, steampunk
inspired, highly detailed, artstation, smooth, sharp focus, art by
michael whelan, artgerm, greg rutkowski and alphonse mucha.”
Elven Steampunk
Princess
Notice that this prompt loosely follows a structure: [subject],
[prepositional details], [setting], [meta modifiers and artist]; this is a
good starting point for your experiments.

Developing text prompts takes practice and experience, and is not
the subject of this guide. If you are a beginner to writing text
prompts, a good place to start is on a simple AI art app like Night
Cafe Studio, starry ai or WOMBO prior to using DSD, to get a feel for
how text gets translated into images by these tools. These other
apps use different technologies, but many of the same principles
apply.

ADDITIONAL PROMPT INFO

In the above example, we have two groupings of prompts: the still
frame *prompts* on top, and the animation_prompts below. During
“NONE” animation mode, the diffusion will look to the top group of
prompts to produce images. In all other modes (2D, 3D, etc.), the
diffusion will reference the second, lower group of prompts.

Careful attention to the syntax of these prompts is critical to be able
to run the diffusion.

For still frame image output, numbers are not to be placed in front of
the prompt, since no “schedule” is expected during a batch of
images.

The above prompts will produce and display a forest image and a
separate image of a woman, as the outputs.

During 2D/3D animation runs, the lower group with prompt
numbering will be referenced as specified. In the example above, we
start at frame 0: an apple image is produced. As the frames
progress, the output remains an apple until frame 20, at
which point the diffusion is directed to start including a banana
as the main subject, eventually replacing the no-longer-referenced
apple.
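
For illustration, a keyframed animation prompt schedule along these lines might look like the following in the notebook (the exact dict name and wording here are illustrative, not copied from the notebook):

```python
animation_prompts = {
    0: "a detailed photo of an apple",
    20: "a detailed photo of a banana",
    40: "a detailed photo of a coconut",
    60: "a detailed photo of a durian",
}
```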

Interpolation mode, however, will “tween” the prompts in such a way
that, first, one image each is produced from the list of prompts. An
apple, banana, coconut, and a durian fruit will be drawn. Then the
diffusion begins to draw frames that should exist between the
prompts, making hybrids of apples and bananas, then proceeding
to fill in the gap between bananas and coconuts, finally resolving
and stopping on the last image of the durian, as its destination.
(Remember that this exclusive mode ignores max_frames and draws
the interpolate_key_frame/x_frame schedule instead.)

Many resources exist for the context of what a prompt should
include. It is up to YOU, the dreamer, to select items you feel belong
in your art. Prompt weights were not implemented in Deforum prior to
v0.5 (see below); in any case, following a template should yield fair
results:

[Medium] [Subject] [Artist] [Details] [Repository]

Ex. “A Sculpture of a Purple Fox by Alex Grey, with tiny ornaments,
popular on CGSociety”

PROMPT ENGINEERING HAS GONE TOO FAR!
NOW THERE’S MATH IN IT! (AS OF V0.5)

A numerical prompt weight feature has been added to Deforum as a
selectable feature. When enabled, the run will interpret the values
and weights syntax of the prompt for better control and token
presence. The numerical values are applied to all words before the
colon; parenthesis weights are coming soon. There is no
explicit ‘negative prompt’ feature; instead, all weights less than
zero are added to the negative prompt automatically. Guess what
that allows for? Better yet, weight values adhere to
MATH expressions for even more control!

Now, with a master prompt like

eggs:`cos(6.28*t/10)`, bacon:`-cos(6.28*t/10)`

you can go back and forth between subjects in just one line of text!
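
Since the two weights are opposite-phase cosines, one term fades in exactly as the other fades out; a quick sketch of the weight values over time (assuming the weights are evaluated per frame like other schedules, with negative weights routed to the negative prompt as described above):

```python
from math import cos

for t in (0, 5, 10, 15):
    eggs, bacon = cos(6.28 * t / 10), -cos(6.28 * t / 10)
    print(t, round(eggs, 2), round(bacon, 2))
# t=0: eggs +1 / bacon -1; t=5: eggs -1 / bacon +1.
# The two subjects trade places every 5 frames.
```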

Prompt weighting + MATH demo:

https://user-images.githubusercontent.com/14872007/193699307-
dc0994ff-fe9e-4e27-bc67-e6c190dcea0b.gif

SETUP
As stated before, the notebook requires an Nvidia GPU to run, and
also requires that you have downloaded the current Stable Diffusion
model and put it in the proper folder that your notebook references.

MODEL AND OUTPUT PATHS:

• models_path, looks in runtime for uploaded model
• output_path, directs images/files to a place in the runtime
Google Drive Path Variables (Optional):

• mount_google_drive, when selected will redirect paths to Drive
instead of runtime
• models_path_gdrive, location of model on Google
Drive (default /content/drive/MyDrive/AI/models)
• output_path_gdrive, location of images/files to be output in
Google Drive
The notebook expects the following path variables to be defined:
models_path and output_path. These locations will be used to
access the Stable Diffusion .ckpt model weights and save the
diffusion output renders, respectively. There is the option to use
paths locally or on Google Drive. If you desire to use paths on
Google Drive, mount_google_drive must be True. Mounting Gdrive
will prompt you to access your Drive, to read/write/save images.

SETUP ENVIRONMENT:

• setup_environment, when checked will build the environment to
handle pip/installs
• print_subprocess, choose to show items being pulled and built
Running this cell will download github repositories, import python
libraries, and create the necessary folders and files to configure the
Stable Diffusion model. Sometimes there may be issues where the
Setup Environment cells do not load properly and you will encounter
errors when you start the run. Verify the Setup Environment cells
have been run without any errors.

PYTHON DEFINITIONS:

• pulls/pips/installs functions and definitions into the built
environment for later use during a run
• defines variables from libraries and loads them to runtime
Running this cell will define the required functions to proceed with
making images. Verify the Python Definitions cell has been run
without any errors.

SELECT AND LOAD MODEL:

• model_config, type of instruction file: default .yaml, or custom
option
• model_checkpoint, the dataset to be matched to your
downloaded .ckpt file
o This is a dropdown and may include other models to test
out such as robo-diffusion-v1 and waifu-diffusion-v1-3
• custom_config_path, blank unless intending to use a custom
.yaml file
• custom_checkpoint_path, blank unless using a .ckpt file not
listed
• load_on_run_all, when checked will be an include cell for RUN
ALL function
• check_sha256, will perform comparison against checksum
(check hash for file integrity)
• map_location, utilizes CUDA cores on GPU [default], or uses
CPU [slow] (not recommended)
In order to load the Stable Diffusion model, Colab needs to know
where to find the model_config file and the model_checkpoint. The
model_config file contains information about the model
architecture. The model_checkpoint contains model weights which
correspond to the model architecture. For troubleshooting verify that
both the config and weight path variables are correct. By default the
notebook expects the model config and weights to be located in the
model_path. You can provide custom model weights and config
paths by selecting “custom” in both the model_config and
model_checkpoint drop downs. Sometimes there are issues with
downloading the model weights and the file is corrupt. The
check_sha256 function will verify the integrity of the model weights
and let you know if they are okay to use. The map_location allows
the user to specify where to load model weights. For most colab
users, the default “GPU” map location is best.

ANIMATION
Until this point, all of the settings have been related to creating still
images. DSD also has several animation systems that allow you to
make an animated sequence of stable diffusion images. The frames
in the animation system are created using all of the same settings
described above, so practice making still images will help your
animated images as well.

There are 4 distinct animation systems: 2D, 3D, video, and
interpolation. All of the animation modes take advantage of DSD’s
image init function, and use either the previously created frame
(2D/3D) or a frame from a separate video (video). This starting
image is injected into the diffusion process as an image init, then
the diffusion runs normally.

When using any of the animation modes, temporal coherence
between frames is an important consideration, so you will need to
balance the strength of the image init, the strength of the
text prompt and other guidance, and the portion of the diffusion
curve you will use to modify the image init.

The animation system also has ‘keyframes’ available, so you can
change several values at various frames in the animation, and DSD
will change direction. You can even update the text prompt mid-
animation, and DSD will begin to morph the image toward the new
prompt, allowing for some great storytelling power!

ANIMATION_MODE

As of version 0.5 you can now use custom math functions.

Users may now use custom math expressions, as well as typical
values, as scheduling for parameters that allow strings, such as
zoom, angle, translation, rotation, strength_schedule, and noise.
Many wave functions can now be achieved with simple instructions
using “t” as a variable to express frame number. Please refer to the
link provided for more info about math functions.

NumExpr 2.0 User Guide — numexpr 2.6.3.dev0 documentation

Another MATH Guide


It was also suggested that if you change your strength schedule, you
should also adjust your noise schedule. Sample starting points are:
st=0.9 | noise = 0.00
st=0.8 | noise = 0.01
st=0.7 | noise = 0.02
st=0.6 | noise = 0.03
st=0.5 | noise = 0.04

Amazing Math Expressions Doc from the team (link)

None, 2D, 3D or video animation options. Details in each section
below. We also have a much deeper breakdown of the animation
modes further below, so keep reading!

• None: When selected, will ignore all functions in animation
mode and will output batches of images unrelated
to each other, as specified by the prompts list. The prompts
used will follow the non-scheduled, non-animation list. The
number of images to be produced is defined in a later
cell under “n_batch”.
• 2D animation: When selected, will ignore the “none mode”
prompts and refer to the prompts that are scheduled with a
frame number before them. 2D mode will attempt to string the
images produced in a sequence of coherent outputs. The
number of output images to be created is defined by
“max_frames”. The motion operators that control 2D mode are
as follows:
“Border, angle, zoom, translation_x, translation_y,
noise_schedule, contrast_schedule, color_coherence,
diffusion_cadence, and save depth maps”. Other animation
parameters have no effect during 2D mode.
Resume_from_timestring is available during 2D mode.
• 3D animation: When selected, will ignore the “none mode” prompts and refer
to the prompts that are scheduled with a frame number before
them. 3D mode will attempt to string the images produced in a
sequence of coherent outputs. The number of output images
to be created is defined by “max_frames”. The motion
operators that control 3D mode are as follows:
“Border, translation_x, translation_y, rotation_3d_x,
rotation_3d_y, rotation_3d_z, noise_schedule,
contrast_schedule, color_coherence, diffusion_cadence, 3D
depth warping, midas_weight, fov, padding_mode,
sampling_mode, and save_depth_map.”
Resume_from_timestring is available during 3D mode. (more
details below)

• Video Input: When selected, will ignore all motion parameters
and attempt to reference a video loaded into the runtime,
specified by the video_init_path. Video Input mode will ignore
the “none mode” prompts and refer to the prompts that are
scheduled with a frame number before them. “Max_frames” is
ignored during video_input mode, and instead, follows the
number of frames pulled from the video’s length. The notebook
will populate images from the video into the selected drive as
a string of references to be impacted. The number of frames to
be pulled from the video is based on “extract_nth_frame”.
Default of 1 will extract every single frame of the video. A value
of 2 will skip every other frame. Values of 3 and higher will
effectively skip between those frames yielding a shorter batch
of images. Currently, video_input mode will ignore all other
coherence parameters, and only affect each frame uniquely.
Resume_from_timestring is NOT available with Video_Input
mode.
• Interpolation Mode: When selected, will ignore all other motion
and coherence parameters, and attempt to blend output
frames between animation prompts listed with a schedule
frame number before them. If interpolate_key_frame mode is
checked, the number of output frames will follow your prompt
schedule. If unselected, the interpolation mode will follow an
even schedule of frames as specified by
“interpolate_x_frames”, regardless of prompt numbering. A
default value of 4 will yield four frames of interpolation
between prompts.

PERSPECTIVE 2D FLIPPING

This feature allows extra parameters during 2D mode to enable
a faux Roll, Tilt, Pan canvas function otherwise only found in 3D
mode. Users may use angle control to simulate a 2.5D effect, using
only a 2D canvas. It may be particularly helpful in local mode, when
you’re low on VRAM. See this post:
https://www.reddit.com/r/StableDiffusion/comments/xhnaaj/i_added_2d_perspective_flipping_to_the_deforum/

Perspective flip demo:

https://user-images.githubusercontent.com/14872007/190828601-
43dd60d0-2619-455b-8ba6-87840751b818.gif

• flip_2d_perspective, enables this feature: extra parameters
during 2D mode that allow a “faux” Roll, Tilt, Pan canvas function
only found in 3D mode. Users may use “theta, phi & gamma”
angle control to simulate a 2.5D effect, using only a 2D canvas
mode.

ANIMATION SETTINGS

• animation_mode, selects type of animation (see above)
• max_frames, specifies the number of 2D or 3D images to
output
• border, controls handling method of pixels to be generated
when the image is smaller than the frame. “Wrap” pulls pixels
from the opposite edge of the image, while “Replicate” repeats
the edge of the pixels, and extends them. Animations with
quick motion may yield “lines” where this border function was
attempting to populate pixels into the empty space created.

MOTION PARAMETERS

Motion parameters are instructions to move the canvas in units per
frame (see the example schedule block after this list).

• angle, 2D operator to rotate canvas clockwise/counter-
clockwise in degrees per frame
• zoom, 2D operator that scales the canvas size,
multiplicatively [static = 1.0]
• translation_x, 2D & 3D operator to move canvas left/right in
pixels per frame
• translation_y, 2D & 3D operator to move canvas up/down in
pixels per frame
• translation_z, 3D operator to move canvas towards/away from
view
o [speed set by FOV]
• rotation_x, 3D operator to tilt canvas up/down in degrees per
frame
• rotation_y, 3D operator to pan canvas left/right in degrees per
frame
• rotation_z, 3D operator to roll canvas clockwise/counter
clockwise
• noise_schedule, amount of graininess to add per frame for
diffusion diversity
• strength_schedule, amount of presence of previous frame to
influence next frame
o also controls steps in the following formula [steps –
(strength_schedule * steps)] (more details under: “steps”)
• contrast_schedule, adjusts the overall contrast per frame
o [default neutral at 1.0]
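
As a hypothetical example of what a set of motion schedules might look like in the notebook, using the keyframe-string format from the MATHs guide above (the values are illustrative only; translation_x is the v0.5 default mentioned earlier):

```python
angle = "0:(0)"                              # hold rotation steady
zoom = "0:(1.02)"                            # gentle constant zoom-in
translation_x = "0:(10*sin(2*3.14*t/10))"    # default left/right sway
translation_y = "0:(0)"
noise_schedule = "0:(0.02)"
strength_schedule = "0:(0.65)"
contrast_schedule = "0:(1.0)"                # neutral contrast
```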

COHERENCE

• color_coherence, select between NONE, LAB, HSV, RGB
o LAB: Perceptual Lightness plus A and B color axes
(search CIELAB for more info)
o HSV: Hue, Saturation & Value color balance.
o RGB: Red, Green & Blue color balance.
The color coherence will attempt to sample the overall pixel color
information, and trend those values analyzed in the 0th frame, to be
applied to future frames. LAB is a more linear approach to mimic
human perception of color space – a good default setting for most
users.

HSV is a good method for balancing the presence of vibrant colors, but
may produce unrealistic results (i.e. blue apples). RGB is good for
enforcing unbiased amounts of color in each red, green and blue
channel - some images may yield colorized artifacts if sampling is
too low.
• diffusion_cadence, controls the frequency of frames to be
affected by diffusion [1-8]
The diffusion cadence will attempt to follow the 2D or 3D schedule
of movement as per specified in the motion parameters, while
enforcing diffusion on the frames specified. The default setting of 1
will cause every frame to receive diffusion in the sequence of image
outputs. A setting of 2 will only diffuse on every other frame, yet
motion will still be in effect. The output of images during the
cadence sequence will be automatically blended additively and
saved to the specified drive.

This may improve the illusion of coherence in some workflows, as
the content and context of an image will not change or diffuse
during frames that were skipped. Higher cadence values of 4-8 will
skip over a larger number of frames and only diffuse the “Nth” frame
as set by the diffusion_cadence value. This may produce more
continuity in an animation, at the cost of little opportunity to add
more diffused content. In extreme examples, motion within a frame
will fail to produce diverse prompt context, and the space will be
filled with lines or approximations of content – resulting in
unexpected animation patterns and artifacts. Video Input &
Interpolation modes are not affected by diffusion_cadence.

Example:

• Cadence 5 -> diffuse 1 frame, draw 4 non-diffused -> output 5
total
• Cadence 1 -> diffuse 1 frame, draw 0 non-diffused -> output 1
total
• Cadence 8 -> diffuse 1 frame, draw 7 non-diffused -> output 8
total
The cadence number will always equal the final number of outputs,
with the first of each group being the diffused one.
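
A minimal sketch of which frames receive diffusion under a given cadence (assuming the first frame of each group is the diffused one, as the examples indicate):

```python
def diffused_frames(total_frames, cadence):
    # Frames whose index starts a cadence group get diffused;
    # the rest are motion-warped and blended only.
    return [t for t in range(total_frames) if t % cadence == 0]

print(diffused_frames(10, 5))  # [0, 5] - one diffused frame per group of 5
print(diffused_frames(10, 1))  # every frame is diffused
```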

3D DEPTH WARPING

• use_depth_warping, enables instructions to warp an image
dynamically in 3D mode only.
• midas_weight, sets a midpoint at which a depthmap is to be
drawn: range [-1 to +1]
• fov, adjusts the scale at which a canvas is moved in 3D by the
translation_z value
FOV (field of view/vision) in Deforum gives specific instructions
as to how the translation_z value affects the canvas. Range is -180
to +180. The value follows an inverse-square-style curve in such
a way that 0 FOV is undefined and will produce a blank image
output. A FOV of 180 will flatten and place the canvas plane in line
with the view, causing no motion in the Z direction. Negative values
of FOV will cause the translation_z instructions to invert, moving in
the opposite direction along the Z plane, while retaining other normal
functions. A value of 30 FOV is the default, whereas a value of 100
would cause the transition in the Z direction to be smoother and
slower. Each type of art and context will benefit differently from
different FOV values. (ex. “Still-life photo of an apple” will react
differently than “A large room with plants”)

FOV also lends instruction as to how a MiDaS depth map is
interpreted. The depth map (a greyscale image) will have its range of
pixel values stretched or compressed in accordance with the FOV in
such a fashion that the illusion of 3D is more pronounced at lower
FOV values, and more shallow at values closer to 180. At full FOV of
180, no depth is perceived, as the MiDaS depth map has been
compressed to a single value range.

• padding_mode, instructs the handling of pixels outside the
field of view as they come into the scene. “Border” will attempt
to use the edges of the canvas as the pixels to be drawn.
“Reflection” will attempt to approximate the image and
tile/repeat pixels, whereas “Zeros” will not add any new pixel
information.
• sampling_mode, choose from Bicubic, Bilinear or Nearest
modes.
In image processing, bicubic interpolation is often chosen over
bilinear or nearest-neighbor interpolation in image resampling, when
speed is not an issue. In contrast to bilinear interpolation, which only
takes 4 pixels (2×2) into account, bicubic interpolation considers 16
pixels (4×4). Images resampled with bicubic interpolation are
smoother and have fewer interpolation artifacts.

• save_depth_map, will output a greyscale depth map image
alongside the output images.

VIDEO INPUT SETTINGS


As noted above, video input animation mode takes individual frames
from a user-provided video clip (mp4) and uses those sequentially
as init_images to create diffusion images.

• video_init_path, source path for the user-provided video to be
used as the source for image inputs for animation. To use a
video init, upload the video to the Colab instance or your
Google Drive, and enter the full source path. A typical path will
read /content/video_name.mp4.
• extract_nth_frame, (2|1-6) allows you to extract every nth
frame of video. If you have a 24fps video, but only want to
render 12 frames per second of DSD images,
set extract_nth_frame to 2.
• overwrite_extracted_frames, By default, a video file will extract
and save its frames to drive every run. This new option allows
the user to bypass this process for future runs, and skip right
ahead to render.
• use_mask_video, During Video Input mode, users may select to
also include an additional video to be used as a mask. Frames
will be extracted for both the video init, as well as the video
mask, and used in conjunction. This should be a black and
white video, currently alpha channel isn’t supported as a mask.
• video_mask_path, source path for the user-provided video to
be used as the mask source for animation. To use a
video mask, upload the video to the Colab instance or your
Google Drive, and enter the full source path. A typical path will
read /content/video_mask_name.mp4.
More Details about Dynamic Video Masking
During Video Input mode, users may select to also include an
additional video to be used as a mask. Frames will be extracted for
both the video init, as well as the video mask, and used in
conjunction. Now you can be a fire-mage (or an anime girl, whatever
you like) without changing the rest of the environment!

Dynamic masking demo (sorry for the quality, had to compress it to
fit on GitHub. Visit the Discord server for the full version):

https://user-images.githubusercontent.com/14872007/193702526-
be62d3c8-de3d-4f0f-89fe-cd9db851e5f2.gif

The mask used:

https://user-images.githubusercontent.com/14872007/193700941-
d0674cc6-f5f0-4597-8ecf-7ffef5cdd21c.gif

FYI: video input does not work with cadence. It ignores your cadence
values.

INTERPOLATION
• interpolate_key_frames, selects whether to ignore prompt
schedule or _x_frames.
• interpolate_x_frames, the number of frames to transition through
between prompts
When interpolate_key_frames = true, the numbers in front of the
animation prompts will dynamically guide the images based on their
values. If set to false, the prompt numbers are ignored and the
interpolate_x_frames value is enforced regardless of prompt numbering.

RESUME ANIMATION
• resume_from_timestring, instructs the run to start from a
specified point
• resume_timestring, the required timestamp to reference when
resuming
Currently only available in 2D & 3D mode. The timestamp is saved as
the settings .txt file name, as well as on images produced during your
previous run. The format follows:
yyyymmddhhmmss (e.g. 20221001123045) - a timestamp of when the
run started to diffuse.

ANIMATION MODE BREAKDOWNS (DETAILED)


2D ANIMATION SETTINGS
Remember that in 2d animation mode, DSD is shifting the CANVAS
of the prior image, so directions may feel confusing at first.

ANGLE:

(0|-3 to 3) (2D only) Rotates image by () degrees each frame.
Positive angle values rotate the image counter-clockwise (which
feels like a camera rotating clockwise).

ZOOM:

(2D only) (1.10|0.8 - 1.25) Scales image by () percentage each
frame. A zoom of 1.0 is 100% scaling, thus no zoom. Zoom values
over 1.0 are scale increases, thus zooming into an image. 1.10 is a
good starting value for a forward zoom. Values below 1.0 will zoom
out.

TRANSLATION_X, TRANSLATION_Y

In 2D mode

(0|-10 to 10) In 2D mode, the translation parameter shifts the
image by () pixels per frame.

• X is left/right; positive translation_x shifts the image to the
right (which feels like a camera shift to the left)
• Y is up/down; positive translation_y shifts the image down the
screen (which feels like a camera shift upward)

3D ANIMATION SETTINGS
Recall that in 3D animation mode, there is a virtual 3D space created
from the prior animation frame, and a virtual camera is moved
through that space.

(Image CC BY 4.0 by Joey de Vries, from https://learnopengl.com)

3D rotations follow the diagram above, with positive values
following the direction of the arrows. NOTE: DSD rotations are
measured in degrees.

ROTATION_3D_X:

(3D only) (0|-3 to 3) Measured in degrees. Rotates the camera
around the x axis, thus shifting the 3D view of the camera up or
down. Similar to pitch in an airplane. Positive rotation_3d_x pitches
the camera upward.

ROTATION_3D_Y:

(3D only) (0|-3 to 3) Measured in degrees. Rotates the camera
around the y axis, thus shifting the 3D view of the camera left or
right. Similar to yaw in an airplane. Positive rotation_3d_y pans the
camera to the right.

ROTATION_3D_Z:

(3D only) (0|-3 to 3) Measured in degrees. Rotates the camera
around the z axis, thus rotating the 3D view of the camera clockwise
or counterclockwise. Similar to roll in an airplane. Positive
rotation_3d_z rolls the camera clockwise.

TRANSLATION_X, TRANSLATION_Y, TRANSLATION_Z:

In 3D Mode ONLY

(0|-10 to 10) In 3D mode, translation parameters behave differently
than in 2D mode - they shift the camera in the virtual 3D space.

• X is left/right; positive translation_x shifts the camera to the
right
• Y is up/down; positive translation_y shifts the camera upward
• Z is forward/backwards (zooming); positive translation_z shifts
the camera forward
The distance units for translations (x, y or z) in 3D mode are set to
an arbitrary scale where 10 units is a reasonable distance to zoom
forward via translate_z. Depending on your scene and scale, you will
need to experiment with varying translation values to achieve your
goals.

RUN SETTINGS
After your prompt and settings are ready, visit the Do the Run! code
cell near the bottom of the notebook, edit the settings, then run it.
DSD will start the process, and store the finished images in your
batch folder.

LOAD SETTINGS

Users may now select an “override” function that will bypass all
instructions from the notebook settings, and instead run from a
settings.txt file previously saved by the user. This function is backward
compatible with v0.4. This feature does not auto-populate settings into
your notebook; rather, it directly runs the instructions found within
the .txt file.

• override_settings_with_file, enables this feature
• custom_settings_file, path to the settings file you want to
use

IMAGE SETTINGS – WIDTH AND HEIGHT

([Width, Height]|limited by VRAM) desired final image size, in pixels.
You can have a square, wide, or tall image, but each edge length
should be set to a multiple of 64px. If you forget to use multiples of
64px in your dimensions, DSD will adjust the dimensions of your
image to make it so.
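
The exact adjustment is handled by the notebook, but conceptually each edge snaps to a multiple of 64; a rough sketch (the rounding direction here is an assumption, not DSD's verified behavior):

```python
def snap_to_64(edge_px):
    # Assumption: round down to the nearest multiple of 64
    # (DSD's actual rounding rule may differ).
    return max(64, (edge_px // 64) * 64)

print(snap_to_64(500), snap_to_64(448))  # 448 448
```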

The model was trained on a 512×512 dataset, and therefore must
extend its diffusion outside of this “footprint” to cover the canvas
size. A wide landscape image may produce two trees side-by-side as
a result, or perhaps two moons on either side of the sky. A tall
portrait image may produce faces that are stacked instead of
centered.

Significantly larger dimensions will use significantly more memory
(and may crash DSD!) so start small at first.

TIPS FOR PORTRAITS.

I tend to use 448×704 as a starting resolution for portrait-style
images; anything taller can lead to double-head situations.

Resolution of 512×832
Stable Diffusion Double Head
Let us know what works well for you.

SAMPLING SETTINGS:

• seed, a starting point for a specific deterministic outcome (-1
= random starting point)
• sampler, method by which the image is encoded and decoded
from latent space
o klms = k-diffusion Linear Multi-Step sampler
o dpm2 = Diffusion Probabilistic Model solver (2nd order)
o dpm2_ancestral = dpm2 with ancestral (noise-injecting)
sampling
o heun = a second-order method built on Euler’s, named
for Karl Heun
o euler = sampler based on Euler’s method for solving
ODEs
o euler_ancestral = Euler with ancestral (noise-injecting)
sampling
o plms = Pseudo Linear Multi-Step sampler
o ddim = Denoising Diffusion Implicit Models
• steps, the number of iterations intended for a model to reach
its prompt
Things to remember with steps values:

During one frame, the model will attempt to reach its
prompt by the final step of that frame. By adding more steps, the
frame is sliced into smaller increments as the model approaches
completion. Higher steps will add more defining features to an
output, at the cost of time. Lower values will cause the model to rush
toward its goal, providing vague attempts at your prompt. Beyond a
certain value, if the model has achieved its prompt, further steps will
have very little impact on the final output, yet time will still be
wasted. Some prompts also require fewer steps to achieve an
acceptable output.

During 2D & 3D animation modes, coherence is important to produce
continuity of motion during video playback. The value under Motion
Parameters, “strength_schedule” achieves this coherence by utilizing
a proportion of the previous frame, into the current diffusion. This
proportion is a scale of 0 – 1.0 , with 0 meaning there’s no cohesion
whatsoever, and a brand new unrelated image will be diffused. A
value of 1.0 means ALL of the previous frame will be utilized for the
next, and no diffusion is needed. Since this relationship of previous
frame to new diffusion consists of steps diffused previously, a
formula was created to compensate for the remaining steps to
justify the difference. That formula is as such:
Target Steps – (strength_schedule * Target Steps)

Your first frame will, however, yield all of the steps – as the formula
will be in effect afterwards.
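
A worked example of that formula (a minimal sketch; target_steps is the steps value you set):

```python
def effective_steps(target_steps, strength):
    # Target Steps - (strength_schedule * Target Steps), the formula above
    return target_steps - (strength * target_steps)

print(effective_steps(50, 0.65))  # 17.5 - only ~17 fresh diffusion steps per frame
print(effective_steps(50, 0.0))   # 50.0 - no previous-frame influence, full steps
```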

• scale, a measurement of how much enforcement to apply to the
overall prompt. A normal range of 7-10 is appropriate for most
scenes; however, some styles and art will require more extreme
values. At scale values below 3, the model will loosely impose
a prompt, with many areas skipped and left uninteresting or
simply grayed out. Values higher than 25 may over-enforce a
prompt, causing extreme oversaturation, artifacts
and unbalanced details. For some use-cases this might be a
desirable effect. During some animation modes, having a scale
that is too high may trend color in a direction that causes
bias and overexposed output.
• ddim_eta, ONLY enabled in ddim sampler mode, controls a
ratio of ddim to ddpm sampling methods, with a range of -1 to
+1, 0 being the most deterministic.
SAVE & DISPLAY SETTINGS:

• save_samples, will save output images to the specified drive,
including cadence frames
• save_settings, will save a snapshot .txt of all settings used to
start a run with a timestamp
• display_samples, shows on-screen image of the completed
output
• save_sample_per_step, Users may now choose to view
intermediate steps of a frame, as well as the option to save
those steps as output images to drive. This powerful feature
may use a lot of drive space, and browser cache if many steps
are used in long renders. 500 steps will display/save 500
images.
• show_sample_per_step, visually show this in the notebook.

PROMPT SETTINGS

A numerical prompt weight feature has been added to Deforum as a
selectable feature. When enabled, the run will interpret the values
and weights syntax of the prompt for better control and token
presence. Bonus: weight values adhere to MATH expressions for
even more control.

• prompt_weighting, no documentation; the assumption is that it
turns the ability to use weights in prompts on.
• normalize_prompt_weights, ??
• log_weighted_subprompts, ??

BATCH SETTINGS:

This feature allows you to run a batch on your prompt or prompts
and have it generate various images with different seed values, and
then output this as a grid if enabled.
You can play around with this value and also adjust the
seed_behavior to see how the results turn out. Remember, this is all
about experimenting, so take the time to test this feature out.

Settings Details:

• n_batch, produces n outputs per prompt in ‘none’
animation mode
• batch_name, will create a folder and save output content to
that directory location
• seed_behavior, will perform progressive changes on the seed
starting point based on settings:
o Iter = incremental change (ex 77, 78, 79, 80, 81, 82, 83…)
o Fixed = no change in seed (ex 33, 33, 33, 33, 33, 33…)
o Random = randomized seed (ex 472, 12, 927812, 8001,
724…)
o Note: seed -1 will choose a random starting point,
following the seed behavior thereafter
o Troubleshoot: a “fixed” seed in 2D/3D mode will over-
bloom your output. Switch to “iter”
• make_grid, will take still frames and stitch them together
in a preview grid
• grid_rows, arrangement of images set by make_grid

INIT SETTINGS:

You can use your own custom images to help guide the model to try
to mimic more of the look and feel from the image you want.

• use_init, uses a custom image as a starting point for diffusion
• strength, determines the presence of an init_image/video on a
scale of 0-1, with 0 being full diffusion, and 1 being full init
source.
Note: even with use_init unchecked, video input is still affected.

• init_image, location of an init_image to be used
Note: in ‘none’ animation mode, a folder of images may be
referenced here.

• use_mask, adds an image with instructions as to which part of
an image to diffuse, by greyscale
• mask_file, location of the mask image to be used
• invert_mask, ranges the greyscale of a mask from “0 to 1” into
“1 to 0”
• mask_brightness_adjust, changes the value floor of the mask,
controlling diffusion overall
• mask_contrast_adjust, clamps min/max values of the mask to
limit areas of diffusion
Note: lighter areas of the mask = no diffusion, darker areas enforce
more diffusion

Examples to come showing off the init images, init videos and mask
features!

CREATE VIDEO FROM FRAMES


• skip_video_for_run_all, when running all cells in this notebook,
video construction will be skipped until this is manually checked
and the cell is re-run. It is off by default.
• fps, frame rate at which the video will be rendered
• image_path, location of images intended to be stitched in
sequence. The user must update this parameter to reflect the
timestamp needed.
• mp4_path, location to save the resulting video to
• max_frames, the quantity of images to be prepared for
stitching

MISCELLANEOUS
I want to run DSD on my super powerful home PC with the wicked
smart graphics card.

Info coming soon, but I highly recommend the Visions of Chaos
Windows app, the Swiss Army knife of ML and AI art scripts! I
should have a tutorial on this soon. If you do go this route, follow
the instructions verbatim!

GETTING YOUR OUTPUT

DSD will store your images and videos in your Google Drive at:

\My Drive\AI\StableDiffusion\<date>\<folder name based upon
the batch_name setting>

You can browse to this directory in a second window to monitor
progress, and download the entire folder when your project is
complete.

That’s all folks!

RESOURCES
Here are some useful links:
DEFORUM STABLE DIFFUSION NOTEBOOK

https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb

DISCORD

Deforum Stable Diffusion User Discord (JOIN THIS!)

DEFORUM STABLE DIFFUSION OFFICIAL DOCS

Most of our guide is a combination of these, which usually have all
the latest info.

• DSD Settings Guide
• DSD v0.5 Details and Explanations
• Math Guide
Credit is due to the respective authors.

STABLE DIFFUSION PARAMETER STUDIES:

• Stable Diffusion Ultimate Beginners Guide
• SD Guide for Artists and Non-Artists

MODIFIERS

• CFG Studies
• Sampler Studies
• Sampler / Step Count Comparison with timing info
• Stylistic Lighting Studies
• Math Guide for Animation Settings

ARTIST STUDIES

• Disco Diffusion Artists Study
• Stable Diffusion - Micro Art Studies
• WIP list of artists for SD v1.4
• SD Artist Collection

STYLE STUDIES / TEXT PROMPTS

• Trending on Artstation and other Myths
• Test seeds, clothing and clothing modifications
