Image-to-Image Chains
This page explains how to build image-to-image workflows using chains on Flush.
Unlike text-to-image models, image-to-image models take both a prompt and an image as input. We discuss this in more detail here. Below, we'll work through two more complex examples: passing a generated image between models in a chain, and passing an LLM's text output as the prompt for an image-to-image model.
from flushai import Chain
Example 1: Images as Input
Let’s say we want to generate an image with a text-to-image model. However, we are not satisfied with its quality, so we want to use an upscaler to refine the generated image.
First, let’s initialize our models. We’ll be using Stable Diffusion XL as the text-to-image model and RealESRGAN as the upscaler.
from flushai.models.diffusion.img2img import StableDiffusionXL
from flushai.models.diffusion.img2img.upscalers import RealESRGAN
diffusion = StableDiffusionXL(api_key="YOUR_API_KEY")
upscaler = RealESRGAN(api_key="YOUR_API_KEY", scale=4)
Now, let’s define a prompt to pass into this chain with Flush AI’s PromptTemplate.
from flushai.prompts import PromptTemplate
prompt = '''a giant monster hybrid of {animal1} and {animal2},
in dark dense foggy forest'''
prompt_template = PromptTemplate(prompt)
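When the chain below is run with animal1="dragon" and animal2="spider", this template resolves to the prompt "a giant monster hybrid of dragon and spider, in dark dense foggy forest".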
Finally, let’s put everything together with a Chain.
params = {
    "num_images": 1
}

chain = Chain(
    # Generate an image from the formatted prompt.
    diffusion_output=(diffusion, prompt_template, params),
    # Upscale the first (and only) generated image.
    upscaler_output=(upscaler, "{diffusion_output[0]}")
)
Note that we pass in diffusion_output[0]. This is because diffusion_output is an array of image URL strings, while image-to-image models take a single image URL string as input.
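If you request more than one image, you can pick a different element of that array by its index. For instance, a hypothetical variation of the chain above that generates two candidates and upscales the second:

chain = Chain(
    # Generate two candidate images instead of one.
    diffusion_output=(diffusion, prompt_template, {"num_images": 2}),
    # Upscale the second candidate.
    upscaler_output=(upscaler, "{diffusion_output[1]}")
)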
We can then run our original chain with the following call.
result = chain.run(animal1="dragon", animal2="spider")
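If you want to save the upscaled image locally, one option is to download the returned URL, for example with the requests library. The structure of result below is an assumption (outputs keyed by the names passed to Chain); inspect the object returned in your own environment.

import requests

# Assumption: chain.run returns outputs keyed by the names given to Chain(...).
# Depending on the model, a value may be a single URL string or a list of URLs.
upscaled = result["upscaler_output"]
upscaled_url = upscaled[0] if isinstance(upscaled, list) else upscaled

# Download the upscaled image and write it to disk.
with open("upscaled_monster.png", "wb") as f:
    f.write(requests.get(upscaled_url).content)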
Example 2: Text as Input
As shown previously here, we can use chains to improve prompts for text-to-image Stable Diffusion models. Let’s do the same, but for image-to-image models.
First, let’s initialize our models. We’ll be using Flush’s wrapper for OpenAI’s GPT-4 as the LLM and Stable Diffusion XL as the image-to-image model.
from flushai.models.llms import OpenAI
from flushai.models.diffusion.img2img import StableDiffusionXL
llm = OpenAI(model_name="gpt-4", api_key="YOUR_OPENAI_API_KEY")
diffusion = StableDiffusionXL(api_key="YOUR_API_KEY")
Let’s use the same PromptTemplate from the text-to-image example:
from flushai.prompts import PromptTemplate
prompt = '''
(subject of the image), (5 descriptive keyword), (camera type),
(camera lens type), (time of day), (style of photography),
(type of film), (Realism Level), (Best type of lighting for the subject).
Based on the above structure, create a detailed narrative of the scene in 20
words. Generate only 1 variation. Return strictly only the narrative. Subject
of the prompt is: {subject}
'''
prompt_template = PromptTemplate(prompt)
Now, let’s put everything together with our Chain.
params = {
    "image": "https://flush-user-images.s3.amazonaws.com/generated_images/8fe20804-c677-491c-be3f-262fe0c3653a/image_168.jpg",
    "num_images": 1
}
chain = Chain(
    # Ask GPT-4 to expand the subject into a detailed prompt.
    llm_output=(llm, prompt_template),
    # Transform the input image using the LLM-generated prompt.
    diffusion_output=(diffusion, "{llm_output}", params)
)
result = chain.run(subject="cowboy")
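The two patterns also compose into a single chain: the LLM refines the prompt, the diffusion model transforms the input image with it, and an upscaler refines the result. The sketch below reuses the models, parameters, and chain syntax from the examples above, and assumes the RealESRGAN upscaler from Example 1 is also initialized; treat it as a sketch rather than a verified recipe.

from flushai.models.diffusion.img2img.upscalers import RealESRGAN

upscaler = RealESRGAN(api_key="YOUR_API_KEY", scale=4)

chain = Chain(
    # GPT-4 expands the subject into a detailed narrative prompt.
    llm_output=(llm, prompt_template),
    # Stable Diffusion XL transforms the input image using that prompt.
    diffusion_output=(diffusion, "{llm_output}", params),
    # RealESRGAN upscales the first generated image.
    upscaler_output=(upscaler, "{diffusion_output[0]}")
)

result = chain.run(subject="cowboy")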