Unlike text-to-image models, image-to-image models take in both a prompt and an image as input. We discuss this more here. Now, we'll look at both model types in more complex, chained examples.

from flushai import Chain

Example 1: Images as Input

Let’s say we want to generate an image with a text-to-image model. However, we are not satisfied with its quality, so we want to use an upscaler to refine the generated image.

First, let’s initialize our models. We’ll be using Stable Diffusion XL as the text-to-image model and RealESRGAN as the upscaler.

from flushai.models.diffusion.text2img import StableDiffusionXL
from flushai.models.diffusion.img2img.upscalers import RealESRGAN

diffusion = StableDiffusionXL(api_key="YOUR_API_KEY")
upscaler = RealESRGAN(api_key="YOUR_API_KEY", scale=4)

Now, let’s define a prompt to pass into this chain with Flush AI’s PromptTemplate.

from flushai.prompts import PromptTemplate

prompt = '''a giant monster hybrid of {animal1} and {animal2},
in dark dense foggy forest'''
prompt_template = PromptTemplate(prompt)
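Conceptually, a prompt template holds placeholders in braces and fills them in with keyword arguments when the chain runs. As a rough, illustrative sketch of that behavior using only Python's built-in `str.format` (not Flush AI's actual implementation, and `MiniPromptTemplate` is a hypothetical name):

```python
# Illustrative only: a minimal stand-in for how a prompt template
# might substitute variables. Flush AI's PromptTemplate may differ.
class MiniPromptTemplate:
    def __init__(self, template: str):
        self.template = template

    def render(self, **variables) -> str:
        # str.format fills each {placeholder} with the matching keyword
        return self.template.format(**variables)

template = MiniPromptTemplate(
    "a giant monster hybrid of {animal1} and {animal2},\n"
    "in dark dense foggy forest"
)
print(template.render(animal1="dragon", animal2="spider"))
```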

Finally, let’s put everything together with a Chain.

params = {
    "num_images": 1
}

chain = Chain(
    diffusion_output = (diffusion, prompt_template, params),
    upscaler_output = (upscaler, "{diffusion_output[0]}")
)

Note that we pass in diffusion_output[0]. This is because diffusion_output is a list of image URL strings, while image-to-image models accept only a single image URL string.
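The `"{diffusion_output[0]}"` syntax mirrors standard Python format-string indexing, where a placeholder can index into a sequence. A small sketch of how such a reference could resolve against a dict of step outputs (illustrative only; the URLs and dict shape are made up, not Flush AI internals):

```python
# Illustrative only: how a format string like "{diffusion_output[0]}"
# can pull one URL out of a list of generated-image URLs.
outputs = {
    "diffusion_output": [
        "https://example.com/image_1.png",
        "https://example.com/image_2.png",
    ]
}

# str.format supports indexing into a sequence inside the placeholder
first_url = "{diffusion_output[0]}".format(**outputs)
print(first_url)
```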

We can run this chain with the following command.

result = chain.run(animal1="dragon", animal2="spider")

Example 2: Text as Input

As shown previously here, we can use chains to improve prompts for text-to-image Stable Diffusion models. Let's do the same, but for image-to-image models.

First, let's initialize our models. We'll be using Flush AI's OpenAI GPT-4 wrapper as the LLM and Stable Diffusion XL as the image-to-image model.

from flushai.models.llms import OpenAI
from flushai.models.diffusion.img2img import StableDiffusionXL

llm = OpenAI(model_name="gpt-4", api_key="YOUR_OPENAI_API_KEY")
diffusion = StableDiffusionXL(api_key="YOUR_API_KEY")

Let’s use the same PromptTemplate from the text-to-image example:

from flushai.prompts import PromptTemplate

prompt = '''
(subject of the image), (5 descriptive keywords), (camera type), 
(camera lens type), (time of day), (style of photography), 
(type of film), (Realism Level), (Best type of lighting for the subject).

Based on the above structure, create a detailed narrative of the scene in 20 
words. Generate only 1 variation. Return strictly only the narrative. Subject 
of the prompt is: {subject}
'''

prompt_template = PromptTemplate(prompt)

Now, let’s put everything together with our Chain.

params = {
    "image": "https://flush-user-images.s3.amazonaws.com/generated_images/8fe20804-c677-491c-be3f-262fe0c3653a/image_168.jpg",
    "num_images": 1
}

chain = Chain(
    llm_output = (llm, prompt_template),
    diffusion_output = (diffusion, "{llm_output}", params)
)

result = chain.run(subject="cowboy")
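To see the data flow end to end, here is a toy two-step pipeline with stub functions in place of the real models. Everything here is hypothetical (the `fake_llm` and `fake_img2img` names, the URLs), meant only to show how the first step's output becomes the second step's prompt, as the Chain above does:

```python
# Illustrative only: a toy two-step "chain" with stub models,
# mimicking the LLM -> image-to-image flow above.
def fake_llm(subject: str) -> str:
    # stands in for GPT-4 expanding the subject into a full prompt
    return f"detailed narrative about a {subject}"

def fake_img2img(prompt: str, image: str) -> list:
    # stands in for Stable Diffusion XL; returns a list of image URLs
    return [f"https://example.com/generated/{prompt.replace(' ', '_')}.png"]

def run_chain(subject: str, image: str) -> list:
    llm_output = fake_llm(subject)           # step 1: improve the prompt
    return fake_img2img(llm_output, image)   # step 2: generate from it

urls = run_chain("cowboy", "https://example.com/input.jpg")
print(urls[0])
```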