Example 1: Prompt Enhancement

Many complex image generation workflows use LLMs to substantially improve prompts before they reach Stable Diffusion. Below, we show a simple example of how to use an LLM to enhance a starter prompt and then generate images with a text-to-image model through Flush.

from flushai import Chain

First, let's initialize our models. We'll use Flush's wrapper for OpenAI's GPT-4 as the LLM and Stable Diffusion XL as the text-to-image model.

from flushai.models.llms import OpenAI
from flushai.models.diffusion.text2img import StableDiffusionXL

llm = OpenAI(model_name="gpt-4", api_key="YOUR_OPENAI_API_KEY")
diffusion = StableDiffusionXL(api_key="YOUR_API_KEY")
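
If you prefer not to hard-code keys, you can read them from environment variables before passing them to the same constructors shown above. This is a common convention rather than a Flush requirement, and the variable names below are placeholders:

import os

# Read keys from the environment instead of hard-coding them. OPENAI_API_KEY
# and FLUSH_API_KEY are placeholder names used only for this example.
llm = OpenAI(model_name="gpt-4", api_key=os.environ["OPENAI_API_KEY"])
diffusion = StableDiffusionXL(api_key=os.environ["FLUSH_API_KEY"])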

Now, let’s define a prompt to pass into this chain. We use Flush’s prompt templates to instruct GPT-4 to produce a Stable Diffusion prompt from just a topic:

from flushai.prompts import PromptTemplate

prompt = '''
(subject of the image), (5 descriptive keywords), (camera type), 
(camera lens type), (time of day), (style of photography), 
(type of film), (Realism Level), (Best type of lighting for the subject).

Based on the above structure, create a detailed narrative of the scene in 20 
words. Generate only 1 variation. Return strictly only the narrative. Subject 
of the prompt is: {subject}
'''

prompt_template = PromptTemplate(prompt)
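
The {subject} placeholder behaves like a Python f-string/str.format field. As a quick sanity check (an illustration only; the actual PromptTemplate internals may differ), we can fill the raw prompt string ourselves:

# Illustration only: fill the {subject} placeholder the same way str.format
# would. This is not how PromptTemplate is implemented internally.
print(prompt.format(subject="urban photography"))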

This way, we can pass a topic into the subject parameter of our chain, and GPT-4 will generate a prompt optimized for Stable Diffusion. Now, let’s put everything together with our Chain.

params = {
    "num_images": 1
}

chain = Chain(
    llm_output = (llm, prompt_template),
    diffusion_output = (diffusion, "{llm_output}", params)
)

In the chain constructed above, llm_output is the output from the LLM (refer here for more information about formatting chains). It is then passed as the prompt to the Stable Diffusion model, and diffusion_output is the final output returned from the chain.
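
Conceptually, running this chain amounts to the following sketch (illustrative only, not the library’s actual implementation; generate_text and generate_image stand in for the GPT-4 and Stable Diffusion XL calls):

# Conceptual sketch of the data flow described above (not the real Chain code).
def run_chain_sketch(subject, generate_text, generate_image):
    llm_prompt = prompt.format(subject=subject)        # fill {subject}
    llm_output = generate_text(llm_prompt)             # enhanced prompt from the LLM
    diffusion_prompt = "{llm_output}".format(llm_output=llm_output)
    return generate_image(diffusion_prompt, num_images=1)  # final image(s)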

Note that the first model in the chain can take either a PromptTemplate or a string, but every subsequent prompt must be a string in PromptTemplate format, i.e. using Python f-string-style placeholders. We can also append to the previous llm_output in this prompt if desired:

params = {
    "num_images": 1
}

chain = Chain(
    llm_output = (llm, prompt_template),
    diffusion_output = (diffusion, "{llm_output}, realistic, 4k", params)
)

We can then run this chain with the following command:

result = chain.run(subject="urban photography")

Note that we pass “urban photography” as the subject because of the {subject} placeholder in the prompt template we defined above. This gives us the following output:

This is a significant improvement over what we would get by passing “urban photography” directly as the prompt to Stable Diffusion XL: