New to AI image generation and prompt engineering? Feeling lost? Welcome!
We’ve all been there. Writing prompts for artificial intelligences is not easy. Everything feels overwhelming and new.
That’s why we’ve created this comprehensive Stable Diffusion prompt guide (also featuring other AI models like Midjourney and DALL-E!).
This guide contains everything you need to know to get you from absolute zero to creating amazing images, including Stable Diffusion tips and plenty of Stable Diffusion examples!
So sit back, relax, and enjoy your prompting experience!
💡 Pro tip: if you really want to go from zero to hero in prompt engineering, check out our crash course in AI image generation!
- «Alexa, play “Despac-”» – Your first prompt
- General prompting
- «What does it mean?» – Glossary
You may want to check out first: What is Artificial Intelligence? · AI generated images · Stable Diffusion vs DALL-E 2 vs Midjourney
«Alexa, play “Despac-”» – Your first prompt
Let’s get straight down to business: you have picked one of the AIs, you have an idea in mind that you want the AI to capture, you write the prompt, you press the button and…
You had the whole thing planned in your head, and by the pictures you’ve seen, everyone is having great results. Why not you?
Same, dude. Same.
Let me guess, you have written something like “a green valley with mountains at the back”, am I right? You haven’t specified anything else. It’s a raw prompt. That’s ok, been there, done that. Something you need to keep in mind is that the AI is not a mind-reader (yet), so it’s absolutely necessary that it understands what you’re trying to tell it.
Remember a few years ago, when the song “Despacito” was EVERYWHERE? Unless you were a Spanish speaker, the lyrics were incomprehensible. However, everyone around the world was singing along to the tune and asking for it in every single pub, without really caring about the lyrics or ever getting them right. Something similar happens when prompting: you will get an output from the AI, but at first it won’t be as accurate as you would like, because you can hardly communicate with the AI and the dialogue is, well, incomprehensible. Then again, you will eventually get what you were aiming for, once you learn what to ask the AI and what not to. Luckily for you, that’s what this guide is all about.
When it comes to prompting, there’s more than one way to skin a cat, although there are a few tricks everyone uses.
We’ve talked about the raw prompt before. It’s the simplest way of describing what you want to generate, like:
- A dog.
- A knight on a horse.
This is the most basic building block for any prompt. Most newcomers start by using only raw prompts, just like you and I did. However, this can be a mistake, as the images you generate this way tend to be random and chaotic. These are the images I’ve generated using the prompts above:
As you can see, these images have random scenery and don’t look very aesthetically pleasing. They could pass as basic concept art, but we both know we could do better.
And that brings me to my next point…
Style is one of the most crucial parts of the prompt. When no style is specified, the AI usually chooses the one it has seen most often in related images (as with the dog above, for example). A well-chosen style plus a raw prompt is sometimes enough, as the style influences the image the most, right after the raw prompt.
Here are a few of these styles. Why don’t you give them a go?
If, however, there’s a specific artist you like, why don’t you drop their name into the prompt? What could go wrong? 😉
How to be descriptive
This could easily be the most difficult part. Ironic, huh? But it’s true: we sometimes find ourselves struggling because we have an image in our heads but not enough words, or not accurate enough words, to describe it. As a result, the AI gives us an image that may be close to our idea, but not quite there.
Here are some tips that may help you get past this block:
- Order matters! Words near the front of your prompt are weighted more heavily than those at the back.
- If you’re still using the word “very” before any other word, STOP IT. IMMEDIATELY. Try to find a more precise word instead of adding “very” to everything in order to emphasize it. There is this website that might help you out with this.
- Try to follow these steps: content type > description > style > composition.
- Content type: What type of artwork do you want to achieve? Is it a photograph, drawing, sketch, 3D render...?
- Description: define the subject, subject attributes, environment/scene. The more descriptive you are with the use of adjectives, the better the output.
- Style: we’ve seen the most common ones above, but there are also “sub-categories”: lighting, detail…
- Composition: it refers to aspect ratio, camera view and resolution.
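To make the content type > description > style > composition order concrete, here is a minimal Python sketch that glues the four parts together in that order. The function name `build_prompt` and the example values are made up for illustration. They are not part of Stable Diffusion or any other tool, and the comma-separated format is just a common convention.

```python
def build_prompt(content_type, description, style=(), composition=()):
    """Join prompt parts in the order: content type > description > style > composition."""
    parts = [content_type, description, *style, *composition]
    # Drop empty pieces and strip stray whitespace so no blank entries sneak in.
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return ", ".join(cleaned)


# Hypothetical example values, reusing keywords from this guide.
prompt = build_prompt(
    content_type="oil painting",
    description="a green valley with snow-capped mountains at the back",
    style=["dramatic lighting", "highly detailed"],
    composition=["wide-angle", "8k uhd"],
)
print(prompt)
```

Since words near the front tend to carry more weight, keeping the parts in a fixed order like this makes it easier to experiment by swapping one piece at a time.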
Finally, there are some words that can improve your prompt and, obviously, the image you’re going to get. These could be considered final touches, and you can add as many as you want, however random, but here are a few examples:
Related to: lighting
accent lighting, ambient lighting, backlight, blacklight, blinding light, candlelight, concert lighting, crepuscular rays, direct sunlight, dusk, Edison bulb, electric arc, fire, fluorescent, glowing, glowing radioactively, glow-stick, lava glow, moonlight, natural lighting, neon lamp, nightclub lighting, nuclear waste glow, quantum dot display, spotlight, strobe, sunlight, ultraviolet, dramatic lighting, dark lighting, soft lighting, gloomy
Related to: detail
highly detailed, grainy, realistic, unreal engine, octane render, bokeh, vray, houdini render, quixel megascans, depth of field (or dof), arnold render, 8k uhd, raytracing, cgi, lumen reflections, cgsociety, ultra realistic, volumetric fog, overglaze, analog photo, polaroid, 100mm, film photography, dslr, cinema4d, studio quality
Related to: artistic techniques and materials
Digital art, digital painting, color page, featured on pixiv (for anime/manga), trending on artstation, precise line-art, tarot card, character design, concept art, symmetry, golden ratio, evocative, award winning, shiny, smooth, surreal, divine, celestial, elegant, oil painting, soft, fascinating, fine art
Related to: camera view and quality
ultra wide-angle, wide-angle, aerial view, massive scale, street level view, landscape, panoramic, bokeh, fisheye, dutch angle, low angle, extreme long-shot, long shot, close-up, extreme close-up, highly detailed, depth of field (or dof), 4k, 8k uhd, ultra realistic, studio quality, octane render
Related to: style and composition
Surrealism, trending on artstation, matte, elegant, illustration, digital paint, epic composition, beautiful, the most beautiful image ever seen
Related to: colours
Triadic colour scheme, washed colour
«What does it mean?» – Glossary
We have covered quite a lot of info about AIs, with loads of specific vocabulary. To recap:
Model: machine learning model, deep learning model, AI model, statistical model… they all mean the same thing. The model is just a mathematical expression that takes something as input and spits out something as output. An AI is just a mathematical model. It tries to replicate (that is: models) something in the real world, whether that’s raw data, pictures, music, etc. In AI image generation, the model takes text as an input and spits out images as output.
Guidance Scale/CFG (Classifier Free Guidance) Scale: it adjusts how much the image will be like your prompt. Higher values keep your image closer to your prompt.
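If you’re curious about what the scale does under the hood, the usual description of classifier-free guidance is that the model predicts the noise twice, once with your prompt and once without it, and then pushes the result away from the “no prompt” prediction. Here is a toy sketch in Python. The function name `apply_guidance` and the numbers are made up for illustration, and real tools do this on large tensors rather than short lists.

```python
def apply_guidance(uncond, cond, scale):
    """Blend two noise predictions with the classifier-free guidance formula.

    uncond: prediction made without the prompt
    cond:   prediction made with the prompt
    scale:  the guidance (CFG) scale
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up toy predictions, three values each.
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 0.0]

print(apply_guidance(uncond, cond, 0.0))  # ignores the prompt entirely
print(apply_guidance(uncond, cond, 1.0))  # plain prompt-conditioned prediction
print(apply_guidance(uncond, cond, 7.5))  # exaggerates the prompt's influence
```

The higher the scale, the further the result moves in the direction the prompt suggests, which is why higher values keep the image closer to your prompt.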
Diffusing: the mechanism used by AI image generation models to generate images. In a nutshell: the AI starts with an image that consists entirely of random noise, and step by step it tries to remove the noise until the final image is created. The noise that is removed in every step is conditioned by the prompt given; that’s how you end up with a clear image and not with just random noise.
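The “start from noise, remove a bit at every step” idea can be shown with a toy loop. This is only an illustration under heavy assumptions, not how Stable Diffusion is implemented: the hypothetical `target` list stands in for “what the prompt describes”, and the “predicted noise” is computed by cheating (comparing against the target), whereas a real model uses a trained neural network and never sees the final image. The names `toy_diffuse`, `remaining_noise` and `strength` are made up.

```python
import random


def toy_diffuse(target, steps, seed=0, strength=0.5):
    """Start from random noise and remove part of the noise at every step."""
    rng = random.Random(seed)
    # Step 0: an "image" made entirely of random noise.
    image = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(steps):
        # Stand-in for the model: the noise is whatever separates us from the target.
        predicted_noise = [x - t for x, t in zip(image, target)]
        # Remove a fraction of that noise.
        image = [x - strength * n for x, n in zip(image, predicted_noise)]
    return image


def remaining_noise(image, target):
    """Largest remaining difference between the image and the target."""
    return max(abs(x - t) for x, t in zip(image, target))


target = [0.2, 0.8, 0.5, 0.1]
for steps in (1, 5, 20):
    result = toy_diffuse(target, steps)
    print(steps, remaining_noise(result, target))
```

In this toy version, more steps leave less noise behind, which mirrors the idea that more steps usually cost more time but give a cleaner image.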
Open-source: technology whose source code is publicly available. Anyone can access the source code and read it. Depending on the open-source license used by each project, the technology might be modified, redistributed or available for commercial and non-commercial uses. Stable Diffusion is an open-source technology. It means everyone can see its source code, modify it, and launch new things based on it.
Prompt: the description of the image the AI is going to generate.
Render: the act of transforming an abstract representation of an image into a final image. In 3D modelling, if you’re creating a 3D model, that’s just polygons and mathematical equations. To get an actual image out of them, you need to render it (which involves calculating shadows accurately, computing how light reflects off surfaces and what colors it generates in doing so, etc.). Technically speaking, this is not what Stable Diffusion does. This is the old way. Stable Diffusion diffuses an image, rather than rendering it.
Sampler: the diffusion sampling method.
Sampling Method: this is quite a technical concept. It’s an option you can choose when generating images in Stable Diffusion. In short: the output looks more or less the same no matter which sampling method you use. The differences are very subtle, and it shouldn’t matter much which one you select. Some people say there are three groups: group A (DDIM, Euler, DPM2, HEUN, LMS, DPM_adaptive and PLMS) is softer and more artsy; group B (DPM_fast) gives more varied and random results; and group C (DPM2, Euler_a) gives results that are a bit more photorealistic and clear. To recap: if you want soft and artsy, you could use DPM_adaptive or DDIM; if you want variety, go for DPM_fast; and if you’re looking for photorealism, try DPM2 or Euler_a.
Seed: used to limit randomness. Generations with the same prompt, parameters and seed will result in the same image.
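A quick way to see why a seed makes results repeatable is to look at the random noise an image starts from. The sketch below uses Python’s standard `random` module as a stand-in for the noise generator inside an image model. The function name `starting_noise` is made up, and real tools use their own random number generators.

```python
import random


def starting_noise(seed, size=4):
    """Generate the same 'random' starting noise every time for a given seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = starting_noise(seed=42)
same_b = starting_noise(seed=42)
different = starting_noise(seed=43)

print(same_a == same_b)     # same seed, same starting noise
print(same_a == different)  # another seed gives different noise
```

With the same starting noise, the same prompt and the same parameters, the rest of the process is repeated identically, so you get the same image back.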
Steps: how many steps to spend generating (diffusing) your image. More steps generally mean higher image quality, but also more time to generate.
Text-to-image: A type of AI, like Stable Diffusion, that takes text prompts as input and outputs images.
You may also be interested in what artificial intelligence is, AI generated images, or a Stable Diffusion vs DALL-E 2 vs Midjourney comparison.