Over the past few weeks and months, many creative AI tools have seen the light of day. Tools that create images from text, convert text to speech, translate, and even turn text into video: we’ve seen them all. So which tools are out there, and how can they help your creative and design processes? Let’s take a look!
Text to images
As I mentioned in the intro, the most popular tools of the past few weeks have definitely been the text-to-image tools. Three of them have had major publicity: DALL-E, Midjourney and Stable Diffusion. While all three do the same thing, they each do it in a slightly different way, and some are more user-friendly than others.
OpenAI’s DALL-E (version 2)
Named after the painter Salvador Dalí and Pixar’s WALL-E, DALL-E is an AI that can create images and art from natural language. Basically, it does what most text-to-image generation tools do. What makes DALL-E special, however, is its ability to make changes to existing images. Through the very friendly user interface, you can erase any parts of an image you don’t like. Then you type your prompt as normal, and DALL-E will attempt to fill in the erased parts with whatever you’ve asked for. This lets you customise images very easily and makes image generation much more versatile.
You can try DALL-E directly on its website: https://openai.com/dall-e-2/, which makes it very easy to get started. The first few tries are free, and you get additional free credits each month. You can also pay for more credits via the website. For more advanced usage, you can call the API directly to create images.
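For the developers among you, here is a rough idea of what such an API call might look like. This is a minimal sketch, not an official client: it assumes OpenAI’s image generation endpoint and its `prompt`, `n` and `size` fields, and `build_image_request` and `YOUR_API_KEY` are placeholders I made up for illustration. Check the current API documentation before relying on any of it. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: OpenAI's image generation endpoint. Verify against the current docs.
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt, api_key, n=1, size="1024x1024"):
    """Build (but do not send) a POST request asking for `n` generated images."""
    payload = {"prompt": prompt, "n": n, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder; a real key comes from your OpenAI account.
request = build_image_request(
    "a person under an umbrella in the rain",
    api_key="YOUR_API_KEY",
)

# Sending is left out on purpose. With a real key it would look roughly like:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the request building separate from the sending makes it easy to inspect what you are about to submit before spending any credits.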
Some example images generated/edited via DALL-E:
Midjourney
Ever since an image generated by Midjourney won an art contest, the tool has been heavily praised for the images and digital art it can generate. And with the recent introduction of the ‘v4 algorithm’, the results have become even more impressive. With the previous algorithm, it took a lot more trial and error to get the results you needed. That all changes with v4, where it almost becomes too easy to get the results you want. Let’s quickly compare the two using the same prompt: ‘a person under an umbrella in the rain, in a park with trees sitting on a park bench’.
Aren’t those results completely insane? The v4-generated images may as well have been handcrafted by a designer. Apart from some small details here and there, they are almost completely realistic.
Using Midjourney is also incredibly easy, although it takes a different approach than DALL-E. Midjourney is primarily used through its Discord bot, to which you send your image prompts. You automatically get a message once your image is generated, along with further options for it. While Midjourney does not offer the ability to erase part of an image and regenerate it, it has something else going for it. As you can see from the images above, I can generate images with different algorithms. There are more options I can pass along with my prompt, too: I can specify the aspect ratio, scale up certain images, give it an image to use as a reference, and more. In short, there are quite a lot of ways to tailor how Midjourney generates your images, and all of that allows for incredibly realistic results.
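To make those options a bit more concrete: Midjourney options are written as flags at the end of the prompt you send to the bot. The sketch below simply glues a prompt together. It assumes the `--ar` (aspect ratio) and `--v` (algorithm version) flags and the `/imagine` command; `build_imagine_prompt` is a helper name I made up, and nothing here talks to Discord.

```python
def build_imagine_prompt(description, aspect_ratio=None, version=None):
    """Append optional Midjourney-style flags to a text prompt."""
    parts = [description.strip()]
    if aspect_ratio:
        # Assumption: "--ar" sets the aspect ratio, e.g. "3:2"
        parts.append(f"--ar {aspect_ratio}")
    if version is not None:
        # Assumption: "--v" selects the algorithm version, e.g. 4
        parts.append(f"--v {version}")
    return " ".join(parts)


prompt = build_imagine_prompt(
    "a person under an umbrella in the rain, in a park with trees sitting on a park bench",
    aspect_ratio="3:2",
    version=4,
)

# This is the text you would paste into Discord.
print(f"/imagine prompt: {prompt}")
```

Leaving out `version` or `aspect_ratio` just sends the plain prompt, so Midjourney falls back to its defaults.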
You can get started with Midjourney through their website: https://www.midjourney.com/. It explains how to get the Discord bot going and provides more details on the options I described above. Similar to DALL-E, you get free images every month; beyond that, you can subscribe for faster processing and a larger number of images.
Stable Diffusion
The final text-to-image tool I want to address is Stable Diffusion. Unlike DALL-E and Midjourney, Stable Diffusion is open source and free to use both commercially and non-commercially. While that makes it more attractive to a certain group of people, it’s also more difficult to actually use. There is no ‘official’ way to generate images. Some tools are available, most notably DreamStudio (currently in beta), but the ecosystem around generating images isn’t yet as big as for the other two tools. In most cases you’ll likely set up your own environment and manage the images that way. For more info on that, you can go to Stable Diffusion’s GitHub: https://github.com/CompVis/stable-diffusion.
Here is the result for the same prompt I used in the other tools:
Something I feel is worth noting is Google’s effort in the text-to-image area. They recently released a research paper on their tool called ‘Imagen’. While the paper and published images look incredibly good, they haven’t yet made any tools available for the public to use. I did come across a GitHub repo (https://github.com/lucidrains/imagen-pytorc) that’s an implementation of Imagen. As with Stable Diffusion, you’ll need to manage the tech yourself to use it.
Music and sounds
Sound is everywhere, whether you need background music for a video or podcast or just want to listen to a song. Content isn’t the same without a piece of music to add an extra layer of mood.
With a tool called ‘Mubert’, you can easily generate any type of music or sound your heart desires. The UI is incredibly easy to use. You start by selecting a genre, mood or even activity. Then it’s a matter of setting the duration and you’re off to the races. Mubert is free for personal use, with a limit of 25 tracks per month. For more tracks per month, as well as professional use, you’ll need a subscription.
Another tool in the same style as Mubert is Amper Music. While you can use Mubert immediately out of the box, Amper Music requires a login before you can start creating. Sadly, if you actually want to download and use the generated music, you’ll have to pay for a license. I found the UI a bit weird at the beginning, but once you get the hang of it you realise there are more options available than you would expect. Again, you start by selecting a duration and genre, and an initial song is generated. Once that is done, you can listen to the song and adjust it to your liking by changing the instruments and where they are placed in the song, changing the key, or changing the genre itself. It definitely allows for more customisation than you get from other tools.
Speech, text and translations
Writing blogs is definitely my thing, but help is always welcome! I might use Grammarly for spelling, but that’s honestly it. There are however tools you can use to transform the way you write and translate texts, not to mention speech-to-text for easier writing and other applications.
Arguably the best tool out there right now when it comes to speech-to-text is OpenAI’s Whisper. Whisper comes in several model sizes, each trading accuracy against speed. The multilingual models can both transcribe speech and translate it into English. It’s basically your one-stop shop for turning speech into text, whether you’re translating a recording, using your voice to write rather than typing, or anything in between. Whisper is currently only available as code (a Python package and command-line tool), so I’m afraid it’s not that easy to use for non-developers yet. However, I’m certain there will be tools popping up soon that build on Whisper.
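If you are comfortable with a terminal, here is a small sketch of how a Whisper run could be put together. It assumes the open-source `whisper` command-line tool is installed and that it accepts the `--model` and `--task` options; `interview.mp3` and both helper functions are made-up names for illustration. The script only prints the command, and actually running it is left as an opt-in step.

```python
import shutil
import subprocess


def build_whisper_command(audio_path, model="base", translate=False):
    """Build the argument list for the whisper command-line tool."""
    # Assumption: "--task translate" produces English text, "transcribe" keeps the spoken language
    task = "translate" if translate else "transcribe"
    return ["whisper", audio_path, "--model", model, "--task", task]


def run_if_available(command):
    """Run the command only when the executable is installed; otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=True)


# "interview.mp3" is a hypothetical file name.
command = build_whisper_command("interview.mp3", model="small", translate=True)
print(" ".join(command))

# To actually run it (requires whisper to be installed and the file to exist):
# run_if_available(command)
```

Smaller models should run faster at the cost of accuracy, so it is worth starting small and only moving up when the transcript isn’t good enough.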
It’s a fairly technical paragraph title for an incredibly cool technology. I won’t try to explain it in too much detail; other people with in-depth knowledge can do that much better: https://www.youtube.com/watch?v=4Bdc55j80l8. In any case, it is the technology that makes it possible to generate human-readable text. There are several transformers out there that have proven to be incredibly good. I’d recommend trying one of the following: GPT-3, GPT-J-6B or GPT-NeoX-20B.
Each of these options comes with an easy-to-use playground for testing out the functionality. As with most tools, the first use is free; after that, you’ll have to pay a small fee for usage. The playground for each model does however give you all the options to see how it works and what it can do for you. Writing has never been this easy!
Conclusion — A lot of AI
I’m going to be honest here. I haven’t even scratched the surface of all the AI tools out there right now, let alone the ones coming out in the near future. AI is very much alive (hah) and we’ll be seeing so many cool developments that are going to be amazing for everyone to use, not just creatives.
Hopefully the above has at least given you a starting point with tools you could use, many of them right away and without too much hassle. They are incredibly fun to use, and of course very helpful in making your creative mind even more creative.
PS. If you feel I’ve missed something important, let me know!