How to Create Convincing Human Voices With Bark AI

Bark, a transformer-based text-to-audio model from Suno, can generate strikingly realistic human voices. It builds on recent advances in natural language processing, deep learning, and voice synthesis to reproduce the nuances and inflections of real human speech. That sets it apart from outright robotic-sounding options (such as Microsoft SAM) and from the more convincing but still noticeably "off" voices published by Google. Bark's output sounds far more like a real person, which makes it potentially suitable for applications such as audio narration, podcasts, and even video games. Bark also supports multiple languages and dialects, so it can be used in a variety of contexts around the world.

Note up front that Bark does not support voice cloning directly; that is a more complicated process that we'll cover in a later article. Instead, Bark is a simple, fire-and-forget option that can be ideal for smaller projects because it is easy to install and use. It installs in a minute or two in almost any container and works out of the box with minimal configuration.

Installing Bark on RunPod

Bark is not particularly demanding on resources; to install it, I simply added it to a text generation pod I already had at hand. If you're setting up a pod from scratch, a plain PyTorch pod will do just fine. As long as your pod has at least 12 GB of VRAM (which covers almost every option available on RunPod), you're good to go. Also make sure the container disk (not the volume disk) has about 10 GB of free space so that you can download the model.
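Before installing, you may want to confirm that the pod actually meets these requirements. The sketch below is an illustration of my own, not part of Bark: it reads total VRAM through PyTorch, checks free disk space in the home directory (where, as far as I know, Bark caches its model files by default), and suggests the SUNO_USE_SMALL_MODELS and SUNO_OFFLOAD_CPU environment flags that the Bark README describes for GPUs with less memory (the README cites roughly 8 GB for the small models). The helper names and thresholds are assumptions; check the repo in case the flags have changed.

```python
import os
import shutil

# Thresholds (assumptions): ~12 GB VRAM for Bark's full models, ~8 GB for the
# small models, and ~10 GB of free disk for the model download.
FULL_MODEL_VRAM_GB = 12
SMALL_MODEL_VRAM_GB = 8
REQUIRED_DISK_GB = 10


def gpu_vram_gb():
    """Total memory of CUDA device 0 in GB, or 0.0 if no GPU (or no PyTorch)."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


def free_disk_gb(path=None):
    """Free space in GB on the filesystem holding `path` (default: home dir)."""
    path = path or os.path.expanduser("~")
    return shutil.disk_usage(path).free / 1024**3


def bark_env_for(vram_gb):
    """Suggest Bark environment flags for the given amount of VRAM."""
    if vram_gb >= FULL_MODEL_VRAM_GB:
        return {}  # full-size models fit on the GPU
    if vram_gb >= SMALL_MODEL_VRAM_GB:
        return {"SUNO_USE_SMALL_MODELS": "True"}
    # Very little (or no) VRAM: small models, offloaded to the CPU between uses
    return {"SUNO_USE_SMALL_MODELS": "True", "SUNO_OFFLOAD_CPU": "True"}


if __name__ == "__main__":
    vram, disk = gpu_vram_gb(), free_disk_gb()
    print(f"GPU VRAM: {vram:.1f} GB, free disk: {disk:.1f} GB")
    if disk < REQUIRED_DISK_GB:
        print("Warning: under ~10 GB free; the model download may fail.")
    # Bark reads these flags when it is first imported, so set them beforehand.
    os.environ.update(bark_env_for(vram))
```

Run it in the same notebook or process before importing bark; setting the flags after the import has no effect on an already-loaded module.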

The installation is the best part: it's a single line. Run it from your preferred directory in the web terminal, or in a Jupyter Notebook cell (where you can write %pip install instead of pip install).

pip install git+https://github.com/suno-ai/bark.git

Then, copy and paste the following into a Jupyter notebook and run it:

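from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
from IPython.display import Audio

# Download (on the first run) and load all Bark models into memory
preload_models()

# Text to speak; bracketed cues such as [laughs] are rendered as non-speech sounds
text_prompt = """
Hello, my name is Suno. And, uh — and I like pizza. [laughs]
But I also have other interests such as playing tic tac toe.
"""
# Generate the waveform as a NumPy array
audio_array = generate_audio(text_prompt)

# Save it to disk as a WAV file
write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)

# Play it back inline in the notebook
Audio(audio_array, rate=SAMPLE_RATE)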

Once you run the cell (for example with Ctrl+Enter), it saves the generated speech to bark_generation.wav, which you can download, and displays an inline audio player. The first run takes longer because preload_models() downloads the model weights. Alter the text_prompt variable to change what the voice says. Bark comes with dozens of voice presets out of the box that you can experiment with, and Suno maintains a handy speaker prompt library, linked from the Bark GitHub repo, where you can review them. To use a specific voice, find its name in the library and pass it to generate_audio as the history_prompt argument, as shown below.

audio_array = generate_audio(text_prompt, history_prompt="en_speaker_1")
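When auditioning voices, it can help to render the same line with several presets and listen to them side by side. The function below is a hypothetical helper, not part of Bark; it assumes the presets follow the en_speaker_N naming used in the example above and that Bark's generate_audio accepts history_prompt as a keyword argument, as shown there. The generate_fn and sample_rate parameters exist only so the helper can be tried without a GPU.

```python
from pathlib import Path

from scipy.io.wavfile import write as write_wav


def compare_speakers(text, speakers, out_dir="speaker_samples",
                     generate_fn=None, sample_rate=None):
    """Render `text` once per speaker preset and save each take as a WAV file.

    generate_fn/sample_rate default to Bark's generate_audio and SAMPLE_RATE.
    """
    if generate_fn is None or sample_rate is None:
        from bark import SAMPLE_RATE, generate_audio
        generate_fn = generate_fn or generate_audio
        sample_rate = sample_rate or SAMPLE_RATE
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for speaker in speakers:
        audio = generate_fn(text, history_prompt=speaker)
        # Replace any slash in a preset name so the file name stays flat
        path = out / f"{speaker.replace('/', '_')}.wav"
        write_wav(str(path), sample_rate, audio)
        paths.append(path)
    return paths


# Example (on a pod with Bark installed):
# presets = [f"en_speaker_{i}" for i in range(10)]
# compare_speakers("Hello, this is a quick voice test.", presets)
```

Each preset ends up in its own file under speaker_samples, so you can download the folder and compare the takes in any audio player.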

See Suno AI's Bark GitHub repo for further documentation.
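Bark works best on short prompts (the project's README suggests around 13 seconds of speech per call at the time of writing), so longer scripts such as narration are usually generated in pieces. The sketch below shows one simple way to do that, not the approach from Suno's own long-form example: it splits the text with a naive regex sentence splitter, generates each sentence with the same history_prompt, and joins the pieces with short silences. The function names, the 0.25-second pause, and the file names in the example are assumptions.

```python
import re

import numpy as np


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def generate_long(text, history_prompt="en_speaker_1", pause_s=0.25,
                  generate_fn=None, sample_rate=None):
    """Generate each sentence separately with one speaker preset and join the
    pieces, separated by short silences, into a single NumPy array.

    generate_fn/sample_rate default to Bark's generate_audio and SAMPLE_RATE.
    """
    if generate_fn is None or sample_rate is None:
        from bark import SAMPLE_RATE, generate_audio
        generate_fn = generate_fn or generate_audio
        sample_rate = sample_rate or SAMPLE_RATE
    silence = np.zeros(int(pause_s * sample_rate), dtype=np.float32)
    pieces = []
    for sentence in split_sentences(text):
        # Reusing one history_prompt helps keep the voice consistent
        pieces.append(generate_fn(sentence, history_prompt=history_prompt))
        pieces.append(silence)
    return np.concatenate(pieces) if pieces else silence[:0]


# Example (on a pod with Bark installed; file names are hypothetical):
# from bark import SAMPLE_RATE
# from scipy.io.wavfile import write as write_wav
# audio = generate_long(open("chapter1.txt").read())
# write_wav("chapter1.wav", SAMPLE_RATE, audio)
```

Splitting on sentence boundaries keeps each call short. Because Bark's output varies from call to call, the voice may still drift slightly between sentences, so listen to the result before publishing it.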

So what can you use AI voice synthesis for?

There are a number of use cases for AI-generated voices, including but not limited to:

  • Automated customer service functions (such as voices for phone trees)
  • Instructional videos
  • Narration for apps
  • Audiobook narration
  • Mockups and scripts for voice acting (e.g. helping actors with line reads)

Essentially, anything you would use Google's TTS for, you can use Bark for, with the bonus that Bark's output is less recognizably synthesized. In addition, Google's voices are already familiar from their use throughout Google's own ecosystem (such as Google Maps), so your voices will stand out as more unique. This matters especially on platforms such as YouTube, which tends to frown on text-to-speech narration and may be quick to flag commonly used TTS voices; Bark's richer, more natural voices will help you sound more convincing.
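For the phone-tree use case mentioned above, the WAV that Bark writes may need converting before a telephony system will accept it: Bark returns floating-point audio at 24 kHz, while many phone systems expect 16-bit mono audio at a lower rate such as 8 kHz. The helper below is a hypothetical sketch under those assumptions; check what your own system requires. It uses SciPy's resample_poly, which applies an anti-aliasing filter while changing the rate.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

BARK_SAMPLE_RATE = 24_000  # assumption: matches bark.SAMPLE_RATE


def to_pcm16(audio, src_rate=BARK_SAMPLE_RATE, dst_rate=8_000):
    """Resample float audio in [-1, 1] and convert it to 16-bit PCM.

    dst_rate=8000 is a common telephone rate; check what your system expects.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if dst_rate != src_rate:
        g = gcd(src_rate, dst_rate)
        # Polyphase resampling with a built-in anti-aliasing low-pass filter
        audio = resample_poly(audio, dst_rate // g, src_rate // g)
    audio = np.clip(audio, -1.0, 1.0)  # avoid integer overflow on loud peaks
    return (audio * 32767).astype(np.int16)


# Example (on a pod with Bark installed; the file name is hypothetical):
# from bark import SAMPLE_RATE, generate_audio
# from scipy.io.wavfile import write as write_wav
# audio = generate_audio("Press 1 for billing.", history_prompt="en_speaker_1")
# write_wav("menu_billing.wav", 8_000, to_pcm16(audio, SAMPLE_RATE))
```

scipy.io.wavfile.write picks the WAV sample format from the array's dtype, so passing the int16 array produces a standard 16-bit PCM file.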

Join the community

If you have any questions about how best to use Bark, feel free to pop into our Discord and ask; we would love to hear from you!