Generative AI, including the conversational AI ChatGPT, has made great strides and is already being widely used in everyday life and business. Generative AI can easily generate a variety of content such as images, audio, and text, and is expected to bring many benefits, including improved business efficiency and the creation of new ideas.
As new services emerge one after another, it will be necessary to have a grasp of the basics in order to choose the most suitable one and utilize it for your company.
This article provides an easy-to-understand explanation of the types of generative AI, how to use it, and what it can do. Generative AI is a field to keep an eye on for market trends and the emergence of new services, so please use this article as a reference for learning about it and utilizing it in your business or work.
What is generative AI?
Generative AI is a type of AI (artificial intelligence) characterized by its ability to generate creative output. It can generate a wide range of things, including music, images, videos, program code, and text.
Generative AI is built on machine learning models that use “deep learning,” a technology in which the AI learns patterns from data on its own, and it is a relatively new area in the world of AI.
What sets generative AI apart from conventional AI is that it can produce creative results like humans. Examples include the image generation AI “Stable Diffusion” and the text generation AI “ChatGPT.”
Generative AI is expected to be used as a tool to support human work and tasks. For example, text generation AI can be used to summarize reports, and music generation AI can be used to create simple background music for video production.
Basic usage of generative AI
To use generative AI, you need to input data in a format that each AI tool accepts. As examples, here are three types of generative AI and their inputs:
- Text generation AI: Text (prompt) input
- Image generation AI: Text (prompt) or image input
- Transcription generation AI: Voice input
In text generation AI, requests or questions (prompts) for the AI are written in a text box on the web and sent to the AI, which then analyzes the input and returns an appropriate answer.
Image generation AI includes tools that generate images based on a prompt, like text generation AI, and tools that take images as input, where the AI learns from the input images and generates completely new images with the characteristics of those images. For example, one method would be to load dozens or hundreds of images of a work of art and have the AI generate new images in that style.
Well-known transcription generation AI tools include “Whisper,” which inputs audio data into an AI and outputs it as text.
Differences from conventional AI and the definition of AI
Until recently, “AI” often referred to discriminative AI, a type of AI that classifies or judges given data, for example deciding whether a product is defective or which character appears in an image. As seen in character recognition (OCR) and AI cameras, discriminative AI learns from large amounts of data in advance and is widely used in fields such as product quality checks and image recognition.
However, since image generation AI services such as “Midjourney” became publicly available in the summer of 2022, interest in generative AI has increased and the definition of AI has changed.
Generative AI has the ability to generate new content from data and takes a different approach from conventional discriminative AI.
In this way, in recent years, the definition of AI has been expanding from discriminative systems to generative systems.
Types of Generative AI
There are several types of generative AI, including image generation, text generation, video generation, audio generation, etc. By using different generative AI depending on the application, you can create a result close to the desired form.
In recent years, image generation and video generation have been attracting particular attention, but the technology for text generation and speech generation is also developing. Here, we will explain four types of generative AI in detail.
Image Generation
Image generation AI is a system in which AI generates an original image based on the content of text entered by the user. Because it can generate a completely new image in just a few to several tens of seconds, it is expected to be used in a wide range of creative industries, including the design industry, to support business operations and generate new ideas.
One of the most well-known image generation AI services is “Stable Diffusion.” With Stable Diffusion, users can input the specific image they want to generate in English text, and the service can output a variety of images.
Text Generation
Text generation AI is a system in which a user enters a question into a text box, and the AI analyzes the content of the question and generates a text answer. The accuracy varies depending on the language model used, but “ChatGPT,” which has been attracting attention in recent years, is capable of providing highly accurate answers that sound as if a human were answering.
Text generation AI can also be used to, for example, input programming code that shows an error directly into the AI and have it point out the error.
However, because it learns from information on the web, it does not always return correct answers at present. It is necessary to use it while judging whether the answers are right or wrong, rather than just accepting them at face value.
Video Generation
Video generation AI is also emerging as an advanced form of image generation AI. For example, Runway, one of the companies involved in the development of the aforementioned “Stable Diffusion,” has released an AI model called “Gen-1” that can remake an input video into a completely new video.
Other examples of video generation AI include Meta’s “Make-A-Video” and Google’s “Phenaki.” These video generation AIs work by allowing you to describe in text the video you want to generate, and then generate a short video that matches that description.
Because Gen-1 works by “converting existing footage,” it has the potential to generate longer videos, and video generation is a type of generative AI that is expected to continue to evolve further.
Speech Generation
Speech generation AI is a generative AI that learns the characteristics of a voice from input voice data and generates new voice data. For example, “VALL-E,” developed by Microsoft, can learn a human voice with high accuracy and faithfully reproduce it from just a three-second voice sample.
Once trained, the text-to-speech model not only reproduces the trained tone of voice, but also enables emotional expression.
By utilizing this technology, it is possible to perform operations such as automatically generating narration using specific human voices and using them as material.
How generative AI works and the generative models used
There are several different generative models used by generative AI to generate content, depending on the nature of the AI.
Here, we will explain the following generative models: “VAE” and “GAN,” which are often used for image generation, the “diffusion model” used in Stable Diffusion, and “GPT-3” and its successor “GPT-4,” which are incorporated into text generation AI.
Variational Autoencoder (VAE)
VAE (Variational Autoencoder) is a generative model that uses deep learning. It learns features from training data and generates “new content that is similar to the training data” based on those features.
[VAE content generation image]
- The user provides the AI with data to learn from
- The AI learns features from the given training data
- The AI generates completely new content based on the features it has learned
- The generated content is provided to the user
VAE is suited to learning from multiple works with a certain tendency and producing works that are similar to that style. For example, it can be used to learn the works of an illustrator or painter and produce new illustrations that have the characteristics of the artist.
VAE is also suitable for capturing the features of highly complex images, so it is also used for anomaly detection in industrial products with complex structures.
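As a rough illustration of how a VAE learns features, the sketch below shows the two pieces that are specific to a VAE in its standard formulation: sampling a latent vector from the mean and log-variance produced by the encoder, and a loss that adds a reconstruction error to a KL-divergence term. The encoder outputs and the reconstructed values are hypothetical numbers chosen for illustration, and no real neural network is involved.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, sigma^2) and N(0, 1), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def reconstruction_error(x, x_hat):
    """Squared error between the input and the decoder's reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var):
    """Standard VAE objective: reconstruction error plus KL regularizer."""
    return reconstruction_error(x, x_hat) + kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]   # hypothetical encoder outputs for one image
z = reparameterize(mu, log_var, rng)     # latent vector that a decoder would turn into an image
x, x_hat = [0.2, 0.8, 0.5], [0.25, 0.7, 0.5]  # hypothetical input pixels and reconstruction
loss = vae_loss(x, x_hat, mu, log_var)
print(z, loss)
```

Generating new content after training then amounts to drawing z from N(0, 1) and passing it through the decoder, which is why the KL term pulls the learned features toward that distribution.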
GAN (Generative Adversarial Networks)
GAN (Generative Adversarial Networks) is also a type of image generation model, but unlike the VAE mechanism, it generates new images using two networks called a “Generator” and a “Discriminator.”
The Generator is a network that creates data from random noise, and the Discriminator is a network that judges whether the data it is shown is real training data or data created by the Generator. By having the Generator and Discriminator compete with each other while learning, highly accurate images can be generated.
[Image of GAN content generation]
- The Generator creates an image from random noise
- Real images are prepared as the correct data for learning
- The Discriminator compares the two and judges whether the Generator’s image is genuine
- The steps above are repeated so that the Generator’s accuracy improves
- Images with sufficient accuracy are output
GANs can be used to generate high-resolution images from low-resolution images, or to generate entirely new images from text.
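The competition between the two networks can be made concrete with the loss functions commonly used to train a GAN. The sketch below is a minimal illustration, not a full training loop: `d_real` and `d_fake` are hypothetical Discriminator outputs (the probability it assigns to “this image is real”) for a real image and a generated image.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: low when D(real) is near 1 and D(fake) is near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the Discriminator is fooled (D(fake) near 1)."""
    return -math.log(d_fake)


# Early in training (hypothetical values): the Discriminator easily spots fakes.
early = (discriminator_loss(0.9, 0.1), generator_loss(0.1))

# Later (hypothetical values): the Discriminator can no longer tell real from generated.
late = (discriminator_loss(0.5, 0.5), generator_loss(0.5))

print("early D loss, G loss:", early)
print("late  D loss, G loss:", late)
```

In this toy comparison the Generator’s loss falls as its images become harder to distinguish, while the Discriminator’s loss rises toward the value it takes when it can only guess.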
Diffusion Model
The diffusion model is a model used in image generation AI such as “Stable Diffusion” and “DALL-E 2.”
In the diffusion model, noise is added to a training image step by step, and the model is trained to remove that noise and restore the original image. By repeating this process of “adding noise to a given image and restoring the original image,” the AI learns how to generate images, and it can then generate a new image by starting from pure noise and removing noise step by step.
[Image of content generation using the diffusion model]
- Add noise to training images
- Remove noise from the training images that have had noise added to them
- Repeat the noise removal until the original image is restored
- Repeat the steps above over many training images so that the model learns to generate highly accurate images
By utilizing the diffusion model, it is often possible to generate images of higher quality than with GAN. Rather than being an extension of GAN, the diffusion model is a separate approach that learns by removing noise instead of having two networks compete.
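The “add noise, then restore” idea can be sketched numerically. The example below assumes the widely used formulation in which a noisy sample is a weighted mix of the original data and Gaussian noise, with a linear noise schedule; the schedule values (1000 steps, 0.0001 to 0.02) and the three-number “image” are assumptions chosen for illustration. A trained network would have to predict the noise; here the true noise stands in for that prediction to show that a good estimate is enough to restore the original.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, eps):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
            for x, e in zip(x0, eps)]


def remove_noise(xt, alpha_bar, eps_pred):
    """Invert the forward process given an estimate of the noise."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, eps_pred)]


rng = random.Random(0)
x0 = [0.2, -0.7, 1.0]                      # toy "image" with three pixel values
alpha_bars = alpha_bar_schedule(1000)
t = 500                                    # a step in the middle of the schedule
eps = [rng.gauss(0.0, 1.0) for _ in x0]    # the noise that gets mixed in
xt = add_noise(x0, alpha_bars[t], eps)
restored = remove_noise(xt, alpha_bars[t], eps)  # perfect noise estimate, for illustration only
print(xt, restored)
```

Training a real diffusion model means teaching a network to output `eps` from `xt` and `t`, so that the restoration works without knowing the noise in advance.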
GPT-3
GPT-3 is a language model developed by the US company OpenAI. OpenAI has attracted a lot of attention in part because its co-founders include Elon Musk, CEO of the car manufacturer Tesla and later owner of the social networking site Twitter (now X).
By learning from a massive amount of text data (approximately 45TB before filtering), the AI is able to predict with high accuracy the word that is likely to come next after a given sequence of words, generating sentences that seem as natural as if they were written by a human.
[How text generation works using GPT-3]
- The user writes a question in the text box, enters it, and submits it.
- AI analyzes the content of the question and derives the most appropriate answer
- The AI outputs the answer and conveys it to the user.
An example of a text generation AI built on the GPT-3 series that has been attracting particular attention in recent years is “ChatGPT,” developed by OpenAI, which used GPT-3.5 at launch. It is expected to be useful in a variety of situations, including summarizing long texts, shortening research time, and generating new ideas.
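The idea of predicting the word that comes next can be illustrated with a much simpler model than GPT-3. The sketch below counts which word follows which in a tiny, made-up corpus and always picks the most frequent follower. It is a toy bigram model, not how GPT-3 is implemented, but the generate-one-word-at-a-time loop is the same basic idea.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_words=5):
    """Generate text by repeatedly appending the predicted next word."""
    output = [start]
    for _ in range(max_words - 1):
        nxt = predict_next(counts, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return " ".join(output)


# Made-up training text for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(predict_next(model, "the"))
print(generate(model, "sat", max_words=3))
```

A large language model replaces the word counts with a neural network that looks at the whole preceding text rather than a single word, which is what lets it produce long, coherent answers.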
GPT-4
GPT-4 is one of the large language models (LLMs) provided by OpenAI, the developer of GPT-3, and it far surpasses GPT-3 in performance. GPT-4 can be used through the company’s app “ChatGPT,” and outputs highly accurate text in response to prompts. Users can work with it as if they were conversing with a human on tasks such as “writing request emails in business situations,” “creating work manuals,” and “conducting cross-reviews.”
When the GPT-3 model was released, the accuracy of the generated text could not be relied on, but the GPT-4 model, although not perfect, outputs text with much higher accuracy. As a result, more text (information) can be obtained with fewer instructions, and users are discovering new ways of using it every day.
Since the release of GPT-3, OpenAI has released the evolved GPT models GPT-3.5 and GPT-4. The performance difference between GPT-3.5 and GPT-4 in particular is worth comparing if you are interested.
What generative AI can do
Using generative AI, it is possible to streamline routine tasks, assist with creative proposals, create content at zero cost, etc. Using it in business will not only lead to solving problems such as increasing sales and reducing costs, but will also be useful for generating ideas for new product planning and developing new products.
Here, we will explain in detail four things that generative AI can do and also introduce its benefits.
Improving the efficiency of routine tasks
By utilizing generative AI, we can expect to improve the efficiency of routine tasks. As mentioned above, there are various types of generative AI, but let’s consider the efficiency of work when using “transcription generative AI” as an example.
Transcription-generating AI is AI that can automatically recognize input speech and convert the speech into text. Therefore, in business, it can be used to convert recorded data from meetings into text and save it as minutes or to transcribe recorded data from call center responses and register it in a system.
This eliminates the need for humans to manually create minutes or listen back to recorded data, leading to improved work efficiency.
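As a small illustration of the minutes workflow, the sketch below turns timestamped transcript segments into a plain-text draft of meeting minutes. It assumes the transcription tool returns a list of segments with `start` (seconds) and `text` fields, which is the shape produced by the open-source Whisper package; the sample segments are made up for illustration, and no audio is actually transcribed here.

```python
def format_timestamp(seconds):
    """Convert a number of seconds into HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_minutes(segments):
    """Build one timestamped line per non-empty transcript segment."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if text:
            lines.append(f"[{format_timestamp(seg['start'])}] {text}")
    return "\n".join(lines)


# Hypothetical transcription output for a short meeting recording.
sample_segments = [
    {"start": 0.0, "end": 4.2, "text": " Let's review the Q3 budget."},
    {"start": 65.5, "end": 70.0, "text": " Action item: send the revised estimate."},
    {"start": 70.0, "end": 71.0, "text": "  "},
]
minutes_text = segments_to_minutes(sample_segments)
print(minutes_text)
```

A person would still review the draft, but the listening and typing steps are removed.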
Assistance with creative proposals
Generative AI can also be useful in assisting with creative suggestions. For example, say a novelist is trying to write a new work, but while he has a vague idea of the main character, he is struggling to come up with a good idea for filling in the details.
In such situations, you can provide some of the information you have in your head to a text generation AI to get ideas for your work.
For example, by inputting information such as, “I’m writing a novel with a male protagonist in his early twenties, and I want the character to have a distinctive catchphrase. Can you give me some ideas for a good catchphrase?”, the AI will then suggest several ideas.
Zero-cost content creation
By utilizing generative AI, content creation that was previously handled in-house or outsourced can be taken over by AI, making it possible to create content at close to zero cost.
For example, if you are producing a product introduction video as part of your company’s marketing activities, you can greatly reduce outsourcing costs by using image generation AI to create the illustrations that you would previously have outsourced to external illustrators.
In addition, if you are creating music in-house to be used in videos, you can reduce labor costs and resources by generating background music with music generation AI. Until now, there has been a demand for “human resources to create content,” but it has been pointed out that in the future, AI may be responsible for the majority of content creation.
Strengthening customer relationships
Generative AI can also help strengthen relationships with customers. By using generative AI to efficiently analyze data such as customer purchase history and preferences, it becomes possible to provide personalized content and products, leading to increased repeat business and long-term loyalty.
Many companies have already introduced chatbots equipped with generative AI to reduce communication costs when handling customer support. Generative AI reduces the workload of operators while allowing customers to smoothly learn how to deal with problems, which is expected to improve customer satisfaction.
It also helps speed up internal communication, such as generating documents, improving them based on feedback, and sharing knowledge.
What generative AI can’t do
Generative AI is merely “AI that can generate original content based on what it has learned through deep learning,” and does not generate content by thinking like a human.
In other words, while it is good at creating “content with characteristics based on learned data,” it is unable to read human emotions and provide original content tailored to each individual.
“AI that is empathetic to human emotions and can think like humans” is called “AGI (Artificial General Intelligence),” but AGI does not currently exist in reality.
However, the development of AI technology is remarkable, and it has been pointed out that AGI may appear in the near future, much faster than expected. On February 24, 2023, Sam Altman, CEO of OpenAI, the US company that released the conversational AI service “ChatGPT,” released a roadmap out of concern for the impact of AGI on society, and we are now living in a world where coexistence with AGI is expected.
How to deal with the problems of generative AI
There are currently three main concerns about the large language models (LLMs) behind generative AI such as ChatGPT:
- Model-dependent output precision
- The risk of hallucination (AI telling plausible lies)
- Insufficient protection against hostile prompts
Hallucination may be reduced by improving training data and training methods, but it is difficult to prevent completely. In addition, if a person or organization were to exploit “adversarial prompts,” which are prompts crafted to attack a language model, there would be a risk of social unrest.
The development of social infrastructure, such as laws and regulations, has not kept up with the speed of AI development, and there is a possibility that many problems will arise that cannot be resolved by law. Until rules regarding the use of AI are established, it is also important to take measures to prevent the indiscriminate dissemination of developed technologies.
Representative examples of services using generative AI
Services that utilize generative AI include:
- Image generation AI: Stable Diffusion
- Text generation AI: ChatGPT
- Transcription generation AI: Whisper
- Commercial copywriting AI: Catchy
- Icon generation AI: Canva
The sections below explain the overview, features, mechanisms, and example uses of several of these services. Please refer to them if you would like to know more about specific generative AI services.
[Image Generation AI] Stable Diffusion
Stable Diffusion is an image generation AI released by Stability AI in 2022. As mentioned in the “Types of Generative AI” section, it can generate images based on text entered by the user.
When generating an image with Stable Diffusion, first input a description of the image you have in mind as English words separated by commas. For example, if you want to generate an image of “a girl looking at a beautiful lake,” try “beautiful lake,girl,see”.
In order to generate an image that is closer to what you have in mind, it is important to describe it in as much detail as possible. If you enter only a few simple English words with a vague idea, it is unlikely that the image you intended will be generated, so once you become accustomed to it, try writing the prompt as English sentences.
Stable Diffusion is a pre-trained machine learning model of a type called the “latent diffusion model,” so users do not need to write special programs or understand complex algorithms.
[Text generation AI] ChatGPT
ChatGPT is a type of text-generation AI developed by OpenAI in the United States and was released in November 2022. It is a service in which users enter and send questions in a text box, and the AI answers questions in a dialogue format as if it were having a conversation with a human.
The language model used in ChatGPT is a model called “GPT,” a general-purpose language model whose uses include automatically generating novels and creating in-game conversations. It is characterized by being trained to respond smoothly to complex questions from users by learning from the vast amount of data available on the web.
It also keeps track of earlier exchanges within a conversation, and users can point out incorrect answers to have them corrected, which makes the answers more accurate as the conversation continues.
However, as of February 2023, the system has only learned information up to 2021, so it is important to note that if you ask about relatively recent events, it may not give you the correct answer.
[Transcription generation AI] Whisper
Whisper, like ChatGPT, was developed by the U.S. company OpenAI. It is a speech recognition AI: when you input voice data, it automatically outputs the transcribed text.
Whisper is a speech recognition model trained on a vast amount of multilingual speech data from the web, totaling 680,000 hours, as “supervised data,” and its transcriptions are highly accurate. It also has high accuracy in transcribing Japanese, with the word error rate published by OpenAI ranking it 6th overall at 5.3%, after Spanish, Italian, English, Portuguese, and German.
Since the output text needs only minor revisions to serve as a transcript, it is expected to be extremely useful in business in the future, for example for creating meeting minutes or converting recorded data from call center responses into text.
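The word error rate mentioned above is a standard way to score a transcript: the number of word substitutions, deletions, and insertions needed to turn the AI’s output into the correct text, divided by the number of words in the correct text. The sketch below computes it with a word-level edit distance; the sentences are made-up examples. Splitting on spaces is an assumption that suits English, and a language written without spaces, such as Japanese, would need a tokenizer or a character-level comparison instead.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


correct = "the meeting starts at ten"
transcribed = "the meeting start at"
rate = word_error_rate(correct, transcribed)
print(f"{rate:.0%}")
```

Read this way, a word error rate of 5.3% means roughly 5 errors for every 100 words of correct text.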
[Icon generation AI] Canva
The online design tool “Canva” offers “Mojo AI,” which allows you to create images and icons by entering text.
Since it can generate high-quality images such as fictional images, it can be used to design icons, banners, brochures, etc. A free account registration is required, but you can generate up to 100 images per day (4 images are generated per request, so you can make up to 25 requests).
Canva has a free version and a paid version, and if you subscribe to the paid plan Canva Pro, you can generate AI images an unlimited number of times.
Generative AI Summary
Generative AI can generate a wide variety of creative content, including images, audio, and text, and has been attracting attention in recent years from individuals and companies in a variety of fields.
It has the potential to bring about various benefits, such as streamlining routine tasks, assisting with creative proposals, and reducing content production costs to zero. It can also lead to stronger relationships with customers.
As more and more domestic companies successfully develop generative AI, more and more companies are releasing products and services that utilize it. It is no exaggeration to say that introducing and operating generative AI in a way that suits the company’s situation and purpose holds the key to the future of a company or business.
Even in 2024, various generative AIs have been released, and this is a field where technological developments are expected to continue. If you would like to continuously collect the latest news related to generative AI, please follow us on social media.
FAQ
What is the difference between generative AI and AI?
Conventional AI is often called “discriminative AI” and classifies or judges input data based on what it learned in advance. On the other hand, generative AI is capable of generating creative outcomes (text, images, videos, music) from input data.
What are the benefits of generative AI?
The benefits of generative AI include streamlining routine tasks, assisting with creative suggestions, and making content creation zero-cost.
What are the disadvantages of generative AI?
Generative AI produces creative outcomes based on learned data, so it cannot think for itself and create original content like a human being. AI that can think like a human being is called “AGI (Artificial General Intelligence)” and research is progressing every day, but as of July 2023, it does not exist in reality.