Diving into the mechanics of generative AI can feel like stepping into a complex puzzle. At the surface, there’s a clear structure meticulously designed by human intellect over decades. Developers have engineered the connections between artificial neurons and orchestrated the training process of these models. However, as you dig deeper, the puzzle becomes more complex, especially regarding the creative capability of these AI models.
Dean Thompson, a seasoned expert in AI, reflects on this mystery: “We don’t know how they do the actual creative task because what goes on inside the neural network layers is way too complex for us to decipher, at least today.” This statement sheds light on an intriguing reality. While these models’ structure and operational mechanisms are well understood, generative AI’s creative essence is still uncharted territory.
This isn’t just a philosophical problem; it’s a real barrier to advancing AI. The intricacy of neural networks calls for deeper inspection, urging you to unravel the web of artificial neurons and connections that power these innovative machines.
Start with the brain
Understanding generative AI often begins with reflecting on human intelligence, the brain being a natural starting point. In his book “On Intelligence,” Jeff Hawkins postulates that the brain constantly predicts upcoming events, learning from any deviations between its predictions and reality. This continuous prediction and learning cycle mirrors generative AI’s operational ethos.
You may envision the creation of an artificial neural network as the first milestone in this journey. This network is a digital echo of the human brain, with layers of artificial neurons interconnected in a complex web. It’s through this network that data flows, gets refined, and yields predictions, much like the signalling between neurons in our brain. The terms ‘weights’ and ‘parameters’ refer to the coefficients that set the strength of these connections, and tuning them propels the network in its predictive pursuits.
Build an artificial neural network
The birth of all generative AI models can be traced back to an artificial neural network encoded in software. Thompson likens a neural network to a familiar spreadsheet, but with a twist — it’s in three dimensions due to the layering of artificial neurons, much like the stacking of real neurons in the brain. Each artificial neuron, termed a “cell,” houses a formula linking it to other cells, emulating the varying strengths of connections between real neurons.
While each layer may host a myriad of artificial neurons, the emphasis is less on their quantity than on the number of connections among them. The strengths of these connections vary according to the coefficients in their cell equations, commonly referred to as “weights” or “parameters.”
These connection-defining coefficients are referenced when you read about the GPT-3 model boasting 175 billion parameters. The latest iteration, GPT-4, is rumoured to house trillions of parameters, albeit unconfirmed. The arena of neural network architectures is quite diverse, with each possessing distinct traits conducive to specific modalities; for instance, the transformer architecture excels for large language models.
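To make that picture concrete, here is a minimal sketch in Python (using NumPy, with made-up layer sizes rather than anything resembling a production model) of how data flows through a small stack of layers, and how the “parameters” counted in figures like 175 billion are simply these connection-defining coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny, made-up network: a few layers of artificial neurons.
# Each weight is one coefficient describing the strength of one
# connection between a neuron in a layer and a neuron in the next.
layer_sizes = [8, 16, 16, 4]          # input -> hidden -> hidden -> output (toy sizes)
weights = [rng.normal(size=(m, n)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Pass data through the stack: each layer's output feeds the next."""
    for W, b in zip(weights, biases):
        x = np.maximum(0, x @ W + b)   # weighted sum, then a simple nonlinearity
    return x

# The "parameter count" is just the number of these coefficients.
n_params = sum(W.size for W in weights) + sum(b.size for b in biases)
print(forward(rng.normal(size=8)), n_params)
```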
The narrative of complexity extends beyond just the structure; it sprawls into the vast sea of data these networks traverse. Training is a crucial stage where the models sharpen their predictive skills. Through cycles of predictions and feedback, the models learn, stumble, and evolve, inching closer to a refined version of themselves.
Teach the newborn neural network model
Embarking on the training voyage, large language models are inundated with massive volumes of text to process. They are tasked with simple predictive chores like anticipating the next word in a sequence or arranging a set of sentences correctly. However, in practice, these models operate in units called tokens, not words.
Thompson elaborates, “A common word may have its token, uncommon words would certainly be made up of multiple tokens, and some tokens may just be a single space followed by ‘th’ because that sequence of three characters is so common.” Each prediction journey begins with a token entering the base layer of a particular stack of artificial neurons; this layer processes it and forwards its output to the next layer, and the cycle continues until the final output emerges from the top of the stack. Although stack sizes vary significantly, they comprise tens of layers, not thousands or millions.
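As a rough illustration of that flow, the toy sketch below (NumPy, invented sizes, random untrained weights, and integer token IDs standing in for a real tokenizer) shows a token entering the base of a stack, each layer passing its output upward, and the top of the stack scoring every token in the vocabulary as a candidate for the next token:

```python
import numpy as np

rng = np.random.default_rng(1)

vocab_size, d_model, n_layers = 1000, 64, 12   # toy sizes, not a real model

embeddings = rng.normal(size=(vocab_size, d_model)) * 0.02
layers = [rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(n_layers)]
unembed = rng.normal(size=(d_model, vocab_size)) * 0.02

def predict_next_token(token_id):
    h = embeddings[token_id]           # the token enters the base of the stack
    for W in layers:                   # each layer processes and passes its output upward
        h = np.tanh(h @ W)
    logits = h @ unembed               # scores over the whole vocabulary
    return int(np.argmax(logits))      # the model's guess for the next token

print(predict_next_token(42))
```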
The model’s predictions are far from accurate in the initial training phases. However, after each prediction, a “backpropagation” algorithm tweaks the parameters, that is, the coefficients in each cell of the stack responsible for that prediction. These adjustments aim to increase the likelihood of a correct prediction, gradually refining the model’s performance over time.
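The sketch below shows one version of that loop using PyTorch, a widely used deep-learning library; the tiny model, the single token pair, and the learning rate are all invented for illustration. Each pass makes a prediction, measures the error, backpropagates it, and nudges every coefficient toward a more likely correct prediction:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size = 100

# A deliberately tiny next-token predictor: embedding -> hidden layer -> vocabulary scores.
model = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, 64),
                      nn.Tanh(), nn.Linear(64, vocab_size))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

current_token = torch.tensor([7])      # toy training pair: token 7 ...
next_token = torch.tensor([42])        # ... should be followed by token 42

for step in range(100):
    logits = model(current_token)      # forward pass: make a prediction
    loss = loss_fn(logits, next_token) # how wrong was it?
    optimizer.zero_grad()
    loss.backward()                    # backpropagation: assign blame to each parameter
    optimizer.step()                   # tweak every coefficient slightly

print(loss.item())                     # the loss shrinks as the prediction improves
```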
Generative AI models
Generative AI encapsulates a broad spectrum of applications grounded in an increasingly rich trove of neural network variations. Although all generative AI conforms to the overarching description in the preceding sections, implementation techniques differ to accommodate different media, such as images versus text, and to integrate advancements from research and industry as they surface.
Neural network models leverage repetitive patterns of artificial neurons and their interconnections. A neural network design (for any application, including generative AI) often repeats the same neuron pattern hundreds or thousands of times, typically reusing the same parameters. This repetition is an integral part of a “neural network architecture.” The invention of new architectures has spearheaded substantial AI innovation since the 1980s, often fuelled by the objective of supporting a new medium. Once a new architecture exists, further progress is often achieved by employing it in unanticipated ways, and additional innovation stems from combining elements of disparate architectures.
Two pioneering and still prevalent architectures include:
Recurrent neural networks (RNNs) surfaced in the mid-1980s and continue to be utilised. RNNs exemplified how AI could learn from, and be deployed to automate tasks reliant on, sequential data, that is, information in which sequence imparts meaning, such as language, stock market behaviour, and web clickstreams. Because they account for music’s sequential nature and time-based dependencies, RNNs form the core of numerous audio AI models, such as music-generating apps, and they also excel at natural language processing (NLP). RNNs find applications in conventional AI functions as well, like speech recognition, handwriting analysis, financial and weather forecasting, and predicting fluctuations in energy demand, among various others. (A minimal sketch of the recurrence at their core appears after this list.)
Convolutional neural networks (CNNs) emerged about a decade later. They target grid-like data and therefore excel at spatial data representations, making them capable of generating images. Popular text-to-image generative AI apps, such as Midjourney and DALL-E, deploy CNNs to generate the final image.
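Here is the minimal recurrence sketch promised above (NumPy, toy sizes, random weights): an RNN carries a hidden state forward one step at a time, which is what lets sequence impart meaning, but also what prevents it from processing all positions at once:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_hidden = 8, 16                     # toy sizes

W_in = rng.normal(size=(d_in, d_hidden)) * 0.1
W_rec = rng.normal(size=(d_hidden, d_hidden)) * 0.1

def rnn(sequence):
    """Process a sequence strictly in order: step t depends on step t-1."""
    h = np.zeros(d_hidden)                 # hidden state carries the past forward
    for x in sequence:                     # one item at a time, no parallelism
        h = np.tanh(x @ W_in + h @ W_rec)
    return h

sequence = rng.normal(size=(20, d_in))     # e.g. 20 time steps of audio features
print(rnn(sequence)[:4])
```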
The effort to improve RNNs led to a significant breakthrough: transformer models, which have proven more adaptable and potent at representing sequences than RNNs. They have several traits that allow them to process sequential data, like text, massively in parallel without losing their grasp of the sequence. This ability to process data in parallel is key to enabling ChatGPT to respond swiftly and effectively to straightforward, conversational prompts.
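By contrast with the step-by-step recurrence above, the heart of a transformer, self-attention, relates every position in a sequence to every other position in a handful of matrix multiplications, so the whole sequence is processed in parallel. The following is a bare-bones sketch with invented dimensions, not an implementation of any particular model:

```python
import numpy as np

rng = np.random.default_rng(3)
seq_len, d_model = 10, 32                         # toy sizes

X = rng.normal(size=(seq_len, d_model))           # one embedding per position
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

# Queries, keys, and values for ALL positions are computed at once ...
Q, K, V = X @ W_q, X @ W_k, X @ W_v
# ... and every position attends to every other in one matrix product,
# so the sequence is handled in parallel rather than step by step.
scores = softmax(Q @ K.T / np.sqrt(d_model))
output = scores @ V
print(output.shape)                               # (10, 32): one output per position
```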
Why is generative AI important?
Imagine generative AI as your creativity calculator, easing content creation. Just as a calculator handles routine maths and leaves you free to focus on more complex calculations, generative AI can take care of the monotonous subtasks embedded in much knowledge work, allowing you to address the more intricate parts of your job.
This form of AI becomes a formidable ally for marketers who often encounter challenges deriving actionable insights from disorganised, inconsistent, and disconnected data. Usually, the first step would be to consolidate this data, requiring a substantial amount of custom software engineering to combine diverse data sources like social media, news, and customer feedback into a common format.
Basim Baig, an experienced engineering manager at Duolingo focusing on AI and security, sheds light on this, “With LLMs, you can directly feed information from various sources into the prompt, then ask for key insights, or which feedback to prioritise, or request sentiment analysis—and it will just work.” The advantage of LLMs here is in skipping the extensive and expensive engineering step, greatly simplifying the process.
Looking further, Thompson suggests that product marketers could use LLMs to tag free-form text for analysis. Suppose you have a vast database of social media mentions about your product. You could design software that uses an LLM alongside other technologies to extract the primary themes from each social media post, group the unique themes from individual posts into recurring themes, and identify which posts underline each theme.
Following this, you could explore the most frequently recurring themes, track their trends over time, and engage an LLM to dig deeper into a recurring theme for mentions of specific product characteristics. In this way, generative AI simplifies data analysis and uncovers pivotal insights that could be instrumental in shaping product development and marketing strategies, ultimately propelling your business forward in a competitive market landscape.
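Here is a sketch of what the first step of that workflow might look like in code, assuming the OpenAI Python client is installed and an API key is available in the environment; the model name, prompt wording, and example posts are all illustrative rather than prescriptive:

```python
from openai import OpenAI  # assumes the `openai` package and an API key in the environment

client = OpenAI()

posts = [
    "The new onboarding flow is so much smoother, signed up in two minutes.",
    "App keeps crashing whenever I open the analytics dashboard.",
    "Love the product, but the pricing page is really confusing.",
]

prompt = (
    "For each social media post below, list the main themes it mentions "
    "(e.g. onboarding, reliability, pricing) and the sentiment (positive/negative).\n\n"
    + "\n".join(f"{i + 1}. {p}" for i, p in enumerate(posts))
)

# Illustrative model name; swap in whichever model your account has access to.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```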
In this rapidly evolving digital landscape, generative AI emerges as a beacon of innovation, poised to redefine how we interact with data, derive insights, and navigate the complexities of the digital world. The journey of generative AI is like exploring a world full of potential, with new possibilities at every turn.