At the Arteries team, we are constantly testing and exploring the concrete practical applications of today’s AI technologies that we can integrate into our applications and web systems. In this series of articles, we’ll take a look at areas where we can help our clients increase the efficiency of their business processes, improve their productivity or reduce their costs with AI, but here in the first part, allow us to give you some foundational, introductory paragraphs. Brew up a coffee and let’s start! 😉
Machine learning, a subfield of artificial intelligence, has undergone an extraordinary evolution in recent years. This has been made possible mainly by large, high-quality, annotated datasets, the computing power of ever-faster hardware accelerators, and new, smarter algorithms and procedures.
Neural networks:
Practically all of today’s state-of-the-art machine learning (“AI”) programs rely on artificial neural networks, where neurons are connected by weighted links, layer by layer, in different architectures. Each artificial neuron builds on the results of other neurons, loosely similar to the biological brain. Mathematically, a neural network is a “universal function approximator”: given many input-output (or effect-response) data pairs, it can approximate the rule (function) that produced the output from the input. Since so much of reality can be described by quantifiable functions, from the laws of physics to the workings of society, neural networks can in theory be applied to almost anything. In practice, of course, their usefulness is limited by the size of the network, its structure (architecture), and how well the rule(s) that generated the training data can be inferred and generalised from the quality and quantity of that data.
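To make the "weighted, layer by layer" idea concrete, here is a minimal sketch of how a tiny network computes its output. The weights in `XOR_LAYERS` are hand-picked, hypothetical values chosen only for illustration (a real network would learn them), and the function names are our own rather than part of any library.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Compute one layer: every neuron takes a weighted sum of all inputs."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs

def network_forward(inputs, layers):
    """Feed the inputs through each layer in turn and return the last outputs."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hand-picked (not learned) weights: 2 inputs -> 2 hidden neurons -> 1 output.
# Hidden neuron 1 behaves like OR, hidden neuron 2 like NAND, the output like AND,
# so the whole network approximates the XOR function.
XOR_LAYERS = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),  # hidden layer
    ([[20.0, 20.0]], [-30.0]),                         # output layer
]

for a, b in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    output = network_forward([a, b], XOR_LAYERS)[0]
    print(a, b, round(output))  # rounded outputs are 0, 1, 1, 0
```

XOR is a handy toy case because a single neuron cannot represent it, while two stacked layers can.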
Let’s look at a simple example: the way neural networks work is similar to how a child learns to ride a bike. Initially, there are many mistakes and falls, but over time the brain ‘sets itself’ to the correct movement and cycling becomes automatic.
Training:
During training, we iteratively modify the internal parameters of the network thousands of times, based on a precisely quantifiable performance metric, so that it performs better and better, in small increments, on a given task, which can be anything. Most often the network learns to reproduce known data so that it can predict the expected output for new inputs it has never seen before. This output can be, for example, a categorization, decision, prediction, time series, image, sound, text, or anything else that can be quantified. This is the principle of supervised learning, which is the most common method used today. Another form is reinforcement learning, where there is even less information about the rule to be learned. Here, the network can only rely on measures of success, e.g. how fast a self-driving car reaches its destination without damage or rule violation.
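The iterative loop described above can be sketched with the simplest possible "network": a single weight and a bias fitted by gradient descent. This is a toy under stated assumptions; the sample data (generated from the rule y = 2x + 1), the learning rate and the iteration count are all made up for illustration.

```python
def mean_squared_error(w, b, samples):
    """The quantifiable performance metric: average squared prediction error."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)

def train_linear(samples, learning_rate=0.05, iterations=2000):
    """Nudge w and b a little on every iteration so that the error shrinks."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(iterations):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Known input-output pairs produced by the hidden rule y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = train_linear(samples)
print("learned rule: y =", round(w, 3), "* x +", round(b, 3))
print("error on the known data:", mean_squared_error(w, b, samples))
print("prediction for the unseen input 10:", round(w * 10 + b, 3))
```

Real networks have millions of such parameters and use more elaborate optimizers, but the principle of repeated small corrections against a metric is the same.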
Perhaps the most fundamental principle in the world is the “principle of least action”: of all possible outcomes, the one with the least “wasted energy” will occur. Something similar applies to the training of artificial neural networks: the network converges to the structure that reproduces the training data in the simplest way, with the least amount of work. If the network is too large (and hence has too much potential cognitive capacity), or it has too little data to learn from, it will simply “memorize” the examples, because that is the easiest way. It then scores very well on the memorized examples but performs badly on new data, making it unusable for real-world tasks (this is called overfitting). In practice, we never measure real accuracy on the data used for training; we test on new examples never seen before, which is how we check for good generalisation. When choosing the training and network parameters, we must apply constraints such that the network has no choice but to really learn the rules deeply embedded in the data. Choosing these network and training parameters well (e.g. the learning rate, the number of neurons and their connections (weights), and the number of “layers” stacked on top of each other, each processing the output of the previous one) is not fully automated and requires great care from developers, because training is a very resource-intensive, costly and time-consuming process! After training, the network no longer learns. We can think of it as a frozen brain in a time loop that starts afresh on every use. A single use, i.e. computing the output for new input data (inference), requires many orders of magnitude fewer resources than training!
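The difference between memorizing and generalising can be shown with a small, hypothetical experiment. Below, a lookup table stands in for an oversized network that memorizes its examples, and a one-parameter model stands in for a constrained network that has to learn the rule. The data (generated from y = 3x) and the train/test split are assumptions made for the sake of the example.

```python
def fit_memorizer(train):
    """'Memorize' every training pair; fall back to the average for unknown inputs."""
    table = dict(train)
    fallback = sum(y for _, y in train) / len(train)
    return lambda x: table.get(x, fallback)

def fit_slope(train):
    """Fit a single slope w for y = w * x by least squares."""
    w = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x: w * x

def mean_abs_error(model, data):
    """Average absolute difference between predictions and true outputs."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

# Ten samples of the hidden rule y = 3x; the last three are held back for testing.
data = [(float(x), 3.0 * x) for x in range(1, 11)]
train, test = data[:7], data[7:]

memorizer = fit_memorizer(train)
slope_model = fit_slope(train)

print("memorizer   train error:", mean_abs_error(memorizer, train))
print("memorizer   test error: ", mean_abs_error(memorizer, test))
print("slope model train error:", mean_abs_error(slope_model, train))
print("slope model test error: ", mean_abs_error(slope_model, test))
```

Both models look perfect on the training data; only the held-out examples reveal that the memorizer has learned nothing about the rule.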
Imagine the training process as learning at school. Just as a student repeats and practices material, the neural network “practices” until it reaches the desired performance. This can be illustrated by a graph showing the relationship between the number of learning iterations and performance.
The relationship between percentage accuracy and the number of training iterations is mostly logarithmic. This means that very impressive results can be obtained very quickly, but above a certain point the numerical accuracy only increases incrementally, gradually slowing down, making it very difficult to predict or estimate the evolution of an AI model! At the beginning of training, on the steep part of the curve, it may naively seem that the neural network will solve the problem completely in almost no time, which can easily lead to overly optimistic estimates! We might also think that we will soon reach an optimum point where the ratio of performance to invested resources is maximized, and that from there on we can only continue training with diminishing returns, but this is not quite the case!
Take the example of a self-driving car: if it has to make 1 decision per second with 99% accuracy, and any wrong decision causes an accident, it will have had an accident within ~1 minute (69 sec) with 50% probability! For a 99.9% accurate self-driving AI, this increases to ~11.5 minutes (about 690 sec), while for a 99.99% accurate car it is ~2 hours (about 6,900 sec)! The change in accuracy looks incremental, but the real impact is very significant! The same logic applies to a content filter or a quality assurance AI! It is very important to look at their effectiveness and their real application potential with a proper, statistical eye!
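The figures above follow from a short calculation, sketched here under two simplifying assumptions that the example implies: every decision is independent, and a single wrong decision causes an accident. The probability of making n correct decisions in a row is accuracy^n, so we solve accuracy^n = 0.5 for n. The function name is our own.

```python
import math

def decisions_until_failure(accuracy: float, probability: float = 0.5) -> float:
    """Number of independent decisions after which at least one error has
    occurred with the given probability: solve accuracy**n = 1 - probability."""
    return math.log(1.0 - probability) / math.log(accuracy)

for accuracy in (0.99, 0.999, 0.9999):
    seconds = decisions_until_failure(accuracy)  # 1 decision per second
    print(f"{accuracy:.2%} accurate: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Each extra "9" in the accuracy buys roughly ten times more trouble-free operation, which is why the last fractions of a percent matter so much.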
The information age:
The 21st century is the age of information, a precursor to the age of knowledge. Throughout human history, a great deal of data and knowledge has accumulated on all subjects, and the best way to process it is through machine learning. Since the data to be learned is mostly created or influenced by humans, as a result of intelligent, mostly logical processes, the neural network itself has to learn and apply these logical, intelligent processes to reproduce the data!
Ultimately, intelligence is the observation of effects, the logical, causal understanding of their mechanism of action, and the subsequent, purposeful application of that understanding to new, more complex processes, and so on (this layering is one of the main strengths of human intelligence).
With sufficiently large, many-layered (deep) networks, a huge and diverse amount of data, and long training, artificial neural networks can also become capable of intelligent processes.
The most effective way to understand reality is through logic, rationality and intelligence, so if a neural network needs to understand a lot of real data, the easiest (and ultimately only) way is for it to work on these principles and become intelligent to some extent! Only then can we really talk about artificial intelligence.
Based on the phenomenon of emergence, a new structure can emerge from the combination of simpler components (atoms, or in this case virtual neurons) that is capable of much more than its components alone. Intelligence is one such complex phenomenon.
The use of tools, and the technology that has enabled and developed them, has increased humanity’s capabilities to such an extent that we have become the dominant species on the planet and have made once-unimaginable things commonplace. However, it is not the tools created that are the most important or powerful, but the intelligence that created them! Turning this intelligence itself into a tool, lifting its limitations and automating it, is perhaps humanity’s greatest and last necessary work.
The parallel between the human brain and artificial intelligence is interesting and instructive. Just as the human brain is capable of learning and adapting, so too is artificial intelligence capable of absorbing and applying new information.
Today’s AI:
Today’s AIs are mostly small, specialised for specific tasks and at the lower end of the definition of intelligence (image recognition, classification, etc.), but there are already very intelligent developments with broad capabilities that border on artificial general intelligence (AGI). These are mostly large language models (LLMs), the essence of which is to predict the next “syllable” (token) of a human-generated text. Solving this task at a high level is almost impossible without intelligence, and there is a quasi-infinite pool of raw, unannotated text available for training. The interpretation of words and their interrelationships, and of the real concepts they compactly define, also makes it easier to build up and layer knowledge afterwards. Just as speech helped humans become intelligent, it has also helped AI. The result is that GPT language models and their peers have practical intelligence and capabilities that already outperform a significant fraction of humans!
The use of AI:
In theory, machine learning can be applied to any task, but in practice data and computing power are limited. Simple, traditional, algorithmic programs are not worth replacing with AI, but AI is very effective at analysing knowledge and at mental tasks that humans perform reflexively, such as image recognition. This in itself has huge automation potential! We can think of AI as a nearly free, instantly working, highly knowledgeable apprentice with unlimited potential. The job of AI developers is to exploit this potential to the highest possible level and teach it new skills and knowledge.
The application of AI:
As we have already seen, training from scratch is an extremely expensive process. The more tasks an AI can do and the more concepts it genuinely understands, the easier it is for it to acquire new knowledge, because it can build on what it already knows. This not only allows for better results, but also for learning from less data, as it does not need to rebuild all the necessary prior knowledge and skills from that limited data alone.
An AI is better the less data it needs and the faster it can make deeper and more universal connections, and this is very similar to the concept of IQ.
As with humans, we’re not trying to teach newborns to program, we’re trying to teach people who are already educated. In the same way, we are not writing a new programming language for a new project.
If you want AI for a new task, it is almost always better to start from an existing model and train it further, adapting it to the task. Open-source models are the best for this, because we have full access to the network with all its internal parameters, and there are many standard tools for using and integrating them.
In practice, although it is important to know how the networks work, all the necessary computations are already implemented in several software libraries. Using these libraries and preparing the training data are the most important parts!
Data processing:
In order to use a neural network, we need to be able to formulate the task to be learned in its “language”. Because of its digital nature, the data must be converted into quantified, bounded-range inputs and outputs with a fixed number and order, and the structure of the network must be chosen accordingly. The right choice always depends on the task at hand, e.g. for image recognition, convolutional networks are best, where the input neurons are the image pixels.
Use for new problems, without training:
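As a minimal sketch of what "quantified, bounded-range inputs and outputs with a fixed number and order" can look like in practice: pixel values are scaled into the range 0 to 1, and category labels are turned into fixed-length vectors (one-hot encoding). The helper names and the category list are hypothetical and chosen only for this illustration.

```python
def normalize_pixels(pixels):
    """Scale 8-bit grayscale values (0..255) into the bounded range 0.0..1.0."""
    return [p / 255.0 for p in pixels]

def one_hot(label, categories):
    """Turn a category name into a fixed-length vector with a single 1.0."""
    if label not in categories:
        raise ValueError(f"unknown category: {label}")
    return [1.0 if label == c else 0.0 for c in categories]

# Hypothetical output categories; their order fixes the index of each output neuron.
CATEGORIES = ["car", "truck", "container"]

# A tiny 2x2 grayscale "image", flattened row by row into four input values.
image = [0, 255, 51, 102]

network_input = normalize_pixels(image)
expected_output = one_hot("truck", CATEGORIES)

print(network_input)
print(expected_output)
```

Every sample is then a pair of equally sized number lists, which is the only kind of data a network can consume.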
A neural network can basically only be applied reliably to the problem it has been trained for (interpolation), but it can also operate beyond that to some extent and in relatively close task domains (extrapolation). The wider the range of skills and knowledge, the further it can extrapolate. This, by the way, is also a very important factor for intelligence, because the acquisition of new knowledge and skills is also incremental, the result of many small iterative steps. AI works in a similar way to the human brain here.
This gives us the opportunity to use it for new tasks without traditional fine-tuning! In this case, we do not modify the internal parameters of the neural network, but build on the knowledge it already has, by choosing the input data wisely!
This can be done in several ways:
If we add a new category among similar categories, we can observe the activation of existing categories (e.g. in the case of a truck, the output neurons assigned to the categories “car” and “container” show higher but not full activity). The same is true for time series or trend analysis: in a small range outside the known pattern, we can still get relatively accurate results.
In another case, the problem is broken down into simpler, already known subtasks, and the network is run on these elementary parts.
If the network accepts multiple inputs at once (e.g. a long text) or has a short-term internal memory (recurrent network architecture), then we can show it several new, already solved reference examples before the new one to be solved, thus providing support and increasing its effectiveness. This is a proven technique for large language models (LLMs) and is part of the prompt engineering discipline.
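The last technique, showing solved reference examples before the new input (often called few-shot prompting), can be sketched as plain text assembly. The "Input:" / "Output:" layout, the function name and the example reviews are all assumptions made for illustration; actual models and APIs may expect a different format.

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Place already solved examples in front of the new, unsolved input."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    # The final entry is left open so that the model completes it.
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)

# Hypothetical reference examples for a sentiment classification task.
examples = [
    ("The delivery was fast and the product works great.", "positive"),
    ("It broke after two days and support never answered.", "negative"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    examples,
    "Setup was easy, but the battery drains too quickly.",
)
print(prompt)
```

The resulting string would be sent to a language model as its input; the model's parameters stay untouched, only the input changes.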
Running AI and options:
Training base models often requires a supercomputer, but fine-tuning can be done on more powerful personal computers. Running a model (inference), however, is possible even on much smaller hardware.
Of course, running the biggest and smartest networks requires massive servers, so these can only be accessed via the internet. Within these broad limits, however, there are many options that can be accessed, trained and used even by small companies or individuals!
In many cases, running cloud-based AI is too expensive or simply not feasible, in which case a smaller model has to be run locally (edge computing). For example: unattended or real-time decision making in remote locations without network access, or handling private data.
That is as far as we go in this article; we’ll continue in the next one…
In summary…
Artificial intelligence and machine learning are tools that are now available to everyone. They are not only revolutionising the world of technology, they are also changing our everyday lives in ways we hardly notice. Those who adopt them are also “invisibly” gaining a major market advantage. As the technology continues to evolve, integrating AI into software will become more and more of a requirement, even an absolute imperative, for staying competitive. AI is not just a force for the future; it is already playing a major role in the present. In our next blog post, we will detail some areas of AI application and their benefits. We’ll talk about specific applications of language models, the differences between them, vector databases, LangChain, and more.
The Arteries team is ready to help you understand and implement these new technologies so that our clients can make the most of them. If you want to take advantage of AI and learn how to apply it to your business, contact us!