"The computer was born to solve problems that did not exist before." — Bill Gates

AIs and LLMs are no exception. You might love them or hate them, but if you’re reading this blog, I’m sure you’re technical enough to have used one in the past four years or so.

Even though I’m studying electrical engineering, there’s a soft spot in my heart for Artificial Intelligence and Machine Learning, especially Large Language Models. I’ve worked with LLMs long enough to see both their brilliance and their blind spots, and I have talked to many, MANY people about them, from technical experts to startup bros. But almost always, the person I was talking to had some misunderstanding about LLMs that blurred the line between what they are and what we want them to be, no matter their expertise with them.

Now, I am a Bachelor’s student and I have 0 research papers to my name (yet). Some of the people I have talked to have worked with LLMs in industry or in academia, and they too had misconceptions and misunderstandings that seemed obvious to them only after I pointed them out. So I’m gonna quickly list some of them here.

“AI Can Think”

No, they can’t. This is a classic. Now, before I say anything: you can start a seven-hour argument with me about the philosophical question of what “thinking” is, and whether humans even think. But for that, you need to buy me a coffee (or a beer, depending on what day and where we are).

But LLMs are overengineered “autocomplete” software, you know, the kind in your smartphone’s keyboard. An LLM is a mathematical model that looks at what you have written and predicts what the next word should be. When it is done well, it feels like a machine that has somehow learned to think and understand, which is a very good illusion. After all, science, when done right, is indistinguishable from magic.

But sadly no, this does not mean the model can “think” or “understand”. It is just mimicking what it has seen in its training data and autocompleting new text with some clever but fundamentally basic statistics.
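To make the “overengineered autocomplete” point concrete, here is a minimal toy sketch in Python (my own illustration, nothing like how real LLMs are actually implemented): count which word follows which in some training text, then “predict” the most frequent continuation.

```python
from collections import Counter, defaultdict

# Toy "training data": the only world this model will ever know.
corpus = "the sky is blue . the sky is clear . the sea is blue .".split()

# Count, for every word, which words follow it and how often.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def autocomplete(word):
    """Return the most frequent continuation seen in training, if any."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else None

print(autocomplete("sky"))   # 'is'    (seen in training)
print(autocomplete("is"))    # 'blue'  (the most frequent continuation)
print(autocomplete("moon"))  # None    (never seen, nothing to predict)
```

A real LLM replaces the frequency table with billions of learned parameters and a much longer context, but the job description is the same: given what came before, guess what comes next.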

I asked an LLM (ChatGPT-4o, December 2024) to come up with an analogy for this, and it said: “Imagine a parrot trained on every conversation in the world. It can mimic any dialogue perfectly. But it doesn’t understand the meaning—it just knows what sounds right.” That is also a clever analogy. But I choose to ignore it, because the philosophers among you might start arguing that we cannot prove parrots do not think or understand what they are saying, that humans have effectively been doing the same thing since birth, mimicking what we see, and so on and so on. Better to avoid the trap and just stick to the mathematics.

“AI is Dumb”

Now this is a fiery one. If I had a penny for every time I was in a scientific seminar, a hackathon, or a discussion and I heard “LLMs in this decade are dumb so they cannot do XYZ accurately”, I’d have my own data centre by now.

LLMs are not dumb. But that’s only because they aren’t smart either. There is nothing inherently intelligent about machine learning models or transformer models. Saying an LLM is dumb is the equivalent of saying “my pen that I write with is dumb”. Of course it is, but just because there is no expectation of it being smart does not mean it is dumb. It just means that there is no expectation of it being smart.

LLMs are very amusing statistical models that do one thing and one thing only - predict the next words. Of course, if you sit down and naively write a program to predict the next words, it is going to be awful at it, because you will not be accounting for the hundred clever nuances that have gone into these models. There is a lot of interesting science behind attention mechanisms, transformer architectures, and more, which makes LLMs good at doing what they do, which, if you haven’t picked up on it yet, is: “predict the next words”.
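If you want to see what “predict the next words” literally looks like at the output end, here is a hedged sketch with made-up numbers (the candidate words and scores are invented for illustration; a real model scores every token in a vocabulary of tens of thousands). Whatever the attention layers do internally, the final step is just turning scores into a probability distribution and picking from it.

```python
import math

# Hypothetical raw scores (logits) a model might assign to candidate next
# words for the prompt "The cat sat on the ...". Numbers are made up.
logits = {"mat": 4.2, "sofa": 2.9, "floor": 2.7, "theorem": -5.0}

# Softmax: turn raw scores into a probability distribution.
total = sum(math.exp(score) for score in logits.values())
probs = {word: math.exp(score) / total for word, score in logits.items()}

for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:>8}: {p:.3f}")
# "mat" gets most of the probability mass; "theorem" gets practically none.
```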

If you think LLMs are dumb because they cannot do a particular task, it probably means they were not created to do that particular task. Imagine trying to paint the Mona Lisa and calling your pen dumb because it is not producing the proper colors or strokes, or imagine using Google to find American nuclear secrets. Just because those are not searchable on Google (I hope not) doesn’t mean Google is dumb (or smart, if you did find something you needed).

“AI will evolve into AGI”

This is going to be a long one, and it is also going to break a lot of hearts (including those of many people I know personally who are researching AGI, so I’m sorry, but it’s true).

Now, this is not ENTIRELY wrong; there is some truth to it if you phrase it correctly. But my point is that current AI, as it is right now, will never produce something it has not seen before, unless there is a paradigm shift.

I’ve already mentioned that AIs are statistical models that predict the next piece of data based on the millions of data points they have already seen. This is nothing revolutionary; it is essentially the scientific method. In case you need a refresher on it: we create a hypothesis (a statement we believe is true), collect data around it, try to find a relationship in that data using statistical tools like regression, and then run that model on newer data. That is how science has been conducted for as long as there has been a history of science.

AI is no different. Not by analogy, but literally: it is the same mechanism. And just as a statistical model will never give you a (meaningful) output for an input it was never trained on, you will never get anything genuinely new from an AI either.
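Here is the simplest numerical illustration of that claim I can think of (a hedged toy example using numpy, not a statement about any particular AI system): fit a model to data from a small interval, then ask it about a point far outside that interval.

```python
import numpy as np

# Training data: a sine wave, but only on the interval [0, 2*pi].
x_train = np.linspace(0, 2 * np.pi, 100)
y_train = np.sin(x_train)

# Fit a degree-7 polynomial: a perfectly decent model *inside* that interval.
coeffs = np.polyfit(x_train, y_train, deg=7)

print(np.polyval(coeffs, np.pi / 2))  # close to 1.0, i.e. sin(pi/2): inside the data
print(np.polyval(coeffs, 50.0))       # a huge, meaningless number: far outside the data
```

Inside the range it was fitted on, the model is useful; outside it, the numbers it produces are noise dressed up as an answer.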

This is very simple, yet I have seen this misconception in many technical experts as well, so I’ll give an example from completely outside the magic of AI; maybe that will make it easier to see.

In medical science, the way we diagnose diseases is that we have historical data from patients who had a specific disease, let’s call it disease A. We collect all types of data, bloodwork, genetic mutations, symptoms, and all the good stuff, from 100 patients with confirmed disease A into a big spreadsheet. We know beforehand that these data somehow relate to disease A, and the idea is that we want to collect the same data from a newly admitted patient and figure out whether they also have disease A or not.

So what we do is run statistical models on the spreadsheet of patients we know have disease A. For example, we might see that all 100 patients who had disease A also had an increased TSH level and a white blood cell count within a certain range, compared to a healthy human being. So we now know that if someone has an increased TSH level and a high white blood cell count, they probably have disease A, and we standardize that test, call it the “A1-Test”, and roll it out in all hospitals. Sounds good? Kudos to the scientists.

Now, something we did not know is that people with disease A also have a weird anomaly: they all prefer a specific brand of coffee. At no point during the research that produced the A1-Test did we ever ask the patients about their preferred brand of coffee. So there is no way the A1-Test will be able to use this information in any way, shape, or form. It will neither take coffee preference as a useful input, nor will it ever say with confidence that “this person prefers this specific brand of coffee because they have disease A”. The test simply does not know about it. It is not a factor that was ever measured, even though it is a real correlation.

As soon as some clever scientist figures this out and builds it into the test, calling it the A2-Test, trained on coffee preference and disease A, only then will it be able to pick up on that pattern. This means that statistical models are NOT able to use or produce data they have never seen before. It is not a technical limitation; it is a paradigm limitation. No matter how strong your computational power is (it could be infinite), or how good the quality or quantity of your data is (you could have the bloodwork of infinitely many patients with disease A), if you do not factor in a specific data point, you will not be able to recreate it.
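To put the A1-Test story into code, here is a hedged scikit-learn sketch (the numbers are fabricated for illustration): a classifier trained only on TSH and white blood cell count. The coffee-brand preference exists out in the world, but since it was never put into the feature matrix, the trained model has no weight for it and no way to use it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fabricated training data: [TSH level, white blood cell count] per patient.
X = np.array([
    [6.1, 12.0], [5.8, 11.5], [6.5, 13.2],   # patients with confirmed disease A
    [1.9,  6.0], [2.3,  7.1], [2.1,  6.4],   # healthy patients
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = disease A, 0 = healthy

# The "A1-Test": trained on exactly these two features and nothing else.
a1_test = LogisticRegression().fit(X, y)

# A new patient with high TSH and high WBC gets flagged.
print(a1_test.predict([[6.3, 12.5]]))  # [1]

# Their preferred coffee brand appears nowhere in X: there is no column for
# it, no learned weight for it, and no way to feed it in without building a
# new model (the "A2-Test") from scratch on data that includes it.
```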

There is a very clever counterargument to this: genetic algorithms. I raised this misconception with a very renowned professor, whom I respect a lot, at the University of Oulu during Arctic AI Day 2024. He gave me a good answer: genetic or evolutionary models can produce or recreate something they have not seen before. This is true; there is a particular family of models called genetic (or evolutionary) algorithms that work roughly the way evolution does. Genes adapt to new situations and mutate to factor in new conditions, a bit like how life moved onto land as the water level of the Earth decreased.

I am going to be completely honest: this response initially threw me off, and it bothered me for a week. But after pondering it for a while, I realized that even in genetic models, the fundamental principle holds: the mutations or adaptations aren’t random bursts of pure novelty; they are responses to existing environmental pressures. Evolution doesn’t invent entirely new building blocks out of thin air; it reshuffles, refines, and optimizes the material that already exists. Mammals did not evolve lungs just because it was cool to have lungs; they evolved lungs instead of gills because the environment (the data) showed a correlation between having a sac of air and survival rate on land, compared to having gills. There was a distribution in which the outlier species with random mutations that produced air sacs survived more than the ones without, and over time the distribution settled and the dominant species were the ones with lungs.

Genetic algorithms, inspired by this process, work the same way. They iterate, select, and mutate based on predefined fitness functions, which means they still operate within the constraints of what they know to be beneficial. This approach is very powerful, yes. It can generate art, code, and even help discover new drugs. But it cannot question its framework, nor can it transcend the boundaries of its training data.
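For the curious, here is a heavily simplified genetic algorithm sketch (my own toy example: evolving a bit string towards all ones). The point to notice is that the fitness function is written by us in advance, so the algorithm can only ever get better at exactly what that function rewards.

```python
import random

random.seed(0)
LENGTH = 20

def fitness(individual):
    # Predefined by us: the algorithm can only optimise what we choose to reward.
    return sum(individual)

def mutate(individual, rate=0.05):
    # Mutation only flips bits that already exist; it reshuffles existing material.
    return [bit ^ 1 if random.random() < rate else bit for bit in individual]

# Start with a random population of bit strings.
population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(30)]

for _ in range(100):  # generations
    # Selection: keep the fittest half, as judged by *our* fitness function.
    population.sort(key=fitness, reverse=True)
    survivors = population[:15]
    # Reproduction: offspring are mutated copies of the survivors.
    population = survivors + [mutate(random.choice(survivors)) for _ in range(15)]

best = max(population, key=fitness)
print(fitness(best), "out of", LENGTH)  # converges towards all ones
```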

“AI will take over my job”

This one is not a misconception. It is true. AI will take over your job, but not in the way you think it will, and that is where the misconception lies.

This is a continuation of the previous misconception, read that one first if you haven’t already.

By now, you know what an LLM is: a statistical model. Like every statistical model, it has a distribution; for simplicity, let’s assume it is a normal distribution.

LLMs are trained on one thing only: text data. (I’m ignoring omnimodal/multimodal models here for simplicity, but this explanation holds for any sort of model; extending it is left as an exercise for the reader.) The model reads billions of articles, code files, and other text and produces a probability distribution over the relations between words (actually tokens, but for simplicity, we are going to say words).

What does that mean? Well, it means that, based on the training data, we have a very objective distribution over every word and sentence of the language. For example, we know objectively that in the sentence “AI stands for Artificial ________”, the word in the blank will be “Intelligence”. This is fully objective and measurable. Think of a graph where the X-axis holds every word in the dictionary and the Y-axis is the probability of that word filling the blank. For this example, the word “Intelligence” will sit at the peak with essentially 100% probability, unless someone made a typo or the data is poor.

[Figure: Normal distribution showing the probability of “Intelligence” in “AI stands for Artificial ________”.]

Let’s take another example where the choice is not that clear-cut and there might be multiple correct answers: “The sky is ______”. If you ask 100 people, some will say “blue”, some will say “cloudy”, “clear”, or many other things. But it is still not subjective; it is still objective. No one will ever say “The sky is burger”. It does not make sense; there is no such data. This time the distribution is more spread out, with “blue” in the middle with, say, 60% probability, “cloudy” and “clear” with 19% and 20%, and some other obscure words sharing the remaining 1%. But we can say with certainty that “burger” will have 0%.

[Figure: Normal distribution showing the probabilities of “blue”, “cloudy”, and “clear” in “The sky is ______”.]
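Using those made-up numbers, here is a tiny sampling sketch (again, a toy illustration, not a real language model): draw the blank ten thousand times from that distribution and count what comes out. “Burger” has no probability mass, so it never appears.

```python
import random
from collections import Counter

random.seed(42)

# Made-up distribution for "The sky is ______", mirroring the numbers above
# ("grey" stands in for the obscure 1% tail).
words   = ["blue", "clear", "cloudy", "grey", "burger"]
weights = [0.60,   0.20,    0.19,     0.01,   0.00]

samples = random.choices(words, weights=weights, k=10_000)
print(Counter(samples))
# Roughly 6000 'blue', 2000 'clear', 1900 'cloudy', 100 'grey', and zero 'burger'.
```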

Where am I going with this?

Well, what do you think of the text and code that is available on the internet? It’s mediocre at best, and it is public and open to all. LLMs are trained on whatever data we can find on the internet, and NOT on the sophisticated software and documentation that industries keep closed and secured as intellectual property. This means the model will never produce something that is completely innovative, industry-grade, or novel. It will always be stuck at the level of the mediocre, publicly available data it was trained on.

Yes, there is high-quality, industry-standard code publicly available, and there are high-quality texts from literature and research papers. But if the AI was trained on it, it means anyone can train on it. If the LLM somehow learned to make good websites with ReactJS by reading publicly available documentation and other people’s publicly available projects, what’s stopping you from learning as well? It’s a public resource; you just have to find it and understand the patterns, and you are future-proof. On top of that, if you’re applying for a job, the company needs someone who can bring innovation and novelty, not someone who just knows how to write mediocre ReactJS and recreate things that already exist in the public domain.

Two common counterarguments come up here. The first: “AI can learn faster than me and can use multiple languages.” It cannot. To teach an AI what a spider looks like, you need to show it thousands of spiders before it even starts guessing right half the time. To teach a human, you show them one spider, maybe two.

The second: “Companies can train AIs on their internal intellectual property.” Yes, they can, but again, training an AI to do something specific needs an insane amount of data, far more than any single company will ever have compared to what’s available in the public domain. The result will be slightly better than mediocre, but not enough to replace humans.

There is also a very funny analogy from the streamer ThePrimeagen: imagine a company made an AI so powerful that it can create anything. Why would they ever release it to the public or sell it to others, instead of using it themselves to build business after business?

TL;DR

  • AI can’t think - it predicts based on data.
  • AI isn’t dumb or smart - it just does what it’s designed to do.
  • AI won’t evolve into AGI - it cannot produce what it has never seen.
  • AI will take over jobs - but primarily those that can be reduced to patterns and repetition.
  • AI won’t make humans obsolete - it will challenge us to redefine what it means to be uniquely human.

Personal Take

Working with LLMs has been a fascinating journey for me. What follows is completely my own opinion; I do not speak for or represent any entity in my blogs. I am currently working with a very focused group at a startup, exploring the use of LLMs for mental health. It might sound fishy to an avid technical person at first, but if you understand the statistics, a lot of mental health care can be automated. Cognitive behavioral therapy is an evidence-based psychotherapy in which a patient suffering from certain specific conditions is treated by prompting them with specific, measurable questions and responding with other specific, measurable answers. This can be automated with LLMs; it is inherently how LLMs work: predict the next words based on the previous ones.

But in today’s world, it’s easy to get caught up in an AI’s seemingly intelligent responses and mistake them for something more than they are. I’ll admit, even I’ve been tempted to think of them as more than just predictive text generators. But the truth is, they’re not sentient or capable of true understanding. They’re tools: powerful ones, but tools nonetheless. Tools that can be used to move the world one step forward.

But like any tool, they have limits. “It’s not the tools you have faith in. Tools are just tools. They work, or they don’t work. It’s the people you have faith in or not.” – Steve Jobs.