In today’s column, I continue my ongoing analysis of the latest advances and breakthroughs in AI (see my extensive posted coverage at the link here) and focus this discussion on a recent research study that cleverly leaned into agentic AI to devise a kind of automated, end-to-end science researcher capability. It is altogether impressive, thought-provoking, and a handy early-bird showcase of what can smartly be done with modern generative AI and large language models (LLMs) when astutely stitched together.

This is decidedly a stitch in time saves nine proposition, as you’ll soon see.

I will do a deep dive into the innovative approach and speculate on how this impacts the future of AI. Just as a heads-up, there are a number of existing bumps in the road and some gotchas that are important to be aware of concerning these matters. Those will get singled out in my discussion so that you can see where limitations, weaknesses, and challenges are afoot.

If that seems a bit gloomy, I certainly don’t want to underplay the happy-faced merits of the approach. The work the researchers did is laudable and deserves sufficient airtime. I hope too that other researchers and everyday practitioners will embrace the key tenets and use the ideas and novelty to make further progress. That’s how advances in AI keep advancing.

Cinch up your seatbelt for a great ride.

The Scientific Process That We Know And Cherish

Let’s start at the keystone of the subject matter at hand, namely the process of conducting science and seeking scientifically based discoveries.

I’m sure that you’ve been indoctrinated in the revered scientific method, often called the scientific process, which we all learn about in school from an early age. It goes like this. First, you come up with some idea or question that you think is worthy of scientific inquiry. The aim is to establish what you believe is valuable to pursue. Some ideas or questions will be worthwhile, while others might be of little value and can be set aside.

You ought to next conduct some heads-down research into the relevant scientific literature to find out what’s already known and what is yet to be figured out on the chosen topic. That’s doing your homework before launching whole hog into doing a scientific study. No sense in reinventing the wheel or otherwise pursuing something that is a dead-end (well, unless you have some creative take or out-of-the-box perspective that hasn’t been tried before).

Once you’ve reached a point of belief where further pursuit makes sense, you can write up what you think is a well-devised theoretical postulation, perhaps laying the groundwork for others to test out your theories or hypotheses. I dare say that much of the time the temptation is great to go ahead and devise an experiment or series of experiments exploring the topic. You want to see with your own eyes what happens when the rubber meets the road, as it were.

Performing the delineated experiments is satisfying, though it can also be grueling. We hear about those experiments that yield a eureka moment and perhaps assume it happens all the time. Nope. A frequently noted claim is that Thomas Edison supposedly made over 2,000 experimental attempts before landing on the carbon-filament bulb that became one of his most famous inventions.

Experimenters know that the reality of experimentation consists of arduous and tireless efforts. It isn’t all roses and champagne. Thorns are aplenty. Furthermore, the final result might be disappointing, and you will have a devil of a time explaining why you did not come out with an earth-shattering crescendo of a great discovery. This can be soul-crushing.

No matter what the result was, dutiful scientists write up their efforts and publish an analysis of what occurred.

Why?

Because that’s how scientific progress is made.

You want to aid others in knowing what seems to work and what doesn’t. They can then take up the mantle and try a different angle or pursue something else instead. One of the saddest aspects is when research that didn’t catch the brass ring is never described and shared. I know it is tempting to be mum when your work seems to have been a dud. Keep a stiff upper lip, analyze, write, and post. You are aiding humankind and can hold your head high accordingly.

I’ll mention an additional step that doesn’t necessarily get much attention during those routine lectures and class discussions on the scientific method. The perhaps less apparent step is that when you provide your write-up, the write-up should be suitably reviewed. The review ought to be done by those who have expertise on the subject matter.

Part of the reason for doing reviews is to try to ensure that whatever is stated makes sense and is legitimately worth considering. You definitely want to sort the wheat from the chaff. Suppose the experimenter or scientist lost their head and skipped vital scientific process steps. Just because a scientific write-up exists does not mean that the work was done soundly.

Another valuable purpose for reviews is that the scientist will hopefully take to heart the feedback that is provided via the reviews. Maybe they will incorporate the commentary into their ongoing scientific endeavors. Improvement ought to be the hallmark of all scientists. There is always room for doing things in better ways.

Okay, I think you get the gist of the above rough cut of the scientific process.

No doubt you’ve heard and seen it many times. Over and over. I’d bet that you can likely recite the steps in your mind and perhaps even while asleep. Do you dream of the scientific process? Well, that might be a bit of a stretch. Make sure to go outside and enjoy a hike or run.

Generative AI As An Aid For The Scientific Process

For the moment, put a mental pin into all this talk of the scientific process. We’ll come back to it momentarily. I want to shift gears and discuss AI, specifically the type of AI that is known as generative AI and large language models (LLMs).

I’m sure you’ve heard of generative AI, the darling of the tech field these days.

Perhaps you’ve used a generative AI app, such as the popular ChatGPT, GPT-4o, Gemini, Bard, Claude, etc. The crux is that generative AI can take input from your text-entered prompts and produce or generate a response that seems quite fluent. This is a vast overturning of old-time natural language processing (NLP), which used to be stilted and awkward to use, and which has been transformed into a new level of NLP fluency that is at times startling or amazing in its caliber.

The customary means of achieving modern generative AI involves using a large language model or LLM as the key underpinning.

In brief, a computer-based model of human language is established, consisting of a large-scale data structure that does massive-scale pattern-matching based on a large volume of data used for initial training. The data is typically found by extensively scanning the Internet for lots and lots of essays, blogs, poems, narratives, and the like. The mathematical and computational pattern-matching homes in on how humans write, and the AI then generates responses to posed questions by leveraging those identified patterns. It is said to be mimicking the writing of humans.

I think that is sufficient for the moment as a quickie backgrounder. Take a look at my extensive coverage of the technical underpinnings of generative AI and LLMs at the link here and the link here, just to name a few.

The reason I brought up generative AI is that you might be pleasantly surprised to know that generative AI is quite useful throughout the scientific process. Scientists can leverage generative AI in a multitude of ways. Easy-peasy.

For example, suppose a scientist is trying to come up with ideas of what to conduct research on. They can log into a generative AI app and engage in a discussion with the AI about various potential ideas. The AI acts like a sounding board. Assuming that the AI has had relevant data training on the subject matter at hand, the scientist can carry on a very useful interaction about the ins and outs of the ideas being considered.
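
To make that sounding-board usage tangible, here is a minimal code sketch of how a scientist might programmatically bounce ideas off a generative AI. To be clear, this is purely my own illustration and not drawn from the research study I’ll get to shortly; the client library, model name, and prompt wording are assumptions on my part, and just about any modern LLM API would do.

    # A minimal sketch (my illustration, not from the research study discussed
    # below) of using an LLM as a brainstorming sounding board. Assumes the
    # OpenAI Python client; the model name and prompt wording are placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    brainstorm_prompt = (
        "Act as a research sounding board. I study diffusion models for "
        "low-dimensional data. Propose three candidate research questions, and "
        "for each one briefly note why it might matter and what could go wrong."
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": brainstorm_prompt}],
    )

    print(response.choices[0].message.content)

The scientist reads the suggestions, pushes back, asks follow-up questions, and iterates, much like bouncing ideas off a colleague.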

The same can be said for the rest of the scientific process.

Want to do an in-depth literature search? Depending upon whether the AI has access to the Internet and/or has been fed relevant documents, a scientist can do their literature search hand-in-hand with generative AI. This can be a time saver.

Need to conduct experiments? Generative AI can provide suggestions on how to perform your experiments. You can use generative AI to keep track of how the experiments are going. There is also the possibility of having generative AI do assessments of the experiments, especially helpful if done while still underway with the experimentation. A scientist can adjust based on the assessments and possibly change direction so that the experiments will bear greater fruit.

The most obvious use of generative AI in this context would be for assistance in writing up the results. My guess is that most scientists tend to start their generative AI use in that manner. They know they have a long haul ahead of them in terms of doing a painstakingly laborious write-up. The use of generative AI can turn writing drudgery into a less onerous task, plus generative AI is notable for being able to write fluently. Some scientists relish the fluency and editing capabilities of generative AI.

In case you don’t already know, there are heated discussions about whether scientists should be using generative AI to aid in writing their scientific papers. One viewpoint is that this is a form of cheating and that science needs to be written by humans. Others insist that since we already acknowledge and accept the use of word processing, which double-checks grammar and spelling, it is a logical extension to use generative AI as a writing tool. The usual retort is that spellchecking is a far cry from writing the whole kit and kaboodle.

Back and forth this goes.

One realm that has been mired in searing controversy is the use of generative AI to review scientific papers. A word of caution for you. Watch out if you bring up this topic with a scientific friend or colleague. Some ardently oppose the use of generative AI for evaluating the work of scientists. To them, it is an outrage: fellow experts are the right reviewers, not some gosh darn AI. There are various views on this. Some believe that if a human expert does a review and uses generative AI to assist, that’s fine. The problematic circumstance is said to be the use of generative AI as a reviewer without having a human reviewer in the loop.

Let’s not get bogged down in those controversies here and now. Instead, my core point is that generative AI can be an aid to humans throughout the scientific process.

Do you agree?

I would guess that you would.

It seems innocent and straightforward. Humans use tools, such as hammers, shovels, picks, and the like. We use cars, ships, planes. It stands to reason that we can use a tool for aiding scientific research. The tool in this case would be generative AI and LLMs.

Period, end of story.

When Generative AI And Agentic AI Meet Up With Each Other

The story does not stop there. Keep reading. We might have prematurely assumed that generative AI is only an aid or a helping hand on the side.

It is time to get to the grand reveal.

Imagine that we put together generative AI that would undertake the scientific process and do so without having a human in the loop.

Say what?

Yes, I am suggesting that we could set up generative AI and LLMs to do the entire A-to-Z soup-to-nuts steps of the scientific process. No human needed to perform any of the steps. All of it is being undertaken by AI.

Humans exit stage left.

Mull this over.

We could have generative AI automatically come up with ideas for scientific research. The generative AI could do an extensive literature search to figure out the merits of the idea. Generative AI could potentially carry out experiments, which is perhaps one of the trickiest elements, and we will unpack that further.

Generative AI could assess the experiments. And generative AI could do a write-up about the research (this point is somewhat obvious because of the comment I mentioned previously that human scientists will at times turn to generative AI to do their write-ups for them). We can also have generative AI review the write-ups that were done by generative AI.

Voila, the end-to-end use of generative AI for conducting scientific research.

Is that farfetched?

If you were hoping that I might say that it is only attainable in the far-off sci-fi future because otherwise the prospects seem scary, you might want to pour yourself a glass of wine right now. The correct answer is that it is entirely plausible and well within today’s capabilities, namely crafting together generative AI to do the whole end-to-end scientific process.

There is an added angle that comes into play.

It has to do with agentic AI.

You might vaguely know that the latest and hottest new trend in AI consists of agentic AI, see my coverage at the link here. Agentic AI is the concept and practice of devising AI that will act as an “agent” and perform tasks or a series of tasks accordingly. In some cases, you merely give an overarching goal and the agentic AI proceeds thereafter. Other times, the goals are formulated via the AI, which then proceeds to carry out whatever comes next to attain those goals.

A human analogy is often used to explain this agent-like concept. Imagine that you go to a roofing contractor and tell them you need to have a leaky hole in your roof fixed. Acting as an agent of sorts, the contractor will figure out what needs to be done and carry out the actions required. They might decide that some new roof shingles are needed and visit the local home improvement store to purchase the items. They then come to your house, climb up on the roof, install the shingles, and do a rain test via hose and water. When all is done, they present you with a bill for the service rendered.

That’s the idea of what an agent consists of.

We can somewhat do the same with AI in terms of establishing AI or a series of AI components that will act in an agent-like manner to undertake a complicated set of steps. There doesn’t necessarily need to be a human in the loop during those AI-performed steps. The AI could be launched and continue until the end or could alert a human if something midway has gone awry or needs special attention.

One crucial point about this is that you should be cautious about going overboard regarding how far these AI-based agents and their so-called semblance of agency can go, see my analysis of legal personhood for AI at the link here. I bring this up because, sadly, there are some brazenly proclaiming that AI agents are on par with humans such that the AI is essentially sentient.

Let me set the record straight. None of today’s AI is sentient. Sorry to break the jarring news to you.

Boom, drop the mic.

I mention this since there are lots of headlines that seem to proclaim or suggest otherwise.

AI is a mathematical and computational construct or mechanization that just so happens to often seem to act or respond in human-like ways. Be very careful when comparing AI to the nature of human capabilities, which for example I delicately cover and differentiate in my recent discussion about inductive and deductive reasoning associated with AI versus that of humans, at the link here. Another handy example is my recent coverage of AI that purportedly has “shared imagination” with other AI, see the link here.

The gist is that there is way too much anthropomorphizing of AI going on.

Agentic AI is a great metaphor and aspirational aim. We can devise AI that works based on goals and drives forward through a multitude of complex steps to attain them, without a human having to push things along at each step. This does not mean that the AI is thinking or otherwise functioning on a sentient basis.

We will shortly get into some drawbacks associated with today’s agentic AI.

Meanwhile, one prudent path toward agentic AI consists of leveraging generative AI and LLMs. We can readily stitch together generative AI to do one thing, another thing, and so on, continuing to string together various tasks and accomplish them one at a time, possibly even in parallel. Since generative AI is relatively fluent from a data and interaction perspective, it has capabilities that make agentic AI especially viable.
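
To illustrate what I mean by stitching, here is a bare-bones sketch of chaining generative AI calls so that each step’s output feeds the next step’s prompt. Again, this is my own simplified illustration and not anyone’s production agentic AI framework; the ask_llm helper, the model name, and the example goal are assumptions for demonstration purposes.

    # A bare-bones sketch (my illustration, not a production agentic framework)
    # of stitching generative AI calls into an agent-like sequence. The helper
    # function, model name, and goal are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()

    def ask_llm(prompt: str) -> str:
        """Send one prompt to the LLM and return its text reply."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # Each step consumes the prior step's output, so no human has to push
    # things along between the steps.
    goal = "Reduce the training time of a small image classifier without hurting accuracy."
    plan = ask_llm(f"Propose a short step-by-step plan for this goal: {goal}")
    result = ask_llm(f"Carry out the first step of this plan and report what you did:\n{plan}")
    critique = ask_llm(f"Critique this result and state the next action to take:\n{result}")

    print(critique)

Note that the hand-offs are entirely mechanical. The AI is not “deciding” in any sentient sense; it is being strung along from prompt to prompt.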

Gee, let’s think about a circumstance or situation in which agentic AI, as principally crafted via the use of generative AI, might be really useful.

Thinking, thinking, thinking.

Aha, what about agentic AI and generative AI for automating the process of scientific research discovery? Yes, surely, that would make sense.

I’m glad that you thought of that since I am about to go deeper into that use case, thanks.

Agentic AI And Generative AI Bonding With The Scientific Process

A recently posted AI research paper entitled “The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery” by Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha, arXiv, August 12, 2024, provides a clever and important indication of agentic AI, via generative AI and LLMs, and how this can be brought together in a practical way.

The use case they concentrated on is the scientific process and scientific discovery.

Here’s what we will do.

I am going to quickly walk you through the essentials of the research. That will get you up-to-speed on what was undertaken. After covering those facets, I will dive into some of the gotchas and pitfalls that still need to be worked out when it comes to agentic AI and these kinds of generative AI and LLM stitchings. Please re-tighten that seatbelt another notch, the one that I earlier mentioned you might want to cinch up. If needed, also grab a glass of fine wine. You might need it.

Let’s get going.

First, here is what the overall stated purpose of the research study generally consisted of (excerpts from the source cited above):

  • “One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge.” (ibid).
  • “While frontier models have already been used as aids to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process.” (ibid).
  • “This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models (LLMs) to perform research independently and communicate their findings.” (ibid).

I trust you can see that those points match my discussion so far about the realm of scientific research and the process of scientific discovery as an area rich for using AI.

For many years, all kinds of AI-augmented tools have been used by scientists and researchers as an aid or assistant while conducting various scientific research steps. Now, due to modern-day generative AI and LLMs, we can up the ante. AI can be set up to do the work on a start-to-finish basis.

I have a quick question for you.

What type of brand name or moniker might you give to an AI-based system composed of multiple agentic AI components that were tied together for the specific act of performing the scientific process?

Take a moment to come up with something on target.

The researchers opted to name the AI-based system “The AI Scientist”.

Well, if I might say so, I am a bit put off by the chosen name.

We conventionally construe the word “scientist” as referring to a human who performs scientific work. Reusing the same word for an AI system is regrettably a tad steeped in the anthropomorphizing of AI. I get why they landed on the name. It is exciting. It is something that stirs vivid imagery. But I worry that it also suggests that the AI has reached a level of sentience and can do all the things that a human can do in truly human ways.

Let’s be honest. Despite the setup being impressive and abundantly worthy of accolades, it is not sentient. Using a name that might lead some people down that path is disquieting. Anyway, just wanted to add my two cents in there.

Another teeth-grinding and notable twist is worth bringing up.

It goes like this. The existing setup of The AI Scientist is focused on doing scientific research specifically about AI. That’s the first domain they opted to tackle. This makes sense since it is undoubtedly a realm that they know best and feel most comfortable in. It is the proverbial idea of starting with something that you know well. In this instance, they are AI researchers who know the AI field and are versed in research about and advancing AI.

Then branch out subsequently.

All’s good on that front.

Furthermore, as additional and agreeably glowing news, the researchers put together a system with a generalizable and tailorable structure that can potentially be used for other domains of research. They opted to begin this journey by concentrating on AI doing research about AI.

I’m okay with that.

The rub or issue is that the name of the system is once again somewhat confounding. What does The AI Scientist seem to refer to? If I say to you that the system does research about AI, and then I tell you that the name given to the system is The AI Scientist, you might find yourself interpreting that in two different ways. One interpretation is that this is an AI tool that does work akin to a scientist.

A second interpretation is that this is a scientist-like system that researches AI.

Do you see how those two interpretations are possible?

The problem with the second interpretation is that, lamentably, some might pigeonhole The AI Scientist as a tool for exclusively and solely doing AI research. They might not realize that the tool is intended to be used for all kinds of research, across disciplines of all shapes and sizes.

I suppose that is enough on the naming conundrum.

Moving on, here’s what the research paper says in a more detailed way about the purpose of the system (excerpts):

  • “We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation.” (ibid).
  • “In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion and add them to a growing archive of knowledge, acting like the human scientific community.” (ibid).
  • “We demonstrate the versatility of this approach by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics.” (ibid).

Observe that there is a subtle but telling sub-point in the first bullet.

They mention that The AI Scientist is devised to write code.

A primary reason for writing code in this context is that they are performing AI research and usually, the prevailing approach entails writing code to do experiments on. You come up with an idea or better way to get AI to do something, you write code to have something that can be tested, and you proceed to test it out. After doing so, you either change the AI code and try something else, or you feel confident about the code and showcase how it performed. This becomes a crucial part of your AI research study results.

Not all AI research studies make use of code. Many do. Indeed, there is often a show-me perspective that if you can’t make code to do what you proclaim, it could be that your ideas are not up to snuff. Sometimes you might sketch out an algorithm and not write code to implement it. Sometimes you might perform other experiments that have nothing to do with coding. There are lots of ways to perform AI research.

One aspect of the research paper that is particularly exciting is that they put their agentic AI system to a series of tests. Yes, they performed experiments on the system they had devised. The scientific process is used to assess a system devised to perform the scientific process. This might seem mind-bending but makes very good sense.

An old saying is that you ought to be willing to eat your own dog food. It suggests that if you make a product or service, you should be willing to use said product or service. That’s somewhat the case here.

Tangible Understanding Of The System

I assume you are with me on the notion of what the system does.

To give you a greater tangible understanding, I’ll be referring next to some aspects in their online blog posting entitled “The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery”, Sakana.ai, August 13, 2024, which gives an overview of the work performed. I will then intersperse some sample prompts that the core research paper provided to exemplify how generative AI for each step of the research process might be given suitable instructions.

In case you don’t know, you use prompts when making use of generative AI. A prompt might tell the AI to do this or that. A prompt might be a question that you want generative AI to answer. And so on. There are markedly good ways to compose prompts, and there are shaky ways to do so.

If you are an active user of generative AI, you might want to learn about prompt engineering, see my comprehensive coverage of over fifty significant prompting tactics and strategies, at the link here.

The researchers divided the overarching process of scientific research into four key steps, consisting of (1) Idea generation, (2) Experimental iteration, (3) Paper write-up, and (4) Paper reviewing. Those are the steps that the agentic AI is set up to perform. Each step leads to the next step. All the steps are connected together, almost like being in an assembly line.

I’ll show you an excerpted depiction for each of the four steps (just something to whet your appetite and illustrate what this is about), plus an excerpted indication of what a generative AI prompt might be like for each respective step:

  • (1) Idea Generation. “Given a starting template, The AI Scientist first ‘brainstorms’ a diverse set of novel research directions.” (excerpt from the blog posting).
  • Sample Prompt (excerpt): “You are an ambitious AI PhD student who is looking to publish a paper that will contribute significantly to the field.” (from the research paper).
  • (2) Experimental Iteration. “Given an idea and a template, the second phase of The AI Scientist first executes the proposed experiments and then obtains and produces plots to visualize its results.” (excerpt from the blog posting).
  • Sample prompt (excerpt): “Your goal is to implement the following idea: {title}. The proposed experiment is as follows: {idea}. You are given a total of up to {max_runs} runs to complete the necessary experiments. You do not need to use all {max_runs}. First, plan the list of experiments you would like to run.” (from the research paper).
  • (3) Paper Write-up. “Finally, The AI Scientist produces a concise and informative write-up of its progress in the style of a standard machine learning conference proceeding in LaTeX.” (excerpt from the blog posting).
  • Sample prompt (excerpt): “We’ve provided the ‘latex/template.tex` file to the project. You will be filling it in section by section. First, please fill in the {section} section of the writeup. Some tips are provided below: {per_section_tips}. Before every paragraph, please include a brief description of what you plan to write in that paragraph in a comment.” (from the research paper).
  • (4) Automated Paper Reviewing. “A key aspect of this work is the development of an automated LLM-powered reviewer, capable of evaluating generated papers with near-human accuracy.” (excerpt from the blog posting).
  • Sample prompt (excerpt): “You are an AI researcher who is reviewing a paper that was submitted to a prestigious ML venue. Be critical and cautious in your decision. If a paper is bad or you are unsure, give it bad scores and reject It.” (from the research paper).
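
To tie the four steps together in your mind, here is a highly simplified code sketch of what such an assembly line could look like. I want to emphasize that this is my own toy rendition and not the authors’ open-sourced code; the function names, the stubbed-out experiment runner, and the exact prompt wording are my assumptions, loosely echoing the excerpted prompts above.

    # A toy rendition (my simplification, not the authors' open-sourced code)
    # of the four-step assembly line. ask_llm is the same style of hypothetical
    # LLM-calling helper sketched earlier in this column.
    from openai import OpenAI

    client = OpenAI()

    def ask_llm(prompt: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    def generate_idea() -> str:
        # Step 1: Idea generation.
        return ask_llm(
            "You are an ambitious AI PhD student who is looking to publish a "
            "paper that will contribute significantly to the field. Propose one "
            "novel, testable research idea along with a brief experiment plan."
        )

    def run_experiments(idea: str, max_runs: int = 5) -> str:
        # Step 2: Experimental iteration. The real system writes and executes
        # code and gathers plots; this placeholder just returns a stub summary.
        return f"Stub experimental results for: {idea[:80]} ({max_runs} runs budgeted)"

    def write_paper(idea: str, results: str) -> str:
        # Step 3: Paper write-up.
        return ask_llm(
            f"Write a concise machine learning paper draft about this idea:\n{idea}\n"
            f"Use only these experimental results:\n{results}"
        )

    def review_paper(paper: str) -> str:
        # Step 4: Automated paper reviewing.
        return ask_llm(
            "You are an AI researcher who is reviewing a paper that was "
            "submitted to a prestigious ML venue. Be critical and cautious in "
            "your decision. If the paper is bad or you are unsure, give it bad "
            f"scores and reject it.\n\nPaper:\n{paper}"
        )

    # The assembly line: each step feeds the next, with no human in the loop.
    idea = generate_idea()
    results = run_experiments(idea)
    paper = write_paper(idea, results)
    review = review_paper(paper)
    print(review)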

That scratches the surface of the elements of the system. Make sure to visit their blog page or read their research paper for the nitty gritty. It’s a worthy read.

They nicely provided the finished products of their testing. They used the budding system to do useful AI research. I mention this because an easier path would have been to do throwaway AI research, done simply to show proof of concept.

Instead, they had the system tackle tough topics and do very interesting AI research.

For example, here’s a sample paper that was produced. It had a solid title and abstract, many pages of literature review, in-depth ideas explored, substantial code written, multiple tests performed, and extensive analysis of the results, with the final paper composed by the AI (example):

  • Example paper title produced: “Dual-Scale Diffusion: Adaptive Feature Balancing For Low-Dimensional Generative Models”.
  • Example abstract produced (excerpt): “This paper introduces an adaptive dual-scale denoising approach for low dimensional diffusion models, addressing the challenge of balancing global structure and local detail in generated samples. While diffusion models have shown remarkable success in high-dimensional spaces, their application to low-dimensional data remains crucial for understanding fundamental model behaviors and addressing real-world applications with inherently low-dimensional data. However, in these spaces, traditional models often struggle to simultaneously capture both macro-level patterns and fine-grained features, leading to suboptimal sample quality. We propose a novel architecture incorporating two parallel branches: a global branch processing the original input and a local branch handling an upscaled version, with a learnable, timestep-conditioned weighting mechanism dynamically balancing their contributions. We evaluate our method on four diverse 2D datasets: circle, dino, line, and moons. Our results demonstrate significant improvements in sample quality, with KL divergence reductions of up to 12.8% compared to the baseline model.”

Good stuff.

Show-Me At A Viable Cost And Suitable Quality

Let’s explore some broad parameters associated with the system.

One facet would be whether the cost of using the tool is within a reasonable range. If it costs zillions of dollars to run and conduct research this way, you will have to figure out whether the ROI (return on investment) makes sense for going this route.

Many of today’s AI systems run on pricey servers in the cloud. The charges for the computer processing can rapidly add up. The researchers were mindful of this and indicated that the cost of their tests was so low that it would seem nearly impossible not to consider going this path. I’m not saying it is a must; I’m only pointing out that the cost is apparently extraordinarily affordable, depending on what you use the system for (a quick back-of-envelope calculation follows the excerpt below).

Here’s what they indicated:

  • “Each idea is implemented and developed into a full paper at a meager cost of less than $15 per paper, illustrating the potential for our framework to democratize research and significantly accelerate scientific progress.” (ibid).
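
Just to make the affordability point vivid, here is a purely illustrative back-of-envelope calculation. Only the roughly $15 per-paper figure comes from the paper; the budget amount is a number I made up.

    # Purely illustrative arithmetic; only the ~$15 per-paper cost comes from
    # the paper, while the budget amount is a hypothetical example.
    cost_per_paper = 15.0        # dollars, per the paper's reported figure
    compute_budget = 1_500.0     # dollars, hypothetical budget
    papers_possible = int(compute_budget // cost_per_paper)
    print(f"Roughly {papers_possible} full paper attempts for ${compute_budget:,.0f}")

That kind of math is why the researchers talk about democratizing research. The open question, as we will get to, is whether quality keeps pace with quantity.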

I’ll be interested to see if other researchers who replicate this work and make use of the AI system experience similar rock-bottom costs. Hopefully so.

Another consideration is whether the results of the AI-based end-to-end research are any good. If the quality stinks, it probably wouldn’t matter what the cost is, even if it is low. You want something that will produce suitable quality research. Enough said.

Here’s what they indicated:

  • “To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores.” (ibid).
  • “The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer.” (ibid).
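
To give a mechanical sense of what “exceeding an acceptance threshold as judged by an automated reviewer” might involve, here is a hedged sketch of a thresholded LLM reviewer. The 1-to-10 scale, the JSON reply format, and the threshold value are placeholders that I made up for illustration; they are not the paper’s actual rubric or numbers.

    # A hedged sketch of thresholded automated reviewing. The scoring scale,
    # JSON format, and threshold are made-up placeholders, not the paper's
    # actual rubric. ask_llm is the same hypothetical helper sketched earlier.
    import json

    def auto_review(ask_llm, paper_text: str, accept_threshold: int = 6) -> dict:
        prompt = (
            "You are an AI researcher reviewing a paper submitted to a "
            "prestigious ML venue. Be critical and cautious. Reply with JSON "
            'only, in the form {"overall_score": <integer 1 to 10>, '
            '"decision": "accept or reject", "summary": "one short paragraph"}.'
            "\n\nPaper:\n" + paper_text
        )
        raw = ask_llm(prompt)
        review = json.loads(raw)  # a real system needs robust parsing and retries
        review["meets_threshold"] = review["overall_score"] >= accept_threshold
        return review

In other words, the reviewer is itself just another generative AI call whose output gets compared against a cutoff, which is part of why the bias and homogeneity concerns I raise later matter so much.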

I’ll say that the sample papers they showcased do match my experience as an AI research evaluator and peer reviewer for AI journals. I am assuming that the papers are representative of what the system can do. I mention this because sometimes a few good apples are in the barrel, but the rest of the barrel is not so good.

Along those lines, they are making the code open-source so that we can see for ourselves whether the work walks the talk. Bravo for going open-source.

Readers of my column know that I give three cheers each time I review or discuss AI research that also makes its code open-source. To me, that’s how we are going to make ongoing progress in advancing AI. Researchers who won’t reveal their AI code and inner workings tend to make life hard for everyone else because there isn’t a straightforward way to verify their results.

Here’s what they indicated:

  • “This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world’s most challenging problems. Our code is open-sourced.” (ibid).

As I say, hats off to open-sourcing their AI system.

Start grinding away on it and let’s see what happens.

Some Thoughts About Gotchas And Limitations

I will next do a rapid-fire rundown of some eyebrow-raising concerns about not just this effort but also any further efforts of this kind. Some might say we are opening a Pandora’s box. If so, let’s at least have our eyes wide open.

One other thing, when I start pointing out gotchas and limitations, some trolls say that I am being a gloomy Gus. Not so. It is important and indeed crucial that we consider the upsides and the downsides of all manner of AI research. We would be foolish to not do so. Putting one’s head in the sand is no way to proceed.

The points I am going to make are pretty much also dutifully pointed out by the researchers. That’s a welcome sign. Sometimes, AI researchers do not contemplate what issues their work might portend. In this case, it is readily apparent that the researchers were very mindful of AI Ethics and the desire to devise Responsible AI.

For my extensive coverage of AI Ethics and AI Law, see the link here.

Those are my caveats and fine print as a heads-up for you and the trolls.

On with the show.

(1) The Domains And Experiments

First, as I noted earlier, the current instance of the system has been applied to AI research. The system seems to be generalizable and tailorable for other domains. As per the research paper (excerpt): “Here, we focus on Machine Learning (ML) applications, but this approach can more generally be applied to almost any other discipline, e.g. biology or physics, given an adequate way of automatically executing experiments.”

A sticking point is going to be the performing of experiments.

AI research is comparatively easy pickings since it often involves writing and testing code. That’s something that generative AI is getting quite good at. Other areas or disciplines are less likely to be using software coding-based experiments. They are bound to perform real-world laboratory experiments such as in a chemical lab, a biological lab, a physics lab, etc.

Will this kind of AI system be able to set up and run those experiments via AI alone, or might there be a need to have a human in the loop?

That’s not a catastrophic downfall. It just means that the AI won’t be doing the science from end-to-end by itself. You might not care that the AI can’t do the actual physical experiments. Fine, you say, this is a vast improvement anyway and doesn’t undercut the intrinsic value.

In any case, until we see that this system has been adapted for other types of experiments, we will need to remain cautious and curious about whether and how well this can be done elsewhere.

I would say the same applies to working in other domains. Sure, we can see that this seems to work for AI research. Will it similarly do good work in domains such as chemistry, biology, physics, and the other hard sciences, even if we take out of the equation the aspects of the experiments? Plus, what about other domains that use the scientific method but are considered “soft sciences” such as sociology, political science, anthropology, etc.?

Proof will be in the pudding.

There is also a concern about AI doing experiments via automation connected to a lab setting. Imagine that the AI is running a chemical test in a remote lab somewhere. During the testing, yikes, something goes awry. Will the local system that is being commanded by the overarching AI be able to handle the problem? Might the AI commanding the local system issue orders that make the situation worse?

Various AI safety considerations arise, see my coverage of such issues at the link here.

(2) Choosing What To Work On And How To Do So

I had mentioned that with agentic AI, sometimes you provide the goal, while sometimes the AI is established to make its own goals.

There are tradeoffs. If the AI will only work via humans submitting goals, this could be overly limiting. People might not come up with innovative goals and fall into a rut. The AI might, via computational means, devise fresh goals that people would have dismissed at the get-go as dead ends, yet one of those goals might be something that no one has tried, or that was tried previously and no one thought to try again.

The researchers noted that currently their system is set up to be able to identify self-goals (excerpt): “The AI Scientist is able to generate its own scientific ideas and hypotheses, as well as a plan for testing them with experiments.” I’ll assume that it is likely you could programmatically override this or substitute human-devised goals. This might be the best way to leverage the best of both worlds.

So far, so good.

The disturbing part is this. Suppose that the self-goals are useless. You might not realize this is the case. Meanwhile, the AI proceeds to carry out the soup-to-nuts research. This might be a waste of money. It might also create a false impression that the ideas were of value when they weren’t.

A similar qualm would be that even if the idea is a good one, the rest of the process might be flawed or falter. Imagine that a good idea is shown to be supposedly unworkable or useless. We might not dig into the details. People will roam around proclaiming that the AI system “proved” that the idea was worthless.

As per the comments by the researchers (excerpt): “The AI Scientist may incorrectly implement an idea, which can be difficult to catch.” From an equally disconcerting viewpoint, the ideas might be of low quality and yet given the full research treatment: “The AI Scientist can generate hundreds of interesting, medium-quality papers over the course of a week.” The volume might seem to be impressive by using this kind of automation, but there might not be any gems in the bunch.

I recently discussed in a posting that there is an interesting and possibly disconcerting aspect that today’s major generative AI and LLMs tend to be homogeneous, see the link here. The issue is that they are generally data trained the same way, tend to use similar models, and otherwise conform to prevailing beliefs about AI designs and construction.

The researchers noted a kindred concern (excerpt): “The idea-generation process often results in very similar ideas across different runs and even models.” It could be that the AI-generated ideas are mediocre and keep staying that way. Over and over, minimal or middling research gets performed.

(3) Getting Ourselves Into A Mess

There is a famous thought experiment in the AI field known as the paperclip scenario, see my discussion and analysis at the link here.

It goes roughly like this. We establish an AI system and tell it to make paper clips. At first, the AI does a reasonable job. Then, the AI mathematically and computationally starts to calculate that even more paperclips could be made by grabbing hold of additional manufacturing plants. The AI does this. Soon, the AI opts to collect and consume all the earth’s resources to make paperclips.

Humans die without those resources.

End of story.

If we establish AI-based systems to perform scientific research, will we somehow lose control and find ourselves having made our own doomsday-like machine?

I’ll add more to that pressing question. The worry is that we might make AI that is smarter than humans, often referred to as ASI (artificial superhuman intelligence), see my discussion at the link here. Here is a chilling point made in the research paper on The AI Scientist (excerpt): “However, future generations of foundation models may propose ideas that are challenging for humans to reason about and evaluate. This links to the field of ‘superalignment’ or supervising AI systems that may be smarter than us, which is an active area of research.”

Your initial thought might be that if the AI is smarter than us, that’s fine and dandy. It might come up with ideas for research that we never could have divined. The next step is that we suddenly have a cure for cancer. Good for us.

I don’t want to burst any bubbles, but the opposite is also possible. The AI comes up with some ideas that could allow the AI to enslave humankind. The idea is sketchy and needs filling in. Voila, the AI uses the scientific research process to figure out the workable details. Humankind has shot itself in the foot.

This brings up a topic that I have hammered away on for years and years. Many people tend to see AI as either being all-good or all-bad. Either AI is going to save humanity, such as my discussion of how AI can aid in achieving the United Nations SDGs (sustainability development goals), see the link here, or AI will destroy us all, see my analysis at the link here.

I keep harping on the dual use of AI, see the link here.

We need to realize that depending upon how we design, build, and field AI, it can be used for both good and bad. My usual example is an AI system that is devised to detect deadly toxins. This is good. We can use this to warn people about killer toxins. The thing is, that kind of AI is often easily switched around by a few simple changes. An evildoer could use the same AI to identify new deadly toxins that no one is ready for. Evil doing ensues.

The research paper on The AI Scientist notes this same issue (excerpt): “As with most previous technological advances, The AI Scientist has the potential to be used in unethical ways. For example, it could be explicitly deployed to conduct unethical research, or even lead to unintended harm if The AI Scientist conducts unsafe research. Concretely, if it were encouraged to find novel, interesting biological materials and given access to ‘cloud labs’ where robots perform wet lab biology experiments, it could (without its overseer’s intent) create new, dangerous viruses or poisons that harm people before we can intervene.”

(4) Lots More To Worry About

In the spirit of the classic and endearing movie The Princess Bride, I’m only getting started on the litany of trepidations. I don’t want to drone on and on, so let’s just hit a few additional ones. We could be here all day long otherwise.

A big topic these days about generative AI and LLMs is the challenges associated with so-called AI hallucinations. I disfavor the catchphrase because it tends to anthropomorphize AI. For my in-depth exploration of the said-to-be AI hallucinations, see the link here. The general idea is that generative AI makes up fictitious content that appears to look real and factual, but it’s purely fakery. Some prefer to refer to this as AI confabulations. I’d go with that. Alas, I believe that we are stuck with the AI hallucinations wording since it is much more alluring and vividly evocative.

AI hallucinations can arise anywhere at any time in generative AI. Think about the scientific process that I earlier outlined. The research ideas that AI proposes might be zany. The literature review might have been falsely performed. The experimental design might contain falsehoods. The experiments might be misreported. The writing of the paper could be replete with made-up junk. The reviews undertaken by the AI could be lies and misstatements.

It might be extremely hard for humans to ferret out the embedded falsehoods. A paper might seem perfect. The experiments might seem impeccable. We proceed to use the results to do this or that. Bam, we discover that the thing was a sham, and it doesn’t work as advertised. Worse, maybe it harms us rather than simply not working as hoped for.

The researchers acknowledge the issues associated with AI hallucinations and have tried to take precautions: “At each step of writing, Aider is prompted to only use real experimental results in the form of notes and figures generated from code, and real citations to reduce hallucination.”

Right now, overall, it is still an open challenge to either stop AI hallucinations from occurring or find a means to catch them before they make their way into generated content, see my discussion about AI trust layers as an approach to this, at the link here.

You might also know that generative AI and LLMs can potentially contain biases, see my coverage at the link here. For example, suppose the AI is trained on data that says all people with blue eyes are dangerous (I doubt this to be the case; it’s an undue bias). The AI is merely doing computational pattern-matching and won’t likely discern that this is an unfounded discriminatory bias. The researchers mention this concern too: “As has been previously observed in the literature, judgments by LLMs can often have bias which we can observe in over-estimation of an idea’s interestingness, feasibility, or novelty.”

As a last topic for now (I’ve got many more), a notable question arises about the role of human scientists and researchers.

If you can use AI to do end-to-end scientific research, do we need to have human scientists? They can be expensive. They want coffee breaks. They only work ten-hour days. They want vacations. The glee of management at the thought of simply having an AI system do that same work is, well, immense.

The problem compounds itself. Suppose we have fewer and fewer scientists and researchers. We become more and more reliant on AI to find discoveries and test out theories. We begin to lose touch with how to do such work on our own. Humans come to see becoming a scientist or researcher as a dead-end career path.

It is one of those humanity death spirals.

That seems gloomy. Oops. The other side of the coin is that AI takes the mundane work out of our hands. Scientists and researchers can be at the big-picture level. They can do ten times, maybe a hundred times more impactful research. More research is done. More scientific breakthroughs are discovered. It is the best of times. People flock to become scientists and researchers. Hurrah!

Here’s how the research paper phrased it (excerpt): “However, we do not believe the role of a human scientist will be diminished. We expect the role of scientists will change as we adapt to new technology and move up the food chain.”

Something we need to figure out.

Soon.

Maybe, right away.

Conclusion

Congratulations, you are now aware of what’s happening. Not many are.

Are you a scientist or researcher?

Do you know any scientists or researchers?

Anybody who is one way or another involved with or impacted by scientists and researchers ought to know what is taking place with AI in the realm of scientific research and discovery. I don’t want to make it seem that AI is taking over or becoming sentient. That’s not what this discussion has been about.

The discussion is that with today’s AI, we can do a lot more than a lot of people think can be done. This is possible without sentient AI. It is possible without AGI (artificial general intelligence) or ASI (artificial superhuman intelligence). No need to fly into the sci-fi sphere.

The upbeat news, or upside, is that we can decide how this fares. Do we need or want to put in place AI Ethics provisions or AI Law pronouncements to shape or control where this goes? These are societal questions. The AI is the AI. Technologists do what they do.

It takes a village to ensure that AI comes out favorably toward humankind.

The final words for this discussion will go to the famed Albert Einstein.

He said this about science: “One thing I have learned in a long life: that all our science, measured against reality, is primitive and childlike — and yet it is the most precious thing we have.”

If science is precious, we should be mindful of how it is conducted and in what ways we can enhance science and prevent science from going off the rails.

For those of you who might be thinking we should ban AI from doing this type of end-to-end science, I have something else to tell you that Einstein said.

He said this: “Life is like riding a bicycle. To keep your balance, you must keep moving.”

Okay, we need to keep moving. Let’s make sure that it is in the right direction and that we don’t fall over or crash the bike.
