Podcast: Using AI Code Generation to Migrate 20000 Tests

MMS Sergii Gorbachov

Article originally posted on InfoQ. Visit InfoQ

Transcript

Shane Hastie: Good day, folks. This is Shane Hastie for the InfoQ Engineering Culture Podcast. Today I’m sitting down with Sergii Gorbachov. Sergii, welcome. Thank you for taking the time to talk to us today.

Sergii Gorbachov: Thank you. Thank you for inviting me.

Introductions [01:02]

Shane Hastie: We met because you recently gave a really interesting talk at QCon San Francisco. Do you want to just give us the high-level picture of that talk?

Sergii Gorbachov: Sounds good. The project that I presented was around code migration. It was a migration project that took about 10 months.

We moved from Enzyme to React Testing Library. Those are the libraries that help you create tests to test React.

The interesting piece there was that I combined a traditional approach using an AST (Abstract Syntax Tree) with an LLM. By combining the traditional and new approaches, I was able to save a lot of engineering hours and finish this project a lot faster than initially planned.

Shane Hastie: Real-world, hands-on implementation of the large language models in production code bases. Before we go any further, who is Sergii?

Sergii Gorbachov: Sure, yes, I can talk more about myself. My name is Sergii Gorbachov, and I am a staff engineer at Slack.

I’m part of the Developer Experience Organization, and I’m a member of the front-end test frameworks team, so I deal with anything front-end testing related.

Shane Hastie: What got you interested in using LLMs in the first place?

What Got You Into Using AI/LLMs? [02:30]

Sergii Gorbachov: Well, first of all, of course hype. You go on LinkedIn, or you go to any talks or conferences, you see that AI is very prominent. I think, everyone should probably learn about new technologies, so that was the initial push for me.

Also, before Slack, I worked at a fintech company, it was called Finn AI, where I built a testing framework for their chatbot.

I was already acquainted with some of the conversational systems that use artificial intelligence, not large language models, but regular more typical models.

Then of course, there is the reality of working as a front-end engineer or software development engineer in test: the front-end technologies we work with change quite often.

I think JavaScript is notorious for all of these changes, and the libraries change so drastically, that there are so many breaking changes that you need to, in our case, rewrite 20,000 tests.

Doing it manually would take too long. We calculated it would take about 10,000 to 15,000 engineering hours if we went with manual conversion using developer time.

At that time, Anthropic came out with one of their LLMs (large language models), and one of the use cases that people were talking about was code generation and conversion, so that’s why we decided to try it.

Of course, at that time they were not very popular. There were not too many use cases that were successful, so we had to just try it out.

To be honest, we were desperate, because it was that much work, and we had to do it and had to help ourselves.

Shane Hastie: What were the big learnings?

Key Learnings from AI Implementation [04:18]

Sergii Gorbachov: I’d say, one of the biggest learnings is that AI by itself was not a very successful tool.

I saw that it was an over-hyped technology. We used AI by itself to convert code A to code B, in our case from one framework to another in the same language, so the scope was large but not wildly large.

Even so, it was not performing well, and we had to control the flow. We had to collect all the context ourselves, and we also had to use the conventional, traditional approaches together with AI.

I think, that’s the main biggest learning, is that our traditional common things that we have been using for decades are still relevant, and AI does not displace them completely, it only complements them, it’s just another tool.

It’s useful, but it cannot completely replace what we’ve been doing before.
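
As a rough illustration of the "control the flow" idea described above, the following Python sketch runs a deterministic pass first and calls a model only for what the rules cannot handle. It is not the Slack implementation: `SIMPLE_RULES`, `NEEDS_LLM`, `convert`, and the `llm` callable are all hypothetical names, and plain string replacement stands in for a real AST codemod.

```python
from dataclasses import dataclass
from typing import Callable

# Toy deterministic rules (assumption); a real migration would apply
# transformations like these through an AST codemod, not string replacement.
SIMPLE_RULES = {"mount(": "render(", "shallow(": "render("}

# Illustrative markers for Enzyme usage that the toy rules cannot translate.
NEEDS_LLM = (".find(", ".simulate(", ".setState(")


@dataclass
class Result:
    code: str
    used_llm: bool


def apply_rules(source: str) -> str:
    """Deterministic pass: rewrite the patterns that map one-to-one."""
    for old, new in SIMPLE_RULES.items():
        source = source.replace(old, new)
    return source


def convert(source: str, context: str, llm: Callable[[str], str]) -> Result:
    """Run the rule-based pass first; call the model only for what is left."""
    partial = apply_rules(source)
    if not any(marker in partial for marker in NEEDS_LLM):
        return Result(code=partial, used_llm=False)
    # The caller collects the context (for example rendered DOM or
    # component source) and passes it in; the model never fetches it itself.
    prompt = (
        "Convert the remaining Enzyme calls to React Testing Library.\n"
        f"Context:\n{context}\n\nPartially converted test:\n{partial}"
    )
    return Result(code=llm(prompt), used_llm=True)
```

Note that the string rule would also rewrite `unmount(`, which is one reason a syntax-aware pass is the safer choice in practice. The output of either branch would still need to be validated, for instance by running the converted test.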

Shane Hastie: What does it change? Particularly in how people interact and how our roles as developers change?

How Developer Roles Are Changing [05:29]

Sergii Gorbachov: Sure. In this specific example for Enzyme to RTL conversion, I’d say, the biggest part of AI was the generation part and large language models. Those are generative models, and that’s where we saw that there were no other tools available in the industry.

Usually that role is done by actual developers, where you, for example, convert something, you use those tools manually, and then you have to write the code.

That piece is now more automated, and in our project it was done by the LLMs. The developers’ role shifted more to reviewing, fixing, and verifying, so it was more the validation part that was done by the developers.

Shane Hastie: One of the consistent things that I’m hearing in the industry, and in almost any role adopting generative AI, is that shift from author to editor.

That is a mindset shift. If you’re used to being the author, what does it take to become the editor, to be that reviewer rather than the creator?

Sergii Gorbachov: Well, that’s more of an extreme use case. Maybe it exists in real life, but I would say that realistically, you are a co-author.

Yes, you’re a co-author with the AI system, and you still have to write code, you definitely spend a lot more time reviewing the code and guiding the system, rather than writing everything yourself.

In that respect, I don’t think that I personally, in my development experience, I don’t miss that part, writing some of the scaffolding or code that is very straightforward, that I can just maybe copy and paste from other sources.

Now, I can just ask the LLM to do that easy bit for me and produce the code, and write the more complex code myself, where more thinking is necessary.

Yes, definitely the role of a developer, at least in my experience, has shifted to more of a co-author and a person who drives this process.

To a certain extent, it’s still also very empowering, because I’m in control of everything, but I’m not the person who does all the work.

Shane Hastie: We’ve been abstracting you further and further away from the underlying bare metal, so to speak. Is this just another abstraction layer, or is there something different?

AI as the Next Abstraction Layer [08:15]

Sergii Gorbachov: I think it’s definitely another abstraction layer on top of our regular work, or another layer, how we can interact with various coding languages or systems, but it could be the final one.

What is going above just natural language? Because that’s how we interact with the models. You use natural language to build tools, so the next step is just implanting it in our brains, and then controlling them with our minds, which I don’t think will happen.

The difference I think, with previous technologies that would change how developers work, is that this is the final level and we are able to use tools that we use in everyday life, like language or natural language, I guess.

This is the key: it is so easy, and I guess it may be democratizing the process of writing software, because you don’t really need to know some of those hardcore algorithms, or sometimes even learn how a specific programming language works.

Rather, you need to understand the concepts, the systems, as something more abstract, and I guess intellectually challenging.

I see how many people changed in terms of how they write code, let’s say, managers, who typically do not write code, but they possess all of this very interesting information.

For example, they know what qualities of a system are important, so they can codify that knowledge and then an AI system will just code it for them.

Of course, there are some limitations of what an AI system can do, but the key here is that it enables some people, especially those who possess some of the knowledge, but they don’t know the programming language.

Shane Hastie: Thinking of developer education, what would you say to a junior developer today? What should they learn?

Advice for Junior Developers [10:16]

Sergii Gorbachov: My background is in social sciences and humanities, so maybe this whole shift fits me very well, because I can operate at a higher level where you think about, you take a system, you break it down, and when for example, in humanities, you always look at the very non-deterministic systems, let’s say languages or things that are more related to humans, and it’s very hard to pinpoint what’s going to happen next, or how the system behaves.

I would suggest focusing more on systems analysis, or maybe taking more humanities classes that help you analyze the very complex things we deal with every day, like talking to other people or relationships, or psychology courses.

Then, that would give you the ability, or the apparatus, to handle something as non-deterministic as dealing with an LLM, and to create the right prompts that would generate good code.

Shane Hastie: If we think of the typical CI pipeline series of steps, how does that change? What do we hand over to the tools, and what do we still control?

Impact on CI/CD and Development Workflows [11:47]

Sergii Gorbachov: I think, we’re still controlling the final product, and we still I think, have the ownership of the code that we produce, regardless of what tools we use, AI or no AI tools.

We still have to be responsible. We cannot just generate a lot of code, and today generation is the easiest part. The most difficult part is that you have generated so much code that you don’t know what it does, you don’t know how to validate it.

Long-term, it could be problematic, because there is a lot of duplication, some of the abstractions are not created. Long-term, I think we should still think about code quality, and control what we generate and what code is produced.

As for the whole experience, human experience, does it change? With CI systems in particular, they have served very well for us to automate certain tasks.

Let’s say you create a PR, you write your code, and all of those tests and linters run there. One of those, let’s say, linters on steroids, could be an AI system that would just check our code.

It would probably not change too much of our work today, but it will be able to provide us extra feedback that we should act on. That’s I think, the reality right now.

Long-term, for example, another initiative that I’ve been working on, is test generation, and test generation is a part that no other tools can do.

Let’s say, a developer or you or me, create a piece of code and then you do not cover it with tests.

In that case, you can hook up an AI system that would generate the tests, suggest it for you, and maybe create a background job that you would be able to come back later, in one day or two days, and fix those tests or add them.

Especially for features that are not production facing, let’s say, a prototype. It changes how we might be doing our job, especially for testing, where it could change how we view some of the tasks if they can be outsourced to an AI system, and AI systems usually take longer than 10 to 20 minutes to produce some artifact.

Then, we would just change our way of working. Rather than doing everything right away in an iterative fashion, we would just create the bare bones and then ask all of those systems to do something for us in the cloud, in CI systems, and come back the next day or in two days and continue on them, or just validate and verify whether what has been produced is of good quality.

Shane Hastie: One of the things that we found in the engineering culture trends discussion recently, was pull requests seem to be getting bigger, more code. That’s antithetical to the advice that’s been the core for the last, well, certainly since DevOps was a thing. How is that impacting us?

Challenges with Larger Pull Requests [15:19]

Sergii Gorbachov: I guess, one of the metrics that I sometimes look at is PR throughput, and definitely, if for example, your PR has a lot more code, then it would take a longer time for other people to review it, or for a developer to add the code for that feature to work.

AI systems definitely make it easier for you to generate and create more code.

I’m not sure exactly what the future of this is, but the idea that everyone had, for example, was that AI systems would increase PR throughput, merging a lot more code.

I mean, creating more PRs is maybe not the reality, because there is just too much code, and maybe we are just using the wrong metrics here. There is a mismatch between what we had before and the speed at which AI allows us to produce that code.

There is more code, but I think that maybe the final metrics should be slightly different, rather than what has been before, or what has been popular in DevOps such as PR size or PR throughput.

Shane Hastie: Again, a fairly significant shift. What are some of the important metrics that you look at?

Important Metrics and Tool Selection [16:51]

Sergii Gorbachov: I work at a more local level, and I come from a testing background, and I deal with front-end test frameworks.

Some of the important metrics for me specifically, is how long developers wait for our tests to run, so they can go to the next step, or before they get feedback.

Things like test suite runtime, test file or test case runtime, and those are our primary metrics for developers to be happy with … Creating the PR and getting some feedback.

I think that’s the main one for us. Wouldn’t look for … There’s probably a lot more other metrics, that people at my company are looking at, but for our team, for our level, that’s the main one.

Shane Hastie: There’s an abundance of tools, how do we choose?

Sergii Gorbachov: I think that at this time, especially in my work, I see us as more of a reactive team. We sometimes cannot foresee a problem, but when we see one, we should probably use whatever tools are available to us at that time, rather than investing too much time into building our own things to predict whatever is going to come next.

For example, for my previous work, for all these tools for test generation or Enzyme to RTL conversion, of course, we looked at all of the available tools, but there was nothing, because some of those tools are not created with an AI component in mind.

I think right now, a lot of my work, or people in the DevOps role, they spend a lot more time on building the infrastructure for all of these AI tools to consume that information.

Let’s say, in end-to-end testing, which is at the highest level of the testing pyramid, you sometimes get an error or a failed test, and it’s very difficult to understand the root cause, because your product, your application, might be so distributed and there are no clear associations between some of those errors.

Let’s say, you have three services and one service fails, but in your test you only see that an element, for example, did not show up.

It’s a great use case for an AI system that can intelligently identify the root cause. Without the ability to get all of the information from all of those services while the test is running, it would be impossible.

Those systems, those pipelines, they at this point do not exist. I see that, at my company, that sometimes we’re used to more traditional, typical use of, let’s say, logs, where you have very strict permission rules, so that you cannot get access to some of the information from your development environment, where tests are running.

Those are the things that I think, it makes sense to invest and build, in order to beef up these AI systems. The more context you provide for them, the better they are.

I think therefore, some of the tools that we’ve been building, that’s how our work has changed. We are building this infrastructure and these context collection tools, rather than using AI for all of that.

Calling, I think, an AI endpoint or using an AI tool is probably the easiest part, but collecting all of that other stuff is usually 80% of the success of the project.
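
The context-collection step described above, for the three-services example, might look roughly like the Python sketch below. Everything here is an assumption made for illustration: the `LogLine` shape, the `collect_context` and `build_prompt` helpers, and the two-second padding are not taken from any real pipeline.

```python
from dataclasses import dataclass


@dataclass
class LogLine:
    service: str
    timestamp: float  # seconds since epoch
    level: str
    message: str


def collect_context(logs, test_start, test_end, padding=2.0):
    """Keep only error-level lines emitted while the test was running."""
    return [
        line for line in logs
        if line.level == "ERROR"
        and test_start - padding <= line.timestamp <= test_end + padding
    ]


def build_prompt(test_failure, context):
    """Combine the visible test failure with the collected service errors."""
    lines = "\n".join(f"[{l.service}] {l.message}" for l in context)
    return (
        f"Test failure: {test_failure}\n"
        f"Service errors during the run:\n{lines}\n"
        "What is the most likely root cause?"
    )
```

The point of the sketch is that the filtering and bundling happen in ordinary code before any model is called; the model only sees what the pipeline was able to gather.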

Shane Hastie: What other advice, what else would you tell your peers about your experience using the generative AI tools, in anger, in the real world?

Advice for Others on Using AI Code [20:58]

Sergii Gorbachov: I would say, one of the things that I saw among my peers and in other companies, or on LinkedIn, is that people are trying to build these very massive systems that can solve any problem.

I think, the best result that we got from our tools, was when the scope is so minimal, the smaller the better. Try to analyze the problem that you are trying to solve and break it down as much as you can, and then, try to solve and automate that one little piece.

In that case, it’s very easy to understand if it works or it doesn’t work. For example, you can quantify the output; in our case we usually count things like engineering hours saved. How long would it take me to do that one little thing? Let’s say we save 30 seconds, but we run our tests 30,000 times a month.

There you go, that’s some metrics that you can go and tell to your managers or present to the company, so that you can get more time and more resources to build the system, and show that there is an actual impact.
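
To make the arithmetic in that example explicit, here is a small Python sketch; the function name `monthly_hours_saved` is made up for illustration, and the inputs are the figures quoted above.

```python
def monthly_hours_saved(seconds_saved_per_run: float, runs_per_month: int) -> float:
    """Convert a per-run saving into engineering hours per month."""
    return seconds_saved_per_run * runs_per_month / 3600


# 30 seconds saved on each of 30,000 monthly runs
print(monthly_hours_saved(30, 30_000))  # 250.0
```

So a 30-second saving at that run volume works out to about 250 hours a month, which is the kind of number that can be reported upward.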

Go very, very, small, and then try to show impact. I think, those are two things. I guess, the third one is just, try it out. If it doesn’t work, then it’s all right, and if it’s a small project, then you wouldn’t waste too much time, and always there’s the learning part in it.

Shane Hastie: Sergii, lots of good advice and interesting points here. If people want to continue the conversation, where do they find you?

Sergii Gorbachov: They can find me on LinkedIn. My name is Sergii Gorbachov.

Shane Hastie: We’ll include that link in the show notes.

Sergii Gorbachov: Perfect, yes.

Shane Hastie: Thanks so much for taking the time to talk to us today.

Sergii Gorbachov: Thank you.
