Automating a simulation workflow
In many companies, the workflow around simulations is a highly manual and error-prone process. Automating a simulation workflow saves time and effort. Moreover, it makes the process far more reliable, and it opens new possibilities.
Strategic and operational simulations
Many engineering companies use simulation, for example to assess the quality of new or existing assets. Setting up and running simulations typically consists of various interdependent steps. In general, there is a data preparation phase, a simulation phase, and a reporting phase. But these phases themselves often consist of multiple steps that must be done in the right order. On top of this, there can be multiple iterations and several sub-studies in the process chain.
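The requirement that steps run in the right order can be made explicit by writing down which step depends on which. The sketch below is only an illustration: the step names are hypothetical and not taken from any actual workflow. It uses the `graphlib` module from the Python standard library (Python 3.9 or later) to derive a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical step names. Each step maps to the set of steps it depends on.
STEP_DEPENDENCIES = {
    "collect_input": set(),
    "clean_input": {"collect_input"},
    "build_model": {"clean_input"},
    "run_simulation": {"build_model"},
    "make_plots": {"run_simulation"},
    "write_report": {"run_simulation", "make_plots"},
}


def execution_order(dependencies):
    """Return the steps in an order that respects every dependency.

    TopologicalSorter raises graphlib.CycleError if the steps depend on
    each other in a circle, which would make the workflow impossible to run.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(STEP_DEPENDENCIES)
print(order)
```

Declaring the dependencies once means that nobody has to remember the order by heart, and adding a sub-study becomes a matter of adding entries to the table.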
Challenges in complex simulations
In the data preparation phase, the input data comes from many different sources and will have all sorts of issues that need to be resolved. The data formats are often only partially standardized. Even where standards apply, these standards tend to change often to accommodate operational changes. Combining data from different sources leads to many issues regarding the matching of data items. An item may have one ID in one dataset but another ID in another dataset.
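To illustrate the matching problem, here is a minimal sketch that translates source-specific IDs to one canonical ID through a cross-reference table and reports the IDs it cannot match. The source names, IDs and values are invented for the example, and a real cross-reference would normally be maintained as data rather than in the code.

```python
# Hypothetical cross-reference table: (source name, source ID) -> canonical ID.
ID_CROSS_REFERENCE = {
    ("grid_model", "SUB-0042"): "substation_42",
    ("market_data", "NL_S42"): "substation_42",
    ("grid_model", "SUB-0043"): "substation_43",
}


def to_canonical(source, records):
    """Re-key records by canonical ID and collect the IDs that have no match."""
    matched, unmatched = {}, []
    for source_id, value in records.items():
        canonical = ID_CROSS_REFERENCE.get((source, source_id))
        if canonical is None:
            unmatched.append(source_id)  # report it instead of dropping it silently
        else:
            matched[canonical] = value
    return matched, unmatched


# Invented example data from two sources that name the same item differently.
grid, grid_unmatched = to_canonical(
    "grid_model", {"SUB-0042": 380.0, "SUB-0099": 220.0}
)
market, market_unmatched = to_canonical("market_data", {"NL_S42": 125.5})

# Items that are present in both datasets after translation.
common = sorted(grid.keys() & market.keys())
print(common, grid_unmatched, market_unmatched)
```

Returning the unmatched IDs separately keeps the problem visible, so that it can be resolved or sent back to the data provider.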
The simulation itself may run into all sorts of errors that need to be detected and handled. Running trial simulations or performing dry runs are therefore common parts of the process. And afterwards, various plots and reports need to be made.
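Detecting a failed run can be automated as well. The following sketch assumes that the simulator is a command-line program that signals failure through its exit code or through recognizable words in its output. Both the error markers and the stand-in command are assumptions made for this example, not properties of any particular simulator.

```python
import subprocess
import sys


class SimulationError(RuntimeError):
    """Raised when a simulation run fails or reports a problem."""


def run_simulation(command, error_markers=("ERROR", "NaN")):
    """Run the simulator and raise SimulationError if the run looks wrong."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise SimulationError(
            f"exit code {result.returncode}: {result.stderr.strip()}"
        )
    # A zero exit code is not enough: also scan the output for known markers.
    for line in result.stdout.splitlines():
        if any(marker in line for marker in error_markers):
            raise SimulationError(f"suspicious output line: {line}")
    return result.stdout


# Stand-in for a real simulator: a tiny Python process that just prints a line.
ok_command = [sys.executable, "-c", "print('run finished')"]
output = run_simulation(ok_command)
print(output.strip())
```

A dry run then amounts to calling the same function on a small trial case before starting the full simulation.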
Typically, experts that perform these simulations develop all sorts of scripts to help them in the process. But these scripts are mostly solutions for part of the process. There is a lot of manual work between each automated part of the process.
The disadvantages of a manual process
Having all this manual work involved has some obvious disadvantages. One is that the process can only be done by people who are experienced in the task. They know the steps to take, they know the scripts and how to use them, and they know the pitfalls. Introducing a new colleague to that process is usually not easy. That poses a risk: if the experienced person is not available for whatever reason, it will be hard to do the simulation at all.
Another problem is that such a non-automated process is error-prone. In the large amount of data that goes around, it is all too easy to miss a problem. Or a step in the process may be forgotten, leading to wrong results. Typically, the experts running the simulation spend a lot of time inspecting the results and redoing steps before finally deciding that they trust the results.
And last, but certainly not least, it takes a lot of time to do the simulations with all the effort involved. This time is often expensive, as the people running the simulation are highly paid experts. It also makes it hard to run additional simulations to test extra scenarios.
Automating a simulation workflow
A challenge in automating a simulation workflow is that there is a lot of expert knowledge involved in the process that is often not explicit. In addition, the issues with the incoming data are unpredictable, as collecting the data is beyond the control of the people who run the simulation. And the process itself may change over time as the demands from the business change. Therefore, the setup must be flexible enough to accommodate the unexpected events and changes that are bound to emerge.
Still, it is entirely possible to convert the manual process into an automated toolchain without sacrificing the flexibility that needs to be there. This blog describes how we have done that for a workflow at the Transmission System Operator TenneT.
An example: automating a simulation workflow for electricity supply and demand at TenneT
Recently, we performed a workflow automation project for TenneT, the major transmission system operator (TSO) in the Netherlands. TenneT assesses its high-voltage power grid on a regular basis to determine if it will be adequate for the coming changes in the energy market. The results of such studies are also used for planning the investments that are required for meeting future demands.
Simulation of the future electricity supplies and demands and of the consequent loads on the power grid infrastructure is a major part of these adequacy studies. In turn, this simulation uses a large amount of data from various sources to describe both the network and the expected circumstances that the network will be dealing with.
TenneT was very much aware of the disadvantages of a simulation workflow with too many manual steps. Therefore, they contacted VORtech to help them automate the whole workflow.
First step: understand the current process
The first step we took was getting a good understanding of the process. This meant a lot of interaction with the experts who have been running the process. They have a lot of implicit knowledge and experience that needs to be made explicit.
This is not only about the steps that are involved in running the simulation. It also concerns the experience that the experts have with the incoming data. What are the issues that they typically encounter? How do they deal with them? How can they identify a problem? Sometimes, the experts could not easily say how they check the data: it may just be a hunch that some data looks weird.
This required us to acquire a basic understanding of the application field. Only if we could understand the purpose and the background of the application would we be able to ask the right questions to the experts.
In this exploratory phase of developing an automated toolchain, we took time to look at the various data files: the files that are received from external providers, the files that are used as input for the simulation and the files that are produced by the simulation. A good understanding of all these data formats was essential going further.
Second step: what to improve, how to get there?
Once we had a good understanding of the current process, the next step was to define what the automated process should look like. This is not necessarily the same as the current process. Experts often know about inefficiencies in the process that they never addressed because it would be too much work to change the process. Or they may know that certain steps always give problems and might be better designed.
But also, the dialogue between the automation developer and the expert may reveal improvements to the process. The automation developer, being new to the process, will be asking questions that the expert may never have considered.
In the TenneT case, one of the things that came out of this discussion was that it would be useful to implement a variety of explicit checks on the input data. This would make it possible to flag problems with data from external parties right at the beginning of the process and to ask the provider to correct the issue. Another idea was that the automated workflow should write a log file for each step, thus producing a full trace of the entire process, which is also useful for quality management.
We provided a report after the two initial steps (understanding the process and designing an improved process). With a clear idea about how we would build the automated process, it was possible to come up with an estimate of the costs and with a schedule. The schedule was crucial: the simulation process had to be run at a certain moment and the entire development had to be finalized before then.
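As an illustration of these two ideas, the sketch below combines a few explicit input checks with a small wrapper that logs the start and end of every step. It is not the implementation built for TenneT: the field names (`id`, `voltage_kv`), the rules and the sample records are assumptions chosen for the example.

```python
import logging

logger = logging.getLogger("workflow")


def check_records(records, required_fields=("id", "voltage_kv")):
    """Return a list of human-readable problems found in the input records."""
    problems = []
    seen_ids = set()
    for index, record in enumerate(records):
        for field in required_fields:
            if record.get(field) is None:
                problems.append(f"record {index}: missing '{field}'")
        record_id = record.get("id")
        if record_id is not None:
            if record_id in seen_ids:
                problems.append(f"record {index}: duplicate id '{record_id}'")
            seen_ids.add(record_id)
        voltage = record.get("voltage_kv")
        if voltage is not None and voltage <= 0:
            problems.append(f"record {index}: non-positive voltage {voltage}")
    return problems


def run_step(name, function, *args):
    """Run one workflow step and log its start, end and any failure."""
    logger.info("step %s started", name)
    try:
        result = function(*args)
    except Exception:
        logger.exception("step %s failed", name)
        raise
    logger.info("step %s finished", name)
    return result


# Log to the console here; passing filename="..." to basicConfig writes a file.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

# Invented sample data with a duplicate ID and a missing value.
records = [
    {"id": "A1", "voltage_kv": 380.0},
    {"id": "A1", "voltage_kv": 220.0},
    {"id": "B7", "voltage_kv": None},
]
problems = run_step("check_input", check_records, records)
for problem in problems:
    logger.warning(problem)
```

Collecting all problems in one list, rather than stopping at the first one, makes it possible to send the data provider a complete overview in one go.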
Third step: build a prototype
Automating a simulation workflow for people that have been running the workflow manually for a long time is never right the first time. Even though a good understanding of the process was gained in the first step, all sorts of implicit knowledge will pop up once the first prototype of the system is there.
In the case of TenneT, the experts realized, for example, that they had forgotten to mention one particular check that they always perform. Note that forgetting something like this is not their fault: it is a typical feature of making knowledge explicit and exactly the reason why you would build a prototype.
Also, when the experts started using the first prototype, they came up with useful ideas about how to improve the workflow further. In short, having a prototype helped tremendously in the dialogue between users and developers. It gives both parties a shared, concrete thing to refer to.
Such a prototype does not need to be functionally complete, and it does not need to be fully robust. For example, it is unnecessary to build an entire test suite for this prototype, as things may still change significantly. Also, it is not important that the user interface is entirely robust against errors in the user’s input. The prototype is the first usable deliverable, meant to get something into the experts’ hands as fast as possible. But no more than that.
Still, we did not make many compromises on the software quality. The parts of the system that will live on in the final product should have the right quality and it is all too easy to forget to beautify a piece of code later in the process.
Also, in this stage, we set up a proper software management system in GitLab, including the issue tracker and test environment. This ensured that whatever we made was properly managed and formed a good basis for further development.
Fourth step: build the complete workflow system
After several sessions with the prototype, both the experts and we had sufficient confidence in the final design to implement the full functionality.
Now, software quality became a high priority for us. Quality not only means a clear programming style, conforming to the usual programming guidelines for the selected programming language. It also means that all conceivable problems and errors are handled appropriately. And of course, all relevant tests need to be implemented from the basic unit tests all the way up to the integration tests.
In this phase, we also discussed the software management in the future. We would be available for support for some time, but we prefer someone at the client to be capable of providing the support. In the case of TenneT, the decision on this was deferred to a later stage as they preferred to get support from us for some time to come.
Fifth step: support
While the workflow automation tool was used operationally, we provided support for addressing problems arising from bugs as well as wishes coming from new insights. For this project, the support phase lasted four months. In other projects it can last much longer, sometimes even decades. In this phase the investments in software quality start to pay off: the better the quality, the less time the changes during the support phase will take.
Still, it is important to have developers on board who understand the system and the details of the workflow that it implements. No degree of quality can compensate for the inefficiencies that come from a developer who is new to the system and has not had a good introduction. That is why we make sure that we have someone with an understanding of the system for the entire period that we provide support.
Evaluation
The people at TenneT were enthusiastic about the new system. They admitted that it had been a tough process but that every step we took had been necessary and useful. Obviously, we enjoyed the positive feedback, but the fact that the entire development was done within budget was certainly something that we were proud of as well.
Is automating a simulation workflow the solution for you?
In this blog we have described the process for automating a simulation workflow. We have gone through this process many times for different customers in various sectors. If you are struggling with a (largely) manual workflow and see the benefits of automation, feel free to contact us to explore how we can assist you.