Introduction
When an application handles a large number of API calls and runs multiple services in parallel or in sequence, it needs tests that guard against crashes and other failures. In a producer-consumer model, or when services exchange asynchronous API requests, tests are what confirm that every hand-off behaves as intended. The need for tests grows with the complexity of the system.
Togai is an API-first metering and billing service built on a microservice architecture. Because this architecture is complex, both local integration tests and unit tests are needed to verify that the system works as expected. These tests catch potential issues early and give developers the confidence to change the system without worrying about introducing new problems.
Complexity Involved
When multiple processes cooperate on a task, many moving parts must be coordinated for it to run smoothly. To make these steps fast, we need to consider several factors: communication between services must be seamless, data must be passed correctly, the tests must cover all the possible scenarios, and resources must be used efficiently.
For example, when a service needs to retrieve information from another service, it communicates through an API call or through a message-passing strategy such as producer and consumer queues. Tests can be written to verify this flow. However, the performance and reliability of these tests depend on whether we make an actual API call or mock the entities involved so that they return static stub information. If we make an actual API call, the test can fail during server maintenance or because a particular API has a bug, which is why the APIs themselves also need unit tests.
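The mocking option can be sketched in a few lines. The following is a minimal illustration in plain Java, not Togai's code: `UsageClient`, `StubUsageClient`, and `InvoiceService` are hypothetical names, and the pricing logic is invented purely so that there is something to assert on.

```java
import java.util.Map;

class StubExample {
    // Hypothetical port for a remote service; in production it would wrap an API call.
    interface UsageClient {
        long unitsFor(String customerId);
    }

    // Stub that returns static data, so the test never touches the network.
    static final class StubUsageClient implements UsageClient {
        private final Map<String, Long> canned;

        StubUsageClient(Map<String, Long> canned) {
            this.canned = canned;
        }

        @Override
        public long unitsFor(String customerId) {
            return canned.getOrDefault(customerId, 0L);
        }
    }

    // Code under test: it depends only on the interface, not on how data is fetched.
    static final class InvoiceService {
        private final UsageClient usage;
        private final long pricePerUnitCents;

        InvoiceService(UsageClient usage, long pricePerUnitCents) {
            this.usage = usage;
            this.pricePerUnitCents = pricePerUnitCents;
        }

        long amountDueCents(String customerId) {
            return usage.unitsFor(customerId) * pricePerUnitCents;
        }
    }

    public static void main(String[] args) {
        UsageClient stub = new StubUsageClient(Map.of("cust-1", 120L));
        InvoiceService service = new InvoiceService(stub, 5);
        System.out.println(service.amountDueCents("cust-1")); // 120 units * 5 cents = 600
    }
}
```

Because the stub is static, the result does not depend on server maintenance or on bugs in the remote API. The trade-off is that the real API is not exercised at all by this test.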
Furthermore, when we consider the parallel processing of multiple instances of a consumer, there are issues of shared resources that need to be handled accordingly. This can involve developing strategies for resource allocation and management to ensure smooth operation.
In summary, optimizing the steps involved in a task with multiple processes requires careful consideration of various factors such as communication methods, testing strategies, and resource management. By taking these factors into account, we can ensure smooth and efficient operation.
Frameworks and Strategies
Togai uses JUnit5 to test its Kotlin services. To ensure that all integral parts of the system function well, Postgres, Timescale, and Redis run inside test containers, and the tests run against these containers. We have both local integration tests, comprising consumer and producer tests and mocked APIs, and unit tests for the actual APIs. This testing strategy allows us to monitor a particular instance of an object and provide stubs for hand-picked functions, which enhances the accuracy of the tests.
Unit tests are simple API calls followed by response verification. A consumer flow test is more complicated than it sounds. Togai has multiple consumers on the NATS queue. These consumers process the incoming data to generate information that can be stored. One might question the need for message passing when an API call could be made instead. However, message queues undeniably reduce service load, because they avoid the spikes that direct API calls to multiple instances would cause. Message queues also keep each service independent, and they reduce the probability of the system crashing, since the load is distributed among multiple instances.
To speed up consumption from a queue, multiple consumer instances run on parallel coroutines, each processing a different set of data. This requires the management of shared resources. For example, a flag that signals when to stop a test must be modified by only one consumer instance at any time. This approach significantly reduces the time taken to consume data from the queue while maintaining the accuracy of the tests, and it makes better use of resources, since multiple instances run in parallel.
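One way to make sure that only a single consumer instance changes such a flag is an atomic compare-and-set. The sketch below is an assumption-laden illustration rather than Togai's implementation: it uses Java threads from the standard library in place of Kotlin coroutines, an in-memory queue in place of NATS, and the hypothetical helper `runWorkers`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class SharedFlagExample {
    // Drains `messages` items with `workers` parallel consumers and returns how
    // many consumers managed to flip the shared "done" flag.
    static int runWorkers(int workers, int messages) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add(i);
        }

        AtomicInteger processed = new AtomicInteger();
        AtomicBoolean done = new AtomicBoolean(false);
        AtomicInteger flips = new AtomicInteger();

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                // poll() returns null once the queue is empty, which ends this worker.
                while (queue.poll() != null) {
                    processed.incrementAndGet();
                }
                // Several workers may get here; compareAndSet lets exactly one win.
                if (processed.get() == messages && done.compareAndSet(false, true)) {
                    flips.incrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return flips.get();
    }

    public static void main(String[] args) {
        System.out.println("flag flipped by " + runWorkers(4, 1000) + " worker(s)");
    }
}
```

The worker that performs the final increment always observes the full count afterwards, so the flag is flipped at least once, and `compareAndSet` guarantees it is flipped at most once.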
How does Togai do deterministic testing?
A consumer in an application is a never-ending job. It constantly scans the queue for messages. If messages are present, it dequeues them. If not, it sleeps until new messages arrive. To verify that the system functions correctly, we need to make sure that the consumer does not get stuck in a loop and that it processes all the messages correctly.
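The loop described above can be sketched as follows. This is a simplified illustration using only the Java standard library, with an in-memory `BlockingQueue` standing in for the real queue. The `Consumer` class, the `stop()` method, and the `runOnce` helper are hypothetical names introduced for the example.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ConsumerLoopExample {
    static final class Consumer implements Runnable {
        private final BlockingQueue<String> queue;
        private final List<String> handled = new CopyOnWriteArrayList<>();
        private volatile boolean running = true;

        Consumer(BlockingQueue<String> queue) {
            this.queue = queue;
        }

        @Override
        public void run() {
            // Never-ending in production; the flag exists so a test can end the loop.
            while (running) {
                try {
                    // Waits up to 20 ms for a message instead of spinning on an empty queue.
                    String message = queue.poll(20, TimeUnit.MILLISECONDS);
                    if (message != null) {
                        handled.add(message);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        void stop() {
            running = false;
        }

        List<String> handled() {
            return handled;
        }
    }

    // Starts a consumer, waits (bounded) until it has handled everything, then stops it.
    static List<String> runOnce(List<String> messages) {
        Consumer consumer = new Consumer(new LinkedBlockingQueue<>(messages));
        Thread thread = new Thread(consumer);
        thread.start();
        try {
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
            while (consumer.handled().size() < messages.size() && System.nanoTime() < deadline) {
                Thread.sleep(5);
            }
            consumer.stop();
            thread.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.copyOf(consumer.handled());
    }

    public static void main(String[] args) {
        System.out.println(runOnce(List.of("a", "b", "c")));
    }
}
```

The stop flag and the bounded wait are what keep a test around such a loop from hanging if the consumer never finishes.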
In contrast, in an API test we can easily verify the result by asserting on the response of the API call. Consumer testing is not always so straightforward. If there is an error in the execution of the flow, there is no explicit status update. Instead, the message is retried a fixed number of times and, if it remains unprocessable, is sent to another queue.
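The retry behaviour can be illustrated with a short sketch. It is written in plain Java under stated assumptions: `Handler`, `processWithRetry`, and the in-memory dead-letter queue are hypothetical stand-ins, and the real broker's redelivery mechanics are not modelled.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class RetryExample {
    // Hypothetical message handler; any exception counts as a failed attempt.
    interface Handler {
        void handle(String message) throws Exception;
    }

    // Tries the handler up to maxAttempts times. Returns true on success; otherwise
    // moves the message to the dead-letter queue and returns false.
    static boolean processWithRetry(String message, Handler handler,
                                    int maxAttempts, Queue<String> deadLetters) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.handle(message);
                return true;
            } catch (Exception e) {
                // No status is reported to the caller here; a real consumer
                // would typically log the error and back off before retrying.
            }
        }
        deadLetters.add(message);
        return false;
    }

    public static void main(String[] args) {
        Queue<String> deadLetters = new ArrayDeque<>();
        processWithRetry("good", m -> { }, 3, deadLetters);
        processWithRetry("bad", m -> { throw new IllegalStateException("boom"); }, 3, deadLetters);
        System.out.println(deadLetters); // [bad]
    }
}
```

Since the failure itself produces no response, a test in this style asserts on side effects instead, such as the contents of the dead-letter queue or the number of attempts made.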
This raises several important questions, such as "How do we assert a failure that occurs in between processing messages?" and "How do we know when the consumer inside the test has completed its job and it is time to check for updates in the database?" These are only a subset of the questions that arise in consumer testing. For the tests to be deterministic, they must cover all the possible scenarios, and the system must be able to handle any errors that occur.
For example, an I/O process inside the consumer flow may take longer than expected. In that case, we have to decide how long to wait before asserting that the process has failed. In general, setting a timeout limit addresses this: if a test exceeds the timeout limit, it is considered to have failed. Asserting the cause of the failure, however, is a separate issue that must also be addressed. Additionally, stress testing the channels for performance benchmarking can help identify issues that might arise in a real-world scenario.
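A bounded wait of this kind can be sketched with a `CountDownLatch` from the Java standard library. The example is illustrative only: `finishedWithin` is a hypothetical helper, `Thread.sleep` stands in for slow I/O, and the timeout values are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class TimeoutExample {
    // Returns true if a simulated consumer handles all messages before the
    // timeout expires, and false otherwise.
    static boolean finishedWithin(int messages, long workMillis, long timeoutMillis) {
        CountDownLatch remaining = new CountDownLatch(messages);

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) {
                    Thread.sleep(workMillis); // stands in for slow I/O per message
                    remaining.countDown();    // signal that one message is done
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        try {
            // Blocks until the count reaches zero or the timeout elapses.
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            consumer.interrupt(); // stop the simulated consumer either way
        }
    }

    public static void main(String[] args) {
        System.out.println("fast consumer finished: " + finishedWithin(3, 1, 2000));
        System.out.println("slow consumer finished: " + finishedWithin(3, 500, 50));
    }
}
```

The latch answers the question of when the consumer is done without an arbitrary sleep. A `false` result only says that the deadline passed, not why, so the cause of the failure still has to be asserted separately.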
In a consumer test, the messages and the connections are mocked inside the test container, so only the consumer flow code is tested. In a real scenario, a consumer might take longer to establish the connection or to dequeue the messages from the queue. These anomalies need to be taken into account and simulated so that the tests cover all the possible scenarios.

