eBay Using Fault Injection at the Application Level With Code Instrumentation

Sergio De Simone

Article originally posted on InfoQ.

eBay engineers have been using fault injection techniques to improve the reliability of their notification platform and explore its weaknesses. While fault injection is a common industry practice, eBay attempted a novel approach that leverages code instrumentation to bring fault injection to the application level.

The notification platform is responsible for pushing notifications to third-party applications to provide the latest changes in item price, item stock status, payment status, and more. It is a highly distributed, large-scale system relying on many external dependencies, including a distributed store, a message queue, push notification endpoints, and others.

Usually, says eBay engineer Wei Chen, fault injection is carried out at the infrastructure level, for example by causing a network failure to introduce an HTTP error such as a server disconnect or timeout, or by making a given resource temporarily unavailable. This approach is expensive and has a number of implications for the rest of the system, making it hard to explore the effect of faults in isolation.

But this is not the only possible approach, says Chen. Instead, faults can be created at the application level, for example by adding a specific latency within the HTTP client library to simulate a timeout.
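To make the idea of an application-level timeout concrete, here is a minimal sketch in Java. It is not eBay's code: the class `LatencyFaultClient`, its fields, and the use of an `IntSupplier` as a stand-in for the real HTTP request are all assumptions made for illustration. The wrapper adds a configurable delay before the call and, when that delay reaches the client's timeout budget, fails the way a timed-out client would, without touching the network.

```java
import java.net.SocketTimeoutException;
import java.util.function.IntSupplier;

// Hypothetical wrapper around an HTTP call; not eBay's actual client code.
class LatencyFaultClient {
    private final IntSupplier realCall;   // stands in for the real request, returns a status code
    private final long timeoutMillis;     // the client's own timeout budget
    private volatile long injectedLatencyMillis = 0; // 0 disables the fault

    LatencyFaultClient(IntSupplier realCall, long timeoutMillis) {
        this.realCall = realCall;
        this.timeoutMillis = timeoutMillis;
    }

    void setInjectedLatencyMillis(long millis) {
        this.injectedLatencyMillis = millis;
    }

    int call() throws SocketTimeoutException {
        long latency = injectedLatencyMillis;
        if (latency >= timeoutMillis) {
            // Wait out the timeout budget, then fail as a timed-out client would.
            pause(timeoutMillis);
            throw new SocketTimeoutException("injected timeout after " + timeoutMillis + " ms");
        }
        // Below the budget the call is merely slowed down, then proceeds normally.
        pause(latency);
        return realCall.getAsInt();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }
}
```

Because the delay lives inside the client wrapper, only the calls that go through it are affected, which is what makes the fault easy to study in isolation.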

We instrumented the class files of the client libraries for the dependent services to introduce different kinds of faults we defined. The introduced faults are raised when our service communicates with the underlying resource through the instrumented API. The faults do not really happen in our dependent services, owing to the changed codes, but the effect is simulated, enabling us to experiment without risk.

eBay has implemented three basic kinds of instrumentation to force invoked methods to show faulty behavior: blocking or interrupting the method logic, for example by throwing an exception; changing the state of a method, for example by altering the value returned by response.getStatusCode(); and replacing the value of method parameters, that is, modifying the value of an argument passed to a method.
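The three kinds of fault can be illustrated with a short Java sketch. Note that it swaps in a different mechanism: it uses a JDK dynamic proxy (`java.lang.reflect.Proxy`) rather than the class-file instrumentation eBay describes, simply because a proxy fits in a self-contained example. The `NotificationClient` interface, the `Fault` enum, and the endpoint strings are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class FaultKindsDemo {
    // Hypothetical fault kinds mirroring the three categories in the text.
    enum Fault { NONE, INTERRUPT, CHANGE_STATE, REPLACE_ARGUMENT }

    // Hypothetical client API: returns an HTTP-like status code.
    interface NotificationClient {
        int send(String endpoint);
    }

    static NotificationClient withFault(NotificationClient real, Fault fault) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Leave toString/hashCode/equals alone.
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(real, args);
            }
            switch (fault) {
                case INTERRUPT:
                    // 1. Interrupt the method logic by throwing an exception.
                    throw new IllegalStateException("injected fault: connection dropped");
                case CHANGE_STATE:
                    // 2. Run the real call but alter what the caller observes.
                    method.invoke(real, args);
                    return 500;
                case REPLACE_ARGUMENT:
                    // 3. Replace the argument before it reaches the real method.
                    return method.invoke(real, "http://invalid.example");
                default:
                    return method.invoke(real, args);
            }
        };
        return (NotificationClient) Proxy.newProxyInstance(
                NotificationClient.class.getClassLoader(),
                new Class<?>[] { NotificationClient.class },
                handler);
    }
}
```

In each case the real dependency is untouched; only the behavior seen by the calling code changes.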

To implement the above three types of instrumentation, we have created a Java agent. In the agent, we have implemented a classloader which will instrument the code of the methods leveraged in the application code. We also created an annotation to indicate which method will be instrumented and put the instrumentation logic in the methods annotated.
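As a rough idea of what the skeleton of such an agent could look like, the sketch below pairs a marker annotation with a `premain` entry point. Everything named here (`InjectFault`, `FaultRules`, `FaultAgent`, and the target `com/example/HttpClient`) is hypothetical, and the sketch registers a standard `java.lang.instrument.ClassFileTransformer` instead of the custom classloader the quote mentions. It deliberately stops short of rewriting bytecode, which in practice would require a bytecode manipulation library; it only shows how annotated methods could be mapped to the classes they target.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.reflect.Method;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical annotation: marks a method that holds instrumentation logic
// and names the class and method it is meant to be applied to.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface InjectFault {
    String targetClass();   // JVM internal name, e.g. "com/example/HttpClient"
    String targetMethod();
}

// Hypothetical holder of fault logic.
class FaultRules {
    @InjectFault(targetClass = "com/example/HttpClient", targetMethod = "execute")
    static void failExecute() {
        throw new IllegalStateException("injected fault");
    }
}

class FaultAgent {
    // Index annotated rule methods by the class they target.
    static Map<String, List<Method>> collectRules(Class<?> holder) {
        Map<String, List<Method>> rules = new HashMap<>();
        for (Method m : holder.getDeclaredMethods()) {
            InjectFault rule = m.getAnnotation(InjectFault.class);
            if (rule != null) {
                rules.computeIfAbsent(rule.targetClass(), k -> new ArrayList<>()).add(m);
            }
        }
        return rules;
    }

    // Called by the JVM before main() when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        final Map<String, List<Method>> rules = collectRules(FaultRules.class);
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                if (className == null || !rules.containsKey(className)) {
                    return null; // null tells the JVM to keep the class unchanged
                }
                // A real agent would rewrite classfileBuffer here; this sketch only reports the match.
                System.err.println("would instrument " + className);
                return null;
            }
        });
    }
}
```

To load an agent like this, the JAR manifest needs a `Premain-Class` entry naming the agent class, and the JVM is started with `-javaagent:` followed by the path to that JAR.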

eBay engineers also implemented a configuration management system to dynamically change how fault injection behaves at runtime. In particular, for each endpoint supported by the eBay app, engineers can alter a number of parameters to test specific behaviors.
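The article does not say which parameters are configurable or how the configuration is stored, so the following is only a sketch of one way per-endpoint settings might be changed at runtime. The class `FaultConfigStore` and the fields `enabled`, `latencyMillis`, and `forcedStatusCode` are assumptions, not eBay's actual parameters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store of fault settings, keyed by endpoint.
class FaultConfigStore {
    // Immutable settings for one endpoint; the fields are illustrative only.
    static final class FaultConfig {
        final boolean enabled;
        final long latencyMillis;
        final int forcedStatusCode;

        FaultConfig(boolean enabled, long latencyMillis, int forcedStatusCode) {
            this.enabled = enabled;
            this.latencyMillis = latencyMillis;
            this.forcedStatusCode = forcedStatusCode;
        }
    }

    private static final FaultConfig DISABLED = new FaultConfig(false, 0, 0);

    // ConcurrentHashMap lets request threads read while an operator updates settings.
    private final Map<String, FaultConfig> byEndpoint = new ConcurrentHashMap<>();

    void update(String endpoint, FaultConfig config) {
        byEndpoint.put(endpoint, config);
    }

    void clear(String endpoint) {
        byEndpoint.remove(endpoint);
    }

    // Endpoints without an entry get no fault injected.
    FaultConfig lookup(String endpoint) {
        return byEndpoint.getOrDefault(endpoint, DISABLED);
    }
}
```

Instrumented code would call `lookup` on each request, so an update takes effect on the next call without a restart.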

According to Chen, eBay is the first organization in the industry to practice fault injection at the application level using code instrumentation. If you are interested in this approach, do not miss the full explanation provided in the original article.
