Many of our customers are large enterprises with critical, highly available and secure infrastructures. This means that they spend (as do we) a lot of time proactively investigating and stress-testing systems. Indeed, we and many other vendors also provide tools within our products to assist in “kicking the tyres”. However small or large your enterprise is, though, this is a methodology and mindset that you can embrace, with plenty of free and open-source tools out there to assist you. My colleague Abhilash recently wrote a blog, “What is Chaos Engineering & Testing”, about the business benefits and paradigms many of our customers are adopting; it gives more context on why you may want to evaluate some of these tips and tricks. In this blog, I have listed ten tools and related techniques you can use to test the resilience of your IT infrastructure.
1. IOMeter
IOMeter is an open-source I/O subsystem measurement and characterization tool for single and clustered systems. It is an easy way to generate stress on the I/O system and so can be very useful for teams developing and testing products that interact with or generate load on the OVS. There are many ways to use IOMeter. If you are looking to upgrade your Citrix or VMware servers, run IOMeter before and after the upgrade and compare the results to make sure that I/O performance remains acceptable. Of course, if you have a monitoring tool in place when you run your stress test, you can also observe I/O latencies, queueing and other related parameters.
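The before-and-after comparison can be automated once you have exported the figures you care about. The sketch below is illustrative only: the metric names and the 10% tolerance are assumptions, not IOMeter output fields, and it treats every metric as "lower is better" (latency, for example).

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from the baseline to the new value, in percent."""
    if before == 0:
        raise ValueError("baseline value must be non-zero")
    return (after - before) / before * 100.0


def regressions(before: dict, after: dict, tolerance_pct: float = 10.0) -> dict:
    """Return the metrics that worsened by more than tolerance_pct.

    Assumes each metric is 'lower is better' (e.g. latency in ms).
    Metric names are hypothetical, chosen for illustration.
    """
    flagged = {}
    for name, baseline in before.items():
        if name not in after:
            continue  # metric missing from the second run; nothing to compare
        change = percent_change(baseline, after[name])
        if change > tolerance_pct:
            flagged[name] = round(change, 1)
    return flagged


if __name__ == "__main__":
    pre_upgrade = {"avg_read_latency_ms": 4.0, "avg_write_latency_ms": 6.0}
    post_upgrade = {"avg_read_latency_ms": 4.2, "avg_write_latency_ms": 7.5}
    print(regressions(pre_upgrade, post_upgrade))
```

For throughput-style metrics, where higher is better, the comparison would need to be inverted.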
2. iperf / iperf3
iperf is an open-source utility that can be useful for diagnosing network issues. It is a cross-platform tool for network performance measurement and tuning that can produce standardized performance measurements for any network. iperf has client and server functionality and can create data streams to measure the throughput between the two ends in one or both directions. Use iperf to stress your network interface cards, or to simulate bandwidth-hungry users.
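If you run a server with `iperf3 -s` and a client with `iperf3 -c <server> -J`, the client prints its report as JSON, which makes results easy to track over time. The sketch below summarizes such a report. It assumes a TCP test and the `end.sum_sent` / `end.sum_received` fields as I understand iperf3's JSON output; check the field names against the version you run. The sample report is hand-written for illustration, not real measurement data.

```python
import json


def summarize_iperf3(report_json: str) -> dict:
    """Extract sender/receiver throughput in Mbit/s from an iperf3 JSON report.

    Assumes a TCP test; UDP reports are laid out differently.
    """
    report = json.loads(report_json)
    end = report["end"]
    return {
        "sent_mbps": end["sum_sent"]["bits_per_second"] / 1e6,
        "received_mbps": end["sum_received"]["bits_per_second"] / 1e6,
    }


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, for illustration only.
    sample = json.dumps({
        "end": {
            "sum_sent": {"bits_per_second": 941500000},
            "sum_received": {"bits_per_second": 940000000},
        }
    })
    print(summarize_iperf3(sample))
```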
3. WAN Emulation
Network emulation allows you to test the impact of poor or limited connectivity, bandwidth constraints, and factors that affect QoS (Quality of Service) and packet resends, such as latency and jitter. Most would agree that a hardware device or appliance independent of the computer / hypervisor / virtual machine being tested is the ideal scenario. However, where network access is limited or cost is a constraint, software emulation may be an alternative, although it can be difficult to do well and it is easy to introduce artificial conditions.
The Linux Foundation’s netem is a widely used option. WANem is another popular choice, with a VMware appliance available.
Apposite is a popular vendor in the network emulation space; its product line includes the highly regarded Linktropy hardware emulator range. Whilst not the cheapest option, most would concur that hardware emulation is the gold standard and the least problematic enterprise-grade option.
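For the software route, netem is driven through the Linux `tc` command. The sketch below only builds the command lines (it does not run them, since applying them needs root and will degrade real traffic on the interface). The interface name `eth0` and the delay, jitter and loss values are assumptions for illustration.

```python
import shlex


def netem_command(interface: str, delay_ms: int,
                  jitter_ms: int = 0, loss_pct: float = 0.0) -> list:
    """Build a `tc` command that adds latency, jitter and loss via netem."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem",
           "delay", f"{delay_ms}ms"]
    if jitter_ms:
        cmd.append(f"{jitter_ms}ms")  # variation around the base delay
    if loss_pct:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


def netem_clear_command(interface: str) -> list:
    """Build the `tc` command that removes the root qdisc again."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


if __name__ == "__main__":
    # Print rather than execute: review before running with root privileges.
    print(shlex.join(netem_command("eth0", delay_ms=100, jitter_ms=20, loss_pct=1.0)))
    print(shlex.join(netem_clear_command("eth0")))
```

Always pair the "add" with the matching "del" in your test scripts so a failed run does not leave the impairment in place.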
4. Bonnie++
Bonnie++ is an open-source benchmark that helps measure and investigate hard drive and file system performance, and includes tests for large-file I/O and for the creation/deletion of small files. There are lots of user guides and blogs on this tool.
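To make the "creation/deletion of small files" idea concrete, here is a minimal Python sketch of that style of test. It is not Bonnie++ and is no substitute for it: it does not flush to disk, so OS caching will flatter the numbers, and the file count and size are arbitrary assumptions.

```python
import os
import tempfile
import time


def small_file_churn(count: int = 200, size_bytes: int = 1024) -> dict:
    """Time the creation and deletion of many small files in a temp directory."""
    payload = b"x" * size_bytes
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"f{i:05d}.dat") for i in range(count)]

        start = time.perf_counter()
        for path in paths:
            with open(path, "wb") as handle:
                handle.write(payload)
        create_seconds = time.perf_counter() - start

        start = time.perf_counter()
        for path in paths:
            os.remove(path)
        delete_seconds = time.perf_counter() - start

    return {
        "files": count,
        "create_seconds": create_seconds,
        "delete_seconds": delete_seconds,
    }


if __name__ == "__main__":
    print(small_file_churn())
```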
5. Synthetic Monitoring – Proactive Availability Monitoring
We, at eG Innovations, have a full portfolio of synthetic monitoring simulators that go beyond simple logon simulations and include:
- Protocol Simulators
- Logon Simulators
- Web Client Simulation
- Full Session Simulation (including thick/thin clients)
Most monitoring vendors will include at least a basic Logon Simulator to test whether users can access platforms such as Citrix, VMware, Azure or AWS.
Our Web Client Simulator is particularly useful for testing the availability and performance of SaaS and cloud-hosted applications where you may have little insight into or control over the underlying infrastructure. A robot user accesses web applications and performs real use-case steps, e.g. adding a new customer and updating records in Salesforce, or working within Microsoft Excel spreadsheets. Each step is measured and failures are screenshotted. This tests the availability and performance of the application, and also exercises the underlying services and infrastructure such as storage, databases and networking, going beyond logon and app availability.
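The general pattern behind a scripted synthetic transaction (run named steps in order, time each one, stop and record the error at the first failure) can be sketched in a few lines. This is a generic illustration, not how eG Enterprise implements its simulators; the step names and stub actions are hypothetical, and a real check would drive a browser or API client inside each step.

```python
import time
from typing import Callable, List, Tuple


def run_synthetic_transaction(steps: List[Tuple[str, Callable[[], None]]]) -> list:
    """Run steps in order, timing each; stop at the first step that raises."""
    results = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            ok, error = True, ""
        except Exception as exc:  # a failed step marks the transaction as failed
            ok, error = False, str(exc)
        elapsed = time.perf_counter() - start
        results.append({"step": name, "ok": ok, "seconds": elapsed, "error": error})
        if not ok:
            break  # later steps depend on earlier ones, so do not continue
    return results


if __name__ == "__main__":
    def failing_step() -> None:
        raise TimeoutError("record save timed out")

    # Stub actions stand in for real browser or API interactions.
    demo_steps = [
        ("open login page", lambda: time.sleep(0.01)),
        ("add customer record", failing_step),
        ("log out", lambda: None),
    ]
    for row in run_synthetic_transaction(demo_steps):
        print(row)
```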
6. Fault Injection Tools
AWS Fault Injection Simulator, released in March 2021, is a fully managed service for running fault injection experiments on AWS that makes it easier to improve an application’s performance, observability, and resiliency. Fault injection experiments are used in chaos engineering, which is the practice of stressing an application in testing or production environments by creating disruptive events, such as a sudden increase in CPU or memory consumption, observing how the system responds, and implementing improvements. Beyond the managed service, open-source tools have also been released by AWS Engineering. The availability of such tools and services is becoming a factor when choosing between different clouds or virtualization environments.
Netflix open-sourced its Chaos Monkey projects and is probably one of the best-known adopters and leaders of chaos engineering methodologies. There’s plenty of further information on their blog.
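At the smallest scale, the inject-observe-improve loop can be tried inside a single process. The sketch below wraps a function so that it randomly fails or slows down, then checks whether a simple retry policy copes. It is a toy illustration of the idea, not the API of AWS Fault Injection Simulator or Chaos Monkey; all names in it are hypothetical.

```python
import random
import time


class InjectedFault(RuntimeError):
    """Raised by the wrapper to simulate a dependency failure."""


def with_faults(func, failure_rate: float, max_delay_s: float = 0.0, rng=None):
    """Wrap func so calls fail with probability failure_rate and may be delayed."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if max_delay_s > 0:
            time.sleep(rng.uniform(0, max_delay_s))  # injected latency
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)

    return wrapper


def call_with_retry(func, attempts: int = 3):
    """The behaviour under test: retry on failure, give up after `attempts`."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as exc:
            last_error = exc
    raise last_error


if __name__ == "__main__":
    flaky_lookup = with_faults(lambda: "ok", failure_rate=0.3,
                               rng=random.Random(42))
    outcomes = []
    for _ in range(20):
        try:
            outcomes.append(call_with_retry(flaky_lookup))
        except InjectedFault:
            outcomes.append("failed")
    print(outcomes.count("ok"), "succeeded,", outcomes.count("failed"), "failed")
```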
7. Graphical Benchmarks
When investigating the performance of intensive graphical workloads and the technologies associated with them, such as remoting protocols (Citrix HDX/EDT/ICA, RDP, Teradici PCoIP or CAS, VMware Blast Extreme) or GPU/CPU needs, there is a wealth of benchmarks available. A great list is available here:
8. Capacity Benchmarking / Load Testing
Whilst Logon Simulators are useful for testing the availability of apps, virtualized desktops or services, tools like those from Login VSI are frequently used to understand the capacity of infrastructure to deal with a large volume of users causing a boot storm, and the degradation of systems as they approach capacity (at what point do you intervene?). These workloads are frequently set up to simulate some degree of randomness in user behavior, e.g. accessing different mailboxes, webpages or a selection of, say, 3 videos from 5. For test lab purposes they can be made more uniform, which helps when assessing the impact of changes to configuration or, say, protocol code.
Other capacity benchmarking and load testing tools include:
- k6 is an open-source and cloud tool, popular with developers, QA engineers, and DevOps teams, who use it for testing the performance of APIs, microservices, and websites
- Micro Focus LoadRunner
- Neotys NeoLoad (SAP, Citrix, Cloud, DevOps and others)
- Other options ranging across free, open-source software to premium priced include: Apache JMeter, Gatling, Rational Performance Tester, WebLOAD, Locust, LoadNinja, SmartMeter.io, LoadView, WAPT
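The core idea these tools share, stepping up the number of concurrent users and watching how response times degrade, can be sketched with the Python standard library alone. This is a toy illustration rather than a replacement for any of the tools above; the user counts and the stub action are assumptions, and a real test would call your application in place of the stub.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed_call(action) -> float:
    """Run action once and return how long it took, in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def ramp_load(action, user_counts, calls_per_user: int = 5) -> list:
    """For each concurrency level, fire calls in parallel and record latency."""
    report = []
    for users in user_counts:
        total_calls = users * calls_per_user
        with ThreadPoolExecutor(max_workers=users) as pool:
            latencies = list(
                pool.map(lambda _: timed_call(action), range(total_calls))
            )
        report.append({
            "users": users,
            "calls": total_calls,
            "mean_seconds": mean(latencies),
            "max_seconds": max(latencies),
        })
    return report


if __name__ == "__main__":
    # Stub standing in for a real request to the system under test.
    def fake_request() -> None:
        time.sleep(0.01)

    for row in ramp_load(fake_request, user_counts=[1, 5, 10]):
        print(row)
```

Plotting mean and maximum latency against the user count gives a rough view of where the system starts to degrade.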
9. Fuzzing tools
Fuzzing is a technique whereby poor-quality, corrupt or random data is fed into systems to try to expose bugs, security flaws, potential crashes, etc. This can be particularly useful for testing the robustness of platforms against corrupt data packets and malicious inbound network activity.
- Microsoft announces new Project OneFuzz framework, an open source developer tool to find and fix bugs at scale – Microsoft Security
- How we found a tcpdump vulnerability using cloud fuzzing | softScheck
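To show the mechanics of fuzzing on a small scale, here is a minimal mutation fuzzer: it takes a valid input, randomly corrupts a few bytes, and records any exception the target did not deliberately raise. The record format and `parse_record` are hypothetical stand-ins for the code you would test; frameworks such as those in the articles above add coverage guidance, crash triage and scale.

```python
import random


def mutate(data: bytes, rng: random.Random, max_flips: int = 4) -> bytes:
    """Return a copy of data with a few randomly chosen bytes overwritten."""
    buf = bytearray(data)
    if not buf:
        return bytes(buf)
    for _ in range(rng.randint(1, max_flips)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def parse_record(data: bytes) -> bytes:
    """Toy target: a 1-byte length prefix followed by that many payload bytes."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


def fuzz(target, seed_input: bytes, iterations: int = 1000, seed: int = 0) -> list:
    """Feed mutated inputs to target; collect inputs causing unexpected errors."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # a clean rejection of malformed input is the desired behaviour
        except Exception as exc:
            crashes.append((candidate, repr(exc)))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record, b"\x03abc")
    print(f"{len(found)} unexpected failures")
```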
10. PCMark 10
PCMark 10 is an industry-standard benchmark popular with enterprises that have large numbers of Windows 10 endpoints. Beyond physical PCs and laptops, it is a popular tool in the EUC community for testing VMs (Virtual Machines) on Cloud/Citrix/VMware, especially for office/task-worker-like workloads. vExpert Ryan Mangan’s blog carries good examples.
eG Enterprise is an Observability solution for Modern IT. Monitor digital workspaces,
web applications, SaaS services, cloud and containers from a single pane of glass.
Further Reading:
For those interested in the field of Chaos Engineering, a few good links (thanks to EUC gurus including James Rankin, Tobias Kreidl and others) include:
- Chaos Engineering Saved Your Netflix
- An Overview of Chaos Engineering Best Practice from the Microsoft Azure Team
- Chaos Engineering: the history, principles, and practice
- Making Microservices More Resilient with Chaos Engineering (from NGINX part of F5)
- Chaos Engineering – Building Confidence in System Behavior Through Experiments
Note: All third-party tools, products and articles referenced are not officially endorsed by eG Innovations and should be used at the reader’s discretion.