Making Java Applications Run Faster
Application developers and application operations personnel are together responsible for ensuring that Java applications perform well. In an earlier blog, we discussed 7 configurations that Application Operations teams can use to make their Java applications perform well. In this blog, we will focus on Application Developers and discuss 6 ways in which they can enhance the performance of their Java applications and make Java run faster.
eG Enterprise is a Java-based web application performance monitoring, diagnosis, and reporting solution. Most of the tips in this article are based on techniques that our development team uses to keep our solution highly scalable and fast.
Several articles over the years have documented best practices you can follow to improve the performance of your Java applications. For instance, see https://blog.jooq.org/2015/02/05/top-10-easy-performance-optimisations-in-java/. Common recommendations include:
- String concatenation using StringBuilder
- Avoiding regular expressions
- Avoiding iterators
- Avoiding recursion where possible
- Using primitive types where possible
1. Select the Java collection to use in your application carefully
Many large-scale Java applications have to deal with a large amount of data. The Java Collections Framework is a set of interfaces and classes that help store and process data efficiently. Let’s explore how the right use of Java collections and related methods can significantly boost the performance of your Java application.
Bear in mind that ArrayList, HashMap, etc. are not synchronized, whereas older collections such as Vector and Hashtable are. Multi-threaded applications may need thread-safe access, and synchronized collections are appropriate in such cases. Avoid using a Vector or Hashtable when synchronized access is not required, for example when the collection is only ever used within a single thread. You can get multi-fold performance gains by avoiding unnecessary use of synchronized collections.
Inadvertent use of synchronized objects will result in frequent thread-blocking and could make your application appear very slow to users. If you must use synchronized objects, use collections such as ConcurrentHashMap that allow multiple threads to access different parts of the same object at the same time.
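To make the ConcurrentHashMap suggestion concrete, here is a minimal sketch of several threads updating one shared map. The class name HitCounter, the page key "/home", and the runDemo helper are illustrative assumptions, not code from our product.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class; the names are assumptions for illustration.
class HitCounter {
    // ConcurrentHashMap is thread-safe without one lock guarding the whole map.
    private final Map<String, Integer> hits = new ConcurrentHashMap<>();

    // merge() performs the read-modify-write for a key atomically.
    void record(String page) {
        hits.merge(page, 1, Integer::sum);
    }

    int count(String page) {
        return hits.getOrDefault(page, 0);
    }

    // Starts `threads` workers that each record `perThread` hits on one page,
    // waits for them to finish, and returns the final count.
    static int runDemo(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.record("/home");
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.count("/home");
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000)); // prints 40000
    }
}
```

No update is lost because merge() is atomic per key. With a plain HashMap the same loop would not be thread-safe, and with a Hashtable every call would contend for the same lock.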
2. Minimize the number of method calls you need to make to access collections in your Java applications
Collection objects offer several access methods: to check the keys, to check the values, to get the size of the collection, to put objects, etc. Developers often check whether a Hashtable or a HashMap has a key and then get the value of that key for processing. See the code snippet below:
    Hashtable h = new Hashtable();
    …
    for (int i = 0; i < vars.size(); i++) {
        String key = (String) vars.get(i);
        // Two synchronized calls per key: containsKey() and then get()
        if (h.containsKey(key)) {
            String value = (String) h.get(key);
            …
        }
    }
In the above code snippet, the Hashtable is first checked for the key and then the value is accessed. This requires two method calls, both of which involve synchronized access to the Hashtable. A far more efficient way to do this is represented by the code snippet below:
    Hashtable h = new Hashtable();
    …
    for (int i = 0; i < vars.size(); i++) {
        // One synchronized call per key: get() returns null when the key is absent
        String value = (String) h.get((String) vars.get(i));
        if (value != null) {
            …
        }
    }
In this case, there is one less access to the Hashtable for each iteration of the loop. If you are making 1000s of accesses to the Hashtable in a loop, this could save 1000s of method calls. This is a simple way to boost the speed of your Java applications without any change in the code logic.
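The same single-lookup idea can be sketched with generics, which also removes the casts. The class and method names below are hypothetical. One caveat to keep in mind: unlike Hashtable, a HashMap may store null values, so the null check only works as an "is the key present" test when the map never holds nulls.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; the names are assumptions for illustration.
class SingleLookup {
    // Counts how many of the given names have a value in the map,
    // touching the map once per name instead of twice.
    static int countKnown(Map<String, String> settings, List<String> names) {
        int known = 0;
        for (String name : names) {
            String value = settings.get(name); // single lookup
            if (value != null) {
                known++;
            }
        }
        return known;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("host", "localhost");
        settings.put("port", "8080");
        List<String> names = Arrays.asList("host", "user", "port");
        System.out.println(countKnown(settings, names)); // prints 2
    }
}
```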
3. Use “contains” with caution in your Java applications
Lists, ArrayLists, and Vectors have a contains method that allows programmers to check if a collection already has an equal object. You may be iterating through a large sample and, very often, you may need to find a list of unique objects in the sample. Your code might look like this:
    ArrayList al = new ArrayList();
    …
    for (int i = 0; i < vars.size(); i++) {
        String obj = (String) vars.get(i);
        // contains() scans the list linearly on every iteration
        if (!al.contains(obj)) {
            al.add(obj);
        }
    }
Functionally, this code is fine, but from a performance standpoint, you are checking whether the ArrayList contains the object on every iteration of the loop. The contains method scans the entire ArrayList each time. So, as the ArrayList gets bigger, the performance penalty increases.
You will be better off adding all the samples to the ArrayList first, conducting a duplicate check once, using a collection such as a HashSet that inherently provides uniqueness, and creating the unique ArrayList once. Instead of having potentially 1000s of contains checks on the ArrayList, now you have a one-time duplicate check.
    ArrayList al = new ArrayList();
    …
    for (int i = 0; i < vars.size(); i++) {
        String obj = (String) vars.get(i);
        al.add(obj);
    }
    al = removeDuplicates(al);
    …
    static ArrayList removeDuplicates(ArrayList list) {
        if (list == null || list.size() == 0) {
            return list;
        }
        // A HashSet keeps one copy of each element (iteration order is not preserved)
        Set set = new HashSet(list);
        list.clear();
        list.addAll(set);
        return list;
    }
The table below shows the time difference between our original code and the modified code above:
List Size | 100 | 1000 | 10000 | 100000 |
Original Code | 0ms | 5ms | 171ms | 49,820ms |
Modified Code | 0ms | 1ms | 7ms | 28ms |
As you can see above, the savings from the performance tweaks we made increase as the size of the ArrayList being accessed increases. The larger the data sets your application deals with, the more performance tuning matters.
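If the order of the unique items matters, a LinkedHashSet can do the one-time duplicate check while keeping first-seen order. This is a variation on the approach described above rather than the code we measured; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical example class; the names are assumptions for illustration.
class UniqueValues {
    // Returns the distinct values in the order they first appeared.
    // LinkedHashSet rejects duplicates using hashing and remembers
    // insertion order, so no per-element list scan is needed.
    static List<String> unique(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(unique(sample)); // prints [b, a, c]
    }
}
```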
4. Use Maps if you are handling large data sets and need to index/search them in your Java applications
Java developers have been taught to use Hashtables and HashMaps when dealing with key-value pairs and to use a collection like a LinkedList, ArrayList, Vector, etc. when there is a list of items. When you have a large list and you need to search this list, you have to use the contains method of the List object. Consider the code snippet below:
    ArrayList al = new ArrayList();
    // insert records into the ArrayList
    …
    for (…) {
        …
        boolean b = al.contains(astring);
    }
Now, let’s use a HashMap instead of an ArrayList:
    HashMap myHash = new HashMap();
    // insert records into the HashMap
    …
    for (…) {
        …
        // HashMap has no contains() method; containsKey() is the hash-based key lookup
        boolean b = myHash.containsKey(astring);
    }
The table below summarizes the performance difference between the two approaches as the list size increases:
List Size | 100 | 1000 | 10000 | 100000 |
When using ArrayList | 0ms | 6ms | 186ms | 62,105ms |
When using HashMap | 0ms | 0ms | 1ms | 8ms |
As you can see from the table above, the performance improvement from using a HashMap rather than an ArrayList to hold a large number of objects that need to be searched or indexed is very significant, and it grows as the number of objects increases. Similar performance improvements have been reported in earlier analyses.
The figure below shows how the performance improvements translate into lower resource usage of the Java application as well. Here you see the CPU usage of the JVM before and after the optimization was applied. The JVM’s CPU usage, which was around 50% on average, dropped to 20% after this optimization was applied. That’s a 60% reduction in CPU usage, which is a significant cost saving if you are running the application in a cloud environment like AWS or Azure.
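When only membership matters and there is no value to store per key, a HashSet offers the same hash-based lookup. The sketch below uses a HashSet in place of the HashMap discussed above, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example class; the names are assumptions for illustration.
class MembershipIndex {
    // Counts how many queries are present in the records.
    static int countMatches(List<String> records, List<String> queries) {
        // Build the hash-based index once, up front.
        Set<String> index = new HashSet<>(records);
        int matches = 0;
        for (String query : queries) {
            // Hash lookup instead of a linear scan of the list.
            if (index.contains(query)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("alpha", "beta", "gamma");
        List<String> queries = Arrays.asList("beta", "delta", "alpha");
        System.out.println(countMatches(records, queries)); // prints 2
    }
}
```

Building the index costs one pass over the records, so this pays off when the same data is searched many times, as in the loop above.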
5. Remember Java is full of References
Reference: A reference is a variable that refers to something else and can be used as an alias for that something else.
Pointer: A pointer is a variable that stores a memory address for the purpose of acting as an alias to what is stored at that address.
So, a pointer is a reference, but a reference is not necessarily a pointer. Pointers are particular implementations of the concept of a reference, and the term tends to be used only for languages that give you direct access to the memory address.
Source: https://www.geeksforgeeks.org/is-there-any-concept-of-pointers-in-java/
This understanding of references is useful when dealing with collections. If you remember that Java variables of object types hold references, you will be able to optimize your code for better performance. Check this example below:
    HashMap myHash = new HashMap();
    …
    if (myHash.containsKey(astring)) {
        ArrayList myList = (ArrayList) myHash.get(astring);
        myList.add(avalue);
        // Redundant: myList already refers to the list stored in the map
        myHash.put(astring, myList);
    }
As Java uses references and myList refers to the same ArrayList that is stored in the map, the put method call above is not necessary. If you were doing this in a loop that executes 1,000 times, that’s 1,000 unnecessary method calls in the example above. And if myHash had been a Hashtable, the saving is even greater: one synchronized method call less in every iteration of the loop!
The time saved here is not as dramatic as in the previous cases. When processing 100,000 objects, the performance gain is about 33%.
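A minimal sketch of the same idea with generics is shown below. It is a variation on the snippet above: computeIfAbsent also creates the list when the key is missing, which the original snippet does not do. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; the names are assumptions for illustration.
class GroupedValues {
    private final Map<String, List<String>> groups = new HashMap<>();

    void add(String key, String value) {
        // computeIfAbsent returns a reference to the list held by the map,
        // creating it first if needed. Adding through that reference changes
        // the stored list, so no follow-up put() is required.
        groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    List<String> get(String key) {
        return groups.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        GroupedValues grouped = new GroupedValues();
        grouped.add("fruit", "apple");
        grouped.add("fruit", "pear");
        System.out.println(grouped.get("fruit")); // prints [apple, pear]
    }
}
```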
6. Be careful when you use synchronization in your Java applications
Synchronization in C and C++ was difficult to do – mutexes and semaphores had to be used carefully. Java makes it really easy to use synchronized blocks. Anyone can use synchronized blocks and methods without knowing the performance impact of such usage. Improper use of synchronization can result in thread-blocking or even thread deadlocks. Read more on synchronization of Java threads in our earlier blog.
A common mistake made by Java programmers is to unknowingly synchronize on an object that is shared with other code. You might have defined a variable in one class that points to a string as below:
String string1 = "syncstring";
You might have a completely different variable defined in another class:
String string2 = "syncstring";
If you use synchronized blocks that synchronize on string1 in the first class and on string2 in the second class, you might expect that these two synchronized blocks cannot interfere with each other. Unfortunately, string1 and string2 are references to the same String object, because Java interns identical string literals such as "syncstring". So, when the two classes are executed, their synchronized blocks can block each other. Therefore, to minimize interference between classes, create a new instance in each class to synchronize on. In the above example, string1 should have been set to new String("syncstring").
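The sharing of string literals can be checked directly by comparing references. The sketch below also shows a commonly used alternative to new String(...): a dedicated lock object per class. The class names FirstWorker, SecondWorker, and LockIdentityDemo are hypothetical.

```java
// Hypothetical example classes; the names are assumptions for illustration.
class FirstWorker {
    static final String LITERAL = "syncstring"; // interned string literal
    static final Object LOCK = new Object();    // dedicated lock object
}

class SecondWorker {
    static final String LITERAL = "syncstring"; // same interned object as above
    static final Object LOCK = new Object();    // a distinct lock object
}

class LockIdentityDemo {
    // True when both classes would synchronize on the very same object.
    static boolean literalsAreSameObject() {
        return FirstWorker.LITERAL == SecondWorker.LITERAL;
    }

    static boolean locksAreSameObject() {
        return FirstWorker.LOCK == SecondWorker.LOCK;
    }

    public static void main(String[] args) {
        System.out.println(literalsAreSameObject()); // prints true
        System.out.println(locksAreSameObject());    // prints false
    }
}
```

A dedicated lock object makes the intent explicit and cannot be reached by unrelated code that happens to use the same literal.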
For best performance, it is important to keep the amount of processing done inside synchronized methods and blocks to an absolute minimum. If there are activities you can perform outside the synchronized blocks/methods, do that first for improved performance.
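As a rough illustration of keeping synchronized sections small, the sketch below prepares its data outside the lock and guards only the update to shared state. The class EventLog and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical example class; the names are assumptions for illustration.
class EventLog {
    private final Object lock = new Object();
    private final List<String> lines = new ArrayList<>();

    void record(String user, String action) {
        // Formatting touches no shared state, so it runs outside the lock.
        String line = user.trim().toLowerCase(Locale.ROOT) + ":" + action.trim();
        synchronized (lock) {
            // Only the update of the shared list is guarded.
            lines.add(line);
        }
    }

    List<String> snapshot() {
        synchronized (lock) {
            // Copy under the lock so callers can iterate without holding it.
            return new ArrayList<>(lines);
        }
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.record(" Alice ", "login ");
        System.out.println(log.snapshot()); // prints [alice:login]
    }
}
```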
Conclusion
Performance is not always top of mind for application developers. But, if you are developing Java applications that will handle large volumes of data, performance considerations are especially important and will affect how users perceive your applications.
As you have seen in the examples above, application developers can achieve an order of magnitude better performance for Java web applications if they keep the 6 best practices presented here in mind.