Most developers know the benefits of threads (responsiveness, exploiting multiple cores, etc.), and most also know the risks (data inconsistency, deadlocks, context-switch overhead, etc.), but not all of them know how to minimize the risks while retaining the benefits. So here’s my humble attempt to simplify the understanding in the context of Java.
We know that thread synchronization is needed when multiple threads access some mutable data and at least one of them might change it. This mechanism in Java lets us enforce two important things: atomicity and visibility. Without atomicity (i.e. executing multiple actions as a single action, without interference from other threads in between), we may have race conditions, which occur when the correctness of a computation depends on the relative timing or interleaving of multiple threads by the runtime. Memory visibility is equally important: without synchronization, a thread may not see the latest value of a variable changed by another thread, because the compiler, runtime and processor are free to apply optimizations such as reordering instructions or caching variable values.
But here I’m going to suggest NOT using synchronization (or at least not using locks) if possible. Why? Because it can open a can of dangerous worms if you use it without the utmost care. I’m not against the proper use of synchronization, but that is very hard to achieve, and explicit synchronization (i.e. lock based, using the synchronized keyword) should be used as a last resort (in my opinion).
To start with, if you synchronize a big part of your code (like making all methods synchronized), you may not get the benefit of concurrency at all, as all threads will execute it one after the other, and scalability suffers badly because of this. If you reduce the lock scopes too much and use too many locks, you add overhead from frequent lock acquisition and, under contention, from context switching. Applying synchronization may also prevent various optimizations done by the compiler and runtime (like caching and reordering of instructions), thus limiting performance. Then there are some serious liveness problems, like deadlocks, from which there is no way to recover other than aborting the application. The indiscriminate use of locking may result in lock-ordering deadlocks: for example, two threads trying to acquire the same locks but in a different order (T1: lockA -> lockB, and T2: lockB -> lockA), which creates a cyclic locking dependency and thus a deadlock. Just as threads can deadlock when each is waiting for a lock that the other holds and will not release, they can also deadlock when waiting for resources. Such programs are also more complex and more difficult to understand (which in turn may cause other problems). And last but highly important, such programs are very hard to test for correctness and performance.
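The lock-ordering deadlock described above, and one common way around it, can be sketched roughly as follows. The Account class, its id field and the transfer method are hypothetical names invented for this illustration (and the lambdas assume Java 8+); the only idea taken from the text is that every thread should acquire the two locks in the same order.

```java
// Hypothetical Account class, used only to illustrate lock ordering.
class Account {
    final int id;          // assumed unique; used to decide the lock order
    private long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long balance() {
        synchronized (this) {
            return balance;
        }
    }

    // A deadlock-prone version would simply do
    //   synchronized (from) { synchronized (to) { ... } }
    // so transfer(a, b) and transfer(b, a) would take the locks in opposite orders.
    static void transfer(Account from, Account to, long amount) {
        // Always lock the account with the smaller id first, whatever the direction.
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }
}

class LockOrderingDemo {
    // Runs two threads that transfer in opposite directions and returns both balances.
    static long[] run(int rounds) {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) Account.transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) Account.transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.balance(), b.balance() };
    }
}
```

Even this small fix shows how much care lock-based code needs: the ordering rule has to be followed everywhere the two locks are taken together.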
So what are the alternatives or better ways?
The best way to avoid coordination between threads is not to share. If an object is confined to a single thread, it’s automatically safe from all those hazards. This is not just a theoretical idea; the model has been deliberately adopted in systems like Swing, because the problems that sharing could create (deadlocks in GUI toolkits) are just too costly for them. Rather than sharing, we can use local variables (scoped within a method) as far as possible, since each thread has its own stack and therefore its own copy of them, avoiding any risk of sharing (but take care not to let local objects escape the method, for example by assigning them to an instance variable). Java also provides the ThreadLocal class, which makes it easy to use a variable across threads because it internally manages a separate copy of the value for each thread. We can share a single ThreadLocal variable and use its get/set methods without worrying about which value is associated with which thread (get always returns the value associated with the current thread). Check Thread Confinement.
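As a minimal sketch of the ThreadLocal idea, assuming Java 8+ (for ThreadLocal.withInitial and lambdas): the class and method names below are made up for the example. One shared ThreadLocal field is used by two threads, and each thread only ever sees its own counter.

```java
class ThreadLocalDemo {
    // One shared ThreadLocal; each thread lazily gets its own int[1] holder.
    private static final ThreadLocal<int[]> COUNTER =
            ThreadLocal.withInitial(() -> new int[1]);

    // Increments and returns the counter that belongs to the calling thread.
    static int incrementForCurrentThread() {
        int[] holder = COUNTER.get();
        return ++holder[0];
    }

    // Starts a fresh thread, increments its own counter, and reports what it saw.
    static int countSeenByNewThread(int increments) {
        int[] result = new int[1];
        Thread worker = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                incrementForCurrentThread();
            }
            result[0] = COUNTER.get()[0];
        });
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

Calling countSeenByNewThread(5) returns 5 no matter how many times the calling thread has incremented its own counter, because the two threads never touch the same holder.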
The next best way is to share, but only immutable objects. If threads cannot change the state of an object, there is no risk. Its applicability may sound limited at first, but it’s not. Much of the popularity of functional programming rests on this, since immutable data leaves no room for side effects. When a change is needed you can create a new immutable object instead, think of String in Java. However, making an object immutable isn’t just about making every field final, because the object a final field refers to may itself be mutable. Immutable objects are those whose state cannot be changed once they are constructed, so we also need to make sure that the this reference does not escape during construction. Check To mutate or not to mutate?. If you do not want to, or cannot, make your classes immutable (I still suggest making as many parts of them immutable as possible, to reduce side effects), you can make deep copies of the object and pass them to the threads (if the memory cost is affordable), and merge those copies at the end if required. (Recently in my current project I wanted to run some tasks concurrently, so rather than synchronizing I identified the minimum mutable data required and gave each concurrent task its own copy; when the tasks finished, I merged those copies back into a single one to move ahead in the process flow. In the end I was very happy with its simplicity.)
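To make the point about final fields concrete, here is a small sketch of an immutable class. Route, its fields and withStop are hypothetical names chosen for the example. It copies the incoming list defensively, exposes it only through an unmodifiable view, and returns a new object instead of mutating, much like String does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable value class.
final class Route {
    private final String name;
    private final List<String> stops;

    Route(String name, List<String> stops) {
        this.name = name;
        // A final field alone is not enough: copy the caller's list so later
        // changes to it cannot leak in, and wrap it so callers cannot modify ours.
        this.stops = Collections.unmodifiableList(new ArrayList<>(stops));
    }

    String name() {
        return name;
    }

    List<String> stops() {
        return stops; // safe to hand out: it is an unmodifiable view of a private copy
    }

    // "Changing" a Route means building a new one; the original stays untouched.
    Route withStop(String stop) {
        List<String> copy = new ArrayList<>(stops);
        copy.add(stop);
        return new Route(name, copy);
    }
}
```

A Route built this way can be handed to any number of threads without locking, since nothing about it can change after the constructor returns.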
In simple cases like a shared status flag, where you do not require atomicity of compound operations, you can use a volatile variable. A read of a volatile variable always returns the latest write by any thread (i.e. threads always see the latest value, which is not guaranteed in Java without volatile or synchronization). It acquires no locks, so none of those hazards apply, but its use is limited. Check Managing volatility.
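The status-flag case might look something like the sketch below. StopFlagDemo, Worker and stopsWithin are illustrative names, not from any library. Without volatile on the flag, the worker loop would be allowed to keep running on a stale value.

```java
class StopFlagDemo {
    static class Worker implements Runnable {
        // volatile guarantees the worker sees the write made by stop().
        private volatile boolean running = true;
        // Only ever touched by the worker thread, so it needs no protection.
        private long iterations;

        @Override
        public void run() {
            while (running) {
                iterations++; // stand-in for real work
            }
        }

        void stop() {
            running = false; // a plain write is enough: no read-modify-write involved
        }
    }

    // Lets the worker run for runMillis, asks it to stop, then waits up to
    // waitMillis for it to finish. Returns true if the thread terminated.
    static boolean stopsWithin(long runMillis, long waitMillis) {
        Worker worker = new Worker();
        Thread thread = new Thread(worker);
        thread.setDaemon(true); // do not keep the JVM alive if something goes wrong
        thread.start();
        try {
            Thread.sleep(runMillis);
            worker.stop();
            thread.join(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !thread.isAlive();
    }
}
```

Note that volatile would not make something like a shared count++ safe, because that is a read-modify-write sequence and needs atomicity as well as visibility.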
Built-in Concurrent Collections
Java (5.0+) provides some very useful collection classes specially designed for concurrency. They are powerful in terms of performance and scalability, with very little risk compared to the Collections.synchronizedXxx wrappers, Vector, Hashtable or your own lock-based synchronization. They use finer-grained locking mechanisms (like lock striping) and add support for some useful common compound actions like put-if-absent, replace and conditional remove. Some important classes are ConcurrentHashMap, CopyOnWriteArrayList (which creates a new copy of the underlying array internally every time it is modified, well, thanks to immutability), ConcurrentLinkedQueue and LinkedBlockingQueue (a blocking queue makes insertion wait when the queue is full and retrieval wait when it is empty). Check Concurrent Collections and more.
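The compound actions mentioned above (put-if-absent, replace and conditional remove) can be combined into a lock-free counter map, roughly as sketched here. HitCounter and its methods are hypothetical names; putIfAbsent, replace(key, old, new) and remove(key, value) are the ConcurrentMap methods themselves. The lambda in hammer assumes Java 8+.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class HitCounter {
    private final ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();

    // Increments the count for a page without any explicit lock.
    void record(String page) {
        for (;;) {
            Integer old = hits.putIfAbsent(page, 1);
            if (old == null) {
                return; // first hit: our 1 was stored atomically
            }
            if (hits.replace(page, old, old + 1)) {
                return; // nobody changed the value in between
            }
            // Another thread won the race; re-read and try again.
        }
    }

    int count(String page) {
        Integer n = hits.get(page);
        return n == null ? 0 : n;
    }

    // Conditional remove: only removes the entry if it still has the expected count.
    boolean removeIfCount(String page, int expected) {
        return hits.remove(page, expected);
    }

    // Several threads hit the same page; returns the final count.
    static int hammer(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.record("/home");
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.count("/home");
    }
}
```

On Java 8 and later, hits.merge(page, 1, Integer::sum) does the same increment in a single atomic call; the explicit loop is shown only because it uses the operations named in the text.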
Built-in Synchronizers/ Coordinators
A synchronizer is an object that coordinates the control flow of threads based on its state. Java (5.0+) identified some common synchronization patterns and provided classes for them, such as latches, semaphores and barriers. A blocking queue is a special collection in that it also provides coordination in the producer-consumer pattern (through blocking). The benefit is that these classes use the minimum synchronization required and are well tested, so we can rely on them. A latch allows threads to wait until a certain number of events have occurred. The events could be the initialization of certain resources, the start of certain services, or the readiness of certain users, on which threads want to wait before proceeding. A barrier is like a latch, but rather than waiting for events it waits for other threads to arrive at a certain point (the barrier): all threads must reach the barrier point before any of them can proceed. Barriers are useful, for example, when we want to execute one step’s tasks in parallel but all of them must complete before the next step’s tasks start in parallel (because the combined results of step one are required in the next step), somewhat like MapReduce. A counting semaphore is useful when you want to implement a resource pool or put a bound on a collection. For example, you can implement a database connection pool in which a request blocks while the pool is empty and unblocks when a connection is returned. Similarly, we can use a semaphore to turn a collection into a blocking bounded collection, e.g. a bounded HashSet. Check Synchronization Utilities. You can also check ReentrantLock and Thread Pools.
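Two of these synchronizers can be sketched briefly. The first class below bounds a HashSet with a Semaphore, as described above; the second uses a pair of CountDownLatch objects as a start gate and a completion gate. BoundedSet, LatchDemo and their methods are illustrative names, tryAdd is a non-blocking variant added here for convenience, and the lambdas assume Java 8+.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedSet<T> {
    private final Set<T> set = Collections.synchronizedSet(new HashSet<>());
    private final Semaphore permits; // one permit per free slot

    BoundedSet(int bound) {
        permits = new Semaphore(bound);
    }

    // Blocks while the set is full.
    boolean add(T item) throws InterruptedException {
        permits.acquire();
        return addHoldingPermit(item);
    }

    // Non-blocking variant: returns false immediately if the set is full.
    boolean tryAdd(T item) {
        return permits.tryAcquire() && addHoldingPermit(item);
    }

    private boolean addHoldingPermit(T item) {
        boolean added = false;
        try {
            added = set.add(item);
            return added;
        } finally {
            if (!added) permits.release(); // duplicate: give the slot back
        }
    }

    boolean remove(Object item) {
        boolean removed = set.remove(item);
        if (removed) permits.release(); // frees a slot, unblocking a waiting add
        return removed;
    }
}

class LatchDemo {
    // Releases all workers at once, waits for all of them, returns how many ran.
    static int runWorkers(int workers) {
        CountDownLatch startGate = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(workers);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                try {
                    startGate.await(); // wait for the "go" event
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown(); // report completion either way
                }
            }).start();
        }
        startGate.countDown(); // open the gate for everyone
        try {
            done.await(); // wait until every worker has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }
}
```

In both classes the waiting and waking up is handled entirely by the library synchronizer, so the code contains no wait/notify calls of its own.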
Atomic Variables and Nonblocking Synchronization
Many of the java.util.concurrent classes perform and scale significantly better than lock-based alternatives because they use atomic variables and nonblocking synchronization. Atomic variables (like AtomicInteger, AtomicReference, etc.) are like volatile variables but also provide some useful methods (like incrementAndGet(), compareAndSet(), etc.) to update them atomically without synchronization. Nonblocking algorithms use low-level atomic machine instructions (like compare-and-swap) instead of locks to ensure data integrity. They offer high scalability and liveness advantages but are hard to design and implement. Using atomic variables in Java 5.0+, it is possible to build efficient nonblocking algorithms. Check Going atomic and Intro to nonblocking algorithms.
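A tiny example of both styles, with made-up class names: MaxTracker uses the typical compareAndSet retry loop of a nonblocking algorithm, and AtomicCounterDemo uses incrementAndGet() as a drop-in replacement for a synchronized counter. The lambda assumes Java 8+.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Keeps the largest value offered by any thread, without locks.
class MaxTracker {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    void offer(int candidate) {
        for (;;) {
            int current = max.get();
            if (candidate <= current) {
                return; // nothing to update
            }
            if (max.compareAndSet(current, candidate)) {
                return; // our value was installed atomically
            }
            // Someone else changed max in between; re-read and retry.
        }
    }

    int max() {
        return max.get();
    }
}

class AtomicCounterDemo {
    // Several threads increment one shared counter; returns the final value.
    static int count(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

The retry loop never blocks: a thread that loses the race simply reads the new value and tries again, which is where the liveness advantage comes from.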
I know all these alternatives also have some trade-offs, but it is always better to be aware of them so that we can use the right thing in the right context. Also, if you still haven’t read the wonderful book Java Concurrency in Practice, stop exploring the internet and read it (a must for every Java developer).
I hope it helps somebody. Let me also know if it can be improved.
Happy simplicity, bye.