Article from: https://www.cnblogs.com/fswhq/p/ConcurrentHashMap.html

Reference link: https://www.cnblogs.com/chengxiao/p/6842045.html

https://www.cnblogs.com/ITtangtang/p/3948786.html

Background:

As we all know, the hash table is a very efficient data structure with O(1) access complexity. In Java development, HashMap and Hashtable are its most commonly and frequently used implementations, but in multi-threaded concurrency scenarios using them directly is problematic.

  HashMap: First, HashMap. HashMap is not thread-safe. In a concurrent environment it can form a circular linked list (this can happen during resizing), causing the CPU to spin during get operations, so using HashMap in a concurrent environment is very dangerous.

  Hashtable: Hashtable works on almost the same principle as HashMap. The differences are merely that 1. Hashtable does not allow null keys or values; 2. Hashtable is thread-safe. But Hashtable's thread-safety strategy is costly to run and crude in design: all get/put operations are synchronized, which is equivalent to putting one big lock on the entire hash table. Under multi-threaded access, as long as one thread is accessing or modifying the object, all other threads can only block; in effect every operation is serialized, and in highly contended concurrent scenarios performance is very poor.
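To make the "one big lock" picture concrete, here is a minimal sketch in the same style (a hypothetical CoarseLockedMap wrapper, not the real Hashtable source): every method synchronizes on the same monitor, so all readers and writers serialize, which is exactly the behavior criticized above.

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of Hashtable-style locking: one monitor guards everything,
// so concurrent gets and puts from different threads all serialize.
class CoarseLockedMap<K, V> {
    private final Map<K, V> map = new HashMap<>();

    public synchronized V get(Object key) {
        return map.get(key);
    }

    public synchronized V put(K key, V value) {
        return map.put(key, value);
    }

    public synchronized int size() {
        return map.size();
    }
}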

One, lock segmentation technology

    Hashtable containers are inefficient in highly contended concurrent environments because every thread accessing the Hashtable must compete for the same lock. If the container instead held multiple locks, each guarding a part of the container's data, then threads accessing data in different segments would not contend for locks at all, which effectively improves concurrent access efficiency. This is the lock segmentation technique used by ConcurrentHashMap: the data is first divided into segments of storage, each segment is given its own lock, and while one thread holds the lock of one segment, the data of the other segments remains accessible to other threads. Some methods must span segments, such as size() and containsValue(); they may need to lock the entire table rather than a single segment, which requires acquiring the locks of all segments in order and then releasing them in order. The "in order" here is important, otherwise deadlock is very likely. Inside ConcurrentHashMap the segments array is final and its member variables are effectively final, but merely declaring an array final does not guarantee that the array's members are also final; that must be guaranteed by the implementation. Because the lock order is fixed, deadlock cannot occur.
ConcurrentHashMap is made up of a Segment array and HashEntry arrays. Segment is a reentrant lock (it extends ReentrantLock) and plays the role of the lock in ConcurrentHashMap; HashEntry is used to store key-value data. A ConcurrentHashMap contains an array of Segments. Each Segment is structured like a HashMap: an array of linked lists. A Segment contains a HashEntry array, and each HashEntry is a node of a linked list. Each Segment guards the elements of its own HashEntry array: to modify data in a HashEntry array, a thread must first acquire the corresponding Segment's lock.
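Before diving into the real source, a bare-bones sketch of the lock-striping idea may help (a hypothetical StripedMap, not the ConcurrentHashMap code): the key's hash selects one of N locks, so threads whose keys fall under different locks never contend.

// Hypothetical lock-striping sketch (names are illustrative, not JDK code):
// N_LOCKS locks, lock i guarding every bucket whose index % N_LOCKS == i.
// Threads whose keys hash to different stripes never block each other.
class StripedMap<K, V> {
    private static final int N_LOCKS = 16;

    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private final Node<K, V>[] buckets;
    private final Object[] locks = new Object[N_LOCKS];

    @SuppressWarnings("unchecked")
    StripedMap(int numBuckets) {
        buckets = (Node<K, V>[]) new Node[numBuckets];
        for (int i = 0; i < N_LOCKS; i++)
            locks[i] = new Object();
    }

    private int bucketFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public V get(Object key) {
        int b = bucketFor(key);
        synchronized (locks[b % N_LOCKS]) {     // lock only this stripe
            for (Node<K, V> n = buckets[b]; n != null; n = n.next)
                if (n.key.equals(key))
                    return n.value;
        }
        return null;
    }

    public void put(K key, V value) {
        int b = bucketFor(key);
        synchronized (locks[b % N_LOCKS]) {     // lock only this stripe
            for (Node<K, V> n = buckets[b]; n != null; n = n.next)
                if (n.key.equals(key)) {
                    n.value = value;            // replace in place
                    return;
                }
            buckets[b] = new Node<>(key, value, buckets[b]); // prepend at head
        }
    }
}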
 

Two, application scenarios

    When you have a large array that needs to be shared by multiple threads, you can consider splitting it into multiple segments to avoid one big lock, and you can use a hash algorithm to localize operations to a segment.
This idea is not limited to threads. When you design table transactions (which are, in a sense, also a synchronization mechanism), you can think of a table as an array that needs synchronization. If you operate on too much table data at once, you can consider splitting the transaction (which is one reason large tables are avoided), for example by splitting off row fields or partitioning the table horizontally.
 

Three, source code interpretation

    In ConcurrentHashMap (1.7 and earlier) there are three main entity classes: ConcurrentHashMap (the entire hash table), Segment (bucket), and HashEntry (node), corresponding to the relationships shown in the figure above.
/** 
* The segments, each of which is a specialized hash table 
*/  
final Segment<K,V>[] segments;

Invariance (Immutable) and volatility (Volatile)

    ConcurrentHashMap allows all read operations to proceed concurrently; read operations do not need to acquire a lock. If traditional techniques such as the HashMap implementation were used, allowing elements to be added or deleted in the middle of a hash chain, reading without a lock could yield inconsistent data. ConcurrentHashMap's implementation technique is to make HashEntry almost immutable. HashEntry represents a node in a hash chain. Its structure is as follows:
 static final class HashEntry<K,V> {  
     final K key;                   // immutable  
     final int hash;                // immutable  
     volatile V value;              // mutable, but writes stay visible to readers  
     final HashEntry<K,V> next;     // immutable: the chain can never be re-linked  
 } 
  You can see that all fields are final except value. This means you cannot add or delete a node in the middle or at the tail of a hash chain, because that would require modifying a next reference; nodes can only be added at the head. For the put operation, every new node is added to the head of the hash chain. For the remove operation, however, a node may need to be deleted from the middle of the chain, which requires copying all nodes preceding the deleted node, with the last copy pointing at the node after the deleted one. This is detailed below under the delete operation. To ensure that read operations see the latest value, value is declared volatile, which avoids locking.
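A tiny illustration of this constraint (a hypothetical DemoNode type, not the JDK's HashEntry): with next declared final, the only possible insertion is to build a new node whose next points at the current head.

// Demonstration: with a final next pointer, insertion is only possible at the head.
class DemoNode {
    final String key;
    final DemoNode next;    // final: cannot be re-pointed after construction
    volatile Object value;  // volatile: value updates stay visible to readers

    DemoNode(String key, Object value, DemoNode next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public static void main(String[] args) {
        DemoNode head = null;
        head = new DemoNode("a", 1, head); // put "a": prepend
        head = new DemoNode("b", 2, head); // put "b": prepend again
        System.out.println(head.key);      // prints: b
    }
}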

Other

  The number of hash slots in each segment is 2^n, in order to speed up locating the segment and the hash slot within the segment: both can be located with bit operations. When the concurrency level is the default value of 16 (that is, the number of segments), the high 4 bits of the hash value decide which segment an entry is allocated to. But let's not forget the lesson from Introduction to Algorithms: the number of hash slots should not be 2^n, as that may lead to an uneven distribution over the slots; this is why the hash value must be re-hashed first. (This paragraph seems a bit redundant.)
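The bit trick behind the 2^n choice is easy to verify yourself: for a power-of-two length n and a non-negative hash, hash & (n - 1) equals hash % n but needs no division instruction. A quick check:

public class MaskDemo {
    public static void main(String[] args) {
        int n = 16; // power of two
        for (int hash : new int[]{5, 21, 37, 1_000_003}) {
            // For non-negative hash and power-of-two n the two are identical,
            // but the mask version avoids a division.
            System.out.println((hash & (n - 1)) + " == " + (hash % n));
        }
    }
}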
 

Positioning operation:

  The initialization method takes three parameters; if the user does not specify them, the defaults are initialCapacity 16, loadFactor 0.75 (the load factor, consulted when expanding), and concurrencyLevel 16.

  The size ssize of the Segment array is determined by concurrencyLevel, but it is not necessarily equal to concurrencyLevel: ssize must be the smallest power of 2 greater than or equal to concurrencyLevel. For example, with the default concurrencyLevel of 16, ssize is 16; if concurrencyLevel is 14, ssize is 16; if concurrencyLevel is 17, ssize is 32. Why must the size of the Segment array be a power of 2? Mainly because it makes it convenient to locate a Segment's index with a bitwise AND in the hash algorithm.
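The constructor finds that power of two with a simple shift loop. The following standalone paraphrase of the computation uses variable names from the JDK 6 source, but it is an illustration, not the original code:

public class SegmentSizing {
    public static void main(String[] args) {
        int concurrencyLevel = 17; // try 14, 16, 17 ...
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) { // smallest power of two >= level
            ++sshift;
            ssize <<= 1;
        }
        int segmentShift = 32 - sshift;
        int segmentMask = ssize - 1;
        // concurrencyLevel 17 -> ssize=32, segmentShift=27, segmentMask=31
        System.out.println(ssize + " " + segmentShift + " " + segmentMask);
    }
}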

  The two global variables segmentShift and segmentMask exist mainly to locate the Segment: int j = (hash >>> segmentShift) & segmentMask.

  segmentMask: the segment mask. If the segments array length is 16, the mask is 16 - 1 = 15; if the length is 32, the mask is 32 - 1 = 31. Every bit of a mask obtained this way is 1, which better preserves the uniformity of the hash.

  segmentShift: 2 to the power sshift equals ssize, and segmentShift = 32 - sshift. If the segments length is 16, segmentShift = 32 - 4 = 28; if the segments length is 32, segmentShift = 32 - 5 = 27. The computed hash value is at most 32 bits; the unsigned right shift by segmentShift keeps only the high few bits (the rest are unused), which are then ANDed with segmentMask to locate the Segment.

 final Segment<K,V> segmentFor(int hash) {  
     return segments[(hash >>> segmentShift) & segmentMask];  
 }
  Since ConcurrentHashMap uses segment locks (Segments) to protect data in different segments, it must first locate the Segment via a hash algorithm when inserting and retrieving elements. As you can see, ConcurrentHashMap first re-hashes the element's hashCode with a variant of the Wang/Jenkins hash.
The goal of re-hashing is to reduce hash collisions and distribute elements evenly over the different Segments, thereby improving the container's access efficiency. If the hash mixed poorly, all elements would end up in one Segment; not only would element access be slow, segment locking would also lose its point. I ran a test that computes the index directly, without re-hashing:
 
System.out.println(Integer.parseInt("0001111", 2) & 15);
System.out.println(Integer.parseInt("0011111", 2) & 15);
System.out.println(Integer.parseInt("0111111", 2) & 15);
System.out.println(Integer.parseInt("1111111", 2) & 15);
 
The computed hash values all come out as 15. This example shows that without re-hashing, hash conflicts would be very serious: as long as the low bits are the same, the index is always the same no matter what the high bits are. We then re-hash the four binary values above; the results follow (for readability, values shorter than 32 bits are zero-padded at the top, and every four bits are separated by a vertical bar):
 
0100|0111|0110|0111|1101|1010|0100|1110
1111|0111|0100|0011|0000|0001|1011|1000
0111|0111|0110|1001|0100|0110|0011|1110
1000|0011|0000|0000|1100|1000|0001|1010
 
You can see that every bit of the data has been scattered; this mixing lets every bit of the number participate in the hash operation, thereby reducing hash collisions. ConcurrentHashMap locates the segment with the following hash algorithm.
By default segmentShift is 28 and segmentMask is 15. The re-hashed number is at most 32 bits of binary data; shifting right (unsigned) by 28 bits means taking the high 4 bits, and the results of (hash >>> segmentShift) & segmentMask for the four values above are 4, 15, 7, and 8 respectively. You can see that the hash values no longer conflict.
final Segment<K,V> segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}
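For reference, the single-word Wang/Jenkins variant used in the JDK 6 sources has roughly the following shape (reproduced from memory, so treat it as illustrative rather than authoritative):

private static int hash(int h) {
    // Spread bits using a variant of the single-word Wang/Jenkins hash,
    // so that both the segment index (high bits) and the in-segment
    // bucket index (low bits) are well mixed.
    h += (h << 15) ^ 0xffffcd7d;
    h ^= (h >>> 10);
    h += (h << 3);
    h ^= (h >>> 6);
    h += (h << 2) + (h << 14);
    return h ^ (h >>> 16);
}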

 

data structure

 
  All members are final; among them, segmentMask and segmentShift serve mainly to locate the segment, see the segmentFor method above.
  As for the basic data structure of the hash table, we will not discuss it at length here. One important aspect of any hash table is how to resolve hash conflicts; ConcurrentHashMap takes the same approach as HashMap, placing nodes with the same hash value in one hash chain. Unlike HashMap, though, ConcurrentHashMap uses multiple sub hash tables, that is, Segments.
Each Segment is equivalent to a sub Hash table whose data members are as follows:
static final class Segment<K,V> extends ReentrantLock implements Serializable {    
         /** 
          * The number of elements in this segment's region. 
          */
         transient volatile int count;  
         /** 
          * Number of updates that alter the size of the table. This is 
          * used during bulk-read methods to make sure they see a 
          * consistent snapshot: If modCounts change during a traversal 
          * of segments computing size or checking containsValue, then 
          * we might have an inconsistent view of state so (usually) 
          * must retry. 
          */
         transient int modCount;  
         /** 
          * The table is rehashed when its size exceeds this threshold. 
          * (The value of this field is always <tt>(int)(capacity * 
          * loadFactor)</tt>.) 
          */
         transient int threshold;  
         /** 
          * The per-segment table. 
          */
         transient volatile HashEntry<K,V>[] table;  
         /** 
          * The load factor for the hash table.  Even though this value 
          * is same for all segments, it is replicated to avoid needing 
          * links to outer object. 
          * @serial 
          */
         final float loadFactor;  
 } 
  count tracks the number of entries in this segment. It is volatile and is used to coordinate modifications with read operations, ensuring that reads see almost the latest changes. The coordination works like this: every modification that makes a structural change, such as adding or deleting a node (changing a node's value is not a structural change), writes the count value as its last step, and every read operation reads count at its start. This exploits the strengthening of volatile semantics in Java 5: a write and a subsequent read of the same volatile variable have a happens-before relationship. modCount counts the number of structural changes to the segment, mainly so it can be detected whether a segment changed while multiple segments are being traversed; this is described in detail under the cross-segment operations. threshold indicates the rehash threshold. The table array stores the segment's nodes; each array element is a hash chain, expressed with HashEntry. table is also volatile, which makes it possible to read the latest table value without synchronization. loadFactor is the load factor.

Delete operation remove (key)

public V remove(Object key) {  
   hash = hash(key.hashCode());   
   return segmentFor(hash).remove(key, hash, null);   
}

 

   The entire operation first locates the segment and then delegates to the segment's remove. When multiple deletes happen concurrently, they can run in parallel as long as they are in different segments.
The following is Segment's remove implementation:
 V remove(Object key, int hash, Object value) {  
     lock();  
     try {  
         int c = count - 1;  
         HashEntry<K,V>[] tab = table;  
         int index = hash & (tab.length - 1);  
         HashEntry<K,V> first = tab[index];  
         HashEntry<K,V> e = first;  
         while (e != null && (e.hash != hash || !key.equals(e.key)))  
             e = e.next;  
         V oldValue = null;  
         if (e != null) {  
             V v = e.value;  
             if (value == null || value.equals(v)) {  
                 oldValue = v;  

                 // All entries following removed node can stay  
                 // in list, but all preceding ones need to be  
                 // cloned.  
                 ++modCount;  
                 HashEntry<K,V> newFirst = e.next;  
                 // (*) clone every node that precedes e; discussed below  
                 for (HashEntry<K,V> p = first; p != e; p = p.next)  
                     newFirst = new HashEntry<K,V>(p.key, p.hash,  
                                                   newFirst, p.value);  
                 tab[index] = newFirst;  
                 count = c; // write-volatile  
             }  
         }  
         return oldValue;  
     } finally {  
         unlock();  
     }  
 }
  The whole operation runs while holding the segment lock. The lines before the blank line mainly locate the node e to be deleted. If that node does not exist, null is returned directly; otherwise the nodes before e are copied, with the last copy pointing at e's next node. The nodes after e need no copying: they can be reused.
What is the for loop in the middle for (the line marked with (*))? From the code's point of view: after locating the node, every preceding entry is cloned and re-linked onto the chain. Why clone the elements in front of the deleted one on every delete? This is dictated by the immutability of the entry. Looking carefully at the entry definition, all attributes except value are final, which means next can never be changed after it is set the first time; instead, all the preceding nodes are cloned once. As to why entries are designed to be immutable, this is related to being able to access them without synchronization, saving time.
The following schematic diagrams (images not reproduced here) show the hash chain before the deletion and after it:

[Figure: the hash chain before deleting elements]

[Figure: the hash chain after deleting element 3]

  The second figure actually has a small problem: among the copied nodes, the node with value 2 should come first and the node with value 1 after it, exactly the opposite of the original node order. Fortunately, this does not affect our discussion.
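The cloning step, including the order reversal just mentioned, is easy to reproduce in isolation. A minimal sketch with a hypothetical immutable Entry type:

// Sketch: removing from a chain whose next pointers are final.
// Nodes after the removed one are reused; nodes before it are cloned,
// which reverses their relative order: 1 -> 2 -> 3 -> 4, remove 3,
// yields 2 -> 1 -> 4.
class ImmutableChain {
    static class Entry {
        final int key;
        final Entry next;
        Entry(int key, Entry next) { this.key = key; this.next = next; }
    }

    static Entry remove(Entry first, int key) {
        Entry e = first;
        while (e != null && e.key != key)
            e = e.next;
        if (e == null)
            return first;            // not found: chain unchanged
        Entry newFirst = e.next;     // reuse everything after e
        for (Entry p = first; p != e; p = p.next)
            newFirst = new Entry(p.key, newFirst); // clone prefix, reversed
        return newFirst;
    }

    public static void main(String[] args) {
        Entry chain = new Entry(1, new Entry(2, new Entry(3, new Entry(4, null))));
        for (Entry e = remove(chain, 3); e != null; e = e.next)
            System.out.print(e.key + " "); // prints: 2 1 4
    }
}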
The whole remove implementation is not complicated, but several points deserve attention. First, when the node to be deleted exists, decrementing count is the very last step of the deletion. It must be the last step, or read operations might not see the structural changes previously made to the segment. Second, remove starts by assigning table to the local variable tab, because table is a volatile variable and reading and writing volatile variables is expensive; the compiler cannot optimize reads and writes of volatile variables, while accessing a non-volatile local variable has little cost and the compiler optimizes it accordingly.

get operation

ConcurrentHashMap's get operation is delegated directly to Segment's get method; look directly at Segment's get:
V get(Object key, int hash) {  
     if (count != 0) { // read-volatile: is the number of entries in this segment 0?  
         HashEntry<K,V> e = getFirst(hash); // get the head node  
         while (e != null) {  
             if (e.hash == hash && key.equals(e.key)) {  
                 V v = e.value;  
                 if (v != null)  
                     return v;  
                 return readValueUnderLock(e); // recheck  
             }  
             e = e.next;  
         }  
     }  
     return null;  
 }
The get operation does not require a lock, unless the value read is null, in which case it is re-read under the lock. We know that the get method of the Hashtable container needs a lock; so how does ConcurrentHashMap manage an unlocked get? The reason is that the shared variables used in its get method are all defined as volatile.
 
  The first step reads the count variable, which is volatile. This guarantees that the get operation sees almost the latest structural updates, since every modification writes count as the last step of a structural change. The same holds for non-structural updates, i.e. changes to node values: since the value field of HashEntry is volatile, the latest value is also guaranteed to be read.
 
  The next step traverses the hash chain according to hash and key to find the wanted node; if there is none, null is returned directly. The chain can be traversed without locking because the chain pointer next is final. But the head pointer is not final: it is returned by the getFirst(hash) method, that is, read out of the table array. This means getFirst(hash) can return an outdated head node; for example, if right after the get method executes getFirst(hash) another thread executes a delete and updates the head node, then the head node used by get is not the latest one. Through the coordination mechanism of the count variable, get can read almost the latest data, though possibly not the very latest. To read the truly latest data, only full synchronization would do.
 
  Finally, if the wanted node is found, its value is returned directly when it is not null; otherwise the value is read again while holding the lock. This seems puzzling: in theory a node's value can never be null, because put checks for null and throws NullPointerException. The only possible source of a null value is the default value in HashEntry, since value is not final and can be read without synchronization. Look closely at the statement in the put operation: tab[index] = new HashEntry<K,V>(key, hash, first, value). In this statement, the assignment of value inside the HashEntry constructor and the assignment to tab[index] may be reordered, which can make a freshly read node's value null. In this case, some thread may be in the middle of changing the node, and since the preceding get was unlocked, by the Bernstein conditions a read-write interleaving would make the data inconsistent; so here e's value is re-read under the lock to ensure the correct value is obtained.
 V readValueUnderLock(HashEntry<K,V> e) {  
     lock();  
     try {  
         return e.value;  
     } finally {  
         unlock();  
     }  
 }
  Examples are the count field used to compute the current Segment's size and the value field of HashEntry. Variables defined as volatile are visible across threads, can be read by multiple threads simultaneously, and are guaranteed not to yield expired values, but they may be written by only a single thread (with one exception: multiple threads may write when the written value does not depend on the previous one). In the get operation, only the shared variables count and value need to be read, so no lock is required. The reason no expired value is read is the happens-before principle of the Java memory model: a write to a volatile field happens before a subsequent read of it. Even when two threads modify and read a volatile variable at the same time, the get operation obtains the latest value. This is a classic application scenario where volatile replaces a lock.
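The pattern relied on here, a single writer publishing to many lock-free readers through a volatile write, is worth a tiny standalone sketch (a hypothetical class, not JDK code):

// Single-writer / many-readers pattern: the writer publishes a structural
// change by writing the volatile count last; readers read count first,
// which (by happens-before) makes everything written before it visible.
class SingleWriterBox {
    private Object payload;          // plain field, written before count
    private volatile int count = 0;  // volatile "publication" flag

    // Called by exactly one writer thread.
    void publish(Object p) {
        payload = p;   // 1. plain write
        count = 1;     // 2. volatile write: releases the payload
    }

    // Safe to call from any number of reader threads, without locking.
    Object read() {
        if (count != 0)    // volatile read: acquires the payload
            return payload;
        return null;
    }
}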

put operation

Similarly, the put operation is delegated to the segment's put method. The following is Segment's put method:
 V put(K key, int hash, V value, boolean onlyIfAbsent) {  
     lock();  
     try {  
         int c = count;  
         if (c++ > threshold) // ensure capacity  
             rehash();  
         HashEntry<K,V>[] tab = table;  
         int index = hash & (tab.length - 1);  
         HashEntry<K,V> first = tab[index];  
         HashEntry<K,V> e = first;  
         while (e != null && (e.hash != hash || !key.equals(e.key)))  
             e = e.next;  
         V oldValue;  
         if (e != null) {  
             oldValue = e.value;  
             if (!onlyIfAbsent)  
                 e.value = value;  
         }  
         else {  
             oldValue = null;  
             ++modCount;  
             tab[index] = new HashEntry<K,V>(key, hash, first, value);  
             count = c; // write-volatile  
         }  
         return oldValue;  
     } finally {  
         unlock();  
     }  
 }
  This method also executes while holding the segment lock (locking the entire segment), which is of course for concurrency safety: modifications cannot run concurrently. It must first check whether the size exceeds the threshold, to ensure a rehash when the capacity is insufficient. Next it looks for a node with the same key: if one exists, its value is simply replaced; otherwise a new node is created and added at the head of the hash chain. When adding, make sure modCount and count are modified, and that the count write is again the last step. The put method calls the rehash method; rehash is cleverly implemented too, mainly exploiting that the table size is 2^n, but I will not introduce it here. The slightly harder part is int index = hash & (tab.length - 1). It turns out the real hash table lives inside the segment: each segment is a traditional hash table, as the figure above shows, and this is exactly where the two structures differ. Here put locates the entry's position in the table: the entry found is the head node of the chain. If e != null, the node's value is simply replaced (when onlyIfAbsent == false); otherwise a new entry is needed, its successor is first, and tab[index] is pointed at it. What does that mean? It means the new entry is inserted at the head of the chain. The rest is easy to understand.
 
  Because the put method writes to shared variables, it must lock for thread safety. Put first locates the Segment, then performs the insertion inside the Segment. The insertion has two steps: the first decides whether the HashEntry array inside the Segment needs expanding; the second locates the position of the new element and puts it into the HashEntry array.
  • Whether expansion is needed. Before inserting an element, put checks whether the HashEntry array inside the Segment exceeds the threshold; if so, the array is expanded. It is worth mentioning that a Segment expands at a more appropriate moment than HashMap: HashMap checks whether the element count has reached capacity after inserting the element and expands if so, but quite possibly no further element is ever inserted after that expansion, so HashMap may perform a useless expansion.
  • How to expand. When expanding, an array twice the original capacity is created first, then the elements of the original array are re-hashed and inserted into the new array. For efficiency, ConcurrentHashMap does not expand the whole container, only a single Segment (see the sketch below).
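A simplified sketch of what expanding a single segment amounts to (hypothetical Node type; this version re-links every node, while the real JDK 6 rehash is cleverer and reuses trailing runs of nodes whose index does not change):

class RehashSketch {
    static class Node<K, V> {
        final K key; final int hash; final V value; final Node<K, V> next;
        Node(K key, int hash, V value, Node<K, V> next) {
            this.key = key; this.hash = hash; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    static <K, V> Node<K, V>[] rehash(Node<K, V>[] oldTable) {
        int newCapacity = oldTable.length << 1;   // double: stays a power of two
        Node<K, V>[] newTable = (Node<K, V>[]) new Node[newCapacity];
        int mask = newCapacity - 1;
        for (Node<K, V> head : oldTable) {
            for (Node<K, V> p = head; p != null; p = p.next) {
                int idx = p.hash & mask;          // re-locate with the wider mask
                newTable[idx] = new Node<>(p.key, p.hash, p.value, newTable[idx]);
            }
        }
        return newTable;
    }
}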
 
Another operation is containsKey, whose implementation is much simpler because it does not need to read a value:
 
 boolean containsKey(Object key, int hash) {  
     if (count != 0) { // read-volatile  
         HashEntry<K,V> e = getFirst(hash);  
         while (e != null) {  
             if (e.hash == hash && key.equals(e.key))  
                 return true;  
             e = e.next;  
         }  
     }  
     return false;  
 } 

size() operation

  If we want to count the number of elements in the whole ConcurrentHashMap, we must sum the element counts of all Segments. The global variable count in Segment is volatile. So, in a multi-threaded scenario, can we just add up all the Segments' count values to get the ConcurrentHashMap's size? No. Although when adding you can obtain the latest value of each Segment's count, a count obtained earlier may already have changed by the time the addition finishes, so the statistic would be inaccurate. The safest approach would be to lock all the Segments' put, remove, and clear methods while computing size, but that is obviously very inefficient.
  Because the probability that previously accumulated counts change during the accumulation is very small, ConcurrentHashMap first tries twice to sum the Segment sizes without locking the Segments; only if the container's counts change during the attempts does it lock all the Segments to compute the size.
  So how does ConcurrentHashMap decide whether the container changed during the count? It uses the modCount variable: put, remove, and clear all increment modCount by 1 before operating on elements, so comparing each segment's modCount before and after the size accumulation reveals whether the container has changed.
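Putting the pieces together, the JDK 6 size() has roughly the following shape (reproduced from memory and simplified, so treat it as a sketch rather than the exact source):

 // Number of unsynchronized attempts before resorting to full locking.
 static final int RETRIES_BEFORE_LOCK = 2;

 // Sum the per-segment counts without locking; use modCount snapshots to
 // detect concurrent structural changes; lock all segments as a last resort.
 public int size() {
     final Segment<K,V>[] segments = this.segments;
     int[] mc = new int[segments.length];
     long sum;
     for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
         sum = 0;
         int mcsum = 0;
         for (int i = 0; i < segments.length; ++i) {
             sum += segments[i].count;              // read-volatile
             mcsum += mc[i] = segments[i].modCount; // snapshot modCounts
         }
         boolean clean = true;
         if (mcsum != 0) {                          // verify nothing changed meanwhile
             for (int i = 0; i < segments.length; ++i)
                 if (mc[i] != segments[i].modCount) {
                     clean = false;
                     break;
                 }
         }
         if (clean)
             return (sum > Integer.MAX_VALUE) ? Integer.MAX_VALUE : (int) sum;
     }
     // Unsynchronized attempts kept failing: lock every segment and re-sum.
     sum = 0;
     for (int i = 0; i < segments.length; ++i)
         segments[i].lock();
     for (int i = 0; i < segments.length; ++i)
         sum += segments[i].count;
     for (int i = 0; i < segments.length; ++i)
         segments[i].unlock();
     return (sum > Integer.MAX_VALUE) ? Integer.MAX_VALUE : (int) sum;
 }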
