Image rain removal is a problem of common concern in image processing and computer vision. Traditional image restoration methods for rain removal fail in certain circumstances. Given the rapid development of convolutional neural networks (CNNs) in computer vision and their strong learning performance, more and more researchers are applying CNNs to image restoration. This paper reviews image rain removal techniques from the perspectives of image processing and physical modeling, in combination with convolutional neural network techniques. It also introduces the basic principles and research progress of typical rain removal CNNs from recent years, and presents the visual results and objective evaluation data of these methods.
In recent years, the continuous development of computer software and hardware and the resulting growth in computing power have made it possible to remove rain from rainy images. In turn, new requirements have been put forward for the clarity and fidelity of the restored images. On rainy days, scene visibility is low and the background is obscured. The contrast and color of targets in the image are attenuated to varying degrees, so the background information (the target image) is expressed ambiguously and some video or imaging systems fail to work properly. It is therefore necessary to eliminate the effect of rainy weather on the imaged scene. Image rain removal has long been an important research topic in image restoration and computer vision. It is mainly applied in video surveillance and autonomous driving, so research focuses on automation and real-time performance. Drawing on recent research hotspots, this paper reviews image rain removal techniques from the perspectives of image processing and physical modeling, in combination with convolutional neural network techniques.
1 Deep joint rain detection and removal from a single image
Restoring rainy images is very important for computer vision systems. Rain covers the background scene and causes the image to appear deformed or blurred, and it also produces an atmospheric veiling effect similar to fog, which noticeably reduces the visibility of the image background. Whether the scene is a heavy rainstorm with dense rain streaks or one with rain accumulation, the method reviewed in this section can remove rain from a single image. Its main ideas are a new rain model and a deep network architecture based on that model.
1.1 Rainwater image model
The widely used rain model is expressed as $O = B + S$, where $B$ is the background layer, that is, the target image to be recovered; $S$ is the rain streak layer; and $O$ is the input image with rain streaks (the rain-degraded image). Under this model, rain removal is treated as a two-signal separation problem: given a degraded image $O$, the background layer and the rain streak layer have different characteristics, so the two layers can be separated to obtain the target result. However, this model has two defects. First, the rain streak layer is unevenly dense: only part of it contains rain streaks, so modeling it with a uniform sparse code works poorly. Second, solving the signal separation problem does not distinguish rain regions from rain-free regions, so the processed background is over-smoothed, which results in deformation or blurring.
To address these defects, the model is improved so that the rain streak layer carries not only the contribution of the rain streak to the value at each pixel position but also the location of the streaks. The generalized rain model is $O = B + SR$, which contains a region-based variable $R$ that indicates the locations of individual rain streaks. $R$ is in fact a binary map: a value of 1 indicates that the corresponding pixel lies in a rain streak, and a value of 0 indicates that it does not. $S$ and $R$ are described separately and predicted separately by the network so as to avoid regressing only $S$, which would affect the parts of the image without rain. Modeling $R$ independently has two advantages. First, it gives the network more information for learning the rain streak regions. Second, rain regions and rain-free regions can be detected and treated differently, which preserves as much background layer information as possible.
Real scenes pose two main problems: rain streaks have different shapes and directions and overlap one another, and the fog-like effect caused by rain accumulation in a rainstorm reduces the visibility of distant scenery. The method therefore proposes a more detailed rain model that contains multiple rain streak layers (the streaks within each layer share the same direction) as well as the effect of global atmospheric light (used to simulate the fog-like effect produced by rain). The model is $O = \alpha\left(B + \sum_{t=1}^{s} \tilde{S}_t R\right) + (1 - \alpha)A$, where each $\tilde{S}_t$ is a rain streak layer whose streaks share the same direction, $t$ is the index of the rain streak layer, $s$ is the number of rain streak layers, $A$ is the global atmospheric light, which in essence models the fog-like effect generated by the rain, and $\alpha$ is the global atmospheric transmission coefficient. This model captures rain and fog effects jointly, which is closer to actual rainfall, so a target image recovered under this model is closer to a natural image.
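The rain models in this subsection can be illustrated with a minimal NumPy sketch that composes a synthetic rainy image as O = alpha * (B + sum_t S_t * R) + (1 - alpha) * A. The function name `compose_rainy_image`, the array shapes and the numeric values are illustrative assumptions and are not taken from the reviewed paper. With a single streak layer and alpha = 1, the expression reduces to O = B + SR.

```python
import numpy as np

def compose_rainy_image(background, streak_layers, alpha, atmospheric_light):
    """Compose O = alpha * (B + sum_t S_t * R) + (1 - alpha) * A (illustrative only)."""
    background = np.asarray(background, dtype=np.float64)
    streaks = np.zeros_like(background)
    for layer in streak_layers:
        streaks += np.asarray(layer, dtype=np.float64)  # overlapping layers add up
    # R: binary map, 1 where any streak is present, 0 elsewhere.
    # Here R is derived from the streaks (an assumption of this sketch);
    # in the reviewed network, R and S are predicted separately.
    rain_mask = (streaks > 0).astype(np.float64)
    rainy = alpha * (background + streaks * rain_mask) + (1.0 - alpha) * atmospheric_light
    return np.clip(rainy, 0.0, 1.0), rain_mask

# Hypothetical 4x4 grayscale example with two streak layers.
B = np.full((4, 4), 0.5)
S1 = np.zeros((4, 4)); S1[0, 1] = 0.3
S2 = np.zeros((4, 4)); S2[2, 3] = 0.2
O, R = compose_rainy_image(B, [S1, S2], alpha=0.8, atmospheric_light=1.0)
print(O[0, 0], O[0, 1], R.sum())  # rain-free pixel, streak pixel, number of rain pixels
```

Pixels outside the mask are changed only by the atmospheric term, which is why a separate mask lets rain regions and rain-free regions be treated differently.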
1.2 Deep convolution neural network for joint rainwater detection and removal
Based on the above models, a recurrent deep network architecture for joint rain detection and removal is proposed, as shown in Figure 1.
Figure 1 Recurrent rain detection and removal network architecture. Each recurrence uses a multi-task network for rain detection and removal (blue dashed box).
Contextualized dilated network: The deep architecture includes a novel network structure, the contextualized dilated network, which uses contextual information to extract discriminative features from rainy images. These features are the basis for the subsequent detection and removal.
Dilated convolutions: Contextual information is very useful for detecting and recognizing rain regions in an image. The contextualized dilated network aggregates multi-scale contextual information to learn the features of rainy images. Compared with an ordinary convolution, a dilated convolution has, in addition to the kernel size, a dilation factor parameter that specifies the amount of dilation. A dilated convolution has the same kernel size as the ordinary convolution, and therefore the same number of parameters in the network; the difference is that the dilated convolution has a larger receptive field. Figure 1 shows that the network contains three convolution paths, each using 3*3 convolution kernels. The first path uses ordinary convolutions and the other two use dilated convolutions, so the paths have different dilation factors, [DF = 1, 2, 3], and the extracted features have different receptive fields, [5*5, 9*9, 13*13]. In this way, richer contextual information is extracted, which makes the features more robust.
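The receptive fields quoted for the three paths can be checked with a short calculation. The sketch below assumes that each path stacks two stride-1 3*3 convolutions with the same dilation factor; this stacking depth is an assumption chosen because it reproduces the quoted sizes, and the helper name `receptive_field` is hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Side length (in pixels) of the receptive field of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A k x k kernel with dilation d spans d * (k - 1) + 1 pixels,
        # so each layer enlarges the receptive field by d * (k - 1).
        rf += d * (kernel_size - 1)
    return rf

# Assumption: two stacked 3x3 convolutions per path, same dilation factor in both.
for d in (1, 2, 3):
    side = receptive_field(3, [d, d])
    print(f"DF = {d}: {side}*{side}")
```

In every path each kernel still holds 3*3 = 9 weights per channel pair, so the larger receptive field comes at no extra parameter cost.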
Recurrent sub-network: The blue dashed box in Figure 1 shows the structure of the sub-network. Each recurrence generates a corresponding residual image T (*), and its result serves as the input of the next recurrent sub-network, so the residuals estimated at each step accumulate as the network recurs. The rain mask and the rain streak layer estimated in each recurrence are not the same, but both are constrained by the loss and regularization terms at every recurrence.
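The accumulation of residuals across recurrences can be sketched as a simple loop. This is only a schematic under stated assumptions: `predict_residual` is a hypothetical stand-in for one pass of the sub-network, and the toy predictor in the usage example has access to the clean image, which a real network does not.

```python
import numpy as np

def recurrent_derain(rainy, predict_residual, num_iterations=3):
    """Repeatedly subtract a predicted rain residual from the current estimate."""
    estimate = np.asarray(rainy, dtype=np.float64).copy()
    accumulated = np.zeros_like(estimate)
    for _ in range(num_iterations):
        residual = predict_residual(estimate)  # stand-in for one sub-network pass
        accumulated += residual                # residuals accumulate over recurrences
        estimate = estimate - residual         # becomes the input of the next recurrence
    return estimate, accumulated

# Toy usage: a hypothetical predictor that removes half of the remaining rain each time.
clean = np.full((2, 2), 0.4)
rain = np.array([[0.0, 0.2], [0.1, 0.0]])
estimate, accumulated = recurrent_derain(clean + rain, lambda x: 0.5 * (x - clean))
print(estimate - clean)  # rain remaining after three recurrences
```

The final estimate equals the original input minus the accumulated residual, which is the sense in which the per-step estimates add up over the recurrences.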
1.3 Experimental results
Qualitative assessment: This method is compared with other methods, namely DSC (discriminative sparse coding) and LP (layer prior), on samples from the same test set of real rainy images. The comparison is shown in Figure 2.
Figure 2 Results of different methods on real images. From left to right: input test image, DSC, LP and this method.
Quantitative assessment: Two measures, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), are used to compare the different rain removal methods; the larger the value of a measure, the better the result. Table 1 gives the comparison on the Rain12 and Rain100L data sets.
Table 1 PSNR and SSIM results of different rain removal methods on the Rain12 and Rain100L data sets.
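For reference, PSNR is defined as 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and restored images and MAX is the largest possible pixel value. The following minimal NumPy sketch assumes images scaled to [0, 1]; the function name `psnr` and the sample values are illustrative and do not reproduce any figure from Table 1. SSIM is more involved and is not sketched here.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical example: a uniform error of 0.1 on a [0, 1] image.
reference = np.zeros((8, 8))
restored = np.full((8, 8), 0.1)
print(psnr(reference, restored))  # about 20 dB, since MSE = 0.01
```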
2 Raindrop removal from a single image using an Attentive GAN
This method addresses a more challenging problem: removing raindrops that adhere to glass or a lens. First, the regions of the original image that are occluded by raindrops are not known in advance. Second, the occluded background information is inevitably lost. If the raindrops are larger and denser, the problem becomes even thornier. All of this makes the problem very difficult to solve.
To solve this problem, the authors propose an Attentive GAN. The main idea is to model human visual attention, quantify it, and inject it into both the generative network and the discriminative network during training. During training, the quantified visual attention learns about the raindrop regions and their surroundings. Applying visual attention to both networks therefore lets the generative network focus better on the structure of the raindrop regions and their surroundings, and lets the discriminative network assess the local consistency of the restored regions.
2.1 Raindrop image model
The raindrop-degraded image is modeled as $I = (1 - M) \odot B + R$, where $I$ is the input image; $M$ is a binary mask over the pixels of the whole image (for a pixel $x$, $M(x) = 1$ if it is covered by a raindrop and $M(x) = 0$ otherwise); $B$ is the image background (that is, the desired target image); and $R$ is the effect brought by the raindrops, a complex mixture of the background information and the light reflected by the environment and refracted through the raindrops adhering to the windshield or lens. Because raindrops are transparent, and because of their shape and refractive index, a pixel in a raindrop region is affected by the surrounding pixels. The operator $\odot$ denotes pixel-wise multiplication.
The goal is to recover the target image $B$ from the input $I$. The mask $M$ is used to guide the generation of an attention map, and a GAN then generates the target image.
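The raindrop image model can be illustrated with a small NumPy sketch, assuming the model takes the form I = (1 - M) * B + R with pixel-wise multiplication. The function name `compose_raindrop_image` and the sample values are hypothetical; in practice R depends on the scene in a complex way and is not a simple constant.

```python
import numpy as np

def compose_raindrop_image(background, mask, raindrop_effect):
    """I = (1 - M) * B + R, all operations pixel by pixel."""
    background = np.asarray(background, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    raindrop_effect = np.asarray(raindrop_effect, dtype=np.float64)
    return (1.0 - mask) * background + raindrop_effect

# Hypothetical 3x3 example: one pixel is covered by a raindrop.
B = np.full((3, 3), 0.5)
M = np.zeros((3, 3)); M[1, 1] = 1.0
R = np.zeros((3, 3)); R[1, 1] = 0.7   # what the raindrop contributes at that pixel
I = compose_raindrop_image(B, M, R)
print(I)
```

Where M is 0 the input equals the background, and where M is 1 the background term vanishes and only the raindrop effect R remains. This is why the occluded background has to be inferred from the surrounding context.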
2.2 Attentive GAN network structure
Figure 3 shows the overall network structure of the method. The network consists of two main parts: the generative network and the discriminative network. Given an image degraded by raindrops, the generative network attempts to generate a raindrop-free image that is as realistic as possible, and the discriminative network verifies whether the generated image looks real enough.
Figure 3 Architecture of the Attentive GAN
Generative network: As shown in Figure 3, the generative network contains two sub-networks: the attentive-recurrent network and the contextual autoencoder. The purpose of the attentive-recurrent network is to find the regions of the input image that need attention, mainly the raindrop regions and their surroundings, on which the contextual autoencoder should focus. This allows the autoencoder to generate better local restorations and gives the discriminative network better regions to focus on when evaluating the result.
Attentive-recurrent network: A visual attention model helps locate the target regions of an image and capture the features of those regions. It is also important for generating raindrop-free images, because it lets the network know where restoration should be focused. As Figure 3 shows, the method uses a recurrent network to generate the visual attention. At each time step, the input is the original input image together with the attention map from the previous time step. Each step contains five ResNet residual blocks for feature extraction, a convolutional LSTM unit, and a convolutional layer that generates the 2D attention map.
The attention map obtained at each time step is a two-dimensional matrix. Each element takes a value between 0 and 1, and the larger the value, the more attention the corresponding image region receives. Overall, the element values of the attention map increase from one time step to the next. Note that the input of the first time step is the original image and an initialized attention map.
Strengthening the attention in this way is meaningful: it expands the attended area so that the surroundings of the raindrop regions are also covered. Moreover, raindrops differ in transparency and do not completely occlude the background, so expanding the attention captures some of the background information that shows through the raindrops.
Contextual autoencoder: This sub-network takes the original input image and the attention map from the last time step of the attentive-recurrent network as input, and aims to produce a raindrop-free image. The deep autoencoder contains 16 conv-relu blocks, with skip connections added to prevent the output image from being blurred. The specific structure is shown in Figure 4.
Figure 4 Structure of the contextual autoencoder. Multi-scale loss and perceptual loss are used to train the sub-network.
As Figure 4 shows, the sub-network uses a multi-scale loss and a perceptual loss. The multi-scale loss is a pixel-wise loss computed on features extracted from different decoder layers, which form outputs of different sizes; this captures more contextual information. The perceptual loss measures the global difference between the features of the autoencoder output and those of the corresponding ground-truth image, where the features are extracted by a pre-trained CNN (VGG16 pre-trained on ImageNet).
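A multi-scale pixel loss of this kind can be sketched as a weighted sum of mean squared errors, each computed between a decoder output and the reference image resized to the same scale. The sketch below is a simplified NumPy illustration: the helper names, the use of average pooling for resizing, the three scales and the weights (0.6, 0.8, 1.0) are assumptions made for the example. The perceptual loss is omitted because it requires a pre-trained VGG16.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool a 2D image by an integer factor (sides must be divisible by it)."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multiscale_mse(outputs, target, weights):
    """Weighted sum of per-scale MSE between decoder outputs and the resized target."""
    total = 0.0
    for out, weight in zip(outputs, weights):
        factor = target.shape[0] // out.shape[0]
        reference = downsample(target, factor) if factor > 1 else target
        total += weight * np.mean((out - reference) ** 2)
    return total

# Hypothetical example: outputs at 1/4, 1/2 and full resolution, each off by 0.1.
target = np.arange(16, dtype=np.float64).reshape(4, 4) / 15.0
outputs = [downsample(target, 4) + 0.1, downsample(target, 2) + 0.1, target + 0.1]
print(multiscale_mse(outputs, target, weights=(0.6, 0.8, 1.0)))
```

Giving the full-resolution output the largest weight is a design choice of this sketch: it keeps the finest scale dominant while the coarser scales still contribute contextual supervision.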
Discriminative network: To judge whether generated images are real, some GAN-based methods use the global and local consistency of the image content as criteria. A global discriminator checks the whole image for inconsistencies, and a local discriminator checks a small specific region.
The distinguishing feature of this discriminative network is its attentive discriminator: the attention map generated by the attentive-recurrent network is applied to the discriminative network. The attention map guides the discriminator to focus on the corresponding regions, so that it can better judge whether the image is real.
2.3 Experimental results
Qualitative assessment: Figure 5 compares this method with results from other papers (mainly Eigen and Pix2Pix). Figure 6 compares the full network (AA+AD) with other possible configurations (A, A+D, A+AD): A is the autoencoder alone without the attention map; A+D is the autoencoder without the attention map plus a discriminator without the attention map; A+AD is the autoencoder without the attention map plus the attentive discriminator; and AA+AD, the autoencoder with the attention map plus the attentive discriminator, is the full architecture of this method.
Regardless of the variety in raindrop color, shape and transparency, this method removes the raindrops almost completely.
Figure 5 Comparison of the results of different methods. From left to right: the original input image, Eigen, Pix2Pix, this method.
Figure 6 Comparison of results between the full network architecture and its possible configurations
Quantitative assessment: Table 2 shows the results of this method and existing methods on two measures, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). The larger the value of a measure, the better the result.
Table 2 Quantitative evaluation results
3 Summary and Prospect
The first method proposes a region-dependent rain image model that supports rain detection and better simulates rain accumulation and heavy rain. Based on this model, a network structure for joint rain detection and removal is proposed, which is effective at removing accumulated rain streaks. The second method proposes raindrop removal from a single image. It uses a generative adversarial network in which the generative network produces an attention map through a dedicated recurrent mechanism and then, together with the input image, generates a raindrop-free image through a contextual autoencoder. It is clearly effective at removing plainly visible, dense raindrops.
To generalize better and obtain a more universal rain removal mechanism, one could explore combining the two methods. The prospects of such a combination need further experimental exploration and verification. After sustained research, image rain removal has made great progress, but the application of convolutional neural networks to image rain removal still requires continued exploration.
Qian R, Tan R T, Yang W, et al. "Attentive Generative Adversarial Network for Raindrop Removal from a Single Image." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
Yang W, et al. "Deep joint rain detection and removal from a single image." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Fu X, et al. "Removing rain from single images via a deep detail network." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
Cai B, et al. "Dehazenet: An end-to-end system for single image haze removal." IEEE Transactions on Image Processing 25.11 (2016): 5187-5198.
Yu F, Koltun V. "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122, 2015.
Xu Bo, Zhu Qingsong, Xiong Yan Hai. "Research fronts of video and image rain removal technology." Chinese science and technology thesis 10.8 (2015): 916-927.
Guo Pan, et al. "Research and review of image defogging technology." Computer application 30.9 (2010).