Visual Recognition using Graph
It describes the paper Beyond Grids Learning Graph Representations for Visual Recognition published in Neurips 2018
Overview
Convolution neural networks have been a great success in visual recognition tasks. This article particularly focuses on semantic segmentation. The logic behind using CNN is that images have a sense of locality that is, the pixels which are near to each other are more related. CNNs are able to capture this by convolution operations and the local region that comes into consideration (formally known as receptive field) depends on the kernel size. There are also long-range dependencies in an image which can help in visual recognition tasks. For that the concept is to stack may convolution layers which will theoretically increase the receptive field. So now both long range and short range dependencies are taken into account, put the network on the training and you get the results easy right!
But I am sure you are well aware that not many times things match in theory and practical. Recently a paper ( by Luo, Wenjie, et al.) was published which showed that receptive fields do not grow linearly with the number of convolution layers moreover they are severely limited. Furthermore, the receptive field depends on various other factors such as initialization schemes. So what is the solution?
Find the solution Medium Article