Noob’s Guide to the Information Theory of Convolutional Neural Nets

Machine Learning has been around for a while now, and we are all aware of its impact on everyday problems. Initially it was about solving simple statistical problems, but it has grown well beyond that and can now tackle much harder problems like image recognition. Have you ever wondered at the pace of this transition? It has reached a point where a model can actually distinguish a cat from a dog. In this series, we will explore the nature of the networks behind this, and how the information represented inside a network can be manipulated to solve some of the toughest problems in image recognition.

Prologue: a troublesome story of Real Estate Agents

Let’s start from scratch. Say we have input vectors (the specifications of a house) and outputs like the price of the house. For now, let’s not worry much about the details and just picture it this way: we have information described in one set of concepts (kitchen size, number of floors, location), and we need to represent the relevant information in another set of concepts (price of the house, architecture quality, etc.). This is basically a conversion from one conceptual representation to another. Now think about how a human would solve it.

A novice agent (say Alex) would probably start with a direct way of converting one representation into the other: a set of ‘if-else’ conditions. A slightly smarter agent (say Bob) would first convert the input concepts into intermediate scores like simplicity, floor quality, noise in the neighbourhood, etc., and then cleverly map these scores to the final output, say the price of the house. What changed from the noob real estate agent (Alex) to the slightly smarter one (Bob) is that Bob mapped the input-to-output information flow in more detail. In other words, he changed the framework in which he thought he could best represent the underlying structure.

Lesson 1: Framework of thinking is everything

So the difference between Alex’s and Bob’s thought processes is that Bob figured out that secondary concepts are easy to calculate and can be combined to produce the desired output, whereas Alex tried to write ‘if-else’ logic connecting each input variable to each output variable. Bob represented the same mapping more systematically by breaking it into smaller concepts, so he had fewer things to remember. Alex, meanwhile, had to remember how every input is connected to every output. The big lesson here is that the framework of thinking is everything.
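The contrast between the two agents can be sketched in a few lines of Python. This is only an illustration: every feature name, threshold, score and price below is a made-up assumption, not data from a real pricing model.

```python
# Hypothetical house description; all names and numbers are invented for illustration.
house = {"kitchen_sqm": 12, "floors": 2, "distance_to_road_m": 40}

def alex_price(h):
    """Alex: one direct if-else rule per input, straight to the price."""
    price = 100_000
    if h["kitchen_sqm"] > 10:
        price += 20_000
    if h["floors"] >= 2:
        price += 30_000
    if h["distance_to_road_m"] < 50:
        price -= 10_000
    return price

def bob_scores(h):
    """Bob, step 1: turn raw inputs into a few intermediate scores in [0, 1]."""
    return {
        "floor_quality": min(h["floors"] / 3, 1.0),
        "kitchen_quality": min(h["kitchen_sqm"] / 20, 1.0),
        "noise": 1.0 if h["distance_to_road_m"] < 50 else 0.2,
    }

def bob_price(h):
    """Bob, step 2: combine the intermediate scores into the final output."""
    s = bob_scores(h)
    return (100_000
            + 40_000 * s["floor_quality"]
            + 30_000 * s["kitchen_quality"]
            - 15_000 * s["noise"])

def connection_counts(n_inputs, n_outputs, n_scores):
    """Count the links to remember, assuming every input feeds every output
    (Alex) or every input feeds every score and every score feeds every
    output (Bob)."""
    direct = n_inputs * n_outputs
    factored = n_inputs * n_scores + n_scores * n_outputs
    return direct, factored

print(alex_price(house))
print(bob_price(house))
print(connection_counts(n_inputs=20, n_outputs=10, n_scores=3))
```

Under these assumptions, Bob’s approach only pays off when the number of intermediate scores is small compared with the number of inputs and outputs; with 20 inputs, 10 outputs and 3 scores, he tracks 90 links instead of 200.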

This is what most researchers have realized. They all face the same kind of problem; take telling cat images from dog images. They too have to convert information from one conceptual representation (pixels) to another (is-cat is True/False). They also have roughly the same computational resources (memory, compute budget, etc.), so the way to solve the problem is to introduce a framework of thinking that decodes the inputs with minimal resources and converts them from one form to another. You have probably already heard of many such ‘frameworks of thinking’. When people say Convolutional Networks, they simply mean a framework for representing a particular kind of mapping function. Most statistical models that predict house prices are also just mapping functions. They all try to best approximate a universal mapping function from input to output.

Lesson 2: Universal Mapping Functions like CNNs

CNNs (Convolutional Neural Networks) are a family of functions that exploit some properties of images, such as positional invariance: a feature looks the same wherever it appears in the image. That means the network can re-use the same sub-mapping function from the bottom part of the image to the top part. This sharply reduces the number of parameters needed to represent the universal mapping function. CNNs are also usually drawn as cone shaped, because the spatial size typically shrinks layer by layer as we move from concepts that are space oriented (pixels) to concepts that are space independent (cat-or-not, has-face). That’s it. It’s that simple. Information is smartly converted from one form to another.
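To make the re-use of one sub-mapping function concrete, here is a minimal NumPy sketch. The 8x8 toy image, the 2x2 all-ones kernel and the helper name conv2d_valid are assumptions chosen for illustration; real CNN layers learn their kernels and stack many of them.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are applied at every position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image with a bright 2x2 patch near the top-left corner.
image = np.zeros((8, 8))
image[1:3, 1:3] = 1.0

# The same patch moved toward the bottom-right.
shifted = np.zeros((8, 8))
shifted[4:6, 5:7] = 1.0

kernel = np.ones((2, 2))  # hypothetical hand-made "patch detector"

resp_a = conv2d_valid(image, kernel)
resp_b = conv2d_valid(shifted, kernel)

# The strongest response follows the patch wherever it moves.
peak_a = np.unravel_index(np.argmax(resp_a), resp_a.shape)
peak_b = np.unravel_index(np.argmax(resp_b), resp_b.shape)
print(peak_a, peak_b)

# Parameter counts for producing the same 7x7 output from the 8x8 input:
conv_params = kernel.size                 # one shared 2x2 kernel
dense_params = image.size * resp_a.size   # a separate weight per input-output pair
print(conv_params, dense_params)
```

In this sketch the shared kernel needs 4 weights, while a fully connected mapping from the same 64 pixels to the same 49 outputs would need 3136 (biases ignored in both cases).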

Lesson 3: CNN and the Brain

Recent advancements in neuroscience have essentially said the same thing about how we decode information in the visual cortex. We first decode lines, then simple shapes like boxes, circles and curves, and then combine them into objects like faces, headphones, etc.

Conclusion

A lot of Machine Learning/Deep Learning/AI technologies have very simple conceptual frameworks, but their ability to solve mammoth problems comes from the complexity that arises when many such simple frameworks are attached end to end. The result is so complex that we can’t really predict which kinds of problems these networks can solve. Yet we implement them on a day-to-day basis, based on some sort of assumption that they will work. It’s very similar to the human brain. We know its underlying structure and framework; we discovered it half a century ago. Yet we have not been able to decipher its complexity, and we are still unsure when we will reach such an understanding.

Guest article written by: Prashant Maurice, Machine Intelligence Expert, Icecream Labs, https://www.linkedin.com/in/prashantmaurice/
