Introduction to Dense Layers for Deep Learning with Keras
- By: Mydatahack
The most basic neural network architecture in deep learning is the dense network, which consists of dense layers (a.k.a. fully-connected layers). In a dense layer, every neuron is connected to every neuron in the previous layer. Keras is a high-level API that runs on top of TensorFlow (as well as CNTK or Theano) and makes coding much easier. Writing code in the low-level TensorFlow API is difficult and time-consuming. When I build a deep learning model, I always start with Keras so that I can quickly experiment with different architectures and parameters, and then move on to TensorFlow to fine-tune the model further. For your first deep learning code, I think a network of dense layers with Keras is a good place to start. So, let's get started.
Dataset
The deep learning 101 dataset is the classic MNIST, which is used for handwritten digit recognition. You can certainly use MNIST with the code below.
In this example, I am using the machine learning classic, the Iris dataset, imported from a csv file. This shows you how to bring a csv into a deep learning model, rather than loading example data from a built-in package.
Deep learning on Iris certainly feels like cracking a nut with a sledgehammer. However, once you understand how it works, you can apply the same knowledge and code to more appropriate datasets.
There are many ways to get a csv version of Iris. I exported it from R.
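For reference, write.csv(iris, 'iris.csv', row.names = FALSE) is all it takes in R. If you prefer to stay in Python, here is a minimal sketch with scikit-learn (the file name iris.csv is just an example):

```python
import pandas as pd
from sklearn.datasets import load_iris

# Build a data frame of features plus a Species label column
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['Species'] = [iris.target_names[i] for i in iris.target]

# Write it out as csv (file name is just an example)
df.to_csv('iris.csv', index=False)
```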
Steps
(1) Import required modules
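As a sketch, here is one plausible set of imports for this example (it assumes pandas is used to read the csv; adjust to match your own setup):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import to_categorical
```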
(2) Preprocessing
Both Keras and TensorFlow take numpy arrays as features and classes. When the prediction is categorical, the outcome needs to be one-hot encoded (see the one-hot encoding explanation on Kaggle's website). For one-hot encoding, the classes first need to be converted to integer indexes (starting from 0). Once they are indexed, you can use keras.utils.to_categorical() for the conversion.
The code uses sklearn.model_selection.train_test_split to create the training and test datasets.
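Putting it together, here is a minimal preprocessing sketch (the iris.csv file and the Species column name follow the export above; the species-to-index mapping is one way to do it):

```python
# Read the csv exported earlier (file name is an assumption)
data = pd.read_csv('iris.csv')

# Features as a numpy array
X = data.drop('Species', axis=1).values

# Map the class labels to integer indexes starting from 0
species_to_index = {name: i for i, name in enumerate(data['Species'].unique())}
y_index = data['Species'].map(species_to_index).values

# One-hot encode the indexed classes (3 Iris species)
y = to_categorical(y_index, num_classes=3)

# Split into training and test datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```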
(3) Design the Network
I am using the sequential model with 2 fully-connected layers. ReLU is more popular in deep neural networks, but I am using Tanh for activation because it actually performed better here. You should almost never use Sigmoid because it is slow to train. Softmax is used for the output layer.
Adding a 3rd layer degrades the performance, which makes sense as the dataset is fairly simple. I am using Dropout to reduce over-fitting. An L2 regularizer can also be used, but it did not perform well in this case, so I commented out that line.
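Here is a sketch of the network described above. The layer sizes (16 units) and the Dropout rate are assumptions; only the layer count, activations and regularisation choices come from the description:

```python
model = Sequential()
# First fully-connected layer with Tanh activation (4 input features for Iris)
model.add(Dense(16, activation='tanh', input_shape=(4,)))
# Dropout to reduce over-fitting
model.add(Dropout(0.2))
# Second fully-connected layer
model.add(Dense(16, activation='tanh'))
model.add(Dropout(0.2))
# Softmax output layer for the 3 Iris classes
model.add(Dense(3, activation='softmax'))

# L2 regularisation did not perform well here, so it stays commented out:
# from keras.regularizers import l2
# model.add(Dense(16, activation='tanh', kernel_regularizer=l2(0.01)))
```

model.summary() is handy for checking the resulting architecture before training.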
(4) Model Compilation
You need to define the loss function, the optimizer and the evaluation metrics. Cross-entropy is the gold standard for the cost function; you will almost never use the quadratic cost. On the other hand, there are many options for optimisers. In this example, I use Adam as well as SGD with a learning rate of 0.01. Both work fine.
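A minimal compilation sketch, showing both optimiser options mentioned above:

```python
# Categorical cross-entropy loss with the Adam optimiser
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Alternatively, SGD with a learning rate of 0.01 also works fine:
# from keras.optimizers import SGD
# model.compile(loss='categorical_crossentropy',
#               optimizer=SGD(lr=0.01),
#               metrics=['accuracy'])
```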
(5) Execution
The test accuracy goes up to 96.7% after 120 epochs. With this dataset, a regular machine learning algorithm like random forest or logistic regression can achieve similar results. The first rule of deep learning is that if a simpler machine learning algorithm can achieve the same outcome, use it and look for a more complicated problem. Here, the purpose is to learn the actual programming process so that you can apply it to more complex problems.
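A sketch of the training and evaluation step (the batch size of 10 is an assumption; the 120 epochs match the result quoted above):

```python
# Train for 120 epochs (batch size is an assumption)
model.fit(X_train, y_train, epochs=120, batch_size=10, verbose=1)

# Evaluate on the held-out test set
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy: %.3f' % accuracy)
```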
Next Steps
(1) Try using the MNIST dataset with this code.
MNIST is included in Keras and you can import it as keras.datasets.mnist. It is already split into training and test datasets. In preprocessing, you need to flatten the data (from 28 x 28 to 784) and convert y into one-hot encoded values. Here is the code to process the data.
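(This is a minimal sketch; scaling the pixel values to [0, 1] is an assumption beyond the flattening and one-hot encoding described above.)

```python
from keras.datasets import mnist
from keras.utils import to_categorical

# MNIST comes already split into training and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten each 28 x 28 image into a 784-dimensional vector
# and scale pixel values to [0, 1]
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

# One-hot encode the digit labels (10 classes)
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
```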
(2) Replicate the same code with low-level TensorFlow code.
TensorFlow is much more complicated than Keras, and its coding style is quite different. It will be difficult at first, but it will be worthwhile.
For the actual code example, go to Introduction to Dense Net with TensorFlow.
Ah, the tutorial covers dense layers from the Keras models. This example is fine. However, I came here looking for a DenseNet implementation (https://github.com/keras-team/keras/blob/master/keras/applications/densenet.py). There is a separate network architecture, which you may already know, called DenseNet. In other words, the blog name is slightly confusing. Cheers!
Hi haramoz,
Yes, you are absolutely right! I was a little too loose with terminology. This post is about dense layers, not the DenseNet architecture, which consists of more than dense layers. I have changed the title from 'Introduction to Dense Net with Keras' to 'Introduction to Dense Layers for Deep Learning with Keras' and tightened the terminology in the post so as not to confuse anyone. I also updated the title of the TensorFlow example. Thank you for your feedback!