Making an AI for a coding interview question
I feel uncomfortable every time I'm asked to do a coding interview. Not because I'm a bad coder, but because I think it doesn't reflect my best skills, which aren't coding but learning new things and solving problems in innovative ways. If you're looking for someone who codes exactly as directed, I'm not that guy. That's why I'm unemployed…LOL. One day, I saw a tweet from a co-founder of a big tech news company in my country. He said he uses a coding interview question that asks candidates to solve the following problem:
Write a program that accepts a list of numbers as input and finds the second most frequently occurring number in the list. Example:
[1, 2, 2] => 1
[1, 2, 2, 3, 3, 3] => 2
[4, 4, 4, 4, 1, 2, 2, 3, 3, 3] => 3
[4, 2, 4, 2, 7, 4, 1, 7, 7, 4] => 7
Many of his followers posted their solutions; some used one-liners and the standard library, which is fine in my opinion, but I wasn't impressed by them either. I thought: how would I solve this problem in an innovative way, my way? I'm gonna train an AI for this shit. And I did: in just a couple of days of tinkering, I trained a model that solves the problem with 90%+ accuracy. I want to share my thinking process and what I learned from solving a small problem with a bigger hammer.
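For reference, one of those standard-library solutions might look like this (my sketch using Python's collections.Counter, not any specific follower's answer):

```python
from collections import Counter

def second_most_common(nums):
    # most_common(2) returns the two (value, count) pairs with the
    # highest counts; the second pair's value is the answer.
    return Counter(nums).most_common(2)[1][0]

print(second_most_common([4, 2, 4, 2, 7, 4, 1, 7, 7, 4]))  # 7
```

Short, correct, and exactly the kind of answer I wasn't going to give.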
AI or Not AI?
First, I thought about the problem and asked myself, "Can it be solved with AI?" Of course it can, silly me. It's not like solving world hunger or playing the game of Go… wait, someone already did that. Never mind then. It is doable.
What’s the model?
What type of AI or machine learning model should I use to solve this problem? An LSTM (Long Short-Term Memory) predicts the next thing in sequential data; I once used one to predict stock prices (Not a great plan — Tony Stark). This problem doesn't look like that. Then I thought about CNNs (Convolutional Neural Networks). I can think of a number in the list like a pixel in a photo, and CNNs can easily classify anything in photos. In this case, I want it to find the second most dominant color of a photo, except the photo is just one-dimensional. A CNN will do just fine, I thought.
Using GPU for training
As I started working on this problem, I thought I might want to use a GPU to train the model faster. I looked for how to install TensorFlow with GPU support on my Ubuntu 20.04; the official TensorFlow page had no instructions for it, the latest being for Ubuntu 18.04. Then I came across a search result with a cute cat photo that detailed exactly what I was looking for: installing TensorFlow GPU on Ubuntu 20.04, and most importantly, doing it entirely from the command line without signing up on the Nvidia website to download packages (Nvidia, fuck you — Linus Torvalds). It was written by me. Yes, me, myself. OMG. I then recalled spending half a week trying to find the best way to install TensorFlow GPU. I never thought my writing would be helpful to anyone else, but it was to myself, and that is an absolute win.
Data and a lot of data.
There's no way I can train a CNN model without a lot of labeled data. In this case, I could just randomly generate a labeled dataset with code. I ended up writing a function that solves the problem in the process…hmm.
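A generator along those lines might look like this (a sketch with assumed details: digits 1–9, lists padded with 0 to length 10, and ambiguous ties redrawn so every sample has a unique second-place answer):

```python
import random
from collections import Counter

import numpy as np

def make_sample(max_len=10):
    # Draw a random list of digits 1-9 and label it with its second
    # most common value; redraw whenever the counts tie, so the label
    # is unambiguous. Pad with 0 up to max_len.
    while True:
        length = random.randint(3, max_len)
        nums = [random.randint(1, 9) for _ in range(length)]
        counts = Counter(nums).most_common()
        unique_second = (
            len(counts) >= 2
            and counts[0][1] > counts[1][1]
            and (len(counts) == 2 or counts[1][1] > counts[2][1])
        )
        if unique_second:
            x = nums + [0] * (max_len - length)
            return x, counts[1][0]

def make_dataset(n, max_len=10):
    samples = [make_sample(max_len) for _ in range(n)]
    X = np.array([x for x, _ in samples], dtype=np.float32)
    y = np.array([label for _, label in samples], dtype=np.int64)
    return X, y
```

Solving the problem in order to label the data is, of course, the "hmm" part.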
The first model couldn't predict anything because the training data wasn't in the right shape for the model: my labels weren't a categorical binary class matrix. As a noob, I had never created my own dataset before, so I had to run other examples that use public datasets and debug them to find out what the training data actually looks like. Using tf.keras.utils.to_categorical() does the job, but it took me a while to understand that it goes hand in hand with the categorical_crossentropy loss function. I also tried SparseCategoricalCrossentropy to keep the classes as integers, but it wasn't working, and I was too lazy to find out why when categorical_crossentropy already worked.
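For anyone else stuck on the same thing: to_categorical just turns integer labels into one-hot rows, which is the shape categorical_crossentropy expects. A minimal NumPy equivalent (my sketch, not TensorFlow's implementation) makes that concrete:

```python
import numpy as np

def to_one_hot(labels, num_classes):
    # Mirrors tf.keras.utils.to_categorical: row i has a 1.0 at
    # column labels[i] and 0.0 everywhere else.
    labels = np.asarray(labels, dtype=np.int64)
    out = np.zeros((labels.size, num_classes), dtype=np.float32)
    out[np.arange(labels.size), labels] = 1.0
    return out

print(to_one_hot([1, 3], 4))
# [[0. 1. 0. 0.]
#  [0. 0. 0. 1.]]
```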
I used a small dataset of around 10k samples at first. The very first model overfit badly: val_accuracy was around 13% while training accuracy was over 95%. It was memorizing the answers in the network. Randomly guessing one of 9 numbers already gets 11.11% correct, so this was not working.
I changed to a 100k dataset, the model showed a promising val_accuracy of over 40%, and I called it a day.
Optimizing the model
At first, I used the same model as in the official TensorFlow CNN tutorial. I just changed Conv2D to Conv1D, changed the input shape to a one-dimensional array of size 10 (with empty slots as 0), added more Dense layers, and added a softmax layer at the end to interpret the output as class probabilities. I had no idea what I was doing.
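In code, the modified model looked roughly like this (a sketch with assumed layer sizes, not my exact architecture; 10 output classes covering digits 1–9 plus the 0 padding class):

```python
from tensorflow.keras import layers, models

# Conv1D version of the tutorial's Conv2D model: a length-10 sequence
# with one channel in, a softmax over 10 classes out.
model = models.Sequential([
    layers.Input(shape=(10, 1)),
    layers.Conv1D(32, 3, activation="relu"),
    layers.Conv1D(64, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```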
The graphs from matplotlib were really useful for comparing models. Training was too slow, and I needed more accuracy. I switched to the CPU to train the model, and it was faster than the GPU. I needed an explanation. Someone on StackOverflow explained that if the model is small enough, a CPU can be faster than a GPU because loading data into GPU memory takes time. So I made the model bigger, with more parameters; it trained faster on the GPU, but accuracy didn't improve and training took more time overall. Switching back to a small model on the CPU was the best way to improve training speed. Reading more examples, I found the BatchNormalization layer. I tried it, and it hugely improved the training speed, though I still have no idea what it does under the hood. In the end, 90%+ accuracy plus faster training speed is good enough.
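For what it's worth, the examples I saw typically put BatchNormalization between a convolution and its activation (a hypothetical block, not my exact model; placement varies across tutorials):

```python
from tensorflow.keras import layers, models

# BatchNormalization normalizes each batch's activations, which tends
# to speed up and stabilize training.
block = models.Sequential([
    layers.Input(shape=(10, 1)),
    layers.Conv1D(32, 3),
    layers.BatchNormalization(),
    layers.Activation("relu"),
])
```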