
In this article, I would like to cover how to solve a linear regression problem using the TensorFlow framework. I will not go into much detail on the EDA process: this is a very simple dataset, and the goal is just to show how straightforward it is to solve with TensorFlow. For the first time, I have also created a video tutorial on YouTube, and I will include the link for it.
The dataset I have used can be found on Kaggle, or in my GitHub repo if you want to use it directly from the GitHub link instead of downloading it. The dataset is about predicting the chance of a student getting admitted to a particular university. The target column is “admit”. More details can be found on Kaggle.
We will solve it with a deep neural network. The steps involved are:
- Load Data
- Prepare training and validation dataset(tf.data.Dataset)
- Build Model
- Validate and Plot the Training and Validation Curve
The video tutorial can be found below:
Loading the Data and Preparing the Train and Validation Sets
There is not much to do on the EDA side here, as the dataset is very simple with no missing values. I could have added graphs and fancy correlation plots just for the sake of EDA, but since there is no data imputation or outlier handling to deal with, I have kept it simple. I have covered EDA in more detail in one of my other articles.
Here we divide the dataset into 85% training data and 15% validation data using pandas' sample method. I chose to keep the research column as categorical, just to be more defensive and avoid bias. Then we scale the data using the mean and standard deviation. One reason I chose the mean and standard deviation is that they can be stored, so when the model is deployed, the values in each incoming request can be scaled with the statistics seen during training. An alternative is to store one of the scaler objects available in sklearn; the same can be found here. I am going with mean and standard deviation to keep it simple.
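A minimal sketch of this split-and-scale step. The column names and the synthetic DataFrame below are assumptions standing in for the real admissions CSV; only the 85/15 split and the mean/std scaling follow the text:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the admissions data; column names are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "gre": rng.normal(315, 10, 100),
    "gpa": rng.normal(8.5, 0.5, 100),
    "research": rng.integers(0, 2, 100),   # kept categorical, left unscaled
    "admit": rng.uniform(0, 1, 100),       # target column
})

# 85% train / 15% validation split using pandas' sample method
train_df = df.sample(frac=0.85, random_state=42)
valid_df = df.drop(train_df.index)

# Scale only the continuous columns
numeric_cols = ["gre", "gpa"]

# Store mean/std from the training split so the same values can be
# reused to scale each request once the model is deployed
train_mean = train_df[numeric_cols].mean()
train_std = train_df[numeric_cols].std()

train_df[numeric_cols] = (train_df[numeric_cols] - train_mean) / train_std
valid_df[numeric_cols] = (valid_df[numeric_cols] - train_mean) / train_std
```

Note that the validation set is scaled with the training statistics, exactly as deployed requests would be.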
The data from the DataFrame is then converted into a tf.data.Dataset object using the prepare_dataset function. We first create the dataset with the from_tensor_slices method, passing both x and y as a tuple (x, y). We then shuffle it to make training independent of the order in which the data was prepared. Next, it is batched to the batch size, so in each epoch the whole dataset is trained one batch after another; this adds a regularisation effect to the neural network, which is good. Finally, prefetch is applied, which ensures the processor does not wait for the next batch, because the next data is already prefetched and available for training. With this, data preparation is done.
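A sketch of what prepare_dataset could look like; the buffer size, batch size, and dummy shapes are assumptions:

```python
import numpy as np
import tensorflow as tf

def prepare_dataset(x, y, batch_size=32, shuffle_buffer=1000):
    """Convert (x, y) arrays into a shuffled, batched, prefetched tf.data.Dataset."""
    ds = tf.data.Dataset.from_tensor_slices((x, y))  # pair features with targets
    ds = ds.shuffle(shuffle_buffer)                  # decouple training from row order
    ds = ds.batch(batch_size)                        # train one batch after another
    ds = ds.prefetch(tf.data.AUTOTUNE)               # overlap data prep with training
    return ds

# Dummy data just to show the shapes flowing through
x = np.random.rand(100, 7).astype("float32")
y = np.random.rand(100, 1).astype("float32")
train_dataset = prepare_dataset(x, y)
```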
Building the Model and Plotting the Curves
A three-layered network is built. It is not that deep, but since the dataset is small this should be enough. We compile it with the “mse” loss function and the SGD optimizer. It is then trained with train_dataset and valid_dataset. Since we created the datasets using from_tensor_slices, passing in both x and y values, we don't have to specify x and y in the fit function; TensorFlow understands that the dataset contains both. The same applies to the validation dataset. From the training metrics, we can see that the values have dropped, indicating the model is improving with each epoch.
To further validate our understanding, it is recommended to plot the training and validation MAE and loss. They can be found below:

From the plot, it is evident that both the training and validation curves bend downward, a very good sign that the model is learning over each epoch.

The same goes for the loss curve: the loss has decreased over the epochs.
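One way such curves could be plotted from the dict that model.fit() returns in its History object; the dummy values below stand in for a real training run, and the metric names assume the model was compiled with "mae" as a metric:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

def plot_curves(history_dict):
    """Plot training vs. validation MAE and loss, one subplot per metric."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, metric in zip(axes, ("mae", "loss")):
        ax.plot(history_dict[metric], label=f"train {metric}")
        ax.plot(history_dict[f"val_{metric}"], label=f"valid {metric}")
        ax.set_xlabel("epoch")
        ax.set_title(metric)
        ax.legend()
    return fig

# Dummy history values as a stand-in for model.fit(...).history
fake_history = {"mae": [0.5, 0.3], "val_mae": [0.6, 0.4],
                "loss": [0.4, 0.2], "val_loss": [0.5, 0.3]}
fig = plot_curves(fake_history)
```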
Prediction:
At the beginning, we stored the first row for testing the prediction, and the model has performed very well, predicting very close to the expected value.
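A sketch of that prediction step. The model, feature values, and stored statistics below are dummy stand-ins; the point is that the held-out row is scaled with the saved training mean/std before calling predict:

```python
import numpy as np
import tensorflow as tf

# Stored training statistics (dummy values for illustration)
train_mean = np.array([315.0, 8.5], dtype="float32")
train_std = np.array([10.0, 0.5], dtype="float32")

# Untrained placeholder model with two input features
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(1),
])

# The first row held back at the start, scaled exactly as at training time
first_row = np.array([[320.0, 9.0]], dtype="float32")
scaled = (first_row - train_mean) / train_std
pred = model.predict(scaled, verbose=0)
```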
Please do watch the video; the code is also available here. Feedback on the video and the article would be of great help. I will do more such work whenever I have time. Thanks for reading and supporting. Enjoy coding!!!