Getting Started with Number Regression in Python Coding

Description
The tutorial demonstrates how to make ML models with Number Regression in Python Coding.

Numbers(C/R) is the extension of the ML Environment that deals with the regression of numeric data.

What is Regression?

Regression is a statistical method for analyzing and understanding the relationship between two or more variables of interest. Regression analysis helps us understand which factors are important, which factors can be ignored, and how they influence each other.

In regression, we normally have one dependent variable and one or more independent variables. We try to “regress” the value of the dependent variable “Y” on the independent variables “X”. In other words, we are trying to understand how the value of “Y” changes with respect to a change in “X”.

To use regression analysis successfully, we need to understand the following terms:

  • Dependent Variable: This is the variable that we are trying to understand or forecast.
  • Independent Variables: These are the factors that influence the dependent (target) variable and tell us how it relates to them.
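As a minimal sketch of this idea, the snippet below fits a straight line Y = slope × X + intercept to a handful of points using NumPy's `polyfit`. The values of `x` and `y` are made up for illustration and are not taken from the real estate dataset.

```python
import numpy as np

# Illustrative data (assumed): one independent variable X and one dependent variable Y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0  # In this toy example, Y depends exactly linearly on X

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(x, y, 1)

# Use the fitted line to estimate Y for a new, unseen X.
x_new = 6.0
y_pred = slope * x_new + intercept
print(round(slope, 3), round(intercept, 3), round(y_pred, 3))
```

The model trained later in this tutorial is a neural network rather than a straight line, but the goal is the same: learn how the output changes as the inputs change.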

Real Estate Price Prediction

In this tutorial, we’ll be looking at Numeric Regression with the help of the “Real Estate Price Prediction” dataset.

This dataset contains the price per unit area of multiple real estate sites, along with the features that determine that price. We’ll build a model that predicts the price per unit area for feature values it has not seen before.

Download the dataset here:

  1. Training Data: Real Estate Training Data
  2. Testing Data: Real Estate Testing Data

There are six input variables:

  1. Transaction Date
  2. House Age
  3. Distance to the Nearest MRT Station
  4. Number of Convenience Stores
  5. Latitude
  6. Longitude

The output is the house price per unit area.

 

Setting up the Environment

First, we need to set up the ML environment for Number Regression.

Alert: The Machine Learning Environment for model creation is available only in the desktop version of PictoBlox for Windows, macOS, or Linux. It is not available in the Web, Android, and iOS versions.

Follow the steps below:

  1. Open PictoBlox and create a new file.
  2. Select the coding environment as Python Coding.
  3. To access the ML Environment, select the “Open ML Environment” option under the “Files” tab.
  4. You’ll be greeted with the following screen.
    Click on “Create New Project”.
  5. A window will open. Type in a project name of your choice and select the “Numbers(C/R)” extension. Click the “Create Project” button to open the Numbers(C/R) window.
  6. You will see the Numbers(C/R) workflow with an option to either “Upload Dataset” or “Create Dataset”.

Uploading/Creating the Dataset

  1. Click on “Upload Dataset”, and then on “Choose CSV from your files”.
  2. Select the Real Estate Training Data CSV file that you downloaded earlier. This is how your data will look.
    Note: For our model to train, it is important that we only feed it numerical values. Hence, we must pre-process our data accordingly. Thankfully, the PictoBlox ML Environment comes with all the necessary tools to modify our data.
  3. Let’s analyze the data pre-processing features for a minute. Observe the “Data Settings”.
  4. We will drop the “No” column, since it is only a row number and carries no information about the price.
  5. Our target column in this project is the “House Price of Unit Area” column. To set it as the output column, select it and click on the Set as Output button.
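The two pre-processing steps above (dropping “No” and choosing an output column) can be sketched in pandas as follows. This is only an illustration of what the ML Environment does for you through its interface; the two data rows are placeholder values, and the column names are assumed to match those used in the exported code later in this tutorial.

```python
import io
import pandas as pd

# Placeholder rows (assumed values) that mimic the layout of the training CSV.
csv_text = """No,Transaction Date,House Age,Distance to the Nearest MRT Station,Number of Convenience Stores,Latitude,Longitude,House Price of Unit Area
1,2013.0,10.0,500.0,5,24.97,121.54,40.0
2,2013.5,20.0,1500.0,2,24.96,121.51,25.0
"""
data = pd.read_csv(io.StringIO(csv_text))

# Drop the row-number column.
data = data.drop(columns=["No"])

# Separate the output (target) column from the six input features.
target = data["House Price of Unit Area"]
features = data.drop(columns=["House Price of Unit Area"])

print(list(features.columns))
print(target.tolist())
```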

Our dataset is ready! Time to train the model.

Training the Model

Now that we have gathered the data, it’s time to teach our model how to predict the house price for new, unseen data. To do this, we have to train the model.

By training the model, we extract meaningful information from the data, and that in turn updates the weights. Once these weights are saved, we can use our model to make predictions on data previously unseen.

However, before training the model, there are a few hyperparameters that you should be aware of. Click on the “Advanced” tab to view them.

Note: These hyperparameters can affect the accuracy of your model to a great extent. Experiment with them to find what works best for your data.

There are three hyperparameters you can experiment with here:

  1. Epochs – The total number of times your data will be fed through the training model. Therefore, in 10 epochs, the dataset will be fed through the training model 10 times. Increasing the number of epochs can often lead to better performance.
  2. Batch Size – The number of samples that will be used in one step. For example, if you have 160 data samples in your dataset and a batch size of 16, each epoch will be completed in 160/16 = 10 steps. You’ll rarely need to alter this hyperparameter.
  3. Learning Rate – It dictates how quickly your model updates the weights after each step. Even small changes in this parameter can have a huge impact on the model performance. The usual range lies between 0.0001 and 0.001.

Note: Hover your mouse over the question mark next to the hyperparameters to see their description.
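The batch size arithmetic can be checked with a few lines of Python. This is a small sketch; the helper name `steps_per_epoch` is made up for illustration, and rounding up for a final partial batch is an assumption about how leftover samples are handled.

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of weight-update steps needed to see every sample once."""
    # A final, smaller batch still counts as one step, so round up.
    return math.ceil(num_samples / batch_size)

steps = steps_per_epoch(160, 16)
total_steps = 100 * steps  # Total steps if training runs for 100 epochs
print(steps, total_steps)
```

With 160 samples and a batch size of 16, this gives 10 steps per epoch, matching the example above.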

You can train the model in Python only.

Note: You must download the dependencies before you can train the model in Python.

We’ll be training this model in Python. Click on the “Train Model” button to commence training. It’s a good idea to use a high number of epochs for this model. We’ll be training this model for 100 epochs.

The model shows great results! Remember, in regression problems, we use a metric called Mean Absolute Error (MAE), which is the average absolute difference between the predicted and the actual values. The lower the MAE, the better the model. The x-axis of the graph shows the epochs, and the y-axis represents the corresponding MAE.
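To make the metric concrete, here is a small sketch of how MAE can be computed with NumPy. The prices below are made-up numbers, not results from this model.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Illustrative values (assumed): absolute errors are 2, 3 and 0.
actual_prices = [40.0, 25.0, 30.0]
predicted_prices = [38.0, 28.0, 30.0]

mae = mean_absolute_error(actual_prices, predicted_prices)
print(mae)
```

An MAE of about 1.67 here would mean the predictions are off by roughly 1.67 price units on average.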

Testing the Model

Now that the model is trained, let us see if it delivers the expected results. For that, we simply need to input values and click on the “Predict” button.

Note: You can also use data from the “Real Estate Testing Data.csv” file to see results on unseen data.

Great! Time to export this model to PictoBlox and create a script.

Exporting the Model to Python Coding

Click on the “Export Model” button on the top right of the Testing box, and PictoBlox will load your model into the Python Coding Environment.

Observe that PictoBlox has already created Python testing code for you.

Click on the Beautify button to fix any indentation errors in the code. It’s the icon to the left of the A+ (Magic Wand) icon.

The following code is generated by PictoBlox:

####################imports####################
# Do not change

import numpy as np
import tensorflow as tf

# Do not change
####################imports####################

#Load Number Model
# Do not change

model = tf.keras.models.load_model("num_model.h5",
                                   custom_objects=None,
                                   compile=True,
                                   options=None)

###############################################
#Inputs

Transaction_Date = 0
House_Age = 0
Distance_to_the_Nearest_MRT_Station = 0
Number_of_Convenience_Stores = 0
Latitude = 0
Longitude = 0

###############################################

inputValue = [
    Transaction_Date,
    House_Age,
    Distance_to_the_Nearest_MRT_Station,
    Number_of_Convenience_Stores,
    Latitude,
    Longitude,
]  # List of input features
inputTensor = tf.expand_dims(inputValue, 0)  # Expanding input dimension
predictValue = model.predict(inputTensor)  # Predicting the output
print(predictValue[0])

The code uses two libraries:

  1. NumPy – For array manipulation
  2. TensorFlow – For machine learning

Add data to the inputs for testing. You can use values from the “Real Estate Testing Data.csv” file.

###############################################
#Inputs
Transaction_Date = 2013.25
House_Age = 7.6
Distance_to_the_Nearest_MRT_Station = 2175.03
Number_of_Convenience_Stores = 3
Latitude = 24.96305
Longitude = 121.5125

Click on the Run button to run and test the code.

You will see the output in the Terminal.
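The exported script prints `predictValue[0]`, which is a one-element array rather than a plain number. The sketch below uses NumPy only (so it runs without the saved model) to show the shapes involved; `np.expand_dims` behaves like `tf.expand_dims` here, and `fake_prediction` is a placeholder standing in for the output of `model.predict`.

```python
import numpy as np

# The six input values used in the example above.
input_value = [2013.25, 7.6, 2175.03, 3, 24.96305, 121.5125]

# Adding a leading axis turns six numbers into one row with six features.
input_batch = np.expand_dims(input_value, 0)
print(input_batch.shape)  # (1, 6)

# Placeholder for a prediction: one row containing one value.
fake_prediction = np.array([[0.0]])
print(fake_prediction[0])     # The row: a one-element array
print(fake_prediction[0][0])  # The single predicted number inside it
```

This is why the bulk prediction code later in the tutorial reads the price with `predictValue[0][0]`.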

Prediction of Multiple Cases with CSV

Let’s see how you can run bulk prediction on the “Real Estate Testing Data.csv” file. For this, we have to use a loop and the Pandas library. Let’s start:

Import the “Real Estate Testing Data.csv” file into the PictoBlox project.

You will get this:

Next, we will modify the code. The full code is provided below:

####################imports####################
# Do not change

import numpy as np
import tensorflow as tf
import pandas as pd

# Do not change
####################imports####################

#Load Number Model
# Do not change

model = tf.keras.models.load_model("num_model.h5",
                                   custom_objects=None,
                                   compile=True,
                                   options=None)

test_data = pd.read_csv('Real Estate Testing Data.csv')
rows = []  # One dictionary per predicted row

for i in range(len(test_data.index)):
  ###############################################
  #Inputs
  Transaction_Date = test_data.loc[i].at["Transaction Date"]
  House_Age = test_data.loc[i].at["House Age"]
  Distance_to_the_Nearest_MRT_Station = test_data.loc[i].at[
      "Distance to the Nearest MRT Station"]
  Number_of_Convenience_Stores = test_data.loc[i].at[
      "Number of Convenience Stores"]
  Latitude = test_data.loc[i].at["Latitude"]
  Longitude = test_data.loc[i].at["Longitude"]

  inputValue = [
      Transaction_Date,
      House_Age,
      Distance_to_the_Nearest_MRT_Station,
      Number_of_Convenience_Stores,
      Latitude,
      Longitude,
  ]  # List of input features
  inputTensor = tf.expand_dims(inputValue, 0)  # Expanding input dimension
  predictValue = model.predict(inputTensor)  # Predicting the output

  # DataFrame.append was removed in pandas 2.0, so collect the rows in a list
  rows.append({
      "Transaction Date": Transaction_Date,
      "House Age": House_Age,
      "Distance to the Nearest MRT Station": Distance_to_the_Nearest_MRT_Station,
      "Number of Convenience Stores": Number_of_Convenience_Stores,
      "Latitude": Latitude,
      "Longitude": Longitude,
      "House Price of Unit Area": test_data.loc[i].at["House Price of Unit Area"],
      "Predicted Value": predictValue[0][0],
  })

# Build the output table once, after all predictions are made
test_data_output = pd.DataFrame(rows)
test_data_output.to_csv("result.csv")

Alert: If your data contains text values, you have to convert them into numbers using the same logic that you used in the ML Environment.

You will find the result in an additional file created with the name “result.csv”.
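As an alternative to predicting one row at a time, Keras models can predict on a whole array of rows in a single call. The sketch below shows the idea under some assumptions: `predict_all` and `FEATURE_COLUMNS` are hypothetical names, and `DummyModel` is a stand-in so the example runs without “num_model.h5”. With the real model, you would pass the loaded `model` and the full test DataFrame instead.

```python
import numpy as np
import pandas as pd

# Input columns, in the order used by the exported code.
FEATURE_COLUMNS = [
    "Transaction Date", "House Age", "Distance to the Nearest MRT Station",
    "Number of Convenience Stores", "Latitude", "Longitude",
]

def predict_all(model, data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of data with a 'Predicted Value' column added."""
    inputs = data[FEATURE_COLUMNS].to_numpy(dtype="float32")  # Shape: (rows, 6)
    predictions = model.predict(inputs)  # Expected shape: (rows, 1)
    result = data.copy()
    result["Predicted Value"] = np.asarray(predictions).reshape(-1)
    return result

class DummyModel:
    """Hypothetical stand-in for the trained model, for demonstration only."""
    def predict(self, inputs):
        # Echoes the house age as the "prediction"; a real model would not do this.
        return inputs[:, 1:2]

sample = pd.DataFrame(
    [[2013.25, 7.6, 2175.03, 3, 24.96305, 121.5125]], columns=FEATURE_COLUMNS)
result = predict_all(DummyModel(), sample)
print(result)
```

Predicting in one call avoids the per-row overhead of the loop, which can matter for larger CSV files.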
