**If you want to fit a curved line to your data with scikit-learn using polynomial regression, you are in the right place. But first, make sure you’re already familiar with linear regression. I’ll also assume in this article that you have matplotlib, pandas and numpy installed. Now let’s get down to coding your first polynomial regression model.**

If you don’t have your own Python environment for data science, go with one of these options to get one:

- Get your own data server and install the most popular data science libraries
- Install Anaconda on your local computer

**What’s polynomial regression good for? A quick example.**

Bad news: you can’t just linear regression your way through every dataset. 😔

Oftentimes you’ll encounter data where the relationship between the feature(s) and the response variable can’t be best described with a straight line.

Just like here:

See the problem? Of course we could fit a straight line to the data, but just by looking at the scatterplot we get the feeling that this time a linear line won’t cut it.

If you think that the line should somehow be curved to better fit the data, then you intuitively understand why we use polynomial regression: it gives us the curvature we need so we can have more precise predictions based on our data.

But let’s hold our horses just yet. 🐴

Why is it called a “*polynomial*”?

That’s what we’ll discover in the next section.

**What is a polynomial? The most important definitions you need to know.**

Let’s break it down:

- “
*poly*” means “*many*”, - “
*nomial*” means “*terms*” (or “*parts*” or “*names*”).

Here’s an example of a polynomial:

4x + 7

`4x + 7`

is a simple mathematical expression consisting of two terms: `4x`

(first term) and `7`

(second term). In algebra, terms are separated by the logical operators `+`

or * -*, so you can easily count how many terms an expression has.

`9x`

is a polynomial (consisting of 3 terms), too. And to confuse you a bit, ^{2}y - 3x + 1`3x`

is also a polynomial, although it doesn’t have “many terms” (`3x`

is called a monomial, because it consists of one term – but don’t worry too much about it, I just wanted to let you in on this secret 🤫).

For now, let’s stick with `4x + 7`

. I plotted it for you:

Now I want you to take a close look at the plot – what do you see? A straight line! 😮

Let’s recap what just happened:

- we took a polynomial (
`4x + 7`

), - plotted it,
- and got a straight line (remember: a linear regression model gives you a straight line).

What’s happening…? 🤔

You see, the formula that defines a straight line in a linear regression model is actually a polynomial, and it goes by its own special name:* linear polynomial* (anything that takes the form of `ax + b`

is a linear polynomial).

Nice. Can we start coding already?

Actually… no.

### Polynomial regression related vocabulary

Before we get to the practical part, there’s some more things you need to know.

We’ll use 3x^{4} – 7x^{3 }+ 2x^{2} + 11 to improve your polynomial related vocabulary with some essential definitions:

**degree of a polynomial:**the highest power (largest exponent) in your polynomial; in our example it’s 4 because of`x`

, meaning that we’re dealing with a 4^{4}^{th}degree polynomial,**coefficient:**each number (3, 7, 2, 11) in our polynomial is a coefficient; these are the parameters that are unknown and our polynomial regression model will try to estimate when trained on our dataset,**leading term:**the term with the highest power (in our case it’s`3x`

); this is the most important part of a polynomial, because it determines the polynomial’s graph behavior,^{4}**leading coefficient:**the coefficient of the leading term (it’s 3 in our polynomial),**constant term:**the*y*intercept, it never changes: no matter what the value of*x*is, the constant term remains the same.

### Polynomial Regression: the official definition

Now that you speak proper polynomial lingo, I’d like to give you an official definition. An expression is a polynomial if:

**the expression has a finite number of terms,****and each term has a coefficient,****and these coefficients are multiplied by a variable (in our case***x*),**and the variables are raised to a non-negative integer power.**

If you’ve been paying attention, you may wonder: how is `4x + 7`

a polynomial when the second term (`7`

) clearly lacks the variable * x*?

Actually, * x* is there in the form of

`7x`^{o}

. Since `x`^{o}

^{ }is equal to

`1`

, and `7*1`

is equal to `7`

, there’s really no need to write `x`^{o}

^{ }down.

Do you like the article so far? If so, you’ll * love* this 6-week data science course on Data36:

__The Junior Data Scientist’s First Month__. It’s a 6-week simulation of being a junior data scientist at a true-to-life startup. Go check it out here:

__https://data36.com/jds__!

**The difference between linear and polynomial regression.**

Let’s return to `3x`

: if we write a polynomial’s terms from the highest degree term to the lowest degree term, it’s called a polynomial’s standard form.^{4} - 7x^{3 }+ 2x^{2} + 11

In the context of machine learning, you’ll often see it reversed:

`y = ß`

_{0} + ß_{1}x + ß_{2}x^{2} + … + ß_{n}x^{n}

`y`

is the response variable we want to predict,`x`

is the feature,`ß`

_{0}_{ }is the`y`

intercept,- the other
`ß`

s are the coefficients/parameters we’d like to find when we train our model on the available`x`

and`y`

values, `n`

is the degree of the polynomial (the higher`n`

is, the more complex curved lines you can create).

The above polynomial regression formula is very similar to the linear regression formula:

`y = ß`

_{0} + ß_{1}x + ß_{2}x + … + ß_{n}x^{n}

It’s not a coincidence: polynomial regression is a linear model used for describing non-linear relationships.

How is this possible? The magic lies in creating new features by raising the original features to a power.

For instance if we have feature `x`

, and we’ll use a 3^{rd} degree polynomial, then our formula will also include `x`

and ^{2}`x`

. And this is what gives curvature to a line:^{3}

What I’m trying to hammer home is this: **linear regression is just a first-degree polynomial. Polynomial regression uses higher-degree polynomials. Both of them are linear models, but the first results in a straight line, the latter gives you a curved line. That’s it.**

Now you’re ready to code your first polynomial regression model.

**Coding a polynomial regression model with scikit-learn**

For starters, let’s imagine that you’re presented with the below scatterplot:

Here’s how you can recreate the same chart:

import numpy as np import pandas as pd import matplotlib.pyplot as plt

x = np.arange(0, 30)

y = [3, 4, 5, 7, 10, 8, 9, 10, 10, 23, 27, 44, 50, 63, 67, 60, 62, 70, 75, 88, 81, 87, 95, 100, 108, 135, 151, 160, 169, 179]

plt.figure(figsize=(10,6))

plt.scatter(x, y)

plt.show()

It’s nothing special, really: just one feature (`x`

), and the responses (`y`

).

Now, let’s say that you’ve got a hunch that the relationship between the features and the responses is non-linear, and you’d like to fit a curved line to the data.

Since we have only one feature, the following polynomial regression formula applies:

`y = ß`

_{0} + ß_{1}x + ß_{2}x^{2} + … + ß_{n}x^{n}

In this equation the number of coefficients (`ß`

s) is determined by the feature’s highest power (aka the degree of our polynomial; not considering `ß`

, because it’s the intercept)._{0}

Two questions immediately arise:

**How do we establish the degree of our polynomial (and thus the number of ßs)?****How do we create x**^{2}**, x**^{3}**or x**^{n}**when originally we have x as our one and only feature?**

Fortunately, there are answers to both questions.

Answer 1.: there are methods to determine the degree of the polynomial that gives the best results, but more on this later. For now, let’s just go with the assumption that our dataset can be described with a 2^{nd} degree polynomial.

Answer 2.: we can create the new features (x raised to increasing powers) once you’ve installed sci-kit learn.

**STEP #1: Determining the degree of the polynomial**

First, import `PolynomialFeatures`

:

from sklearn.preprocessing import PolynomialFeatures

Then save an instance of `PolynomialFeatures`

with the following settings:

poly = PolynomialFeatures(degree=2, include_bias=False)

`degree`

sets the degree of our polynomial function. `degree=2`

means that we want to work with a 2^{nd} degree polynomial:

`y = ß`

_{0} + ß_{1}x + ß_{2}x^{2}

`include_bias=False`

should be set to `False`

, because we’ll use `PolynomialFeatures`

together with `LinearRegression()`

later on.

Long story short: `LinearRegression()`

will take care of this setting by default, so there’s no need to set `include_bias`

to `True`

. If it wasn’t taken care of, then `include_bias=False`

would mean that we deliberately want the y intercept (`ß`

) to be equal to _{0}`0`

– but we don’t want that. Here’s a great explanation on all of this.

If you print `poly`

, you see that so far we’ve just created an instance of `PolynomialFeatures`

, and that’s all there’s to it:

Onto the next step:

**STEP #2: Creating the new features**

poly_features = poly.fit_transform(x.reshape(-1, 1))

`reshape(-1,1)`

transforms our numpy array `x`

from a 1D array to a 2D array – this is required, otherwise we’d get the following error:

There’s only one method – `fit_transform()`

– but in fact it’s an amalgam of two separate methods: `fit()`

and `transform()`

. `fit_transform()`

is a shortcut for using both at the same time, because they’re often used together.

Since I want you to understand what’s happening under the hood, I’ll show them to you separately.

With `fit()`

we basically just declare what feature we want to transform:

`transform()`

performs the actual transformation:

What are these numbers? Well, let me kindly remind you that our feature values in the scatterplot range from `0`

to `29`

. In the first column we have our values for `x`

(e.g. `3`

). In the second column we have our values for `x`

squared (e.g. `9`

).

Looks familiar? Our 2^{nd} degree polynomial formula, again:

`y = ß`

_{0} + ß_{1}x + ß_{2}x^{2}

We wanted to create x^{2} values from our x values, and `fit_transform()`

did just that. We save the result to `poly_features`

:

poly_features = poly.fit_transform(x.reshape(-1, 1))

**STEP #3: Creating the polynomial regression model**

Now it’s time to create our machine learning model. Of course, we need to import it first:

from sklearn.linear_model import LinearRegression

Hold up a minute! 😮 Isn’t this tutorial supposed to be about polynomial regression? Why are we importing `LinearRegression`

then?

Just think back to what you’ve read not so long ago: polynomial regression is a linear model, that’s why we import `LinearRegression`

. 🙂

Let’s save an instance of `LinearRegression`

to a variable:

poly_reg_model = LinearRegression()

Here’s the code in real life:

Then we fit our model to our data:

poly_reg_model.fit(poly_features, y)

Fitting means that we train our model by letting it know what the feature (`poly_features`

) and the response (`y`

) values are. When fitting/training our model, we basically instruct it to solve for the coefficients (marked with **bold**) in our polynomial function:

`y = `

**ß _{0}** +

**ß**x +

_{1}**ß**x

_{2}^{2}

After running the code you may think that nothing happens, but believe me, the model estimated the coefficients (important: you don’t have to save it to a variable in order for it to work!):

Now that our model is properly trained, we can put it to work by instructing it to predict the responses (`y_predicted`

) based on `poly_features`

, and the coefficients it had estimated:

y_predicted = poly_reg_model.predict(poly_features)

Here you can see the predicted responses:

Let’s do some dataviz to see what our model looks like:

I think the plot’s title says it all, but I’d like to repeat it: **congratulations, you’ve created your very first polynomial regression model! 👏**

While you’re celebrating, I’m just gonna paste the code here in case you need it:

plt.figure(figsize=(10, 6))

plt.title("Your first polynomial regression – congrats! :)", size=16)

plt.scatter(x, y)

plt.plot(x, y_predicted, c="red")

plt.show()

Do you like the article so far? If so, you’ll * love* this 6-week data science course on Data36:

__The Junior Data Scientist’s First Month__. It’s a 6-week simulation of being a junior data scientist at a true-to-life startup. Go check it out here:

__https://data36.com/jds__!

**Coding a polynomial regression model with multiple features**

Oftentimes you’ll have to work with data that includes more than one feature (life is complicated, I know). Let’s simulate such a situation:

import numpy as np import pandas as pd import matplotlib.pyplot as plt

np.random.seed(1)

x_1 = np.absolute(np.random.randn(100, 1) * 10)

x_2 = np.absolute(np.random.randn(100, 1) * 30)

y = 2*x_1**2 + 3*x_1 + 2 + np.random.randn(100, 1)*20

`np.random.seed(1)`

is needed so you and I can work with the same “random” data. (Read this article if you want to understand how it is possible.) We create some random data with some added noise: `x_1`

contains 100 values for our first feature, `x_2`

holds 100 values for our second feature. Our responses (100) are saved into `y`

, as always.

Let’s plot both features, just to get a visualized understanding of the relationships between the features and the responses:

For your convenience, here’s the code:

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

axes[0].scatter(x_1, y)

axes[1].scatter(x_2, y)

axes[0].set_title("x_1 plotted")

axes[1].set_title("x_2 plotted")

plt.show()

**STEP #1: Storing the variables in a dataframe**

Then we create a pandas dataframe where we’ll store our features and responses:

df = pd.DataFrame({"x_1":x_1.reshape(100,), "x_2":x_2.reshape(100,), "y":y.reshape(100,)}, index=range(0,100))

df

This is the resulting dataframe:

Again, `reshape(100,)`

is needed (`100,`

because we have 100 rows), otherwise we’d receive the following error:

**STEP #2: Defining the training and the test data**

Now we turn to scikit-learn again:

from sklearn.model_selection import train_test_split

`train_test_split`

helps us split our data into a training and a test dataset:

X, y = df[["x_1", "x_2"]], df["y"]

poly = PolynomialFeatures(degree=2, include_bias=False)

poly_features = poly.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=42)

Here’s a step by step explanation:

`X, y = df[["x_1", "x_2"]], df["y"]`

: From`df`

, we save the`x_1`

and`x_2`

columns to`X`

, the`y`

column to`y`

. At this point we have the features and the responses stored in separate variables.`poly = PolynomialFeatures(degree=2, include_bias=False)`

and`poly_features = poly.fit_transform(X)`

: just as before, we create the new polynomial features.`train_test_split(poly_features, y, test_size=0.3, random_state=42)`

: Within the train_test_split method we define all of our features (`poly_features`

) and all of our responses (`y`

). Then, with`test_size`

we set what percentage of our all features (`poly_features`

) and responses (`y`

) we’d like to allocate for testing our model’s prediction capability. In our case, it’ll be 30% (`0.3`

), because it’s a good practice. Don’t worry if it’s not clear at first: you’ll understand it in a minute.`random_state`

needs to receive an integer value: that way anytime you rerun`train_test_split`

, you’ll receive the same results.`X_train, X_test, y_train, y_test`

:`train_test_split`

splits our features (`poly_features`

) and responses (`y`

) into train and test groups – here we just save these groups into variables. The order of the variables is very important, so don’t shuffle them.

**STEP #3: Creating a polynomial regression model**

Now let’s create and fit our model, but this time, we’ll train our model only on the training data:

poly_reg_model = LinearRegression()

poly_reg_model.fit(X_train, y_train)

The reason for us training our model only on the training data is that later we want to see how well our model will predict responses based on feature values it hasn’t seen before.

Here’s how we can test how our model performs on previously unseen data:

poly_reg_y_predicted = poly_reg_model.predict(X_test)

from sklearn.metrics import mean_squared_error

poly_reg_rmse = np.sqrt(mean_squared_error(y_test, poly_reg_y_predicted))

poly_reg_rmse

It may be a lot to take in, so let me elaborate on it:

`poly_reg_y_predicted = poly_reg_model.predict(X_test)`

: We save the predicted values our model predicts based on the previously unseen feature values (`X_test`

).`from sklearn.metrics import mean_squared_error`

: We import`mean_squared_error`

– more on why in the next bullet point.`poly_reg_rmse = np.sqrt(mean_squared_error(y_test, poly_reg_y_predicted))`

: We take the square root of`mean_squared_error`

to get thewhich is a commonly used metric to evaluate a machine learning model’s performance. RMSE shows how far the values your model predicts (*RMSE (root mean square error)*`poly_reg_y_predicted`

) are from the true values (`y_test`

), on average. Roughly speaking: the smaller the RMSE, the better the model.

If you print `poly_reg_rmse`

, you’ll get this number:

**STEP #4: Creating a linear regression model**

Now let’s create a linear regression model as well, so we can compare the performance of the two models:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

lin_reg_model = LinearRegression()

lin_reg_model.fit(X_train, y_train)

lin_reg_y_predicted = lin_reg_model.predict(X_test)

lin_reg_rmse = np.sqrt(mean_squared_error(y_test, lin_reg_y_predicted))

lin_reg_rmse

As you can see, the steps are the same as in the case of our polynomial regression model. There’s one important difference, though. In the `train_test_split`

method we use `X`

instead of `poly_features`

, and it’s for a good reason. `X`

contains our two original features (`x_1`

and `x_2`

), so our linear regression model takes the form of:

`y = ß`

_{0} + ß_{1}x_{1} + ß_{2}x_{2}

If you print `lin_reg_model.coef_`

you can see the linear regression model’s values for `ß`

and _{1}`ß`

:_{2}

You can similarly print the intercept with `lin_reg_model.intercept_`

:

On the other hand, `poly_features`

contains new features as well, created out of `x_1`

and `x_2`

, so our polynomial regression model (based on a 2^{nd} degree polynomial with two features) looks like this:

`y = ß`

_{0} + ß_{1}x_{1} + ß_{2}x_{2 }+ ß_{3}x_{1}^{2} + ß_{4}x_{2}^{2} + ß_{5}x_{1}x_{2}

This is because `poly.fit_transform(X)`

added three new features to the original two (x_{1} (`x_1`

) and x_{2} (`x_2`

)): x_{1}^{2}, x_{2}^{2 }and x_{1}x_{2}.

x_{1}^{2} and x_{2}^{2} need no explanation, as we’ve already covered how they are created in the “Coding a polynomial regression model with scikit-learn” section.

What’s more interesting is x_{1}x_{2} – when two features are multiplied by each other, it’s called an interaction term. An interaction term accounts for the fact that one variable’s value may depend on another variable’s value (more on this here). `poly.fit_transform()`

automatically created this interaction term for us, isn’t that cool? 🙂

Accordingly, if we print `poly_reg_model.coef_`

, we’ll get the values for five coefficients (`ß`

, _{1}`ß`

, _{2}`ß`

, _{3}`ß`

, _{4}`ß`

):_{5}

But let’s get back to comparing our models’ performances by printing `lin_reg_rmse`

:

The RMSE for the polynomial regression model is `20.94`

(rounded), while the RMSE for the linear regression model is `62.3`

(rounded). **The polynomial regression model performs almost 3 times better than the linear regression model. That’s a spectacular difference.**

But you know what else is spectacular?

Your freshly gained knowledge on performing polynomial regression! 😉

**Conclusion**

Hopefully you’ve gained enough knowledge to have a basic understanding of polynomial regression. Not only that, you should also know one method (RMSE) for comparing the performance of machine learning models.

Throughout this article we used a 2^{nd} degree polynomial for our polynomial regression models. Naturally, you should always test before model deployment what degree of polynomial performs best on your dataset (after finishing this article, you should suspect how to do that! 😉).

Thank you for reading this article, and have fun with your new knowledge!

- If you want to learn more about how to become a data scientist, take Tomi Mester’s 50-minute video course: How to Become a Data Scientist. (It’s free!)
- Also check out the 6-week online course: The Junior Data Scientist’s First Month video course.

Cheers,

Tamas Ujhelyi

## Leave a Reply