

The last line plots the result of the grid search. On this graph we can see that the darker the region is, the better our model is (because the RMSE is closer to zero in darker regions). This means we can try another grid search in a narrower range, this time with epsilon values between 0 and 0.2.
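A sketch of that narrower search (the toy data frame and the cost range are illustrative assumptions; epsilon now runs from 0 to 0.2, as the text suggests); tune's best.model component then gives the tuned SVR:

```r
library(e1071)
set.seed(42)
# Toy stand-in for the regression.csv data (illustrative values only)
data <- data.frame(X = 1:20, Y = 3 + 0.5 * (1:20) + rnorm(20))

# Narrower grid: epsilon between 0 and 0.2 this time
tuneResult <- tune(svm, Y ~ X, data = data,
                   ranges = list(epsilon = seq(0, 0.2, 0.01), cost = 2^(2:9)))
plot(tuneResult)  # darker regions mean lower error

# tune keeps the model trained with the best couple it found
tunedModel <- tuneResult$best.model
tunedModelY <- predict(tunedModel, data)
```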
```r
points(data$X, predictedY, col = "red", pch = 4)
```

As you can see, it looks a lot like the linear regression code. Note that we called the svm function (not svr!): that is because this function can also be used to make classifications with a Support Vector Machine. The function will automatically choose classification if it detects that the data is categorical (if the variable is a factor in R).

This time the predictions are closer to the real values! Let's compute the RMSE of our support vector regression model:

```r
# /!\ this time svrModel$residuals is not the same as data$Y - predictedY
error <- data$Y - predictedY
svrPredictionRMSE <- rmse(error)  # 3.157061
```

As expected, the RMSE is better: it is now 3.15 compared to 5.70 before. But can we do better?

Step 4: Tuning your support vector regression model

In order to improve the performance of the support vector regression we will need to select the best parameters for the model. In our previous example we performed an epsilon-regression: we did not set any value for epsilon, but it took a default value of 0.1. There is also a cost parameter, which we can change to avoid overfitting. The process of choosing these parameters is called hyperparameter optimization, or model selection. The standard way of doing it is a grid search: it means we will train a lot of models for the different couples of epsilon and cost, and choose the best one. The grid search is done with the tune method, whose call begins with tuneResult <- tune(svm, Y ~ X, data = data, …: tune trains one model for each combination of parameter values in the ranges it is given, and reports the best one.
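The rest of the tune call is cut off above; here is a sketch of such a grid search with e1071's tune function (the toy data frame and the exact epsilon and cost ranges are illustrative assumptions, not values prescribed by the text):

```r
library(e1071)
set.seed(42)
# Toy stand-in for the regression.csv data (illustrative values only)
data <- data.frame(X = 1:20, Y = 3 + 0.5 * (1:20) + rnorm(20))

# Train one model per (epsilon, cost) couple, scored by cross-validated error
tuneResult <- tune(svm, Y ~ X, data = data,
                   ranges = list(epsilon = seq(0, 1, 0.1), cost = 2^(2:9)))
print(tuneResult)  # best parameter couple and its performance (MSE)
plot(tuneResult)   # heat map of the grid search
```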
In order to be able to compare the linear regression with the support vector regression, we first need a way to measure how good it is. To do that we will change our code a little bit to visualize each prediction made by our model:

```r
predictedY <- predict(model, data)
points(data$X, predictedY, col = "blue", pch = 4)
```

For each data point, the model makes a prediction, displayed as a blue cross on the graph. The only difference with the previous graph is that the dots are not connected with each other.

In order to measure how good our model is, we will compute how much error it makes. We can compare each value of Y with the associated predicted value and see how far away they are with a simple difference. Note that this difference is the error: if we make a perfect prediction, the predicted value will be equal to Y and the error will be zero. If we square the error of each data point and sum them, we get the sum of squared errors, and if we take the mean we get the Mean Squared Error (MSE). A common way to measure error in machine learning is the Root Mean Squared Error (RMSE), the square root of the MSE, so we will use it instead. Using R we can come up with the following code to compute the RMSE:

```r
rmse <- function(error) {
  sqrt(mean(error^2))
}
error <- model$residuals  # same as data$Y - predictedY
predictionRMSE <- rmse(error)  # 5.703778
```

We know now that the RMSE of our linear regression model is 5.70. Let's try to improve it with SVR!

Step 3: Support Vector Regression

In order to create an SVR model with R you will need the package e1071, so be sure to install it and to add the library(e1071) line at the start of your file. Below is the code to make predictions with Support Vector Regression:
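A sketch of that prediction code, assuming the e1071 package is installed (the toy data frame is an illustrative stand-in for regression.csv):

```r
library(e1071)
set.seed(42)
# Toy stand-in for the regression.csv data (illustrative values only)
data <- data.frame(X = 1:20, Y = 3 + 0.5 * (1:20) + rnorm(20))
plot(data, pch = 16)

# svm() performs an epsilon-regression here, because Y is numeric
svrModel <- svm(Y ~ X, data)
predictedY <- predict(svrModel, data)
```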
To begin with we will use a simple, made-up data set of two variables, X and Y. I prefer that over using an existing well-known data set because the purpose of the article is not about the data, but more about the models we will use. There seems to be some kind of relation between our two variables X and Y, and it looks like we could fit a line which would pass near each point. The data is saved in CSV format in a file called regression.csv. We can now use R to load the data, display it, and fit a line:

```r
dataDirectory <- "D:/"  # put your own folder here
data <- read.csv(paste(dataDirectory, "regression.csv", sep = ""), header = TRUE)
```

After plotting the data and drawing the fitted line, we get the following graph:
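The plotting and fitting steps themselves are not shown in full above; here is a minimal sketch, assuming the CSV has two columns named X and Y (the toy data frame below is an illustrative stand-in for regression.csv):

```r
set.seed(42)
# Toy stand-in for the regression.csv data (illustrative values only)
data <- data.frame(X = 1:20, Y = 3 + 0.5 * (1:20) + rnorm(20))

plot(data, pch = 16)      # display the points as filled dots
model <- lm(Y ~ X, data)  # fit a line with ordinary least squares
abline(model)             # draw the fitted line over the points
```

With the real file, replace the toy data frame with the read.csv call shown above.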
In this article I will show how to use R to perform a Support Vector Regression. We will first do a simple linear regression, then move to the Support Vector Regression, so that you can see how the two behave with the same data.
