The ability to execute R scripts on SQL Server provides data scientists with a powerful way to operationalise data science. In a Microsoft blog written earlier this year it was explained “SQL Server R Services is an in-database analytics feature that tightly integrates R with SQL Server. With this feature, we want to provide a data intelligence platform that moves intelligence capabilities provided with R closer to the data. So why is that a good thing? The short answer is that a platform like this makes it much easier to consume and manage R securely and at scale in applications in production” (read more on why Microsoft built SQL Server R Services here).
In the following post I will illustrate an end-to-end example of running R in SQL. This includes setting up a SQL DB environment, creating a machine learning script in R with the use of Visual Studio and deploying the ML script on SQL Server. Microsoft documentation, blogs and webinars around running R code on SQL Server is extensive, but I often found rather disjointed, sometimes providing a lot of information on how to do the same thing in a variety of ways and at other times missing out essential set up information. The following example walks you through the way I found most effective at getting started with running R scripts on SQL Server, with a few of my own additional pointers added in.
Step 1:
· Install SQL Server 2017 (Developer edition) here
· Run the custom path, click install.
· In the SQL Server installation Centre pop-up, click on the installation tab and select ‘New SQL Server stand-alone installation or add features to an existing installation’
· Step through each of installation pages, once you come to the ‘Feature Selection’ tab, check ‘Database Engine Services’, ‘Machine Learning Services (In-Database)’, ‘R’, ‘Python’. Click here for more information on the installation process.
Step 2:
· Install Microsoft SQL Server Management studio (SSMS) here
Step 3:
· Connect SSMS to local SQL Server by opening SSMS, click file then ‘Connect Object Explorer’. Set Server type to ‘Database Engine’, server name to ‘computername\’ (default, unless set differently during installation) and authentication== Windows Authentication, then connect.
Step 4:
· Install Visual Studio 2017 here. During the installation process select ‘data science and analytical applications’ from workloads which will enable F#, R and Anaconda support. (Previously R Tools for visual studio 2015 would have to be installed separately)
Now the trivial, yet time consuming set up is done we can move on to the fun stuff.
In the following example we will work on a scenario where we have a ski rental business and are trying to predict number of rentals to better manage stock and staff. The data stored a .bak file can be found here. Save the file in a location that is accessible to SQL Server such as: C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Backup.
Step 5:
· Open a new query in SSMS and run the following code to enable external scripts to be run.
· Once you have executed the first two lines go the start menu and search for ‘administrative tools’, then double click on services find SQL Server, right click and select restart.
· Go back to SSMS and run ‘sp_configire’ you should now see 1’s in the last three columns of the row ‘external scripts enabled’.
Step 6:
· Open a new SSMS query and run the following code to restore the DB and access the rental data.
Step 7:
· Open Visual Studio and open a R project. Run the following code to load the data from SQL Server
Step 8:
· Run the following code to prepare the data
Step 9:
· Run the following code to train the model and make a prediction, using linear regression and a decision tree model. It appears the decision tree performs with greater accuracy at predicting ski rentals.
Step 10:
· The final step involves deploying the ML script with SQL Server by authoring a T-SQL program that contains the embedded R script, and the SQL Server database engine takes care of the execution. To deploy, you store your model in the database and create a stored procedure that predicts using the model.
Having worked through the set up and execution of using R in SQL with pretty clean data, I am currently trying to run a similar process for a Rail project where the SQL DB is on Azure (West Central US) and much more extensive data preparation is required with the use of a variety of packages- so keep a look out for my next post on the pros and cons of using R on SQL for real data!