Power BI is a versatile Business Intelligence platform that offers an endless range of options, features and technical skills to master. In addition Power BI can be enhanced by various extensions such as Python or R. These languages open a wide range of capabilities to use in your data analysis process. This series is about using the R-language in Power BI. I don’t know yet how many parts this series will have or which topics it will contain, but l hope you will join me on my journey into this interesting area of Power BI.
What R is and what it can do
R is a free and open source programming language and environment. It is a collection of different items such as functions, datasets, compiled code and documentations organized in packages. These packages are dedicated to accomplish tasks in a wide range of specific areas. By now, thousands of them with different functionality have been provided by the R community and are free to use. Among these areas of application are topics like the following ones:
So analysts all over the world use R in areas like finance, market research, social science studies, natural language processing, genetic studies or big data, to name only a few.
All these packages live in central repositories downloadable and installable on your local machine.
Where can we use R in Power BI?
In Power BI Desktop there are at least three areas in which you can utilize R scripts:
- The first one is the ability to use R as a data source (1.). This enables you to use R code to fetch data from whatever data source R is capable of, transform it the way you need and load it into the data model of Power BI.
- The second is the option to use R's visualization capabilities (2.). You can use R to create additional types of visuals that are not available in Power BI (yet). You can also create an existing type of visual but enhance it with capabilities that are not supported in Power BI. Several packages for visualization are supported, first of all the package “ggplot2”.
- Last but not least there is the option to add a data transformation step in the Transform tab of the Power Query Editor (3.). This empowers you to cleanse, shape and enhance your data on its way from the data source to your data model, using R capabilities.
But to get all this going you to have to set up and prepare your computer by installing R framework and the packages you need. Install an "Integrated Development Environment" (IDE) if you like to and register R in Power BI Desktop.
How to install R on your computer?
To use R in Power BI it has to be installed on the computer where Power BI is running. For your development in Power BI Desktop this means you have to install it on your local machine.
To install R go to the homepage of “The comprehensive R Archive Network” also known as CRAN. At time of writing the latest version is R 4.0.0. For installation you have to follow these steps:
- Click on the link "Download for Windows"
- Click on the link "base" to install base R packages
- Click on the link "Download R 4.0.0 for Windows" to download the installation files
- Run installation files. 32-bit and 64-bit versions are installed in case you run 64-bit Windows
Alternatively you also can install R from Microsoft R Application Network.
In order to provide a good coding experience it is also recommended to install an IDE of your choice. An IDE will support your coding by providing some options in the GUI or features like IntelliSense, Syntax highlightening or parameter hints.
There are a lot of good IDEs available for R, e.g. Eclipse, Architect, Revolution R or Live-R, to name only a few. For this series I will go on with RStudio which also is a popular development environment for R-language. You can download a free open source version of RStudio Desktop from the RStudio homepage and install it on your local machine.
Installing and using further packages
With the installation of R framework from CRAN repository you already installed several base R packages. These packages form your local library. To see which packages are available in your library open RStudio:
You have the option to either run the library() command in the console window (bottom left) to create a list of all installed packages, including a short description of what these packages perform, or you simply take a look at the packages window (bottom right) in RStudio to figure them out.
But having packages in your library does not mean they are ready to use. You have to load them into memory explicitly. This is not performed automatically. As I already pointed out, there are thousands of packages available and therefore potentially installed in your local library. Loading them all by default might eat up all your RAM.
You can use search() command to list all packages that are already loaded to RAM. Once again you could also have a look into the packages window of RStudio. All loaded packages are marked there.
To load a new package from your library into memory you can use the library(packagename) command providing the name of the package as parameter. In our example above we loaded the package “cluster”, a package to identify groups in your data. Running search() command once again or taking a look at the package window shows, that it has been loaded to RAM.
If you want to use non standard packages you have to download them to your local library first. There are several online repositories that provide R packages. Some of them are BioConduct, Omegahat, R.Forge, rforge.net or GitHub. Nevertheless the main repository for R is CRAN and this is also the repository that is supported by Power BI as of today.
Let’s assume you want to use "ggplot2" for visualization purposes. You can use the install.packages(“packagename”) command to fetch it from the CRAN repository into your library. Using library("ggplot2") will load it into memory and make it ready for use.
You also can unload packages from memory by using the detach() command. To unload "ggplot2" you can write detach(package: ggplot2, unload=true).
To get an overview off available packages in CRAN and the topics they are dealing with you can reach out for the CRAN Task Views side .
How to register R in Power BI Desktop?
After installing R and the respective packages on your computer you have to take the following additional steps to use these packages within Power BI Desktop:
- Open Power BI Desktop
- Switch to the File section and select "Options and Settings"
- Select "Options" in the "Options and Settings" dialog.
- In the "Global" section go to "R scripting"
- Use the drop down under "Detected R home directories" to chose your R installation in case it is not recognized automatically
- Use the drop down under "Detect R IDEs" to select one of the IDEs you have installed for R
If RStudio is installed on your computer, as in our case, RStudio is registered in Power BI Desktop as the default IDE. After registering your IDE you will be able to open it from Power BI Desktop to develop your R Code.
In this case Power BI Desktop adds some rows of code at the top lines to enable loading of data in the IDE for test purposes. You can copy your code back into Power BI Desktop after development without these automatically generated rows.
What packages are supported in the Power BI Service?
In order to use R in Power BI Desktop on your local machine you have to install the favored R packages. But what about the Power BI Service? You are not able to install a package there.
Microsoft preinstalled a selection of packages and is constantly adding more.
At the moment there are already hundreds of packages supported in the Power BI Service. A list of all supported packages can be found in the Microsoft documentation.
There is also a small list of packages that are explicitly not supported. In most cases the reason for this is that these packages perform client-server queries over the World Wide Web.
Even though these packages are preinstalled by Microsoft in the Power BI Service, bear in mind, that you always have to load a package explicitly into memory to be able to use it.
Restrictions for using R in Power BI
When it comes to restrictions in using R in Power BI the most obvious point to keep in mind is that not all existing R packages are supported in the Power BI Service. Apart from this, at the time of writing, there are also some additional aspects to consider:
• Only packages with free and open source software licenses such as GPL-2, GPL-3 and MIT+ are supported
• Only packages published in CRAN repository are supported
• Packages that perform client-server queries over the World Wide Web are blocked due to security considerations (e.g. RgoolgeMaps)
• R in Power BI only accepts dataframes. The types complex and vector are not supported
• The server timeout for running code is set to 30 minutes
• No interaction code (e.g. user inputs) are allowed
• All data sources used in R need to have the privacy level "public"
• At the time of writing on-premises data gateway for data refresh from on-premises data sources does only support R when it is installed in personal mode.
These are what I would consider the essential restrictions. For additional information reach out to the Microsoft documentation.
We have discussed what the R framework is about and in which areas you can use it within your data analysis process. We also walked through the installation process and have seen where and how we can use R in Power BI. Finally we have pointed out some restrictions to consider when preparing your data analysis with R in Power BI.
So, a solid foundation is laid. But what is next? In the following parts of the series we will get our hands on some practical examples of tasks that can be accomplished by using R while not or not yet supported in native Power BI. So I hope that also will be of interest for you.
Stay tuned for more.
I hope you enjoyed this article and will be back for the next part of the series. You can also visit my YouTube-Channel for more content on Power BI, SSAS, Power Pivot, Power Query, DAX und M!
Greetings from Hamburg, Germany
STAY IN TOUCH...