Similar to Python, R is an open-source statistical software that is used to clean and analyze data. It is popular within the data science community and has many packages that make statistical modeling easier for statisticians and data scientists. Here is an idea of why it might be worth considering learning R.
To download R, go to the CRAN webiste and download the version of R for your system.
RStudio is an integrated development environment (IDE) for R that is availble in both open source and commercial editions. RStudio is also the group responsible for many R resources and package development. Take a look at the resources page on the RStudio website for more information, but below are just a couple of useful resources from RStudio.
To download RStudio, go to the RStudio webiste and download the version of RStudio for your system.
NOTE R and RStudio may be used interchangeably throughout this page.
RStudio can connect to various database for production development of models or ad hoc analysis. If you want to connect to Snowflake below are some steps to accomplish this.
brew install unixodbc
odbcinst -j
.odbcinst.ini (location of file based on output from the odbcinst -j
command above)
[Snowflake]
Driver = /opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib
odbc.ini (location of file based on output from the odbcinst -j
command above)
[ODBC Data Sources]
SnowflakeDSII = Snowflake
[SnowflakeDSII]
Server = gitlab.snowflakecomputing.com
Port =
UID =
Schema =
Warehouse =
Driver = /opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib
Description = Snowflake DSII
Locale = en-US
Tracing = 0
Authenticator = gitlab.okta.com
This video shows the basic steps to connect a tool (it covers Excel) to Snowflake via ODBC.
The next step is to connect RStudio to Snowflake using the driver configurations you've just set up. This can be accomplished by using the DBI
,tidyverse
, and odbc
packages in R. For a general overview on how to connect to databases in RStudio, please refer to this website for detailed information.
This is an example of the code that can be used to connect to Snowflake in R.
con <- DBI::dbConnect(odbc::odbc(),
driver = "Snowflake",
uid = rstudioapi::askForPassword("Database UserID"),
role = [your user role],
warehouse = [warehouse you wish to connect to],
pwd = rstudioapi::askForPassword("Database password"),
Authenticator = "externalbrowser",
database = [database you wish to connect to],
schema = [schema you wish to connect to],
server = "gitlab.snowflakecomputing.com"
)
Some details regarding the above code:
dbConnect()
function to pass our database parameters.odbc::odbc()
tells the function you are going to use an ODBC driver for this connection.rstudioapi::askForPassword("")
function prompts the user to enter their UID and/or their PWD so it is not stored in their script.driver = "Snowflake"
is specific to the odbcinst.ini file set up above. This specifies which driver will be used to connect. (NOTE: if you are experiencing issues connecting, try changing the syntax to the actual path of the driver in R. Example: driver = "/opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib"
.server = "gitlab.snowflakecomputing.com
is specific to the snowflake instance being accessed.Since OKTA or other authenticators are often used to connect to Snowflake, we have reference Authenticator
several times in the directions above. The first is in the odbc.ini file, specifying the authenticator used here at GitLab (OKTA). It is then referenced in the parameters used with DBI::dbConnect()
in the line Authenticator = "externalbrowser"
.
The "externalbrowser" lets dbConnect()
know it should reference the url specified in the configuration file to login to Snowflake. The password that is entered during the rstudioapi::askForPassword()
prompt should be the users OKTA password.
Once you've completed the steps above and try running the code, you should be taken to a webpage to complete login. The console in R should display the following text before it takes you to the webpage.
Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
NOTE: You will have to stay on the webpage until it indicates your identity was confirmed and you were connected to Snowflake.
It is recommended to set up a .Rprofile file to customize the startup process for a given session in RStudio. It can also simiplify sharing code with other users. Upon startup, R and RStudio will look for and run the .Rprofile file which can be used to control the behavior of your R session (e.g. setting options or environment variables).
.Rprofile files can be either at the user or project level. User-level .Rprofile files live in the base of the user's home directory, and project-level .Rprofile files live in the base of the project directory. R will source only one .Rprofile file. So if you have both a project-specific .Rprofile file and a user .Rprofile file that you want to use, you explicitly source the user-level .Rprofile at the top of your project-level .Rprofile with source("~/.Rprofile").
One easy way to edit your .Rprofile file is to use the usethis::edit_r_profile()
function from within an R session. You can specify whether you want to edit the user or project level .Rprofile.
Follow the example below to set up a new .Rprofile file that automatically sets your username, role, and driver for snowflake. If other users follow the same template, they will not have to update this information when they access Snowflake (or any other database) tables using your code in R:
edit_r_profile()
function from the usethis
packageinstall.packages("usethis")
library(usethis)
usethis::edit_r_profile()
.First <- function() cat("Welcome to R!")
.Last <- function() cat("Goodbye!")
uid = "CSMITH@GITLAB.COM"
role = "CSMITH"
driver = "/opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib"
styler::tidyverse_style()
message("*** Successfully loaded .Rprofile ***")
*** Successfully loaded .Rprofile ***
Welcome to R!
uid
, role
, and driver
in your environment. These variables are used for connecting to your database (Snowflake here at GitLab), or for any other variables you deem necesary.The dbplyr package can be used to interact with databases using the tidyverse language. If you're familiar with tidyverse already, you may find this package especially helpful.
brew install git
in your terminalwhich git
and hit the return key/usr/bin/git
. (Note: if navigating through Finder, hidden files can be viewed by pressing Command
+ Shift
+ .
)git config --global user.name 'yourGitHubUsername'
git config --global user.email ‘name@provider.com'
Google Sheets and R have the ability to interact via the googlesheets4
and googledrive
packages in R.
pkg <- c("googlesheets4", "googledrive")
invisible(lapply(pkg, function(x) if (x %in% rownames(installed.packages())==F) install.packages(x)))
invisible(lapply(pkg, library, character.only = TRUE))
rm(pkg)
read_sheet()
function will allow you to read an existing spreadsheet
read_sheet()
command in R pointing to the Spreadsheet URL you want to viewgooglesheets4::read_sheet("https://docs.google.com/spreadsheets/...")
Yes
in RStudio when asked "Is it Ok to cache OAuth access credentials in the folder between R Sessions"read_sheet()
command again to confirm you can see output in RBelow are a list of functions that can be used to write data into a Google Sheet with examples.
(ss <- gs4_create("fluffy-bunny", sheets = list(flowers = head(iris))))
head(mtcars) %>%
sheet_write(ss, sheet = "autos")
df <- dataframe
ss <- "https://docs.google.com/spreadsheets/..."
googlesheets4::range_write(ss = ss,
data = df,
sheet = "tabname")
df <- dataframe
ss <- "https://docs.google.com/spreadsheets/..."
googlesheets4::range_clear(ss = ss,
sheet = "tabname"
range = "tabname!A2:ZZ1000000")
df <- dataframe
ss <- "https://docs.google.com/spreadsheets/..."
googlesheets4:sheet_append(
ss = ss,
data = df,
sheet = "tabname")