R
An R program can connect to a Dremio cluster through ODBC or JDBC, so the contents of a Dremio dataset or the results of a SQL query can be loaded directly into an R data frame.
Install the Dremio Connector and Dremio JDBC Driver
Install the Dremio Connector (ODBC) or Dremio JDBC Driver.
Using the RODBC Package
The RODBC package enables R programs to use compliant ODBC drivers such as Dremio's ODBC driver. For the hostname or IP address, enter the hostname or IP address of one of the coordinator nodes in your cluster.
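The host, port, and credentials all end up in a single ODBC connection string. As a minimal sketch, the string can be assembled by a small helper so that credentials need not be hard-coded. The function name dremio_conn_string and the environment variable names DREMIO_HOST, DREMIO_UID, and DREMIO_PWD are assumptions made for illustration; they are not part of RODBC or Dremio.

```r
# Hypothetical helper (not part of RODBC or Dremio): assemble the ODBC
# connection string from its parts. Port 31010 is used as the default.
dremio_conn_string <- function(host, uid, pwd, port = "31010") {
  sprintf(
    paste0(
      "DRIVER=Dremio Connector;HOST=%s;PORT=%s;UID=%s;PWD=%s;",
      "AUTHENTICATIONTYPE=Basic Authentication;CONNECTIONTYPE=Direct"
    ),
    host, port, uid, pwd
  )
}

# Assumed environment variable names; placeholders are used when unset.
conn_string <- dremio_conn_string(
  host = Sys.getenv("DREMIO_HOST", "<hostname or IP address>"),
  uid  = Sys.getenv("DREMIO_UID", "<username>"),
  pwd  = Sys.getenv("DREMIO_PWD", "<password>")
)
```

The resulting conn_string could then be passed to odbcDriverConnect in place of an inline sprintf call.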
The following R program loads the dataset foo.bar.baz into a data frame and prints some basic statistics about the data using R's summary function:
if (!require(RODBC)) { install.packages("RODBC"); library(RODBC) } # install RODBC on first use
dremio_host <- "<hostname or IP address>"
dremio_port <- "31010"
dremio_uid <- "<username>"
dremio_pwd <- "<password>"
channel <- odbcDriverConnect(sprintf("DRIVER=Dremio Connector;HOST=%s;PORT=%s;UID=%s;PWD=%s;AUTHENTICATIONTYPE=Basic Authentication;CONNECTIONTYPE=Direct", dremio_host, dremio_port, dremio_uid, dremio_pwd))
df <- sqlQuery(channel, "SELECT * FROM foo.bar.baz")
if (is.character(df)) { close(channel); stop(paste(df, collapse = "\n")) } # stop if query failed
### REPLACE WITH YOUR CODE
print(nrow(df)) # print the number of records returned
is_binary <- vapply(df, inherits, logical(1), what = "ODBC_binary") # flag binary columns
df <- df[, !is_binary, drop = FALSE] # remove binary columns (otherwise summary won't work)
print(summary(df)) # print statistics
###
close(channel)
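The binary-column filter in the program above can be tried without a cluster. The sketch below builds a synthetic data frame whose payload column imitates a binary column, assumed here to be a list of raw vectors carrying the class ODBC_binary. The column names and values are invented for illustration.

```r
# Synthetic stand-in for a query result; no Dremio connection is needed.
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# Imitate a binary column: a list of raw vectors tagged "ODBC_binary"
# (an assumption about how such columns are represented).
df$payload <- structure(
  list(as.raw(1), as.raw(2), as.raw(3)),
  class = "ODBC_binary"
)

# Flag binary columns. inherits() also copes with columns that carry
# more than one class, such as timestamps.
is_binary <- vapply(df, inherits, logical(1), what = "ODBC_binary")

# drop = FALSE keeps the result a data frame even if one column remains.
clean <- df[, !is_binary, drop = FALSE]

print(names(clean))
print(summary(clean))
```

Testing class membership with inherits, rather than comparing class names directly, is a design choice for this sketch: it keeps the filter working when a column has several classes.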