An R program can connect to a Dremio cluster through ODBC or JDBC, so you can load a Dremio dataset or the result of a SQL query directly into an R data frame.

Install the Dremio Connector and Dremio JDBC Driver

Install the Dremio Connector (the ODBC driver) or the Dremio JDBC Driver on the machine that runs R.

Using the RODBC Package

The RODBC package lets R programs use any compliant ODBC driver, such as Dremio's ODBC driver. For the host, enter the hostname or IP address of one of the coordinator nodes in your cluster.

The following R program loads the dataset foo.bar.baz into a data frame and prints some basic statistics about the data using R's summary function:

if (!require(RODBC)) { install.packages("RODBC"); library(RODBC) } # package name must be quoted for install.packages

dremio_host <- "<hostname or IP address>"
dremio_port <- "31010"
dremio_uid <- "<username>"
dremio_pwd <- "<password>"

channel <- odbcDriverConnect(sprintf("DRIVER=Dremio Connector;HOST=%s;PORT=%s;UID=%s;PWD=%s;AUTHENTICATIONTYPE=Basic Authentication;CONNECTIONTYPE=Direct", dremio_host, dremio_port, dremio_uid, dremio_pwd))

df <- sqlQuery(channel, "SELECT * FROM foo.bar.baz")
if (is.character(df)) { close(channel); stop(paste(df, collapse = "\n")) } # stop if query failed

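print(nrow(df)) # print the number of records returned
is_binary <- vapply(df, inherits, logical(1), what = "ODBC_binary") # flag binary columns
df <- df[, !is_binary, drop = FALSE] # remove binary columns (otherwise summary won't work)
print(summary(df)) # print statistics
close(channel) # release the ODBC connection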
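
The binary-column filter above can be tried without a Dremio connection. The following is a minimal sketch, assuming a made-up data frame in place of a query result and a hand-built list column tagged with the class "ODBC_binary" to imitate how RODBC marks binary data. The column names (id, score, payload) and the variables is_binary and clean are hypothetical.

```r
# Toy stand-in for a query result; the column names and values are made up.
df <- data.frame(id = 1:3, score = c(0.5, 1.5, 2.5))

# Imitate a binary column: a list of raw vectors tagged with class "ODBC_binary".
df$payload <- structure(list(as.raw(1), as.raw(2), as.raw(3)), class = "ODBC_binary")

# Flag every column that carries the "ODBC_binary" class.
is_binary <- vapply(df, inherits, logical(1), what = "ODBC_binary")

# Keep the other columns; drop = FALSE preserves the data frame
# even when only one column remains.
clean <- df[, !is_binary, drop = FALSE]

print(names(clean))
print(summary(clean))
```

Testing with inherits rather than comparing the output of class directly also copes with columns that carry more than one class, such as timestamps.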

