On this page

    Datasets

    To analyze data from a data source, files must be formatted into tables in Dremio Cloud. You can then create views from one or more tables. The following figure shows how data is represented in Dremio Cloud.

    Tables

    A table contains the data from your source, formatted as rows and columns. A table cannot be modified by Dremio Cloud. A table is represented by the icon .

    note:

    Since a table represents data stored in a physical location, it may be referred to in the Dremio app, APIs, or SQL commands as PDS (physical dataset).

    Views

    A view is a virtual table, created by running SQL statements or functions on a table or another view.

    Views are not copies of the data so they use very little memory and always reflect the current state of the parent table or view from which they are derived. A view is represented by the icon .

    note:

    Since a view is a virtual table, it may be referred to in the Dremio app, APIs, or SQL commands as VDS (virtual dataset).

    Layering Datasets

    Dremio recommends that, when you create tables and views, you create them in layers:

    • The bottom or first layer consists of your tables.
    • In the second layer are views, one for each table, that do lightweight preparation of data for views in the next layers. Here, administrators might create views that do limited casting, type conversion, and field renaming, and redacting sensitive information, among other prepping operations. Administrators can also add security by subsetting both rows and fields that users in other layers are not allowed to access. The data has been lightly scrubbed and restricted to the group of people who have the business knowledge that lets them use these views to build higher-order views that data consumers can use. Then, admins grant access to these views to users who create views in the next layer, without being able to see the raw data in the tables.
    • In the third layer, users create views that perform joins and other expensive operations. This layer is where the intensive work on data is performed. These users then create reflections (raw, aggregation, or both) from their views.
    • In the fourth layer, users can create lightweight views for dashboards, reports, and visualization tools. They can also create aggregation reflections, as needed.

    Managing Datasets

    The following section describes how to create and manage datasets.

    Formatting a Table

    For formatting a file or folder as a table, see formatting datasets.

    Viewing a Table

    To view a table:

    1. In the Datasets UI, navigate to a data source.

    2. If you have already formatted your dataset, then select that dataset. Otherwise, you can format a file or folder as a table.

    3. The dataset page displays the table fields of the table.

    4. To see the metadata of a table, click the table icon that is on the top-left of the page.

      The metadata fields of a table are described below.

      Metadata FieldDescription
      DescendantsThe number of views that are created from this table.
      FieldsColumns of the table in the table. Field types are also displayed. Columns with partitions (if any) have a separate partition icon and field types, dir0, dir1, dir2 as shown in the above figure.
      JobsThe number of jobs run on the table.

    Creating a View

    You can create a view from an existing table or view. Create a view from a table by performing the following steps:

    1. To create a view from a table, you can transform the data as required.

    2. Compose the query as required and click Run to validate the query. After running the query, click the Save icon in the top-right corner of the page. In the drop-down list, click Save As….

      As tables cannot be modified, Save is disabled in the drop-down list.

    3. Clicking Save As… prompts you to name the new view and select from a list of spaces where it will be stored. Provide a name and select one space to store it. If the space (path) is not changed, the view gets saved in your home space.

    4. To see the new view, navigate to the space that contains the view on the Datasets page. Click the view to see the data.

    5. You can edit the original query that created the view. Click the Edit Original Query button on top of the query editor. You can edit the query and click the Save icon, and then click Save in the drop-down list.

      You can also create a new view after editing the original query by clicking the Save icon and then clicking Save As…. Save it as a new view by providing a name.

    Example of Creating a View

    This example shows how to create a view from the SF weather 2018-2019.csv table by performing the following steps:

    1. In the Datasets UI, trace the SF weather 2018-2019.csv table in the corresponding source folder and click it. To create a view, change the data type of the ELEVATION into number by running the following query.

      SELECT
        TO_NUMBER(ELEVATION, '##.##') AS ELEVATION
      FROM
        "SF weather 2018-2019.csv"
      
    2. Click the Save icon in the top-right corner of the page, and then click Save As… from the drop-down list. As tables cannot be modified, Save is disabled in the drop-down list.

    3. Clicking Save As… prompts you to name the new view and select from a list of spaces where it will be stored. Provide a name and select a space to store it. Let us name the view as SFWeatherElevation. If you do not specify a space (path), the view is saved in your home space.

    4. On the Datasets page, navigate to the space that contains the SFWeatherElevation view. Click the view to see the query and data. You can see the table with a single Elevation column and the following query.

      SELECT
        *
      FROM
        SFWeatherElevation
      

    Exploring the Dataset Components

    For a given dataset, by default, you can see the tabular data in the Data component. Catalog, Graph, and Reflections are other components.

    Catalog

    Catalog lets you add useful information about a specific dataset in its wiki, and add searchable tags. This enhances team collaboration as other users can search the tags to trace a specific dataset.

    Graph

    The graph shows the data lineage of a dataset. Data lineage refers to the record of how data got into a specific location and the intermediate steps and transformations that took place in its transit.

    Reflections

    Dremio Cloud maintains physically optimized representations of source data known as data reflections.

    Starring a Dataset

    You can star a dataset that you use frequently, which adds the dataset to your Starred list for easier access.

    To star a dataset:

    1. Click on the This is the icon that represents the SQL runner. icon in the side navigation bar to navigate to the SQL Runner.
    2. In the Data catalog, find the dataset.
    3. Click on the This is the icon for starring a dataset. (Star) icon that appears next to the dataset. The dataset will appear on your Starred list.

    This is a screenshot showing how to star a dataset for the Starred list.

    In addition to starring datasets, you can star spaces, sources, and other objects in the data catalog. The Starred list can hold up to 25 entities at a time, and each starred item remains on the list even if you open a new browser or clear the cache. To unstar a catalog object, click the Star icon again.

    notes:

    • The starring option is not available for datasets in the Nessie repository.
    • Starring is different than pinning items. You can only pin spaces and sources in the Datasets UI, and pinned items are not saved if you open a new browser or clear the cache.

    Removing a Table

    Removing the format removes a table.

    Removing a View

    Perform the following steps to remove a view:

    1. In the Datasets UI, go to the home space (home icon) or any space under Spaces, where your view is located.

    2. Under the Action column of a view that you want to remove, click the ellipses () icon. From the list of actions, click Remove. Confirm that you want to remove the view.

    warning:

    If you are removing a dataset with children, you get a warning. Removing a dataset with children leaves you with disconnected views that you can no longer query.