On this page

    Datasets

    To analyze data from a data source, files must be formatted into tables in Dremio Sonar. You can then create views from one or more tables. The following figure shows how data is represented in Sonar.

    Tables

    A table contains the data from your source, formatted as rows and columns. A table cannot be modified by Dremio Sonar. A table is represented by the icon .

    Maximum Number of Leaf Columns in a Table

    The term leaf columns refers to all columns in a table and all leaf-level fields that are used in complex data types in the table. Tables in Dremio are allowed a maximum of 6,400 leaf columns. For example, if a table has 1,000 columns and the data type of one column is a STRUCT that has three fields, the count of leaf columns is 1,003.

    Maximum Number of Leaf Columns Scanned Per Table in a Query

    The maximum number of leaf columns that a query is allowed to scan per table or per reflection is 800. If a query attempts to scan more than this limit, then you receive this error message:

    At most, <limit> columns including leaf level fields of complex type are allowed to be scanned, but the query is scanning <number>. Please include the columns you want to be returned from the query and try again.

    note:

    If you are trying to use a SELECT * query to view all columns in a table, you can avoid scanning too many columns by using the DESCRIBE TABLE command, which returns the name and other details about all columns in the table.

    Best Practices for Querying Wide Tables

    Dremio recommends these two best practices:

    • Create views for querying subsets of the leaf columns in wide tables.
      Users who run ad hoc queries directly against wide tables might inadvertently create queries that need to scan more than 800 columns per table. To help users avoid failed queries, create views on subsets of the leaf columns in the wide table and give users access to those views.
    • Create reflections to satisfy queries on wide tables.
      It is best to restrict the leaf columns included in reflections to those that are most frequently queried. Doing so conserves the resources that are required for storing and refreshing reflections.

    Views

    A view is a view derived from tables or other views.

    Views are not copies of the data so they use very little memory and always reflect the current state of the parent tables or views they are derived from. A view is represented by this icon: .

    Layering Tables and Views

    Dremio recommends that, when you create tables and views, you create them in layers:

    • The bottom or first layer consists of your tables.
    • In the second layer are views, one for each table, that do lightweight preparation of data for views in the next layers. Here, administrators might create views that do limited casting, type conversion, and field renaming, and redacting sensitive information, among other prepping operations. Administrators can also add security by subsetting both rows and fields that users in other layers are not allowed to access. The data has been lightly scrubbed and restricted to the group of people who have the business knowledge that lets them use these views to build higher-order views that data consumers can use. Then, admins grant access to these views to users who create views in the next layer, without being able to see the raw data in the tables.
    • In the third layer, users create views that perform joins and other expensive operations. This layer is where the intensive work on data is performed. These users then create reflections (raw, aggregation, or both) from their views.
    • In the fourth layer, users can create lightweight views for dashboards, reports, and visualization tools. They can also create aggregation reflections, as needed.

    Managing Tables and Views

    The following section describes how to create and manage tables and views.

    Formatting a Table

    For formatting a file or folder as a table, see formatting tables and views.

    Viewing a Table

    To view a table:

    1. On the Datasets page, navigate to a data source.

    2. If you have already formatted your table, then select that table. Otherwise, you can format a file or folder as a table.

    3. Click Run or Preview to query the dataset, the Data tab displays the table fields.

    Creating a View

    You can create a view from an existing table or view. Create a view from a table by performing the following steps:

    1. To create a view from a table, you can transform the data as required.

    2. Compose the query as required and click Run to validate the query. After running the query, click the Save icon in the top-right corner of the page. In the drop-down list, click Save As….

      As tables cannot be modified, Save is disabled in the drop-down list.

    3. Clicking Save As… prompts you to name the new view and select from a list of spaces where it will be stored. Provide a name and select one space to store it. If the space (path) is not changed, the view gets saved in your home space.

    4. To see the new view, navigate to the space that contains the view on the Datasets page. Click the view to see the data.

    5. You can edit the original query that created the view. Click the Edit Original Query button on top of the query editor. You can edit the query and click the Save icon, and then click Save in the drop-down list.

      You can also create a new view after editing the original query by clicking the Save icon and then clicking Save As…. Save it as a new view by providing a name.

    Example of Creating a View

    This example shows how to create a view from the SF weather 2018-2019.csv table by performing the following steps:

    1. On the Datasets page, trace the SF weather 2018-2019.csv table in the corresponding source folder and click it. To create a view, change the data type of the ELEVATION into number by running the following query.

      Query to change ELEVATION data type
      SELECT
        TO_NUMBER(ELEVATION, '##.##') AS ELEVATION
      FROM
        "SF weather 2018-2019.csv"
      
    2. Click the Save icon in the top-right corner of the page, and then click Save As… from the drop-down list. As tables cannot be modified, Save is disabled in the drop-down list.

    3. Clicking Save As… prompts you to name the new view and select from a list of spaces where it will be stored. Provide a name and select a space to store it. Let us name the view as SFWeatherElevation. If you do not specify a space (path), the view is saved in your home space.

    4. On the Datasets page, navigate to the space that contains the SFWeatherElevation view. Click the view to see the query and data. You can see the table with a single Elevation column and the following query.

      Query for SFWeatherElevation
      SELECT
        *
      FROM
        SFWeatherElevation
      

    Starring a Table or View

    You can star a table or view that you use frequently, which adds the table or view to your Starred list for easier access.

    To star a table or view:

    1. Click on the This is the icon that represents the SQL runner. icon in the side navigation bar to navigate to the SQL Runner.

    2. In the Data panel, find the dataset.

    3. Click on the This is the icon for starring a dataset. (Star) icon that appears next to the dataset. The dataset will appear on your Starred list.

    This is a screenshot showing how to star a table or view for the Starred list.

    In addition to starring tables and views, you can star spaces, sources, and other objects in the data catalog. The Starred list can hold up to 25 entities at a time, and each starred item remains on the list even if you open a new browser or clear the cache. To unstar a catalog object, click the Star icon again.

    notes:

    • The starring option is not available for datasets in the Nessie repository.
    • Starring is different than pinning items. You can only pin spaces and sources on the Datasets page, and pinned items are not saved if you open a new browser or clear the cache.

    Removing a Table

    Removing the formatting on a table will revert the table to the folder or file format that it was originally in. For the steps, see Removing Formatting on Data.

    Removing a View

    Perform the following steps to remove a view:

    1. On the Datasets page, go to the Homespace This is the icon that represents the Homespace. (house) icon or any space under Spaces, where your view is located.

    2. Under the Action column of a view that you want to remove, click the ellipses () icon. From the list of actions, click Remove. Confirm that you want to remove the view.

    warning:

    If you are removing a table or view with children, you get a warning. Removing a table or view with children leaves you with disconnected views that you can no longer query.

    Exploring Datasets

    For a given dataset, by default, you can see the data in a dataset in the Data tab. The other tabs for viewing datasets are shown below:

    This is a screenshot showing the Datasets page.

    LocationDescription
    1Catalog lets you add useful information about a specific dataset in its wiki, and add searchable labels. This enhances team collaboration as other users can search the labels to trace a specific dataset.
    2Graph shows the data lineage of a dataset. Data lineage refers to the record of how data got into a specific location and the intermediate steps and transformations that took place in its transit.
    3Reflections are physically optimized representations of source data.

    Opening Datasets

    By clicking on a dataset in the Data panel, you can view or edit the DDL (or original SQL) statement that was used for the dataset, which has been pre-populated in the SQL editor on the Datasets page.

    The SQL editor will pre-populate the DDL statement as long as you have privilege access for the dataset. If you don’t have privilege access, the default SELECT * FROM <name of your dataset> statement will be pre-populated instead, as shown in this image:

    This screenshot is displaying the SQL editor within the Datasets page.

    LocationDescription
    1The Data panel is used to explore your data catalog.
    2Collapsing the Data panel hides the panel from view.
    3The Data component on the Datasets page is where the SQL editor is located.

    note:

    When opening a dataset, a preview of the results will not render by default. The results will only load if you explicitly click Run or Preview.

    Viewing the Metadata of a Dataset

    Wherever a dataset is referenced in Dremio Cloud, you can view a metadata card with details about the dataset.

    To view the metadata, hover over a dataset to see a metadata card appear with details about the dataset. The components of the metadata card are described below.

    This screenshot is displaying the metadata card.

    LocationDescription
    1Icon represents the dataset format as a table or a view .
    2Card title displays the dataset name.
    3Click the SQL Runner icon to query the dataset on the SQL Runner page.
    4Click the Datasets icon to open the dataset on the Datasets page to further view the metadata and edit the definition of the dataset.
    5The path of the dataset shows where the dataset is located.
    6Displays any labels that have been applied to the dataset.
    7Columns: The number of columns in the dataset.
    8Jobs: The number of jobs run on the dataset in the last 30 days, which links to the Jobs page.
    9Descendants: The number of views that are created from the dataset.
    10Open Details: Click this link to open the dataset details panel on the right side of the browser window. In the details panel you can add or remove labels on the dataset and add or modify wiki content.
    11Columns: The number of columns in the dataset, including the name and data type of each column. Columns with partitions (if any) use a separate partition icon. See Data Types for more information.

    note:

    When viewing a metadata card in the SQL Runner’s data panel, the Columns section at the bottom of the card is not displayed because columns can be displayed by expanding the dataset.