Cloud Cache

Cloud caching provides a local cache, one per executor node, for Parquet files. The cache persists across reboots.

Data Sources

Cloud caching is implemented for Parquet files associated with the following data sources:

  • Amazon S3
  • ADLS (Gen 1)
  • Azure Storage (ADLS Gen 2) - v2 only

Statistics and Usage

Dremio lets you track the cache's space usage and effectiveness. The following information is available:

  • Global usage statistics -- Provides information on the cache's effectiveness globally and for each data source.
  • Query usage statistics -- Provides information on cache usage for each query, such as whether a specific query uses the cache and by how much.

Cache manager statistics are available in the following system tables:

  • sys.cache.mount_points
  • sys.cache.storage_plugins
  • sys.cache.datasets
  • sys.cache.objects

sys.cache.mount_points

This table shows statistics for each mount point.

hostname                                    mount_point_path  mount_point_id  sub_dir_count  approx_file_count  max_space    used_space  avg_read_time_nanos  avg_write_time_nanos
ip-172-31-13-87.us-west-2.compute.internal  /cachemanager/cm  0               128            1947               73014444032  2041577472  190627               906810
ip-172-31-6-122.us-west-2.compute.internal  /cachemanager/cm  0               128            1520               73014444032  1593835520  274164               951066

where:

  • hostname: Hostname of the executor node.
  • mount_point_path: Path to the mount point (one row per mount point).
  • mount_point_id: ID of the mount point.
  • sub_dir_count: Number of sub-directories under the mount point.
  • approx_file_count: Approximate number of file chunks cached.
  • max_space: Total space available.
  • used_space: Total space used.
  • avg_read_time_nanos: Average time taken to read a cached block from disk (in nanoseconds).
  • avg_write_time_nanos: Average time taken to write a cached block to disk (in nanoseconds).
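
As an illustration of how these columns can be combined, the following Python sketch derives cache utilization and average latencies in milliseconds from the two sample rows above. It assumes that max_space and used_space are reported in the same unit. The function name summarize_mount_point is hypothetical, and in practice the rows would come from querying sys.cache.mount_points with whichever client you use.

```python
# Sample values copied from the sys.cache.mount_points output above.
rows = [
    {
        "hostname": "ip-172-31-13-87.us-west-2.compute.internal",
        "max_space": 73014444032,
        "used_space": 2041577472,
        "avg_read_time_nanos": 190627,
        "avg_write_time_nanos": 906810,
    },
    {
        "hostname": "ip-172-31-6-122.us-west-2.compute.internal",
        "max_space": 73014444032,
        "used_space": 1593835520,
        "avg_read_time_nanos": 274164,
        "avg_write_time_nanos": 951066,
    },
]

NANOS_PER_MILLI = 1_000_000


def summarize_mount_point(row):
    """Hypothetical helper: turn one mount point row into derived metrics."""
    return {
        "hostname": row["hostname"],
        # Fraction of the mount point's cache space currently in use
        # (assumes both columns use the same unit).
        "used_fraction": row["used_space"] / row["max_space"],
        "avg_read_ms": row["avg_read_time_nanos"] / NANOS_PER_MILLI,
        "avg_write_ms": row["avg_write_time_nanos"] / NANOS_PER_MILLI,
    }


summaries = [summarize_mount_point(r) for r in rows]
for s in summaries:
    print(
        f"{s['hostname']}: {s['used_fraction']:.1%} used, "
        f"read {s['avg_read_ms']:.3f} ms, write {s['avg_write_ms']:.3f} ms"
    )
```

A low used fraction together with stable read times suggests the cache has headroom; the thresholds worth alerting on depend on your workload.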

sys.cache.storage_plugins

This table shows cloud cache information for each source on each mount point.

hostname                                    storage_plugin_name  storage_plugin_id  sub_dir_count  approx_file_count  approx_size_bytes
ip-172-31-6-122.us-west-2.compute.internal  s3                   0                  0              571                598736896
ip-172-31-6-122.us-west-2.compute.internal  s31                  0                  0              366                383778816
ip-172-31-6-122.us-west-2.compute.internal  s33                  0                  0              583                611319808
ip-172-31-6-122.us-west-2.compute.internal  s3                   0                  0              561                588251136
ip-172-31-6-122.us-west-2.compute.internal  s31                  0                  0              549                575668224
ip-172-31-6-122.us-west-2.compute.internal  s33                  0                  0              837                877658112

where:

  • hostname: Hostname of the executor node.
  • storage_plugin_name: Name of the storage plugin.
  • storage_plugin_id: ID for the storage plugin.
  • approx_size_bytes: Amount of cache space used to cache data from this storage plugin.
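
Because a storage plugin can appear in several rows, it is often useful to total the cached bytes per plugin. The sketch below does this in Python for the sample rows above. The function name total_by_plugin is hypothetical, and the rows are hard-coded here only to keep the example self-contained.

```python
from collections import defaultdict

# (storage_plugin_name, approx_file_count, approx_size_bytes)
# copied from the sys.cache.storage_plugins output above.
rows = [
    ("s3", 571, 598736896),
    ("s31", 366, 383778816),
    ("s33", 583, 611319808),
    ("s3", 561, 588251136),
    ("s31", 549, 575668224),
    ("s33", 837, 877658112),
]


def total_by_plugin(rows):
    """Hypothetical helper: sum file counts and cached bytes per plugin."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for plugin, file_count, size_bytes in rows:
        totals[plugin]["files"] += file_count
        totals[plugin]["bytes"] += size_bytes
    return dict(totals)


totals = total_by_plugin(rows)
# List plugins from the largest cache footprint to the smallest.
for plugin, t in sorted(totals.items(), key=lambda kv: kv[1]["bytes"], reverse=True):
    print(f"{plugin}: {t['files']} files, {t['bytes'] / 2**20:.0f} MiB cached")
```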

sys.cache.datasets

This table shows information about each dataset that is cached on executor nodes.

hostname dataset_name storage_plugin_name approx_file_count percent_data_25 percent_data_50 percent_data_75 percent_data_100
ip-172-31-6-122.us-west-2.compute.internal s33."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem s3 558 2019-09-11T10:07:11.387 2019-09-11T10:07:11.387 2019-09-11T10:07:11.387 2019-09-11T10:07:11.387
ip-172-31-6-122.us-west-2.compute.internal s331."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem s31 366 2019-09-11T09:58:52.770 2019-09-11T09:58:52.770 2019-09-11T09:58:52.770 2019-09-11T09:58:52.770
ip-172-31-6-122.us-west-2.compute.internal s333."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem s33 25 2019-09-11T10:07:12.357 2019-09-11T10:07:12.357 2019-09-11T10:07:12.357 2019-09-11T10:07:12.357
ip-172-31-6-122.us-west-2.compute.internal s33."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem s3 571 2019-09-16T05:49:49.558 2019-09-16T05:49:49.558 2019-09-16T05:49:49.558 2019-09-16T05:49:49.558
ip-172-31-6-122.us-west-2.compute.internal s33."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem s33 837 2019-09-11T10:07:12.800 2019-09-11T10:07:12.800 2019-09-11T10:07:12.800 2019-09-11T10:07:12.800
ip-172-31-6-122.us-west-2.compute.internal s31."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem s31 549 2019-09-11T09:58:54.345 2019-09-11T09:58:54.345 2019-09-11T09:58:54.345 2019-09-11T09:58:54.345
ip-172-31-6-122.us-west-2.compute.internal s3."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem s3 561 2019-09-16T05:49:47.060 2019-09-16T05:49:47.060 2019-09-16T05:49:47.060 2019-09-16T05:49:47.060

where:

  • hostname: Hostname of the executor node.
  • dataset_name: Dataset name.
  • storage_plugin_name: Name of the storage plugin to which this dataset belongs.
  • approx_file_count: Approximate number of chunks from this dataset cached.
  • percent_data_25: The most recent time at which 25% of the dataset was accessed.
  • percent_data_50: The most recent time at which 50% of the dataset was accessed.
  • percent_data_75: The most recent time at which 75% of the dataset was accessed.
  • percent_data_100: The most recent time at which 100% of the dataset was accessed.
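
One possible use of the access timestamps is to find cached datasets that have not been read recently. The Python sketch below filters the sample rows above by percent_data_100 against a cutoff. It assumes that an older timestamp means colder data, which the descriptions above suggest but do not state outright. The function name not_accessed_since and the cutoff date are hypothetical.

```python
from datetime import datetime

SUFFIX = '."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem'
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# (dataset_name, storage_plugin_name, approx_file_count, percent_data_100)
# copied from the sys.cache.datasets output above.
rows = [
    ("s33" + SUFFIX, "s3", 558, "2019-09-11T10:07:11.387"),
    ("s331" + SUFFIX, "s31", 366, "2019-09-11T09:58:52.770"),
    ("s333" + SUFFIX, "s33", 25, "2019-09-11T10:07:12.357"),
    ("s33" + SUFFIX, "s3", 571, "2019-09-16T05:49:49.558"),
    ("s33" + SUFFIX, "s33", 837, "2019-09-11T10:07:12.800"),
    ("s31" + SUFFIX, "s31", 549, "2019-09-11T09:58:54.345"),
    ("s3" + SUFFIX, "s3", 561, "2019-09-16T05:49:47.060"),
]


def not_accessed_since(rows, cutoff):
    """Hypothetical helper: rows whose percent_data_100 is older than cutoff."""
    stale = []
    for name, plugin, file_count, accessed in rows:
        when = datetime.strptime(accessed, TIME_FORMAT)
        if when < cutoff:
            stale.append((name, plugin, file_count, when))
    # Oldest access first.
    return sorted(stale, key=lambda r: r[3])


stale = not_accessed_since(rows, datetime(2019, 9, 12))
for name, plugin, file_count, when in stale:
    print(f"{when.isoformat()}  {plugin:<4} {file_count:>4} chunks  {name}")
```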

sys.cache.objects

This table shows information about all the objects/files created on each mount point.

hostname plugin dataset path version offset atime
ip-172-31-6-122.us-west-2.compute.internal s31 s31."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/1_2_9.parquet 1529799187000 110100480 2019-09-11T09:58:51.753
ip-172-31-6-122.us-west-2.compute.internal s31 s31."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/1_2_9.parquet 1529799187000 145752064 2019-09-11T09:58:51.724
ip-172-31-6-122.us-west-2.compute.internal s33 s33."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/1_3_2.parquet 1529799199000 96468992 2019-09-11T10:07:07.919
ip-172-31-6-122.us-west-2.compute.internal s33 s33."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/1_5_12.parquet 1529799241000 36700160 2019-09-11T10:07:09.865
ip-172-31-6-122.us-west-2.compute.internal s33 s33."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/1_2_9.parquet 1529799187000 148897792 2019-09-11T10:07:09.624
ip-172-31-6-122.us-west-2.compute.internal s31 s31."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/1_5_12.parquet 1529799241000 110100480 2019-09-11T09:58:51.814
ip-172-31-6-122.us-west-2.compute.internal s33 s33."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/1_5_12.parquet 1529799241000 33554432 2019-09-11T10:07:09.806
ip-172-31-6-122.us-west-2.compute.internal s3 s3."datasets.dremio.com".tpch.sf10.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf10/parquet_typed_dict/lineitem/1_3_1.parquet 1529795646000 58720256 2019-09-16T05:49:46.137
ip-172-31-6-122.us-west-2.compute.internal s31 s31."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/1_2_9.parquet 1529799187000 149946368 2019-09-11T09:58:52.355
ip-172-31-6-122.us-west-2.compute.internal s33 s33."datasets.dremio.com".tpch.sf100.parquet_typed_dict.lineitem /datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/1_2_9.parquet 1529799187000 100663296 2019-09-11T10:07:08.319

where:

  • hostname: Hostname of the executor node.
  • plugin: Name of the storage plugin.
  • dataset: Dataset ID.
  • path: File path.
  • version: Version of the file cached.
  • offset: Offset of the file chunk cached.
  • atime: Last access time for this file chunk.
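
Since each row describes a single cached chunk, grouping rows by plugin and path shows how many chunks of each file are cached and when the file was last touched. The Python sketch below does this for the sample rows above. The function name chunks_per_file is hypothetical, and the path prefixes are factored into constants only for brevity.

```python
from collections import defaultdict
from datetime import datetime

SF100 = "/datasets.dremio.com/tpch/sf100/parquet_typed_dict/lineitem/"
SF10 = "/datasets.dremio.com/tpch/sf10/parquet_typed_dict/lineitem/"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# (plugin, path, offset, atime) copied from the sys.cache.objects output above.
rows = [
    ("s31", SF100 + "1_2_9.parquet", 110100480, "2019-09-11T09:58:51.753"),
    ("s31", SF100 + "1_2_9.parquet", 145752064, "2019-09-11T09:58:51.724"),
    ("s33", SF100 + "1_3_2.parquet", 96468992, "2019-09-11T10:07:07.919"),
    ("s33", SF100 + "1_5_12.parquet", 36700160, "2019-09-11T10:07:09.865"),
    ("s33", SF100 + "1_2_9.parquet", 148897792, "2019-09-11T10:07:09.624"),
    ("s31", SF100 + "1_5_12.parquet", 110100480, "2019-09-11T09:58:51.814"),
    ("s33", SF100 + "1_5_12.parquet", 33554432, "2019-09-11T10:07:09.806"),
    ("s3", SF10 + "1_3_1.parquet", 58720256, "2019-09-16T05:49:46.137"),
    ("s31", SF100 + "1_2_9.parquet", 149946368, "2019-09-11T09:58:52.355"),
    ("s33", SF100 + "1_2_9.parquet", 100663296, "2019-09-11T10:07:08.319"),
]


def chunks_per_file(rows):
    """Hypothetical helper: group cached chunk offsets by (plugin, path)."""
    grouped = defaultdict(lambda: {"offsets": [], "last_access": None})
    for plugin, path, offset, atime in rows:
        accessed = datetime.strptime(atime, TIME_FORMAT)
        entry = grouped[(plugin, path)]
        entry["offsets"].append(offset)
        # Keep the most recent access time seen for this file.
        if entry["last_access"] is None or accessed > entry["last_access"]:
            entry["last_access"] = accessed
    for entry in grouped.values():
        entry["offsets"].sort()
    return dict(grouped)


files = chunks_per_file(rows)
for (plugin, path), info in sorted(files.items()):
    print(plugin, path, len(info["offsets"]), info["last_access"].isoformat())
```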
