Configuring RecordService

Client Configurations

Although you should not need to change the default configuration, you can modify RecordService properties.

To change any of the settings listed below:

  1. In Cloudera Manager, navigate to the RecordService configuration page.
  2. Search for Safety Valve.
  3. In the search results, look for RecordService (Beta) Client Advanced Configuration Snippet (Safety Valve) for recordservice-conf/recordservice-site.xml.
  4. Add or change the value in the field provided. For example, to change recordservice.task.fetch.size to 1000, add the following code:
    <property>
      <name>recordservice.task.fetch.size</name>
      <value>1000</value>
    </property>
    
  5. Click Save Changes.
  6. From the Actions menu, choose Deploy Client Configuration.

For more information, see Modifying Configuration Properties Using Cloudera Manager.

You can adjust the following configuration settings in your RecordService instance.

| Category | Parameter | Description | Default Value |
|---|---|---|---|
| Connectivity | recordservice.planner.hostports | Comma separated list of planner service host/ports. | localhost:12050 |
| Connectivity | recordservice.kerberos.principal | Kerberos principal for the planner service. Required if using Kerberos. | |
| Connectivity | recordservice.planner.retry.attempts | Maximum number of attempts to retry RecordService RPCs with Planner. | 3 |
| Connectivity | recordservice.planner.retry.sleepMs | Sleep between retry attempts with Planner in milliseconds. | 5000 |
| Connectivity | recordservice.planner.connection.timeoutMs | Timeout when connecting to the Planner service in milliseconds. | 30000 |
| Connectivity | recordservice.planner.rpc.timeoutMs | Timeout for Planner RPCs in milliseconds. | 120000 |
| Connectivity | recordservice.worker.retry.attempts | Maximum number of attempts to retry RecordService RPCs with a worker. | 3 |
| Connectivity | recordservice.worker.retry.sleepMs | Sleep in milliseconds between retry attempts with worker. | 5000 |
| Connectivity | recordservice.worker.connection.timeoutMs | Timeout when connecting to the worker in milliseconds. | 10000 |
| Connectivity | recordservice.worker.rpc.timeoutMs | Timeout for Worker RPCs in milliseconds. | 120000 |
| Performance | recordservice.task.fetch.size | Configures the maximum number of records returned when fetching results from the RecordService. If not set, the server default is used. You might need to adjust this value according to the type of workload (for example, MapReduce or Spark), due to differences in data processing speed. | 5000 |
| Resource Management | recordservice.task.memlimit.bytes | Maximum memory the server uses per task. Tasks exceeding this limit are aborted. If not set, the server process limit is used. | -1 (Unlimited) |
| Resource Management | recordservice.task.plan.maxTasks | Hint for maximum number of tasks to generate per PlanRequest. This is not strictly enforced by the server, but is used to determine if task combining should occur. This value might need to be set for large datasets. | -1 (Unlimited) |
| Resource Management (Advanced) | recordservice.task.records.limit | Maximum number of records returned per task. | -1 (Unlimited) |
| Logging (Advanced) | recordservice.worker.server.enableLogging | Enable server logging (logging level from Log4j). | FALSE |
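
As an illustration of how several of these client properties might be combined in the safety valve field, the following sketch lists two planners and shortens the planner retry interval. The host names and the retry values are hypothetical placeholders, not recommendations; only the property names and the default port 12050 come from the table above.

```xml
<!-- Hypothetical planner hosts; replace them with the planners in your cluster. -->
<property>
  <name>recordservice.planner.hostports</name>
  <value>planner1.example.com:12050,planner2.example.com:12050</value>
</property>
<!-- Illustrative values only: retry up to 5 times, waiting 2000 ms between attempts. -->
<property>
  <name>recordservice.planner.retry.attempts</name>
  <value>5</value>
</property>
<property>
  <name>recordservice.planner.retry.sleepMs</name>
  <value>2000</value>
</property>
```

Any property that is not listed in the snippet keeps its default value.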

Server Configurations

The properties listed on the Cloudera Manager RecordService Configuration page are the ones Cloudera considers the most reasonable to change. However, adjusting these values should not be necessary. Very advanced administrators might consider making minor adjustments.

Dynamic Fetch Size

The following properties allow you to adjust dynamic fetch size on the server.

| Category | Parameter | Description | Default Value |
|---|---|---|---|
| Resource Management | rs_compressed_max_fetch_size | Maximum fetch size when scanning compressed text files. | 1000 |
| Resource Management | rs_fetch_size_decrease_factor | Correction factor to decrease fetch size; must be >= 1. | 1.5 |
| Resource Management | rs_fetch_size_increase_factor | Correction factor to increase fetch size; must be > 0 and <= 1. | 0.001 |
| Resource Management | rs_min_fetch_size | The minimum fetch size for the scanner thread. | 500 |
| Resource Management | rs_spare_capacity_correction_factor | Correction factor for spare capacity; must be > 0 and <= 1. | 0.8 |

Kerberos Configuration

No special configuration is required through Cloudera Manager. Enabling Kerberos on the cluster configures everything required.

Sentry Table Configuration

Sentry is configured for you in the RecordService VM. This section describes how to configure Sentry in a non-VM deployment.

Prerequisite

Follow CDH documentation to install Sentry and enable it for Hive (and Impala, if applicable).

See http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/sg_sentry_service_install.html.

Configure Sentry with RecordService

  1. Enable RecordService to read policy metadata from Sentry:
    1. In Cloudera Manager, navigate to the Sentry Configuration page.
    2. In Admin Groups, add the user recordservice.
    3. In Allowed Connecting Users, add the user recordservice.
  2. Save changes.
  3. Enable Sentry for RecordService.
    1. In Cloudera Manager, navigate to RecordService Configuration.
    2. Select the Sentry-1 service.
    3. If you are using Cloudera Manager 5.7 with RecordService 0.3.0, enter the following settings in the Sentry Advanced Configuration Snippet (Safety Valve) field.
    <property>
        <name>hive.sentry.server</name>
        <value>server1</value>
    </property>
    

    If you are using a version of Cloudera Manager earlier than 5.7, enter the following settings in the Sentry Advanced Configuration Snippet (Safety Valve) field instead.

    <property>
        <name>sentry.service.server.principal</name>
        <value>sentry/_HOST@principal</value>
    </property>
    
    <property>
        <name>sentry.service.security.mode</name>
        <value>kerberos</value>
    </property>
    
    <property>
        <name>sentry.service.client.server.rpc-address</name>
        <value>hostname</value>
    </property>
    
    <property>
        <name>sentry.service.client.server.rpc-port</name>
        <value>portnum</value>
    </property>
    
    <property>
        <name>hive.sentry.server</name>
        <value>server1</value>
    </property>
    
  4. Save your changes.
  5. Restart the Sentry and RecordService services.

Delegation Token Configuration

No special configuration is required with Cloudera Manager. Delegation tokens are enabled automatically if the cluster is Kerberized.

By default, RecordService persists state in ZooKeeper under the /recordservice directory. If this directory is already in use, you can specify a different directory with recordservice.zookeeper.znode. This is a Hadoop XML configuration property that you can add to the advanced service configuration snippet.
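
For example, a snippet along the following lines would move the RecordService state to a different ZooKeeper directory. The directory name /recordservice-cluster2 is a hypothetical placeholder chosen for illustration; use any path that is not already in use.

```xml
<!-- Hypothetical directory name; the default is /recordservice. -->
<property>
  <name>recordservice.zookeeper.znode</name>
  <value>/recordservice-cluster2</value>
</property>
```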

Planner Auto Discovery Configuration

RecordService 0.3.0 and higher includes the Planner Auto Discovery feature. With this feature, you do not need to specify a list of planner host/ports for your RecordService clients through the configuration property recordservice.planner.hostports. Instead, you can use the property recordservice.zookeeper.connectString, which specifies the connection string for the ZooKeeper session used to store information about planner/worker membership (as well as other information, such as delegation tokens). Both the client and the server use this property.

Planner Auto Discovery keeps client-side applications independent of changes in the planner configuration. Planners might come and go in the cluster, but the client-side application uses the same configuration settings.

If you use Cloudera Manager to manage the cluster, this property is automatically propagated to the client-side configurations through the CSD.

Setting the property enables Planner Auto Discovery. A RecordService job first contacts ZooKeeper to fetch a list of available RecordService planners, and then uses those resources for planning. If this step fails, the job reads recordservice.planner.hostports and uses static membership information.
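
On a cluster that is not managed by Cloudera Manager, a client configuration that uses auto discovery with a static fallback might look like the following sketch. The ZooKeeper and planner host names are hypothetical placeholders, and the ports are assumptions: 2181 is the conventional ZooKeeper client port, and 12050 is the documented planner default.

```xml
<!-- Enables Planner Auto Discovery. Hypothetical ZooKeeper ensemble, given as comma-separated host:port pairs. -->
<property>
  <name>recordservice.zookeeper.connectString</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
<!-- Static membership, read only if fetching the planner list from ZooKeeper fails. -->
<property>
  <name>recordservice.planner.hostports</name>
  <value>planner1.example.com:12050</value>
</property>
```

Keeping recordservice.planner.hostports in place is optional, but it gives the job a known planner to contact when the ZooKeeper lookup does not succeed.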

Additional properties provide tuning options.

| Property | Description |
|---|---|
| recordservice.zookeeper.connectTimeoutMillis | Specifies a timeout when initiating a ZooKeeper connection. |
| recordservice.zookeeper.znode | Specifies the root ZooKeeper directory. Default is /recordservice. |

Both the client and the server use these properties.
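
As a rough sketch, the connection timeout could be set as shown below. The value 30000 is purely illustrative, and it assumes, as the property name suggests, that the timeout is expressed in milliseconds; the source does not state a default for this property.

```xml
<!-- Illustrative value only; assumed to be in milliseconds based on the property name. -->
<property>
  <name>recordservice.zookeeper.connectTimeoutMillis</name>
  <value>30000</value>
</property>
```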