Although you should not need to change the default configuration, you can modify RecordService properties.
To change any of the settings listed below:
- In Cloudera Manager, navigate to the RecordService configuration page.
- Search for RecordService (Beta) Client Advanced Configuration Snippet (Safety Valve) for recordservice-conf/recordservice-site.xml.
- Add or change the value in the field provided. For example, to set recordservice.task.fetch.size to 1000, add the following:
  <property>
    <name>recordservice.task.fetch.size</name>
    <value>1000</value>
  </property>
- Click Save Changes.
- From the Actions menu, choose Deploy Client Configuration.
For more information, see Modifying Configuration Properties Using Cloudera Manager.
You can adjust the following configuration settings in your RecordService instance.
| Category | Property | Description | Default |
|---|---|---|---|
| Connectivity | recordservice.planner.hostports | Comma-separated list of planner service host:port pairs. | localhost:12050 |
| Connectivity | recordservice.kerberos.principal | Kerberos principal for the planner service. Required if using Kerberos. | |
| Connectivity | recordservice.planner.retry.attempts | Maximum number of attempts to retry RecordService RPCs with the planner. | 3 |
| Connectivity | recordservice.planner.retry.sleepMs | Time to sleep between retry attempts with the planner, in milliseconds. | 5000 |
| Connectivity | recordservice.planner.connection.timeoutMs | Timeout when connecting to the planner service, in milliseconds. | 30000 |
| Connectivity | recordservice.planner.rpc.timeoutMs | Timeout for planner RPCs, in milliseconds. | 120000 |
| Connectivity | recordservice.worker.retry.attempts | Maximum number of attempts to retry RecordService RPCs with a worker. | 3 |
| Connectivity | recordservice.worker.retry.sleepMs | Time to sleep between retry attempts with a worker, in milliseconds. | 5000 |
| Connectivity | recordservice.worker.connection.timeoutMs | Timeout when connecting to a worker, in milliseconds. | 10000 |
| Connectivity | recordservice.worker.rpc.timeoutMs | Timeout for worker RPCs, in milliseconds. | 120000 |
| Performance | recordservice.task.fetch.size | Maximum number of records returned when fetching results from RecordService. If not set, the server default is used. You might need to adjust this value for the type of workload (for example, MapReduce or Spark), because they process data at different speeds. | Not set (server default) |
| Resource Management | recordservice.task.memlimit.bytes | Maximum memory, in bytes, that the server uses per task. Tasks exceeding this limit are aborted. If not set, the server process limit is used. | -1 (unlimited) |
| Resource Management | recordservice.task.plan.maxTasks | Hint for the maximum number of tasks to generate per PlanRequest. The server does not strictly enforce this value, but uses it to determine whether tasks should be combined. You might need to set this value for large datasets. | -1 (unlimited) |
| Resource Management (Advanced) | recordservice.task.records.limit | Maximum number of records returned per task. | -1 (unlimited) |
| Logging (Advanced) | recordservice.worker.server.enableLogging | Enable server logging (the logging level is taken from Log4j). | false |
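As an illustration of the connectivity settings in the table, the following safety-valve snippet lists two planners explicitly and raises the planner retry count. The host names planner1.example.com and planner2.example.com and the retry value of 5 are hypothetical placeholders; port 12050 is the default shown in the table. This is a sketch of the snippet format, not a recommended configuration.

```xml
<!-- Hypothetical planner hosts; 12050 is the default planner port from the table. -->
<property>
  <name>recordservice.planner.hostports</name>
  <value>planner1.example.com:12050,planner2.example.com:12050</value>
</property>
<!-- Illustrative value: retry planner RPCs up to 5 times instead of the default 3. -->
<property>
  <name>recordservice.planner.retry.attempts</name>
  <value>5</value>
</property>
```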
The properties listed on the Cloudera Manager RecordService Configuration page are the ones Cloudera considers most reasonable to change. Even so, adjusting these values is usually unnecessary; only experienced administrators should consider making minor adjustments.
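If you do decide to make a minor adjustment, it goes in the same client safety-valve field. The sketch below caps per-task memory at 2 GiB (2 × 1024 × 1024 × 1024 = 2147483648 bytes) and sets an explicit fetch size; both values are illustrative assumptions rather than tuning recommendations.

```xml
<!-- Illustrative: abort any task that uses more than 2 GiB on the server. -->
<property>
  <name>recordservice.task.memlimit.bytes</name>
  <value>2147483648</value>
</property>
<!-- Illustrative: fetch at most 5000 records per request instead of the server default. -->
<property>
  <name>recordservice.task.fetch.size</name>
  <value>5000</value>
</property>
```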
Dynamic Fetch Size
The following properties allow you to adjust the dynamic fetch size on the server.

| Category | Property | Description | Default |
|---|---|---|---|
| Resource Management | rs_compressed_max_fetch_size | Maximum fetch size when scanning compressed text files. | 1000 |
| Resource Management | rs_fetch_size_decrease_factor | Correction factor used to decrease the fetch size; must be >= 1. | 1.5 |
| Resource Management | rs_fetch_size_increase_factor | Correction factor used to increase the fetch size; must be > 0 and <= 1. | 0.001 |
| Resource Management | rs_min_fetch_size | Minimum fetch size for the scanner thread. | 500 |
| Resource Management | rs_spare_capacity_correction_factor | Correction factor for spare capacity; must be > 0 and <= 1. | 0.8 |
Kerberos Configuration
No special configuration is required through Cloudera Manager. Enabling Kerberos on the cluster configures everything RecordService requires.
Sentry Table Configuration
Sentry is configured for you in the RecordService VM. This section describes how to configure Sentry in a non-VM deployment.
Follow the CDH documentation to install Sentry and enable it for Hive (and for Impala, if applicable).
Configure Sentry with RecordService
- Enable RecordService to read policy metadata from Sentry:
  - In Cloudera Manager, navigate to the Sentry Configuration page.
  - In Admin Groups, add the user recordservice.
  - In Allowed Connecting Users, add the user recordservice.
  - Save your changes.
- Enable Sentry for RecordService:
  - In Cloudera Manager, navigate to the RecordService Configuration page.
  - Select the Sentry-1 service.
  - If you are using Cloudera Manager 5.7 with RecordService 0.3.0, enter the following setting in the Sentry Advanced Configuration Snippet (Safety Valve) field:
    <property>
      <name>hive.sentry.server</name>
      <value>server1</value>
    </property>
    If you are using a version of Cloudera Manager earlier than 5.7, enter the following settings in the Sentry Advanced Configuration Snippet (Safety Valve) field, replacing the placeholder values with those for your Sentry service:
    <property>
      <name>sentry.service.server.principal</name>
      <value>sentry/_HOST@principal</value>
    </property>
    <property>
      <name>sentry.service.security.mode</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>sentry.service.client.server.rpc-address</name>
      <value>hostname</value>
    </property>
    <property>
      <name>sentry.service.client.server.rpc-port</name>
      <value>portnum</value>
    </property>
    <property>
      <name>hive.sentry.server</name>
      <value>server1</value>
    </property>
  - Save your changes.
  - Restart the Sentry and RecordService services.
Delegation Token Configuration
No special configuration is required with Cloudera Manager. Delegation token support is enabled automatically if the cluster is Kerberized.
By default, RecordService persists its state in ZooKeeper under the /recordservice ZooKeeper directory (znode). If this directory is already in use, you can configure a different one with recordservice.zookeeper.znode. This is a Hadoop XML configuration property that you can add to the advanced service configuration snippet.
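For example, if /recordservice is already taken, a snippet along these lines moves RecordService state to another directory. The path /recordservice-cluster2 is a hypothetical name chosen for illustration; use any directory that is not otherwise in use.

```xml
<!-- Hypothetical znode path; replaces the default /recordservice directory. -->
<property>
  <name>recordservice.zookeeper.znode</name>
  <value>/recordservice-cluster2</value>
</property>
```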
Planner Auto Discovery Configuration
RecordService 0.3.0 and higher includes the Planner Auto Discovery feature.
You do not need to specify a list of planner host/ports for your RecordService clients through the configuration property recordservice.planner.hostports. Instead, you can use the property recordservice.zookeeper.connectString, which specifies the connection string for the ZooKeeper ensemble used to store information about planner and worker membership (as well as other information, such as delegation tokens). Both the client and the server use this property.
Planner Auto Discovery keeps client-side applications independent of changes in the planner configuration: planners can join or leave the cluster while client-side applications keep using the same configuration settings.
If you use Cloudera Manager to manage the cluster, this property is automatically propagated to client-side configurations through the CSD.
Setting this property enables Planner Auto Discovery. A RecordService job first contacts ZooKeeper to fetch the list of available RecordService planners, and then uses those planners for planning. If this step fails, the job falls back to the static membership information in recordservice.planner.hostports.
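A minimal client-side sketch, assuming a three-node ZooKeeper ensemble, that enables Planner Auto Discovery while keeping a static planner list as the fallback described above. The host names zk1.example.com through zk3.example.com and planner1.example.com are hypothetical; 2181 is ZooKeeper's conventional client port and 12050 is the default planner port. On a Cloudera Manager cluster the connect string is normally populated for you, so this applies mainly to manually managed clients.

```xml
<!-- Hypothetical ZooKeeper ensemble, in the standard host:port,host:port form. -->
<property>
  <name>recordservice.zookeeper.connectString</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
<!-- Static fallback, read only if the ZooKeeper lookup fails (hypothetical host). -->
<property>
  <name>recordservice.planner.hostports</name>
  <value>planner1.example.com:12050</value>
</property>
```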
Additional properties provide tuning options.
| Property | Description |
|---|---|
| | Specifies a timeout when initiating a ZooKeeper connection. |
| | Specifies the root ZooKeeper directory. Default is |
Both the client and the server use these properties.