Installing Pepperdata (RPM/DEB)
To install Pepperdata, first install the package or parcel for your distro/environment; next, open listening ports as necessary; and then, optionally, reconfigure Pepperdata properties for settings such as Unix utility command locations. Repeat this installation process on every host in your cluster.
Run all commands as the root user.
Task 1: Install the Pepperdata Software
A single RPM/DEB package contains all the Pepperdata products.
Procedure
- Obtain the appropriate Pepperdata <supervisor-package-name> RPM/DEB package. Some Hadoop versions have version-specific Pepperdata packages; in such cases, the Pepperdata package name includes the Hadoop version number. See the Downloads page.
- Depending on how your cluster is managed, install the RPM/DEB package by running the appropriate command for your environment or by using site-specific administrative tools.
The following table describes the locations of the Pepperdata files after you install the package. Except for the primary installation target, the locations are created by symlinks.

Directory                                   Description
/opt/pepperdata/supervisor-<your-version>   Primary installation target, containing many subdirectories and files
/opt/pepperdata/lib/                        JAR and library files
/etc/init.d/                                Initialization scripts
/etc/pepperdata/                            Configuration files, configuration templates, and site-specific configuration files
If the installation fails on any host, contact Pepperdata Support.
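The document leaves the exact install command to your environment. As a hedged example for hosts where you install the downloaded package file directly (rather than through yum, dnf, apt, or a configuration manager), the sketch below picks rpm or dpkg from the file extension. The function name pd_install_pkg and the DRY_RUN switch are hypothetical helpers, not part of Pepperdata.

```shell
#!/bin/sh
# Sketch: choose the package tool from the file extension and install.
# pd_install_pkg and DRY_RUN are hypothetical names, not Pepperdata tooling.
# Set DRY_RUN=1 to print the command instead of running it.
pd_install_pkg() {
  local pkg tool opts
  pkg=$1
  case "$pkg" in
    *.rpm) tool=rpm;  opts=-ivh ;;  # RPM-based distros
    *.deb) tool=dpkg; opts=-i ;;    # DEB-based distros
    *) echo "unsupported package type: $pkg" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$tool $opts $pkg"
  else
    "$tool" "$opts" "$pkg"
  fi
}

# Example (as root, on every host; the file name is a placeholder):
#   DRY_RUN=1 pd_install_pkg ./<supervisor-package-name>.rpm
#   pd_install_pkg ./<supervisor-package-name>.rpm
```

Printing the command first makes it easy to review before running the same step across many hosts.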
Task 2: Copy Configuration Template Files
Navigate to the /etc/pepperdata directory and copy the following configuration template files:
- pepperdata-config.sh-template -> pepperdata-config.sh
- pepperdata-site.xml-template -> pepperdata-site.xml
After you finish installing Pepperdata on all the hosts in your cluster, subsequent steps explain how to edit the Pepperdata configuration file, pepperdata-config.sh, to configure Pepperdata for your environment.
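The copy step can be sketched as a small function, assuming the templates are in /etc/pepperdata (or a directory you pass in). The function name is hypothetical. One design choice here goes beyond the text above: an existing pepperdata-config.sh or pepperdata-site.xml is never overwritten, so re-running the step cannot clobber settings you already edited.

```shell
#!/bin/sh
# Sketch: create the active config files from their templates.
# pd_copy_templates is a hypothetical helper name.
# Existing target files are left untouched (design choice for safe re-runs).
pd_copy_templates() {
  local conf_dir f
  conf_dir=${1:-/etc/pepperdata}
  for f in pepperdata-config.sh pepperdata-site.xml; do
    if [ ! -f "$conf_dir/$f-template" ]; then
      echo "missing template: $conf_dir/$f-template" >&2
      return 1
    fi
    if [ -e "$conf_dir/$f" ]; then
      echo "keeping existing $conf_dir/$f"
    else
      cp "$conf_dir/$f-template" "$conf_dir/$f" || return 1
    fi
  done
}

# Example (as root): pd_copy_templates /etc/pepperdata
```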
Task 3: Add the Pepperdata License
Copy the license.txt file that we emailed to you to the license file location.
By default, the license file location is the /etc/pepperdata/ directory.
If you customized the license file location (see Manage the License Key File), the directory is specified by the pepperdata.license.key.specification property in pepperdata-site.xml.
Be sure that the file permissions for license.txt permit the PD_USER user and the YARN ResourceManager process to read the license file. The permissions must be at least -r--r--r-- (0444 in octal, or a+r in symbolic notation).
(By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.)
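A minimal sketch of the license step, assuming the default /etc/pepperdata/ location (pass another directory if you customized it). The function names are hypothetical. chmod a+r is used rather than chmod 0444 because it adds read permission for everyone without clearing other bits, which satisfies the "at least 0444" requirement. The check relies on GNU stat, as found on typical RPM/DEB hosts.

```shell
#!/bin/sh
# Sketch: install the emailed license and make it readable by everyone.
# pd_install_license and pd_license_world_readable are hypothetical names.
pd_install_license() {
  local src dest_dir
  src=$1
  dest_dir=${2:-/etc/pepperdata}
  [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
  cp "$src" "$dest_dir/license.txt" || return 1
  # a+r adds read permission for owner, group, and others without
  # clearing any other permission bits.
  chmod a+r "$dest_dir/license.txt"
}

# Succeeds only if owner, group, and others all have read permission.
# Uses GNU stat (coreutils); %A prints the ls-style mode string.
pd_license_world_readable() {
  local mode
  mode=$(stat -c '%A' "${1:-/etc/pepperdata/license.txt}") || return 1
  case "$mode" in
    ?r??r??r??*) return 0 ;;  # e.g. -r--r--r-- or -rw-r--r--
    *) return 1 ;;
  esac
}

# Example (as root):
#   pd_install_license ./license.txt && pd_license_world_readable
```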
Task 4: (Kerberized Clusters) Enable Kerberos Authentication
If the ResourceManagers, the MapReduce Job History Server, and (for Tez support in Application Profiler) the YARN Timeline Server are Kerberized (secured with Kerberos), add the Kerberos principal and the path of the corresponding keytab file to the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh.
Prerequisites
- Be sure that the PepAgent user has read access to the keytab file. (To determine the PepAgent user name, see the PD_USER entry in the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh.)
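The prerequisite check can be sketched as follows. This assumes pepperdata-config.sh is a plain shell file that can be sourced without side effects (it is sourced in a subshell so nothing leaks into your session), and that runuser from util-linux is available. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: look up PD_USER and confirm that user can read the keytab.
# pd_agent_user and pd_keytab_readable_by_agent are hypothetical names.
# Assumption: pepperdata-config.sh can be sourced safely in a subshell.
pd_agent_user() {
  (
    unset PD_USER
    . "${1:-/etc/pepperdata/pepperdata-config.sh}" >/dev/null 2>&1
    printf '%s\n' "${PD_USER:-}"
  )
}

# Run as root. runuser executes "test -r" as the PepAgent user, so the
# result reflects that user's actual permissions on the keytab.
pd_keytab_readable_by_agent() {
  local keytab user
  keytab=$1
  user=$(pd_agent_user "$2")
  [ -n "$user" ] || { echo "PD_USER is not set" >&2; return 1; }
  runuser -u "$user" -- test -r "$keytab"
}

# Example (the keytab path is a placeholder):
#   pd_keytab_readable_by_agent <path-of-your-keytab-file> && echo readable
```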
Procedure
- (Optional) Create a new user principal and keytab file to use for Pepperdata.
  Although you can reuse an existing principal and keytab file, best practice is to create a new one for Pepperdata. Separate users let you apply ACLs (access control lists) in accordance with your organization’s security policies. User principals, unlike service principals, do not include the hostname.
  (Cloudera Manager) If the cluster configuration is managed by Cloudera Manager, the path to the keytab file is dynamic. In this case, copy the keytab file to /etc/pepperdata/, and use the copied, static file to enable Kerberos authentication. This is unnecessary if you are using Cloudera Parcels, a different configuration manager, or manually managing your cluster (even on clusters with a Cloudera CDH distro).
- Verify that the Kerberos principal and keytab file are valid.
  - Obtain and cache a Kerberos ticket-granting ticket by using the kinit command, which should return without error. Be sure to substitute your user name, realm name, and the location of your keytab file for the <your-kerberos-user>, <your-realm-name>, and <path-of-your-keytab-file> placeholders.
      kinit -kt <path-of-your-keytab-file> <your-kerberos-user>@<your-realm-name>
  - Authenticate and connect by using the curl --negotiate command. Be sure to substitute your ResourceManager domain for the resourcemanager.example.com placeholder.
    - For non-secured endpoints (HTTP):
        curl -L --tlsv1.2 --negotiate -u : http://resourcemanager.example.com:8088
    - For secured endpoints (HTTPS):
        curl -L --tlsv1.2 --negotiate -u : https://resourcemanager.example.com:8090
    If you can connect, you’ve confirmed that the Kerberos principal and keytab file are valid. Otherwise, debug the connection failure.
- Add the Kerberos principal and the path of the corresponding keytab file to the Pepperdata configuration.
  - Open /etc/pepperdata/pepperdata-config.sh for editing.
  - Add the required environment variables. Be sure to substitute your user name, realm name, and the location of your keytab file for the your-kerberos-user, your-realm-name, and path-of-your-keytab-file placeholders.
      export PD_AGENT_PRINCIPAL=your-kerberos-user@your-realm-name
      export PD_AGENT_KEYTAB_LOCATION=path-of-your-keytab-file
    Important: If your Kerberos principal contains the _HOST macro, the macro is replaced at runtime by the fully qualified domain name of the host. For this replacement to work, reverse DNS must be working correctly on every host where the _HOST macro is configured.
  - Save your changes and close the file.
- (Hadoop clusters with YARN 3.x) For YARN 3.x environments (which typically align with Hadoop 3.x-based distros), add authentication properties to the Pepperdata configuration to enable REST access.
  Note: If you will be configuring Application Profiler, you can add these authentication properties now or during the configuration process. If you’re installing Pepperdata on a cluster without Hadoop, such as a Kafka-only cluster for Streaming Spotlight, skip this step.
  - On the ResourceManager host, open the Pepperdata site file, pepperdata-site.xml, for editing.
    By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.
  - Add the required properties.
    Be sure to substitute your HTTP service policy (HTTP_ONLY or HTTPS_ONLY) for the your-http-service-policy placeholder in the following code snippet. For Kerberized clusters, the HTTP service policy is usually HTTPS_ONLY, but check with your cluster administrator or look for the value of the yarn.http.policy property in the cluster’s yarn-site.xml file or the Hadoop configuration.
      <property>
        <name>pepperdata.agent.yarn.http.authentication.type</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>pepperdata.agent.yarn.http.policy</name>
        <value>your-http-service-policy</value>
      </property>
    Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.
  - Save your changes and close the file.
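The Kerberos procedure above (verify with kinit and curl, then record PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION) can be scripted roughly as follows. This is a sketch with hypothetical function names: pd_set_agent_kerberos rewrites the two export lines idempotently, and pd_rm_url maps the HTTP service policy to the ResourceManager URL and ports shown in the curl step.

```shell
#!/bin/sh
# Sketch: record the Pepperdata Kerberos settings and build the URL for the
# curl check. Function names are hypothetical, not part of Pepperdata.

# Replace (or add) the PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION exports.
pd_set_agent_kerberos() {
  local principal keytab conf tmp
  principal=$1
  keytab=$2
  conf=${3:-/etc/pepperdata/pepperdata-config.sh}
  [ -f "$conf" ] || { echo "missing $conf" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Drop earlier exports of these two variables so re-running this function
  # never leaves conflicting duplicates; all other lines are kept as-is.
  grep -v -E '^[[:space:]]*export[[:space:]]+PD_AGENT_(PRINCIPAL|KEYTAB_LOCATION)=' \
    "$conf" > "$tmp"
  printf 'export PD_AGENT_PRINCIPAL=%s\nexport PD_AGENT_KEYTAB_LOCATION=%s\n' \
    "$principal" "$keytab" >> "$tmp"
  # Write back with cat (not mv) so the file keeps its owner and mode.
  cat "$tmp" > "$conf" || { rm -f "$tmp"; return 1; }
  rm -f "$tmp"
}

# Map an HTTP service policy to the ResourceManager URL used in the curl
# check; ports 8088 and 8090 are the ones shown in the procedure above.
pd_rm_url() {
  case "$2" in
    HTTP_ONLY)  echo "http://$1:8088" ;;
    HTTPS_ONLY) echo "https://$1:8090" ;;
    *) echo "unknown HTTP service policy: $2" >&2; return 1 ;;
  esac
}

# Example (substitute the placeholders from the procedure):
#   kinit -kt path-of-your-keytab-file your-kerberos-user@your-realm-name
#   curl -L --tlsv1.2 --negotiate -u : "$(pd_rm_url resourcemanager.example.com HTTPS_ONLY)"
#   pd_set_agent_kerberos your-kerberos-user@your-realm-name path-of-your-keytab-file
```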
Task 5: (Rarely Required) Open Port for Listening
PepAgents listen on port 50505, whether they’re running on ResourceManager hosts, as we recommend, or on NodeManager hosts.
In most environments, this port is available for use and is not blocked by internal firewalls. However, in rare situations you might need to open (unblock) this port or reconfigure which port Pepperdata uses.
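To find out whether something already listens on port 50505, one hedged approach is to inspect the output of ss -ltn (iproute2), where the fourth column is the local address and port. The sketch reads that output from stdin so it does not depend on a particular ss version's filter syntax; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the given TCP port (default 50505) has a listener in
# "ss -ltn" output read from stdin. pd_port_listed is a hypothetical name.
pd_port_listed() {
  awk -v p="${1:-50505}" '
    NR > 1 {                        # skip the ss header line
      n = split($4, part, ":")      # "[::]:50505" -> last piece is the port
      if (part[n] == p) found = 1
    }
    END { exit !found }
  '
}

# Example (on the host that runs the PepAgent):
#   ss -ltn | pd_port_listed 50505 && echo "port 50505 is already in use"
```

If instead a host firewall blocks the port and the host uses firewalld, firewall-cmd --permanent --add-port=50505/tcp followed by firewall-cmd --reload opens it; other firewalls need their own equivalent.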
• If port 50505 is used by another service, you can reconfigure which port Pepperdata uses by redefining the pepperdata.agent.rpc.server.port property in the Pepperdata site file, pepperdata-site.xml.
• After you reconfigure the pepperdata.agent.rpc.server.port property (default=50505), restart the PepAgents.
(By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the file is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.)
To enable SSL support, see Configure SSL Near Real-Time Monitoring on Port 50505.
For information about accessing the stats that are provided via the Web servlets associated with this port, with either HTTP or SSL-secured HTTPS communication, see Pepperdata Status Views via Web Servlets.
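Overriding pepperdata.agent.rpc.server.port means adding a property to pepperdata-site.xml. The sketch below does that, assuming a Hadoop-style site file whose closing </configuration> tag sits on its own line; the function name and the example port 50510 are hypothetical. It refuses to add a duplicate property and runs xmllint when it is installed, in line with the linting advice earlier on this page.

```shell
#!/bin/sh
# Sketch: add a <property> to pepperdata-site.xml.
# pd_add_site_property is a hypothetical helper name.
# Assumption: </configuration> is on its own line. Note that awk -v
# processes backslash escapes, so avoid backslashes in values.
pd_add_site_property() {
  local name value site tmp
  name=$1
  value=$2
  site=${3:-/etc/pepperdata/pepperdata-site.xml}
  grep -q '</configuration>' "$site" || { echo "no </configuration> in $site" >&2; return 1; }
  # Refuse to add a duplicate; edit an existing property by hand instead.
  if grep -qF "<name>$name</name>" "$site"; then
    echo "$name is already defined in $site" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Insert the new property just before the first </configuration> line.
  awk -v n="$name" -v v="$value" '
    /<\/configuration>/ && !done {
      print "  <property>"
      print "    <name>" n "</name>"
      print "    <value>" v "</value>"
      print "  </property>"
      done = 1
    }
    { print }
  ' "$site" > "$tmp" || { rm -f "$tmp"; return 1; }
  cat "$tmp" > "$site"
  rm -f "$tmp"
  # Lint the result when xmllint (libxml2) is available.
  if command -v xmllint >/dev/null 2>&1; then
    xmllint --noout "$site"
  fi
}

# Example: move the PepAgent listener to a hypothetical free port, then
# restart the PepAgents.
#   pd_add_site_property pepperdata.agent.rpc.server.port 50510
# The same helper can add the YARN 3.x properties from Task 4:
#   pd_add_site_property pepperdata.agent.yarn.http.authentication.type kerberos
#   pd_add_site_property pepperdata.agent.yarn.http.policy HTTPS_ONLY
```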
Next: Configuring Pepperdata