Preparing the Cloud Environment
To prepare a cloud environment for Pepperdata, set up the required buckets and folders; obtain the appropriate Pepperdata package, extract it, and upload the contents to the folder you create in your bucket; and upload the Pepperdata configuration and license files. After you finish the preparation steps, you’ll configure your Pepperdata products.
In cloud environments, Pepperdata enables graceful shutdown of ephemeral clusters by default. This ensures two things:
- There are no metric data gaps at the end of the life of a node in an ephemeral cluster.
- If autoscaling optimization was enabled on the cluster, the autoscaling policies are restored to their original settings (requires Pepperdata Supervisor v8.1.2 or later).
Note: Run all commands as the root user.
Prerequisites
- Ensure that your cloud platform is supported; see the entries for Pepperdata 8.1.x in the table of Supported Platforms by Pepperdata Version.
- Ensure that your cloud environment is configured for read access to buckets by all cluster nodes. Read access is required so that the cluster's bootstrap script can access the Pepperdata installation packages and configuration.
- (EMR) When clusters are created, they must be configured to include Hadoop as an application.
- Before you install Pepperdata, you must decide whether to install into an existing/running cluster or a new cluster. There are several factors to consider.
  - When you install Pepperdata into an existing/running cluster, you must separately install and activate Pepperdata on every already-running host, which can be a time-consuming process if there are more than just a few hosts.
  - To install Pepperdata into an existing/running cluster, every currently running host in the cluster must already have an initialization (bootstrap) script. If there is no initialization script, you must destroy the cluster and re-create it so that every host has an initialization script. The script can be empty, or you can follow the procedure for activating Pepperdata on a new cluster; in Install Pepperdata (Cloud), see the procedure for your environment.
  - Installing Pepperdata into a new cluster means that you do not have to install Pepperdata on individual hosts.
  - If you have cluster management functions that are unrelated to Pepperdata (such as certificate management), it's easier to install Pepperdata into an existing/running cluster because there's already an initialization (bootstrap) script that you can edit to add a call to the Pepperdata bootstrap script.
  - If you want to install Pepperdata into a new cluster, and want to invoke non-Pepperdata cluster management functions (either now or in the future), you can create a "helper bootstrap script" that invokes those functions and then calls the Pepperdata bootstrap script; a minimal sketch follows this list.
  - If you will be configuring autoscaling optimization in an EMR environment, you must install Pepperdata into a new cluster. (There is no support for adding autoscaling optimization in an EMR environment to an existing/running cluster.)
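For illustration, here is a minimal sketch of such a helper bootstrap script. Every path and task in it is a placeholder; the actual location and arguments of the Pepperdata bootstrap script depend on your environment (see Install Pepperdata (Cloud) for the real bootstrap procedure).

#!/usr/bin/env bash
# Hypothetical helper bootstrap script: run your own cluster management
# tasks first, then hand off to the Pepperdata bootstrap script.
set -euo pipefail

# Non-Pepperdata cluster management functions go here
# (placeholder example: certificate management).
/usr/local/bin/manage-certificates.sh

# Call the Pepperdata bootstrap script, forwarding any arguments.
# The path and arguments are placeholders; use the values from your
# environment's installation procedure.
/path/to/pepperdata-bootstrap.sh "$@"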
Task 1: Upload the Pepperdata Software
Click the tab for your cloud environment, and perform the procedure.
EMR
- Set up the required bucket and folders in your Amazon S3 execution environment.
  - In your Amazon S3 execution environment, create a bucket for Pepperdata.
    For EMR 5.32.0 and EMR 6.2.0, the bucket name cannot include any dot characters (.). Otherwise, you can name it anything, so long as you adhere to the AWS bucket naming rules. This documentation refers to the bucket as <my-bucket>.
  - In the Pepperdata bucket, create folders for install-packages and config. Do not use any other names for these folders.

    s3://<my-bucket>/install-packages
    s3://<my-bucket>/config

  - In the config folder, create a folder for the cluster configuration.
    Important: Give the folder the same name as your cluster, which must match the cluster name used in the Pepperdata license file and the bootstrap script.
    For example, if your cluster is named my-cluster, the new folder would be:

    s3://<my-bucket>/config/my-cluster

    This folder is referred to as the cluster configuration folder in the rest of the installation and configuration procedures.
- Obtain the appropriate Pepperdata package; the filename ends in -rpm-cloud.tgz. See the Downloads page.
- Extract the contents of the TGZ archive to any local location.
- Upload the base directory and all its files and subfolders to the install-packages folder that you created (s3://<my-bucket>/install-packages).
  - The base directory is supervisor-X.Y.Z-<distribution>, where <distribution> is the final part of the package name, without the file type; for example, supervisor-X.Y.Z-H26_YARN2_A.
  - In addition to the emr/ contents that you'll use, the package contents include files for other cloud environments. You can ignore those files or delete them.
  - You can store multiple versions of Pepperdata in the install-packages folder.
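If you work from the command line, a minimal sketch of these steps using the AWS CLI (assumed to be installed and configured; bucket and directory names are the placeholders used above) might look like this:

# Create the bucket. S3 "folders" are key prefixes, so install-packages/
# is created implicitly by the upload below.
aws s3 mb s3://<my-bucket>
# Create an empty folder marker for the cluster configuration folder.
aws s3api put-object --bucket <my-bucket> --key config/my-cluster/
# Upload the extracted base directory, preserving its structure.
aws s3 cp supervisor-X.Y.Z-<distribution> \
    s3://<my-bucket>/install-packages/supervisor-X.Y.Z-<distribution>/ --recursive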
Dataproc
- Set up the required bucket and folders in your GDP environment.
  - In your GDP environment, create a bucket for Pepperdata.
    You can name it anything. This documentation refers to the bucket as <my-bucket>.
  - In the Pepperdata bucket, create folders for install-packages and config. Do not use any other names for these folders.

    gs://<my-bucket>/install-packages
    gs://<my-bucket>/config

  - In the config folder, create a folder for the cluster configuration.
    Important: Give the folder the same name as your cluster, which must match the cluster name used in the Pepperdata license file and the bootstrap script.
    For example, if your cluster is named my-cluster, the new folder would be:

    gs://<my-bucket>/config/my-cluster

    This folder is referred to as the cluster configuration folder in the rest of the installation and configuration procedures.
- Obtain the appropriate Pepperdata package; the filename ends in -deb-cloud.tgz. See the Downloads page.
- Extract the contents of the TGZ archive to any local location.
- Upload the base directory and all its files and subfolders to the install-packages folder that you created (gs://<my-bucket>/install-packages).
  - The base directory is supervisor-X.Y.Z-<distribution>, where <distribution> is the final part of the package name, without the file type; for example, supervisor-X.Y.Z-H26_YARN2_A.
  - In addition to the dataproc/ contents that you'll use, the package contents include files for other cloud environments. You can ignore those files or delete them.
  - You can store multiple versions of Pepperdata in the install-packages folder.
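A corresponding command-line sketch using gsutil (assumed to be installed via the Google Cloud SDK; names are the placeholders used above):

# Create the bucket. GCS "folders" are implied by object-name prefixes,
# so they are created by the uploads themselves.
gsutil mb gs://<my-bucket>
# Upload the extracted base directory, preserving its structure.
gsutil cp -r supervisor-X.Y.Z-<distribution> gs://<my-bucket>/install-packages/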
Task 2: Copy Configuration Template Files
Click the tab for your Pepperdata installation/cluster manager/environment, and perform the procedure.
EMR
- Copy the config-template/pepperdata-config.sh-template and config-template/pepperdata-site.xml-template configuration files from the extracted contents of the Pepperdata package to any local location, and rename them to remove the -template suffix.

  pepperdata-config.sh-template -> pepperdata-config.sh
  pepperdata-site.xml-template -> pepperdata-site.xml

  These files are referred to as the cluster-level Pepperdata configuration file and the cluster-level Pepperdata site file, respectively, in the rest of the installation and configuration procedures.
- Upload the pepperdata-config.sh and pepperdata-site.xml files to the cluster configuration folder in your Amazon S3 execution environment (s3://<my-bucket>/config/my-cluster).
  Continuing with our my-cluster example, the files would be:

  s3://<my-bucket>/config/my-cluster/pepperdata-config.sh
  s3://<my-bucket>/config/my-cluster/pepperdata-site.xml
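A minimal command-line sketch of this task, assuming the AWS CLI and the placeholders used above:

# Rename the templates, then upload them to the cluster configuration folder.
cp config-template/pepperdata-config.sh-template pepperdata-config.sh
cp config-template/pepperdata-site.xml-template pepperdata-site.xml
aws s3 cp pepperdata-config.sh s3://<my-bucket>/config/my-cluster/
aws s3 cp pepperdata-site.xml s3://<my-bucket>/config/my-cluster/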
Dataproc
- Copy the config-template/pepperdata-config.sh-template and config-template/pepperdata-site.xml-template configuration files from the extracted contents of the Pepperdata package to any local location, and rename them to remove the -template suffix.

  pepperdata-config.sh-template -> pepperdata-config.sh
  pepperdata-site.xml-template -> pepperdata-site.xml

  These files are referred to as the cluster-level Pepperdata configuration file and the cluster-level Pepperdata site file, respectively, in the rest of the installation and configuration procedures.
- Upload the pepperdata-config.sh and pepperdata-site.xml files to the cluster configuration folder that you created in your GDP environment (gs://<my-bucket>/config/my-cluster).
  Continuing with our my-cluster example, the files would be:

  gs://<my-bucket>/config/my-cluster/pepperdata-config.sh
  gs://<my-bucket>/config/my-cluster/pepperdata-site.xml
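The equivalent sketch for Dataproc, assuming gsutil and the same placeholders:

# Rename the templates, then upload them to the cluster configuration folder.
cp config-template/pepperdata-config.sh-template pepperdata-config.sh
cp config-template/pepperdata-site.xml-template pepperdata-site.xml
gsutil cp pepperdata-config.sh pepperdata-site.xml gs://<my-bucket>/config/my-cluster/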
Task 3: Add the Pepperdata License
Click the tab for your cloud environment, and perform the procedure.
EMR
Copy the license.txt file that we emailed to you to the cluster configuration folder in your Amazon S3 execution environment.
Continuing with our my-cluster example, the file would be:

s3://<my-bucket>/config/my-cluster/license.txt

Dataproc
Copy the license.txt file that we emailed to you to the cluster configuration folder in your GDP environment.
Continuing with our my-cluster example, the file would be:

gs://<my-bucket>/config/my-cluster/license.txt
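As a sketch, assuming license.txt is in your working directory and the placeholders used above:

# EMR (AWS CLI):
aws s3 cp license.txt s3://<my-bucket>/config/my-cluster/
# Dataproc (gsutil):
gsutil cp license.txt gs://<my-bucket>/config/my-cluster/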
Task 4: (Kerberized Clusters) Enable Kerberos Authentication
If the core services (the ResourceManagers, the MapReduce Job History Server, and, for Tez support in Application Profiler, the YARN Timeline Server) are Kerberized (secured with Kerberos), add the Kerberos principal and the path of the corresponding keytab file to the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh.
Prerequisites
- Be sure that the PepAgent user has read access to the keytab file.
  (To determine the PepAgent user name, see the PD_USER entry in the Pepperdata configuration file, pepperdata-config.sh.)
Procedure
- (Optional) Create a new user principal and keytab file to use for Pepperdata.
  Although you can reuse an existing principal and keytab file, best practice is to create a new one for Pepperdata. Separate users let you apply ACLs (access control lists) in accordance with your organization's security policies. User principals, unlike service principals, do not include the hostname.
- Verify that the Kerberos principal and keytab file are valid.
  - Obtain and cache a Kerberos ticket-granting ticket by using the kinit command, which should return without error. Be sure to substitute your user name, realm name, and the location of your keytab file for the <your-kerberos-user>, <your-realm-name>, and <path-of-your-keytab-file> placeholders.

    kinit <your-kerberos-user>@<your-realm-name> -kt <path-of-your-keytab-file>
  - Authenticate and connect by using the curl --negotiate command. Be sure to substitute your ResourceManager domain for the resourcemanager.example.com placeholder.
    - For non-secured endpoints (HTTP):

      curl -L --tlsv1.2 --negotiate -u : http://resourcemanager.example.com:8088

    - For secured endpoints (HTTPS):

      curl -L --tlsv1.2 --negotiate -u : https://resourcemanager.example.com:8090

    If you can connect, you've confirmed that the Kerberos principal and keytab file are valid. Otherwise, debug the connection failure.
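Optionally, if kinit or curl fails, you can also list the keytab's entries with the standard klist tool and confirm that the principal you specified is present:

klist -kt <path-of-your-keytab-file>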
- Add the Kerberos principal and the path of the corresponding keytab file to the Pepperdata configuration.
  - Download a copy of your existing cluster-level Pepperdata configuration file, pepperdata-config.sh, from the environment's cluster configuration folder (in the cloud) to a location where you can edit it.
Open the file for editing, and add the required environment variables. Be sure to substitute your user name, realm name, and the location of your keytab file for the
your-kerberos-user
,your-realm-name
, andpath-of-your-keytab-file
placeholders.export PD_AGENT_PRINCIPAL=your-kerberos-user@your-realm-name export PD_AGENT_KEYTAB_LOCATION=path-of-your-keytab-file
Important: If your Kerberos principal contains the_HOST
macro expansion, it is replaced at runtime by the fully-qualified domain name of the host. For this replacement to work, reverse DNS must be working correctly on every host where the_HOST
macro is configured. -
Save your changes and close the file.
-
Upload the revised file to overwrite the original
pepperdata-config.sh
file.
- (YARN 3.x) For YARN 3.x environments (which typically align with Hadoop 3.x-based distros such as EMR 6.x), add authentication properties to the Pepperdata configuration to enable REST access.
  Note: If you will be configuring Application Profiler, you can add these authentication properties now or during the configuration process.
Log in to the ResourceManager host, and download a copy of the host’s existing Pepperdata site file,
pepperdata-site.xml
, from the environment’s cluster configuration folder (in the cloud) to a location where you can edit it. -
Open the file for editing, and add the required properties.
Be sure to substitute your HTTP service policy—
HTTP_ONLY
orHTTPS_ONLY
—for theyour-http-service-policy
placeholder in the following code snippet.For Kerberized clusters, the HTTP service policy is usually
HTTPS_ONLY
. But you should check with your cluster administrator or look for the value of theyarn.http.policy
property in the cluster’syarn-site.xml
file or the Hadoop configuration.<property> <name>pepperdata.agent.yarn.http.authentication.type</name> <value>kerberos</value> </property> <property> <name>pepperdata.agent.yarn.http.policy</name> <value>your-http-service-policy</value> </property>
Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such asxmllint
, after you edit any .xml configuration file. -
Save your changes and close the file.
-
Upload the revised file to overwrite the original
pepperdata-site.xml
file.
-
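For example, a quick well-formedness check with xmllint (assuming it is installed on the host where you edit the file):

# Exits silently with status 0 if the file is well-formed XML;
# prints parse errors otherwise.
xmllint --noout pepperdata-site.xml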
Task 5: (Rarely Required) Open Port for Listening
PepAgents listen on port 50505, whether they're running on ResourceManager hosts, as we recommend, or on NodeManager hosts.
In most environments this port is available for use and is not blocked by internal firewalls. However, in rare situations you might need to open/unblock this port or reconfigure which port Pepperdata uses.
If port 50505 is used by another service, you can reconfigure which port Pepperdata uses by redefining the pepperdata.agent.rpc.server.port property in the Pepperdata site file, pepperdata-site.xml. After you reconfigure the pepperdata.agent.rpc.server.port property (default=50505), restart the PepAgents.
To enable SSL support, see Configure SSL Near Real-Time Monitoring on Port 50505.
For information about accessing the stats that are provided via the Web servlets associated with this port, with either HTTP or SSL-secured HTTPS communication, see Pepperdata Status Views via Web Servlets.
Next: Configuring Pepperdata