Configure Query Spotlight: Impala (RPM/DEB)
Prerequisites
Before you begin configuring Query Spotlight for Impala query monitoring, ensure that your system meets the required prerequisites.
- Pepperdata must be installed on the host(s) to be configured for Query Spotlight.
- Your cluster uses a supported combination of Impala version and platform; see the entries for Query Spotlight 8.1.x in the table of Supported Impala-Distro Combinations by Query Spotlight Version.
Task 1: Enable Fetching of Impala Query Data
To enable Pepperdata to fetch Impala query data, add the required properties to the Pepperdata configuration.
Procedure
- On any coordinator (a host on which the Impala impalad daemon is running), open the host’s Pepperdata site file, pepperdata-site.xml, for editing.
  By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the location is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.
- Add the properties to enable query monitoring and, optionally, to configure a non-default location from which to read the query profiles.
  By default, the PepAgent reads profiles of completed queries from /var/log/impalad/profiles/.
  - To use the default location, omit the pepperdata.impala.query.queryLogDir property.
  - To use a different location, add the pepperdata.impala.query.queryLogDir property, and be sure to substitute your location for the your-impalad-profiles-location placeholder.
  <property>
    <name>pepperdata.impala.query.monitoring.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>pepperdata.impala.query.queryLogDir</name>
    <value>your-impalad-profiles-location</value>
  </property>
- (HTTPS impalad daemon endpoints) If your impalad daemon is configured for HTTPS instead of HTTP, add the pepperdata.agent.genericJsonFetch.impala.httpsEnabled property so that the fetcher for information about Impala queries in flight uses the HTTPS endpoint instead of the default HTTP endpoint (http://LOCALHOST:25000/queries?json).
  <property>
    <name>pepperdata.agent.genericJsonFetch.impala.httpsEnabled</name>
    <value>true</value>
  </property>
- (Digest authentication for the Impala Web UI for debugging) If the impalad daemon for your Impala Web UI for debugging is secured by digest authentication, add the authentication credentials.
  Note: This step is for using digest authentication to secure the impalad daemon of the Impala Web UI for debugging, not for securing the Impala core services with Kerberos or LDAP.
  Be sure to substitute your username and password for the your-username and your-password placeholders in the following code snippet.
  <property>
    <name>pepperdata.agent.genericJsonFetch.impala.http.authentication.type</name>
    <value>digest</value>
  </property>
  <property>
    <name>pepperdata.agent.genericJsonFetch.impala.auth.username</name>
    <value>your-username</value>
  </property>
  <property>
    <name>pepperdata.agent.genericJsonFetch.impala.auth.password</name>
    <value>your-password</value>
  </property>
  <property>
    <name>pepperdata.agent.genericJsonFetch.impala.httpsEnabled</name>
    <value>true</value>
  </property>
  Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.
- (Kerberos for the Impala Web UI for debugging) If the impalad daemon for your Impala Web UI for debugging is Kerberized, add the authentication credentials.
  Note: This step is for using Kerberos to secure the impalad daemon of the Impala Web UI for debugging, not for securing the Impala core services with Kerberos or LDAP.
  Be sure to substitute your Kerberos principal and the path of the corresponding keytab file for the your-kerberos-principal and your-kerberos-keytab-pathname placeholders in the following code snippet.
  If you already configured the PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION environment variables during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication), you do not need to configure the principal and keytab again, except to override the cluster-level assignments. The fetcher properties (pepperdata.agent.genericJsonFetch.impala.kerberos.principal and pepperdata.agent.genericJsonFetch.impala.keytab.location) are inherited from the properties that were automatically assigned when you installed Pepperdata in the cluster.
  <property>
    <name>pepperdata.agent.genericJsonFetch.impala.http.authentication.type</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>pepperdata.agent.genericJsonFetch.impala.kerberos.principal</name>
    <value>your-kerberos-principal</value>
  </property>
  <property>
    <name>pepperdata.agent.genericJsonFetch.impala.keytab.location</name>
    <value>your-kerberos-keytab-pathname</value>
  </property>
  <property>
    <name>pepperdata.agent.genericJsonFetch.impala.httpsEnabled</name>
    <value>true</value>
  </property>
  Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.
- Save your changes and close the file.
- Restart the PepAgent.
  You can use either the service command (if provided by your OS) or the systemctl command:
  sudo service pepagentd restart
  sudo systemctl restart pepagentd
  If any of the process’s startup checks fail, an explanatory message appears and the process does not start. Address the issues and try again to start the process.
  Tip: Any time you modify the YAML rules file, you must reload the rules file by restarting the PepAgent.
- Repeat steps 1–7 on every coordinator host in your cluster.
  Important: If you skip the configuration process on a coordinator host, Pepperdata is unable to collect metrics for queries that run on that host.
- Contact Pepperdata Support to request that Impala query metrics be activated for your Pepperdata dashboard.
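As a complement to the xmllint recommendation in the steps above, the following Python sketch checks that site-file content parses as XML and reports whether the monitoring property is set to true. It uses only the standard library. The sample content, the configuration root element (assumed from Hadoop-style site files), the example profile path, and the helper names read_properties and is_well_formed are all illustrative assumptions, not part of Pepperdata.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a site file. The <configuration> root element is an
# assumption based on Hadoop-style site files; the queryLogDir value is made up.
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>pepperdata.impala.query.monitoring.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>pepperdata.impala.query.queryLogDir</name>
    <value>/data/impalad/profiles</value>
  </property>
</configuration>
"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def read_properties(xml_text):
    """Parse site-file text and return a {name: value} dict.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


props = read_properties(SAMPLE_SITE_XML)
monitoring_enabled = (
    props.get("pepperdata.impala.query.monitoring.enabled") == "true"
)
print("well-formed:", is_well_formed(SAMPLE_SITE_XML))
print("monitoring enabled:", monitoring_enabled)
```

To check a real file, pass the text of your pepperdata-site.xml to the same functions in place of the sample string.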
Task 2: (Optional) Encrypt the Connect String for the Hive Metastore
If you want to encrypt the connect string for the Hive metastore, regardless of whether you’ll store it in the Pepperdata site file or an external file, use the Pepperdata password encryption script.
At a minimum, the unencrypted connect string must include the jdbc:hive2://YOUR-HOSTNAME:YOUR-PORTNUM/ string.
You can add as many connection properties/parameters as you need for your environment, separating them with a semicolon (;).
Example Connect Strings
- Without properties/parameters:
jdbc:hive2://localhost:10000/
- Add properties for authenticated environments:
jdbc:hive2://localhost:10000/;user=YOUR-USERNAME;password=YOUR-PASSWORD
- Multiple properties/parameters:
jdbc:hive2://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hiveserver2_namespace>
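The connect-string format shown in these examples (a jdbc:hive2:// base followed by semicolon-separated properties) can be sketched as a small Python helper. The function name build_connect_string and its parameters are hypothetical and exist only for illustration; Pepperdata does not provide such a helper.

```python
def build_connect_string(host_port, properties=None):
    """Assemble a HiveServer2 JDBC connect string.

    host_port  -- "HOSTNAME:PORT" (or a ZooKeeper quorum)
    properties -- optional mapping of property name to value; each entry
                  is appended as ;name=value, in insertion order
    """
    connect = "jdbc:hive2://" + host_port + "/"
    for name, value in (properties or {}).items():
        connect += ";" + name + "=" + value
    return connect


# Mirrors the first two example connect strings.
print(build_connect_string("localhost:10000"))
print(build_connect_string(
    "localhost:10000",
    {"user": "YOUR-USERNAME", "password": "YOUR-PASSWORD"},
))
```

The helper does no escaping, so it is suitable only for values that contain no semicolons.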
Procedure
- Run the Pepperdata encryption script.
  /opt/pepperdata/supervisor/encrypt_password.sh
- At the "Enter the password to encrypt:" prompt, enter your connect string.
- Copy (or make note of) the resulting encrypted connect string.
  For example, in the following output from the script, the encrypted connect string is the string W+ONY3ZcR6QLP5sqoRqcpA=2.
  Encrypted password is W+ONY3ZcR6QLP5sqoRqcpA=2
  Use this encrypted result as the value for the pepperdata.jdbcfetch.hive.connect.string.encrypted property (which you’ll configure later), or store it in the external file specified by the pepperdata.jdbcfetch.hive.connect.string.encrypted.file property.
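If you script this step, the following Python sketch shows one way to pull the encrypted value out of the script output and wrap it in a property element. It assumes the output line has exactly the form shown in the example above ("Encrypted password is" followed by the value). The helper names extract_encrypted_value and property_snippet are hypothetical.

```python
import xml.etree.ElementTree as ET

# Output line copied from the example above; the prefix is assumed to be stable.
SAMPLE_OUTPUT = "Encrypted password is W+ONY3ZcR6QLP5sqoRqcpA=2\n"
PREFIX = "Encrypted password is "


def extract_encrypted_value(script_output):
    """Return the text after the prefix, or None if no line matches."""
    for line in script_output.splitlines():
        if line.startswith(PREFIX):
            return line[len(PREFIX):].strip()
    return None


def property_snippet(name, value):
    """Build a <property> element; ElementTree escapes XML special characters."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


encrypted = extract_encrypted_value(SAMPLE_OUTPUT)
print(property_snippet(
    "pepperdata.jdbcfetch.hive.connect.string.encrypted", encrypted
))
```

Building the element with ElementTree rather than string concatenation keeps the snippet well formed even if the encrypted value contains characters such as & or <.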
Task 3: Enable Fetching of Hive Databases and Tables’ Metadata
To enable Pepperdata to fetch data from the Hive metastore, add the required variables to the Pepperdata configuration.
Procedure
- On any of the hosts that are configured to be a Hive client (and from which you launch Hive queries), open the Pepperdata site file, pepperdata-site.xml, for editing.
  It’s sufficient to add the variables to a single Hive client host. But if you want to replicate the configuration on every host (perhaps to ease configuration management), that is okay, too.
  By default, the Pepperdata site file, pepperdata-site.xml, is located in /etc/pepperdata. If you customized the location, the location is specified by the PD_CONF_DIR environment variable. See Change the Location of pepperdata-site.xml for details.
- Add the property to configure the hostname.
  Be sure to substitute your fully-qualified, canonical hostname for the YOUR.CANONICAL.HOSTNAME placeholder in the following code snippet.
  <property>
    <name>pepperdata.jdbcfetch.hive.pepagent.host</name>
    <value>YOUR.CANONICAL.HOSTNAME</value>
    <description>Host where the fetching should be enabled.</description>
  </property>
- Configure the connect string.
  Add one of the following properties, depending on your environment and security requirements. Be sure to substitute your information for the YOUR... placeholders.
  - Plain text connect string stored in the Pepperdata site file.
    At a minimum, the connect string must include the jdbc:hive2://YOUR-HOSTNAME:YOUR-PORTNUM/ string. You can add as many connection properties/parameters as you need for your environment, separating them with a semicolon (;).
    Example Connect Strings
    - Without properties/parameters:
      jdbc:hive2://localhost:10000/
    - Add properties for authenticated environments:
      jdbc:hive2://localhost:10000/;user=YOUR-USERNAME;password=YOUR-PASSWORD
    - Multiple properties/parameters:
      jdbc:hive2://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hiveserver2_namespace>
    <property>
      <name>pepperdata.jdbcfetch.hive.connect.string</name>
      <value>jdbc:hive2://YOUR-HOSTNAME:YOUR-PORTNUM/${;OPTIONAL-ADDITIONAL-PROPERTY}</value>
      <description>JDBC Connect string to be used.</description>
    </property>
  - Plain text connect string stored in an external file:
    <property>
      <name>pepperdata.jdbcfetch.hive.connect.string.file</name>
      <value>YOUR-PATH-TO-JDBCSTRING-FILE</value>
      <description>Path to file containing JDBC Connect string.</description>
    </property>
  - Encrypted connect string (the result from encrypting the string earlier in the configuration procedure) stored in the Pepperdata site file:
    <property>
      <name>pepperdata.jdbcfetch.hive.connect.string.encrypted</name>
      <value>YOUR-ENCRYPTED-TEXT</value>
      <description>Encrypted JDBC Connect string to be used.</description>
    </property>
  - Encrypted connect string (the result from encrypting the string earlier in the configuration procedure) stored in an external file:
    <property>
      <name>pepperdata.jdbcfetch.hive.connect.string.encrypted.file</name>
      <value>YOUR-PATH-TO-JDBCSTRING-FILE</value>
      <description>Path to file containing encrypted JDBC Connect string.</description>
    </property>
- (Kerberized Clusters) If the hiveserver2 service is Kerberized, add the properties for the Kerberos principal and keytab to the Pepperdata site file.
  - Enable fetching from a Kerberized HiveServer2.
    <property>
      <name>pepperdata.jdbcfetch.hive.kerberos.enabled</name>
      <value>true</value>
      <description>Should kerberos be used when connecting to Hive?</description>
    </property>
  - Configure the principal and keytab.
    If you already configured the PD_AGENT_PRINCIPAL and PD_AGENT_KEYTAB_LOCATION environment variables during the installation process (Task 4. (Kerberized clusters) Enable Kerberos Authentication), you do not need to configure them again, and you should skip this substep.
    Be sure to substitute your information for the YOUR... placeholders.
    <property>
      <name>pepperdata.jdbcfetch.hive.kerberos.principal</name>
      <value>YOUR_PRINCIPAL/HOST@DOMAIN.COM</value>
      <description>The Kerberos principal to use to authenticate with the Hive client.</description>
    </property>
    <property>
      <name>pepperdata.jdbcfetch.hive.kerberos.keytab.location</name>
      <value>YOUR-PATH-TO-KEYTAB-FILE</value>
      <description>Path to the keytab file for the specified principal.</description>
    </property>
- Validate the XML snippets that you added.
  Malformed XML files can cause operational errors that can be difficult to debug. To prevent such errors, we recommend that you use a linter, such as xmllint, after you edit any .xml configuration file.
- Save your changes and close the file.
- Add the hive-jdbc-*standalone.jar JAR file to the PepAgent’s classpath on the host that you selected in step 1.
  - Find the fully-qualified name of the JAR, which depends on the cluster’s distro.
    - The filename pattern is hive-jdbc-*standalone.jar.
    - The location depends on the distro; for example, in Cloudera CDH/CDP Private Cloud Base, RPM/DEB installations, the path is /usr/lib/hive/lib/.
    - You can use the find command to locate all available JAR files, and output their names to the console; for example:
      find /opt/cloudera/parcels/CDH/jars/ /usr/lib/hive/lib/ /usr/lib/hive/jdbc/ -name "hive-jdbc-*standalone.jar" 2>/dev/null
      /usr/lib/hive/lib/hive-jdbc-standalone.jar
    Make a note of the JAR file to use. You’ll need this information in a later substep, as the value for the YOUR-HIVE-JDBC-JAR placeholder.
  - Open the Pepperdata configuration file, /etc/pepperdata/pepperdata-config.sh, for editing.
  - Add the following variable.
    Be sure to substitute the actual path and filename for the YOUR-HIVE-JDBC-JAR placeholder.
    export PD_EXTRA_CLASSPATH_ITEMS=YOUR-HIVE-JDBC-JAR
  - Save your changes and close the file.
- Restart the PepAgent.
  You can use either the service command (if provided by your OS) or the systemctl command:
  sudo service pepagentd restart
  sudo systemctl restart pepagentd
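The JAR lookup in the procedure above uses the find command. For environments where a script is more convenient, the following Python sketch performs an equivalent search with the standard library. The search directories are the ones from the find example and may not exist on your distro. The function name find_hive_jdbc_jars is hypothetical.

```python
import fnmatch
import os

# Filename pattern taken from the procedure above.
JAR_PATTERN = "hive-jdbc-*standalone.jar"

# Directories from the find example; adjust for your distro.
SEARCH_DIRS = [
    "/opt/cloudera/parcels/CDH/jars/",
    "/usr/lib/hive/lib/",
    "/usr/lib/hive/jdbc/",
]


def find_hive_jdbc_jars(search_dirs):
    """Return the sorted full paths of files that match JAR_PATTERN."""
    matches = []
    for top in search_dirs:
        # os.walk yields nothing for a directory that does not exist,
        # which plays the role of 2>/dev/null in the find example.
        for dirpath, _dirnames, filenames in os.walk(top):
            for filename in fnmatch.filter(filenames, JAR_PATTERN):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


for jar in find_hive_jdbc_jars(SEARCH_DIRS):
    print(jar)
```

If the search prints more than one path, choose the JAR that belongs to the Hive version your cluster runs and use it as the value for the YOUR-HIVE-JDBC-JAR placeholder.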