An issue has been seen on OneFS 8.0.1 when a KDC-Kerberized Hortonworks cluster attempts to start Yarn services, or any other service that uses WebHDFS during startup. OneFS can generate /etc/krb5.conf without READ permission for the service that handles WebHDFS calls; authentication then cannot occur, and the services fail to start.
Symptoms
When the Yarn services start, Ambari makes a Kerberized WebHDFS call to Isilon. This call fails with a generic HTTP 401 error and no Yarn service can start. (Any service that makes a Kerberized WebHDFS call will show similar behavior.)
The yarn service log looks similar to this:
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 305, in _get_file_status
list_status = self.util.run_command(target, 'GETFILESTATUS', method='GET', ignore_status_codes=['404'], assertable_result=False)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 210, in run_command
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X GET --negotiate -u : 'http://rip2-horton1.foo.com:8082/webhdfs/v1/ats/done?op=GETFILESTATUS&user.name=hdfs'' returned status_code=401.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested. Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
</body></html>
Running the same Kerberized curl command interactively fails with the same HTTP 401 error.
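To reproduce the failing call by hand, the sketch below rebuilds the GETFILESTATUS request from the Ambari error and prints only the HTTP status code. The function names are hypothetical, the host, port, path and user are simply the values from the log above, and the path is assumed not to need URL encoding. A valid Kerberos ticket (obtained with kinit) is assumed to be in place before the curl call.

```shell
#!/bin/sh
# Hypothetical helpers for re-running Ambari's WebHDFS GETFILESTATUS call by hand.

# Print the GETFILESTATUS URL for: host, port, HDFS path, user name.
# Assumes the path needs no URL encoding.
webhdfs_status_url() {
    printf 'http://%s:%s/webhdfs/v1%s?op=GETFILESTATUS&user.name=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Make the call with SPNEGO (--negotiate -u :) and print only the HTTP status code.
# Assumes a valid Kerberos ticket is already in the credential cache (kinit).
webhdfs_status_code() {
    curl -sS -o /dev/null -w '%{http_code}' --negotiate -u : \
        "$(webhdfs_status_url "$1" "$2" "$3" "$4")"
}

# Example, using the values from the Ambari error above:
#   webhdfs_status_code rip2-horton1.foo.com 8082 /ats/done hdfs
# A result of 401 reproduces the failure described in this article.
```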
No errors are logged in the Isilon hdfs.log for this issue; the only related entries appear in the Apache error log:
2016-11-28T14:00:19-08:00 <18.3> rip2-horton1.foo.com httpd[76700]: [error] [client 10.111.223.200] gss_accept_sec_context() failed: Unspecified GSS failure. Minor code may provide more information (Permission denied)
2016-11-28T14:00:20-08:00 <18.3> rip2-horton1.foo.com httpd[76700]: [error] [client 10.111.223.200] gss_accept_sec_context() failed: Unspecified GSS failure. Minor code may provide more information (Permission denied)
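A quick way to confirm you are hitting this particular issue is to count these Apache entries on the node that received the call. This is a minimal sketch; the function name is hypothetical, and because the Apache error log location on the node is not stated here, the log path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: count Apache gss_accept_sec_context() failures that
# report "Permission denied", the signature of the krb5.conf permission issue.
# Usage: count_gss_denials <path-to-apache-error-log>
count_gss_denials() {
    # In a basic regular expression, ( and ) are literal characters.
    grep -c 'gss_accept_sec_context() failed.*(Permission denied)' "$1"
}
```

A non-zero count together with the 401 from Ambari points at this issue; a zero count suggests looking for one of the other causes of HTTP 401 errors.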
The issue is caused by incorrect permissions on the /etc/krb5.conf file on the cluster nodes. WebHDFS calls are first handled by Apache, which runs as a daemon user. Because this user cannot read /etc/krb5.conf, authentication fails before it has even started.
Resolution
The permissions on /etc/krb5.conf show that Other has no permissions at all. Without READ on Other, the Apache user cannot read the file and authentication fails:
rip2-1# ls -le /etc/krb5.conf
-rw-r----- 1 root wheel 400 Aug 23 16:43 /etc/krb5.conf
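To run the same check from a script on each node, the sketch below looks at the read bit for Other in the `ls` mode string. The function names are hypothetical, and the check only inspects the POSIX mode bits, not any OneFS ACL entries that `ls -le` may also display.

```shell
#!/bin/sh
# Hypothetical helper: succeed if "other" has read permission on the file.
# Character 8 of the ls -l mode string (e.g. -rw-r--r--) is the read bit for other.
other_can_read() {
    [ "$(ls -ld -- "$1" | cut -c8)" = r ]
}

# Hypothetical helper: report on krb5.conf (defaults to the path in this article).
check_krb5_conf() {
    f=${1:-/etc/krb5.conf}
    if other_can_read "$f"; then
        echo "OK: other can read $f"
    else
        echo "PROBLEM: other cannot read $f"
        return 1
    fi
}
```

On the cluster, `isi_for_array ls -le /etc/krb5.conf` shows the same information for every node at once; the helper is only useful when you want a pass/fail result in a script.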
To resolve the issue, add READ permission on Other for /etc/krb5.conf on all nodes:
rip2-1# isi_for_array chmod o+r /etc/krb5.conf
Review the updated permissions:
rip2-1# ls -le /etc/krb5.conf
-rw-r--r-- 1 root wheel 400 Aug 23 16:43 /etc/krb5.conf
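If you prefer a re-runnable per-node step that only changes the file when needed and then confirms the result, a minimal sketch follows. The function name is hypothetical; `isi_for_array chmod o+r /etc/krb5.conf` remains the cluster-wide fix, and this sketch only covers the node it runs on.

```shell
#!/bin/sh
# Hypothetical helper: add read-for-other only if it is missing, then verify.
# Defaults to /etc/krb5.conf; uses character 8 of the ls -l mode string.
ensure_other_read() {
    f=${1:-/etc/krb5.conf}
    if [ "$(ls -ld -- "$f" | cut -c8)" = r ]; then
        echo "unchanged: $f"
        return 0
    fi
    chmod o+r "$f" || return 1
    if [ "$(ls -ld -- "$f" | cut -c8)" = r ]; then
        echo "fixed: $f"
    else
        echo "still not readable by other: $f" >&2
        return 1
    fi
}
```

Running it a second time prints `unchanged`, so it is safe to repeat, for example after an upgrade.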
Restart the Yarn services. Now that the Apache user can read krb5.conf, authentication can occur and the services will start. Kerberos entries for these service start calls will now appear in hdfs.log (you may need to increase the logging level and view the log on the node that received the call).
This issue has also been seen on clusters upgraded from 8.0.0.x to 8.0.1.0. Other configurations can also cause HTTP 401 errors on service start, independent of this krb5.conf permission issue. Another likely cause is incorrect permissions on the HDFS root system directories, but that is a topic for another blog.
KDC Kerberized Yarn Services Fail to Start on 8.0.1 with Ambari via WebHDFS curl Calls