0019 - MapReduce Jobs Fail Because of the Permissions on Yarn's JobHistory Directory

1. Problem Description

Hive's MapReduce jobs fail to run. The log is as follows:

0: jdbc:hive2://localhost:10000> select count(*) from student;

…

command(queryId=hive_20170902081616_d676f921-c62c-4fac-84b9-272663a2fca0); Time taken: 10.029 seconds

Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

0: jdbc:hive2://localhost:10000>

Plain MapReduce jobs fail as well. The log is as follows:

[root@ip-172-31-6-148 hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar pi 5 5
...
Diagnostics: Exception from container-launch.
Container id: container_1504338960864_0005_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
 at org.apache.hadoop.util.Shell.run(Shell.java:504)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
 at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
17/09/02 08:19:36 INFO mapreduce.Job: Counters: 0
Job Finished in 8.452 seconds
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-6-148:8020/user/root/QuasiMonteCarlo_1504340365604_1994724640/out/reduce-out
 at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
 at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258)
 at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
 at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1820)
 at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1844)
 at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
 at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
[root@ip-172-31-6-148 hadoop-mapreduce]#

The job's logs cannot be viewed from the JobHistory page.

2. Problem Analysis

1. Check Yarn's ResourceManager log. The Container cannot be created, and the exception is as follows:

Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
 at org.apache.hadoop.util.Shell.run(Shell.java:504)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
 at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
…
Container id: container_1504341269835_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
 at org.apache.hadoop.util.Shell.run(Shell.java:504)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
 at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

2. Check the NodeManager log. The exception is as follows:

2017-09-02 08:37:35,317 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1504341269835_0001_01_000001 and exit code: 1
ExitCodeException exitCode=1: 
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
 at org.apache.hadoop.util.Shell.run(Shell.java:504)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
 at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
2017-09-02 08:37:35,326 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-09-02 08:37:35,326 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1504341269835_0001_01_000001

3. Check the JobHistory service log:

2017-09-02 08:40:31,676 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
2017-09-02 08:40:32,880 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:PROXY) via mapred (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /user/root/.staging/job_1504341269835_0001/job_1504341269835_0001.summary
 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2037)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2007)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1920)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
 at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
 at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
2017-09-02 08:40:32,882 WARN org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService: Could not process job files
java.io.FileNotFoundException: File does not exist: /user/root/.staging/job_1504341269835_0001/job_1504341269835_0001.summary
 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2037)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2007)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1920)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
 at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
 at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

4. Check the HDFS NameNode log. The exception is as follows:

2017-09-02 08:37:29,445 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/root/.staging/job_1504341269835_0001/job.xml is closed by DFSClient_NONMAPREDUCE_478129775_1
2017-09-02 08:37:29,451 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.10.118:50010 is added to blk_1073744484_3660 size 106954
2017-09-02 08:37:35,265 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:35,265 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.31.5.190:46293 Call#5 Retry#0: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:40,188 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:40,188 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.31.10.118:49343 Call#5 Retry#0: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:41,200 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/fail/root_appattempt_1504341269835_0001_000002 is closed by DFSClient_NONMAPREDUCE_-860670620_215
2017-09-02 08:37:41,276 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073744476_3652 172.31.10.118:50010 172.31.9.33:50010 172.31.5.190:50010

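Analysis:

The ResourceManager log does not reveal the cause.
The NodeManager log does not reveal the cause either.
The JobHistory logs cannot be viewed because a MapReduce job first creates temporary log files under the submitting user's directory (/user/<user>/<job>; in the logs above, /user/root/.staging/job_1504341269835_0001) and then moves them to the /user/history directory.
The HDFS NameNode log shows that the temporary log files produced by the job cannot be written to /user/history: the directory is owned by mapred:supergroup with mode drwxrwx---, so user root is denied EXECUTE access to it.
The root cause is that the permissions on the HDFS /user/history directory are too restrictive, so the logs of Yarn jobs cannot be recorded.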
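The key evidence is the AccessControlException line in the NameNode log. As a minimal sketch (the function name parse_denied is hypothetical, not part of Hadoop), the fields of such a line can be pulled apart with sed to make the denied user, the requested access, and the directory's owner, group and mode easy to read. The sample line is copied from the NameNode log above.

```shell
#!/bin/sh
# Sample message copied from the NameNode log shown earlier in this article.
line='Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---'

# parse_denied LINE (hypothetical helper)
# Prints: user access path owner group mode
parse_denied() {
  printf '%s\n' "$1" | sed -n \
    's/.*user=\([^,]*\), access=\([A-Z_]*\), inode="\([^"]*\)":\([^:]*\):\([^:]*\):\([-a-zA-Z]*\).*/\1 \2 \3 \4 \5 \6/p'
}

fields=$(parse_denied "$line")
echo "$fields"

# The last three characters of the mode are the bits for "other" users.
mode=${fields##* }
other_bits=${mode#???????}
echo "other bits: $other_bits"
```

For the line above, the "other" bits come out as `---`, which is consistent with root (neither the owner mapred nor, evidently, a member of supergroup) being refused EXECUTE on /user/history.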

3. Solution

Change the permissions and the owner of the /user/history directory:

sudo -u hdfs hdfs dfs -chmod 777 /user/history
sudo -u hdfs hdfs dfs -chown mapred:hadoop /user/history
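To confirm that the change took effect, the mode string of the directory can be checked for the "other" execute bit, which is what user root was missing. The following is a minimal sketch; the function name others_can_traverse is hypothetical, and the two mode strings are the one from the NameNode log (before) and the one that chmod 777 produces (after).

```shell
#!/bin/sh
# others_can_traverse MODE (hypothetical helper)
# Succeeds if the mode string of a directory grants execute (traverse)
# permission to users who are neither the owner nor in the group.
# The last permission character is x, or t when the sticky bit is also set.
others_can_traverse() {
  case "$1" in
    d????????[xt]*) return 0 ;;
    *) return 1 ;;
  esac
}

for m in 'drwxrwx---' 'drwxrwxrwx'; do
  if others_can_traverse "$m"; then
    echo "$m: other users can traverse the directory"
  else
    echo "$m: other users are blocked"
  fi
done

# Assumed usage on a cluster node (not executed here):
#   mode=$(sudo -u hdfs hdfs dfs -ls -d /user/history | awk '{print $1}')
#   others_can_traverse "$mode" && echo "/user/history is traversable"
```

The check only looks at the mode string, so it does not account for HDFS ACLs or group membership; it is meant as a quick sanity test after the chmod, not as a full permission evaluation.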

Before the permission change:

After the permission change, data is written normally and MapReduce jobs run normally.

Original article. Reposting is welcome; when reposting, please credit the WeChat official account Hadoop实操.