TroubleShooting

[Tuxedo] When the BBL Status Is DEAD in SHM Mode

최선을 다하자! 2022. 12. 8. 14:50

I killed a healthy, running BBL with kill -9.

Checking with psr then showed its status as DEAD.

 

The psr and psc commands still work, but pclt does not.

This is because WSH runs under the BBL's management.

 

 

Once the BBL is dead, tmshutdown does not work either.

It only prints an error saying the command must be run on the master node.

>tmshutdown -cy
Shutting down all admin and server processes in /ofm/jwchoi/sw2/tp/tuxedo12.2.2.0.0/samples/atmi/simpapp/tuxconfig

tmshutdown: internal error: CMDTUX_CAT:766: ERROR: must run on master node

>tmshutdown -k KILL
Shutdown all admin and server processes? (y/n): y
Shutting down all admin and server processes in /ofm/jwchoi/sw2/tp/tuxedo12.2.2.0.0/samples/atmi/simpapp/tuxconfig

tmshutdown: internal error: CMDTUX_CAT:766: ERROR: must run on master node

>tmshutdown -k TERM
Shutdown all admin and server processes? (y/n): y
Shutting down all admin and server processes in /ofm/jwchoi/sw2/tp/tuxedo12.2.2.0.0/samples/atmi/simpapp/tuxconfig

tmshutdown: internal error: CMDTUX_CAT:766: ERROR: must run on master node

 

The related ULOG entries:

140908.node1!BBL.21823748.1.0: LIBTUX_CAT:577: ERROR: Unable to register because the slot is already owned by another process
140908.node1!BBL.21823748.1.0: LIBTUX_CAT:248: ERROR: System init function failed, Uunixerr =

142557.node1!tmshutdown.2556570.1.-2: FATAL: internal error: CMDTUX_CAT:766: ERROR: must run on master node
142622.node1!tmshutdown.18154228.1.-2: 12-08-2022: Tuxedo Version 12.2.2.0.0, 64-bit
142622.node1!tmshutdown.18154228.1.-2: FATAL: internal error: CMDTUX_CAT:766: ERROR: must run on master node
142631.node1!tmshutdown.56099298.1.-2: 12-08-2022: Tuxedo Version 12.2.2.0.0, 64-bit
142631.node1!tmshutdown.56099298.1.-2: FATAL: internal error: CMDTUX_CAT:766: ERROR: must run on master node
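
To pull just these errors out of a long ULOG, a filter along the following lines may help. It is only a sketch: the function name bbl_ulog_errors is made up for illustration, the ULOG path has to be passed in by hand, and the three catalog codes are simply the ones that appear in the entries above.

```shell
# Hypothetical helper: print ULOG lines carrying the error codes seen in this incident.
# $1 is the ULOG file to scan; where it lives depends on your configuration.
bbl_ulog_errors() {
    grep -E 'LIBTUX_CAT:(577|248):|CMDTUX_CAT:766:' "$1"
}
```

Calling it as bbl_ulog_errors followed by the path to the ULOG file prints only the matching lines, which makes it easier to see when the BBL failure started relative to the tmshutdown attempts.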

 

 

My first goal was to bring the BBL back.

According to the Oracle documentation, the BBL in SHM mode has to be revived with the restartsrv command.

 

* In MP mode, the BBL process is monitored by the BRIDGE process and is restarted whenever it fails. In SHM mode no such monitoring process exists, so restarting the BBL requires manual intervention.

 

restartsrv -g <group id> -i <server id>

 

In SHM mode, the BBL's group id and server id are said to always be the same fixed values:

 

Group id : 30002

Server id : 0

 

 

After running that command, the BBL came back.

 

 

Connecting with tmadmin and checking the status showed IDLE.

It had returned to normal.
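
The same check can be done without an interactive session by piping the command into tmadmin. This is a rough sketch that assumes TUXCONFIG is already exported for this domain. bbl_rows is a hypothetical helper name, and since the exact layout of the psr output may differ between versions, the filter only looks for the word BBL.

```shell
# Hypothetical helper: keep only the psr rows that mention BBL.
bbl_rows() {
    grep -w 'BBL'
}

# Intended use on a live domain (not executed here):
#   echo psr | tmadmin -r | bbl_rows
```

The -r option opens tmadmin in read-only mode, which is enough for a status check.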

 

 

 

 

 

 

 

 

 

References

 

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=345255758103779&parent=EXTERNAL_SEARCH&sourceId=HOWTO&id=765071.1&_afrWindowMode=0&_adf.ctrl-state=ott4wdmfi_147 

 


 

 

DESCRIPTION:

In MP mode, the BBL process is monitored by the BRIDGE process and restarted in case of a failure.
Such mechanism does not exist in SHM mode and requires a manual intervention to restart the BBL.

SOLUTION:

It's not possible to restart the BBL using the tmboot command but it can be done using the restartsrv command.
The restartsrv command usage is :

restartsrv -g <group id> -i <server id>

Fortunately the BBL group id and server id are always the same in SHM mode and are :

Group id: 30002
Server id: 0

So the BBL process can easily be restarted in SHM mode using:

restartsrv -g 30002 -i 0

One can use a script shell to monitor the BBL process using the pid printed at boot time when
you use tmboot -d1 option and restart the BBL in case of failure using the command given above.
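
The monitoring idea in the note above could be sketched roughly as follows. This is not Oracle's script, only an illustration under stated assumptions: the BBL PID is supplied by the operator (for example, read off the tmboot -d1 output), the script runs as the same user that owns the Tuxedo processes so that kill -0 can probe the PID, and RESTART_CMD is a hypothetical variable added here so the restart can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch of a one-shot BBL check for SHM mode.
# RESTART_CMD is a hypothetical override; the default is the command from the note above.
RESTART_CMD=${RESTART_CMD:-"restartsrv -g 30002 -i 0"}

# Returns 0 if a process with the given PID exists and can be signalled.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Checks the PID once and runs the restart command when the process is gone.
check_bbl() {
    if is_alive "$1"; then
        echo "BBL pid $1 is alive"
    else
        echo "BBL pid $1 is gone, running: $RESTART_CMD"
        $RESTART_CMD
    fi
}
```

A call such as check_bbl "$BBL_PID" could be run from cron or a sleep loop. After a restart the new BBL has a different PID, so any loop would need to refresh the PID it watches; that part is deliberately left out of the sketch.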