Configuring a Sun Grid Engine qmaster host

With Sun Grid Engine installed on CentOS 5.5, this article walks through configuring SGE to act as the qmaster host, which manages jobs.

First, cd to the directory where SGE is installed.

[root@cent55-sge sge6_2u5]# cd /gridware/sge/
[root@cent55-sge sge]# ls -l
total 136
drwxr-xr-x  2 root root  4096 Jan  8 12:40 3rd_party
drwxr-xr-x  3 root root  4096 Jan  8 12:48 bin
drwxr-xr-x  3 root root  4096 Jan  8 12:40 catman
drwxr-xr-x  2 root root  4096 Jan  8 12:40 ckpt
drwxr-xr-x  5 root root  4096 Jan  8 12:40 doc
drwxr-xr-x  2 root root  4096 Jan  8 12:40 dtrace
drwxr-xr-x  5 root root  4096 Jan  8 12:48 examples
drwxr-xr-x  2 root root  4096 Jan  8 12:40 hadoop
drwxr-xr-x  2 root root  4096 Jan  8 12:40 include
-rwxr-xr-x  1 root root   125 Dec 11  2009 install_execd
-rwxr-xr-x  1 root root   125 Dec 11  2009 install_qmaster
-rwxr-xr-x  1 root root 59960 Dec 11  2009 inst_sge
drwxr-xr-x  3 root root  4096 Jan  8 12:48 lib
drwxr-xr-x  6 root root  4096 Jan  8 12:40 man
drwxr-xr-x  4 root root  4096 Jan  8 12:40 mpi
drwxr-xr-x  3 root root  4096 Jan  8 12:40 pvm
drwxr-xr-x  3 root root  4096 Jan  8 12:40 qmon
-rwxr-xr-x  1 root root  1289 Dec 11  2009 start_gui_installer
drwxr-xr-x 10 root root  4096 Jan  8 12:40 util
drwxr-xr-x  3 root root  4096 Jan  8 12:48 utilbin

Run ./install_qmaster or ./inst_sge -m to start configuring SGE on this machine as the qmaster host.

[root@cent55-sge sge]# ./inst_sge -m
Sun Microsystems, Inc. ("Sun") SOFTWARE LICENSE AGREEMENT
READ THE TERMS OF THIS AGREEMENT ("AGREEMENT") CAREFULLY BEFORE OPENING
SOFTWARE MEDIA PACKAGE. BY OPENING SOFTWARE MEDIA PACKAGE, YOU AGREE TO THE
TERMS OF THIS AGREEMENT. IF YOU ARE ACCESSING SOFTWARE ELECTRONICALLY,
INDICATE YOUR ACCEPTANCE OF THESE TERMS BY SELECTING THE "ACCEPT" (OR
EQUIVALENT) BUTTON AT THE END OF THIS AGREEMENT. IF YOU DO NOT AGREE TO ALL
OF THE TERMS, PROMPTLY RETURN THE UNUSED SOFTWARE TO YOUR PLACE OF PURCHASE
FOR A REFUND OR, IF SOFTWARE IS ACCESSED ELECTRONICALLY, SELECT THE
"DECLINE" (OR "EXIT") BUTTON AT THE END OF THIS AGREEMENT. IF YOU HAVE
SEPARATELY AGREED TO LICENSE TERMS ("MASTER TERMS") FOR YOUR LICENSE TO THIS
SOFTWARE, THEN SECTIONS 1-6 OF THIS AGREEMENT ("SUPPLEMENTAL LICENSE TERMS")
SHALL SUPPLEMENT AND SUPERSEDE THE MASTER TERMS IN RELATION TO THIS
SOFTWARE.
...
...
17. Records and Documentation. During the term of the SLA and Entitlement,
and for a period of three (3) years thereafter, You agree to keep proper
records and documentation of Your compliance with the SLA and Entitlement.
Upon Sun's reasonable request, You will provide copies of such records and
documentation to Sun for the purpose of confirming Your compliance with the
terms and conditions of the SLA and Entitlement. This section will survive
any termination of the SLA and Entitlement. You may terminate this SLA and
Entitlement at any time by destroying all copies of the Software in which
case the obligations set forth in Section 7 of the SLA shall apply.
The LICENSE AGREEMENT is displayed. Answer "y" to accept it.
Do you agree with that license? (y/n) [n] >> y

The installation welcome screen

Welcome to the Grid Engine installation
---------------------------------------
Grid Engine qmaster host installation
-------------------------------------
Before you continue with the installation please read these hints:
   - Your terminal window should have a size of at least
     80x24 characters
   - The INTR character is often bound to the key Ctrl-C.
     The term >Ctrl-C< is used during the installation if you
     have the possibility to abort the installation
The qmaster installation procedure will take approximately 5-10 minutes.
Hit <RETURN> to continue >>

Choose a non-root user to run SGE

Grid Engine admin user account
------------------------------
The current directory
   /gridware/sge
is owned by user
   sgeadmin
If user >root< does not have write permissions in this directory on *all*
of the machines where Grid Engine will be installed (NFS partitions not
exported for user >root< with read/write permissions) it is recommended to
install Grid Engine that all spool files will be created under the user id
of user >sgeadmin<.
IMPORTANT NOTE: The daemons still have to be started by user >root<.
Do you want to install Grid Engine as admin user >sgeadmin< (y/n) [y] >>
Installing Grid Engine as admin user >sgeadmin<
Hit <RETURN> to continue >>
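
The installer picks the admin user from the owner of the current directory, so it can help to confirm the owner before starting. The following is a minimal sketch, not part of the installer; the function name dir_owner_uid is made up for this example, and it compares numeric uids so that it also works when names are not resolvable.

```shell
# Print the numeric uid that owns a directory.
# dir_owner_uid is a hypothetical helper name, not an SGE command.
dir_owner_uid() {
    # With -n, ls prints numeric ids; the third column is the owner uid.
    ls -ldn "$1" | awk '{ print $3 }'
}

# Usage on the qmaster host from this walkthrough (assumes user sgeadmin exists):
#   [ "$(dir_owner_uid /gridware/sge)" = "$(id -u sgeadmin)" ] && echo "owner ok"
```

If the check fails, changing the owner of the SGE root directory before running the installer should make it offer the intended admin user.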

The SGE root directory

Checking $SGE_ROOT directory
----------------------------
The Grid Engine root directory is:
   $SGE_ROOT = /gridware/sge
If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/gridware/sge] >>
Your $SGE_ROOT directory: /gridware/sge
Hit <RETURN> to continue >>

Choose the port number for sge_qmaster

Grid Engine TCP/IP communication service
----------------------------------------
The port for sge_qmaster is currently set as service.
   sge_qmaster service set to port 10500
Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
    sge_qmaster <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<:                           [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 2) >>
Grid Engine TCP/IP service >sge_qmaster<
----------------------------------------
Using the service
   sge_qmaster
for communication with Grid Engine.
Hit <RETURN> to continue >>

Choose the port number for sge_execd

 Grid Engine TCP/IP communication service
----------------------------------------
The port for sge_execd is currently set as service.
   sge_execd service set to port 10501
Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
    sge_execd <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<:                           [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 2) >>
 
Grid Engine TCP/IP communication service
-----------------------------------------
Using the service
   sge_execd
for communication with Grid Engine.
Hit <RETURN> to continue >>
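
Both prompts above default to option 2, which relies on entries in the services database. On Linux the local file is /etc/services (the installer text writes it as /etc/service). A minimal sketch for checking which TCP port a service name maps to in such a file follows; the function name service_port is an assumption made for this example.

```shell
# Print the TCP port registered for a service name in a services-style file.
# service_port is a hypothetical helper; the file defaults to /etc/services.
service_port() {
    name=$1
    file=${2:-/etc/services}
    # Match lines such as "sge_qmaster 10500/tcp" and strip the "/tcp" suffix.
    awk -v n="$name" '$1 == n && $2 ~ /\/tcp$/ { sub(/\/tcp$/, "", $2); print $2; exit }' "$file"
}

# Usage:
#   service_port sge_qmaster
#   service_port sge_execd
```

An empty result means the service name has no tcp entry in the file, in which case the entry would need to be added or the shell environment option used instead.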

Set the cell name; here it is MyCell.

Note: the cell name groups the qmaster host and the execution hosts that exchange jobs, so the execution hosts must use the same cell name.

 Grid Engine cells
-----------------
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don't
know yet what is a Grid Engine cell it is safe to keep the default cell name
   default
If you want to install multiple cells you can enter a cell name now.
The environment variable
   $SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >> MyCell
Using cell >MyCell<.
Hit <RETURN> to continue >>

Set the cluster name; here it is MyCluster.

Unique cluster name
-------------------
The cluster name uniquely identifies a specific Sun Grid Engine cluster.
The cluster name must be unique throughout your organization. The name
is not related to the SGE cell.
The cluster name must start with a letter ([A-Za-z]), followed by letters,
digits ([0-9]), dashes (-) or underscores (_).
Enter new cluster name or hit <RETURN>
to use default [p10500] >> MyCluster
creating directory: /gridware/sge/MyCell/common
Your $SGE_CLUSTER_NAME: MyCluster
Hit <RETURN> to continue >>
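
The naming rule quoted in the prompt above (a letter first, then letters, digits, dashes or underscores) can be checked before typing a name in. This is a small sketch of that rule only; valid_cluster_name is a made-up function name, and the installer may apply further checks that are not shown here.

```shell
# Succeed if the name starts with a letter and continues with
# letters, digits, dashes or underscores only.
# valid_cluster_name is a hypothetical helper, not an SGE command.
valid_cluster_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9_-]*$'
}

# Usage:
#   valid_cluster_name MyCluster && echo "name is acceptable"
```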

The qmaster spool directory

 Grid Engine qmaster spool directory
-----------------------------------
The qmaster spool directory is the place where the qmaster daemon stores
the configuration and the state of the queuing system.
The admin user >sgeadmin< must have read/write access
to the qmaster spool directory.
If you will install shadow master hosts or if you want to be able to start
the qmaster daemon on other hosts (see the corresponding section in the
Grid Engine Installation and Administration Manual for details) the account
on the shadow master hosts also needs read/write access to this directory.
Enter a qmaster spool directory [/gridware/sge/MyCell/spool/qmaster] >>
Using qmaster spool directory >/gridware/sge/MyCell/spool/qmaster<.
Hit <RETURN> to continue >>

Whether SGE will also run on Windows machines

Windows Execution Host Support
------------------------------
Are you going to install Windows Execution Hosts? (y/n) [n] >>

Verify file permissions

Verifying and setting file permissions
--------------------------------------
Did you install this version with >pkgadd< or did you already verify
and set the file permissions of your distribution (enter: y) (y/n) [y] >>
We do not verify file permissions. Hit <RETURN> to continue >>

Whether all hosts share a single DNS domain

Select default Grid Engine hostname resolving method
----------------------------------------------------
Are all hosts of your cluster in one DNS domain? If this is
the case the hostnames
   >hostA< and >hostA.foo.com<
would be treated as equal, because the DNS domain name >foo.com<
is ignored when comparing hostnames.
Are all hosts of your cluster in a single DNS domain (y/n) [y] >>
Ignoring domain name when comparing hostnames.
Hit <RETURN> to continue >>
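
To illustrate what "ignoring the domain name" means in the prompt above, the sketch below compares two hostnames by their first label only. It is a simplified illustration, not Grid Engine's actual resolving code; the function name same_short_hostname is made up, and the comparison is case-sensitive.

```shell
# Succeed if two hostnames are equal once everything after the
# first dot is dropped. Hypothetical helper for illustration only.
same_short_hostname() {
    [ "${1%%.*}" = "${2%%.*}" ]
}

# Usage:
#   same_short_hostname hostA hostA.foo.com && echo "treated as equal"
```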

Whether to enable the Grid Engine JMX MBean server. The answer here is "n", so it is not used.

Grid Engine JMX MBean server
----------------------------
In order to use the SGE Inspect or the Service Domain Manager (SDM)
SGE adapter you need to configure a JMX server in qmaster. Qmaster
will then load a Java Virtual Machine through a shared library.
NOTE: Java 1.5 or later is required for the JMX MBean server.
Do you want to enable the JMX MBean server (y/n) [y] >> n

The installer now creates the files and directories it needs.

Making directories
------------------
creating directory: /gridware/sge/MyCell/spool/qmaster
creating directory: /gridware/sge/MyCell/spool/qmaster/job_scripts
Hit <RETURN> to continue >>

Choose the spooling method

Setup spooling
--------------
Your SGE binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
The Berkeley DB spooling method provides two configurations!
Local spooling:
The Berkeley DB spools into a local directory on this host (qmaster host)
This setup is faster, but you can't setup a shadow master host
Berkeley DB Spooling Server:
If you want to setup a shadow master host, you need to use
Berkeley DB Spooling Server!
In this case you have to choose a host with a configured RPC service.
The qmaster host connects via RPC to the Berkeley DB. This setup is more
failsafe, but results in a clear potential security hole. RPC communication
(as used by Berkeley DB) can be easily compromised. Please only use this
alternative if your site is secure or if you are not concerned about
security. Check the installation guide for further advice on how to achieve
failsafety without compromising security.
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >>
Hit <RETURN> to continue >>
Berkeley Database spooling parameters
-------------------------------------
Please enter the database directory now, even if you want to spool locally,
it is necessary to enter this database directory.
Default: [/gridware/sge/MyCell/spool/spooldb] >>
creating directory: /gridware/sge/MyCell/spool/spooldb
Dumping bootstrapping information
Initializing spooling database
Hit <RETURN> to continue >>

The range of additional group ids that SGE uses

Grid Engine group id range
--------------------------
When jobs are started under the control of Grid Engine an additional group id
is set on platforms which do not support jobs. This is done to provide maximum
control for Grid Engine jobs.
This additional UNIX group id range must be unused group id's in your system.
Each job will be assigned a unique id during the time it is running.
Therefore you need to provide a range of id's which will be assigned
dynamically for jobs.
The range must be big enough to provide enough numbers for the maximum number
of Grid Engine jobs running at a single moment on a single host. E.g. a range
like >20000-20100< means, that Grid Engine will use the group ids from
20000-20100 and provides a range for 100 Grid Engine jobs at the same time
on a single host.
You can change at any time the group id range in your cluster configuration.
Please enter a range [20000-20100] >>
Using >20000-20100< as gid range. Hit <RETURN> to continue >>
Grid Engine cluster configuration
---------------------------------
Please give the basic configuration parameters of your Grid Engine
installation:
   <execd_spool_dir>
The pathname of the spool directory of the execution hosts. User >sgeadmin<
must have the right to create this directory and to write into it.
Default: [/gridware/sge/MyCell/spool] >>
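
The group id range prompt above requires that the ids in the range are unused on the system. A minimal sketch for spotting local groups that already fall inside a candidate range follows. The function name groups_in_gid_range is an assumption for this example, and it reads only a group file, so groups served by NIS or LDAP would not be seen.

```shell
# List "name:gid" for every group whose gid lies inside LOW-HIGH (inclusive).
# groups_in_gid_range is a hypothetical helper; the file defaults to /etc/group.
groups_in_gid_range() {
    range=$1
    file=${2:-/etc/group}
    low=${range%-*}     # text before the dash
    high=${range#*-}    # text after the dash
    awk -F: -v lo="$low" -v hi="$high" \
        '$3 + 0 >= lo + 0 && $3 + 0 <= hi + 0 { print $1 ":" $3 }' "$file"
}

# Usage:
#   groups_in_gid_range 20000-20100
```

No output suggests the range is free in the local group file; any line printed is a group that would collide with the range.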

Email address for problem reports

Grid Engine cluster configuration (continued)
---------------------------------------------
<administrator_mail>
The email address of the administrator to whom problem reports are sent.
It is recommended to configure this parameter. You may use >none<
if you do not wish to receive administrator mail.
Please enter an email address in the form >user@foo.com<.
Default: [none] >>
 
The following parameters for the cluster configuration were configured:
   execd_spool_dir        /gridware/sge/MyCell/spool
   administrator_mail     none
Do you want to change the configuration parameters (y/n) [n] >>

Creating the configuration files

Creating local configuration
----------------------------
Creating >act_qmaster< file
Adding default complex attributes
Adding default parallel environments (PE)
Adding SGE default usersets
Adding >sge_aliases< path aliases file
Adding >qtask< qtcsh sample default request file
Adding >sge_request< default submit options file
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<
Hit <RETURN> to continue >>

Install the startup script that runs at boot

qmaster startup script
----------------------
We can install the startup script that will
start qmaster at machine boot (y/n) [y] >>
cp /gridware/sge/MyCell/common/sgemaster /etc/init.d/sgemaster.MyCluster
/usr/lib/lsb/install_initd /etc/init.d/sgemaster.MyCluster
Hit <RETURN> to continue >>

Start the sge_qmaster service

Grid Engine qmaster startup
---------------------------
Starting qmaster daemon. Please wait ...
   starting sge_qmaster
Hit <RETURN> to continue >>

Specify the hosts that will run sge_execd. Two machines are entered here: "cent55-node1" and "cent55-node2".

Adding Grid Engine hosts
------------------------
Please now add the list of hosts, where you will later install your execution
daemons. These hosts will be also added as valid submit hosts.
Please enter a blank separated list of your execution hosts. You may
press <RETURN> if the line is getting too long. Once you are finished
simply press <RETURN> without entering a name.
You also may prepare a file with the hostnames of the machines where you plan
to install Grid Engine. This may be convenient if you are installing Grid
Engine on many hosts.
Do you want to use a file which contains the list of hosts (y/n) [n] >>

Enter the hostname cent55-node1

Adding admin and submit hosts
-----------------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.
Host(s): cent55-node1
cent55-node1.spalinux.com added to administrative host list
cent55-node1.spalinux.com added to submit host list

Enter the hostname cent55-node2

Adding admin and submit hosts
-----------------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.
Host(s): cent55-node2
cent55-node2.spalinux.com added to administrative host list
cent55-node2.spalinux.com added to submit host list
Hit <RETURN> to continue >>

Once all hostnames have been entered, press [Enter] on an empty line to move on.

Adding admin and submit hosts
-----------------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.
Host(s):
Finished adding hosts. Hit <RETURN> to continue >>

Add extra hosts to serve as shadow hosts. None are used here, so the host list is left empty.

If you want to use a shadow host, it is recommended to add this host
to the list of administrative hosts.
If you are not sure, it is also possible to add or remove hosts after the
installation with <qconf -ah hostname> for adding and <qconf -dh hostname>
for removing this host
Attention: This is not the shadow host installation
procedure.
You still have to install the shadow host separately
Do you want to add your shadow host(s) now? (y/n) [y] >>
Adding Grid Engine shadow hosts
-------------------------------
Please now add the list of hosts, where you will later install your shadow
daemon.
Please enter a blank separated list of your execution hosts. You may
press <RETURN> if the line is getting too long. Once you are finished
simply press <RETURN> without entering a name.
You also may prepare a file with the hostnames of the machines where you plan
to install Grid Engine. This may be convenient if you are installing Grid
Engine on many hosts.
Do you want to use a file which contains the list of hosts (y/n) [n] >>
Adding admin hosts
------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.
Host(s):
Finished adding hosts. Hit <RETURN> to continue >>
Creating the default <all.q> queue and <allhosts> hostgroup
-----------------------------------------------------------
root@cent55-sge.spalinux.com added "@allhosts" to host group list
root@cent55-sge.spalinux.com added "all.q" to cluster queue list
Hit <RETURN> to continue >>

Scheduler tuning configuration

Scheduler Tuning
----------------
The details on the different options are described in the manual.
Configurations
--------------
1) Normal
          Fixed interval scheduling, report limited scheduling information,
          actual + assumed load
2) High
          Fixed interval scheduling, report limited scheduling information,
          actual load
3) Max
          Immediate Scheduling, report no scheduling information,
          actual load
Enter the number of your preferred configuration and hit <RETURN>!
Default configuration is [1] >>
 
We're configuring the scheduler with >Normal< settings!
Do you agree? (y/n) [y] >>

Instructions for using SGE

Using Grid Engine
-----------------
You should now enter the command:
   source /gridware/sge/MyCell/common/settings.csh
if you are a csh/tcsh user or
   # . /gridware/sge/MyCell/common/settings.sh
if you are a sh/ksh user.
This will set or expand the following environment variables:
   - $SGE_ROOT         (always necessary)
   - $SGE_CELL         (if you are using a cell other than >default<)
   - $SGE_CLUSTER_NAME (always necessary)
   - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
   - $SGE_EXECD_PORT   (if you haven't added the service >sge_execd<)
   - $PATH/$path       (to find the Grid Engine binaries)
   - $MANPATH          (to access the manual pages)
Hit <RETURN> to see where Grid Engine logs messages >>
Grid Engine messages
--------------------
Grid Engine messages can be found at:
   /tmp/qmaster_messages (during qmaster startup)
   /tmp/execd_messages   (during execution daemon startup)
After startup the daemons log their messages in their spool directories.
   Qmaster:     /gridware/sge/MyCell/spool/qmaster/messages
   Exec daemon: <execd_spool_dir>/<hostname>/messages
Grid Engine startup scripts
---------------------------
Grid Engine startup scripts can be found at:
   /gridware/sge/MyCell/common/sgemaster (qmaster)
   /gridware/sge/MyCell/common/sgeexecd (execd)
Do you want to see previous screen about using Grid Engine again (y/n) [n] >>
 
Your Grid Engine qmaster installation is now completed
------------------------------------------------------
Please now login to all hosts where you want to run an execution daemon
and start the execution host installation procedure.
If you want to run an execution daemon on this host, please do not forget
to make the execution host installation in this host as well.
All execution hosts must be administrative hosts during the installation.
All hosts which you added to the list of administrative hosts during this
installation procedure can now be installed.
You may verify your administrative hosts with the command
   # qconf -sh
and you may add new administrative hosts with the command
   # qconf -ah <hostname>
Please hit <RETURN> >>
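
The closing screens ask you to source the settings file and then verify the administrative hosts with qconf -sh. A small sketch that wraps the sourcing step for sh-style shells follows. The function names sge_settings_path and load_sge_env are made up for this example; the path layout is taken from the installer output above.

```shell
# Build the path of settings.sh from the SGE root and the cell name.
# Both functions are hypothetical helpers, not SGE commands.
sge_settings_path() {
    printf '%s/%s/common/settings.sh\n' "$1" "${2:-default}"
}

# Source the settings file into the current shell, or report that it is missing.
load_sge_env() {
    settings=$(sge_settings_path "$1" "$2")
    if [ -r "$settings" ]; then
        . "$settings"
    else
        echo "settings file not found: $settings" >&2
        return 1
    fi
}

# Usage on the qmaster host from this walkthrough:
#   load_sge_env /gridware/sge MyCell && qconf -sh
```

Adding the sourcing line to a login profile should make the Grid Engine commands available in every new session.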

Use ps to check for the sge_qmaster process

[root@cent55-sge sge]# ps -ef | grep sge
sgeadmin  4667     1  1 13:55 ?        00:00:00 /gridware/sge/bin/lx24-amd64/sge_qmaster
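
A plain grep for "sge" can also match the grep command itself or unrelated processes. The sketch below filters ps -ef output on the command column instead. The function name qmaster_listed is made up, and it assumes the usual ps -ef layout in which the command is the eighth column.

```shell
# Read `ps -ef` style lines on stdin and succeed if any line's command
# column is the sge_qmaster binary. Hypothetical helper, not an SGE command.
qmaster_listed() {
    awk '$8 ~ /(^|\/)sge_qmaster$/ { found = 1 } END { exit !found }'
}

# Usage:
#   ps -ef | qmaster_listed && echo "sge_qmaster is running"
```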

Starting and stopping the sge_qmaster service

Stopping the sge_qmaster service

[root@cent55-sge ~]# /etc/init.d/sgemaster.MyCluster stop
   shutting down Grid Engine qmaster

Starting the sge_qmaster service

[root@cent55-sge ~]# /etc/init.d/sgemaster.MyCluster start
   starting sge_qmaster
