Following on from installing Sun Grid Engine on CentOS 5.5, this article describes how to configure SGE so that the machine acts as the qmaster host, which manages jobs.
To begin, use cd to change to the directory where SGE is installed.
[root@cent55-sge sge6_2u5]# cd /gridware/sge/
[root@cent55-sge sge]# ls -l
total 136
drwxr-xr-x 2 root root 4096 Jan 8 12:40 3rd_party
drwxr-xr-x 3 root root 4096 Jan 8 12:48 bin
drwxr-xr-x 3 root root 4096 Jan 8 12:40 catman
drwxr-xr-x 2 root root 4096 Jan 8 12:40 ckpt
drwxr-xr-x 5 root root 4096 Jan 8 12:40 doc
drwxr-xr-x 2 root root 4096 Jan 8 12:40 dtrace
drwxr-xr-x 5 root root 4096 Jan 8 12:48 examples
drwxr-xr-x 2 root root 4096 Jan 8 12:40 hadoop
drwxr-xr-x 2 root root 4096 Jan 8 12:40 include
-rwxr-xr-x 1 root root 125 Dec 11 2009 install_execd
-rwxr-xr-x 1 root root 125 Dec 11 2009 install_qmaster
-rwxr-xr-x 1 root root 59960 Dec 11 2009 inst_sge
drwxr-xr-x 3 root root 4096 Jan 8 12:48 lib
drwxr-xr-x 6 root root 4096 Jan 8 12:40 man
drwxr-xr-x 4 root root 4096 Jan 8 12:40 mpi
drwxr-xr-x 3 root root 4096 Jan 8 12:40 pvm
drwxr-xr-x 3 root root 4096 Jan 8 12:40 qmon
-rwxr-xr-x 1 root root 1289 Dec 11 2009 start_gui_installer
drwxr-xr-x 10 root root 4096 Jan 8 12:40 util
drwxr-xr-x 3 root root 4096 Jan 8 12:48 utilbin
Run ./install_qmaster or ./inst_sge -m to start configuring SGE on this machine as the qmaster host.
[root@cent55-sge sge]# ./inst_sge -m
Sun Microsystems, Inc. ("Sun") SOFTWARE LICENSE AGREEMENT
READ THE TERMS OF THIS AGREEMENT ("AGREEMENT") CAREFULLY BEFORE OPENING SOFTWARE MEDIA PACKAGE. BY OPENING SOFTWARE MEDIA PACKAGE, YOU AGREE TO THE TERMS OF THIS AGREEMENT. IF YOU ARE ACCESSING SOFTWARE ELECTRONICALLY, INDICATE YOUR ACCEPTANCE OF THESE TERMS BY SELECTING THE "ACCEPT" (OR EQUIVALENT) BUTTON AT THE END OF THIS AGREEMENT. IF YOU DO NOT AGREE TO ALL OF THE TERMS, PROMPTLY RETURN THE UNUSED SOFTWARE TO YOUR PLACE OF PURCHASE FOR A REFUND OR, IF SOFTWARE IS ACCESSED ELECTRONICALLY, SELECT THE "DECLINE" (OR "EXIT") BUTTON AT THE END OF THIS AGREEMENT. IF YOU HAVE SEPARATELY AGREED TO LICENSE TERMS ("MASTER TERMS") FOR YOUR LICENSE TO THIS SOFTWARE, THEN SECTIONS 1-6 OF THIS AGREEMENT ("SUPPLEMENTAL LICENSE TERMS") SHALL SUPPLEMENT AND SUPERSEDE THE MASTER TERMS IN RELATION TO THIS SOFTWARE.
... ...
17. Records and Documentation. During the term of the SLA and Entitlement, and for a period of three (3) years thereafter, You agree to keep proper records and documentation of Your compliance with the SLA and Entitlement. Upon Sun's reasonable request, You will provide copies of such records and documentation to Sun for the purpose of confirming Your compliance with the terms and conditions of the SLA and Entitlement. This section will survive any termination of the SLA and Entitlement. You may terminate this SLA and Entitlement at any time by destroying all copies of the Software in which case the obligations set forth in Section 7 of the SLA shall apply.
The license agreement is displayed. Answer "y" to accept it.
Do you agree with that license? (y/n) [n] >> y
The installer's welcome screen appears.
Welcome to the Grid Engine installation
---------------------------------------
Grid Engine qmaster host installation
-------------------------------------
Before you continue with the installation please read these hints:
- Your terminal window should have a size of at least 80x24 characters
- The INTR character is often bound to the key Ctrl-C. The term >Ctrl-C< is used during the installation if you have the possibility to abort the installation
The qmaster installation procedure will take approximately 5-10 minutes.
Hit <RETURN> to continue >>
Choose a non-root user to run SGE.
Grid Engine admin user account
------------------------------
The current directory
/gridware/sge
is owned by user
sgeadmin
If user >root< does not have write permissions in this directory on *all* of the machines where Grid Engine will be installed (NFS partitions not exported for user >root< with read/write permissions) it is recommended to install Grid Engine that all spool files will be created under the user id of user >sgeadmin<.
IMPORTANT NOTE: The daemons still have to be started by user >root<.
Do you want to install Grid Engine as admin user >sgeadmin< (y/n) [y] >>
Installing Grid Engine as admin user >sgeadmin<
Hit <RETURN> to continue >>
The SGE root directory.
Checking $SGE_ROOT directory
----------------------------
The Grid Engine root directory is:
$SGE_ROOT = /gridware/sge
If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/gridware/sge] >>
Your $SGE_ROOT directory: /gridware/sge
Hit <RETURN> to continue >>
Choose the port number for sge_qmaster.
Grid Engine TCP/IP communication service
----------------------------------------
The port for sge_qmaster is currently set as service.
sge_qmaster service set to port 10500
Now you have the possibility to set/change the communication ports by using the >shell environment< or you may configure it via a network service, configured in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
sge_qmaster <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 2) >>
Grid Engine TCP/IP service >sge_qmaster<
----------------------------------------
Using the service
sge_qmaster
for communication with Grid Engine.
Hit <RETURN> to continue >>
Choose the port number for sge_execd.
Grid Engine TCP/IP communication service
----------------------------------------
The port for sge_execd is currently set as service.
sge_execd service set to port 10501
Now you have the possibility to set/change the communication ports by using the >shell environment< or you may configure it via a network service, configured in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form
sge_execd <port_number>/tcp
to your services database and make sure to use an unused port number.
How do you want to configure the Grid Engine communication ports?
Using the >shell environment<: [1]
Using a network service like >/etc/service<, >NIS/NIS+<: [2]
(default: 2) >>
Grid Engine TCP/IP communication service
-----------------------------------------
Using the service
sge_execd
for communication with Grid Engine.
Hit <RETURN> to continue >>
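Both ports above were taken from the services database. As a rough sketch (not part of the installer), the helper below looks up the TCP port registered for a service name in a services-style file. The function name service_port is made up for this example, and the default path /etc/services is an assumption about where the entries live on this system.

```shell
# Print the TCP port registered for a service in a services-style file.
# Usage: service_port NAME [FILE]   (FILE defaults to /etc/services)
# service_port is a hypothetical helper, not a Grid Engine command.
service_port() {
    awk -v name="$1" '
        # Match lines of the form: name  port/tcp
        $1 == name && $2 ~ /\/tcp$/ {
            sub(/\/tcp$/, "", $2)
            print $2
            exit
        }
    ' "${2:-/etc/services}"
}
```

For example, service_port sge_qmaster and service_port sge_execd should print 10500 and 10501 if the entries shown by the installer are present in that file. An empty result means no matching tcp entry was found.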
Set the cell name. Here it is set to MyCell.
Note: the cell name groups the qmaster host and the execution hosts that pass jobs between them.
Grid Engine cells
-----------------
Grid Engine supports multiple cells.
If you are not planning to run multiple Grid Engine clusters or if you don't know yet what is a Grid Engine cell it is safe to keep the default cell name
default
If you want to install multiple cells you can enter a cell name now.
The environment variable
$SGE_CELL=<your_cell_name>
will be set for all further Grid Engine commands.
Enter cell name [default] >> MyCell
Using cell >MyCell<.
Hit <RETURN> to continue >>
Set the cluster name. Here it is set to MyCluster.
Unique cluster name
-------------------
The cluster name uniquely identifies a specific Sun Grid Engine cluster. The cluster name must be unique throughout your organization. The name is not related to the SGE cell.
The cluster name must start with a letter ([A-Za-z]), followed by letters, digits ([0-9]), dashes (-) or underscores (_).
Enter new cluster name or hit <RETURN> to use default [p10500] >> MyCluster
creating directory: /gridware/sge/MyCell/common
Your $SGE_CLUSTER_NAME: MyCluster
Hit <RETURN> to continue >>
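The naming rule quoted in the prompt above can be checked before running the installer. The following is a minimal sketch that only mirrors the rule as the prompt states it (a leading letter, then letters, digits, dashes or underscores). The function name is_valid_cluster_name is made up for this example, and the installer's own validation may differ.

```shell
# Return 0 if NAME starts with a letter and continues with letters,
# digits, dashes or underscores; return non-zero otherwise.
# is_valid_cluster_name is a hypothetical helper, not part of SGE.
is_valid_cluster_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z][A-Za-z0-9_-]*$'
}
```

With this in place, is_valid_cluster_name MyCluster succeeds, while a name that starts with a digit or contains a space fails.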
The qmaster spool directory.
Grid Engine qmaster spool directory
-----------------------------------
The qmaster spool directory is the place where the qmaster daemon stores the configuration and the state of the queuing system.
The admin user >sgeadmin< must have read/write access to the qmaster spool directory.
If you will install shadow master hosts or if you want to be able to start the qmaster daemon on other hosts (see the corresponding section in the Grid Engine Installation and Administration Manual for details) the account on the shadow master hosts also needs read/write access to this directory.
Enter a qmaster spool directory [/gridware/sge/MyCell/spool/qmaster] >>
Using qmaster spool directory >/gridware/sge/MyCell/spool/qmaster<.
Hit <RETURN> to continue >>
Choose whether SGE will also run on Windows machines.
Windows Execution Host Support
------------------------------
Are you going to install Windows Execution Hosts? (y/n) [n] >>
Check file permissions.
Verifying and setting file permissions
--------------------------------------
Did you install this version with >pkgadd< or did you already verify and set the file permissions of your distribution (enter: y) (y/n) [y] >>
We do not verify file permissions.
Hit <RETURN> to continue >>
Whether all hosts share the same DNS domain.
Select default Grid Engine hostname resolving method
----------------------------------------------------
Are all hosts of your cluster in one DNS domain? If this is the case the hostnames
>hostA< and >hostA.foo.com<
would be treated as equal, because the DNS domain name >foo.com< is ignored when comparing hostnames.
Are all hosts of your cluster in a single DNS domain (y/n) [y] >>
Ignoring domain name when comparing hostnames.
Hit <RETURN> to continue >>
Choose whether to enable the Grid Engine JMX MBean server. Here the answer is "n", so it is not used.
Grid Engine JMX MBean server
----------------------------
In order to use the SGE Inspect or the Service Domain Manager (SDM) SGE adapter you need to configure a JMX server in qmaster. Qmaster will then load a Java Virtual Machine through a shared library. NOTE: Java 1.5 or later is required for the JMX MBean server.
Do you want to enable the JMX MBean server (y/n) [y] >> n
The installer now creates the files and directories it will use.
Making directories
------------------
creating directory: /gridware/sge/MyCell/spool/qmaster
creating directory: /gridware/sge/MyCell/spool/qmaster/job_scripts
Hit <RETURN> to continue >>
Choose the spooling method.
Setup spooling
--------------
Your SGE binaries are compiled to link the spooling libraries during runtime (dynamically). So you can choose between Berkeley DB spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
The Berkeley DB spooling method provides two configurations!
Local spooling: The Berkeley DB spools into a local directory on this host (qmaster host) This setup is faster, but you can't setup a shadow master host
Berkeley DB Spooling Server: If you want to setup a shadow master host, you need to use Berkeley DB Spooling Server! In this case you have to choose a host with a configured RPC service. The qmaster host connects via RPC to the Berkeley DB. This setup is more failsafe, but results in a clear potential security hole. RPC communication (as used by Berkeley DB) can be easily compromised. Please only use this alternative if your site is secure or if you are not concerned about security. Check the installation guide for further advice on how to achieve failsafety without compromising security.
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >>
Hit <RETURN> to continue >>
Berkeley Database spooling parameters
-------------------------------------
Please enter the database directory now, even if you want to spool locally, it is necessary to enter this database directory.
Default: [/gridware/sge/MyCell/spool/spooldb] >>
creating directory: /gridware/sge/MyCell/spool/spooldb
Dumping bootstrapping information
Initializing spooling database
Hit <RETURN> to continue >>
The range of additional group ids that SGE uses.
Grid Engine group id range
--------------------------
When jobs are started under the control of Grid Engine an additional group id is set on platforms which do not support jobs. This is done to provide maximum control for Grid Engine jobs.
This additional UNIX group id range must be unused group id's in your system. Each job will be assigned a unique id during the time it is running. Therefore you need to provide a range of id's which will be assigned dynamically for jobs.
The range must be big enough to provide enough numbers for the maximum number of Grid Engine jobs running at a single moment on a single host. E.g. a range like >20000-20100< means, that Grid Engine will use the group ids from 20000-20100 and provides a range for 100 Grid Engine jobs at the same time on a single host.
You can change at any time the group id range in your cluster configuration.
Please enter a range [20000-20100] >>
Using >20000-20100< as gid range.
Hit <RETURN> to continue >>
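The installer requires that the chosen range contains only unused group ids. As a rough sketch, the helper below lists any ids within a range that already appear in a group file. The function name gids_in_use is made up for this example, and reading /etc/group by default is an assumption: it covers local groups only, so groups served by NIS or LDAP would need a different source such as the output of getent group.

```shell
# List group ids within FIRST-LAST that already appear in a group file,
# one "gid name" pair per line. No output means the range looks free.
# Usage: gids_in_use FIRST-LAST [FILE]   (FILE defaults to /etc/group)
# gids_in_use is a hypothetical helper, not a Grid Engine command.
gids_in_use() {
    first=${1%-*}    # text before the dash
    last=${1#*-}     # text after the dash
    awk -F: -v lo="$first" -v hi="$last" '
        $3 ~ /^[0-9]+$/ && $3 + 0 >= lo + 0 && $3 + 0 <= hi + 0 {
            print $3 " " $1
        }
    ' "${2:-/etc/group}"
}
```

Running gids_in_use 20000-20100 before accepting the default shows whether any local group would collide with the range.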
Grid Engine cluster configuration
---------------------------------
Please give the basic configuration parameters of your Grid Engine installation:
<execd_spool_dir>
The pathname of the spool directory of the execution hosts. User >sgeadmin< must have the right to create this directory and to write into it.
Default: [/gridware/sge/MyCell/spool] >>
The email address for problem reports.
Grid Engine cluster configuration (continued)
---------------------------------------------
<administrator_mail>
The email address of the administrator to whom problem reports are sent.
It is recommended to configure this parameter. You may use >none< if you do not wish to receive administrator mail.
Please enter an email address in the form >user@foo.com<.
Default: [none] >>
The following parameters for the cluster configuration were configured:
execd_spool_dir /gridware/sge/MyCell/spool
administrator_mail none
Do you want to change the configuration parameters (y/n) [n] >>
Creating the configuration files.
Creating local configuration
----------------------------
Creating >act_qmaster< file
Adding default complex attributes
Adding default parallel environments (PE)
Adding SGE default usersets
Adding >sge_aliases< path aliases file
Adding >qtask< qtcsh sample default request file
Adding >sge_request< default submit options file
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<
Hit <RETURN> to continue >>
Create a startup script that runs at boot time.
qmaster startup script
----------------------
We can install the startup script that will start qmaster at machine boot (y/n) [y] >>
cp /gridware/sge/MyCell/common/sgemaster /etc/init.d/sgemaster.MyCluster
/usr/lib/lsb/install_initd /etc/init.d/sgemaster.MyCluster
Hit <RETURN> to continue >>
Start the sge_qmaster service.
Grid Engine qmaster startup
---------------------------
Starting qmaster daemon. Please wait ...
starting sge_qmaster
Hit <RETURN> to continue >>
Specify the hosts that will run sge_execd. Here two machines are entered: "cent55-node1" and "cent55-node2".
Adding Grid Engine hosts
------------------------
Please now add the list of hosts, where you will later install your execution daemons. These hosts will be also added as valid submit hosts.
Please enter a blank separated list of your execution hosts. You may press <RETURN> if the line is getting too long. Once you are finished simply press <RETURN> without entering a name.
You also may prepare a file with the hostnames of the machines where you plan to install Grid Engine. This may be convenient if you are installing Grid Engine on many hosts.
Do you want to use a file which contains the list of hosts (y/n) [n] >>
Enter the host name cent55-node1.
Adding admin and submit hosts
-----------------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are entering an empty list. You will see messages from Grid Engine when the hosts are added.
Host(s): cent55-node1
cent55-node1.spalinux.com added to administrative host list
cent55-node1.spalinux.com added to submit host list
Enter the host name cent55-node2.
Adding admin and submit hosts
-----------------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are entering an empty list. You will see messages from Grid Engine when the hosts are added.
Host(s): cent55-node2
cent55-node2.spalinux.com added to administrative host list
cent55-node2.spalinux.com added to submit host list
Hit <RETURN> to continue >>
Once all host names have been entered, press [Enter] to continue.
Adding admin and submit hosts
-----------------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are entering an empty list. You will see messages from Grid Engine when the hosts are added.
Host(s):
Finished adding hosts.
Hit <RETURN> to continue >>
Specify additional hosts to act as shadow hosts. None are used here.
If you want to use a shadow host, it is recommended to add this host to the list of administrative hosts.
If you are not sure, it is also possible to add or remove hosts after the installation with <qconf -ah hostname> for adding and <qconf -dh hostname> for removing this host
Attention: This is not the shadow host installation procedure. You still have to install the shadow host separately
Do you want to add your shadow host(s) now? (y/n) [y] >>
Adding Grid Engine shadow hosts
-------------------------------
Please now add the list of hosts, where you will later install your shadow daemon.
Please enter a blank separated list of your execution hosts. You may press <RETURN> if the line is getting too long. Once you are finished simply press <RETURN> without entering a name.
You also may prepare a file with the hostnames of the machines where you plan to install Grid Engine. This may be convenient if you are installing Grid Engine on many hosts.
Do you want to use a file which contains the list of hosts (y/n) [n] >>
Adding admin hosts
------------------
Please enter a blank seperated list of hosts.
Stop by entering <RETURN>. You may repeat this step until you are entering an empty list. You will see messages from Grid Engine when the hosts are added.
Host(s):
Finished adding hosts.
Hit <RETURN> to continue >>
Creating the default <all.q> queue and <allhosts> hostgroup
-----------------------------------------------------------
root@cent55-sge.spalinux.com added "@allhosts" to host group list
root@cent55-sge.spalinux.com added "all.q" to cluster queue list
Hit <RETURN> to continue >>
Configuring scheduler tuning.
Scheduler Tuning
----------------
The details on the different options are described in the manual.
Configurations
--------------
1) Normal
   Fixed interval scheduling, report limited scheduling information, actual + assumed load
2) High
   Fixed interval scheduling, report limited scheduling information, actual load
3) Max
   Immediate Scheduling, report no scheduling information, actual load
Enter the number of your preferred configuration and hit <RETURN>! Default configuration is [1] >>
We're configuring the scheduler with >Normal< settings!
Do you agree? (y/n) [y] >>
The installer explains how to start using SGE.
Using Grid Engine
-----------------
You should now enter the command:
source /gridware/sge/MyCell/common/settings.csh
if you are a csh/tcsh user or
# . /gridware/sge/MyCell/common/settings.sh
if you are a sh/ksh user.
This will set or expand the following environment variables:
- $SGE_ROOT (always necessary)
- $SGE_CELL (if you are using a cell other than >default<)
- $SGE_CLUSTER_NAME (always necessary)
- $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
- $SGE_EXECD_PORT (if you haven't added the service >sge_execd<)
- $PATH/$path (to find the Grid Engine binaries)
- $MANPATH (to access the manual pages)
Hit <RETURN> to see where Grid Engine logs messages >>
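The settings file mentioned above has to be sourced in every new shell before the Grid Engine commands are on the PATH. A minimal sketch for sh-compatible shells follows. The function name load_sge_settings is made up for this example, and the default path assumes the cell name MyCell and the SGE root /gridware/sge used in this walkthrough.

```shell
# Source a Grid Engine settings file only if it is readable.
# Usage: load_sge_settings [FILE]
# load_sge_settings is a hypothetical helper; the default path assumes
# the cell name MyCell chosen in this article.
load_sge_settings() {
    settings=${1:-/gridware/sge/MyCell/common/settings.sh}
    if [ -r "$settings" ]; then
        . "$settings"
    else
        echo "settings file not found: $settings" >&2
        return 1
    fi
}
```

One option is to place the function in a login profile such as ~/.bash_profile and call load_sge_settings there, so that a missing file produces a message instead of a failed login script.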
Grid Engine messages
--------------------
Grid Engine messages can be found at:
/tmp/qmaster_messages (during qmaster startup)
/tmp/execd_messages (during execution daemon startup)
After startup the daemons log their messages in their spool directories.
Qmaster: /gridware/sge/MyCell/spool/qmaster/messages
Exec daemon: <execd_spool_dir>/<hostname>/messages
Grid Engine startup scripts
---------------------------
Grid Engine startup scripts can be found at:
/gridware/sge/MyCell/common/sgemaster (qmaster)
/gridware/sge/MyCell/common/sgeexecd (execd)
Do you want to see previous screen about using Grid Engine again (y/n) [n] >>
Your Grid Engine qmaster installation is now completed
------------------------------------------------------
Please now login to all hosts where you want to run an execution daemon and start the execution host installation procedure.
If you want to run an execution daemon on this host, please do not forget to make the execution host installation in this host as well.
All execution hosts must be administrative hosts during the installation. All hosts which you added to the list of administrative hosts during this installation procedure can now be installed.
You may verify your administrative hosts with the command
# qconf -sh
and you may add new administrative hosts with the command
# qconf -ah <hostname>
Please hit <RETURN> >>
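Since every execution host must be an administrative host before its own installation, it can help to compare the output of qconf -sh against the hosts you expect. The sketch below reads a host list on standard input and prints the expected hosts that are absent. The function name missing_hosts is made up for this example, and it assumes the list contains one fully qualified host name per line.

```shell
# Read a host list on stdin (one name per line) and print each expected
# host that is absent. The exit status is 0 only when nothing is missing.
# Usage: some_command | missing_hosts HOST...
# missing_hosts is a hypothetical helper, not a Grid Engine command.
missing_hosts() {
    listed=$(cat)
    status=0
    for host in "$@"; do
        # -F fixed string, -x whole-line match, -q quiet
        if ! printf '%s\n' "$listed" | grep -Fxq "$host"; then
            echo "$host"
            status=1
        fi
    done
    return $status
}
```

For this cluster the check might be run as qconf -sh | missing_hosts cent55-node1.spalinux.com cent55-node2.spalinux.com, and any host it prints could then be added with qconf -ah as described above.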
Use the ps command to check for the sge_qmaster process.
[root@cent55-sge sge]# ps -ef | grep sge
sgeadmin 4667 1 1 13:55 ? 00:00:00 /gridware/sge/bin/lx24-amd64/sge_qmaster
Starting and stopping the sge_qmaster service
To stop the sge_qmaster service:
[root@cent55-sge ~]# /etc/init.d/sgemaster.MyCluster stop
shutting down Grid Engine qmaster
To start the sge_qmaster service:
[root@cent55-sge ~]# /etc/init.d/sgemaster.MyCluster start
starting sge_qmaster