[Why]
Nodes need to communicate with each other when testing cluster jobs.
IPs can be bound to MACs after setting 'direct_macs' and 'direct_ips'
for cluster jobs.

[HowTo]
1) A cluster config file is needed in $LKP_SRC/cluster, for example:

```yaml
assumption: <assumption01>
switch: switch01
ip0: 3
---
vm-2p8g--renwen:
  roles: [ server ]
  macs: [ "ec:f4:bb:cb:7b:92" ]
vm-2p4g--renwen:
  roles: [ client ]
  macs: [ "ec:f4:bb:cb:54:90" ]
```

   nr_node=2
   cs1 switch=sw1 ip0=0 ($ip_prefix.0-.1)
   cs2 switch=sw1 ip0=2 ($ip_prefix.2-.3)

   "switch": only needed if 2+ clusters share the same switch
   "ip0": only needed if the switch is shared, and != 0
   "assumption": the info about this cluster config

2) The scheduler gets the config from the config file and sets
   'direct_macs' and 'direct_ips' for jobs.
3) When running a cluster job, bind direct_ips to direct_macs in the
   test environment.
4) The test client connects with the daemon client to test the job.
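
As a rough illustration of steps 2) and 3), here is a minimal, self-contained Crystal sketch of how consecutive host IDs could be handed out, one per MAC, starting at ip0. The helper name `allocate_direct_ips`, the node names, and the cs2 MAC addresses are made up for illustration and are not part of sched.cr; the default prefix "192.168.222" and the 254 limit are the ones used in the diff.

```crystal
# Hypothetical helper, not part of sched.cr: give every MAC of every host
# the next free host ID, starting at ip0.
def allocate_direct_ips(hosts : Hash(String, Array(String)),
                        ip0 : Int32,
                        net_id : String = "192.168.222") : Hash(String, Array(String))
  next_id = ip0
  result = {} of String => Array(String)
  hosts.each do |host, macs|
    result[host] = macs.map do |_mac|
      # Same guard as in the patch: the last octet must stay within 0..254.
      raise "Host id is greater than 254, host_id: #{next_id}" if next_id > 254
      ip = "#{net_id}.#{next_id}"
      next_id += 1
      ip
    end
  end
  result
end

# Two clusters behind the same switch, nr_node=2 each (names and the cs2
# MACs are illustrative assumptions).
cs1 = {"node-a" => ["ec:f4:bb:cb:7b:92"], "node-b" => ["ec:f4:bb:cb:54:90"]}
cs2 = {"node-c" => ["ec:f4:bb:cb:7b:93"], "node-d" => ["ec:f4:bb:cb:54:91"]}

puts allocate_direct_ips(cs1, ip0: 0) # cs1 takes host IDs .0-.1
puts allocate_direct_ips(cs2, ip0: 2) # cs2 takes host IDs .2-.3
```

With ip0 chosen this way, the two clusters can share one prefix without their addresses overlapping, which is the point of the cs1/cs2 example above.
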

Signed-off-by: Ren Wen <15991987063@163.com>
---
 src/lib/sched.cr | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/lib/sched.cr b/src/lib/sched.cr
index 070a16b..8922874 100644
--- a/src/lib/sched.cr
+++ b/src/lib/sched.cr
@@ -144,7 +144,7 @@ class Sched
   def get_cluster_config(cluster_file, lkp_initrd_user, os_arch)
     lkp_src = Jobfile::Operate.prepare_lkp_tests(lkp_initrd_user, os_arch)
     cluster_file_path = Path.new(lkp_src, "cluster", cluster_file)
-    return YAML.parse(File.read(cluster_file_path)).as_h
+    return YAML.parse_all(File.read(cluster_file_path))
   end

   def get_commit_date(job)
@@ -198,8 +198,13 @@ class Sched
     # collect all job ids
     job_ids = [] of String

+    basic_config = cluster_config[0]
+    net_id = basic_config["net_id"]? || "192.168.222"
+    ip0 = basic_config["ip0"].as_i
+    # switch = basic_config["switch"].to_s # useless now
+
     # steps for each host
-    cluster_config.each do |host, config|
+    cluster_config[-1].as_h.each do |host, config|
       tbox_group = host.to_s
       job_id = add_task(tbox_group, lab)

@@ -219,7 +224,15 @@ class Sched
       job["testbox"] = tbox_group
       job.update_tbox_group(tbox_group)
       job["node_roles"] = config["roles"].as_a.join(" ")
-      job["node_macs"] = config["macs"].as_a.join(" ")
+      direct_macs = config["macs"].as_a
+      direct_ips = [] of String
+      direct_macs.size.times do
+        raise "Host id is greater than 254, host_id: #{ip0}" if ip0 > 254
+        direct_ips << "#{net_id}.#{ip0}"
+        ip0 += 1
+      end
+      job["direct_macs"] = direct_macs.join(" ")
+      job["direct_ips"] = direct_ips.join(" ")

       response = add_job(job, job_id)
       message = (response["error"]? ? response["error"]["root_cause"] : "")
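
To make the get_cluster_config change easier to follow, here is a small standalone Crystal sketch of what `YAML.parse_all` returns for a two-document cluster file: an array with one `YAML::Any` per document, so index 0 holds the cluster-wide settings and index -1 holds the host map. The inline YAML is a copy of the example from the commit message, used here as a stand-in for a real file under $LKP_SRC/cluster; the optional net_id key is absent, so the fallback prefix applies.

```crystal
require "yaml"

# Inline stand-in for a cluster file (same content as the commit-message example).
cluster_yaml = <<-YAML
  assumption: <assumption01>
  switch: switch01
  ip0: 3
  ---
  vm-2p8g--renwen:
    roles: [ server ]
    macs: [ "ec:f4:bb:cb:7b:92" ]
  vm-2p4g--renwen:
    roles: [ client ]
    macs: [ "ec:f4:bb:cb:54:90" ]
  YAML

cluster_config = YAML.parse_all(cluster_yaml) # one YAML::Any per document

basic_config = cluster_config[0] # first document: cluster-wide settings
hosts = cluster_config[-1].as_h  # last document: host => roles/macs

net_id = basic_config["net_id"]? || "192.168.222" # fallback when the key is absent
ip0 = basic_config["ip0"].as_i                    # raises if "ip0" is missing

puts "net_id=#{net_id} ip0=#{ip0}"
hosts.each do |host, config|
  roles = config["roles"].as_a.join(" ")
  macs = config["macs"].as_a.join(" ")
  puts "#{host}: roles=#{roles} macs=#{macs}"
end
```

Note that `basic_config["ip0"].as_i` raises when the key is missing, so with this patch ip0 is effectively required in the first document.
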
On Thu, Oct 22, 2020 at 10:15:23PM +0800, Ren Wen wrote:
> [Why]
> Nodes need to communicate with each other when testing cluster jobs.
> IPs can be bound to MACs after setting 'direct_macs' and 'direct_ips'
> for cluster jobs.
>
> [HowTo]
> 1) A cluster config file is needed in $LKP_SRC/cluster, for example:
>
>    assumption: <assumption01>
>    switch: switch01
>    ip0: 3
>    ---
>    vm-2p8g--renwen:
>      roles: [ server ]
>      macs: [ "ec:f4:bb:cb:7b:92" ]
>    vm-2p4g--renwen:
>      roles: [ client ]
>      macs: [ "ec:f4:bb:cb:54:90" ]
>
>    nr_node=2
>    cs1 switch=sw1 ip0=0 ($ip_prefix.0-.1)
>    cs2 switch=sw1 ip0=2 ($ip_prefix.2-.3)

Does this mean cs1 and cs2 each have their own ip0, and that ip0 defines
the range of the last octet for the physical machines? If they share a
switch, they use the same ip_prefix.

>    "switch": only needed if 2+ clusters share the same switch
>    "ip0": only needed if the switch is shared, and != 0

The ip0 field is mandatory, not only when the switch is shared: it
defines the IP range of the cluster's machines.

>    "assumption": the info about this cluster config

How should this "assumption" be written? Maybe you can give an example.

> 2) The scheduler gets the config from the config file and sets
>    'direct_macs' and 'direct_ips' for jobs.
> 3) When running a cluster job, bind direct_ips to direct_macs in the
>    test environment.
> 4) The test client connects with the daemon client to test the job.
                                              ^ server

Thanks,
Zhangyu

> Signed-off-by: Ren Wen <15991987063@163.com>
> ---
>  src/lib/sched.cr | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/src/lib/sched.cr b/src/lib/sched.cr
> index 070a16b..8922874 100644
> --- a/src/lib/sched.cr
> +++ b/src/lib/sched.cr
> @@ -144,7 +144,7 @@ class Sched
>    def get_cluster_config(cluster_file, lkp_initrd_user, os_arch)
>      lkp_src = Jobfile::Operate.prepare_lkp_tests(lkp_initrd_user, os_arch)
>      cluster_file_path = Path.new(lkp_src, "cluster", cluster_file)
> -    return YAML.parse(File.read(cluster_file_path)).as_h
> +    return YAML.parse_all(File.read(cluster_file_path))
>    end
>
>    def get_commit_date(job)
> @@ -198,8 +198,13 @@ class Sched
>      # collect all job ids
>      job_ids = [] of String
>
> +    basic_config = cluster_config[0]
> +    net_id = basic_config["net_id"]? || "192.168.222"
> +    ip0 = basic_config["ip0"].as_i
> +    # switch = basic_config["switch"].to_s # useless now
> +
>      # steps for each host
> -    cluster_config.each do |host, config|
> +    cluster_config[-1].as_h.each do |host, config|
>        tbox_group = host.to_s
>        job_id = add_task(tbox_group, lab)
>
> @@ -219,7 +224,15 @@ class Sched
>        job["testbox"] = tbox_group
>        job.update_tbox_group(tbox_group)
>        job["node_roles"] = config["roles"].as_a.join(" ")
> -      job["node_macs"] = config["macs"].as_a.join(" ")
> +      direct_macs = config["macs"].as_a
> +      direct_ips = [] of String
> +      direct_macs.size.times do
> +        raise "Host id is greater than 254, host_id: #{ip0}" if ip0 > 254
> +        direct_ips << "#{net_id}.#{ip0}"
> +        ip0 += 1
> +      end
> +      job["direct_macs"] = direct_macs.join(" ")
> +      job["direct_ips"] = direct_ips.join(" ")
>
>        response = add_job(job, job_id)
>        message = (response["error"]? ? response["error"]["root_cause"] : "")
> --
> 2.23.0

About this, I'll write a TODO.

Thanks,
RenWen

On Fri, Oct 23, 2020 at 10:30:44AM +0800, Zhang Yu wrote: