On Jul 7, 2020, at 4:41 PM, Ying Fang <fangying1@huawei.com> wrote:



On 7/6/2020 5:16 PM, Haibin Zhang wrote:
On Jul 2, 2020, at 4:07 PM, Haibin Zhang <haibincheung@hotmail.com> wrote:



On Jul 2, 2020, at 3:36 PM, Ying Fang <fangying1@huawei.com> wrote:



On 7/2/2020 3:18 PM, Haibin Zhang wrote:
Hi, Ying Fang.
Hi Haibin, thanks for reporting this issue to us.
We will verify it and respond to you later.

I used QEMU to create a VM (96 cores and 360 GB of RAM) on my Kunpeng 920-4826 server, then ran the UnixBench tests.
The performance of pipe-based context switching drops severely, as shown below:
  Guest# ./context1 60
  COUNT|4779535|1|lps
  COUNT|4779536|1|lps
  Host# ./context1 60
  COUNT|13990745|1|lps
  COUNT|13990746|1|lps
Could you provide more details about your OS and the version of your virtualization software?

Host/guest OS : CentOS-8.1.1911-aarch64
Qemu : https://gitee.com/src-openeuler/qemu.git


As you know, the pipe-based context switching test measures how many times two processes can exchange an
increasing integer through a pipe within a fixed duration (e.g. 60 seconds). The guest's count is only 34%
of the host's. Attached is context1.c from UnixBench. It is simple: it just spawns a child process and
carries on a bi-directional pipe conversation with it.
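
For readers without the attachment, the exchange described above can be sketched roughly as follows. This is a Python approximation written for illustration, not the actual context1.c; the function name `pipe_ping_pong` and the 8-byte message format are assumptions, and the absolute counts will be far lower than those of the C program because of interpreter overhead.

```python
import os
import struct
import time


def pipe_ping_pong(duration_s: float = 1.0) -> int:
    """Return how many round trips an increasing integer makes in duration_s."""
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent

    pid = os.fork()
    if pid == 0:
        # Child: echo every value back until the parent closes its write end.
        try:
            os.close(p2c_w)
            os.close(c2p_r)
            while True:
                data = os.read(p2c_r, 8)
                if len(data) < 8:  # EOF: parent is done
                    break
                os.write(c2p_w, data)
        finally:
            os._exit(0)

    # Parent: send the counter, wait for the echo, then increment.
    os.close(p2c_r)
    os.close(c2p_w)
    count = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        os.write(p2c_w, struct.pack("Q", count))
        (reply,) = struct.unpack("Q", os.read(c2p_r, 8))
        if reply != count:
            raise RuntimeError("pipe conversation out of sync")
        count += 1

    os.close(p2c_w)  # signals EOF to the child
    os.close(c2p_r)
    os.waitpid(pid, 0)
    return count
```

Each loop iteration blocks the parent until the child has run, so every round trip forces at least two context switches (or two cross-CPU wakeups when the processes sit on different cores).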

Yes, we are familiar with this context test now. Thanks.
  Guest# taskset -c 20 ./context1 60
  COUNT|18059443|1|lps
  COUNT|18059443|1|lps
  Host# taskset -c 20 ./context1 60
  COUNT|17881974|1|lps
  COUNT|17881975|1|lps
Using taskset to set CPU affinity makes the results better.
Can you help me fix this issue? Could WFE/WFI traps be causing the performance cost? Is there a PLE (Pause Loop Exit) feature on Kunpeng?
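
The effect of the `taskset -c 20` runs above can also be reproduced from inside a program. The following is a minimal sketch, assuming Linux (`os.sched_setaffinity` is not available elsewhere); the helper name `run_pinned` is hypothetical. Child processes forked while the mask is in place inherit it, which is what keeps both sides of the pipe on the same core.

```python
import os


def run_pinned(cpu, func, *args):
    """Run func(*args) with this process (and any children it forks) on one CPU."""
    previous = os.sched_getaffinity(0)  # remember the current CPU mask
    os.sched_setaffinity(0, {cpu})      # restrict to a single CPU
    try:
        return func(*args)
    finally:
        os.sched_setaffinity(0, previous)  # restore the original mask
```

For example, `run_pinned(20, some_benchmark, 60)` would run a hypothetical `some_benchmark(60)` on CPU 20 only.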

We have some performance tuning techniques in openEuler.
Xie XiangYou is the expert in this area; maybe he can help you with it.

OK, I installed openEuler and tested again.
Host/Guest: openEuler release 20.03 (LTS)
Unixbench results:
System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Pipe-based Context Switching                   4000.0     206994.4     517.5 -> Host
Pipe-based Context Switching                   4000.0     108939.5     272.3 -> Guest
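
As a quick check of the table above, the INDEX column is consistent with result / baseline * 10, which is how UnixBench appears to derive it. The small sketch below recomputes both rows and the guest-to-host ratio; the function name `unixbench_index` is only illustrative.

```python
def unixbench_index(result: float, baseline: float) -> float:
    """Index as it appears in the table: result relative to baseline, times 10."""
    return result / baseline * 10.0


host = unixbench_index(206994.4, 4000.0)
guest = unixbench_index(108939.5, 4000.0)

print(f"host index  = {host:.1f}")
print(f"guest index = {guest:.1f}")
print(f"guest/host  = {guest / host:.1%}")
```

So on openEuler the guest reaches a little over half of the host's score for this test.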

Hi, Haibin:

We ran the UnixBench context1 test on a Kunpeng 930 server:

HostOS 128U256G:
openeulerversion=openEuler-20.03-LTS
compiletime=2020-03-28-18-00-01
gccversion=7.3.0-20190804.h30.oe1
kernelversion=4.19.90-2003.4.0.0036.oe1
openjdkversion=1.8.0.242.b08-1.h5.oe1

GuestOS 16U32G:
openeulerversion=openEuler-20.03-LTS
compiletime=2020-03-24-06-52-56
gccversion=7.3.0-20190804.h30.oe1
kernelversion=4.19.90-2003.4.0.0036.oe1
openjdkversion=1.8.0.242.b08-1.h5.oe1


HostOS:
COUNT|14025377|1|lps
COUNT|14025378|1|lps
GuestOS:
COUNT|13498345|1|lps
COUNT|13498345|1|lps

Are there any hardware enhancements on Kunpeng 930?
On Kunpeng 920, the results are as follows.

HostOS:
         COUNT|12008901|1|lps
         COUNT|12008901|1|lps
GuestOS:
         COUNT|7288718|1|lps
         COUNT|7288718|1|lps

I can confirm that WFI traps cause the performance loss.
Stats from /sys/kernel/debug/kvm/ show that the guest takes WFI exits more frequently during the test.
The context1 processes block waiting on the pipe, and the WFI instruction is then executed to yield the CPU core.
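
One way to make the comparison above repeatable is to snapshot the counters before and after a run and print the differences. This is a minimal sketch, not the procedure actually used here: it assumes each file in the stats directory holds a single integer, the helper names are hypothetical, and the counter name `wfi_exit_stat` in the usage note is an assumption about how arm64 KVM names it. Reading debugfs normally requires root.

```python
import os


def read_counters(stats_dir="/sys/kernel/debug/kvm"):
    """Read every single-integer file in stats_dir into a dict."""
    counters = {}
    for name in os.listdir(stats_dir):
        path = os.path.join(stats_dir, name)
        if not os.path.isfile(path):
            continue  # skip per-VM subdirectories
        try:
            with open(path) as f:
                counters[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric entry
    return counters


def counter_deltas(before, after):
    """Return only the counters that changed between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }
```

Taking `before = read_counters()`, running context1 in the guest, then calling `counter_deltas(before, read_counters())` on the host would show how much a counter such as `wfi_exit_stat` grew during the test.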




You can also file an issue with us on Gitee at:
https://gitee.com/src-openeuler/qemu