From: Xiang Chen <chenxiang66@hisilicon.com>
The max sectors limit of a SCSI host can be set through scsi_host_template->max_sectors in the SCSI driver. But we find that for SATA disks the effective max sectors may exceed scsi_host_template->max_sectors even after it is set: the value is overwritten in SCSI drivers whose slave_configure callback calls ata_scsi_dev_config(). The invoking relationship is as follows:
scsi_probe_and_add_lun
    ...
    scsi_alloc_sdev
        scsi_mq_alloc_queue
            ...
            __scsi_init_queue
                blk_queue_max_hw_sectors(q, shost->max_sectors) // max_sectors comes from sht->max_sectors
        scsi_change_queue_depth
        scsi_sysfs_device_initialize
    shost->hostt->slave_alloc()
    xxx_slave_configure
        ...
        ata_scsi_dev_config
            blk_queue_max_hw_sectors(q, dev->max_sectors) // max_sectors is overwritten by dev->max_sectors
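The net effect is roughly the following (a simplified sketch of the two calls above, not verbatim kernel code):

    /* __scsi_init_queue(): the host template limit is applied first */
    blk_queue_max_hw_sectors(q, shost->max_sectors);

    /* later, ->slave_configure() ends up in ata_scsi_dev_config(), which
     * re-applies the ATA device limit; this may exceed shost->max_sectors */
    blk_queue_max_hw_sectors(q, dev->max_sectors);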
To avoid the issue, set the queue's max sectors to the minimum of dev->max_sectors and q->limits.max_sectors.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/ata/libata-scsi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 48b8934..fb7b243 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1026,12 +1026,15 @@ EXPORT_SYMBOL_GPL(ata_scsi_dma_need_drain);
 int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 {
 	struct request_queue *q = sdev->request_queue;
+	unsigned int max_sectors;
 
 	if (!ata_id_has_unload(dev->id))
 		dev->flags |= ATA_DFLAG_NO_UNLOAD;
 
 	/* configure max sectors */
-	blk_queue_max_hw_sectors(q, dev->max_sectors);
+	max_sectors = min_t(unsigned int, dev->max_sectors,
+			    q->limits.max_sectors);
+	blk_queue_max_hw_sectors(q, max_sectors);
 
 	if (dev->class == ATA_DEV_ATAPI) {
 		sdev->sector_size = ATA_SECT_SIZE;
> To avoid the issue, set the queue's max sectors to the minimum of dev->max_sectors and q->limits.max_sectors.
dev->max_sectors describes the ATA hardware limitation (similar to shost->max_sectors for SCSI), whereas q->limits.max_sectors is the block layer soft limit for filesystem I/O. That value should not be used to set blk_queue_max_hw_sectors(), nor should queue limits currently in effect be used to configure what is essentially a hardware capability.
I suspect you need to clamp the libata dev->max_sectors value to sdev->host->max_sectors.
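Something along these lines, perhaps (an untested sketch of that suggestion, for illustration only; it is not the patch above):

    /* clamp the ATA device limit to the SCSI host's hardware limit */
    blk_queue_max_hw_sectors(q, min_t(unsigned int, dev->max_sectors,
                                      sdev->host->max_sectors));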
Greetings,
FYI, we noticed a -25.9% regression of stress-ng.copy-file.ops_per_sec due to commit:
commit: 2c76f9f255f01743b65a16b667355452a1f69b99 ("[PATCH] libata: configure max sectors properly")
url: https://github.com/0day-ci/linux/commits/chenxiang/libata-configure-max-sect...
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
in testcase: stress-ng
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
with following parameters:
	nr_threads: 10%
	disk: 1HDD
	testtime: 60s
	fs: f2fs
	class: filesystem
	test: copy-file
	cpufreq_governor: performance
	ucode: 0x5003006
In addition to that, the commit also has significant impact on the following tests:
+------------------+---------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.readahead.ops_per_sec 71.5% improvement                   |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory |
| test parameters  | class=os                                                                        |
|                  | cpufreq_governor=performance                                                    |
|                  | disk=1HDD                                                                       |
|                  | fs=ext4                                                                         |
|                  | nr_threads=10%                                                                  |
|                  | test=readahead                                                                  |
|                  | testtime=60s                                                                    |
|                  | ucode=0x5003006                                                                 |
+------------------+---------------------------------------------------------------------------------+
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang@intel.com>
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml            # job file is attached in this email
        bin/lkp split-job --compatible job.yaml  # generate the yaml file for lkp run
        bin/lkp run generated-yaml-file
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
  filesystem/gcc-9/performance/1HDD/f2fs/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/copy-file/stress-ng/60s/0x5003006
commit:
  6bcec6cee5 ("Merge branch 'for-5.14/io_uring' into for-next")
  2c76f9f255 ("libata: configure max sectors properly")
6bcec6cee54edf7e  2c76f9f255f01743b65a16b6673
----------------  ---------------------------
  %stddev  %change  %stddev
      \       |        \
  2882 -25.9% 2136 stress-ng.copy-file.ops
  47.84 -25.9% 35.43 stress-ng.copy-file.ops_per_sec
  11911328 -25.7% 8848424 stress-ng.time.file_system_outputs
  16.67 ± 4% -40.0% 10.00 stress-ng.time.percent_of_cpu_this_job_got
  10.63 ± 3% -38.4% 6.55 ± 3% stress-ng.time.system_time
  115088 ± 7% -38.6% 70635 ± 47% stress-ng.time.voluntary_context_switches
  7482 ± 15% +28.7% 9626 ± 5% softirqs.CPU50.SCHED
  94.62 +1.4% 95.90 iostat.cpu.idle
  3.29 ± 8% -36.5% 2.09 ± 8% iostat.cpu.iowait
  3.12 ± 8% -1.1 1.99 ± 8% mpstat.cpu.all.iowait%
  0.22 ± 5% -0.1 0.15 ± 7% mpstat.cpu.all.sys%
  1765 ± 8% +70.9% 3017 ± 9% slabinfo.dmaengine-unmap-16.active_objs
  1790 ± 8% +68.6% 3019 ± 9% slabinfo.dmaengine-unmap-16.num_objs
  661211 ± 10% -46.4% 354112 ± 56% numa-meminfo.node0.Active
  659853 ± 10% -46.5% 352767 ± 56% numa-meminfo.node0.Active(file)
  4128 ± 13% -78.1% 903.00 ± 58% numa-meminfo.node0.Writeback
  94.17 +1.5% 95.57 vmstat.cpu.id
  88096 ± 8% -30.0% 61677 ± 8% vmstat.io.bo
  2916956 ± 4% -8.7% 2662410 ± 4% vmstat.memory.cache
  5184 ± 9% -29.0% 3679 ± 24% vmstat.system.cs
  669025 ± 8% -25.4% 499071 ± 8% meminfo.Active
  666140 ± 8% -25.5% 496122 ± 8% meminfo.Active(file)
  2853772 ± 5% -9.1% 2593698 ± 5% meminfo.Cached
  4908 ± 19% +61.2% 7909 ± 9% meminfo.Dirty
  4300 ± 15% -64.2% 1540 ± 44% meminfo.Writeback
  162504 ± 10% -46.5% 86958 ± 56% numa-vmstat.node0.nr_active_file
  782687 ± 5% -40.6% 464832 ± 58% numa-vmstat.node0.nr_dirtied
  1016 ± 17% -83.9% 163.29 ± 60% numa-vmstat.node0.nr_writeback
  781002 ± 5% -40.6% 463755 ± 58% numa-vmstat.node0.nr_written
  162504 ± 10% -46.5% 86958 ± 56% numa-vmstat.node0.nr_zone_active_file
  1682 ± 9% -36.2% 1072 ± 55% numa-vmstat.node0.nr_zone_write_pending
  12554 +41.9% 17815 interrupts.315:PCI-MSI.376832-edge.ahci[0000:00:17.0]
  114120 ± 4% -22.4% 88521 ± 22% interrupts.CAL:Function_call_interrupts
  3736 ± 80% -83.7% 610.57 ± 22% interrupts.CPU1.CAL:Function_call_interrupts
  146.67 ± 38% -27.8% 105.86 ± 40% interrupts.CPU15.NMI:Non-maskable_interrupts
  146.67 ± 38% -27.8% 105.86 ± 40% interrupts.CPU15.PMI:Performance_monitoring_interrupts
  159.00 ± 39% -41.8% 92.57 ± 42% interrupts.CPU55.NMI:Non-maskable_interrupts
  159.00 ± 39% -41.8% 92.57 ± 42% interrupts.CPU55.PMI:Performance_monitoring_interrupts
  101.33 ± 21% +47.6% 149.57 ± 17% interrupts.CPU66.NMI:Non-maskable_interrupts
  101.33 ± 21% +47.6% 149.57 ± 17% interrupts.CPU66.PMI:Performance_monitoring_interrupts
  166540 ± 8% -25.4% 124177 ± 8% proc-vmstat.nr_active_file
  1495731 -25.7% 1111429 proc-vmstat.nr_dirtied
  1159 ± 12% +69.6% 1967 ± 10% proc-vmstat.nr_dirty
  712459 ± 5% -9.0% 648586 ± 5% proc-vmstat.nr_file_pages
  1128 ± 12% -69.6% 342.71 ± 21% proc-vmstat.nr_writeback
  1495710 -25.7% 1111195 proc-vmstat.nr_written
  166540 ± 8% -25.4% 124177 ± 8% proc-vmstat.nr_zone_active_file
  279115 -9.3% 253162 proc-vmstat.pgactivate
  5983119 -25.7% 4445141 proc-vmstat.pgpgout
  3.115e+08 ± 10% -15.5% 2.631e+08 ± 14% perf-stat.i.branch-instructions
  5107 ± 9% -30.0% 3573 ± 26% perf-stat.i.context-switches
  3.829e+08 ± 7% -13.9% 3.299e+08 ± 10% perf-stat.i.dTLB-loads
  1.881e+08 ± 5% -11.1% 1.672e+08 ± 7% perf-stat.i.dTLB-stores
  1.529e+09 ± 10% -15.6% 1.291e+09 ± 14% perf-stat.i.instructions
  1801 ± 8% -13.7% 1554 ± 12% perf-stat.i.instructions-per-iTLB-miss
  9.19 ± 8% -13.9% 7.91 ± 11% perf-stat.i.metric.M/sec
  483599 ± 10% -35.8% 310368 ± 9% perf-stat.i.node-loads
  389216 ± 11% -33.0% 260605 ± 9% perf-stat.i.node-stores
  1834 ± 9% -13.6% 1585 ± 14% perf-stat.overall.instructions-per-iTLB-miss
  3.069e+08 ± 10% -15.5% 2.593e+08 ± 14% perf-stat.ps.branch-instructions
  5029 ± 9% -30.0% 3522 ± 26% perf-stat.ps.context-switches
  3.772e+08 ± 7% -13.8% 3.251e+08 ± 10% perf-stat.ps.dTLB-loads
  1.852e+08 ± 5% -11.0% 1.648e+08 ± 7% perf-stat.ps.dTLB-stores
  1.506e+09 ± 10% -15.6% 1.272e+09 ± 14% perf-stat.ps.instructions
  476080 ± 10% -35.8% 305808 ± 9% perf-stat.ps.node-loads
  383190 ± 11% -33.0% 256784 ± 9% perf-stat.ps.node-stores
  6.03 ± 13% -2.8 3.21 ± 21% perf-profile.calltrace.cycles-pp.__generic_file_write_iter.f2fs_file_write_iter.do_iter_readv_writev.do_iter_write.iter_file_splice_write
  6.17 ± 14% -2.8 3.37 ± 22% perf-profile.calltrace.cycles-pp.do_iter_write.iter_file_splice_write.direct_splice_actor.splice_direct_to_actor.do_splice_direct
  6.17 ± 14% -2.8 3.37 ± 22% perf-profile.calltrace.cycles-pp.do_iter_readv_writev.do_iter_write.iter_file_splice_write.direct_splice_actor.splice_direct_to_actor
  6.01 ± 13% -2.8 3.21 ± 21% perf-profile.calltrace.cycles-pp.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.do_iter_readv_writev.do_iter_write
  6.17 ± 14% -2.8 3.37 ± 22% perf-profile.calltrace.cycles-pp.f2fs_file_write_iter.do_iter_readv_writev.do_iter_write.iter_file_splice_write.direct_splice_actor
  9.91 ± 12% -2.8 7.14 ± 17% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
  9.91 ± 12% -2.8 7.14 ± 17% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
  6.23 ± 14% -2.8 3.46 ± 23% perf-profile.calltrace.cycles-pp.direct_splice_actor.splice_direct_to_actor.do_splice_direct.vfs_copy_file_range.__x64_sys_copy_file_range
  6.23 ± 14% -2.8 3.46 ± 23% perf-profile.calltrace.cycles-pp.iter_file_splice_write.direct_splice_actor.splice_direct_to_actor.do_splice_direct.vfs_copy_file_range
  6.67 ± 13% -2.7 4.00 ± 19% perf-profile.calltrace.cycles-pp.splice_direct_to_actor.do_splice_direct.vfs_copy_file_range.__x64_sys_copy_file_range.do_syscall_64
  6.67 ± 13% -2.7 4.00 ± 19% perf-profile.calltrace.cycles-pp.do_splice_direct.vfs_copy_file_range.__x64_sys_copy_file_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
  6.67 ± 13% -2.7 4.00 ± 20% perf-profile.calltrace.cycles-pp.vfs_copy_file_range.__x64_sys_copy_file_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
  6.67 ± 13% -2.7 4.00 ± 20% perf-profile.calltrace.cycles-pp.__x64_sys_copy_file_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
  2.86 ± 20% -1.9 0.94 ± 23% perf-profile.calltrace.cycles-pp.f2fs_write_end.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.do_iter_readv_writev
  2.71 ± 20% -1.9 0.83 ± 24% perf-profile.calltrace.cycles-pp.f2fs_set_data_page_dirty.f2fs_write_end.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
  2.65 ± 13% -0.9 1.75 ± 23% perf-profile.calltrace.cycles-pp.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.do_iter_readv_writev
  2.12 ± 16% -0.5 1.63 ± 13% perf-profile.calltrace.cycles-pp.f2fs_write_single_data_page.f2fs_write_cache_pages.f2fs_write_data_pages.do_writepages.__filemap_fdatawrite_range
  1.98 ± 15% -0.4 1.53 ± 14% perf-profile.calltrace.cycles-pp.f2fs_do_write_data_page.f2fs_write_single_data_page.f2fs_write_cache_pages.f2fs_write_data_pages.do_writepages
  0.98 ± 22% -0.3 0.72 ± 9% perf-profile.calltrace.cycles-pp.f2fs_outplace_write_data.f2fs_do_write_data_page.f2fs_write_single_data_page.f2fs_write_cache_pages.f2fs_write_data_pages
  0.75 ± 11% -0.2 0.58 ± 8% perf-profile.calltrace.cycles-pp.do_write_page.f2fs_outplace_write_data.f2fs_do_write_data_page.f2fs_write_single_data_page.f2fs_write_cache_pages
  86.95 +2.6 89.54 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
  6.02 ± 13% -2.8 3.21 ± 21% perf-profile.children.cycles-pp.generic_perform_write
  6.03 ± 13% -2.8 3.21 ± 21% perf-profile.children.cycles-pp.__generic_file_write_iter
  6.17 ± 14% -2.8 3.37 ± 22% perf-profile.children.cycles-pp.do_iter_write
  6.17 ± 14% -2.8 3.37 ± 22% perf-profile.children.cycles-pp.do_iter_readv_writev
  6.17 ± 14% -2.8 3.37 ± 22% perf-profile.children.cycles-pp.f2fs_file_write_iter
  6.23 ± 14% -2.8 3.46 ± 23% perf-profile.children.cycles-pp.direct_splice_actor
  6.23 ± 14% -2.8 3.46 ± 23% perf-profile.children.cycles-pp.iter_file_splice_write
  10.70 ± 11% -2.7 7.98 ± 15% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
  10.69 ± 11% -2.7 7.97 ± 16% perf-profile.children.cycles-pp.do_syscall_64
  6.67 ± 13% -2.7 4.00 ± 19% perf-profile.children.cycles-pp.splice_direct_to_actor
  6.67 ± 13% -2.7 4.00 ± 19% perf-profile.children.cycles-pp.do_splice_direct
  6.67 ± 13% -2.7 4.00 ± 20% perf-profile.children.cycles-pp.vfs_copy_file_range
  6.67 ± 13% -2.7 4.00 ± 20% perf-profile.children.cycles-pp.__x64_sys_copy_file_range
  2.87 ± 20% -1.9 0.94 ± 23% perf-profile.children.cycles-pp.f2fs_write_end
  2.71 ± 20% -1.9 0.83 ± 24% perf-profile.children.cycles-pp.f2fs_set_data_page_dirty
  2.13 ± 24% -1.8 0.34 ± 18% perf-profile.children.cycles-pp.f2fs_update_dirty_page
  2.56 ± 19% -1.6 1.00 ± 15% perf-profile.children.cycles-pp._raw_spin_lock
  1.59 ± 30% -1.5 0.10 ± 29% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
  2.66 ± 13% -0.9 1.75 ± 23% perf-profile.children.cycles-pp.iov_iter_copy_from_user_atomic
  2.13 ± 16% -0.5 1.63 ± 13% perf-profile.children.cycles-pp.f2fs_write_single_data_page
  1.98 ± 15% -0.4 1.53 ± 14% perf-profile.children.cycles-pp.f2fs_do_write_data_page
  0.98 ± 23% -0.3 0.72 ± 9% perf-profile.children.cycles-pp.f2fs_outplace_write_data
  0.76 ± 12% -0.2 0.58 ± 8% perf-profile.children.cycles-pp.do_write_page
  0.51 ± 7% -0.1 0.37 ± 7% perf-profile.children.cycles-pp.f2fs_allocate_data_block
  0.14 ± 16% -0.0 0.10 ± 25% perf-profile.children.cycles-pp.___might_sleep
  0.04 ± 71% +0.1 0.09 ± 19% perf-profile.children.cycles-pp.ksys_mmap_pgoff
  0.05 ± 75% +0.1 0.10 ± 19% perf-profile.children.cycles-pp.add_to_page_cache_lru
  0.05 ± 74% +0.1 0.11 ± 17% perf-profile.children.cycles-pp.mmap_region
  0.01 ±223% +0.1 0.07 ± 21% perf-profile.children.cycles-pp.__add_to_page_cache_locked
  0.06 ± 50% +0.1 0.12 ± 16% perf-profile.children.cycles-pp.do_mmap
  0.02 ±149% +0.3 0.29 ± 38% perf-profile.children.cycles-pp.mutex_spin_on_owner
  0.13 ± 91% +0.5 0.63 ± 39% perf-profile.children.cycles-pp.__mutex_lock
  1.47 ± 27% -1.4 0.10 ± 29% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
  2.58 ± 14% -0.9 1.66 ± 21% perf-profile.self.cycles-pp.iov_iter_copy_from_user_atomic
  0.32 ± 16% -0.2 0.07 ± 14% perf-profile.self.cycles-pp.f2fs_update_dirty_page
  0.13 ± 16% -0.1 0.07 ± 71% perf-profile.self.cycles-pp.f2fs_do_write_data_page
  0.14 ± 17% -0.0 0.10 ± 28% perf-profile.self.cycles-pp.___might_sleep
  0.02 ±149% +0.3 0.29 ± 38% perf-profile.self.cycles-pp.mutex_spin_on_owner
stress-ng.time.percent_of_cpu_this_job_got

  [ASCII time-series plot omitted: bisect-good (+) samples around 15-18, bisect-bad (O) samples around 10-11]

stress-ng.time.file_system_outputs

  [ASCII time-series plot omitted: bisect-good (+) samples around 1.2e+07, bisect-bad (O) samples around 8.5e+06-9e+06]

stress-ng.copy-file.ops

  [ASCII time-series plot omitted: bisect-good (+) samples around 2800-3000, bisect-bad (O) samples around 2100-2200]

stress-ng.copy-file.ops_per_sec

  [ASCII time-series plot omitted: bisect-good (+) samples around 46-50, bisect-bad (O) samples around 34-36]

[*] bisect-good sample
[O] bisect-bad sample
***************************************************************************************************
lkp-csl-2sp5: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
  os/gcc-9/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp5/readahead/stress-ng/60s/0x5003006
commit:
  6bcec6cee5 ("Merge branch 'for-5.14/io_uring' into for-next")
  2c76f9f255 ("libata: configure max sectors properly")
6bcec6cee54edf7e  2c76f9f255f01743b65a16b6673
----------------  ---------------------------
  %stddev  %change  %stddev
      \       |        \
  1.07e+08 +71.5% 1.836e+08 ± 6% stress-ng.readahead.ops
  1783834 +71.5% 3059261 ± 6% stress-ng.readahead.ops_per_sec
  479.07 -11.1% 426.10 ± 6% stress-ng.time.system_time
  55.77 ± 2% +93.4% 107.84 ± 25% stress-ng.time.user_time
  6632490 ±221% -99.4% 42440 ± 11% cpuidle.POLL.time
  0.04 ± 8% +20.0% 0.05 ± 13% perf-sched.wait_and_delay.avg.ms.pipe_read.new_sync_read.vfs_read.ksys_read
  19597 -1.0% 19407 proc-vmstat.pgactivate
  1726 ± 3% -6.0% 1623 vmstat.system.cs
  1910 ± 97% -74.5% 487.57 ±198% interrupts.CPU59.NMI:Non-maskable_interrupts
  1910 ± 97% -74.5% 487.57 ±198% interrupts.CPU59.PMI:Performance_monitoring_interrupts
  8.39 ± 2% -10.6% 7.50 ± 5% iostat.cpu.system
  0.92 ± 3% +87.6% 1.72 ± 27% iostat.cpu.user
  57046 ± 3% -30.2% 39790 ± 51% numa-vmstat.node0.nr_anon_pages
  6475 ± 28% +267.8% 23819 ± 85% numa-vmstat.node1.nr_anon_pages
  0.02 ± 4% +0.0 0.03 ± 6% mpstat.cpu.all.iowait%
  7.06 ± 2% -0.8 6.22 ± 5% mpstat.cpu.all.sys%
  0.87 ± 3% +0.8 1.62 ± 27% mpstat.cpu.all.usr%
  97178 ± 9% -38.7% 59552 ± 54% numa-meminfo.node0.AnonHugePages
  228163 ± 3% -30.2% 159145 ± 51% numa-meminfo.node0.AnonPages
  164183 ± 9% -13.0% 142812 ± 14% numa-meminfo.node0.Slab
  5744 ± 18% +515.7% 35368 ± 98% numa-meminfo.node1.AnonHugePages
  25887 ± 28% +267.9% 95228 ± 85% numa-meminfo.node1.AnonPages
  44270 ± 13% +156.2% 113422 ± 70% numa-meminfo.node1.AnonPages.max
  9369 ± 6% -12.5% 8197 ± 20% softirqs.CPU59.SCHED
  9695 ± 3% -12.7% 8463 ± 8% softirqs.CPU61.SCHED
  9770 ± 4% -9.0% 8887 ± 10% softirqs.CPU66.SCHED
  9473 ± 6% -14.5% 8096 ± 11% softirqs.CPU84.SCHED
  9441 ± 6% -18.0% 7745 ± 20% softirqs.CPU85.SCHED
  12271 ± 17% -15.3% 10392 ± 7% softirqs.TIMER
  2.36 ± 21% +76.4% 4.16 ± 29% perf-stat.i.MPKI
  1.545e+10 ± 2% -10.5% 1.382e+10 ± 6% perf-stat.i.branch-instructions
  13484423 ± 3% +42.6% 19227872 ± 3% perf-stat.i.branch-misses
  21671037 ± 12% +56.0% 33811435 ± 18% perf-stat.i.cache-misses
  1.164e+08 ± 2% +66.2% 1.934e+08 ± 6% perf-stat.i.cache-references
  1499 ± 4% -5.2% 1421 ± 2% perf-stat.i.context-switches
  1323 ± 13% -26.6% 971.56 ± 23% perf-stat.i.cycles-between-cache-misses
  1.749e+10 -14.4% 1.498e+10 ± 7% perf-stat.i.dTLB-loads
  1.073e+10 ± 2% -19.9% 8.6e+09 ± 6% perf-stat.i.dTLB-stores
  4054770 ± 6% +61.9% 6566022 ± 10% perf-stat.i.iTLB-load-misses
  6.471e+10 ± 2% -13.4% 5.604e+10 ± 6% perf-stat.i.instructions
  15495 ± 7% -46.3% 8315 ± 12% perf-stat.i.instructions-per-iTLB-miss
  2.37 -13.1% 2.06 ± 6% perf-stat.i.ipc
  201.39 ± 13% +43.3% 288.53 ± 13% perf-stat.i.metric.K/sec
  456.15 -14.2% 391.60 ± 6% perf-stat.i.metric.M/sec
  14406997 ± 14% +58.0% 22762612 ± 19% perf-stat.i.node-loads
  111506 ± 28% +93.5% 215774 ± 25% perf-stat.i.node-store-misses
  1.80 +91.9% 3.45 perf-stat.overall.MPKI
  0.09 ± 2% +0.1 0.14 ± 4% perf-stat.overall.branch-miss-rate%
  0.41 +14.7% 0.47 ± 7% perf-stat.overall.cpi
  1235 ± 12% -35.5% 796.96 ± 17% perf-stat.overall.cycles-between-cache-misses
  69.64 ± 4% +6.5 76.12 ± 3% perf-stat.overall.iTLB-load-miss-rate%
  16027 ± 7% -46.1% 8642 ± 12% perf-stat.overall.instructions-per-iTLB-miss
  2.46 -12.5% 2.15 ± 6% perf-stat.overall.ipc
  1.521e+10 ± 2% -10.5% 1.362e+10 ± 6% perf-stat.ps.branch-instructions
  13272497 ± 3% +42.7% 18933267 ± 3% perf-stat.ps.branch-misses
  21338470 ± 12% +56.0% 33294227 ± 18% perf-stat.ps.cache-misses
  1.146e+08 ± 2% +66.2% 1.905e+08 ± 6% perf-stat.ps.cache-references
  1476 ± 4% -5.3% 1399 ± 2% perf-stat.ps.context-switches
  1.723e+10 -14.3% 1.476e+10 ± 7% perf-stat.ps.dTLB-loads
  1.057e+10 ± 2% -19.9% 8.471e+09 ± 6% perf-stat.ps.dTLB-stores
  3992776 ± 6% +62.0% 6467207 ± 10% perf-stat.ps.iTLB-load-misses
  6.372e+10 -13.4% 5.52e+10 ± 6% perf-stat.ps.instructions
  14185966 ± 14% +58.0% 22414685 ± 19% perf-stat.ps.node-loads
  109739 ± 28% +93.7% 212520 ± 25% perf-stat.ps.node-store-misses
  4.112e+12 -12.5% 3.599e+12 ± 6% perf-stat.total.instructions
  55.43 -11.8 43.67 ± 7% perf-profile.calltrace.cycles-pp.page_cache_ra_unbounded.generic_fadvise.ksys_readahead.do_syscall_64.entry_SYSCALL_64_after_hwframe
  55.92 -11.4 44.53 ± 7% perf-profile.calltrace.cycles-pp.generic_fadvise.ksys_readahead.do_syscall_64.entry_SYSCALL_64_after_hwframe
  56.20 -11.2 45.04 ± 7% perf-profile.calltrace.cycles-pp.ksys_readahead.do_syscall_64.entry_SYSCALL_64_after_hwframe
  39.67 -8.6 31.12 ± 6% perf-profile.calltrace.cycles-pp.xa_load.page_cache_ra_unbounded.generic_fadvise.ksys_readahead.do_syscall_64
  30.84 -6.6 24.21 ± 6% perf-profile.calltrace.cycles-pp.xas_load.xa_load.page_cache_ra_unbounded.generic_fadvise.ksys_readahead
  64.08 -2.8 61.29 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
  10.02 ± 2% -2.2 7.85 ± 7% perf-profile.calltrace.cycles-pp.read_pages.page_cache_ra_unbounded.generic_fadvise.ksys_readahead.do_syscall_64
  7.17 ± 2% -1.5 5.65 ± 7% perf-profile.calltrace.cycles-pp.xas_start.xas_load.xa_load.page_cache_ra_unbounded.generic_fadvise
  2.10 ± 3% -0.4 1.69 ± 8% perf-profile.calltrace.cycles-pp.rcu_read_unlock_strict.xa_load.page_cache_ra_unbounded.generic_fadvise.ksys_readahead
  0.45 ± 45% +0.4 0.90 ± 7% perf-profile.calltrace.cycles-pp.__entry_text_start
  0.00 +0.6 0.62 ± 8% perf-profile.calltrace.cycles-pp.touch_atime.filemap_read.new_sync_read.vfs_read.ksys_pread64
  1.16 ± 9% +1.1 2.26 ± 12% perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.filemap_read.new_sync_read.vfs_read
  1.26 ± 8% +1.2 2.44 ± 11% perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_read.new_sync_read.vfs_read.ksys_pread64
  0.00 +2.4 2.41 ±186% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
  3.23 ± 3% +2.5 5.76 ± 9% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout.copy_page_to_iter.filemap_read.new_sync_read
  3.28 ± 3% +2.6 5.86 ± 9% perf-profile.calltrace.cycles-pp.copyout.copy_page_to_iter.filemap_read.new_sync_read.vfs_read
  3.61 ± 3% +2.8 6.46 ± 9% perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.new_sync_read.vfs_read.ksys_pread64
  5.93 ± 2% +4.9 10.84 ± 8% perf-profile.calltrace.cycles-pp.filemap_read.new_sync_read.vfs_read.ksys_pread64.do_syscall_64
  6.25 ± 2% +5.2 11.43 ± 8% perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
  6.92 ± 2% +5.7 12.65 ± 8% perf-profile.calltrace.cycles-pp.vfs_read.ksys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
  7.24 ± 2% +6.0 13.25 ± 8% perf-profile.calltrace.cycles-pp.ksys_pread64.do_syscall_64.entry_SYSCALL_64_after_hwframe
  55.50 -11.7 43.75 ± 7% perf-profile.children.cycles-pp.page_cache_ra_unbounded
  55.94 -11.4 44.56 ± 7% perf-profile.children.cycles-pp.generic_fadvise
  56.22 -11.2 45.07 ± 7% perf-profile.children.cycles-pp.ksys_readahead
  39.69 -8.6 31.12 ± 7% perf-profile.children.cycles-pp.xa_load
  31.04 -6.5 24.55 ± 7% perf-profile.children.cycles-pp.xas_load
  64.19 -2.8 61.43 ± 2% perf-profile.children.cycles-pp.do_syscall_64
  10.06 ± 2% -2.2 7.90 ± 7% perf-profile.children.cycles-pp.read_pages
  7.17 ± 2% -1.4 5.74 ± 7% perf-profile.children.cycles-pp.xas_start
  2.14 ± 3% -0.4 1.73 ± 7% perf-profile.children.cycles-pp.rcu_read_unlock_strict
  0.06 ± 13% +0.0 0.10 ± 10% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
  0.02 ±141% +0.0 0.06 ± 13% perf-profile.children.cycles-pp.get_next_timer_interrupt
  0.07 ± 10% +0.1 0.12 ± 11% perf-profile.children.cycles-pp.__might_sleep
  0.08 ± 12% +0.1 0.13 ± 8% perf-profile.children.cycles-pp.___might_sleep
  0.07 ± 7% +0.1 0.12 ± 9% perf-profile.children.cycles-pp.__might_fault
  0.06 ± 14% +0.1 0.12 ± 5% perf-profile.children.cycles-pp.make_kgid
  0.06 ± 8% +0.1 0.11 ± 12% perf-profile.children.cycles-pp.aa_file_perm
  0.03 ±100% +0.1 0.08 ± 18% perf-profile.children.cycles-pp.timestamp_truncate
  0.00 +0.1 0.06 ± 10% perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
  0.00 +0.1 0.06 ± 14% perf-profile.children.cycles-pp.generic_file_read_iter
  0.00 +0.1 0.07 ± 13% perf-profile.children.cycles-pp.do_page_cache_ra
  0.00 +0.1 0.07 ± 10% perf-profile.children.cycles-pp.rcu_all_qs
  0.00 +0.1 0.07 ± 9% perf-profile.children.cycles-pp.rw_verify_area
  0.08 ± 7% +0.1 0.15 ± 10% perf-profile.children.cycles-pp.make_kuid
  0.06 ± 15% +0.1 0.13 ± 11% perf-profile.children.cycles-pp.__cond_resched
  0.10 ± 4% +0.1 0.18 ± 13% perf-profile.children.cycles-pp.current_time
  0.12 ± 8% +0.1 0.21 ± 11% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
  0.14 ± 11% +0.1 0.23 ± 10% perf-profile.children.cycles-pp.mark_page_accessed
  0.12 ± 14% +0.1 0.22 ± 8% perf-profile.children.cycles-pp.ext4_file_read_iter
  0.11 ± 7% +0.1 0.22 ± 8% perf-profile.children.cycles-pp.map_id_range_down
  0.17 ± 8% +0.1 0.30 ± 10% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
  0.34 ± 15% +0.2 0.50 ± 7% perf-profile.children.cycles-pp.scheduler_tick
  0.19 ± 9% +0.2 0.34 ± 12% perf-profile.children.cycles-pp.common_file_perm
  0.21 ± 3% +0.2 0.37 ± 11% perf-profile.children.cycles-pp.__fsnotify_parent
  0.21 ± 7% +0.2 0.39 ± 8% perf-profile.children.cycles-pp.force_page_cache_ra
  0.21 ± 6% +0.2 0.39 ± 7% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
  0.26 ± 8% +0.2 0.48 ± 11% perf-profile.children.cycles-pp.security_file_permission
  0.32 ± 5% +0.2 0.57 ± 9% perf-profile.children.cycles-pp.atime_needs_update
  0.35 ± 5% +0.3 0.63 ± 8% perf-profile.children.cycles-pp.touch_atime
  0.34 ± 7% +0.3 0.65 ± 9% perf-profile.children.cycles-pp.__fget_light
  0.53 ± 7% +0.4 0.90 ± 7% perf-profile.children.cycles-pp.__entry_text_start
  0.83 ± 4% +0.7 1.51 ± 5% perf-profile.children.cycles-pp.syscall_return_via_sysret
  1.17 ± 8% +1.1 2.28 ± 12% perf-profile.children.cycles-pp.filemap_get_read_batch
  1.27 ± 8% +1.2 2.46 ± 11% perf-profile.children.cycles-pp.filemap_get_pages
  0.35 ± 4% +2.1 2.48 ±180% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
  3.26 ± 3% +2.6 5.82 ± 9% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
  3.28 ± 3% +2.6 5.86 ± 9% perf-profile.children.cycles-pp.copyout
  3.62 ± 3% +2.9 6.49 ± 9% perf-profile.children.cycles-pp.copy_page_to_iter
  5.96 ± 2% +4.9 10.90 ± 8% perf-profile.children.cycles-pp.filemap_read
  6.26 ± 2% +5.2 11.45 ± 8% perf-profile.children.cycles-pp.new_sync_read
  6.95 ± 2% +5.7 12.69 ± 8% perf-profile.children.cycles-pp.vfs_read
  7.25 ± 2% +6.0 13.25 ± 8% perf-profile.children.cycles-pp.ksys_pread64
  23.77 -5.0 18.73 ± 7% perf-profile.self.cycles-pp.xas_load
  9.96 ± 2% -2.2 7.78 ± 7% perf-profile.self.cycles-pp.read_pages
  8.74 ± 2% -1.9 6.82 ± 7% perf-profile.self.cycles-pp.xa_load
  6.03 ± 2% -1.2 4.84 ± 7% perf-profile.self.cycles-pp.xas_start
  5.77 ± 2% -1.0 4.78 ± 9% perf-profile.self.cycles-pp.page_cache_ra_unbounded
  1.09 ± 2% -0.2 0.88 ± 6% perf-profile.self.cycles-pp.rcu_read_unlock_strict
  0.06 ± 11% +0.0 0.10 ± 11% perf-profile.self.cycles-pp.__might_sleep
  0.06 ± 14% +0.0 0.11 ± 9% perf-profile.self.cycles-pp.security_file_permission
  0.07 ± 8% +0.1 0.12 ± 11% perf-profile.self.cycles-pp.___might_sleep
  0.09 ± 16% +0.1 0.14 ± 14% perf-profile.self.cycles-pp.atime_needs_update
  0.03 ±100% +0.1 0.08 ± 10% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
  0.08 ± 8% +0.1 0.14 ± 14% perf-profile.self.cycles-pp.ksys_pread64
  0.03 ± 70% +0.1 0.09 ± 9% perf-profile.self.cycles-pp.aa_file_perm
  0.00 +0.1 0.06 ± 13% perf-profile.self.cycles-pp.generic_file_read_iter
  0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.touch_atime
  0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.rw_verify_area
  0.00 +0.1 0.07 ± 11% perf-profile.self.cycles-pp.__cond_resched
  0.01 ±223% +0.1 0.08 ± 16% perf-profile.self.cycles-pp.timestamp_truncate
  0.09 ± 7% +0.1 0.17 ± 11% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
  0.10 ± 8% +0.1 0.18 ± 7% perf-profile.self.cycles-pp.filemap_get_pages
  0.12 ± 11% +0.1 0.21 ± 11% perf-profile.self.cycles-pp.mark_page_accessed
  0.11 ± 8% +0.1 0.20 ± 8% perf-profile.self.cycles-pp.map_id_range_down
  0.12 ± 16% +0.1 0.22 ± 7% perf-profile.self.cycles-pp.ext4_file_read_iter
  0.15 ± 6% +0.1 0.26 ± 7% perf-profile.self.cycles-pp.ksys_readahead
  0.12 ± 10% +0.1 0.23 ± 13% perf-profile.self.cycles-pp.new_sync_read
  0.12 ± 5% +0.1 0.24 ± 37% perf-profile.self.cycles-pp.do_syscall_64
  0.17 ± 9% +0.1 0.30 ± 10% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
  0.18 ± 5% +0.1 0.32 ± 8% perf-profile.self.cycles-pp.exit_to_user_mode_prepare
  0.19 ± 9% +0.1 0.32 ± 8% perf-profile.self.cycles-pp.vfs_read
  0.19 ± 7% +0.2 0.35 ± 7% perf-profile.self.cycles-pp.copy_page_to_iter
  0.20 ± 9% +0.2 0.37 ± 8% perf-profile.self.cycles-pp.force_page_cache_ra
  0.20 ± 5% +0.2 0.37 ± 12% perf-profile.self.cycles-pp.__fsnotify_parent
  0.20 ± 11% +0.2 0.38 ± 10% perf-profile.self.cycles-pp.generic_fadvise
  0.31 ± 3% +0.3 0.58 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
  0.33 ± 6% +0.3 0.62 ± 9% perf-profile.self.cycles-pp.__fget_light
  0.53 ± 7% +0.4 0.90 ± 7% perf-profile.self.cycles-pp.__entry_text_start
  0.56 ± 2% +0.5 1.03 ± 7% perf-profile.self.cycles-pp.filemap_read
  0.81 ± 4% +0.7 1.48 ± 5% perf-profile.self.cycles-pp.syscall_return_via_sysret
  0.97 ± 11% +0.9 1.91 ± 13% perf-profile.self.cycles-pp.filemap_get_read_batch
  0.09 ± 4% +1.9 2.02 ±223% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
  3.24 ± 3% +2.5 5.78 ± 9% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation
Thanks,
Oliver Sang