[PATCH openEuler-1.0-LTS] sched: Fix null pointer dereference for sd->span
by Zhang Changzhong 30 Jun '23
From: Hui Tang <tanghui20(a)huawei.com>
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7HFZV
CVE: NA
----------------------------------------
A NULL pointer dereference may occur when CPU hotplug and task-group
creation run concurrently:
sched_autogroup_create_attach
-> sched_create_group
-> alloc_fair_sched_group
-> init_auto_affinity
-> init_affinity_domains
-> cpumask_copy(xx, sched_domain_span(tmp))
{ tmp may be freed because the rcu read lock is missing }
{ hotplug rebuilds the sched domains }
sched_cpu_activate
-> build_sched_domains
-> cpuset_cpu_active
-> partition_sched_domains
-> build_sched_domains
-> cpu_attach_domain
-> destroy_sched_domains
-> call_rcu(&sd->rcu, destroy_sched_domains_rcu)
So sd must be protected by the rcu read lock across the entire critical section.
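A minimal sketch of the corrected pattern, holding the rcu read lock across the whole domain walk and copy (identifiers follow the patch; this is illustrative kernel-style C, not a compilable standalone unit, and the loop shape is simplified):

```c
static int init_affinity_domains_sketch(struct affinity_domain *ad, int cpu)
{
	struct sched_domain *tmp;
	int i = 0;

	rcu_read_lock();	/* pin the current sched-domain tree */
	for_each_domain(cpu, tmp) {
		/*
		 * A concurrent hotplug rebuild frees old domains via
		 * call_rcu(&sd->rcu, destroy_sched_domains_rcu), so
		 * sched_domain_span(tmp) is only safe to read while the
		 * read-side critical section is still open; the copy
		 * must therefore happen inside it, not after unlock.
		 */
		cpumask_copy(ad->domains[i++], sched_domain_span(tmp));
	}
	rcu_read_unlock();	/* only now may the domains be reclaimed */
	return 0;
}
```

This is also why the patch moves the kmalloc() calls in front of the locked region: GFP_KERNEL allocations may sleep, which is not allowed inside an rcu read-side critical section.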
[ 599.811593] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 600.112821] pc : init_affinity_domains+0xf4/0x200
[ 600.125918] lr : init_affinity_domains+0xd4/0x200
[ 600.331355] Call trace:
[ 600.338734] init_affinity_domains+0xf4/0x200
[ 600.347955] init_auto_affinity+0x78/0xc0
[ 600.356622] alloc_fair_sched_group+0xd8/0x210
[ 600.365594] sched_create_group+0x48/0xc0
[ 600.373970] sched_autogroup_create_attach+0x54/0x190
[ 600.383311] ksys_setsid+0x110/0x130
[ 600.391014] __arm64_sys_setsid+0x18/0x24
[ 600.399156] el0_svc_common+0x118/0x170
[ 600.406818] el0_svc_handler+0x3c/0x80
[ 600.414188] el0_svc+0x8/0x640
[ 600.420719] Code: b40002c0 9104e002 f9402061 a9401444 (a9001424)
[ 600.430504] SMP: stopping secondary CPUs
[ 600.441751] Starting crashdump kernel...
Fixes: 713cfd2684fa ("sched: Introduce smart grid scheduling strategy for cfs")
Signed-off-by: Hui Tang <tanghui20(a)huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22(a)huawei.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
---
kernel/sched/fair.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e9eb00e..622d433 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5582,7 +5582,7 @@ void free_affinity_domains(struct affinity_domain *ad)
{
int i;
- for (i = 0; i < ad->dcount; i++) {
+ for (i = 0; i < AD_LEVEL_MAX; i++) {
kfree(ad->domains[i]);
kfree(ad->domains_orig[i]);
ad->domains[i] = NULL;
@@ -5621,6 +5621,12 @@ static int init_affinity_domains(struct affinity_domain *ad)
int i = 0;
int cpu;
+ for (i = 0; i < AD_LEVEL_MAX; i++) {
+ ad->domains[i] = kmalloc(sizeof(cpumask_t), GFP_KERNEL);
+ if (!ad->domains[i])
+ goto err;
+ }
+
rcu_read_lock();
cpu = cpumask_first_and(cpu_active_mask,
housekeeping_cpumask(HK_FLAG_DOMAIN));
@@ -5629,21 +5635,12 @@ static int init_affinity_domains(struct affinity_domain *ad)
dcount++;
}
- if (!sd) {
+ if (!sd || dcount > AD_LEVEL_MAX) {
rcu_read_unlock();
- return -EINVAL;
- }
- rcu_read_unlock();
-
- for (i = 0; i < dcount; i++) {
- ad->domains[i] = kmalloc(sizeof(cpumask_t), GFP_KERNEL);
- if (!ad->domains[i]) {
- ad->dcount = i;
- goto err;
- }
+ ret = -EINVAL;
+ goto err;
}
- rcu_read_lock();
idlest = sd_find_idlest_group(sd);
cpu = group_find_idlest_cpu(idlest);
i = 0;
--
2.9.5
Tu Jinjiang (2):
crypto: add lz4k Cryptographic API
mm/zram: Add lz4k support for zram
crypto/Kconfig | 8 +
crypto/Makefile | 1 +
crypto/lz4k.c | 100 ++++++
drivers/block/zram/zcomp.c | 3 +
include/linux/lz4k.h | 384 +++++++++++++++++++++++
lib/Kconfig | 6 +
lib/Makefile | 2 +
lib/lz4k/Makefile | 2 +
lib/lz4k/lz4k_decode.c | 314 +++++++++++++++++++
lib/lz4k/lz4k_encode.c | 554 +++++++++++++++++++++++++++++++++
lib/lz4k/lz4k_encode_private.h | 142 +++++++++
lib/lz4k/lz4k_private.h | 282 +++++++++++++++++
12 files changed, 1798 insertions(+)
create mode 100644 crypto/lz4k.c
create mode 100644 include/linux/lz4k.h
create mode 100644 lib/lz4k/Makefile
create mode 100644 lib/lz4k/lz4k_decode.c
create mode 100644 lib/lz4k/lz4k_encode.c
create mode 100644 lib/lz4k/lz4k_encode_private.h
create mode 100644 lib/lz4k/lz4k_private.h
--
2.25.1
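Once registered, the algorithm is reachable through the kernel crypto API like any other compression transform; a hedged sketch of how a caller (for example zram's zcomp layer, which looks the backend up by the "lz4k" name added to its backends[] table) would drive it (illustrative kernel-style C, assuming the 4.19-era crypto_comp interface used by this kernel):

```c
#include <linux/crypto.h>
#include <linux/err.h>

static int compress_with_lz4k(const u8 *src, unsigned int slen,
			      u8 *dst, unsigned int *dlen)
{
	struct crypto_comp *tfm;
	int ret;

	/* Allocating the transform runs lz4k_init(), which vmalloc()s
	 * the per-context state of lz4k_encode_state_bytes_min() bytes. */
	tfm = crypto_alloc_comp("lz4k", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	/* Dispatches to lz4k_compress_crypto(); *dlen is in/out: the
	 * output capacity on entry, the compressed size on success. */
	ret = crypto_comp_compress(tfm, src, slen, dst, dlen);

	crypto_free_comp(tfm);
	return ret;
}
```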
On 2023/6/30 10:39, Kefeng Wang wrote:
>
> Please refresh the overall code style to follow the Linux kernel requirements.
>
>
I am revising it.
> On 2023/6/30 10:41, Tu Jinjiang wrote:
>> hulk inclusion
>> category: feature
>> bugzilla: https://gitee.com/openeuler/kernel/issues/I7H9IA
>> CVE: NA
>>
>> -------------------------------------------
>>
>> Add lz4k algorithm support for zram.
> Is there any original author or commit information? If so, it should be preserved.
I do not have that information.
>>
>> Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com>
>> Signed-off-by: Tu Jinjiang <tujinjiang(a)huawei.com>
>> ---
>> crypto/Kconfig | 8 +
>> crypto/Makefile | 1 +
>> crypto/lz4k.c | 97 ++++++
>> drivers/block/zram/zcomp.c | 3 +
>> include/linux/lz4k.h | 383 +++++++++++++++++++++++
>> lib/Kconfig | 6 +
>> lib/Makefile | 2 +
>> lib/lz4k/Makefile | 2 +
>> lib/lz4k/lz4k_decode.c | 308 +++++++++++++++++++
>> lib/lz4k/lz4k_encode.c | 539 +++++++++++++++++++++++++++++++++
>> lib/lz4k/lz4k_encode_private.h | 137 +++++++++
>> lib/lz4k/lz4k_private.h | 269 ++++++++++++++++
>> 12 files changed, 1755 insertions(+)
>> create mode 100644 crypto/lz4k.c
>> create mode 100644 include/linux/lz4k.h
>> create mode 100644 lib/lz4k/Makefile
>> create mode 100644 lib/lz4k/lz4k_decode.c
>> create mode 100644 lib/lz4k/lz4k_encode.c
>> create mode 100644 lib/lz4k/lz4k_encode_private.h
>> create mode 100644 lib/lz4k/lz4k_private.h
>>
>> diff --git a/crypto/Kconfig b/crypto/Kconfig
>> index 64cb304f5103..35223cff7c8a 100644
>> --- a/crypto/Kconfig
>> +++ b/crypto/Kconfig
>> @@ -1871,6 +1871,14 @@ config CRYPTO_LZ4HC
>> help
>> This is the LZ4 high compression mode algorithm.
>> +config CRYPTO_LZ4K
>> + tristate "LZ4K compression algorithm"
>> + select CRYPTO_ALGAPI
>> + select LZ4K_COMPRESS
>> + select LZ4K_DECOMPRESS
>> + help
>> + This is the LZ4K algorithm.
>> +
>> config CRYPTO_ZSTD
>> tristate "Zstd compression algorithm"
>> select CRYPTO_ALGAPI
>> diff --git a/crypto/Makefile b/crypto/Makefile
>> index 9d1191f2b741..5c3b0a0839c5 100644
>> --- a/crypto/Makefile
>> +++ b/crypto/Makefile
>> @@ -161,6 +161,7 @@ obj-$(CONFIG_CRYPTO_AUTHENC) += authenc.o
>> authencesn.o
>> obj-$(CONFIG_CRYPTO_LZO) += lzo.o lzo-rle.o
>> obj-$(CONFIG_CRYPTO_LZ4) += lz4.o
>> obj-$(CONFIG_CRYPTO_LZ4HC) += lz4hc.o
>> +obj-$(CONFIG_CRYPTO_LZ4K) += lz4k.o
>> obj-$(CONFIG_CRYPTO_XXHASH) += xxhash_generic.o
>> obj-$(CONFIG_CRYPTO_842) += 842.o
>> obj-$(CONFIG_CRYPTO_RNG2) += rng.o
>> diff --git a/crypto/lz4k.c b/crypto/lz4k.c
>> new file mode 100644
>> index 000000000000..8daceab269ef
>> --- /dev/null
>> +++ b/crypto/lz4k.c
>> @@ -0,0 +1,97 @@
>> +/*
>> + * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights
>> reserved.
>> + * Description: LZ4K compression algorithm for ZRAM
>> + * Author: Arkhipov Denis arkhipov.denis(a)huawei.com
>> + * Create: 2020-03-25
>> + */
>> +
>> +#include <linux/init.h>
>> +#include <linux/module.h>
>> +#include <linux/crypto.h>
>> +#include <linux/vmalloc.h>
>> +#include <linux/lz4k.h>
>> +
>> +
>> +struct lz4k_ctx {
>> + void *lz4k_comp_mem;
>> +};
>> +
>> +static int lz4k_init(struct crypto_tfm *tfm)
>> +{
>> + struct lz4k_ctx *ctx = crypto_tfm_ctx(tfm);
>> +
>> + ctx->lz4k_comp_mem = vmalloc(lz4k_encode_state_bytes_min());
>> + if (!ctx->lz4k_comp_mem)
>> + return -ENOMEM;
>> +
>> + return 0;
>> +}
>> +
>> +static void lz4k_exit(struct crypto_tfm *tfm)
>> +{
>> + struct lz4k_ctx *ctx = crypto_tfm_ctx(tfm);
>> + vfree(ctx->lz4k_comp_mem);
>> +}
>> +
>> +static int lz4k_compress_crypto(struct crypto_tfm *tfm, const u8
>> *src, unsigned int slen, u8 *dst, unsigned int *dlen)
>> +{
>> + struct lz4k_ctx *ctx = crypto_tfm_ctx(tfm);
>> + int ret;
>> +
>> + ret = lz4k_encode(ctx->lz4k_comp_mem, src, dst, slen, *dlen, 0);
>> +
> Remove the blank line.
>> + if (ret < 0) {
>> + return -EINVAL;
>> + }
> Remove the braces.
>> +
>> + if (ret)
>> + *dlen = ret;
>> +
>> + return 0;
>> +}
>> +
>> +static int lz4k_decompress_crypto(struct crypto_tfm *tfm, const u8
>> *src, unsigned int slen, u8 *dst, unsigned int *dlen)
>> +{
>> + int ret;
>> +
>> + ret = lz4k_decode(src, dst, slen, *dlen);
>> +
> Remove the blank line.
>> + if (ret <= 0)
>> + return -EINVAL;
> Add a blank line.
>> + *dlen = ret;
>> + return 0;
>> +}
>> +
>> +static struct crypto_alg alg_lz4k = {
>> + .cra_name = "lz4k",
>> + .cra_driver_name = "lz4k-generic",
>> + .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
>> + .cra_ctxsize = sizeof(struct lz4k_ctx),
>> + .cra_module = THIS_MODULE,
>> + .cra_list = LIST_HEAD_INIT(alg_lz4k.cra_list),
>> + .cra_init = lz4k_init,
>> + .cra_exit = lz4k_exit,
>> + .cra_u = {
>> + .compress = {
>> + .coa_compress = lz4k_compress_crypto,
>> + .coa_decompress = lz4k_decompress_crypto
>> + }
>> + }
>> +};
>> +
>> +static int __init lz4k_mod_init(void)
>> +{
>> + return crypto_register_alg(&alg_lz4k);
>> +}
>> +
>> +static void __exit lz4k_mod_fini(void)
>> +{
>> + crypto_unregister_alg(&alg_lz4k);
>> +}
>> +
>> +module_init(lz4k_mod_init);
>> +module_exit(lz4k_mod_fini);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_DESCRIPTION("LZ4K Compression Algorithm");
>> +MODULE_ALIAS_CRYPTO("lz4k");
>> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
>> index b08650417bf0..28bda2035326 100644
>> --- a/drivers/block/zram/zcomp.c
>> +++ b/drivers/block/zram/zcomp.c
>> @@ -29,6 +29,9 @@ static const char * const backends[] = {
>> #if IS_ENABLED(CONFIG_CRYPTO_ZSTD)
>> "zstd",
>> #endif
>> +#if IS_ENABLED(CONFIG_CRYPTO_LZ4K)
>> + "lz4k",
>> +#endif
>> };
>> static void zcomp_strm_free(struct zcomp_strm *zstrm)
>> diff --git a/include/linux/lz4k.h b/include/linux/lz4k.h
>> new file mode 100644
>> index 000000000000..6e73161b1840
>> --- /dev/null
>> +++ b/include/linux/lz4k.h
>> @@ -0,0 +1,383 @@
>> +/*
>> + * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights
>> reserved.
>> + * Description: LZ4K compression algorithm
>> + * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
>> + * Created: 2020-03-25
>> + */
>> +
>> +#ifndef _LZ4K_H
>> +#define _LZ4K_H
>> +
>> +/* file lz4k.h
>> + This file contains the platform-independent API of LZ-class
>> + lossless codecs (compressors/decompressors) with complete
>> + in-place documentation. The documentation is formatted
>> + in accordance with DOXYGEN mark-up format. So, one can
>> + generate proper documentation, e.g. in HTML format, using DOXYGEN.
>> +
>> + Currently, LZ-class codecs, documented here, implement following
>> + algorithms for lossless data compression/decompression:
>> + \li "LZ HUAWEI" proprietary codec competing with LZ4 - lz4k_encode(),
>> + lz4k_encode_delta(), lz4k_decode(), lz4k_decode_delta()
>> +
>> + The LZ HUAWEI compressors accept any data as input and compress it
>> + without loss to a smaller size if possible.
>> + Compressed data produced by LZ HUAWEI compressor API lz4k_encode*(),
>> + can be decompressed only by lz4k_decode() API documented below.\n
>> + */
>> +
>> +/*
>> + lz4k_status defines simple set of status values returned by Huawei
>> APIs
>> + */
>
> Change the various "Huawei" references to "lz4k" or similar; the algorithm itself should not keep names like "LZ HUAWEI".
>
>> +typedef enum {
>> + LZ4K_STATUS_INCOMPRESSIBLE = 0, /* !< Return when data is
>> incompressible */
>> + LZ4K_STATUS_FAILED = -1, /* !< Return on general failure */
>> + LZ4K_STATUS_READ_ERROR = -2, /* !< Return when data reading
>> failed */
>> + LZ4K_STATUS_WRITE_ERROR = -3 /* !< Return when data writing
>> failed */
>> +} lz4k_status;
>> +
>> +/*
>> + lz4k_version() returns a static immutable string with the algorithm version
>> + */
>> +const char *lz4k_version(void);
>> +
>> +/*
>> + lz4k_encode_state_bytes_min() returns number of bytes for state
>> parameter,
>> + supplied to lz4k_encode(), lz4k_encode_delta(),
>> + lz4k_update_delta_state().
>> + So, state should occupy at least lz4k_encode_state_bytes_min() for
>> mentioned
>> + functions to work correctly.
>> + */
>> +unsigned lz4k_encode_state_bytes_min(void);
>
> The comment style below should be reworked, or some of the useless comments removed.
>
>> +
>> +/*
>> + lz4k_encode() encodes/compresses one input buffer at *in, places
>> + result of encoding into one output buffer at *out if encoded data
>> + size fits specified values of out_max and out_limit.
>> + It returns the size of encoded data in case of success or a value <= 0
>> otherwise.
>> + The result of successful encoding is in HUAWEI proprietary format,
>> that
>> + is the encoded data can be decoded only by lz4k_decode().
>> +
>> + \return
>> + \li positive value\n
>> + if encoding was successful. The value returned is the size of
>> encoded
>> + (compressed) data always <=out_max.
>> + \li non-positive value\n
>> + if in==0||in_max==0||out==0||out_max==0 or
>> + if out_max is less than needed for encoded (compressed) data.
>> + \li 0 value\n
>> + if encoded data size >= out_limit
>> +
>> + \param[in] state
>> + !=0, pointer to state buffer used internally by the function.
>> Size of
>> + state in bytes should be at least
>> lz4k_encode_state_bytes_min(). The content
>> + of state buffer will be changed during encoding.
>> +
>> + \param[in] in
>> + !=0, pointer to the input buffer to encode (compress). The
>> content of
>> + the input buffer does not change during encoding.
>> +
>> + \param[in] out
>> + !=0, pointer to the output buffer where to place result of encoding
>> + (compression).
>> + If encoding is unsuccessful, e.g. out_max or out_limit are less
>> than
>> + needed for encoded data then content of out buffer may be
>> arbitrary.
>> +
>> + \param[in] in_max
>> + !=0, size in bytes of the input buffer at *in
>> +
>> + \param[in] out_max
>> + !=0, size in bytes of the output buffer at *out
>> +
>> + \param[in] out_limit
>> + encoded data size soft limit in bytes. Due to performance
>> reasons it is
>> + not guaranteed that
>> + lz4k_encode will always detect that resulting encoded data size is
>> + bigger than out_limit.
>> + However, when reaching out_limit is detected, lz4k_encode() returns
>> + earlier and spares CPU cycles. Caller code should recheck result
>> + returned by lz4k_encode() (value greater than 0) if it is really
>> + less or equal than out_limit.
>> + out_limit is ignored if it is equal to 0.
>> + */
>> +int lz4k_encode(
>> + void *const state,
>> + const void *const in,
>> + void *out,
>> + unsigned in_max,
>> + unsigned out_max,
>> + unsigned out_limit);
>> +
>> +/*
>> + lz4k_encode_max_cr() encodes/compresses one input buffer at *in,
>> places
>> + result of encoding into one output buffer at *out if encoded data
>> + size fits specified value of out_max.
>> + It returns the size of encoded data in case of success or a value <= 0
>> otherwise.
>> + The result of successful encoding is in HUAWEI proprietary format,
>> that
>> + is the encoded data can be decoded only by lz4k_decode().
>> +
>> + \return
>> + \li positive value\n
>> + if encoding was successful. The value returned is the size of
>> encoded
>> + (compressed) data always <=out_max.
>> + \li non-positive value\n
>> + if in==0||in_max==0||out==0||out_max==0 or
>> + if out_max is less than needed for encoded (compressed) data.
>> +
>> + \param[in] state
>> + !=0, pointer to state buffer used internally by the function.
>> Size of
>> + state in bytes should be at least
>> lz4k_encode_state_bytes_min(). The content
>> + of state buffer will be changed during encoding.
>> +
>> + \param[in] in
>> + !=0, pointer to the input buffer to encode (compress). The
>> content of
>> + the input buffer does not change during encoding.
>> +
>> + \param[in] out
>> + !=0, pointer to the output buffer where to place result of encoding
>> + (compression).
>> + If encoding is unsuccessful, e.g. out_max is less than
>> + needed for encoded data then content of out buffer may be
>> arbitrary.
>> +
>> + \param[in] in_max
>> + !=0, size in bytes of the input buffer at *in
>> +
>> + \param[in] out_max
>> + !=0, size in bytes of the output buffer at *out
>> +
>> + \param[in] out_limit
>> + encoded data size soft limit in bytes. Due to performance
>> reasons it is
>> + not guaranteed that
>> + lz4k_encode will always detect that resulting encoded data size is
>> + bigger than out_limit.
>> + However, when reaching out_limit is detected, lz4k_encode() returns
>> + earlier and spares CPU cycles. Caller code should recheck result
>> + returned by lz4k_encode() (value greater than 0) if it is really
>> + less or equal than out_limit.
>> + out_limit is ignored if it is equal to 0.
>> + */
>> +int lz4k_encode_max_cr(
>> + void *const state,
>> + const void *const in,
>> + void *out,
>> + unsigned in_max,
>> + unsigned out_max,
>> + unsigned out_limit);
>> +
>> +/*
>> + lz4k_update_delta_state() fills/updates state (hash table) in the
>> same way as
>> + lz4k_encode does while encoding (compressing).
>> + The state and its content can then be used by lz4k_encode_delta()
>> + to encode (compress) data more efficiently.
>> + In other words, the effect of lz4k_update_delta_state() is the same as
>> + lz4k_encode() with all encoded output discarded.
>> +
>> + Example sequence of calls for lz4k_update_delta_state and
>> + lz4k_encode_delta:
>> + //dictionary (1st) block
>> + int result0=lz4k_update_delta_state(state, in0, in0, in_max0);
>> +//delta (2nd) block
>> + int result1=lz4k_encode_delta(state, in0, in, out, in_max,
>> + out_max);
>> +
>> + \param[in] state
>> + !=0, pointer to state buffer used internally by lz4k_encode*.
>> + Size of state in bytes should be at least
>> lz4k_encode_state_bytes_min().
>> + The content of state buffer is zeroed at the beginning of
>> + lz4k_update_delta_state ONLY when in0==in.
>> + The content of state buffer will be changed inside
>> + lz4k_update_delta_state.
>> +
>> + \param[in] in0
>> + !=0, pointer to the reference/dictionary input buffer that was used
>> + as input to preceding call of lz4k_encode() or
>> lz4k_update_delta_state()
>> + to fill/update the state buffer.
>> + The content of the reference/dictionary input buffer does not
>> change
>> + during encoding.
>> + The in0 is needed for use-cases when there are several
>> dictionary and
>> + input blocks interleaved, e.g.
>> + <dictionaryA><inputA><dictionaryB><inputB>..., or
>> + <dictionaryA><dictionaryB><inputAB>..., etc.
>> +
>> + \param[in] in
>> + !=0, pointer to the input buffer to fill/update state as if
>> encoding
>> + (compressing) this input. This input buffer is also called
>> dictionary
>> + input buffer.
>> + The content of the input buffer does not change during encoding.
>> + The two buffers - at in0 and at in - should be contiguous in
>> memory.
>> + That is, the last byte of buffer at in0 is located exactly
>> before byte
>> + at in.
>> +
>> + \param[in] in_max
>> + !=0, size in bytes of the input buffer at in.
>> + */
>> +int lz4k_update_delta_state(
>> + void *const state,
>> + const void *const in0,
>> + const void *const in,
>> + unsigned in_max);
>> +
>> +/*
>> + lz4k_encode_delta() encodes (compresses) data from one input buffer
>> + using one reference buffer as dictionary and places the result of
>> + compression into one output buffer.
>> + The result of successful compression is in HUAWEI proprietary
>> format, so
>> + that compressed data can be decompressed only by lz4k_decode_delta().
>> + Reference/dictionary buffer and input buffer should be contiguous in
>> + memory.
>> +
>> + Example sequence of calls for lz4k_update_delta_state and
>> + lz4k_encode_delta:
>> +//dictionary (1st) block
>> + int result0=lz4k_update_delta_state(state, in0, in0, in_max0);
>> +//delta (2nd) block
>> + int result1=lz4k_encode_delta(state, in0, in, out, in_max,
>> + out_max);
>> +
>> + Example sequence of calls for lz4k_encode and lz4k_encode_delta:
>> +//dictionary (1st) block
>> + int result0=lz4k_encode(state, in0, out0, in_max0, out_max0);
>> +//delta (2nd) block
>> + int result1=lz4k_encode_delta(state, in0, in, out, in_max,
>> + out_max);
>> +
>> + \return
>> + \li positive value\n
>> + if encoding was successful. The value returned is the size of
>> encoded
>> + (compressed) data.
>> + \li non-positive value\n
>> + if state==0||in0==0||in==0||in_max==0||out==0||out_max==0 or
>> + if out_max is less than needed for encoded (compressed) data.
>> +
>> + \param[in] state
>> + !=0, pointer to state buffer used internally by the function.
>> Size of
>> + state in bytes should be at least
>> lz4k_encode_state_bytes_min(). For more
>> + efficient encoding the state buffer may be filled/updated by
>> calling
>> + lz4k_update_delta_state() or lz4k_encode() before
>> lz4k_encode_delta().
>> + The content of state buffer is zeroed at the beginning of
>> + lz4k_encode_delta() ONLY when in0==in.
>> + The content of state will be changed during encoding.
>> +
>> + \param[in] in0
>> + !=0, pointer to the reference/dictionary input buffer that was
>> used as
>> + input to preceding call of lz4k_encode() or
>> lz4k_update_delta_state() to
>> + fill/update the state buffer.
>> + The content of the reference/dictionary input buffer does not
>> change
>> + during encoding.
>> +
>> + \param[in] in
>> + !=0, pointer to the input buffer to encode (compress). The
>> input buffer
>> + is compressed using content of the reference/dictionary input
>> buffer at
>> + in0. The content of the input buffer does not change during
>> encoding.
>> + The two buffers - at *in0 and at *in - should be contiguous in
>> memory.
>> + That is, the last byte of buffer at *in0 is located exactly
>> before byte
>> + at *in.
>> +
>> + \param[in] out
>> + !=0, pointer to the output buffer where to place result of encoding
>> + (compression). If compression is unsuccessful then content of out
>> + buffer may be arbitrary.
>> +
>> + \param[in] in_max
>> + !=0, size in bytes of the input buffer at *in
>> +
>> + \param[in] out_max
>> + !=0, size in bytes of the output buffer at *out.
>> + */
>> +int lz4k_encode_delta(
>> + void *const state,
>> + const void *const in0,
>> + const void *const in,
>> + void *out,
>> + unsigned in_max,
>> + unsigned out_max);
>> +
>> +/*
>> + lz4k_decode() decodes (decompresses) data from one input buffer
>> and places
>> + the result of decompression into one output buffer. The encoded
>> data in input
>> + buffer should be in HUAWEI proprietary format, produced by
>> lz4k_encode()
>> + or by lz4k_encode_delta().
>> +
>> + \return
>> + \li positive value\n
>> + if decoding was successful. The value returned is the size of
>> decoded
>> + (decompressed) data.
>> + \li non-positive value\n
>> + if in==0||in_max==0||out==0||out_max==0 or
>> + if out_max is less than needed for decoded (decompressed) data or
>> + if input encoded data format is corrupted.
>> +
>> + \param[in] in
>> + !=0, pointer to the input buffer to decode (decompress). The
>> content of
>> + the input buffer does not change during decoding.
>> +
>> + \param[in] out
>> + !=0, pointer to the output buffer where to place result of decoding
>> + (decompression). If decompression is unsuccessful then content
>> of out
>> + buffer may be arbitrary.
>> +
>> + \param[in] in_max
>> + !=0, size in bytes of the input buffer at in
>> +
>> + \param[in] out_max
>> + !=0, size in bytes of the output buffer at out
>> + */
>> +int lz4k_decode(
>> + const void *const in,
>> + void *const out,
>> + unsigned in_max,
>> + unsigned out_max);
>> +
>> +/*
>> + lz4k_decode_delta() decodes (decompresses) data from one input buffer
>> + and places the result of decompression into one output buffer. The
>> + compressed data in input buffer should be in format, produced by
>> + lz4k_encode_delta().
>> +
>> + Example sequence of calls for lz4k_decode and lz4k_decode_delta:
>> +//dictionary (1st) block
>> + int result0=lz4k_decode(in0, out0, in_max0, out_max0);
>> +//delta (2nd) block
>> + int result1=lz4k_decode_delta(in, out0, out, in_max, out_max);
>> +
>> + \return
>> + \li positive value\n
>> + if decoding was successful. The value returned is the size of
>> decoded
>> + (decompressed) data.
>> + \li non-positive value\n
>> + if in==0||in_max==0||out==0||out_max==0 or
>> + if out_max is less than needed for decoded (decompressed) data or
>> + if input data format is corrupted.
>> +
>> + \param[in] in
>> + !=0, pointer to the input buffer to decode (decompress). The
>> content of
>> + the input buffer does not change during decoding.
>> +
>> + \param[in] out0
>> + !=0, pointer to the dictionary input buffer that was used as
>> input to
>> + lz4k_update_delta_state() to fill/update the state buffer. The
>> content
>> + of the dictionary input buffer does not change during decoding.
>> +
>> + \param[in] out
>> + !=0, pointer to the output buffer where to place result of decoding
>> + (decompression). If decompression is unsuccessful then content
>> of out
>> + buffer may be arbitrary.
>> + The two buffers - at *out0 and at *out - should be contiguous in
>> memory.
>> + That is, the last byte of buffer at *out0 is located exactly
>> before byte
>> + at *out.
>> +
>> + \param[in] in_max
>> + !=0, size in bytes of the input buffer at *in
>> +
>> + \param[in] out_max
>> + !=0, size in bytes of the output buffer at *out
>> + */
>> +int lz4k_decode_delta(
>> + const void *in,
>> + const void *const out0,
>> + void *const out,
>> + unsigned in_max,
>> + unsigned out_max);
>> +
>> +
>> +#endif /* _LZ4K_H */
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index 36326864249d..4bf1c2c21157 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -310,6 +310,12 @@ config LZ4HC_COMPRESS
>> config LZ4_DECOMPRESS
>> tristate
>> +config LZ4K_COMPRESS
>> + tristate
>> +
>> +config LZ4K_DECOMPRESS
>> + tristate
>> +
>> config ZSTD_COMPRESS
>> select XXHASH
>> tristate
>> diff --git a/lib/Makefile b/lib/Makefile
>> index a803e1527c4b..bd0d3635ae46 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -187,6 +187,8 @@ obj-$(CONFIG_LZO_DECOMPRESS) += lzo/
>> obj-$(CONFIG_LZ4_COMPRESS) += lz4/
>> obj-$(CONFIG_LZ4HC_COMPRESS) += lz4/
>> obj-$(CONFIG_LZ4_DECOMPRESS) += lz4/
>> +obj-$(CONFIG_LZ4K_COMPRESS) += lz4k/
>> +obj-$(CONFIG_LZ4K_DECOMPRESS) += lz4k/
>> obj-$(CONFIG_ZSTD_COMPRESS) += zstd/
>> obj-$(CONFIG_ZSTD_DECOMPRESS) += zstd/
>> obj-$(CONFIG_XZ_DEC) += xz/
>> diff --git a/lib/lz4k/Makefile b/lib/lz4k/Makefile
>> new file mode 100644
>> index 000000000000..6ea3578639d4
>> --- /dev/null
>> +++ b/lib/lz4k/Makefile
>> @@ -0,0 +1,2 @@
>> +obj-$(CONFIG_LZ4K_COMPRESS) += lz4k_encode.o
>> +obj-$(CONFIG_LZ4K_DECOMPRESS) += lz4k_decode.o
>> \ No newline at end of file
>> diff --git a/lib/lz4k/lz4k_decode.c b/lib/lz4k/lz4k_decode.c
>> new file mode 100644
>> index 000000000000..567b76b7bc51
>> --- /dev/null
>> +++ b/lib/lz4k/lz4k_decode.c
>> @@ -0,0 +1,308 @@
>> +/*
>> + * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights
>> reserved.
>> + * Description: LZ4K compression algorithm
>> + * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
>> + * Created: 2020-03-25
>> + */
>> +
>> +#if !defined(__KERNEL__)
>> +#include "lz4k.h"
>> +#else
>> +#include <linux/lz4k.h>
>> +#include <linux/module.h>
>> +#endif
>> +
>> +#include "lz4k_private.h" /* types, etc */
>> +
>> +static const uint8_t *get_size(
>> + uint_fast32_t *size,
>> + const uint8_t *in_at,
>> + const uint8_t *const in_end)
>> +{
>> + uint_fast32_t u;
>> + do {
>> + if (unlikely(in_at >= in_end))
>> + return NULL;
>> + *size += (u = *(const uint8_t*)in_at);
>> + ++in_at;
>> + } while (BYTE_MAX == u);
>> + return in_at;
>> +}
>> +
>> +static int end_of_block(
>> + const uint_fast32_t nr_bytes_max,
>> + const uint_fast32_t r_bytes_max,
>> + const uint8_t *const in_at,
>> + const uint8_t *const in_end,
>> + const uint8_t *const out,
>> + const uint8_t *const out_at)
>> +{
>> + if (!nr_bytes_max)
>> + return LZ4K_STATUS_FAILED; /* should be the last one in
>> block */
>> + if (r_bytes_max != REPEAT_MIN)
>> + return LZ4K_STATUS_FAILED; /* should be the last one in
>> block */
>> + if (in_at != in_end)
>> + return LZ4K_STATUS_FAILED; /* should be the last one in
>> block */
>> + return (int)(out_at - out);
>> +}
>> +
>> +enum {
>> + NR_COPY_MIN = 16,
>> + R_COPY_MIN = 16,
>> + R_COPY_SAFE = R_COPY_MIN - 1,
>> + R_COPY_SAFE_2X = (R_COPY_MIN << 1) - 1
>> +};
>> +
>> +static bool out_non_repeat(
>> + const uint8_t **in_at,
>> + uint8_t **out_at,
>> + uint_fast32_t nr_bytes_max,
>> + const uint8_t *const in_end,
>> + const uint8_t *const out_end)
>> +{
>> + const uint8_t *const in_copy_end = *in_at + nr_bytes_max;
>> + uint8_t *const out_copy_end = *out_at + nr_bytes_max;
>> + if (likely(nr_bytes_max <= NR_COPY_MIN)) {
>> + if (likely(*in_at <= in_end - NR_COPY_MIN &&
>> + *out_at <= out_end - NR_COPY_MIN))
>> + m_copy(*out_at, *in_at, NR_COPY_MIN);
>> + else if (in_copy_end <= in_end && out_copy_end <= out_end)
>> + m_copy(*out_at, *in_at, nr_bytes_max);
>> + else
>> + return false;
>> + } else {
>> + if (likely(in_copy_end <= in_end - NR_COPY_MIN &&
>> + out_copy_end <= out_end - NR_COPY_MIN)) {
>> + m_copy(*out_at, *in_at, NR_COPY_MIN);
>> + copy_x_while_lt(*out_at + NR_COPY_MIN,
>> + *in_at + NR_COPY_MIN,
>> + out_copy_end, NR_COPY_MIN);
>> + } else if (in_copy_end <= in_end && out_copy_end <= out_end) {
>> + m_copy(*out_at, *in_at, nr_bytes_max);
>> + } else { /* in_copy_end > in_end || out_copy_end > out_end */
>> + return false;
>> + }
>> + }
>> + *in_at = in_copy_end;
>> + *out_at = out_copy_end;
>> + return true;
>> +}
>> +
>> +static void out_repeat_overlap(
>> + uint_fast32_t offset,
>> + uint8_t *out_at,
>> + const uint8_t *out_from,
>> + const uint8_t *const out_copy_end)
>> +{ /* (1 < offset < R_COPY_MIN/2) && out_copy_end + R_COPY_SAFE_2X
>> <= out_end */
>> + enum {
>> + COPY_MIN = R_COPY_MIN >> 1,
>> + OFFSET_LIMIT = COPY_MIN >> 1
>> + };
>> + m_copy(out_at, out_from, COPY_MIN);
>> + out_at += offset;
>> + if (offset <= OFFSET_LIMIT)
>> + offset <<= 1;
>> + do {
>> + m_copy(out_at, out_from, COPY_MIN);
>> + out_at += offset;
>> + if (offset <= OFFSET_LIMIT)
>> + offset <<= 1;
>> + } while (out_at - out_from < R_COPY_MIN);
>> + while_lt_copy_2x_as_x2(out_at, out_from, out_copy_end, R_COPY_MIN);
>> +}
>> +
>> +static bool out_repeat_slow(
>> + uint_fast32_t r_bytes_max,
>> + uint_fast32_t offset,
>> + uint8_t *out_at,
>> + const uint8_t *out_from,
>> + const uint8_t *const out_copy_end,
>> + const uint8_t *const out_end)
>> +{
>> + if (offset > 1 && out_copy_end <= out_end - R_COPY_SAFE_2X) {
>> + out_repeat_overlap(offset, out_at, out_from, out_copy_end);
>> + } else {
>> + if (unlikely(out_copy_end > out_end))
>> + return false;
>> + if (offset == 1) {
>> + m_set(out_at, *out_from, r_bytes_max);
>> + } else {
>> + do
>> + *out_at++ = *out_from++;
>> + while (out_at < out_copy_end);
>> + }
>> + }
>> + return true;
>> +}
>> +
>> +static int decode(
>> + const uint8_t *const in,
>> + const uint8_t *const out0,
>> + uint8_t *const out,
>> + const uint8_t *const in_end,
>> + const uint8_t *const out_end,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2)
>> +{
>> + const uint_fast32_t r_log2 = TAG_BITS_MAX - (off_log2 + nr_log2);
>> + const uint8_t *in_at = in;
>> + const uint8_t *const in_end_minus_x = in_end - TAG_BYTES_MAX;
>> + uint8_t *out_at = out;
>> + while (likely(in_at <= in_end_minus_x)) {
>> + const uint_fast32_t utag = read4_at(in_at - 1) >> BYTE_BITS;
>> + const uint_fast32_t offset = utag & mask(off_log2);
>> + uint_fast32_t nr_bytes_max = utag >> (off_log2 + r_log2),
>> + r_bytes_max = ((utag >> off_log2) & mask(r_log2)) +
>> + REPEAT_MIN;
>> + const uint8_t *out_from = 0;
>> + uint8_t *out_copy_end = 0;
>> + in_at += TAG_BYTES_MAX;
>> + if (unlikely(nr_bytes_max == mask(nr_log2))) {
>> + in_at = get_size(&nr_bytes_max, in_at, in_end);
>> + if (in_at == NULL)
>> + return LZ4K_STATUS_READ_ERROR;
>> + }
>> + if (!out_non_repeat(&in_at, &out_at, nr_bytes_max, in_end, out_end))
>> + return LZ4K_STATUS_FAILED;
>> + if (unlikely(r_bytes_max == mask(r_log2) + REPEAT_MIN)) {
>> + in_at = get_size(&r_bytes_max, in_at, in_end);
>> + if (in_at == NULL)
>> + return LZ4K_STATUS_READ_ERROR;
>> + }
>> + out_from = out_at - offset;
>> + if (unlikely(out_from < out0))
>> + return LZ4K_STATUS_FAILED;
>> + out_copy_end = out_at + r_bytes_max;
>> + if (likely(offset >= R_COPY_MIN &&
>> + out_copy_end <= out_end - R_COPY_SAFE_2X)) {
>> + copy_2x_as_x2_while_lt(out_at, out_from, out_copy_end,
>> + R_COPY_MIN);
>> + } else if (likely(offset >= (R_COPY_MIN >> 1) &&
>> + out_copy_end <= out_end - R_COPY_SAFE_2X)) {
>> + m_copy(out_at, out_from, R_COPY_MIN);
>> + out_at += offset;
>> + while_lt_copy_x(out_at, out_from, out_copy_end, R_COPY_MIN);
>> + /* faster than 2x */
>> + } else if (likely(offset > 0)) {
>> + if (!out_repeat_slow(r_bytes_max, offset, out_at, out_from,
>> + out_copy_end, out_end))
>> + return LZ4K_STATUS_FAILED;
>> + } else { /* offset == 0: EOB, last literal */
>> + return end_of_block(nr_bytes_max, r_bytes_max, in_at,
>> + in_end, out, out_at);
>> + }
>> + out_at = out_copy_end;
>> + }
>> + return in_at == in_end ? (int)(out_at - out) : LZ4K_STATUS_FAILED;
>> +}
>> +
>> +static int decode4kb(
>> + const uint8_t *const in,
>> + const uint8_t *const out0,
>> + uint8_t *const out,
>> + const uint8_t *const in_end,
>> + const uint8_t *const out_end)
>> +{
>> + enum {
>> + NR_LOG2 = 6
>> + };
>> + return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_4KB_LOG2);
>> +}
>> +
>> +static int decode8kb(
>> + const uint8_t *const in,
>> + const uint8_t *const out0,
>> + uint8_t *const out,
>> + const uint8_t *const in_end,
>> + const uint8_t *const out_end)
>> +{
>> + enum {
>> + NR_LOG2 = 5
>> + };
>> + return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_8KB_LOG2);
>> +}
>> +
>> +static int decode16kb(
>> + const uint8_t *const in,
>> + const uint8_t *const out0,
>> + uint8_t *const out,
>> + const uint8_t *const in_end,
>> + const uint8_t *const out_end)
>> +{
>> + enum {
>> + NR_LOG2 = 5
>> + };
>> + return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_16KB_LOG2);
>> +}
>> +
>> +static int decode32kb(
>> + const uint8_t *const in,
>> + const uint8_t *const out0,
>> + uint8_t *const out,
>> + const uint8_t *const in_end,
>> + const uint8_t *const out_end)
>> +{
>> + enum {
>> + NR_LOG2 = 4
>> + };
>> + return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_32KB_LOG2);
>> +}
>> +
>> +static int decode64kb(
>> + const uint8_t *const in,
>> + const uint8_t *const out0,
>> + uint8_t *const out,
>> + const uint8_t *const in_end,
>> + const uint8_t *const out_end)
>> +{
>> + enum {
>> + NR_LOG2 = 4
>> + };
>> + return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_64KB_LOG2);
>> +}
>> +
>> +static inline const void *u8_inc(const uint8_t *a)
>> +{
>> + return a+1;
>> +}
>> +
>> +int lz4k_decode(
>> + const void *in,
>> + void *const out,
>> + unsigned in_max,
>> + unsigned out_max)
>> +{
>> + /* ++use volatile pointers to prevent compiler optimizations */
>> + const uint8_t *volatile in_end = (const uint8_t*)in + in_max;
>> + const uint8_t *volatile out_end = (uint8_t*)out + out_max;
>> + uint8_t in_log2 = 0;
>> + if (unlikely(in == NULL || out == NULL || in_max <= 4 || out_max <= 0))
>> + return LZ4K_STATUS_FAILED;
>> + in_log2 = (uint8_t)(BLOCK_4KB_LOG2 + *(const uint8_t*)in);
>> + /* invalid buffer size or pointer overflow */
>> + if (unlikely((const uint8_t*)in >= in_end || (uint8_t*)out >= out_end))
>> + return LZ4K_STATUS_FAILED;
>> + /* -- */
>> + in = u8_inc((const uint8_t*)in);
>> + --in_max;
>> + if (in_log2 < BLOCK_8KB_LOG2)
>> + return decode4kb((const uint8_t*)in, (uint8_t*)out,
>> + (uint8_t*)out, in_end, out_end);
>> + if (in_log2 == BLOCK_8KB_LOG2)
>> + return decode8kb((const uint8_t*)in, (uint8_t*)out,
>> + (uint8_t*)out, in_end, out_end);
>> + if (in_log2 == BLOCK_16KB_LOG2)
>> + return decode16kb((const uint8_t*)in, (uint8_t*)out,
>> + (uint8_t*)out, in_end, out_end);
>> + if (in_log2 == BLOCK_32KB_LOG2)
>> + return decode32kb((const uint8_t*)in, (uint8_t*)out,
>> + (uint8_t*)out, in_end, out_end);
>> + if (in_log2 == BLOCK_64KB_LOG2)
>> + return decode64kb((const uint8_t*)in, (uint8_t*)out,
>> + (uint8_t*)out, in_end, out_end);
>> + return LZ4K_STATUS_FAILED;
>> +}
>> +EXPORT_SYMBOL(lz4k_decode);
>> +
>> +MODULE_LICENSE("Dual BSD/GPL");
>> +MODULE_DESCRIPTION("LZ4K decoder");
>> diff --git a/lib/lz4k/lz4k_encode.c b/lib/lz4k/lz4k_encode.c
>> new file mode 100644
>> index 000000000000..a425d3a0b827
>> --- /dev/null
>> +++ b/lib/lz4k/lz4k_encode.c
>> @@ -0,0 +1,539 @@
>> +/*
>> + * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights reserved.
>> + * Description: LZ4K compression algorithm
>> + * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
>> + * Created: 2020-03-25
>> + */
>> +
>> +#if !defined(__KERNEL__)
>> +#include "lz4k.h"
>> +#else
>> +#include <linux/lz4k.h>
>> +#include <linux/module.h>
>> +#endif
>> +
>> +#include "lz4k_private.h"
>> +#include "lz4k_encode_private.h"
>> +
>> +static uint8_t *out_size_bytes(uint8_t *out_at, uint_fast32_t u)
>> +{
>> + for (; unlikely(u >= BYTE_MAX); u -= BYTE_MAX)
>> + *out_at++ = (uint8_t)BYTE_MAX;
>> + *out_at++ = (uint8_t)u;
>> + return out_at;
>> +}
>> +
>> +static inline uint8_t *out_utag_then_bytes_left(
>> + uint8_t *out_at,
>> + uint_fast32_t utag,
>> + uint_fast32_t bytes_left)
>> +{
>> + m_copy(out_at, &utag, TAG_BYTES_MAX);
>> + return out_size_bytes(out_at + TAG_BYTES_MAX, bytes_left);
>> +}
>> +
>> +static int out_tail(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + const uint8_t *const out,
>> + const uint8_t *const nr0,
>> + const uint8_t *const in_end,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + bool check_out)
>> +{
>> + const uint_fast32_t nr_mask = mask(nr_log2);
>> + const uint_fast32_t r_log2 = TAG_BITS_MAX - (off_log2 + nr_log2);
>> + const uint_fast32_t nr_bytes_max = u_32(in_end - nr0);
>> + if (encoded_bytes_min(nr_log2, nr_bytes_max) > u_32(out_end - out_at))
>> + return check_out ? LZ4K_STATUS_WRITE_ERROR :
>> + LZ4K_STATUS_INCOMPRESSIBLE;
>> + if (nr_bytes_max < nr_mask) {
>> + /* caller guarantees at least one nr-byte */
>> + uint_fast32_t utag = (nr_bytes_max << (off_log2 + r_log2));
>> + m_copy(out_at, &utag, TAG_BYTES_MAX);
>> + out_at += TAG_BYTES_MAX;
>> + } else {
>> + uint_fast32_t bytes_left = nr_bytes_max - nr_mask;
>> + uint_fast32_t utag = (nr_mask << (off_log2 + r_log2));
>> + out_at = out_utag_then_bytes_left(out_at, utag, bytes_left);
>> + }
>> + m_copy(out_at, nr0, nr_bytes_max);
>> + return (int)(out_at + nr_bytes_max - out);
>> +}
>> +
>> +int lz4k_out_tail(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + const uint8_t *const out,
>> + const uint8_t *const nr0,
>> + const uint8_t *const in_end,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + bool check_out)
>> +{
>> + return out_tail(out_at, out_end, out, nr0, in_end,
>> + nr_log2, off_log2, check_out);
>> +}
>> +
>> +static uint8_t *out_non_repeat(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + uint_fast32_t utag,
>> + const uint8_t *const nr0,
>> + const uint8_t *const r,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + bool check_out)
>> +{
>> + const uint_fast32_t nr_bytes_max = u_32(r - nr0);
>> + const uint_fast32_t nr_mask = mask(nr_log2),
>> + r_log2 = TAG_BITS_MAX - (off_log2 + nr_log2);
>> + if (likely(nr_bytes_max < nr_mask)) {
>> + if (unlikely(check_out &&
>> + TAG_BYTES_MAX + nr_bytes_max > u_32(out_end - out_at)))
>> + return NULL;
>> + utag |= (nr_bytes_max << (off_log2 + r_log2));
>> + m_copy(out_at, &utag, TAG_BYTES_MAX);
>> + out_at += TAG_BYTES_MAX;
>> + } else {
>> + uint_fast32_t bytes_left = nr_bytes_max - nr_mask;
>> + if (unlikely(check_out &&
>> + TAG_BYTES_MAX + size_bytes_count(bytes_left) + nr_bytes_max >
>> + u_32(out_end - out_at)))
>> + return NULL;
>> + utag |= (nr_mask << (off_log2 + r_log2));
>> + out_at = out_utag_then_bytes_left(out_at, utag, bytes_left);
>> + }
>> + if (unlikely(check_out))
>> + m_copy(out_at, nr0, nr_bytes_max);
>> + else
>> + copy_x_while_total(out_at, nr0, nr_bytes_max, NR_COPY_MIN);
>> + out_at += nr_bytes_max;
>> + return out_at;
>> +}
>> +
>> +uint8_t *lz4k_out_non_repeat(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + uint_fast32_t utag,
>> + const uint8_t *const nr0,
>> + const uint8_t *const r,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + bool check_out)
>> +{
>> + return out_non_repeat(out_at, out_end, utag, nr0, r,
>> + nr_log2, off_log2, check_out);
>> +}
>> +
>> +static uint8_t *out_r_bytes_left(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + uint_fast32_t r_bytes_max,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + const bool check_out) /* =false when
>> +out_max>=encoded_bytes_max(in_max), =true otherwise */
>> +{
>> + const uint_fast32_t r_mask = mask(TAG_BITS_MAX - (off_log2 + nr_log2));
>> + if (unlikely(r_bytes_max - REPEAT_MIN >= r_mask)) {
>> + uint_fast32_t bytes_left = r_bytes_max - REPEAT_MIN - r_mask;
>> + if (unlikely(check_out &&
>> + size_bytes_count(bytes_left) > u_32(out_end - out_at)))
>> + return NULL;
>> + out_at = out_size_bytes(out_at, bytes_left);
>> + }
>> + return out_at; /* SUCCESS: continue compression */
>> +}
>> +
>> +uint8_t *lz4k_out_r_bytes_left(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + uint_fast32_t r_bytes_max,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + const bool check_out)
>> +{
>> + return out_r_bytes_left(out_at, out_end, r_bytes_max,
>> + nr_log2, off_log2, check_out);
>> +}
>> +
>> +static uint8_t *out_repeat(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + uint_fast32_t utag,
>> + uint_fast32_t r_bytes_max,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + const bool check_out) /* =false when
>> +out_max>=encoded_bytes_max(in_max), =true otherwise */
>> +{
>> + const uint_fast32_t r_mask = mask(TAG_BITS_MAX - (off_log2 + nr_log2));
>> + if (likely(r_bytes_max - REPEAT_MIN < r_mask)) {
>> + if (unlikely(check_out && TAG_BYTES_MAX > u_32(out_end - out_at)))
>> + return NULL;
>> + utag |= ((r_bytes_max - REPEAT_MIN) << off_log2);
>> + m_copy(out_at, &utag, TAG_BYTES_MAX);
>> + out_at += TAG_BYTES_MAX;
>> + } else {
>> + uint_fast32_t bytes_left = r_bytes_max - REPEAT_MIN - r_mask;
>> + if (unlikely(check_out &&
>> + TAG_BYTES_MAX + size_bytes_count(bytes_left) >
>> + u_32(out_end - out_at)))
>> + return NULL;
>> + utag |= (r_mask << off_log2);
>> + out_at = out_utag_then_bytes_left(out_at, utag, bytes_left);
>> + }
>> + return out_at; /* SUCCESS: continue compression */
>> +}
>> +
>> +uint8_t *lz4k_out_repeat(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + uint_fast32_t utag,
>> + uint_fast32_t r_bytes_max,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + const bool check_out)
>> +{
>> + return out_repeat(out_at, out_end, utag, r_bytes_max,
>> + nr_log2, off_log2, check_out);
>> +}
>> +
>> +static const uint8_t *repeat_end(
>> + const uint8_t *q,
>> + const uint8_t *r,
>> + const uint8_t *const in_end_safe,
>> + const uint8_t *const in_end)
>> +{
>> + q += REPEAT_MIN;
>> + r += REPEAT_MIN;
>> + /* caller guarantees r+12<=in_end */
>> + do {
>> + const uint64_t x = read8_at(q) ^ read8_at(r);
>> + if (x) {
>> + const uint16_t ctz = (uint16_t)__builtin_ctzl(x);
>> + return r + (ctz >> BYTE_BITS_LOG2);
>> + }
>> + /* some bytes differ: trailing zero bits give the count of matching bytes */
>> + q += sizeof(uint64_t), r += sizeof(uint64_t);
>> + } while (likely(r <= in_end_safe)); /* once, at input block end */
>> + do {
>> + if (*q != *r) return r;
>> + ++q;
>> + ++r;
>> + } while (r < in_end);
>> + return r;
>> +}
>> +
>> +const uint8_t *lz4k_repeat_end(
>> + const uint8_t *q,
>> + const uint8_t *r,
>> + const uint8_t *const in_end_safe,
>> + const uint8_t *const in_end)
>> +{
>> + return repeat_end(q, r, in_end_safe, in_end);
>> +}
>> +
>> +enum {
>> + HT_BYTES_LOG2 = HT_LOG2 + 1
>> +};
>> +
>> +inline unsigned encode_state_bytes_min(void)
>> +{
>> + unsigned bytes_total = (1U << HT_BYTES_LOG2);
>> + return bytes_total;
>> +}
>> +
>> +#if !defined(LZ4K_DELTA) && !defined(LZ4K_MAX_CR)
>> +
>> +unsigned lz4k_encode_state_bytes_min(void)
>> +{
>> + return encode_state_bytes_min();
>> +}
>> +EXPORT_SYMBOL(lz4k_encode_state_bytes_min);
>> +
>> +#endif /* !defined(LZ4K_DELTA) && !defined(LZ4K_MAX_CR) */
>> +
>> +/* CR increase order: +STEP, have OFFSETS, use _5b */
>> +/* *_6b to compete with LZ4 */
>> +static inline uint_fast32_t hash0_v(const uint64_t r, uint32_t shift)
>> +{
>> + return hash64v_6b(r, shift);
>> +}
>> +
>> +static inline uint_fast32_t hash0(const uint8_t *r, uint32_t shift)
>> +{
>> + return hash64_6b(r, shift);
>> +}
>> +
>> +/*
>> + * Proof that 'r' increments are safe, i.e. no pointer overflows are possible:
>> + *
>> + * While using STEP_LOG2=5, step_start=1<<STEP_LOG2 == 32, we increment r
>> + * 32 times by 1, 32 times by 2, 32 times by 3, and so on:
>> + * 32*1+32*2+32*3+...+32*31 == 32*SUM(1..31) == 32*((1+31)*15+16).
>> + * So, we can safely increment r by at most 31 for input block size <=
>> + * 1<<13 < 15872.
>> + *
>> + * More precisely, STEP_LIMIT == x for any input block calculated as follows:
>> + * 1<<off_log2 >= (1<<STEP_LOG2)*((x+1)(x-1)/2+x/2) ==>
>> + * 1<<(off_log2-STEP_LOG2+1) >= x^2+x-1 ==>
>> + * x^2+x-1-1<<(off_log2-STEP_LOG2+1) == 0, which is solved by standard
>> + * method.
>> + * To avoid overhead here, a conservative approximate value of x is calculated
>> + * as average of two nearest square roots, see STEP_LIMIT above.
>> + */
>> +
>> +enum {
>> + STEP_LOG2 = 5 /* increase for better CR */
>> +};
>> +
>> +static int encode_any(
>> + uint16_t *const ht0,
>> + const uint8_t *const in0,
>> + uint8_t *const out,
>> + const uint8_t *const in_end,
>> + uint8_t *const out_end, /* ==out_limit for !check_out */
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + const bool check_out)
>> +{ /* caller guarantees off_log2 <=16 */
>> + uint8_t *out_at = out;
>> + const uint8_t *const in_end_safe = in_end - NR_COPY_MIN;
>> + const uint8_t *r = in0;
>> + const uint8_t *nr0 = r++;
>> + uint_fast32_t step = 1 << STEP_LOG2;
>> + for (;;) {
>> + uint_fast32_t utag = 0;
>> + const uint8_t *r_end = 0;
>> + uint_fast32_t r_bytes_max = 0;
>> + const uint8_t *const q = hashed(in0, ht0, hash0(r, HT_LOG2), r);
>> + if (!equal4(q, r)) {
>> + r += (++step >> STEP_LOG2);
>> + if (unlikely(r > in_end_safe))
>> + return out_tail(out_at, out_end, out, nr0, in_end,
>> + nr_log2, off_log2, check_out);
>> + continue;
>> + }
>> + utag = u_32(r - q);
>> + r_end = repeat_end(q, r, in_end_safe, in_end);
>> + r = repeat_start(q, r, nr0, in0);
>> + r_bytes_max = u_32(r_end - r);
>> + if (nr0 == r) {
>> + out_at = out_repeat(out_at, out_end, utag, r_bytes_max,
>> + nr_log2, off_log2, check_out);
>> + } else {
>> + update_utag(r_bytes_max, &utag, nr_log2, off_log2);
>> + out_at = out_non_repeat(out_at, out_end, utag, nr0, r,
>> + nr_log2, off_log2, check_out);
>> + if (unlikely(check_out && out_at == NULL))
>> + return LZ4K_STATUS_WRITE_ERROR;
>> + out_at = out_r_bytes_left(out_at, out_end, r_bytes_max,
>> + nr_log2, off_log2, check_out);
>> + }
>> + if (unlikely(check_out && out_at == NULL))
>> + return LZ4K_STATUS_WRITE_ERROR;
>> + nr0 = (r += r_bytes_max);
>> + if (unlikely(r > in_end_safe))
>> + return r == in_end ? (int)(out_at - out) :
>> + out_tail(out_at, out_end, out, r, in_end,
>> + nr_log2, off_log2, check_out);
>> + ht0[hash0(r - 1 - 1, HT_LOG2)] = (uint16_t)(r - 1 - 1 - in0);
>> + step = 1 << STEP_LOG2;
>> + }
>> +}
>> +
>> +static int encode_fast(
>> + uint16_t *const ht,
>> + const uint8_t *const in,
>> + uint8_t *const out,
>> + const uint8_t *const in_end,
>> + uint8_t *const out_end, /* ==out_limit for !check_out */
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2)
>> +{ /* caller guarantees off_log2 <=16 */
>> + return encode_any(ht, in, out, in_end, out_end, nr_log2, off_log2,
>> + false); /* !check_out */
>> +}
>> +
>> +static int encode_slow(
>> + uint16_t *const ht,
>> + const uint8_t *const in,
>> + uint8_t *const out,
>> + const uint8_t *const in_end,
>> + uint8_t *const out_end, /* ==out_limit for !check_out */
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2)
>> +{ /* caller guarantees off_log2 <=16 */
>> + return encode_any(ht, in, out, in_end, out_end, nr_log2, off_log2,
>> + true); /* check_out */
>> +}
>> +
>> +static int encode4kb(
>> + uint16_t *const state,
>> + const uint8_t *const in,
>> + uint8_t *out,
>> + const uint_fast32_t in_max,
>> + const uint_fast32_t out_max,
>> + const uint_fast32_t out_limit)
>> +{
>> + enum {
>> + NR_LOG2 = 6
>> + };
>> + const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
>> + encode_slow(state, in, out, in + in_max, out + out_max,
>> + NR_LOG2, BLOCK_4KB_LOG2) :
>> + encode_fast(state, in, out, in + in_max, out + out_limit,
>> + NR_LOG2, BLOCK_4KB_LOG2);
>> + return result <= 0 ? result : result + 1; /* +1 for in_log2 */
>> +}
>> +
>> +static int encode8kb(
>> + uint16_t *const state,
>> + const uint8_t *const in,
>> + uint8_t *out,
>> + const uint_fast32_t in_max,
>> + const uint_fast32_t out_max,
>> + const uint_fast32_t out_limit)
>> +{
>> + enum {
>> + NR_LOG2 = 5
>> + };
>> + const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
>> + encode_slow(state, in, out, in + in_max, out + out_max,
>> + NR_LOG2, BLOCK_8KB_LOG2) :
>> + encode_fast(state, in, out, in + in_max, out + out_limit,
>> + NR_LOG2, BLOCK_8KB_LOG2);
>> + return result <= 0 ? result : result + 1; /* +1 for in_log2 */
>> +}
>> +
>> +static int encode16kb(
>> + uint16_t *const state,
>> + const uint8_t *const in,
>> + uint8_t *out,
>> + const uint_fast32_t in_max,
>> + const uint_fast32_t out_max,
>> + const uint_fast32_t out_limit)
>> +{
>> + enum {
>> + NR_LOG2 = 5
>> + };
>> + const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
>> + encode_slow(state, in, out, in + in_max, out + out_max,
>> + NR_LOG2, BLOCK_16KB_LOG2) :
>> + encode_fast(state, in, out, in + in_max, out + out_limit,
>> + NR_LOG2, BLOCK_16KB_LOG2);
>> + return result <= 0 ? result : result + 1; /* +1 for in_log2 */
>> +}
>> +
>> +static int encode32kb(
>> + uint16_t *const state,
>> + const uint8_t *const in,
>> + uint8_t *out,
>> + const uint_fast32_t in_max,
>> + const uint_fast32_t out_max,
>> + const uint_fast32_t out_limit)
>> +{
>> + enum {
>> + NR_LOG2 = 4
>> + };
>> + const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
>> + encode_slow(state, in, out, in + in_max, out + out_max,
>> + NR_LOG2, BLOCK_32KB_LOG2) :
>> + encode_fast(state, in, out, in + in_max, out + out_limit,
>> + NR_LOG2, BLOCK_32KB_LOG2);
>> + return result <= 0 ? result : result + 1; /* +1 for in_log2 */
>> +}
>> +
>> +static int encode64kb(
>> + uint16_t *const state,
>> + const uint8_t *const in,
>> + uint8_t *out,
>> + const uint_fast32_t in_max,
>> + const uint_fast32_t out_max,
>> + const uint_fast32_t out_limit)
>> +{
>> + enum {
>> + NR_LOG2 = 4
>> + };
>> + const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
>> + encode_slow(state, in, out, in + in_max, out + out_max,
>> + NR_LOG2, BLOCK_64KB_LOG2) :
>> + encode_fast(state, in, out, in + in_max, out + out_limit,
>> + NR_LOG2, BLOCK_64KB_LOG2);
>> + return result <= 0 ? result : result + 1; /* +1 for in_log2 */
>> +}
>> +
>> +static int encode(
>> + uint16_t *const state,
>> + const uint8_t *const in,
>> + uint8_t *out,
>> + uint_fast32_t in_max,
>> + uint_fast32_t out_max,
>> + uint_fast32_t out_limit)
>> +{
>> + const uint8_t in_log2 = (uint8_t)(most_significant_bit_of(
>> + round_up_to_power_of2(in_max - REPEAT_MIN)));
>> + m_set(state, 0, encode_state_bytes_min());
>> + *out = in_log2 > BLOCK_4KB_LOG2 ? (uint8_t)(in_log2 -
>> BLOCK_4KB_LOG2) : 0;
>> + ++out;
>> + --out_max;
>> + --out_limit;
>> + if (in_log2 < BLOCK_8KB_LOG2)
>> + return encode4kb(state, in, out, in_max, out_max, out_limit);
>> + if (in_log2 == BLOCK_8KB_LOG2)
>> + return encode8kb(state, in, out, in_max, out_max, out_limit);
>> + if (in_log2 == BLOCK_16KB_LOG2)
>> + return encode16kb(state, in, out, in_max, out_max, out_limit);
>> + if (in_log2 == BLOCK_32KB_LOG2)
>> + return encode32kb(state, in, out, in_max, out_max, out_limit);
>> + if (in_log2 == BLOCK_64KB_LOG2)
>> + return encode64kb(state, in, out, in_max, out_max, out_limit);
>> + return LZ4K_STATUS_FAILED;
>> +}
>> +
>> +int lz4k_encode(
>> + void *const state,
>> + const void *const in,
>> + void *out,
>> + unsigned in_max,
>> + unsigned out_max,
>> + unsigned out_limit)
>> +{
>> + const unsigned gain_max = 64 > (in_max >> 6) ? 64 : (in_max >> 6);
>> + const unsigned out_limit_min = in_max < out_max ? in_max : out_max;
>> + const uint8_t *volatile in_end = (const uint8_t*)in + in_max;
>> + const uint8_t *volatile out_end = (uint8_t*)out + out_max;
>> + const void *volatile state_end =
>> + (uint8_t*)state + encode_state_bytes_min();
>> + if (unlikely(state == NULL))
>> + return LZ4K_STATUS_FAILED;
>> + if (unlikely(in == NULL || out == NULL))
>> + return LZ4K_STATUS_FAILED;
>> + if (unlikely(in_max <= gain_max))
>> + return LZ4K_STATUS_INCOMPRESSIBLE;
>> + if (unlikely(out_max <= gain_max)) /* need 1 byte for in_log2 */
>> + return LZ4K_STATUS_FAILED;
>> + /* ++use volatile pointers to prevent compiler optimizations */
>> + if (unlikely((const uint8_t*)in >= in_end || (uint8_t*)out >= out_end))
>> + return LZ4K_STATUS_FAILED;
>> + if (unlikely(state >= state_end))
>> + return LZ4K_STATUS_FAILED; /* pointer overflow */
>> + if (!out_limit || out_limit >= out_limit_min)
>> + out_limit = out_limit_min - gain_max;
>> + return encode((uint16_t*)state, (const uint8_t*)in, (uint8_t*)out,
>> + in_max, out_max, out_limit);
>> +}
>> +EXPORT_SYMBOL(lz4k_encode);
>> +
>> +const char *lz4k_version(void)
>> +{
>> + static const char *version = "2020.07.07";
>> + return version;
>> +}
>> +EXPORT_SYMBOL(lz4k_version);
>> +
>> +MODULE_LICENSE("Dual BSD/GPL");
>> +MODULE_DESCRIPTION("LZ4K encoder");
>> diff --git a/lib/lz4k/lz4k_encode_private.h b/lib/lz4k/lz4k_encode_private.h
>> new file mode 100644
>> index 000000000000..eb5cd162468f
>> --- /dev/null
>> +++ b/lib/lz4k/lz4k_encode_private.h
>> @@ -0,0 +1,137 @@
>> +/*
>> + * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights reserved.
>> + * Description: LZ4K compression algorithm
>> + * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
>> + * Created: 2020-03-25
>> + */
>> +
>> +#ifndef _LZ4K_ENCODE_PRIVATE_H
>> +#define _LZ4K_ENCODE_PRIVATE_H
>> +
>> +#include "lz4k_private.h"
>> +
>> +/* <nrSize bytes for whole block>+<1 terminating 0 byte> */
>> +static inline uint_fast32_t size_bytes_count(uint_fast32_t u)
>> +{
>> + return (u + BYTE_MAX - 1) / BYTE_MAX;
>> +}
>> +
>> +/* minimum encoded size for non-compressible data */
>> +static inline uint_fast32_t encoded_bytes_min(
>> + uint_fast32_t nr_log2,
>> + uint_fast32_t in_max)
>> +{
>> + return in_max < mask(nr_log2) ?
>> + TAG_BYTES_MAX + in_max :
>> + TAG_BYTES_MAX + size_bytes_count(in_max - mask(nr_log2)) + in_max;
>> +}
>> +
>> +enum {
>> + NR_COPY_LOG2 = 4,
>> + NR_COPY_MIN = 1 << NR_COPY_LOG2
>> +};
>> +
>> +static inline uint_fast32_t u_32(int64_t i)
>> +{
>> + return (uint_fast32_t)i;
>> +}
>> +
>> +/* maximum encoded size for non-compressible data if the "fast" encoder is used */
>> +static inline uint_fast32_t encoded_bytes_max(
>> + uint_fast32_t nr_log2,
>> + uint_fast32_t in_max)
>> +{
>> + uint_fast32_t r = TAG_BYTES_MAX + (uint32_t)round_up_to_log2(in_max, NR_COPY_LOG2);
>> + return in_max < mask(nr_log2) ? r : r + size_bytes_count(in_max - mask(nr_log2));
>> +}
>> +
>> +enum {
>> + HT_LOG2 = 12
>> +};
>> +
>> +/*
>> + * Compressed data format (where {} means 0 or more occurrences, [] means
>> + * optional):
>> + * <24bits tag: (off_log2 rOffset| r_log2 rSize|nr_log2 nrSize)>
>> + * {<nrSize byte>}[<nr bytes>]{<rSize byte>}
>> + * <rSize byte> and <nrSize byte> bytes are terminated by byte != 255
>> + *
>> + */
>> +
>> +static inline void update_utag(
>> + uint_fast32_t r_bytes_max,
>> + uint_fast32_t *utag,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2)
>> +{
>> + const uint_fast32_t r_mask = mask(TAG_BITS_MAX - (off_log2 + nr_log2));
>> + *utag |= likely(r_bytes_max - REPEAT_MIN < r_mask) ?
>> + ((r_bytes_max - REPEAT_MIN) << off_log2) : (r_mask << off_log2);
>> +}
>> +
>> +static inline const uint8_t *hashed(
>> + const uint8_t *const in0,
>> + uint16_t *const ht,
>> + uint_fast32_t h,
>> + const uint8_t *r)
>> +{
>> + const uint8_t *q = in0 + ht[h];
>> + ht[h] = (uint16_t)(r - in0);
>> + return q;
>> +}
>> +
>> +static inline const uint8_t *repeat_start(
>> + const uint8_t *q,
>> + const uint8_t *r,
>> + const uint8_t *const nr0,
>> + const uint8_t *const in0)
>> +{
>> + for (; r > nr0 && likely(q > in0) && unlikely(q[-1] == r[-1]); --q, --r);
>> + return r;
>> +}
>> +
>> +int lz4k_out_tail(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + const uint8_t *const out,
>> + const uint8_t *const nr0,
>> + const uint8_t *const in_end,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + bool check_out);
>> +
>> +uint8_t *lz4k_out_non_repeat(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + uint_fast32_t utag,
>> + const uint8_t *const nr0,
>> + const uint8_t *const r,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + bool check_out);
>> +
>> +uint8_t *lz4k_out_r_bytes_left(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + uint_fast32_t r_bytes_max,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + const bool check_out);
>> +
>> +uint8_t *lz4k_out_repeat(
>> + uint8_t *out_at,
>> + uint8_t *const out_end,
>> + uint_fast32_t utag,
>> + uint_fast32_t r_bytes_max,
>> + const uint_fast32_t nr_log2,
>> + const uint_fast32_t off_log2,
>> + const bool check_out);
>> +
>> +const uint8_t *lz4k_repeat_end(
>> + const uint8_t *q,
>> + const uint8_t *r,
>> + const uint8_t *const in_end_safe,
>> + const uint8_t *const in_end);
>> +
>> +#endif /* _LZ4K_ENCODE_PRIVATE_H */
>> +
>> diff --git a/lib/lz4k/lz4k_private.h b/lib/lz4k/lz4k_private.h
>> new file mode 100644
>> index 000000000000..2a8f4b37dc74
>> --- /dev/null
>> +++ b/lib/lz4k/lz4k_private.h
>> @@ -0,0 +1,269 @@
>> +/*
>> + * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights reserved.
>> + * Description: LZ4K compression algorithm
>> + * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
>> + * Created: 2020-03-25
>> + */
>> +
>> +#ifndef _LZ4K_PRIVATE_H
>> +#define _LZ4K_PRIVATE_H
>> +
>> +#if !defined(__KERNEL__)
>> +
>> +#include "lz4k.h"
>> +#include <stdint.h> /* uint*_t */
>> +#define __STDC_WANT_LIB_EXT1__ 1
>> +#include <string.h> /* memcpy() */
>> +
>> +#define likely(e) __builtin_expect(e, 1)
>> +#define unlikely(e) __builtin_expect(e, 0)
>> +
>> +#else /* __KERNEL__ */
>> +
>> +#include <linux/lz4k.h>
>> +#define __STDC_WANT_LIB_EXT1__ 1
>> +#include <linux/string.h> /* memcpy() */
>> +#include <linux/types.h> /* uint8_t, int8_t, uint16_t, int16_t,
>> +uint32_t, int32_t, uint64_t, int64_t */
>> +#include <stddef.h>
>> +
>> +typedef uint64_t uint_fast32_t;
>> +typedef int64_t int_fast32_t;
>> +
>> +#endif /* __KERNEL__ */
>> +
>> +#if defined(__GNUC__) && (__GNUC__>=4)
>> +#define LZ4K_WITH_GCC_INTRINSICS
>> +#endif
>> +
>> +#if !defined(__GNUC__)
>> +#define __builtin_expect(e, v) (e)
>> +#endif /* defined(__GNUC__) */
>> +
>> +enum {
>> + BYTE_BITS = 8,
>> + BYTE_BITS_LOG2 = 3,
>> + BYTE_MAX = 255U,
>> + REPEAT_MIN = 4,
>> + TAG_BYTES_MAX = 3,
>> + TAG_BITS_MAX = TAG_BYTES_MAX * 8,
>> + BLOCK_4KB_LOG2 = 12,
>> + BLOCK_8KB_LOG2 = 13,
>> + BLOCK_16KB_LOG2 = 14,
>> + BLOCK_32KB_LOG2 = 15,
>> + BLOCK_64KB_LOG2 = 16
>> +};
>> +
>> +static inline uint32_t mask(uint_fast32_t log2)
>> +{
>> + return (1U << log2) - 1U;
>> +}
>> +
>> +static inline uint64_t mask64(uint_fast32_t log2)
>> +{
>> + return (1ULL << log2) - 1ULL;
>> +}
>> +
>> +#if defined LZ4K_WITH_GCC_INTRINSICS
>> +static inline int most_significant_bit_of(uint64_t u)
>> +{
>> + return (int)(__builtin_expect((u) == 0, false) ?
>> + -1 : (int)(31 ^ (uint32_t)__builtin_clz((unsigned)(u))));
>> +}
>> +#else /* #!defined LZ4K_WITH_GCC_INTRINSICS */
>> +#error undefined most_significant_bit_of(unsigned u)
>> +#endif /* #if defined LZ4K_WITH_GCC_INTRINSICS */
>> +
>> +static inline uint64_t round_up_to_log2(uint64_t u, uint8_t log2)
>> +{
>> + return (uint64_t)((u + mask64(log2)) & ~mask64(log2));
>> +}
>> +
>> +static inline uint64_t round_up_to_power_of2(uint64_t u)
>> +{
>> + const int_fast32_t msb = most_significant_bit_of(u);
>> + return round_up_to_log2(u, (uint8_t)msb);
>> +}
>> +
>> +static inline void m_copy(void *dst, const void *src, size_t total)
>> +{
>> +#if defined(__STDC_LIB_EXT1__)
>> + (void)memcpy_s(dst, total, src, (total * 2) >> 1); /* *2 >> 1 to avoid bot errors */
>> +#else
>> + (void)__builtin_memcpy(dst, src, total);
>> +#endif
>> +}
>> +
>> +static inline void m_set(void *dst, uint8_t value, size_t total)
>> +{
>> +#if defined(__STDC_LIB_EXT1__)
>> + (void)memset_s(dst, total, value, (total * 2) >> 1); /* *2 >> 1 to avoid bot errors */
>> +#else
>> + (void)__builtin_memset(dst, value, total);
>> +#endif
>> +}
>> +
>> +static inline uint32_t read4_at(const void *p)
>> +{
>> + uint32_t result;
>> + m_copy(&result, p, sizeof(result));
>> + return result;
>> +}
>> +
>> +static inline uint64_t read8_at(const void *p)
>> +{
>> + uint64_t result;
>> + m_copy(&result, p, sizeof(result));
>> + return result;
>> +}
>> +
>> +static inline bool equal4(const uint8_t *const q, const uint8_t *const r)
>> +{
>> + return read4_at(q) == read4_at(r);
>> +}
>> +
>> +static inline bool equal3(const uint8_t *const q, const uint8_t *const r)
>> +{
>> + return (read4_at(q) << BYTE_BITS) == (read4_at(r) << BYTE_BITS);
>> +}
>> +
>> +static inline uint_fast32_t hash24v(const uint64_t r, uint32_t shift)
>> +{
>> + const uint32_t m = 3266489917U;
>> + return (((uint32_t)r << BYTE_BITS) * m) >> (32 - shift);
>> +}
>> +
>> +static inline uint_fast32_t hash24(const uint8_t *r, uint32_t shift)
>> +{
>> + return hash24v(read4_at(r), shift);
>> +}
>> +
>> +static inline uint_fast32_t hash32v_2(const uint64_t r, uint32_t shift)
>> +{
>> + const uint32_t m = 3266489917U;
>> + return ((uint32_t)r * m) >> (32 - shift);
>> +}
>> +
>> +static inline uint_fast32_t hash32_2(const uint8_t *r, uint32_t shift)
>> +{
>> + return hash32v_2(read4_at(r), shift);
>> +}
>> +
>> +static inline uint_fast32_t hash32v(const uint64_t r, uint32_t shift)
>> +{
>> + const uint32_t m = 2654435761U;
>> + return ((uint32_t)r * m) >> (32 - shift);
>> +}
>> +
>> +static inline uint_fast32_t hash32(const uint8_t *r, uint32_t shift)
>> +{
>> + return hash32v(read4_at(r), shift);
>> +}
>> +
>> +static inline uint_fast32_t hash64v_5b(const uint64_t r, uint32_t shift)
>> +{
>> + const uint64_t m = 889523592379ULL;
>> + return (uint32_t)(((r << 24) * m) >> (64 - shift));
>> +}
>> +
>> +static inline uint_fast32_t hash64_5b(const uint8_t *r, uint32_t shift)
>> +{
>> + return hash64v_5b(read8_at(r), shift);
>> +}
>> +
>> +static inline uint_fast32_t hash64v_6b(const uint64_t r, uint32_t shift)
>> +{
>> + const uint64_t m = 227718039650203ULL;
>> + return (uint32_t)(((r << 16) * m) >> (64 - shift));
>> +}
>> +
>> +static inline uint_fast32_t hash64_6b(const uint8_t *r, uint32_t shift)
>> +{
>> + return hash64v_6b(read8_at(r), shift);
>> +}
>> +
>> +static inline uint_fast32_t hash64v_7b(const uint64_t r, uint32_t shift)
>> +{
>> + const uint64_t m = 58295818150454627ULL;
>> + return (uint32_t)(((r << 8) * m) >> (64 - shift));
>> +}
>> +
>> +static inline uint_fast32_t hash64_7b(const uint8_t *r, uint32_t shift)
>> +{
>> + return hash64v_7b(read8_at(r), shift);
>> +}
>> +
>> +static inline uint_fast32_t hash64v_8b(const uint64_t r, uint32_t shift)
>> +{
>> + const uint64_t m = 2870177450012600261ULL;
>> + return (uint32_t)((r * m) >> (64 - shift));
>> +}
>> +
>> +static inline uint_fast32_t hash64_8b(const uint8_t *r, uint32_t shift)
>> +{
>> + return hash64v_8b(read8_at(r), shift);
>> +}
>> +
>> +static inline void while_lt_copy_x(
>> + uint8_t *dst,
>> + const uint8_t *src,
>> + const uint8_t *dst_end,
>> + const size_t copy_min)
>> +{
>> + for (; dst < dst_end; dst += copy_min, src += copy_min)
>> + m_copy(dst, src, copy_min);
>> +}
>> +
>> +static inline void copy_x_while_lt(
>> + uint8_t *dst,
>> + const uint8_t *src,
>> + const uint8_t *dst_end,
>> + const size_t copy_min)
>> +{
>> + m_copy(dst, src, copy_min);
>> + while (dst + copy_min < dst_end)
>> + m_copy(dst += copy_min, src += copy_min, copy_min);
>> +}
>> +
>> +static inline void copy_x_while_total(
>> + uint8_t *dst,
>> + const uint8_t *src,
>> + size_t total,
>> + const size_t copy_min)
>> +{
>> + m_copy(dst, src, copy_min);
>> + for (; total > copy_min; total -= copy_min)
>> + m_copy(dst += copy_min, src += copy_min, copy_min);
>> +}
>> +
>> +static inline void copy_2x(
>> + uint8_t *dst,
>> + const uint8_t *src,
>> + const size_t copy_min)
>> +{
>> + m_copy(dst, src, copy_min);
>> + m_copy(dst + copy_min, src + copy_min, copy_min);
>> +}
>> +
>> +static inline void copy_2x_as_x2_while_lt(
>> + uint8_t *dst,
>> + const uint8_t *src,
>> + const uint8_t *dst_end,
>> + const size_t copy_min)
>> +{
>> + copy_2x(dst, src, copy_min);
>> + while (dst + (copy_min << 1) < dst_end)
>> + copy_2x(dst += (copy_min << 1), src += (copy_min << 1), copy_min);
>> +}
>> +
>> +static inline void while_lt_copy_2x_as_x2(
>> + uint8_t *dst,
>> + const uint8_t *src,
>> + const uint8_t *dst_end,
>> + const size_t copy_min)
>> +{
>> + for (; dst < dst_end; dst += (copy_min << 1), src += (copy_min << 1))
>> + copy_2x(dst, src, copy_min);
>> +}
>> +
>> +#endif /* _LZ4K_PRIVATE_H */
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7H9IA
CVE: NA
-------------------------------------------
Add lz4k algorithm support for zram.
Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com>
Signed-off-by: Tu Jinjiang <tujinjiang(a)huawei.com>
---
crypto/Kconfig | 8 +
crypto/Makefile | 1 +
crypto/lz4k.c | 97 ++++++
drivers/block/zram/zcomp.c | 3 +
include/linux/lz4k.h | 383 +++++++++++++++++++++++
lib/Kconfig | 6 +
lib/Makefile | 2 +
lib/lz4k/Makefile | 2 +
lib/lz4k/lz4k_decode.c | 308 +++++++++++++++++++
lib/lz4k/lz4k_encode.c | 539 +++++++++++++++++++++++++++++++++
lib/lz4k/lz4k_encode_private.h | 137 +++++++++
lib/lz4k/lz4k_private.h | 269 ++++++++++++++++
12 files changed, 1755 insertions(+)
create mode 100644 crypto/lz4k.c
create mode 100644 include/linux/lz4k.h
create mode 100644 lib/lz4k/Makefile
create mode 100644 lib/lz4k/lz4k_decode.c
create mode 100644 lib/lz4k/lz4k_encode.c
create mode 100644 lib/lz4k/lz4k_encode_private.h
create mode 100644 lib/lz4k/lz4k_private.h
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 64cb304f5103..35223cff7c8a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1871,6 +1871,14 @@ config CRYPTO_LZ4HC
help
This is the LZ4 high compression mode algorithm.
+config CRYPTO_LZ4K
+ tristate "LZ4K compression algorithm"
+ select CRYPTO_ALGAPI
+ select LZ4K_COMPRESS
+ select LZ4K_DECOMPRESS
+ help
+ This is the LZ4K compression algorithm, an LZ4 variant used by zram.
+
config CRYPTO_ZSTD
tristate "Zstd compression algorithm"
select CRYPTO_ALGAPI
diff --git a/crypto/Makefile b/crypto/Makefile
index 9d1191f2b741..5c3b0a0839c5 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -161,6 +161,7 @@ obj-$(CONFIG_CRYPTO_AUTHENC) += authenc.o authencesn.o
obj-$(CONFIG_CRYPTO_LZO) += lzo.o lzo-rle.o
obj-$(CONFIG_CRYPTO_LZ4) += lz4.o
obj-$(CONFIG_CRYPTO_LZ4HC) += lz4hc.o
+obj-$(CONFIG_CRYPTO_LZ4K) += lz4k.o
obj-$(CONFIG_CRYPTO_XXHASH) += xxhash_generic.o
obj-$(CONFIG_CRYPTO_842) += 842.o
obj-$(CONFIG_CRYPTO_RNG2) += rng.o
diff --git a/crypto/lz4k.c b/crypto/lz4k.c
new file mode 100644
index 000000000000..8daceab269ef
--- /dev/null
+++ b/crypto/lz4k.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights reserved.
+ * Description: LZ4K compression algorithm for ZRAM
+ * Author: Arkhipov Denis arkhipov.denis(a)huawei.com
+ * Created: 2020-03-25
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/crypto.h>
+#include <linux/vmalloc.h>
+#include <linux/lz4k.h>
+
+
+struct lz4k_ctx {
+ void *lz4k_comp_mem;
+};
+
+static int lz4k_init(struct crypto_tfm *tfm)
+{
+ struct lz4k_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ ctx->lz4k_comp_mem = vmalloc(lz4k_encode_state_bytes_min());
+ if (!ctx->lz4k_comp_mem)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void lz4k_exit(struct crypto_tfm *tfm)
+{
+ struct lz4k_ctx *ctx = crypto_tfm_ctx(tfm);
+ vfree(ctx->lz4k_comp_mem);
+}
+
+static int lz4k_compress_crypto(struct crypto_tfm *tfm, const u8 *src, unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ struct lz4k_ctx *ctx = crypto_tfm_ctx(tfm);
+ int ret;
+
+ ret = lz4k_encode(ctx->lz4k_comp_mem, src, dst, slen, *dlen, 0);
+
+ if (ret < 0)
+ return -EINVAL;
+
+ if (ret)
+ *dlen = ret;
+
+ return 0;
+}
+
+static int lz4k_decompress_crypto(struct crypto_tfm *tfm, const u8 *src, unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ int ret;
+
+ ret = lz4k_decode(src, dst, slen, *dlen);
+
+ if (ret <= 0)
+ return -EINVAL;
+ *dlen = ret;
+ return 0;
+}
+
+static struct crypto_alg alg_lz4k = {
+ .cra_name = "lz4k",
+ .cra_driver_name = "lz4k-generic",
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_ctxsize = sizeof(struct lz4k_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_list = LIST_HEAD_INIT(alg_lz4k.cra_list),
+ .cra_init = lz4k_init,
+ .cra_exit = lz4k_exit,
+ .cra_u = {
+ .compress = {
+ .coa_compress = lz4k_compress_crypto,
+ .coa_decompress = lz4k_decompress_crypto
+ }
+ }
+};
+
+static int __init lz4k_mod_init(void)
+{
+ return crypto_register_alg(&alg_lz4k);
+}
+
+static void __exit lz4k_mod_fini(void)
+{
+ crypto_unregister_alg(&alg_lz4k);
+}
+
+module_init(lz4k_mod_init);
+module_exit(lz4k_mod_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("LZ4K Compression Algorithm");
+MODULE_ALIAS_CRYPTO("lz4k");
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index b08650417bf0..28bda2035326 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -29,6 +29,9 @@ static const char * const backends[] = {
#if IS_ENABLED(CONFIG_CRYPTO_ZSTD)
"zstd",
#endif
+#if IS_ENABLED(CONFIG_CRYPTO_LZ4K)
+ "lz4k",
+#endif
};
static void zcomp_strm_free(struct zcomp_strm *zstrm)
diff --git a/include/linux/lz4k.h b/include/linux/lz4k.h
new file mode 100644
index 000000000000..6e73161b1840
--- /dev/null
+++ b/include/linux/lz4k.h
@@ -0,0 +1,383 @@
+/*
+ * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights reserved.
+ * Description: LZ4K compression algorithm
+ * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
+ * Created: 2020-03-25
+ */
+
+#ifndef _LZ4K_H
+#define _LZ4K_H
+
+/* file lz4k.h
+ This file contains the platform-independent API of LZ-class
+ lossless codecs (compressors/decompressors) with complete
+ in-place documentation. The documentation is formatted
+ in accordance with DOXYGEN mark-up format. So, one can
+ generate proper documentation, e.g. in HTML format, using DOXYGEN.
+
+ Currently, LZ-class codecs, documented here, implement following
+ algorithms for lossless data compression/decompression:
+ \li "LZ HUAWEI" proprietary codec competing with LZ4 - lz4k_encode(),
+ lz4k_encode_delta(), lz4k_decode(), lz4k_decode_delta()
+
+ The LZ HUAWEI compressors accept any data as input and compress it
+ without loss to a smaller size if possible.
+ Compressed data produced by LZ HUAWEI compressor API lz4k_encode*(),
+ can be decompressed only by lz4k_decode() API documented below.\n
+ */
+
+/*
+ lz4k_status defines simple set of status values returned by Huawei APIs
+ */
+typedef enum {
+ LZ4K_STATUS_INCOMPRESSIBLE = 0, /* !< Return when data is incompressible */
+ LZ4K_STATUS_FAILED = -1, /* !< Return on general failure */
+ LZ4K_STATUS_READ_ERROR = -2, /* !< Return when data reading failed */
+ LZ4K_STATUS_WRITE_ERROR = -3 /* !< Return when data writing failed */
+} lz4k_status;
+
+/*
+ lz4k_version() returns a static immutable string with the algorithm version
+ */
+const char *lz4k_version(void);
+
+/*
+ lz4k_encode_state_bytes_min() returns the minimum size in bytes of the
+ state parameter supplied to lz4k_encode(), lz4k_encode_delta() and
+ lz4k_update_delta_state().
+ So, state should occupy at least lz4k_encode_state_bytes_min() bytes
+ for the mentioned functions to work correctly.
+ */
+unsigned lz4k_encode_state_bytes_min(void);
+
+/*
+ lz4k_encode() encodes/compresses one input buffer at *in, places
+ result of encoding into one output buffer at *out if encoded data
+ size fits specified values of out_max and out_limit.
+ It returns the size of the encoded data on success, or a value <= 0 otherwise.
+ The result of successful encoding is in HUAWEI proprietary format, that
+ is the encoded data can be decoded only by lz4k_decode().
+
+ \return
+ \li positive value\n
+ if encoding was successful. The value returned is the size of encoded
+ (compressed) data always <=out_max.
+ \li non-positive value\n
+ if in==0||in_max==0||out==0||out_max==0 or
+ if out_max is less than needed for encoded (compressed) data.
+ \li 0 value\n
+ if encoded data size >= out_limit
+
+ \param[in] state
+ !=0, pointer to state buffer used internally by the function. Size of
+ state in bytes should be at least lz4k_encode_state_bytes_min(). The content
+ of state buffer will be changed during encoding.
+
+ \param[in] in
+ !=0, pointer to the input buffer to encode (compress). The content of
+ the input buffer does not change during encoding.
+
+ \param[in] out
+ !=0, pointer to the output buffer where to place result of encoding
+ (compression).
+ If encoding is unsuccessful, e.g. out_max or out_limit are less than
+ needed for encoded data then content of out buffer may be arbitrary.
+
+ \param[in] in_max
+ !=0, size in bytes of the input buffer at *in
+
+ \param[in] out_max
+ !=0, size in bytes of the output buffer at *out
+
+ \param[in] out_limit
+ encoded data size soft limit in bytes. For performance reasons it is
+ not guaranteed that lz4k_encode() will always detect that the
+ resulting encoded data size is bigger than out_limit.
+ However, when reaching out_limit is detected, lz4k_encode() returns
+ early and spares CPU cycles. Caller code should therefore recheck that
+ a positive result returned by lz4k_encode() is really less than or
+ equal to out_limit.
+ out_limit is ignored if it is equal to 0.
+ */
+int lz4k_encode(
+ void *const state,
+ const void *const in,
+ void *out,
+ unsigned in_max,
+ unsigned out_max,
+ unsigned out_limit);
+
+/*
+ lz4k_encode_max_cr() encodes/compresses one input buffer at *in, places
+ result of encoding into one output buffer at *out if encoded data
+ size fits specified value of out_max.
+ It returns the size of the encoded data on success, or a value <= 0 otherwise.
+ The result of successful encoding is in HUAWEI proprietary format, that
+ is the encoded data can be decoded only by lz4k_decode().
+
+ \return
+ \li positive value\n
+ if encoding was successful. The value returned is the size of encoded
+ (compressed) data always <=out_max.
+ \li non-positive value\n
+ if in==0||in_max==0||out==0||out_max==0 or
+ if out_max is less than needed for encoded (compressed) data.
+
+ \param[in] state
+ !=0, pointer to state buffer used internally by the function. Size of
+ state in bytes should be at least lz4k_encode_state_bytes_min(). The content
+ of state buffer will be changed during encoding.
+
+ \param[in] in
+ !=0, pointer to the input buffer to encode (compress). The content of
+ the input buffer does not change during encoding.
+
+ \param[in] out
+ !=0, pointer to the output buffer where to place result of encoding
+ (compression).
+ If encoding is unsuccessful, e.g. out_max is less than
+ needed for encoded data then content of out buffer may be arbitrary.
+
+ \param[in] in_max
+ !=0, size in bytes of the input buffer at *in
+
+ \param[in] out_max
+ !=0, size in bytes of the output buffer at *out
+
+ \param[in] out_limit
+ encoded data size soft limit in bytes. For performance reasons it is
+ not guaranteed that lz4k_encode_max_cr() will always detect that the
+ resulting encoded data size is bigger than out_limit.
+ However, when reaching out_limit is detected, lz4k_encode_max_cr()
+ returns early and spares CPU cycles. Caller code should therefore
+ recheck that a positive result returned by lz4k_encode_max_cr() is
+ really less than or equal to out_limit.
+ out_limit is ignored if it is equal to 0.
+ */
+int lz4k_encode_max_cr(
+ void *const state,
+ const void *const in,
+ void *out,
+ unsigned in_max,
+ unsigned out_max,
+ unsigned out_limit);
+
+/*
+ lz4k_update_delta_state() fills/updates state (hash table) in the same way as
+ lz4k_encode does while encoding (compressing).
+ The state and its content can then be used by lz4k_encode_delta()
+ to encode (compress) data more efficiently.
+ In other words, the effect of lz4k_update_delta_state() is the same as
+ lz4k_encode() with all encoded output discarded.
+
+ Example sequence of calls for lz4k_update_delta_state and
+ lz4k_encode_delta:
+ //dictionary (1st) block
+ int result0 = lz4k_update_delta_state(state, in0, in0, in_max0);
+ //delta (2nd) block
+ int result1 = lz4k_encode_delta(state, in0, in, out, in_max, out_max);
+
+ \param[in] state
+ !=0, pointer to state buffer used internally by lz4k_encode*.
+ Size of state in bytes should be at least lz4k_encode_state_bytes_min().
+ The content of state buffer is zeroed at the beginning of
+ lz4k_update_delta_state ONLY when in0==in.
+ The content of state buffer will be changed inside
+ lz4k_update_delta_state.
+
+ \param[in] in0
+ !=0, pointer to the reference/dictionary input buffer that was used
+ as input to preceding call of lz4k_encode() or lz4k_update_delta_state()
+ to fill/update the state buffer.
+ The content of the reference/dictionary input buffer does not change
+ during encoding.
+ The in0 is needed for use-cases when there are several dictionary and
+ input blocks interleaved, e.g.
+ <dictionaryA><inputA><dictionaryB><inputB>..., or
+ <dictionaryA><dictionaryB><inputAB>..., etc.
+
+ \param[in] in
+ !=0, pointer to the input buffer to fill/update state as if encoding
+ (compressing) this input. This input buffer is also called dictionary
+ input buffer.
+ The content of the input buffer does not change during encoding.
+ The two buffers - at in0 and at in - should be contiguous in memory.
+ That is, the last byte of buffer at in0 is located exactly before byte
+ at in.
+
+ \param[in] in_max
+ !=0, size in bytes of the input buffer at in.
+ */
+int lz4k_update_delta_state(
+ void *const state,
+ const void *const in0,
+ const void *const in,
+ unsigned in_max);
+
+/*
+ lz4k_encode_delta() encodes (compresses) data from one input buffer
+ using one reference buffer as dictionary and places the result of
+ compression into one output buffer.
+ The result of successful compression is in HUAWEI proprietary format, so
+ that compressed data can be decompressed only by lz4k_decode_delta().
+ Reference/dictionary buffer and input buffer should be contiguous in
+ memory.
+
+ Example sequence of calls for lz4k_update_delta_state and
+ lz4k_encode_delta:
+ //dictionary (1st) block
+ int result0 = lz4k_update_delta_state(state, in0, in0, in_max0);
+ //delta (2nd) block
+ int result1 = lz4k_encode_delta(state, in0, in, out, in_max, out_max);
+
+ Example sequence of calls for lz4k_encode and lz4k_encode_delta:
+ //dictionary (1st) block
+ int result0 = lz4k_encode(state, in0, out0, in_max0, out_max0, 0);
+ //delta (2nd) block
+ int result1 = lz4k_encode_delta(state, in0, in, out, in_max, out_max);
+
+ \return
+ \li positive value\n
+ if encoding was successful. The value returned is the size of encoded
+ (compressed) data.
+ \li non-positive value\n
+ if state==0||in0==0||in==0||in_max==0||out==0||out_max==0 or
+ if out_max is less than needed for encoded (compressed) data.
+
+ \param[in] state
+ !=0, pointer to state buffer used internally by the function. Size of
+ state in bytes should be at least lz4k_encode_state_bytes_min(). For more
+ efficient encoding the state buffer may be filled/updated by calling
+ lz4k_update_delta_state() or lz4k_encode() before lz4k_encode_delta().
+ The content of state buffer is zeroed at the beginning of
+ lz4k_encode_delta() ONLY when in0==in.
+ The content of state will be changed during encoding.
+
+ \param[in] in0
+ !=0, pointer to the reference/dictionary input buffer that was used as
+ input to preceding call of lz4k_encode() or lz4k_update_delta_state() to
+ fill/update the state buffer.
+ The content of the reference/dictionary input buffer does not change
+ during encoding.
+
+ \param[in] in
+ !=0, pointer to the input buffer to encode (compress). The input buffer
+ is compressed using content of the reference/dictionary input buffer at
+ in0. The content of the input buffer does not change during encoding.
+ The two buffers - at *in0 and at *in - should be contiguous in memory.
+ That is, the last byte of buffer at *in0 is located exactly before byte
+ at *in.
+
+ \param[in] out
+ !=0, pointer to the output buffer where to place result of encoding
+ (compression). If compression is unsuccessful then content of out
+ buffer may be arbitrary.
+
+ \param[in] in_max
+ !=0, size in bytes of the input buffer at *in
+
+ \param[in] out_max
+ !=0, size in bytes of the output buffer at *out.
+ */
+int lz4k_encode_delta(
+ void *const state,
+ const void *const in0,
+ const void *const in,
+ void *out,
+ unsigned in_max,
+ unsigned out_max);
+
+/*
+ lz4k_decode() decodes (decompresses) data from one input buffer and places
+ the result of decompression into one output buffer. The encoded data in input
+ buffer should be in HUAWEI proprietary format, produced by lz4k_encode()
+ or by lz4k_encode_delta().
+
+ \return
+ \li positive value\n
+ if decoding was successful. The value returned is the size of decoded
+ (decompressed) data.
+ \li non-positive value\n
+ if in==0||in_max==0||out==0||out_max==0 or
+ if out_max is less than needed for decoded (decompressed) data or
+ if input encoded data format is corrupted.
+
+ \param[in] in
+ !=0, pointer to the input buffer to decode (decompress). The content of
+ the input buffer does not change during decoding.
+
+ \param[in] out
+ !=0, pointer to the output buffer where to place result of decoding
+ (decompression). If decompression is unsuccessful then content of out
+ buffer may be arbitrary.
+
+ \param[in] in_max
+ !=0, size in bytes of the input buffer at in
+
+ \param[in] out_max
+ !=0, size in bytes of the output buffer at out
+ */
+int lz4k_decode(
+ const void *const in,
+ void *const out,
+ unsigned in_max,
+ unsigned out_max);
+
+/*
+ lz4k_decode_delta() decodes (decompresses) data from one input buffer
+ and places the result of decompression into one output buffer. The
+ compressed data in input buffer should be in format, produced by
+ lz4k_encode_delta().
+
+ Example sequence of calls for lz4k_decode and lz4k_decode_delta:
+ //dictionary (1st) block
+ int result0 = lz4k_decode(in0, out0, in_max0, out_max0);
+ //delta (2nd) block
+ int result1 = lz4k_decode_delta(in, out0, out, in_max, out_max);
+
+ \return
+ \li positive value\n
+ if decoding was successful. The value returned is the size of decoded
+ (decompressed) data.
+ \li non-positive value\n
+ if in==0||in_max==0||out==0||out_max==0 or
+ if out_max is less than needed for decoded (decompressed) data or
+ if input data format is corrupted.
+
+ \param[in] in
+ !=0, pointer to the input buffer to decode (decompress). The content of
+ the input buffer does not change during decoding.
+
+ \param[in] out0
+ !=0, pointer to the dictionary input buffer that was used as input to
+ lz4k_update_delta_state() to fill/update the state buffer. The content
+ of the dictionary input buffer does not change during decoding.
+
+ \param[in] out
+ !=0, pointer to the output buffer where to place result of decoding
+ (decompression). If decompression is unsuccessful then content of out
+ buffer may be arbitrary.
+ The two buffers - at *out0 and at *out - should be contiguous in memory.
+ That is, the last byte of buffer at *out0 is located exactly before byte
+ at *out.
+
+ \param[in] in_max
+ !=0, size in bytes of the input buffer at *in
+
+ \param[in] out_max
+ !=0, size in bytes of the output buffer at *out
+ */
+int lz4k_decode_delta(
+ const void *in,
+ const void *const out0,
+ void *const out,
+ unsigned in_max,
+ unsigned out_max);
+
+
+#endif /* _LZ4K_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 36326864249d..4bf1c2c21157 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -310,6 +310,12 @@ config LZ4HC_COMPRESS
config LZ4_DECOMPRESS
tristate
+config LZ4K_COMPRESS
+ tristate
+
+config LZ4K_DECOMPRESS
+ tristate
+
config ZSTD_COMPRESS
select XXHASH
tristate
diff --git a/lib/Makefile b/lib/Makefile
index a803e1527c4b..bd0d3635ae46 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -187,6 +187,8 @@ obj-$(CONFIG_LZO_DECOMPRESS) += lzo/
obj-$(CONFIG_LZ4_COMPRESS) += lz4/
obj-$(CONFIG_LZ4HC_COMPRESS) += lz4/
obj-$(CONFIG_LZ4_DECOMPRESS) += lz4/
+obj-$(CONFIG_LZ4K_COMPRESS) += lz4k/
+obj-$(CONFIG_LZ4K_DECOMPRESS) += lz4k/
obj-$(CONFIG_ZSTD_COMPRESS) += zstd/
obj-$(CONFIG_ZSTD_DECOMPRESS) += zstd/
obj-$(CONFIG_XZ_DEC) += xz/
diff --git a/lib/lz4k/Makefile b/lib/lz4k/Makefile
new file mode 100644
index 000000000000..6ea3578639d4
--- /dev/null
+++ b/lib/lz4k/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_LZ4K_COMPRESS) += lz4k_encode.o
+obj-$(CONFIG_LZ4K_DECOMPRESS) += lz4k_decode.o
\ No newline at end of file
diff --git a/lib/lz4k/lz4k_decode.c b/lib/lz4k/lz4k_decode.c
new file mode 100644
index 000000000000..567b76b7bc51
--- /dev/null
+++ b/lib/lz4k/lz4k_decode.c
@@ -0,0 +1,308 @@
+/*
+ * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights reserved.
+ * Description: LZ4K compression algorithm
+ * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
+ * Created: 2020-03-25
+ */
+
+#if !defined(__KERNEL__)
+#include "lz4k.h"
+#else
+#include <linux/lz4k.h>
+#include <linux/module.h>
+#endif
+
+#include "lz4k_private.h" /* types, etc */
+
+static const uint8_t *get_size(
+ uint_fast32_t *size,
+ const uint8_t *in_at,
+ const uint8_t *const in_end)
+{
+ uint_fast32_t u;
+ do {
+ if (unlikely(in_at >= in_end))
+ return NULL;
+ *size += (u = *(const uint8_t*)in_at);
+ ++in_at;
+ } while (BYTE_MAX == u);
+ return in_at;
+}
+
+static int end_of_block(
+ const uint_fast32_t nr_bytes_max,
+ const uint_fast32_t r_bytes_max,
+ const uint8_t *const in_at,
+ const uint8_t *const in_end,
+ const uint8_t *const out,
+ const uint8_t *const out_at)
+{
+ if (!nr_bytes_max)
+ return LZ4K_STATUS_FAILED; /* should be the last one in block */
+ if (r_bytes_max != REPEAT_MIN)
+ return LZ4K_STATUS_FAILED; /* should be the last one in block */
+ if (in_at != in_end)
+ return LZ4K_STATUS_FAILED; /* should be the last one in block */
+ return (int)(out_at - out);
+}
+
+enum {
+ NR_COPY_MIN = 16,
+ R_COPY_MIN = 16,
+ R_COPY_SAFE = R_COPY_MIN - 1,
+ R_COPY_SAFE_2X = (R_COPY_MIN << 1) - 1
+};
+
+static bool out_non_repeat(
+ const uint8_t **in_at,
+ uint8_t **out_at,
+ uint_fast32_t nr_bytes_max,
+ const uint8_t *const in_end,
+ const uint8_t *const out_end)
+{
+ const uint8_t *const in_copy_end = *in_at + nr_bytes_max;
+ uint8_t *const out_copy_end = *out_at + nr_bytes_max;
+ if (likely(nr_bytes_max <= NR_COPY_MIN)) {
+ if (likely(*in_at <= in_end - NR_COPY_MIN &&
+ *out_at <= out_end - NR_COPY_MIN))
+ m_copy(*out_at, *in_at, NR_COPY_MIN);
+ else if (in_copy_end <= in_end && out_copy_end <= out_end)
+ m_copy(*out_at, *in_at, nr_bytes_max);
+ else
+ return false;
+ } else {
+ if (likely(in_copy_end <= in_end - NR_COPY_MIN &&
+ out_copy_end <= out_end - NR_COPY_MIN)) {
+ m_copy(*out_at, *in_at, NR_COPY_MIN);
+ copy_x_while_lt(*out_at + NR_COPY_MIN,
+ *in_at + NR_COPY_MIN,
+ out_copy_end, NR_COPY_MIN);
+ } else if (in_copy_end <= in_end && out_copy_end <= out_end) {
+ m_copy(*out_at, *in_at, nr_bytes_max);
+ } else { /* in_copy_end > in_end || out_copy_end > out_end */
+ return false;
+ }
+ }
+ *in_at = in_copy_end;
+ *out_at = out_copy_end;
+ return true;
+}
+
+static void out_repeat_overlap(
+ uint_fast32_t offset,
+ uint8_t *out_at,
+ const uint8_t *out_from,
+ const uint8_t *const out_copy_end)
+{ /* (1 < offset < R_COPY_MIN/2) && out_copy_end + R_COPY_SAFE_2X <= out_end */
+ enum {
+ COPY_MIN = R_COPY_MIN >> 1,
+ OFFSET_LIMIT = COPY_MIN >> 1
+ };
+ m_copy(out_at, out_from, COPY_MIN);
+ out_at += offset;
+ if (offset <= OFFSET_LIMIT)
+ offset <<= 1;
+ do {
+ m_copy(out_at, out_from, COPY_MIN);
+ out_at += offset;
+ if (offset <= OFFSET_LIMIT)
+ offset <<= 1;
+ } while (out_at - out_from < R_COPY_MIN);
+ while_lt_copy_2x_as_x2(out_at, out_from, out_copy_end, R_COPY_MIN);
+}
+
+static bool out_repeat_slow(
+ uint_fast32_t r_bytes_max,
+ uint_fast32_t offset,
+ uint8_t *out_at,
+ const uint8_t *out_from,
+ const uint8_t *const out_copy_end,
+ const uint8_t *const out_end)
+{
+ if (offset > 1 && out_copy_end <= out_end - R_COPY_SAFE_2X) {
+ out_repeat_overlap(offset, out_at, out_from, out_copy_end);
+ } else {
+ if (unlikely(out_copy_end > out_end))
+ return false;
+ if (offset == 1) {
+ m_set(out_at, *out_from, r_bytes_max);
+ } else {
+ do
+ *out_at++ = *out_from++;
+ while (out_at < out_copy_end);
+ }
+ }
+ return true;
+}
+
+static int decode(
+ const uint8_t *const in,
+ const uint8_t *const out0,
+ uint8_t *const out,
+ const uint8_t *const in_end,
+ const uint8_t *const out_end,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2)
+{
+ const uint_fast32_t r_log2 = TAG_BITS_MAX - (off_log2 + nr_log2);
+ const uint8_t *in_at = in;
+ const uint8_t *const in_end_minus_x = in_end - TAG_BYTES_MAX;
+ uint8_t *out_at = out;
+ while (likely(in_at <= in_end_minus_x)) {
+ const uint_fast32_t utag = read4_at(in_at - 1) >> BYTE_BITS;
+ const uint_fast32_t offset = utag & mask(off_log2);
+ uint_fast32_t nr_bytes_max = utag >> (off_log2 + r_log2),
+ r_bytes_max = ((utag >> off_log2) & mask(r_log2)) +
+ REPEAT_MIN;
+ const uint8_t *out_from = 0;
+ uint8_t *out_copy_end = 0;
+ in_at += TAG_BYTES_MAX;
+ if (unlikely(nr_bytes_max == mask(nr_log2))) {
+ in_at = get_size(&nr_bytes_max, in_at, in_end);
+ if (in_at == NULL)
+ return LZ4K_STATUS_READ_ERROR;
+ }
+ if (!out_non_repeat(&in_at, &out_at, nr_bytes_max, in_end, out_end))
+ return LZ4K_STATUS_FAILED;
+ if (unlikely(r_bytes_max == mask(r_log2) + REPEAT_MIN)) {
+ in_at = get_size(&r_bytes_max, in_at, in_end);
+ if (in_at == NULL)
+ return LZ4K_STATUS_READ_ERROR;
+ }
+ out_from = out_at - offset;
+ if (unlikely(out_from < out0))
+ return LZ4K_STATUS_FAILED;
+ out_copy_end = out_at + r_bytes_max;
+ if (likely(offset >= R_COPY_MIN &&
+ out_copy_end <= out_end - R_COPY_SAFE_2X)) {
+ copy_2x_as_x2_while_lt(out_at, out_from, out_copy_end,
+ R_COPY_MIN);
+ } else if (likely(offset >= (R_COPY_MIN >> 1) &&
+ out_copy_end <= out_end - R_COPY_SAFE_2X)) {
+ m_copy(out_at, out_from, R_COPY_MIN);
+ out_at += offset;
+ while_lt_copy_x(out_at, out_from, out_copy_end, R_COPY_MIN);
+ /* faster than 2x */
+ } else if (likely(offset > 0)) {
+ if (!out_repeat_slow(r_bytes_max, offset, out_at, out_from,
+ out_copy_end, out_end))
+ return LZ4K_STATUS_FAILED;
+ } else { /* offset == 0: EOB, last literal */
+ return end_of_block(nr_bytes_max, r_bytes_max, in_at,
+ in_end, out, out_at);
+ }
+ out_at = out_copy_end;
+ }
+ return in_at == in_end ? (int)(out_at - out) : LZ4K_STATUS_FAILED;
+}
+
+static int decode4kb(
+ const uint8_t *const in,
+ const uint8_t *const out0,
+ uint8_t *const out,
+ const uint8_t *const in_end,
+ const uint8_t *const out_end)
+{
+ enum {
+ NR_LOG2 = 6
+ };
+ return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_4KB_LOG2);
+}
+
+static int decode8kb(
+ const uint8_t *const in,
+ const uint8_t *const out0,
+ uint8_t *const out,
+ const uint8_t *const in_end,
+ const uint8_t *const out_end)
+{
+ enum {
+ NR_LOG2 = 5
+ };
+ return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_8KB_LOG2);
+}
+
+static int decode16kb(
+ const uint8_t *const in,
+ const uint8_t *const out0,
+ uint8_t *const out,
+ const uint8_t *const in_end,
+ const uint8_t *const out_end)
+{
+ enum {
+ NR_LOG2 = 5
+ };
+ return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_16KB_LOG2);
+}
+
+static int decode32kb(
+ const uint8_t *const in,
+ const uint8_t *const out0,
+ uint8_t *const out,
+ const uint8_t *const in_end,
+ const uint8_t *const out_end)
+{
+ enum {
+ NR_LOG2 = 4
+ };
+ return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_32KB_LOG2);
+}
+
+static int decode64kb(
+ const uint8_t *const in,
+ const uint8_t *const out0,
+ uint8_t *const out,
+ const uint8_t *const in_end,
+ const uint8_t *const out_end)
+{
+ enum {
+ NR_LOG2 = 4
+ };
+ return decode(in, out0, out, in_end, out_end, NR_LOG2, BLOCK_64KB_LOG2);
+}
+
+static inline const void *u8_inc(const uint8_t *a)
+{
+ return a+1;
+}
+
+int lz4k_decode(
+ const void *in,
+ void *const out,
+ unsigned in_max,
+ unsigned out_max)
+{
+ /* ++use volatile pointers to prevent compiler optimizations */
+ const uint8_t *volatile in_end = (const uint8_t*)in + in_max;
+ const uint8_t *volatile out_end = (uint8_t*)out + out_max;
+ uint8_t in_log2 = 0;
+ if (unlikely(in == NULL || out == NULL || in_max <= 4 || out_max <= 0))
+ return LZ4K_STATUS_FAILED;
+ in_log2 = (uint8_t)(BLOCK_4KB_LOG2 + *(const uint8_t*)in);
+ /* invalid buffer size or pointer overflow */
+ if (unlikely((const uint8_t*)in >= in_end || (uint8_t*)out >= out_end))
+ return LZ4K_STATUS_FAILED;
+ /* -- */
+ in = u8_inc((const uint8_t*)in);
+ --in_max;
+ if (in_log2 < BLOCK_8KB_LOG2)
+ return decode4kb((const uint8_t*)in, (uint8_t*)out,
+ (uint8_t*)out, in_end, out_end);
+ if (in_log2 == BLOCK_8KB_LOG2)
+ return decode8kb((const uint8_t*)in, (uint8_t*)out,
+ (uint8_t*)out, in_end, out_end);
+ if (in_log2 == BLOCK_16KB_LOG2)
+ return decode16kb((const uint8_t*)in, (uint8_t*)out,
+ (uint8_t*)out, in_end, out_end);
+ if (in_log2 == BLOCK_32KB_LOG2)
+ return decode32kb((const uint8_t*)in, (uint8_t*)out,
+ (uint8_t*)out, in_end, out_end);
+ if (in_log2 == BLOCK_64KB_LOG2)
+ return decode64kb((const uint8_t*)in, (uint8_t*)out,
+ (uint8_t*)out, in_end, out_end);
+ return LZ4K_STATUS_FAILED;
+}
+EXPORT_SYMBOL(lz4k_decode);
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_DESCRIPTION("LZ4K decoder");
diff --git a/lib/lz4k/lz4k_encode.c b/lib/lz4k/lz4k_encode.c
new file mode 100644
index 000000000000..a425d3a0b827
--- /dev/null
+++ b/lib/lz4k/lz4k_encode.c
@@ -0,0 +1,539 @@
+/*
+ * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights reserved.
+ * Description: LZ4K compression algorithm
+ * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
+ * Created: 2020-03-25
+ */
+
+#if !defined(__KERNEL__)
+#include "lz4k.h"
+#else
+#include <linux/lz4k.h>
+#include <linux/module.h>
+#endif
+
+#include "lz4k_private.h"
+#include "lz4k_encode_private.h"
+
+static uint8_t *out_size_bytes(uint8_t *out_at, uint_fast32_t u)
+{
+ for (; unlikely(u >= BYTE_MAX); u -= BYTE_MAX)
+ *out_at++ = (uint8_t)BYTE_MAX;
+ *out_at++ = (uint8_t)u;
+ return out_at;
+}
+
+static inline uint8_t *out_utag_then_bytes_left(
+ uint8_t *out_at,
+ uint_fast32_t utag,
+ uint_fast32_t bytes_left)
+{
+ m_copy(out_at, &utag, TAG_BYTES_MAX);
+ return out_size_bytes(out_at + TAG_BYTES_MAX, bytes_left);
+}
+
+static int out_tail(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ const uint8_t *const out,
+ const uint8_t *const nr0,
+ const uint8_t *const in_end,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ bool check_out)
+{
+ const uint_fast32_t nr_mask = mask(nr_log2);
+ const uint_fast32_t r_log2 = TAG_BITS_MAX - (off_log2 + nr_log2);
+ const uint_fast32_t nr_bytes_max = u_32(in_end - nr0);
+ if (encoded_bytes_min(nr_log2, nr_bytes_max) > u_32(out_end - out_at))
+ return check_out ? LZ4K_STATUS_WRITE_ERROR :
+ LZ4K_STATUS_INCOMPRESSIBLE;
+ if (nr_bytes_max < nr_mask) {
+ /* caller guarantees at least one nr-byte */
+ uint_fast32_t utag = (nr_bytes_max << (off_log2 + r_log2));
+ m_copy(out_at, &utag, TAG_BYTES_MAX);
+ out_at += TAG_BYTES_MAX;
+ } else {
+ uint_fast32_t bytes_left = nr_bytes_max - nr_mask;
+ uint_fast32_t utag = (nr_mask << (off_log2 + r_log2));
+ out_at = out_utag_then_bytes_left(out_at, utag, bytes_left);
+ }
+ m_copy(out_at, nr0, nr_bytes_max);
+ return (int)(out_at + nr_bytes_max - out);
+}
+
+int lz4k_out_tail(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ const uint8_t *const out,
+ const uint8_t *const nr0,
+ const uint8_t *const in_end,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ bool check_out)
+{
+ return out_tail(out_at, out_end, out, nr0, in_end,
+ nr_log2, off_log2, check_out);
+}
+
+static uint8_t *out_non_repeat(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ uint_fast32_t utag,
+ const uint8_t *const nr0,
+ const uint8_t *const r,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ bool check_out)
+{
+ const uint_fast32_t nr_bytes_max = u_32(r - nr0);
+ const uint_fast32_t nr_mask = mask(nr_log2),
+ r_log2 = TAG_BITS_MAX - (off_log2 + nr_log2);
+ if (likely(nr_bytes_max < nr_mask)) {
+ if (unlikely(check_out &&
+ TAG_BYTES_MAX + nr_bytes_max > u_32(out_end - out_at)))
+ return NULL;
+ utag |= (nr_bytes_max << (off_log2 + r_log2));
+ m_copy(out_at, &utag, TAG_BYTES_MAX);
+ out_at += TAG_BYTES_MAX;
+ } else {
+ uint_fast32_t bytes_left = nr_bytes_max - nr_mask;
+ if (unlikely(check_out &&
+ TAG_BYTES_MAX + size_bytes_count(bytes_left) + nr_bytes_max >
+ u_32(out_end - out_at)))
+ return NULL;
+ utag |= (nr_mask << (off_log2 + r_log2));
+ out_at = out_utag_then_bytes_left(out_at, utag, bytes_left);
+ }
+ if (unlikely(check_out))
+ m_copy(out_at, nr0, nr_bytes_max);
+ else
+ copy_x_while_total(out_at, nr0, nr_bytes_max, NR_COPY_MIN);
+ out_at += nr_bytes_max;
+ return out_at;
+}
+
+uint8_t *lz4k_out_non_repeat(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ uint_fast32_t utag,
+ const uint8_t *const nr0,
+ const uint8_t *const r,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ bool check_out)
+{
+ return out_non_repeat(out_at, out_end, utag, nr0, r,
+ nr_log2, off_log2, check_out);
+}
+
+static uint8_t *out_r_bytes_left(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ uint_fast32_t r_bytes_max,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ const bool check_out) /* =false when
+out_max>=encoded_bytes_max(in_max), =true otherwise */
+{
+ const uint_fast32_t r_mask = mask(TAG_BITS_MAX - (off_log2 + nr_log2));
+ if (unlikely(r_bytes_max - REPEAT_MIN >= r_mask)) {
+ uint_fast32_t bytes_left = r_bytes_max - REPEAT_MIN - r_mask;
+ if (unlikely(check_out &&
+ size_bytes_count(bytes_left) > u_32(out_end - out_at)))
+ return NULL;
+ out_at = out_size_bytes(out_at, bytes_left);
+ }
+ return out_at; /* SUCCESS: continue compression */
+}
+
+uint8_t *lz4k_out_r_bytes_left(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ uint_fast32_t r_bytes_max,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ const bool check_out)
+{
+ return out_r_bytes_left(out_at, out_end, r_bytes_max,
+ nr_log2, off_log2, check_out);
+}
+
+static uint8_t *out_repeat(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ uint_fast32_t utag,
+ uint_fast32_t r_bytes_max,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ const bool check_out) /* =false when
+out_max>=encoded_bytes_max(in_max), =true otherwise */
+{
+ const uint_fast32_t r_mask = mask(TAG_BITS_MAX - (off_log2 + nr_log2));
+ if (likely(r_bytes_max - REPEAT_MIN < r_mask)) {
+ if (unlikely(check_out && TAG_BYTES_MAX > u_32(out_end - out_at)))
+ return NULL;
+ utag |= ((r_bytes_max - REPEAT_MIN) << off_log2);
+ m_copy(out_at, &utag, TAG_BYTES_MAX);
+ out_at += TAG_BYTES_MAX;
+ } else {
+ uint_fast32_t bytes_left = r_bytes_max - REPEAT_MIN - r_mask;
+ if (unlikely(check_out &&
+ TAG_BYTES_MAX + size_bytes_count(bytes_left) >
+ u_32(out_end - out_at)))
+ return NULL;
+ utag |= (r_mask << off_log2);
+ out_at = out_utag_then_bytes_left(out_at, utag, bytes_left);
+ }
+ return out_at; /* SUCCESS: continue compression */
+}
+
+uint8_t *lz4k_out_repeat(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ uint_fast32_t utag,
+ uint_fast32_t r_bytes_max,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ const bool check_out)
+{
+ return out_repeat(out_at, out_end, utag, r_bytes_max,
+ nr_log2, off_log2, check_out);
+}
+
+static const uint8_t *repeat_end(
+ const uint8_t *q,
+ const uint8_t *r,
+ const uint8_t *const in_end_safe,
+ const uint8_t *const in_end)
+{
+ q += REPEAT_MIN;
+ r += REPEAT_MIN;
+ /* caller guarantees r+12<=in_end */
+ do {
+ const uint64_t x = read8_at(q) ^ read8_at(r);
+ if (x) {
+ const uint16_t ctz = (uint16_t)__builtin_ctzl(x);
+ return r + (ctz >> BYTE_BITS_LOG2);
+ }
+ /* some bytes differ: the count of trailing 0-bits locates the first mismatching byte */
+ q += sizeof(uint64_t), r += sizeof(uint64_t);
+ } while (likely(r <= in_end_safe)); /* once, at input block end */
+ do {
+ if (*q != *r) return r;
+ ++q;
+ ++r;
+ } while (r < in_end);
+ return r;
+}
+
+const uint8_t *lz4k_repeat_end(
+ const uint8_t *q,
+ const uint8_t *r,
+ const uint8_t *const in_end_safe,
+ const uint8_t *const in_end)
+{
+ return repeat_end(q, r, in_end_safe, in_end);
+}
+
+enum {
+ HT_BYTES_LOG2 = HT_LOG2 + 1
+};
+
+inline unsigned encode_state_bytes_min(void)
+{
+ unsigned bytes_total = (1U << HT_BYTES_LOG2);
+ return bytes_total;
+}
+
+#if !defined(LZ4K_DELTA) && !defined(LZ4K_MAX_CR)
+
+unsigned lz4k_encode_state_bytes_min(void)
+{
+ return encode_state_bytes_min();
+}
+EXPORT_SYMBOL(lz4k_encode_state_bytes_min);
+
+#endif /* !defined(LZ4K_DELTA) && !defined(LZ4K_MAX_CR) */
+
+/* CR increase order: +STEP, have OFFSETS, use _5b */
+/* *_6b to compete with LZ4 */
+static inline uint_fast32_t hash0_v(const uint64_t r, uint32_t shift)
+{
+ return hash64v_6b(r, shift);
+}
+
+static inline uint_fast32_t hash0(const uint8_t *r, uint32_t shift)
+{
+ return hash64_6b(r, shift);
+}
+
+/*
+ * Proof that 'r' increments are safe, i.e. no pointer overflow is possible:
+ *
+ * With STEP_LOG2=5 and step_start = 1<<STEP_LOG2 == 32, we increment r
+ * 32 times by 1, 32 times by 2, 32 times by 3, and so on:
+ * 32*1+32*2+32*3+...+32*31 == 32*SUM(1..31) == 32*((1+31)*15+16) == 15872.
+ * So r can safely be incremented by at most 31 for any input block of
+ * size <= 1<<13, since 1<<13 < 15872.
+ *
+ * More precisely, STEP_LIMIT == x for any input block calculated as follows:
+ * 1<<off_log2 >= (1<<STEP_LOG2)*((x+1)(x-1)/2+x/2) ==>
+ * 1<<(off_log2-STEP_LOG2+1) >= x^2+x-1 ==>
+ * x^2+x-1-1<<(off_log2-STEP_LOG2+1) == 0, which is solved by standard
+ * method.
+ * To avoid overhead here conservative approximate value of x is calculated
+ * as average of two nearest square roots, see STEP_LIMIT above.
+ */
+
+enum {
+ STEP_LOG2 = 5 /* increase for better CR */
+};
+
+static int encode_any(
+ uint16_t *const ht0,
+ const uint8_t *const in0,
+ uint8_t *const out,
+ const uint8_t *const in_end,
+ uint8_t *const out_end, /* ==out_limit for !check_out */
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ const bool check_out)
+{ /* caller guarantees off_log2 <=16 */
+ uint8_t *out_at = out;
+ const uint8_t *const in_end_safe = in_end - NR_COPY_MIN;
+ const uint8_t *r = in0;
+ const uint8_t *nr0 = r++;
+ uint_fast32_t step = 1 << STEP_LOG2;
+ for (;;) {
+ uint_fast32_t utag = 0;
+ const uint8_t *r_end = 0;
+ uint_fast32_t r_bytes_max = 0;
+ const uint8_t *const q = hashed(in0, ht0, hash0(r, HT_LOG2), r);
+ if (!equal4(q, r)) {
+ r += (++step >> STEP_LOG2);
+ if (unlikely(r > in_end_safe))
+ return out_tail(out_at, out_end, out, nr0, in_end,
+ nr_log2, off_log2, check_out);
+ continue;
+ }
+ utag = u_32(r - q);
+ r_end = repeat_end(q, r, in_end_safe, in_end);
+ r = repeat_start(q, r, nr0, in0);
+ r_bytes_max = u_32(r_end - r);
+ if (nr0 == r) {
+ out_at = out_repeat(out_at, out_end, utag, r_bytes_max,
+ nr_log2, off_log2, check_out);
+ } else {
+ update_utag(r_bytes_max, &utag, nr_log2, off_log2);
+ out_at = out_non_repeat(out_at, out_end, utag, nr0, r,
+ nr_log2, off_log2, check_out);
+ if (unlikely(check_out && out_at == NULL))
+ return LZ4K_STATUS_WRITE_ERROR;
+ out_at = out_r_bytes_left(out_at, out_end, r_bytes_max,
+ nr_log2, off_log2, check_out);
+ }
+ if (unlikely(check_out && out_at == NULL))
+ return LZ4K_STATUS_WRITE_ERROR;
+ nr0 = (r += r_bytes_max);
+ if (unlikely(r > in_end_safe))
+ return r == in_end ? (int)(out_at - out) :
+ out_tail(out_at, out_end, out, r, in_end,
+ nr_log2, off_log2, check_out);
+ ht0[hash0(r - 1 - 1, HT_LOG2)] = (uint16_t)(r - 1 - 1 - in0);
+ step = 1 << STEP_LOG2;
+ }
+}
+
+static int encode_fast(
+ uint16_t *const ht,
+ const uint8_t *const in,
+ uint8_t *const out,
+ const uint8_t *const in_end,
+ uint8_t *const out_end, /* ==out_limit for !check_out */
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2)
+{ /* caller guarantees off_log2 <=16 */
+ return encode_any(ht, in, out, in_end, out_end, nr_log2, off_log2,
+ false); /* !check_out */
+}
+
+static int encode_slow(
+ uint16_t *const ht,
+ const uint8_t *const in,
+ uint8_t *const out,
+ const uint8_t *const in_end,
+ uint8_t *const out_end, /* ==out_limit for !check_out */
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2)
+{ /* caller guarantees off_log2 <=16 */
+ return encode_any(ht, in, out, in_end, out_end, nr_log2, off_log2,
+ true); /* check_out */
+}
+
+static int encode4kb(
+ uint16_t *const state,
+ const uint8_t *const in,
+ uint8_t *out,
+ const uint_fast32_t in_max,
+ const uint_fast32_t out_max,
+ const uint_fast32_t out_limit)
+{
+ enum {
+ NR_LOG2 = 6
+ };
+ const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
+ encode_slow(state, in, out, in + in_max, out + out_max,
+ NR_LOG2, BLOCK_4KB_LOG2) :
+ encode_fast(state, in, out, in + in_max, out + out_limit,
+ NR_LOG2, BLOCK_4KB_LOG2);
+ return result <= 0 ? result : result + 1; /* +1 for in_log2 */
+}
+
+static int encode8kb(
+ uint16_t *const state,
+ const uint8_t *const in,
+ uint8_t *out,
+ const uint_fast32_t in_max,
+ const uint_fast32_t out_max,
+ const uint_fast32_t out_limit)
+{
+ enum {
+ NR_LOG2 = 5
+ };
+ const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
+ encode_slow(state, in, out, in + in_max, out + out_max,
+ NR_LOG2, BLOCK_8KB_LOG2) :
+ encode_fast(state, in, out, in + in_max, out + out_limit,
+ NR_LOG2, BLOCK_8KB_LOG2);
+ return result <= 0 ? result : result + 1; /* +1 for in_log2 */
+}
+
+static int encode16kb(
+ uint16_t *const state,
+ const uint8_t *const in,
+ uint8_t *out,
+ const uint_fast32_t in_max,
+ const uint_fast32_t out_max,
+ const uint_fast32_t out_limit)
+{
+ enum {
+ NR_LOG2 = 5
+ };
+ const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
+ encode_slow(state, in, out, in + in_max, out + out_max,
+ NR_LOG2, BLOCK_16KB_LOG2) :
+ encode_fast(state, in, out, in + in_max, out + out_limit,
+ NR_LOG2, BLOCK_16KB_LOG2);
+ return result <= 0 ? result : result + 1; /* +1 for in_log2 */
+}
+
+static int encode32kb(
+ uint16_t *const state,
+ const uint8_t *const in,
+ uint8_t *out,
+ const uint_fast32_t in_max,
+ const uint_fast32_t out_max,
+ const uint_fast32_t out_limit)
+{
+ enum {
+ NR_LOG2 = 4
+ };
+ const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
+ encode_slow(state, in, out, in + in_max, out + out_max,
+ NR_LOG2, BLOCK_32KB_LOG2) :
+ encode_fast(state, in, out, in + in_max, out + out_limit,
+ NR_LOG2, BLOCK_32KB_LOG2);
+ return result <= 0 ? result : result + 1; /* +1 for in_log2 */
+}
+
+static int encode64kb(
+ uint16_t *const state,
+ const uint8_t *const in,
+ uint8_t *out,
+ const uint_fast32_t in_max,
+ const uint_fast32_t out_max,
+ const uint_fast32_t out_limit)
+{
+ enum {
+ NR_LOG2 = 4
+ };
+ const int result = (encoded_bytes_max(NR_LOG2, in_max) > out_max) ?
+ encode_slow(state, in, out, in + in_max, out + out_max,
+ NR_LOG2, BLOCK_64KB_LOG2) :
+ encode_fast(state, in, out, in + in_max, out + out_limit,
+ NR_LOG2, BLOCK_64KB_LOG2);
+ return result <= 0 ? result : result + 1; /* +1 for in_log2 */
+}
+
+static int encode(
+ uint16_t *const state,
+ const uint8_t *const in,
+ uint8_t *out,
+ uint_fast32_t in_max,
+ uint_fast32_t out_max,
+ uint_fast32_t out_limit)
+{
+ const uint8_t in_log2 = (uint8_t)(most_significant_bit_of(
+ round_up_to_power_of2(in_max - REPEAT_MIN)));
+ m_set(state, 0, encode_state_bytes_min());
+ *out = in_log2 > BLOCK_4KB_LOG2 ? (uint8_t)(in_log2 - BLOCK_4KB_LOG2) : 0;
+ ++out;
+ --out_max;
+ --out_limit;
+ if (in_log2 < BLOCK_8KB_LOG2)
+ return encode4kb(state, in, out, in_max, out_max, out_limit);
+ if (in_log2 == BLOCK_8KB_LOG2)
+ return encode8kb(state, in, out, in_max, out_max, out_limit);
+ if (in_log2 == BLOCK_16KB_LOG2)
+ return encode16kb(state, in, out, in_max, out_max, out_limit);
+ if (in_log2 == BLOCK_32KB_LOG2)
+ return encode32kb(state, in, out, in_max, out_max, out_limit);
+ if (in_log2 == BLOCK_64KB_LOG2)
+ return encode64kb(state, in, out, in_max, out_max, out_limit);
+ return LZ4K_STATUS_FAILED;
+}
+
+int lz4k_encode(
+ void *const state,
+ const void *const in,
+ void *out,
+ unsigned in_max,
+ unsigned out_max,
+ unsigned out_limit)
+{
+ const unsigned gain_max = 64 > (in_max >> 6) ? 64 : (in_max >> 6);
+ const unsigned out_limit_min = in_max < out_max ? in_max : out_max;
+ const uint8_t *volatile in_end = (const uint8_t*)in + in_max;
+ const uint8_t *volatile out_end = (uint8_t*)out + out_max;
+ const void *volatile state_end =
+ (uint8_t*)state + encode_state_bytes_min();
+ if (unlikely(state == NULL))
+ return LZ4K_STATUS_FAILED;
+ if (unlikely(in == NULL || out == NULL))
+ return LZ4K_STATUS_FAILED;
+ if (unlikely(in_max <= gain_max))
+ return LZ4K_STATUS_INCOMPRESSIBLE;
+ if (unlikely(out_max <= gain_max)) /* need 1 byte for in_log2 */
+ return LZ4K_STATUS_FAILED;
+ /* use volatile end pointers to keep the compiler from optimizing the overflow checks away */
+ if (unlikely((const uint8_t*)in >= in_end || (uint8_t*)out >= out_end))
+ return LZ4K_STATUS_FAILED;
+ if (unlikely(state >= state_end))
+ return LZ4K_STATUS_FAILED; /* pointer overflow */
+ if (!out_limit || out_limit >= out_limit_min)
+ out_limit = out_limit_min - gain_max;
+ return encode((uint16_t*)state, (const uint8_t*)in, (uint8_t*)out,
+ in_max, out_max, out_limit);
+}
+EXPORT_SYMBOL(lz4k_encode);
+
+const char *lz4k_version(void)
+{
+ static const char *version = "2020.07.07";
+ return version;
+}
+EXPORT_SYMBOL(lz4k_version);
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_DESCRIPTION("LZ4K encoder");
diff --git a/lib/lz4k/lz4k_encode_private.h b/lib/lz4k/lz4k_encode_private.h
new file mode 100644
index 000000000000..eb5cd162468f
--- /dev/null
+++ b/lib/lz4k/lz4k_encode_private.h
@@ -0,0 +1,137 @@
+/*
+ * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights reserved.
+ * Description: LZ4K compression algorithm
+ * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
+ * Created: 2020-03-25
+ */
+
+#ifndef _LZ4K_ENCODE_PRIVATE_H
+#define _LZ4K_ENCODE_PRIVATE_H
+
+#include "lz4k_private.h"
+
+/* <nrSize bytes for whole block>+<1 terminating 0 byte> */
+static inline uint_fast32_t size_bytes_count(uint_fast32_t u)
+{
+ return (u + BYTE_MAX - 1) / BYTE_MAX;
+}
+
+/* minimum encoded size for non-compressible data */
+static inline uint_fast32_t encoded_bytes_min(
+ uint_fast32_t nr_log2,
+ uint_fast32_t in_max)
+{
+ return in_max < mask(nr_log2) ?
+ TAG_BYTES_MAX + in_max :
+ TAG_BYTES_MAX + size_bytes_count(in_max - mask(nr_log2)) + in_max;
+}
+
+enum {
+ NR_COPY_LOG2 = 4,
+ NR_COPY_MIN = 1 << NR_COPY_LOG2
+};
+
+static inline uint_fast32_t u_32(int64_t i)
+{
+ return (uint_fast32_t)i;
+}
+
+/* maximum encoded size for non-compressible data if "fast" encoder is used */
+static inline uint_fast32_t encoded_bytes_max(
+ uint_fast32_t nr_log2,
+ uint_fast32_t in_max)
+{
+ uint_fast32_t r = TAG_BYTES_MAX + (uint32_t)round_up_to_log2(in_max, NR_COPY_LOG2);
+ return in_max < mask(nr_log2) ? r : r + size_bytes_count(in_max - mask(nr_log2));
+}
+
+enum {
+ HT_LOG2 = 12
+};
+
+/*
+ * Compressed data format (where {} means 0 or more occurrences, [] means
+ * optional):
+ * <24bits tag: (off_log2 rOffset| r_log2 rSize|nr_log2 nrSize)>
+ * {<nrSize byte>}[<nr bytes>]{<rSize byte>}
+ * <rSize byte> and <nrSize byte> bytes are terminated by byte != 255
+ *
+ */
+
+static inline void update_utag(
+ uint_fast32_t r_bytes_max,
+ uint_fast32_t *utag,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2)
+{
+ const uint_fast32_t r_mask = mask(TAG_BITS_MAX - (off_log2 + nr_log2));
+ *utag |= likely(r_bytes_max - REPEAT_MIN < r_mask) ?
+ ((r_bytes_max - REPEAT_MIN) << off_log2) : (r_mask << off_log2);
+}
+
+static inline const uint8_t *hashed(
+ const uint8_t *const in0,
+ uint16_t *const ht,
+ uint_fast32_t h,
+ const uint8_t *r)
+{
+ const uint8_t *q = in0 + ht[h];
+ ht[h] = (uint16_t)(r - in0);
+ return q;
+}
+
+static inline const uint8_t *repeat_start(
+ const uint8_t *q,
+ const uint8_t *r,
+ const uint8_t *const nr0,
+ const uint8_t *const in0)
+{
+ for (; r > nr0 && likely(q > in0) && unlikely(q[-1] == r[-1]); --q, --r);
+ return r;
+}
+
+int lz4k_out_tail(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ const uint8_t *const out,
+ const uint8_t *const nr0,
+ const uint8_t *const in_end,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ bool check_out);
+
+uint8_t *lz4k_out_non_repeat(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ uint_fast32_t utag,
+ const uint8_t *const nr0,
+ const uint8_t *const r,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ bool check_out);
+
+uint8_t *lz4k_out_r_bytes_left(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ uint_fast32_t r_bytes_max,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ const bool check_out);
+
+uint8_t *lz4k_out_repeat(
+ uint8_t *out_at,
+ uint8_t *const out_end,
+ uint_fast32_t utag,
+ uint_fast32_t r_bytes_max,
+ const uint_fast32_t nr_log2,
+ const uint_fast32_t off_log2,
+ const bool check_out);
+
+const uint8_t *lz4k_repeat_end(
+ const uint8_t *q,
+ const uint8_t *r,
+ const uint8_t *const in_end_safe,
+ const uint8_t *const in_end);
+
+#endif /* _LZ4K_ENCODE_PRIVATE_H */
+
diff --git a/lib/lz4k/lz4k_private.h b/lib/lz4k/lz4k_private.h
new file mode 100644
index 000000000000..2a8f4b37dc74
--- /dev/null
+++ b/lib/lz4k/lz4k_private.h
@@ -0,0 +1,269 @@
+/*
+ * Copyright (c) Huawei Technologies Co., Ltd. 2012-2020. All rights reserved.
+ * Description: LZ4K compression algorithm
+ * Author: Aleksei Romanovskii aleksei.romanovskii(a)huawei.com
+ * Created: 2020-03-25
+ */
+
+#ifndef _LZ4K_PRIVATE_H
+#define _LZ4K_PRIVATE_H
+
+#if !defined(__KERNEL__)
+
+#include "lz4k.h"
+#include <stdint.h> /* uint*_t */
+#define __STDC_WANT_LIB_EXT1__ 1
+#include <string.h> /* memcpy() */
+
+#define likely(e) __builtin_expect(e, 1)
+#define unlikely(e) __builtin_expect(e, 0)
+
+#else /* __KERNEL__ */
+
+#include <linux/lz4k.h>
+#define __STDC_WANT_LIB_EXT1__ 1
+#include <linux/string.h> /* memcpy() */
+#include <linux/types.h> /* uint8_t, int8_t, uint16_t, int16_t,
+uint32_t, int32_t, uint64_t, int64_t */
+#include <stddef.h>
+
+typedef uint64_t uint_fast32_t;
+typedef int64_t int_fast32_t;
+
+#endif /* __KERNEL__ */
+
+#if defined(__GNUC__) && (__GNUC__>=4)
+#define LZ4K_WITH_GCC_INTRINSICS
+#endif
+
+#if !defined(__GNUC__)
+#define __builtin_expect(e, v) (e)
+#endif /* defined(__GNUC__) */
+
+enum {
+ BYTE_BITS = 8,
+ BYTE_BITS_LOG2 = 3,
+ BYTE_MAX = 255U,
+ REPEAT_MIN = 4,
+ TAG_BYTES_MAX = 3,
+ TAG_BITS_MAX = TAG_BYTES_MAX * 8,
+ BLOCK_4KB_LOG2 = 12,
+ BLOCK_8KB_LOG2 = 13,
+ BLOCK_16KB_LOG2 = 14,
+ BLOCK_32KB_LOG2 = 15,
+ BLOCK_64KB_LOG2 = 16
+};
+
+static inline uint32_t mask(uint_fast32_t log2)
+{
+ return (1U << log2) - 1U;
+}
+
+static inline uint64_t mask64(uint_fast32_t log2)
+{
+ return (1ULL << log2) - 1ULL;
+}
+
+#if defined LZ4K_WITH_GCC_INTRINSICS
+static inline int most_significant_bit_of(uint64_t u)
+{
+ return (int)(__builtin_expect((u) == 0, false) ?
+ -1 : (int)(31 ^ (uint32_t)__builtin_clz((unsigned)(u))));
+}
+#else /* !defined LZ4K_WITH_GCC_INTRINSICS */
+#error undefined most_significant_bit_of(uint64_t u)
+#endif /* #if defined LZ4K_WITH_GCC_INTRINSICS */
+
+static inline uint64_t round_up_to_log2(uint64_t u, uint8_t log2)
+{
+ return (uint64_t)((u + mask64(log2)) & ~mask64(log2));
+}
+
+static inline uint64_t round_up_to_power_of2(uint64_t u)
+{
+ const int_fast32_t msb = most_significant_bit_of(u);
+ return round_up_to_log2(u, (uint8_t)msb);
+}
+
+static inline void m_copy(void *dst, const void *src, size_t total)
+{
+#if defined(__STDC_LIB_EXT1__)
+ (void)memcpy_s(dst, total, src, (total * 2) >> 1); /* *2 >> 1 to avoid bot errors */
+#else
+ (void)__builtin_memcpy(dst, src, total);
+#endif
+}
+
+static inline void m_set(void *dst, uint8_t value, size_t total)
+{
+#if defined(__STDC_LIB_EXT1__)
+ (void)memset_s(dst, total, value, (total * 2) >> 1); /* *2 >> 1 to avoid bot errors */
+#else
+ (void)__builtin_memset(dst, value, total);
+#endif
+}
+
+static inline uint32_t read4_at(const void *p)
+{
+ uint32_t result;
+ m_copy(&result, p, sizeof(result));
+ return result;
+}
+
+static inline uint64_t read8_at(const void *p)
+{
+ uint64_t result;
+ m_copy(&result, p, sizeof(result));
+ return result;
+}
+
+static inline bool equal4(const uint8_t *const q, const uint8_t *const r)
+{
+ return read4_at(q) == read4_at(r);
+}
+
+static inline bool equal3(const uint8_t *const q, const uint8_t *const r)
+{
+ return (read4_at(q) << BYTE_BITS) == (read4_at(r) << BYTE_BITS);
+}
+
+static inline uint_fast32_t hash24v(const uint64_t r, uint32_t shift)
+{
+ const uint32_t m = 3266489917U;
+ return (((uint32_t)r << BYTE_BITS) * m) >> (32 - shift);
+}
+
+static inline uint_fast32_t hash24(const uint8_t *r, uint32_t shift)
+{
+ return hash24v(read4_at(r), shift);
+}
+
+static inline uint_fast32_t hash32v_2(const uint64_t r, uint32_t shift)
+{
+ const uint32_t m = 3266489917U;
+ return ((uint32_t)r * m) >> (32 - shift);
+}
+
+static inline uint_fast32_t hash32_2(const uint8_t *r, uint32_t shift)
+{
+ return hash32v_2(read4_at(r), shift);
+}
+
+static inline uint_fast32_t hash32v(const uint64_t r, uint32_t shift)
+{
+ const uint32_t m = 2654435761U;
+ return ((uint32_t)r * m) >> (32 - shift);
+}
+
+static inline uint_fast32_t hash32(const uint8_t *r, uint32_t shift)
+{
+ return hash32v(read4_at(r), shift);
+}
+
+static inline uint_fast32_t hash64v_5b(const uint64_t r, uint32_t shift)
+{
+ const uint64_t m = 889523592379ULL;
+ return (uint32_t)(((r << 24) * m) >> (64 - shift));
+}
+
+static inline uint_fast32_t hash64_5b(const uint8_t *r, uint32_t shift)
+{
+ return hash64v_5b(read8_at(r), shift);
+}
+
+static inline uint_fast32_t hash64v_6b(const uint64_t r, uint32_t shift)
+{
+ const uint64_t m = 227718039650203ULL;
+ return (uint32_t)(((r << 16) * m) >> (64 - shift));
+}
+
+static inline uint_fast32_t hash64_6b(const uint8_t *r, uint32_t shift)
+{
+ return hash64v_6b(read8_at(r), shift);
+}
+
+static inline uint_fast32_t hash64v_7b(const uint64_t r, uint32_t shift)
+{
+ const uint64_t m = 58295818150454627ULL;
+ return (uint32_t)(((r << 8) * m) >> (64 - shift));
+}
+
+static inline uint_fast32_t hash64_7b(const uint8_t *r, uint32_t shift)
+{
+ return hash64v_7b(read8_at(r), shift);
+}
+
+static inline uint_fast32_t hash64v_8b(const uint64_t r, uint32_t shift)
+{
+ const uint64_t m = 2870177450012600261ULL;
+ return (uint32_t)((r * m) >> (64 - shift));
+}
+
+static inline uint_fast32_t hash64_8b(const uint8_t *r, uint32_t shift)
+{
+ return hash64v_8b(read8_at(r), shift);
+}
+
+static inline void while_lt_copy_x(
+ uint8_t *dst,
+ const uint8_t *src,
+ const uint8_t *dst_end,
+ const size_t copy_min)
+{
+ for (; dst < dst_end; dst += copy_min, src += copy_min)
+ m_copy(dst, src, copy_min);
+}
+
+static inline void copy_x_while_lt(
+ uint8_t *dst,
+ const uint8_t *src,
+ const uint8_t *dst_end,
+ const size_t copy_min)
+{
+ m_copy(dst, src, copy_min);
+ while (dst + copy_min < dst_end)
+ m_copy(dst += copy_min, src += copy_min, copy_min);
+}
+
+static inline void copy_x_while_total(
+ uint8_t *dst,
+ const uint8_t *src,
+ size_t total,
+ const size_t copy_min)
+{
+ m_copy(dst, src, copy_min);
+ for (; total > copy_min; total -= copy_min)
+ m_copy(dst += copy_min, src += copy_min, copy_min);
+}
+
+static inline void copy_2x(
+ uint8_t *dst,
+ const uint8_t *src,
+ const size_t copy_min)
+{
+ m_copy(dst, src, copy_min);
+ m_copy(dst + copy_min, src + copy_min, copy_min);
+}
+
+static inline void copy_2x_as_x2_while_lt(
+ uint8_t *dst,
+ const uint8_t *src,
+ const uint8_t *dst_end,
+ const size_t copy_min)
+{
+ copy_2x(dst, src, copy_min);
+ while (dst + (copy_min << 1) < dst_end)
+ copy_2x(dst += (copy_min << 1), src += (copy_min << 1), copy_min);
+}
+
+static inline void while_lt_copy_2x_as_x2(
+ uint8_t *dst,
+ const uint8_t *src,
+ const uint8_t *dst_end,
+ const size_t copy_min)
+{
+ for (; dst < dst_end; dst += (copy_min << 1), src += (copy_min << 1))
+ copy_2x(dst, src, copy_min);
+}
+
+#endif /* _LZ4K_PRIVATE_H */
--
2.25.1
Patches 1-6 fix some recently found problems.
Patches 7-8 are backported from mainline.
Darrick J. Wong (1):
xfs: fix uninitialized variable access
Dave Chinner (1):
xfs: set XFS_FEAT_NLINK correctly
Long Li (4):
xfs: factor out xfs_defer_pending_abort
xfs: don't leak intent item when recovery intents fail
xfs: factor out xfs_destroy_perag()
xfs: don't leak perag when growfs fails
Ye Bin (1):
xfs: fix warning in xfs_vm_writepages()
yangerkun (1):
xfs: fix mounting failed caused by sequencing problem in the log
records
fs/xfs/libxfs/xfs_defer.c | 26 +++++++++++++++++---------
fs/xfs/libxfs/xfs_defer.h | 1 +
fs/xfs/libxfs/xfs_log_recover.h | 1 +
fs/xfs/libxfs/xfs_sb.c | 2 ++
fs/xfs/xfs_buf_item_recover.c | 2 ++
fs/xfs/xfs_fsmap.c | 1 +
fs/xfs/xfs_fsops.c | 5 ++++-
fs/xfs/xfs_icache.c | 6 ++++++
fs/xfs/xfs_log_recover.c | 27 ++++++++++++++++++++++++---
fs/xfs/xfs_mount.c | 30 ++++++++++++++++++++++--------
fs/xfs/xfs_mount.h | 2 ++
11 files changed, 82 insertions(+), 21 deletions(-)
--
2.31.1
[PATCH OLK-5.10] media: dvb-core: Fix kernel WARNING for blocking operation in wait_event*()
by Chen Jiahao 29 Jun '23
29 Jun '23
From: Takashi Iwai <tiwai(a)suse.de>
mainline inclusion
from mainline-v6.4-rc3
commit b8c75e4a1b325ea0a9433fa8834be97b5836b946
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6YKXB
CVE: CVE-2023-31084
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
Using a semaphore in the wait_event*() condition is no good idea.
It hits a kernel WARN_ON() at prepare_to_wait_event() like:
do not call blocking ops when !TASK_RUNNING; state=1 set at
prepare_to_wait_event+0x6d/0x690
For avoiding the potential deadlock, rewrite to an open-coded loop
instead. Unlike the loop in wait_event*(), this uses wait_woken()
after the condition check, hence the task state stays consistent.
CVE-2023-31084 was assigned to this bug.
Link: https://lore.kernel.org/r/CA+UBctCu7fXn4q41O_3=id1+OdyQ85tZY1x+TkT-6OVBL6KA…
Link: https://lore.kernel.org/linux-media/20230512151800.1874-1-tiwai@suse.de
Reported-by: Yu Hao <yhao016(a)ucr.edu>
Closes: https://nvd.nist.gov/vuln/detail/CVE-2023-31084
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Signed-off-by: Chen Jiahao <chenjiahao16(a)huawei.com>
---
drivers/media/dvb-core/dvb_frontend.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c
index 06ea30a689d7..579cddec55e5 100644
--- a/drivers/media/dvb-core/dvb_frontend.c
+++ b/drivers/media/dvb-core/dvb_frontend.c
@@ -292,14 +292,22 @@ static int dvb_frontend_get_event(struct dvb_frontend *fe,
}
if (events->eventw == events->eventr) {
- int ret;
+ struct wait_queue_entry wait;
+ int ret = 0;
if (flags & O_NONBLOCK)
return -EWOULDBLOCK;
- ret = wait_event_interruptible(events->wait_queue,
- dvb_frontend_test_event(fepriv, events));
-
+ init_waitqueue_entry(&wait, current);
+ add_wait_queue(&events->wait_queue, &wait);
+ while (!dvb_frontend_test_event(fepriv, events)) {
+ wait_woken(&wait, TASK_INTERRUPTIBLE, 0);
+ if (signal_pending(current)) {
+ ret = -ERESTARTSYS;
+ break;
+ }
+ }
+ remove_wait_queue(&events->wait_queue, &wait);
if (ret < 0)
return ret;
}
--
2.34.1
[PATCH openEuler-1.0-LTS 1/2] scsi: hisi_sas: Fix normally completed I/O analysed as failed
by Yongqiang Liu 29 Jun '23
29 Jun '23
From: Xingui Yang <yangxingui(a)huawei.com>
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7GS2V
CVE: NA
-------------------------------------------------------------------
pio read command has no response frame and the struct iu[1024] won't be
filled, and it's found that I/Os that are normally completed will be
analysed as failed in sas_ata_task_done() when iu contain abnormal dirty
data. So ending_fis should not be filled by iu when the response frame
hasn't been written to the memory.
Fixes: d380f55503ed ("scsi: hisi_sas: Don't bother clearing status buffer IU in task prep")
Signed-off-by: Xingui Yang <yangxingui(a)huawei.com>
Reviewed-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Reviewed-by: kang fenglong <kangfenglong(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 23 ++++++++++++++++-------
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 23 +++++++++++++++--------
2 files changed, 31 insertions(+), 15 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 2df70c4873f5..f9867176fa14 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -2033,6 +2033,11 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
u16 dma_tx_err_type = cpu_to_le16(err_record->dma_tx_err_type);
u16 sipc_rx_err_type = cpu_to_le16(err_record->sipc_rx_err_type);
u32 dma_rx_err_type = cpu_to_le32(err_record->dma_rx_err_type);
+ struct hisi_sas_complete_v2_hdr *complete_queue =
+ hisi_hba->complete_hdr[slot->cmplt_queue];
+ struct hisi_sas_complete_v2_hdr *complete_hdr =
+ &complete_queue[slot->cmplt_queue_slot];
+ u32 dw0 = le32_to_cpu(complete_hdr->dw0);
int error = -1;
if (err_phase == 1) {
@@ -2318,7 +2323,8 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
break;
}
}
- hisi_sas_sata_done(task, slot);
+ if (dw0 & CMPLT_HDR_RSPNS_XFRD_MSK)
+ hisi_sas_sata_done(task, slot);
}
break;
default:
@@ -2342,6 +2348,7 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
&complete_queue[slot->cmplt_queue_slot];
unsigned long flags;
bool is_internal = slot->is_internal;
+ u32 dw0;
if (unlikely(!task || !task->lldd_task || !task->dev))
return -EINVAL;
@@ -2366,7 +2373,8 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
}
/* Use SAS+TMF status codes */
- switch ((complete_hdr->dw0 & CMPLT_HDR_ABORT_STAT_MSK)
+ dw0 = le32_to_cpu(complete_hdr->dw0);
+ switch ((dw0 & CMPLT_HDR_ABORT_STAT_MSK)
>> CMPLT_HDR_ABORT_STAT_OFF) {
case STAT_IO_ABORTED:
/* this io has been aborted by abort command */
@@ -2392,9 +2400,9 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
break;
}
- if ((complete_hdr->dw0 & CMPLT_HDR_ERX_MSK) &&
- (!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
- u32 err_phase = (complete_hdr->dw0 & CMPLT_HDR_ERR_PHASE_MSK)
+ if ((dw0 & CMPLT_HDR_ERX_MSK) &&
+ (!(dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
+ u32 err_phase = (dw0 & CMPLT_HDR_ERR_PHASE_MSK)
>> CMPLT_HDR_ERR_PHASE_OFF;
u32 *error_info = hisi_sas_status_buf_addr_mem(slot);
@@ -2409,7 +2417,7 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
"CQ hdr: 0x%x 0x%x 0x%x 0x%x "
"Error info: 0x%x 0x%x 0x%x 0x%x\n",
slot->idx, task, sas_dev->device_id,
- complete_hdr->dw0, complete_hdr->dw1,
+ dw0, complete_hdr->dw1,
complete_hdr->act, complete_hdr->dw3,
error_info[0], error_info[1],
error_info[2], error_info[3]);
@@ -2456,7 +2464,8 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
case SAS_PROTOCOL_SATA | SAS_PROTOCOL_STP:
{
ts->stat = SAM_STAT_GOOD;
- hisi_sas_sata_done(task, slot);
+ if (dw0 & CMPLT_HDR_RSPNS_XFRD_MSK)
+ hisi_sas_sata_done(task, slot);
break;
}
default:
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 5e2488fd2466..f79060fca001 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2353,7 +2353,8 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
ts->stat = SAS_OPEN_REJECT;
ts->open_rej_reason = SAS_OREJ_RSVD_RETRY;
}
- hisi_sas_sata_done(task, slot);
+ if (dw0 & CMPLT_HDR_RSPNS_XFRD_MSK)
+ hisi_sas_sata_done(task, slot);
break;
case SAS_PROTOCOL_SMP:
ts->stat = SAM_STAT_CHECK_CONDITION;
@@ -2402,6 +2403,7 @@ slot_complete_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
&complete_queue[slot->cmplt_queue_slot];
unsigned long flags;
bool is_internal = slot->is_internal;
+ u32 dw0, dw1, dw3;
if (unlikely(!task || !task->lldd_task || !task->dev))
return -EINVAL;
@@ -2425,10 +2427,14 @@ slot_complete_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
goto out;
}
+ dw0 = le32_to_cpu(complete_hdr->dw0);
+ dw1 = le32_to_cpu(complete_hdr->dw1);
+ dw3 = le32_to_cpu(complete_hdr->dw3);
+
/*
* Use SAS+TMF status codes
*/
- switch ((complete_hdr->dw0 & CMPLT_HDR_ABORT_STAT_MSK)
+ switch ((dw0 & CMPLT_HDR_ABORT_STAT_MSK)
>> CMPLT_HDR_ABORT_STAT_OFF) {
case STAT_IO_ABORTED:
/* this IO has been aborted by abort command */
@@ -2452,9 +2458,9 @@ slot_complete_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
}
/* check for erroneous completion, 0x3 means abnormal */
- if ((complete_hdr->dw0 & CMPLT_HDR_CMPLT_MSK) == 0x3) {
+ if ((dw0 & CMPLT_HDR_CMPLT_MSK) == 0x3) {
u32 *error_info = hisi_sas_status_buf_addr_mem(slot);
- u32 device_id = (complete_hdr->dw1 & 0xffff0000) >> 16;
+ u32 device_id = (dw1 & 0xffff0000) >> 16;
struct hisi_sas_itct *itct = &hisi_hba->itct[device_id];
set_aborted_iptt(hisi_hba, slot);
@@ -2464,12 +2470,12 @@ slot_complete_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
"Error info: 0x%x 0x%x 0x%x 0x%x\n",
slot->idx, task, sas_dev->device_id,
itct->sas_addr,
- complete_hdr->dw0, complete_hdr->dw1,
- complete_hdr->act, complete_hdr->dw3,
+ dw0, dw1,
+ complete_hdr->act, dw3,
error_info[0], error_info[1],
error_info[2], error_info[3]);
- if ((complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK) &&
+ if ((dw0 & CMPLT_HDR_RSPNS_XFRD_MSK) &&
(task->task_proto & SAS_PROTOCOL_SATA ||
task->task_proto & SAS_PROTOCOL_STP)) {
struct hisi_sas_status_buffer *status_buf =
@@ -2559,7 +2565,8 @@ slot_complete_v3_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot)
case SAS_PROTOCOL_STP:
case SAS_PROTOCOL_SATA | SAS_PROTOCOL_STP:
ts->stat = SAM_STAT_GOOD;
- hisi_sas_sata_done(task, slot);
+ if (dw0 & CMPLT_HDR_RSPNS_XFRD_MSK)
+ hisi_sas_sata_done(task, slot);
break;
default:
ts->stat = SAM_STAT_CHECK_CONDITION;
--
2.25.1
[PATCH openEuler-22.03-LTS] arm64: Add AMPERE1 to the Spectre-BHB affected list
by Lin Yujun 29 Jun '23
by Lin Yujun 29 Jun '23
29 Jun '23
From: D Scott Phillips <scott(a)os.amperecomputing.com>
stable inclusion
from stable-v5.10.153
commit 52a43b82006dc88f996bd06da5a3fcfef85220c8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I64YCA
CVE: CVE-2023-3006
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit 0e5d5ae837c8ce04d2ddb874ec5f920118bd9d31 ]
Per AmpereOne erratum AC03_CPU_12, "Branch history may allow control of
speculative execution across software contexts," the AMPERE1 core needs the
bhb clearing loop to mitigate Spectre-BHB, with a loop iteration count of
11.
Signed-off-by: D Scott Phillips <scott(a)os.amperecomputing.com>
Link: https://lore.kernel.org/r/20221011022140.432370-1-scott@os.amperecomputing.…
Reviewed-by: James Morse <james.morse(a)arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
conflicts:
arch/arm64/include/asm/cputype.h
Signed-off-by: Lin Yujun <linyujun809(a)huawei.com>
---
arch/arm64/include/asm/cputype.h | 4 ++++
arch/arm64/kernel/proton-pack.c | 6 ++++++
2 files changed, 10 insertions(+)
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 662708c56397..812781fba3f9 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -61,6 +61,7 @@
#define ARM_CPU_IMP_HISI 0x48
#define ARM_CPU_IMP_PHYTIUM 0x70
#define ARM_CPU_IMP_APPLE 0x61
+#define ARM_CPU_IMP_AMPERE 0xC0
#define ARM_CPU_PART_AEM_V8 0xD0F
#define ARM_CPU_PART_FOUNDATION 0xD00
@@ -120,6 +121,8 @@
#define APPLE_CPU_PART_M1_ICESTORM 0x022
#define APPLE_CPU_PART_M1_FIRESTORM 0x023
+#define AMPERE_CPU_PART_AMPERE1 0xAC3
+
#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
#define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
#define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72)
@@ -165,6 +168,7 @@
#define MIDR_FT_2500 MIDR_CPU_MODEL(ARM_CPU_IMP_PHYTIUM, PHYTIUM_CPU_PART_2500)
#define MIDR_APPLE_M1_ICESTORM MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_ICESTORM)
#define MIDR_APPLE_M1_FIRESTORM MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM)
+#define MIDR_AMPERE1 MIDR_CPU_MODEL(ARM_CPU_IMP_AMPERE, AMPERE_CPU_PART_AMPERE1)
/* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
#define MIDR_FUJITSU_ERRATUM_010001 MIDR_FUJITSU_A64FX
diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
index e807f77737e0..9c95d4955b6e 100644
--- a/arch/arm64/kernel/proton-pack.c
+++ b/arch/arm64/kernel/proton-pack.c
@@ -873,6 +873,10 @@ u8 spectre_bhb_loop_affected(int scope)
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
{},
};
+ static const struct midr_range spectre_bhb_k11_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_AMPERE1),
+ {},
+ };
static const struct midr_range spectre_bhb_k8_list[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
@@ -883,6 +887,8 @@ u8 spectre_bhb_loop_affected(int scope)
k = 32;
else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k24_list))
k = 24;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k11_list))
+ k = 11;
else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k8_list))
k = 8;
--
2.34.1
From: Xia Fukun <xiafukun(a)huawei.com>
stable inclusion
from stable-v4.19.287
commit c746a0b9210cebb29511f01d2becf240408327bf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I7F2UT
CVE: CVE-2023-3220
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
--------------------------------
[ Upstream commit 93340e10b9c5fc86730d149636e0aa8b47bb5a34 ]
As kzalloc() may fail and return a NULL pointer,
pstates should be checked before use
to avoid a NULL pointer dereference.
Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support")
Signed-off-by: Jiasheng Jiang <jiasheng(a)iscas.ac.cn>
Reviewed-by: Abhinav Kumar <quic_abhinavk(a)quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/514160/
Link: https://lore.kernel.org/r/20221206080236.43687-1-jiasheng@iscas.ac.cn
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Xia Fukun <xiafukun(a)huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22(a)huawei.com>
Reviewed-by: zheng zucheng <zhengzucheng(a)huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13(a)huawei.com>
---
drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
index 4752f08f0884..5852e1d356e1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
@@ -1477,6 +1477,8 @@ static int dpu_crtc_atomic_check(struct drm_crtc *crtc,
}
pstates = kzalloc(sizeof(*pstates) * DPU_STAGE_MAX * 4, GFP_KERNEL);
+ if (!pstates)
+ return -ENOMEM;
dpu_crtc = to_dpu_crtc(crtc);
cstate = to_dpu_crtc_state(state);
--
2.25.1