From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-next-20211018 commit b5644a3a79bf3be5f1238db1b2f241374b27b0f0 category: bugfix bugzilla: 49890 CVE: NA ---------------------------
While handling a response message from the server, nbd_read_stat() tries to look up the request by tag and then completes it. However, this is problematic if nbd hasn't yet sent the corresponding request message:
t1                      t2
submit_bio
 nbd_queue_rq
  blk_mq_start_request
                        recv_work
                         nbd_read_stat
                          blk_mq_tag_to_rq
                         blk_mq_complete_request
  nbd_send_cmd
Thus add a new cmd flag, 'NBD_CMD_INFLIGHT'. It is set once nbd_send_cmd() succeeds and is checked in nbd_read_stat().
Note that this patch does not fix the case where blk_mq_tag_to_rq() might return a freed request; that will be fixed in a following patch.
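As an illustration of the flag protocol described above, here is a minimal userspace sketch. It is not the driver code: struct model_cmd and the model_* helpers are hypothetical names, and cmd->lock is left out because the model is single-threaded. Only the bit number and the -ENOENT result are taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Bit number taken from the patch; everything else here is a simplified model. */
#define NBD_CMD_INFLIGHT 2

struct model_cmd {
	unsigned long flags;	/* in the driver this is protected by cmd->lock */
};

/* Models __test_and_clear_bit(): clear the bit and report its old value. */
static bool model_test_and_clear(unsigned long *flags, int bit)
{
	bool was_set = (*flags >> bit) & 1UL;

	*flags &= ~(1UL << bit);
	return was_set;
}

/* Send path: the flag is set only when sending the request succeeded. */
static void model_handle_cmd(struct model_cmd *cmd, int send_ret)
{
	if (send_ret == 0)
		cmd->flags |= 1UL << NBD_CMD_INFLIGHT;
}

/* Receive path: a reply for a command that is not in flight is rejected. */
static int model_handle_reply(struct model_cmd *cmd)
{
	if (!model_test_and_clear(&cmd->flags, NBD_CMD_INFLIGHT))
		return -ENOENT;	/* the "Suspicious reply" case */
	return 0;
}
```

In this model a reply that arrives before the send, after a failed send, or a second time for the same command is rejected, which appears to be the behaviour the patch is after.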
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20210916093350.1410403-2-yukuai3@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/block/nbd.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 6a72c07ce3cba..05153b84d5400 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -120,6 +120,12 @@ struct nbd_device { };
#define NBD_CMD_REQUEUED 1 +/* + * This flag will be set if nbd_queue_rq() succeed, and will be checked and + * cleared in completion. Both setting and clearing of the flag are protected + * by cmd->lock. + */ +#define NBD_CMD_INFLIGHT 2
struct nbd_cmd { struct nbd_device *nbd; @@ -369,6 +375,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req, if (!mutex_trylock(&cmd->lock)) return BLK_EH_RESET_TIMER;
+ __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags); if (!refcount_inc_not_zero(&nbd->config_refs)) { cmd->status = BLK_STS_TIMEOUT; mutex_unlock(&cmd->lock); @@ -674,6 +681,12 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) cmd = blk_mq_rq_to_pdu(req);
mutex_lock(&cmd->lock); + if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) { + dev_err(disk_to_dev(nbd->disk), "Suspicious reply %d (status %u flags %lu)", + tag, cmd->status, cmd->flags); + ret = -ENOENT; + goto out; + } if (cmd->cmd_cookie != nbd_handle_to_cookie(handle)) { dev_err(disk_to_dev(nbd->disk), "Double reply on req %p, cmd_cookie %u, handle cookie %u\n", req, cmd->cmd_cookie, nbd_handle_to_cookie(handle)); @@ -768,6 +781,7 @@ static void nbd_clear_req(struct request *req, void *data, bool reserved) struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
mutex_lock(&cmd->lock); + __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags); cmd->status = BLK_STS_IOERR; mutex_unlock(&cmd->lock);
@@ -903,7 +917,13 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index) * returns EAGAIN can be retried on a different socket. */ ret = nbd_send_cmd(nbd, cmd, index); - if (ret == -EAGAIN) { + /* + * Access to this flag is protected by cmd->lock, thus it's safe to set + * the flag after nbd_send_cmd() succeed to send request to server. + */ + if (!ret) + __set_bit(NBD_CMD_INFLIGHT, &cmd->flags); + else if (ret == -EAGAIN) { dev_err_ratelimited(disk_to_dev(nbd->disk), "Request send failed, requeueing\n"); nbd_mark_nsock_dead(nbd, nsock, 1);
From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-next-20211018 commit d14b304f558f8c8f53da3a8d0c0b671f14a9c2f4 category: bugfix bugzilla: 49890 CVE: NA ---------------------------
Commit cddce0116058 ("nbd: Aovid double completion of a request") tried to fix the case where nbd_clear_que() and recv_work() can complete a request concurrently. However, the problem still exists:
t1                    t2                     t3

nbd_disconnect_and_put
 flush_workqueue
                      recv_work
                       blk_mq_complete_request
                        blk_mq_complete_request_remote -> this is true
                         WRITE_ONCE(rq->state, MQ_RQ_COMPLETE)
                          blk_mq_raise_softirq
                                             blk_done_softirq
                                              blk_complete_reqs
                                               nbd_complete_rq
                                                blk_mq_end_request
                                                 blk_mq_free_request
                                                  WRITE_ONCE(rq->state, MQ_RQ_IDLE)
  nbd_clear_que
   blk_mq_tagset_busy_iter
    nbd_clear_req
                                                   __blk_mq_free_request
                                                    blk_mq_put_tag
     blk_mq_complete_request -> complete again
There are three places where a request can be completed in nbd: recv_work(), nbd_clear_que() and nbd_xmit_timeout(). Since they all hold cmd->lock before completing the request, the problem is easy to avoid by setting and checking a cmd flag.
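The "set and check a cmd flag" idea can be sketched in userspace as follows. This is a hedged model, not the driver: the struct, the completions counter and the model_* functions are invented for illustration, and the lock that each real path holds is omitted. The point is only that whichever path clears the flag first owns the completion.

```c
#include <stdbool.h>

#define NBD_CMD_INFLIGHT 2	/* bit number as in the series */

struct model_cmd {
	unsigned long flags;
	int completions;	/* model-only counter of completions */
};

/* Returns true for exactly one caller per in-flight command. */
static bool claim_completion(struct model_cmd *cmd)
{
	unsigned long mask = 1UL << NBD_CMD_INFLIGHT;
	bool owned = cmd->flags & mask;

	cmd->flags &= ~mask;
	return owned;
}

/* Stand-ins for the three completion paths named in the commit message. */
static void model_recv_work(struct model_cmd *cmd)
{
	if (claim_completion(cmd))
		cmd->completions++;
}

static void model_clear_req(struct model_cmd *cmd)
{
	if (claim_completion(cmd))
		cmd->completions++;
}

static void model_xmit_timeout(struct model_cmd *cmd)
{
	if (claim_completion(cmd))
		cmd->completions++;
}
```

Calling all three paths on the same in-flight command completes it once, in whatever order they run.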
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20210916093350.1410403-3-yukuai3@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk
Conflict: drivers/block/nbd.c Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/block/nbd.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 05153b84d5400..b13939e832449 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -375,7 +375,11 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req, if (!mutex_trylock(&cmd->lock)) return BLK_EH_RESET_TIMER;
- __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags); + if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) { + mutex_unlock(&cmd->lock); + return BLK_EH_DONE; + } + if (!refcount_inc_not_zero(&nbd->config_refs)) { cmd->status = BLK_STS_TIMEOUT; mutex_unlock(&cmd->lock); @@ -781,7 +785,10 @@ static void nbd_clear_req(struct request *req, void *data, bool reserved) struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
mutex_lock(&cmd->lock); - __clear_bit(NBD_CMD_INFLIGHT, &cmd->flags); + if (!__test_and_clear_bit(NBD_CMD_INFLIGHT, &cmd->flags)) { + mutex_unlock(&cmd->lock); + return; + } cmd->status = BLK_STS_IOERR; mutex_unlock(&cmd->lock);
From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-next-20211018 commit dbd73178da676945d8bbcf6afe731623f683ce0a category: bugfix bugzilla: 49890 CVE: NA ---------------------------
The socket on which the client sends a request in nbd_send_cmd() and the socket on which it receives the reply in nbd_read_stat() should be the same.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20210916093350.1410403-4-yukuai3@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/block/nbd.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index b13939e832449..89f2d91923d43 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -691,6 +691,10 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) ret = -ENOENT; goto out; } + if (cmd->index != index) { + dev_err(disk_to_dev(nbd->disk), "Unexpected reply %d from different sock %d (expected %d)", + tag, index, cmd->index); + } if (cmd->cmd_cookie != nbd_handle_to_cookie(handle)) { dev_err(disk_to_dev(nbd->disk), "Double reply on req %p, cmd_cookie %u, handle cookie %u\n", req, cmd->cmd_cookie, nbd_handle_to_cookie(handle));
From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-next-20211018 commit a83fdc85365586dc5c0f3ff91680e18e37a66f19 category: bugfix bugzilla: 49890 CVE: NA ---------------------------
Commit 6a468d5990ec ("nbd: don't start req until after the dead connection logic") moved blk_mq_start_request() from nbd_queue_rq() to nbd_handle_cmd() to skip starting the request if the connection is dead. However, the request is still started in other error paths.
Currently, blk_mq_end_request() is called immediately if nbd_queue_rq() fails, so starting the request in that situation is useless. So remove blk_mq_start_request() from the error paths in nbd_handle_cmd().
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20210916093350.1410403-5-yukuai3@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/block/nbd.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 89f2d91923d43..afa1633cec9ca 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -873,7 +873,6 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index) if (!refcount_inc_not_zero(&nbd->config_refs)) { dev_err_ratelimited(disk_to_dev(nbd->disk), "Socks array is empty\n"); - blk_mq_start_request(req); return -EINVAL; } config = nbd->config; @@ -882,7 +881,6 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index) dev_err_ratelimited(disk_to_dev(nbd->disk), "Attempted send on invalid socket\n"); nbd_config_put(nbd); - blk_mq_start_request(req); return -EINVAL; } cmd->status = BLK_STS_OK; @@ -906,7 +904,6 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index) */ sock_shutdown(nbd); nbd_config_put(nbd); - blk_mq_start_request(req); return -EIO; } goto again;
From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-next-20211018 commit 6157a8f489909db00151a4e361903b9099b03b75 category: bugfix bugzilla: 49890 CVE: NA ---------------------------
Checking whether sock_xmit() returns 0 is useless because it never returns 0. Document this in a comment and remove such checks.
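To illustrate why a transfer loop of this kind never returns 0, here is a small userspace sketch. It assumes, as I understand the driver to do, that a zero-byte transfer is converted to -EPIPE. The xfer_fn callback, struct fake_peer and fake_xfer() are hypothetical test scaffolding, not kernel interfaces.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport: returns bytes moved, 0 on EOF, or a negative errno. */
typedef int (*xfer_fn)(void *ctx, size_t want);

/* Transfer exactly len bytes; returns > 0 on success, < 0 on failure, never 0. */
static int model_sock_xmit(xfer_fn xfer, void *ctx, size_t len)
{
	size_t left = len;
	int result = -EINVAL;	/* model choice for a zero-length request */

	while (left > 0) {
		result = xfer(ctx, left);
		if (result <= 0) {
			if (result == 0)
				result = -EPIPE;	/* short transfer: peer closed */
			return result;
		}
		if ((size_t)result > left)
			return -EIO;	/* defensive: transport moved too much */
		left -= (size_t)result;
	}
	return result;
}

/* Fake peer that delivers a limited number of bytes, then reports EOF. */
struct fake_peer {
	size_t available;	/* bytes left before the peer closes */
	size_t chunk;		/* maximum bytes per call */
};

static int fake_xfer(void *ctx, size_t want)
{
	struct fake_peer *p = ctx;
	size_t n = want < p->chunk ? want : p->chunk;

	if (n > p->available)
		n = p->available;
	p->available -= n;
	return (int)n;
}
```

With this convention a caller only needs `result < 0` to detect failure, which is the simplification the patch makes.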
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20210916093350.1410403-6-yukuai3@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk
Conflict:drivers/block/nbd.c Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/block/nbd.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index afa1633cec9ca..71c1fbaff10bc 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -431,7 +431,8 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req, }
/* - * Send or receive packet. + * Send or receive packet. Return a positive value on success and + * negtive value on failue, and never return 0. */ static int sock_xmit(struct nbd_device *nbd, int index, int send, struct iov_iter *iter, int msg_flags, int *sent) @@ -562,7 +563,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index) (unsigned long long)blk_rq_pos(req) << 9, blk_rq_bytes(req)); result = sock_xmit(nbd, index, 1, &from, (type == NBD_CMD_WRITE) ? MSG_MORE : 0, &sent); - if (result <= 0) { + if (result < 0) { if (was_interrupted(result)) { /* If we havne't sent anything we can just return BUSY, * however if we have sent something we need to make @@ -607,7 +608,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index) skip = 0; } result = sock_xmit(nbd, index, 1, &from, flags, &sent); - if (result <= 0) { + if (result < 0) { if (was_interrupted(result)) { /* We've already sent the header, we * have no choice but to set pending and @@ -658,7 +659,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) reply.magic = 0; iov_iter_kvec(&to, READ | ITER_KVEC, &iov, 1, sizeof(reply)); result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL); - if (result <= 0) { + if (result < 0) { if (!nbd_disconnected(config)) dev_err(disk_to_dev(nbd->disk), "Receive control failed (result %d)\n", result); @@ -729,7 +730,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) iov_iter_bvec(&to, ITER_BVEC | READ, &bvec, 1, bvec.bv_len); result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL); - if (result <= 0) { + if (result < 0) { dev_err(disk_to_dev(nbd->disk), "Receive data failed (result %d)\n", result); /* @@ -1168,7 +1169,7 @@ static void send_disconnects(struct nbd_device *nbd) iov_iter_kvec(&from, WRITE | ITER_KVEC, &iov, 1, sizeof(request)); mutex_lock(&nsock->tx_lock); ret = sock_xmit(nbd, i, 1, &from, 0, NULL); - if (ret <= 0) + if (ret < 0) 
dev_err(disk_to_dev(nbd->disk), "Send disconnect failed %d\n", ret); mutex_unlock(&nsock->tx_lock);
From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-next-20211018 commit 961e9f50be9bb47835b0ac7e08d55d2d0a45e493 category: bugfix bugzilla: 49890 CVE: NA ---------------------------
Prepare to fix a use-after-free (UAF) in nbd_read_stat(); no functional changes.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20210916093350.1410403-7-yukuai3@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk
Conflict: drivers/block/nbd.c Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/block/nbd.c | 76 +++++++++++++++++++++++++++------------------ 1 file changed, 45 insertions(+), 31 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 71c1fbaff10bc..8c2e3224cdd65 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -641,38 +641,45 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index) return 0; }
-/* NULL returned = something went wrong, inform userspace */ -static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) +static int nbd_read_reply(struct nbd_device *nbd, int index, + struct nbd_reply *reply) { - struct nbd_config *config = nbd->config; - int result; - struct nbd_reply reply; - struct nbd_cmd *cmd; - struct request *req = NULL; - u64 handle; - u16 hwq; - u32 tag; - struct kvec iov = {.iov_base = &reply, .iov_len = sizeof(reply)}; + struct kvec iov = {.iov_base = reply, .iov_len = sizeof(*reply)}; struct iov_iter to; - int ret = 0; + int result;
- reply.magic = 0; - iov_iter_kvec(&to, READ | ITER_KVEC, &iov, 1, sizeof(reply)); + reply->magic = 0; + iov_iter_kvec(&to, READ | ITER_KVEC, &iov, 1, sizeof(*reply)); result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL); if (result < 0) { - if (!nbd_disconnected(config)) + if (!nbd_disconnected(nbd->config)) dev_err(disk_to_dev(nbd->disk), "Receive control failed (result %d)\n", result); - return ERR_PTR(result); + return result; }
- if (ntohl(reply.magic) != NBD_REPLY_MAGIC) { + if (ntohl(reply->magic) != NBD_REPLY_MAGIC) { dev_err(disk_to_dev(nbd->disk), "Wrong magic (0x%lx)\n", - (unsigned long)ntohl(reply.magic)); - return ERR_PTR(-EPROTO); + (unsigned long)ntohl(reply->magic)); + return -EPROTO; }
- memcpy(&handle, reply.handle, sizeof(handle)); + return 0; +} + +/* NULL returned = something went wrong, inform userspace */ +static struct nbd_cmd *nbd_handle_reply(struct nbd_device *nbd, int index, + struct nbd_reply *reply) +{ + int result; + struct nbd_cmd *cmd; + struct request *req = NULL; + u64 handle; + u16 hwq; + u32 tag; + int ret = 0; + + memcpy(&handle, reply->handle, sizeof(handle)); tag = nbd_handle_to_tag(handle); hwq = blk_mq_unique_tag_to_hwq(tag); if (hwq < nbd->tag_set.nr_hw_queues) @@ -714,9 +721,9 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) ret = -ENOENT; goto out; } - if (ntohl(reply.error)) { + if (ntohl(reply->error)) { dev_err(disk_to_dev(nbd->disk), "Other side returned error (%d)\n", - ntohl(reply.error)); + ntohl(reply->error)); cmd->status = BLK_STS_IOERR; goto out; } @@ -725,6 +732,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) if (rq_data_dir(req) != WRITE) { struct req_iterator iter; struct bio_vec bvec; + struct iov_iter to;
rq_for_each_segment(bvec, req, iter) { iov_iter_bvec(&to, ITER_BVEC | READ, @@ -740,8 +748,8 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) * and let the timeout stuff handle resubmitting * this request onto another connection. */ - if (nbd_disconnected(config) || - config->num_connections <= 1) { + if (nbd_disconnected(nbd->config) || + nbd->config->num_connections <= 1) { cmd->status = BLK_STS_IOERR; goto out; } @@ -764,21 +772,27 @@ static void recv_work(struct work_struct *work) work); struct nbd_device *nbd = args->nbd; struct nbd_config *config = nbd->config; + struct nbd_sock *nsock; struct nbd_cmd *cmd;
while (1) { - cmd = nbd_read_stat(nbd, args->index); - if (IS_ERR(cmd)) { - struct nbd_sock *nsock = config->socks[args->index]; + struct nbd_reply reply;
- mutex_lock(&nsock->tx_lock); - nbd_mark_nsock_dead(nbd, nsock, 1); - mutex_unlock(&nsock->tx_lock); + if (nbd_read_reply(nbd, args->index, &reply)) + break; + + cmd = nbd_handle_reply(nbd, args->index, &reply); + if (IS_ERR(cmd)) break; - }
blk_mq_complete_request(blk_mq_rq_from_pdu(cmd)); } + + nsock = config->socks[args->index]; + mutex_lock(&nsock->tx_lock); + nbd_mark_nsock_dead(nbd, nsock, 1); + mutex_unlock(&nsock->tx_lock); + nbd_config_put(nbd); atomic_dec(&config->recv_threads); wake_up(&config->recv_wq);
From: Yu Kuai yukuai3@huawei.com
mainline inclusion from mainline-next-20211018 commit 52c90e0184f67eecb00b53b79bfdf75e0274f8fd category: bugfix bugzilla: 49890 CVE: NA ---------------------------
There is a problem that nbd_handle_reply() might access a freed request:
1) First, a normal I/O is submitted and completed with an I/O scheduler:
internal_tag = blk_mq_get_tag -> get tag from sched_tags
 blk_mq_rq_ctx_init
  sched_tags->rq[internal_tag] = sched_tags->static_rq[internal_tag]
...
blk_mq_get_driver_tag
 __blk_mq_get_driver_tag -> get tag from tags
 tags->rq[tag] = sched_tags->static_rq[internal_tag]
So both tags->rq[tag] and sched_tags->rq[internal_tag] point to the request sched_tags->static_rq[internal_tag], even after the I/O has finished.
2) The nbd server sends a reply with a random tag directly:
recv_work
 nbd_handle_reply
  blk_mq_tag_to_rq(tags, tag)
   rq = tags->rq[tag]
3) If sched_tags->static_rq is freed:
blk_mq_sched_free_requests
 blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
  -> step 2) access rq before clearing rq mapping
  blk_mq_clear_rq_mapping(set, tags, hctx_idx);
  __free_pages() -> rq is freed here
4) Then nbd continues to use the freed request in nbd_handle_reply().
Fix the problem by getting 'q_usage_counter' before blk_mq_tag_to_rq(); the request is then guaranteed not to be freed because 'q_usage_counter' is not zero.
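The pin-before-lookup pattern can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct model_ref stands in for q_usage_counter using a plain C11 atomic rather than a percpu_ref, the -ENODEV return is a model-only choice (the patch just breaks out of the loop), and the handle_ret parameter stands in for whatever the reply handling would return.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for q->q_usage_counter; a single atomic instead of a percpu_ref. */
struct model_ref {
	atomic_long count;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool model_ref_tryget(struct model_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

static void model_ref_put(struct model_ref *ref)
{
	atomic_fetch_sub(&ref->count, 1);
}

/* One iteration of the receive loop: pin the queue, handle, then unpin. */
static int model_recv_one(struct model_ref *q_usage, int handle_ret)
{
	if (!model_ref_tryget(q_usage))
		return -ENODEV;	/* nothing in flight, ignore the message */

	/* Tag lookup and reply handling would run here, with the pool pinned. */

	model_ref_put(q_usage);	/* dropped on both success and error paths */
	return handle_ret;
}
```

The reference is released on every path after the lookup, mirroring the two percpu_ref_put() calls the patch adds.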
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20210916141810.2325276-1-yukuai3@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Reviewed-by: Jason Yan yanaijie@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/block/nbd.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 8c2e3224cdd65..45e6ae6add382 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -772,6 +772,7 @@ static void recv_work(struct work_struct *work) work); struct nbd_device *nbd = args->nbd; struct nbd_config *config = nbd->config; + struct request_queue *q = nbd->disk->queue; struct nbd_sock *nsock; struct nbd_cmd *cmd;
@@ -781,11 +782,26 @@ static void recv_work(struct work_struct *work) if (nbd_read_reply(nbd, args->index, &reply)) break;
+ /* + * Grab .q_usage_counter so request pool won't go away, then no + * request use-after-free is possible during nbd_handle_reply(). + * If queue is frozen, there won't be any inflight requests, we + * needn't to handle the incoming garbage message. + */ + if (!percpu_ref_tryget(&q->q_usage_counter)) { + dev_err(disk_to_dev(nbd->disk), "%s: no io inflight\n", + __func__); + break; + } + cmd = nbd_handle_reply(nbd, args->index, &reply); - if (IS_ERR(cmd)) + if (IS_ERR(cmd)) { + percpu_ref_put(&q->q_usage_counter); break; + }
blk_mq_complete_request(blk_mq_rq_from_pdu(cmd)); + percpu_ref_put(&q->q_usage_counter); }
nsock = config->socks[args->index];
From: Zhang Yi yi.zhang@huawei.com
hulk inclusion category: bugfix bugzilla: 182754 CVE: NA ---------------------------
The block number in the quota tree on disk should be smaller than v2_disk_dqinfo.dqi_blocks. If the quota file is corrupted, we may end up allocating an already 'allocated' block, which would lead to a loop in the tree and will probably trigger an oops later. This patch adds a check for the block number in the quota tree to prevent such potential issues.
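The range check added by this patch amounts to the following predicate, shown here as a standalone sketch. check_tree_blk() is a hypothetical helper name, and the value of QT_TREEOFF is an assumption (I believe it is 1 in the quota tree headers); the fallback definition of EUCLEAN is only there so the sketch builds on systems whose errno.h lacks it.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value, used only if errno.h does not provide it */
#endif

#define QT_TREEOFF 1	/* assumed: offset of the tree in the file, in blocks */

/*
 * A tree reference is acceptable only if it points inside the tree area
 * of the quota file: at or after QT_TREEOFF and before dqi_blocks.
 */
static int check_tree_blk(uint32_t blk, uint32_t dqi_blocks)
{
	if (blk < QT_TREEOFF || blk >= dqi_blocks)
		return -EUCLEAN;
	return 0;
}
```

Note that in find_tree_dqentry() a zero reference is treated as "no reference" before the range check is reached, so the lower bound matters mainly for remove_tree().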
Link: https://lore.kernel.org/r/20211008093821.1001186-2-yi.zhang@huawei.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Cc: stable@kernel.org Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/quota/quota_tree.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/fs/quota/quota_tree.c b/fs/quota/quota_tree.c index 656f9ff63edda..fe5fe9551910f 100644 --- a/fs/quota/quota_tree.c +++ b/fs/quota/quota_tree.c @@ -487,6 +487,13 @@ static int remove_tree(struct qtree_mem_dqinfo *info, struct dquot *dquot, goto out_buf; } newblk = le32_to_cpu(ref[get_index(info, dquot->dq_id, depth)]); + if (newblk < QT_TREEOFF || newblk >= info->dqi_blocks) { + quota_error(dquot->dq_sb, "Getting block too big (%u >= %u)", + newblk, info->dqi_blocks); + ret = -EUCLEAN; + goto out_buf; + } + if (depth == info->dqi_qtree_depth - 1) { ret = free_dqentry(info, dquot, newblk); newblk = 0; @@ -586,6 +593,13 @@ static loff_t find_tree_dqentry(struct qtree_mem_dqinfo *info, blk = le32_to_cpu(ref[get_index(info, dquot->dq_id, depth)]); if (!blk) /* No reference? */ goto out_buf; + if (blk < QT_TREEOFF || blk >= info->dqi_blocks) { + quota_error(dquot->dq_sb, "Getting block too big (%u >= %u)", + blk, info->dqi_blocks); + ret = -EUCLEAN; + goto out_buf; + } + if (depth < info->dqi_qtree_depth - 1) ret = find_tree_dqentry(info, dquot, blk, depth+1); else
From: Zhang Yi yi.zhang@huawei.com
hulk inclusion category: bugfix bugzilla: 182754 CVE: NA ---------------------------
Fix the error path in free_dqentry(): return the error number if the block to free is not correct.
Fixes: 1ccd14b9c271 ("quota: Split off quota tree handling into a separate file") Link: https://lore.kernel.org/r/20211008093821.1001186-3-yi.zhang@huawei.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Cc: stable@kernel.org Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/quota/quota_tree.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/quota/quota_tree.c b/fs/quota/quota_tree.c index fe5fe9551910f..833cd3e3758bf 100644 --- a/fs/quota/quota_tree.c +++ b/fs/quota/quota_tree.c @@ -422,6 +422,7 @@ static int free_dqentry(struct qtree_mem_dqinfo *info, struct dquot *dquot, quota_error(dquot->dq_sb, "Quota structure has offset to " "other block (%u) than it should (%u)", blk, (uint)(dquot->dq_off >> info->dqi_blocksize_bits)); + ret = -EIO; goto out_buf; } ret = read_blk(info, blk, buf);
From: Zhang Yi yi.zhang@huawei.com
hulk inclusion category: bugfix bugzilla: 109205 CVE: NA ---------------------------
After commit 5946d089379a ("ext4: check for overlapping extents in ext4_valid_extent_entries()"), we can detect overlapping extent entries in leaf extent blocks. However, out-of-order extent entries in index extent blocks can also trigger bad things if the filesystem is inconsistent. So this patch adds a check that detects out-of-order index extents and returns an error.
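The ordering check for index entries can be sketched on a plain array as below. This is a simplified model: index_entries_ordered() is a hypothetical name, and the ei_block values are assumed to be already converted to CPU byte order (the real code applies le32_to_cpu() to each on-disk entry).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if the logical start blocks of the index entries are
 * strictly increasing, using the same "(lblock <= prev) && prev" test
 * as the patch. Values are expected in CPU byte order.
 */
static bool index_entries_ordered(const uint32_t *ei_block, size_t n)
{
	uint32_t prev = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t lblock = ei_block[i];

		if (lblock <= prev && prev)
			return false;	/* out of order or duplicate */
		prev = lblock;
	}
	return true;
}
```

Unlike the leaf case, prev here is simply the previous entry's start block, since index entries carry no length.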
Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Theodore Ts'o tytso@mit.edu Link: https://lore.kernel.org/r/20210908120850.4012324-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/extents.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index f4a3b814fc66e..bb7024e0d5509 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -405,6 +405,9 @@ static int ext4_valid_extent_entries(struct inode *inode, ext4_fsblk_t *pblk, int depth) { unsigned short entries; + ext4_lblk_t lblock = 0; + ext4_lblk_t prev = 0; + if (eh->eh_entries == 0) return 1;
@@ -413,31 +416,35 @@ static int ext4_valid_extent_entries(struct inode *inode, if (depth == 0) { /* leaf entries */ struct ext4_extent *ext = EXT_FIRST_EXTENT(eh); - ext4_lblk_t lblock = 0; - ext4_lblk_t prev = 0; - int len = 0; while (entries) { if (!ext4_valid_extent(inode, ext)) return 0;
/* Check for overlapping extents */ lblock = le32_to_cpu(ext->ee_block); - len = ext4_ext_get_actual_len(ext); if ((lblock <= prev) && prev) { *pblk = ext4_ext_pblock(ext); return 0; } + prev = lblock + ext4_ext_get_actual_len(ext) - 1; ext++; entries--; - prev = lblock + len - 1; } } else { struct ext4_extent_idx *ext_idx = EXT_FIRST_INDEX(eh); while (entries) { if (!ext4_valid_extent_idx(inode, ext_idx)) return 0; + + /* Check for overlapping index extents */ + lblock = le32_to_cpu(ext_idx->ei_block); + if ((lblock <= prev) && prev) { + *pblk = ext4_idx_pblock(ext_idx); + return 0; + } ext_idx++; entries--; + prev = lblock; } } return 1;
From: Zhang Yi yi.zhang@huawei.com
hulk inclusion category: bugfix bugzilla: 109205 CVE: NA ---------------------------
We can now detect overlapping extents in leaf blocks and out-of-order index extents in index blocks. However, the .ee_block in the first extent of a leaf block should also be equal to the .ei_block in its parent index extent entry. This patch adds a check to detect such inconsistency between the index and leaf blocks.
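The parent/child consistency rule reduces to a small predicate, sketched here outside the kernel. first_entry_matches_parent() and its parameter names are hypothetical; tree_depth plays the role of ext_depth(inode), and all block numbers are assumed to be in CPU byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * depth:           depth of the block being checked (0 for a leaf)
 * tree_depth:      depth of the whole tree, as ext_depth(inode) would report
 * parent_ei_block: logical block recorded in the parent index entry
 * first_block:     logical block of the first entry in the checked block
 *
 * The root block (depth == tree_depth) has no parent entry, so it is
 * exempt from the comparison.
 */
static bool first_entry_matches_parent(int depth, int tree_depth,
				       uint32_t parent_ei_block,
				       uint32_t first_block)
{
	if (depth != tree_depth && parent_ei_block != first_block)
		return false;
	return true;
}
```

This is why the ext4_ext_check() macro in the diff can pass 0 as the extra argument: it is used for blocks where no parent entry is available to compare against.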
Signed-off-by: Zhang Yi yi.zhang@huawei.com Link: https://lore.kernel.org/r/20210908120850.4012324-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu
conflict: fs/ext4/extents.c Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/extents.c | 59 +++++++++++++++++++++++++++++------------------ 1 file changed, 36 insertions(+), 23 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index bb7024e0d5509..a828e7ca4ec7c 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -402,7 +402,8 @@ static int ext4_valid_extent_idx(struct inode *inode,
static int ext4_valid_extent_entries(struct inode *inode, struct ext4_extent_header *eh, - ext4_fsblk_t *pblk, int depth) + ext4_lblk_t lblk, ext4_fsblk_t *pblk, + int depth) { unsigned short entries; ext4_lblk_t lblock = 0; @@ -416,6 +417,14 @@ static int ext4_valid_extent_entries(struct inode *inode, if (depth == 0) { /* leaf entries */ struct ext4_extent *ext = EXT_FIRST_EXTENT(eh); + + /* + * The logical block in the first entry should equal to + * the number in the index block. + */ + if (depth != ext_depth(inode) && + lblk != le32_to_cpu(ext->ee_block)) + return 0; while (entries) { if (!ext4_valid_extent(inode, ext)) return 0; @@ -432,6 +441,14 @@ static int ext4_valid_extent_entries(struct inode *inode, } } else { struct ext4_extent_idx *ext_idx = EXT_FIRST_INDEX(eh); + + /* + * The logical block in the first entry should equal to + * the number in the parent index block. + */ + if (depth != ext_depth(inode) && + lblk != le32_to_cpu(ext_idx->ei_block)) + return 0; while (entries) { if (!ext4_valid_extent_idx(inode, ext_idx)) return 0; @@ -452,7 +469,7 @@ static int ext4_valid_extent_entries(struct inode *inode,
static int __ext4_ext_check(const char *function, unsigned int line, struct inode *inode, struct ext4_extent_header *eh, - int depth, ext4_fsblk_t pblk) + int depth, ext4_fsblk_t pblk, ext4_lblk_t lblk) { const char *error_msg; int max = 0, err = -EFSCORRUPTED; @@ -478,7 +495,7 @@ static int __ext4_ext_check(const char *function, unsigned int line, error_msg = "invalid eh_entries"; goto corrupted; } - if (!ext4_valid_extent_entries(inode, eh, &pblk, depth)) { + if (!ext4_valid_extent_entries(inode, eh, lblk, &pblk, depth)) { error_msg = "invalid extent entries"; goto corrupted; } @@ -508,7 +525,7 @@ static int __ext4_ext_check(const char *function, unsigned int line, }
#define ext4_ext_check(inode, eh, depth, pblk) \ - __ext4_ext_check(__func__, __LINE__, (inode), (eh), (depth), (pblk)) + __ext4_ext_check(__func__, __LINE__, (inode), (eh), (depth), (pblk), 0)
int ext4_ext_check_inode(struct inode *inode) { @@ -541,12 +558,14 @@ static void ext4_cache_extents(struct inode *inode,
static struct buffer_head * __read_extent_tree_block(const char *function, unsigned int line, - struct inode *inode, ext4_fsblk_t pblk, int depth, - int flags) + struct inode *inode, struct ext4_extent_idx *idx, + int depth, int flags) { struct buffer_head *bh; int err; + ext4_fsblk_t pblk;
+ pblk = ext4_idx_pblock(idx); bh = sb_getblk_gfp(inode->i_sb, pblk, __GFP_MOVABLE | GFP_NOFS); if (unlikely(!bh)) return ERR_PTR(-ENOMEM); @@ -559,8 +578,8 @@ __read_extent_tree_block(const char *function, unsigned int line, } if (buffer_verified(bh) && !(flags & EXT4_EX_FORCE_CACHE)) return bh; - err = __ext4_ext_check(function, line, inode, - ext_block_hdr(bh), depth, pblk); + err = __ext4_ext_check(function, line, inode, ext_block_hdr(bh), + depth, pblk, le32_to_cpu(idx->ei_block)); if (err) goto errout; set_buffer_verified(bh); @@ -578,8 +597,8 @@ __read_extent_tree_block(const char *function, unsigned int line,
}
-#define read_extent_tree_block(inode, pblk, depth, flags) \ - __read_extent_tree_block(__func__, __LINE__, (inode), (pblk), \ +#define read_extent_tree_block(inode, idx, depth, flags) \ + __read_extent_tree_block(__func__, __LINE__, (inode), (idx), \ (depth), (flags))
/* @@ -626,8 +645,7 @@ int ext4_ext_precache(struct inode *inode) i--; continue; } - bh = read_extent_tree_block(inode, - ext4_idx_pblock(path[i].p_idx++), + bh = read_extent_tree_block(inode, path[i].p_idx++, depth - i - 1, EXT4_EX_FORCE_CACHE); if (IS_ERR(bh)) { @@ -930,8 +948,7 @@ ext4_find_extent(struct inode *inode, ext4_lblk_t block, path[ppos].p_depth = i; path[ppos].p_ext = NULL;
- bh = read_extent_tree_block(inode, path[ppos].p_block, --i, - flags); + bh = read_extent_tree_block(inode, path[ppos].p_idx, --i, flags); if (IS_ERR(bh)) { ret = PTR_ERR(bh); goto err; @@ -1530,7 +1547,6 @@ static int ext4_ext_search_right(struct inode *inode, struct ext4_extent_header *eh; struct ext4_extent_idx *ix; struct ext4_extent *ex; - ext4_fsblk_t block; int depth; /* Note, NOT eh_depth; depth from top of tree */ int ee_len;
@@ -1597,20 +1613,17 @@ static int ext4_ext_search_right(struct inode *inode, * follow it and find the closest allocated * block to the right */ ix++; - block = ext4_idx_pblock(ix); while (++depth < path->p_depth) { /* subtract from p_depth to get proper eh_depth */ - bh = read_extent_tree_block(inode, block, - path->p_depth - depth, 0); + bh = read_extent_tree_block(inode, ix, path->p_depth - depth, 0); if (IS_ERR(bh)) return PTR_ERR(bh); eh = ext_block_hdr(bh); ix = EXT_FIRST_INDEX(eh); - block = ext4_idx_pblock(ix); put_bh(bh); }
- bh = read_extent_tree_block(inode, block, path->p_depth - depth, 0); + bh = read_extent_tree_block(inode, ix, path->p_depth - depth, 0); if (IS_ERR(bh)) return PTR_ERR(bh); eh = ext_block_hdr(bh); @@ -3041,9 +3054,9 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, ext_debug("move to level %d (block %llu)\n", i + 1, ext4_idx_pblock(path[i].p_idx)); memset(path + i + 1, 0, sizeof(*path)); - bh = read_extent_tree_block(inode, - ext4_idx_pblock(path[i].p_idx), depth - i - 1, - EXT4_EX_NOCACHE); + bh = read_extent_tree_block(inode, path[i].p_idx, + depth - i - 1, + EXT4_EX_NOCACHE); if (IS_ERR(bh)) { /* should we reset i_size? */ err = PTR_ERR(bh);
From: Zhang Yi yi.zhang@huawei.com
hulk inclusion category: bugfix bugzilla: 109205 CVE: NA ---------------------------
Most error paths of the current extent update operations do not roll back partial updates properly when something goes wrong (e.g. in ext4_ext_insert_extent()). So we may get an inconsistent extent tree if the journal has been aborted due to an I/O error, which may lead to a BUG_ON later when we access these extent entries in errors=continue mode. This patch drops the extent buffer's verified flag before updating the contents in ext4_ext_get_access(), and sets it again after updating in __ext4_ext_dirty(). After this patch, the extent buffer is forced to be checked again if an extent tree update was interrupted, making sure the extents are consistent.
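The clear-then-reset protocol for the verified flag can be modelled as a tiny state machine. This is an illustrative sketch only: struct model_bh and the model_* functions are hypothetical, and the error arguments stand in for the return values of the journal calls.

```c
#include <stdbool.h>

/* Stand-in for a buffer_head; only the verified bit is modelled. */
struct model_bh {
	bool verified;
};

/* Models ext4_ext_get_access(): drop the flag once write access is granted. */
static int model_get_access(struct model_bh *bh, int journal_err)
{
	if (!journal_err)
		bh->verified = false;
	return journal_err;
}

/* Models __ext4_ext_dirty(): restore the flag only if dirtying succeeded. */
static int model_dirty(struct model_bh *bh, int dirty_err)
{
	if (!dirty_err)
		bh->verified = true;
	return dirty_err;
}

/* A later read must re-run the extent checks if the flag is not set. */
static bool model_needs_check(const struct model_bh *bh)
{
	return !bh->verified;
}
```

If the update is interrupted between the two calls, or the dirty step fails, the buffer stays unverified and the next read goes through the full check again.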
Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Theodore Ts'o tytso@mit.edu Link: https://lore.kernel.org/r/20210908120850.4012324-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu
conflict: fs/ext4/extents.c Reviewed-by: Yang Erkun yangerkun@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/extents.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index a828e7ca4ec7c..e2803e6c2b17d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -145,14 +145,24 @@ int ext4_datasem_ensure_credits(handle_t *handle, struct inode *inode, static int ext4_ext_get_access(handle_t *handle, struct inode *inode, struct ext4_ext_path *path) { + int err = 0; + if (path->p_bh) { /* path points to block */ BUFFER_TRACE(path->p_bh, "get_write_access"); - return ext4_journal_get_write_access(handle, path->p_bh); + err = ext4_journal_get_write_access(handle, path->p_bh); + /* + * The extent buffer's verified bit will be set again in + * __ext4_ext_dirty(). We could leave an inconsistent + * buffer if the extents updating procudure break off du + * to some error happens, force to check it again. + */ + if (!err) + clear_buffer_verified(path->p_bh); } /* path points to leaf/index in inode body */ /* we use in-core data, no need to protect them */ - return 0; + return err; }
/* @@ -172,6 +182,9 @@ int __ext4_ext_dirty(const char *where, unsigned int line, handle_t *handle, /* path points to block */ err = __ext4_handle_dirty_metadata(where, line, handle, inode, path->p_bh); + /* Extents updating done, re-set verified flag */ + if (!err) + set_buffer_verified(path->p_bh); } else { /* path points to leaf/index in inode body */ err = ext4_mark_inode_dirty(handle, inode);
From: yangerkun yangerkun@huawei.com
hulk inclusion
category: bugfix
bugzilla: 109246
CVE: NA
---------------------------
A buffer with the verified flag set has already been checked. There is no need to verify it and call set_buffer_verified() again.
Signed-off-by: yangerkun yangerkun@huawei.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/extents.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index e2803e6c2b17d..d3936a78dc39c 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -589,13 +589,16 @@ __read_extent_tree_block(const char *function, unsigned int line, if (err < 0) goto errout; } - if (buffer_verified(bh) && !(flags & EXT4_EX_FORCE_CACHE)) - return bh; - err = __ext4_ext_check(function, line, inode, ext_block_hdr(bh), - depth, pblk, le32_to_cpu(idx->ei_block)); - if (err) - goto errout; - set_buffer_verified(bh); + if (buffer_verified(bh)) { + if (!(flags & EXT4_EX_FORCE_CACHE)) + return bh; + } else { + err = __ext4_ext_check(function, line, inode, ext_block_hdr(bh), + depth, pblk, le32_to_cpu(idx->ei_block)); + if (err) + goto errout; + set_buffer_verified(bh); + } /* * If this is a leaf block, cache all of its entries */
From: yangerkun yangerkun@huawei.com
hulk inclusion
category: bugfix
bugzilla: 109246
CVE: NA
---------------------------
Our stress testing with I/O errors can trigger the following out-of-bounds (OOB) access with a very low probability.
[59898.282466] BUG: KASAN: slab-out-of-bounds in ext4_find_extent+0x2e4/0x480
...
[59898.287162] Call Trace:
[59898.287575]  dump_stack+0x8b/0xb9
[59898.288070]  print_address_description+0x73/0x280
[59898.289903]  ext4_find_extent+0x2e4/0x480
[59898.290553]  ext4_ext_map_blocks+0x125/0x1470
[59898.295481]  ext4_map_blocks+0x5ee/0x940
[59898.315984]  ext4_mpage_readpages+0x63c/0xdb0
[59898.320231]  read_pages+0xe6/0x370
[59898.321589]  __do_page_cache_readahead+0x233/0x2a0
[59898.321594]  ondemand_readahead+0x157/0x450
[59898.321598]  generic_file_read_iter+0xcb2/0x1550
[59898.328828]  __vfs_read+0x233/0x360
[59898.328840]  vfs_read+0xa5/0x190
[59898.330126]  ksys_read+0xa5/0x150
[59898.331405]  do_syscall_64+0x6d/0x1f0
[59898.331418]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Digging deeper, we found that the buffer is actually an xattr block, which can happen with the following steps:
1. An extent update for file1 removes a leaf extent block (block A).
2. The idx extent block needs to be updated too.
3. Block A is allocated as an xattr block, and its buffer is set verified.
4. An I/O error happens for this idx block, and its buffer is released later.
5. An extent lookup for file1 reads the idx block and sees block A again.
6. Since the buffer of block A is already verified, it is used directly, which can lead to the OOB above.
As in __ext4_xattr_check_block(), we can check the magic number even when the buffer is already verified to fix the problem.
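The idea of re-checking the magic on a verified buffer can be sketched as follows. This is a simplified user-space model, not the kernel function: `struct model_block`, `model_full_check()` and `model_read_block()` are hypothetical names, `MODEL_EXT_MAGIC` mirrors the 0xf30a extent magic with byte order handling omitted, and the full header check is reduced to the magic comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_EXT_MAGIC 0xf30aU	/* extent header magic, endianness ignored */

enum { MODEL_OK = 0, MODEL_CORRUPTED = -1 };

/* Hypothetical stand-in for a cached block and its buffer state. */
struct model_block {
	uint16_t magic;	/* first field of the block's header */
	bool verified;	/* models the buffer's "verified" bit */
};

/* Stand-in for the full header check; only the magic is modelled here. */
int model_full_check(const struct model_block *b)
{
	return b->magic == MODEL_EXT_MAGIC ? MODEL_OK : MODEL_CORRUPTED;
}

/*
 * Models the read path: a verified buffer skips the full check, but the
 * magic is still compared, so a block that was reused for something
 * else (such as an xattr block) is rejected instead of being parsed as
 * an extent block.
 */
int model_read_block(struct model_block *b)
{
	assert(b);
	if (b->verified) {
		if (b->magic != MODEL_EXT_MAGIC)
			return MODEL_CORRUPTED;
		return MODEL_OK;
	}
	if (model_full_check(b) != MODEL_OK)
		return MODEL_CORRUPTED;
	b->verified = true;
	return MODEL_OK;
}
```

The magic comparison is cheap compared with the full check, which is why it can be done on every read of a verified buffer.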
Signed-off-by: yangerkun yangerkun@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/ext4/extents.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index d3936a78dc39c..fc00a78163117 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -590,6 +590,14 @@ __read_extent_tree_block(const char *function, unsigned int line, goto errout; } if (buffer_verified(bh)) { + if (unlikely(ext_block_hdr(bh)->eh_magic != EXT4_EXT_MAGIC)) { + err = -EFSCORRUPTED; + ext4_error_inode(inode, function, line, 0, + "invalid magic for verified extent block %llu", + (unsigned long long)bh->b_blocknr); + goto errout; + } + if (!(flags & EXT4_EX_FORCE_CACHE)) return bh; } else {
From: Shijie Luo luoshijie1@huawei.com
mainline inclusion
from mainline-v5.9-rc2
commit 00a3fff0712cd9cc4112ecf6da0916f8503e2a86
category: bugfix
bugzilla: 45093
CVE: NA
-----------------------------------------------
Remove the unnecessary chksum_err and checksum_seen variables as well as some redundant code to make the function easier to understand.
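The simplified condition can be read as a single predicate: the commit block is acceptable if its checksum matches, or if the checksum fields are unused. A minimal sketch, assuming the hypothetical helper name `model_commit_chksum_ok()` and local `MODEL_*` constants that stand in for JBD2_CRC32_CHKSUM and JBD2_CRC32_CHKSUM_SIZE (the values below are assumptions for illustration):

```c
#include <assert.h>	/* handy for exercising the helper below */
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CRC32_CHKSUM	1	/* assumed value, for illustration */
#define MODEL_CRC32_CHKSUM_SIZE	4	/* assumed value, for illustration */

/*
 * Returns true when the commit block passes the check:
 *  - "match":  the computed CRC equals the stored one and the type and
 *              size fields describe a CRC32 checksum, or
 *  - "unused": type, size and stored checksum are all zero, i.e. the
 *              commit block carries no checksum at all.
 * Anything else corresponds to the checksum error path.
 */
bool model_commit_chksum_ok(uint32_t computed, uint32_t found,
			    uint8_t type, uint8_t size)
{
	bool match = computed == found &&
		     type == MODEL_CRC32_CHKSUM &&
		     size == MODEL_CRC32_CHKSUM_SIZE;
	bool unused = type == 0 && size == 0 && found == 0;

	return match || unused;
}
```

Writing the two accepted cases as named booleans is only a readability choice for this sketch; the patch expresses the same condition inline and jumps to the shared error label when it does not hold.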
[ With changes suggested by jack@ and tytso@ ]
Signed-off-by: Shijie Luo luoshijie1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20200819122955.33526-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/recovery.c | 46 ++++++++++++---------------------------------- 1 file changed, 12 insertions(+), 34 deletions(-)
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index a4967b27ffb63..26e640adc66f5 100644 --- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -690,14 +690,11 @@ static int do_one_pass(journal_t *journal, * number. */ if (pass == PASS_SCAN && jbd2_has_feature_checksum(journal)) { - int chksum_err, chksum_seen; struct commit_header *cbh = (struct commit_header *)bh->b_data; unsigned found_chksum = be32_to_cpu(cbh->h_chksum[0]);
- chksum_err = chksum_seen = 0; - if (info->end_transaction) { journal->j_failed_commit = info->end_transaction; @@ -705,42 +702,23 @@ static int do_one_pass(journal_t *journal, break; }
- if (crc32_sum == found_chksum && - cbh->h_chksum_type == JBD2_CRC32_CHKSUM && - cbh->h_chksum_size == - JBD2_CRC32_CHKSUM_SIZE) - chksum_seen = 1; - else if (!(cbh->h_chksum_type == 0 && - cbh->h_chksum_size == 0 && - found_chksum == 0 && - !chksum_seen)) - /* - * If fs is mounted using an old kernel and then - * kernel with journal_chksum is used then we - * get a situation where the journal flag has - * checksum flag set but checksums are not - * present i.e chksum = 0, in the individual - * commit blocks. - * Hence to avoid checksum failures, in this - * situation, this extra check is added. - */ - chksum_err = 1; - - if (chksum_err) { - info->end_transaction = next_commit_ID; - - if (!jbd2_has_feature_async_commit(journal)) { - journal->j_failed_commit = - next_commit_ID; - brelse(bh); - break; - } - } + /* Neither checksum match nor unused? */ + if (!((crc32_sum == found_chksum && + cbh->h_chksum_type == + JBD2_CRC32_CHKSUM && + cbh->h_chksum_size == + JBD2_CRC32_CHKSUM_SIZE) || + (cbh->h_chksum_type == 0 && + cbh->h_chksum_size == 0 && + found_chksum == 0))) + goto chksum_error; + crc32_sum = ~0; } if (pass == PASS_SCAN && !jbd2_commit_block_csum_verify(journal, bh->b_data)) { + chksum_error: info->end_transaction = next_commit_ID;
if (!jbd2_has_feature_async_commit(journal)) {
From: changfengnan fengnanchang@foxmail.com
mainline inclusion
from mainline-v5.10-rc1
commit fc750a3b44bdccb9fb96d6abbc48a9b8e480ce7b
category: bugfix
bugzilla: 45093
CVE: NA
-----------------------------------------------
When ext4 is formatted with lazy_journal_init=1 and transactions from the previous filesystem are still on disk, it is possible that they are considered during a recovery after a crash. Because the checksum seed has changed, the CRC check will fail, and the journal recovery fails with checksum error although the journal is otherwise perfectly valid. Fix the problem by checking commit block time stamps to determine whether the data in the journal block is just stale or whether it is indeed corrupt.
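The decision rule described above can be sketched as a small classifier. This is a hedged illustration rather than the recovery code itself: `enum model_verdict` and `model_classify_bad_csum()` are hypothetical names, and the function only covers the case where a checksum mismatch has already been seen during the scan pass.

```c
#include <assert.h>
#include <stdint.h>

enum model_verdict {
	MODEL_CORRUPT,	/* same journal, so the bad checksum is a real error */
	MODEL_STALE	/* leftover from an older journal, stop recovery quietly */
};

/*
 * Called only after a checksum mismatch was seen. Commit times within
 * one journal do not go backwards, so a commit block that is older than
 * the last good transaction most likely belongs to a previous
 * filesystem and can be treated as the end of the log.
 */
enum model_verdict model_classify_bad_csum(uint64_t commit_time,
					   uint64_t last_trans_commit_time)
{
	if (commit_time >= last_trans_commit_time)
		return MODEL_CORRUPT;
	return MODEL_STALE;
}
```

Equal timestamps are classified as corrupt here because the comparison in the patch is `>=`; only a strictly older commit time is taken as evidence of stale data.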
Reported-by: kernel test robot lkp@intel.com Reviewed-by: Andreas Dilger adilger@dilger.ca Signed-off-by: Fengnan Chang changfengnan@hikvision.com Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20201012164900.20197-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- fs/jbd2/recovery.c | 78 +++++++++++++++++++++++++++++++++++++++------- 1 file changed, 66 insertions(+), 12 deletions(-)
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index 26e640adc66f5..b758865690f82 100644 --- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -428,6 +428,8 @@ static int do_one_pass(journal_t *journal, __u32 crc32_sum = ~0; /* Transactional Checksums */ int descr_csum_size = 0; int block_error = 0; + bool need_check_commit_time = false; + __u64 last_trans_commit_time = 0, commit_time;
/* * First thing is to establish what we expect to find in the log @@ -520,12 +522,21 @@ static int do_one_pass(journal_t *journal, if (descr_csum_size > 0 && !jbd2_descriptor_block_csum_verify(journal, bh->b_data)) { - printk(KERN_ERR "JBD2: Invalid checksum " - "recovering block %lu in log\n", - next_log_block); - err = -EFSBADCRC; - brelse(bh); - goto failed; + /* + * PASS_SCAN can see stale blocks due to lazy + * journal init. Don't error out on those yet. + */ + if (pass != PASS_SCAN) { + pr_err("JBD2: Invalid checksum recovering block %lu in log\n", + next_log_block); + err = -EFSBADCRC; + brelse(bh); + goto failed; + } + need_check_commit_time = true; + jbd_debug(1, + "invalid descriptor block found in %lu\n", + next_log_block); }
/* If it is a valid descriptor block, replay it @@ -535,6 +546,7 @@ static int do_one_pass(journal_t *journal, if (pass != PASS_REPLAY) { if (pass == PASS_SCAN && jbd2_has_feature_checksum(journal) && + !need_check_commit_time && !info->end_transaction) { if (calc_chksums(journal, bh, &next_log_block, @@ -683,11 +695,41 @@ static int do_one_pass(journal_t *journal, * mentioned conditions. Hence assume * "Interrupted Commit".) */ + commit_time = be64_to_cpu( + ((struct commit_header *)bh->b_data)->h_commit_sec); + /* + * If need_check_commit_time is set, it means we are in + * PASS_SCAN and csum verify failed before. If + * commit_time is increasing, it's the same journal, + * otherwise it is stale journal block, just end this + * recovery. + */ + if (need_check_commit_time) { + if (commit_time >= last_trans_commit_time) { + pr_err("JBD2: Invalid checksum found in transaction %u\n", + next_commit_ID); + err = -EFSBADCRC; + brelse(bh); + goto failed; + } + ignore_crc_mismatch: + /* + * It likely does not belong to same journal, + * just end this recovery with success. + */ + jbd_debug(1, "JBD2: Invalid checksum ignored in transaction %u, likely stale data\n", + next_commit_ID); + err = 0; + brelse(bh); + goto done; + }
- /* Found an expected commit block: if checksums - * are present verify them in PASS_SCAN; else not + /* + * Found an expected commit block: if checksums + * are present, verify them in PASS_SCAN; else not * much to do other than move on to the next sequence - * number. */ + * number. + */ if (pass == PASS_SCAN && jbd2_has_feature_checksum(journal)) { struct commit_header *cbh = @@ -719,6 +761,8 @@ static int do_one_pass(journal_t *journal, !jbd2_commit_block_csum_verify(journal, bh->b_data)) { chksum_error: + if (commit_time < last_trans_commit_time) + goto ignore_crc_mismatch; info->end_transaction = next_commit_ID;
if (!jbd2_has_feature_async_commit(journal)) { @@ -728,11 +772,24 @@ static int do_one_pass(journal_t *journal, break; } } + if (pass == PASS_SCAN) + last_trans_commit_time = commit_time; brelse(bh); next_commit_ID++; continue;
case JBD2_REVOKE_BLOCK: + /* + * Check revoke block crc in pass_scan, if csum verify + * failed, check commit block time later. + */ + if (pass == PASS_SCAN && + !jbd2_descriptor_block_csum_verify(journal, + bh->b_data)) { + jbd_debug(1, "JBD2: invalid revoke block found in %lu\n", + next_log_block); + need_check_commit_time = true; + } /* If we aren't in the REVOKE pass, then we can * just skip over this block. */ if (pass != PASS_REVOKE) { @@ -800,9 +857,6 @@ static int scan_revoke_records(journal_t *journal, struct buffer_head *bh, offset = sizeof(jbd2_journal_revoke_header_t); rcount = be32_to_cpu(header->r_count);
- if (!jbd2_descriptor_block_csum_verify(journal, header)) - return -EFSBADCRC; - if (jbd2_journal_has_csum_v2or3(journal)) csum_size = sizeof(struct jbd2_journal_block_tail); if (rcount > journal->j_blocksize - csum_size)
From: yanghui yanghui.def@bytedance.com
mainline inclusion
from mainline-v5.15-rc1
commit 276aeee1c5fc00df700f0782060beae126600472
category: bugfix
bugzilla: 181417
CVE: NA
-----------------------------------------------
Servers hit the following panic:
Kernel version: 5.4.56
BUG: unable to handle page fault for address: 0000000000002c48
RIP: 0010:__next_zones_zonelist+0x1d/0x40
Call Trace:
  __alloc_pages_nodemask+0x277/0x310
  alloc_page_interleave+0x13/0x70
  handle_mm_fault+0xf99/0x1390
  __do_page_fault+0x288/0x500
  do_page_fault+0x30/0x110
  page_fault+0x3e/0x50
The panic occurs because MAX_NUMNODES is passed as the third parameter (preferred_nid) of __alloc_pages_nodemask(). As a result, the access to zonelist->zoneref->zone_idx in __next_zones_zonelist() causes a panic.
In offset_il_node(), first_node() returns a nid from pol->v.nodes. After this, other threads may change pol->v.nodes before next_node() is called. This race condition lets next_node() return MAX_NUMNODES. So put pol->v.nodes in a local variable.
The race condition is between offset_il_node and cpuset_change_task_nodemask:
CPU0:                                     CPU1:
alloc_pages_vma()
  interleave_nid(pol,)
    offset_il_node(pol,)
      first_node(pol->v.nodes)            cpuset_change_task_nodemask
                      //nodes==0xc          mpol_rebind_task
                                              mpol_rebind_policy
                                                mpol_rebind_nodemask(pol,nodes)
                      //nodes==0x3
      next_node(nid, pol->v.nodes)//return MAX_NUMNODES
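The fix amounts to walking a private snapshot of the node mask instead of re-reading the shared one. A minimal user-space sketch, assuming a node mask that fits in one 64-bit word and hypothetical names throughout (`model_nodemask_t`, `model_weight()`, `model_find_next()`, `model_offset_il_node()`); it illustrates the snapshot logic only and does not model the concurrent writer or the kernel's barrier():

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_NODES 64	/* plays the role of MAX_NUMNODES */

typedef uint64_t model_nodemask_t;

/* Number of set bits, like nodes_weight(). */
unsigned int model_weight(model_nodemask_t m)
{
	unsigned int n = 0;

	while (m) {
		m &= m - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Lowest set bit at or above 'from', or MODEL_MAX_NODES if none. */
int model_find_next(model_nodemask_t m, int from)
{
	for (int i = from; i < MODEL_MAX_NODES; i++)
		if (m & ((model_nodemask_t)1 << i))
			return i;
	return MODEL_MAX_NODES;
}

/*
 * Pick the (n % weight)-th node of the mask. The shared mask is read
 * once into 'snap'; the weight and every later lookup use that same
 * snapshot, so the loop cannot step past the last node even if the
 * shared mask is rewritten in the meantime.
 */
int model_offset_il_node(const volatile model_nodemask_t *shared,
			 unsigned long n, int fallback)
{
	model_nodemask_t snap = *shared;
	unsigned int nnodes = model_weight(snap);
	unsigned int target;
	int nid;

	if (!nnodes)
		return fallback;
	target = (unsigned int)n % nnodes;
	nid = model_find_next(snap, 0);
	for (unsigned int i = 0; i < target; i++)
		nid = model_find_next(snap, nid + 1);
	assert(nid < MODEL_MAX_NODES);
	return nid;
}
```

With the mask 0xc from the diagram (nodes 2 and 3), offsets 0, 1 and 2 map to nodes 2, 3 and 2. In the racy version the weight is computed from 0xc but the walk uses 0x3, which is how the lookup can run off the end of the mask.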
Link: https://lkml.kernel.org/r/20210906034658.48721-1-yanghui.def@bytedance.com Signed-off-by: yanghui yanghui.def@bytedance.com Reviewed-by: Muchun Song songmuchun@bytedance.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Nanyong Sun sunnanyong@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- mm/mempolicy.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 9e7be03c78a2e..76a577cfc2778 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1941,17 +1941,26 @@ unsigned int mempolicy_slab_node(void) */ static unsigned offset_il_node(struct mempolicy *pol, unsigned long n) { - unsigned nnodes = nodes_weight(pol->v.nodes); - unsigned target; + nodemask_t nodemask = pol->v.nodes; + unsigned int target, nnodes; int i; int nid; + /* + * The barrier will stabilize the nodemask in a register or on + * the stack so that it will stop changing under the code. + * + * Between first_node() and next_node(), pol->v.nodes could be changed + * by other threads. So we put pol->v.nodes in a local stack. + */ + barrier();
+ nnodes = nodes_weight(nodemask); if (!nnodes) return numa_node_id(); target = (unsigned int)n % nnodes; - nid = first_node(pol->v.nodes); + nid = first_node(nodemask); for (i = 0; i < target; i++) - nid = next_node(nid, pol->v.nodes); + nid = next_node(nid, nodemask); return nid; }
From: Yang Xingui yangxingui@huawei.com
driver inclusion
category: bugfix
bugzilla: NA
CVE: NA
The debugfs dump must be executed before FLR runs, because some registers have to be dumped before they are reset by FLR. So it is wrong to queue the debugfs dump work while the FLR work is running, since both work items are queued on the same workqueue. This means that the debugfs dump work always executes after FLR and reads data that has already been reset.
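The ordering problem can be illustrated with a toy model. Everything here is hypothetical and heavily simplified: `struct model_hba` holds one fake register, `struct model_wq` is a single-threaded FIFO standing in for the shared workqueue, and `model_reset_old()` / `model_reset_new()` contrast queueing the dump with calling the handler directly.

```c
#include <assert.h>
#include <stdint.h>

struct model_hba {
	uint32_t reg;		/* models a register that the reset clears */
	uint32_t snapshot;	/* what the debugfs dump captured */
};

typedef void (*model_work_fn)(struct model_hba *);

/* Tiny single-threaded FIFO standing in for the shared workqueue. */
struct model_wq {
	model_work_fn items[4];
	int head, tail;
};

void model_queue_work(struct model_wq *wq, model_work_fn fn)
{
	assert(wq->tail < 4);
	wq->items[wq->tail++] = fn;
}

void model_dump(struct model_hba *h)  { h->snapshot = h->reg; }
void model_reset(struct model_hba *h) { h->reg = 0; }

/*
 * Old ordering: the reset path, itself running as a work item, only
 * queues the dump. The dump gets its turn after the reset has finished.
 */
void model_reset_old(struct model_wq *wq, struct model_hba *h)
{
	model_queue_work(wq, model_dump);
	model_reset(h);
	while (wq->head < wq->tail)	/* queue drains after the reset */
		wq->items[wq->head++](h);
}

/* New ordering: call the dump handler directly, then reset. */
void model_reset_new(struct model_hba *h)
{
	model_dump(h);
	model_reset(h);
}
```

In this model the old path captures the already cleared register, while the new path captures the value from before the reset, which matches the behaviour change the patch aims for.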
Signed-off-by: Yang Xingui yangxingui@huawei.com Reviewed-by: Kangfenglong kangfenglong@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- drivers/scsi/hisi_sas/hisi_sas.h | 1 + drivers/scsi/hisi_sas/hisi_sas_main.c | 8 ++++---- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 + 3 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h index 742ffcaeaa95c..3fd32606ecb00 100644 --- a/drivers/scsi/hisi_sas/hisi_sas.h +++ b/drivers/scsi/hisi_sas/hisi_sas.h @@ -322,6 +322,7 @@ struct hisi_sas_hw { void (*snapshot_restore)(struct hisi_hba *hisi_hba); const struct cpumask *(*get_managed_irq_aff)(struct hisi_hba *hisi_hba, int queue); + void (*debugfs_work_handler)(struct work_struct *work); int max_command_entries; int complete_hdr_size; struct scsi_host_template *sht; diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index 67befcc033126..bde4307596234 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -1596,16 +1596,16 @@ static int hisi_sas_controller_reset(struct hisi_hba *hisi_hba) struct Scsi_Host *shost = hisi_hba->shost; int rc;
- if (hisi_sas_debugfs_enable && hisi_hba->debugfs_itct && - !hisi_hba->debugfs_dump_dentry) - queue_work(hisi_hba->wq, &hisi_hba->debugfs_work); - if (!hisi_hba->hw->soft_reset) return -EINVAL;
if (test_and_set_bit(HISI_SAS_RESET_BIT, &hisi_hba->flags)) return -EPERM;
+ if (hisi_sas_debugfs_enable && hisi_hba->debugfs_itct && + !hisi_hba->debugfs_dump_dentry) + hisi_hba->hw->debugfs_work_handler(&hisi_hba->debugfs_work); + dev_info(dev, "controller resetting...\n"); hisi_sas_controller_reset_prepare(hisi_hba);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 9ce1177a8e455..0e4cc16e542d6 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -3332,6 +3332,7 @@ static const struct hisi_sas_hw hisi_sas_v3_hw = { .snapshot_restore = debugfs_snapshot_restore_v3_hw, .set_bist = debugfs_set_bist_v3_hw, .get_managed_irq_aff = get_managed_irq_aff_v3_hw, + .debugfs_work_handler = hisi_sas_debugfs_work_handler, };
static struct Scsi_Host *
From: Laibin Qiu qiulaibin@huawei.com
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4FS3G?from=project-issue
CVE: NA
---------------------------
There are some language problems in the README file, and its Markdown format syntax is not effective, so it needs to be adjusted.
Signed-off-by: suqin suqin2@huawei.com Signed-off-by: Laibin Qiu qiulaibin@huawei.com Reviewed-by: Cheng Jian cj.chengjian@huawei.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com --- README | 226 --------------------------------------------------- README.md | 237 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 237 insertions(+), 226 deletions(-) delete mode 100644 README create mode 100644 README.md
diff --git a/README b/README deleted file mode 100644 index 46c9ea3522c1c..0000000000000 --- a/README +++ /dev/null @@ -1,226 +0,0 @@ -Contributions to openEuler kernel project -========================================= - -Sign CLA --------- - -Before submitting any Contributions to openEuler, you have to sign CLA. - -See: - https://openeuler.org/zh/cla.html - https://openeuler.org/en/cla.html - -Steps of submitting patches ---------------------------- - -1. Compile and test your patches successfully. -2. Generate patches - Your patches should be based on top of latest openEuler branch, and should - use git-format-patch to generate patches, and if it's a patchset, it's - better to use --cover-letter option to describe what the patchset does. - - Using scripts/checkpatch.pl to make sure there's no coding style issue. - - And make sure your patch follow unified openEuler patch format describe - below. - -3. Send patch to openEuler mailing list - Use this command to send patches to openEuler mailing list: - - git send-email *.patch -to="kernel@openeuler.org" --suppress-cc=all - - *NOTE*: that you must add --suppress-cc=all if you use git send-email, - otherwise the email will be cced to the people in upstream community and mailing - lists. - - *See*: How to send patches using git-send-email - https://git-scm.com/docs/git-send-email - -4. Mark "v1, v2, v3 ..." in your patch subject if you have multiple versions - to send out. - - Use --subject-prefix="PATCH v2" option to add v2 tag for patchset. - git format-patch --subject-prefix="PATCH v2" -1 - - Subject examples: - Subject: [PATCH v2 01/27] fork: fix some -Wmissing-prototypes warnings - Subject: [PATCH v3] ext2: improve scalability of bitmap searching - -5. Upstream your kernel patch to kernel community is strongly recommended. - openEuler will sync up with kernel master timely. - -6. Sign your work - the Developer’s Certificate of Origin - As the same of upstream kernel community, you also need to sign your patch. 
- - See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html - - The sign-off is a simple line at the end of the explanation for the patch, - which certifies that you wrote it or otherwise have the right to pass it - on as an open-source patch. The rules are pretty simple: if you can certify - the below: - - Developer’s Certificate of Origin 1.1 - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - - By making a contribution to this project, I certify that: - - (a) The contribution was created in whole or in part by me and I have - the right to submit it under the open source license indicated in - the file; or - - (b The contribution is based upon previous work that, to the best of - my knowledge, is covered under an appropriate open source license - and I have the right under that license to submit that work with - modifications, whether created in whole or in part by me, under - the same open source license (unless I am permitted to submit under - a different license), as indicated in the file; or - - (c) The contribution was provided directly to me by some other person - who certified (a), (b) or (c) and I have not modified it. - - (d) I understand and agree that this project and the contribution are - public and that a record of the contribution (including all personal - information I submit with it, including my sign-off) is maintained - indefinitely and may be redistributed consistent with this project - or the open source license(s) involved. - - then you just add a line saying: - - Signed-off-by: Random J Developer random@developer.example.org - - using your real name (sorry, no pseudonyms or anonymous contributions.) - -Use unified patch format ------------------------- - -Reasons: - -1. long term maintainability - openEuler will merge massive patches. If all patches are merged by casual - changelog format without a unified format, the git log will be messy, and - then it's hard to figure out the original patch. - -2. 
kernel upgrade - We definitely will upgrade our openEuler kernel in someday, using strict - patch management will alleviate the pain to migrate patches during big upgrade. - -3. easy for script parsing - Keyword highlighting is necessary for script parsing. - -Patch format definition ------------------------ - -[M] stands for "mandatory" -[O] stands for "option" -$category can be: bug preparation, bugfix, perf, feature, doc, other... - -If category is feature, then we also need to add feature name like below: - category: feature - feature: YYY (the feature name) - -If the patch is related to CVE or bugzilla, then we need add the corresponding -tag like below (In general, it should include at least one of the following): - CVE: $cve-id - bugzilla: $bug-id - -Additional changelog should include at least one of the flollwing: - 1) Why we should apply this patch - 2) What real problem in product does this patch resolved - 3) How could we reproduce this bug or how to test - 4) Other useful information for help to understand this patch or problem - -The detail information is very useful for porting patch to another kenrel branch. - -Example for mainline patch: - - mainline inclusion [M] - from $mainline-version [M] - commit $id [M] - category: $category [M] - bugzilla: $bug-id [O] - CVE: $cve-id [O] - - additional changelog [O] - - -------------------------------- - - original changelog - - Signed-off-by: $yourname $yourname@huawei.com [M] - - ($mainline-version could be mainline-3.5, mainline-3.6, etc...) - -Examples --------- - -mainline inclusion -from mainline-4.10 -commit 0becc0ae5b42828785b589f686725ff5bc3b9b25 -category: bugfix -bugzilla: 3004 -CVE: NA - -The patch fixes a BUG_ON in the product: injecting single bit ECC error -to memory before system boot use hardware inject tools, which cause a -large amount of CMCI during system booting . 
- -[ 1.146580] mce: [Hardware Error]: Machine check events logged -[ 1.152908] ------------[ cut here ]------------ -[ 1.157751] kernel BUG at kernel/timer.c:951! -[ 1.162321] invalid opcode: 0000 [#1] SMP -... - -------------------------------------------------- - -original changelog - -<original S-O-B> -Signed-off-by: Zhang San zhangsan@huawei.com -Tested-by: Li Si lisi@huawei.com - -Email Client - Thunderbird Settings ------------------------------------ - -If you are newly developer in the kernel community, it is highly recommended -to use thunderbird mail client. - -1. Thunderbird Installation - Get English version Thunderbird from http://www.mozilla.org/ and install - it on your system。 - - Download url: https://www.thunderbird.net/en-US/thunderbird/all/ - -2. Settings - 2.1 Use plain text format instead of HTML format - Options -> Account Settings -> Composition & Addressing, do *NOT* select - "Compose message in HTML format". - - 2.2 Editor Settings - Tools->Options->Advanced->Config editor. - - - To bring up the thunderbird's registry editor, and set: - "mailnews.send_plaintext_flowed" to "false". - - Disable HTML Format: Set "mail.identity.id1.compose_html" to "false". - - Enable UTF8: Set "prefs.converted-to-utf8" to "true". - - View message in UTF-8: Set "mailnews.view_default_charset" to "UTF-8". - - Set mailnews.wraplength to 9999 for avoiding auto-wrap - -Linux kernel -============ - -There are several guides for kernel developers and users. These guides can -be rendered in a number of formats, like HTML and PDF. Please read -Documentation/admin-guide/README.rst first. - -In order to build the documentation, use ``make htmldocs`` or -``make pdfdocs``. The formatted documentation can also be read online at: - - https://www.kernel.org/doc/html/latest/ - -There are various text files in the Documentation/ subdirectory, -several of them using the Restructured Text markup notation. -See Documentation/00-INDEX for a list of what is contained in each file. 
- -Please read the Documentation/process/changes.rst file, as it contains the -requirements for building and running the kernel, and information about -the problems which may result by upgrading your kernel. diff --git a/README.md b/README.md new file mode 100644 index 0000000000000..20832fd85d356 --- /dev/null +++ b/README.md @@ -0,0 +1,237 @@ +# How to Contribute +------- + +- [How to Contribute](#How to Contribute) + + - [Sign the CLA](#Sign the CLA) + + - [Steps of submitting patches](#Steps of submitting patches) + + - [Use the unified patch format](#Use the unified patch format) + + - [Define the patch format](#Define the patch format) + + - [Examples](#Examples) + + - [Email client - Thunderbird settings](#Email client - Thunderbird settings) + +- [Linux kernel](#Linux kernel) + +### Sign the CLA + +------- + +Before making any contributions to openEuler, sign the CLA first. + +Address: [https://openeuler.org/en/cla.html%5D(https://openeuler.org/en/cla.html) + +### Steps of submitting patches +------- + +**Step 1** Compile and test your patches. + +**Step 2** Generate patches. + +Your patches should be generated based on the latest openEuler branch using git-format-patch. If your patches are in a patchset, it is better to use the **--cover-letter** option to describe what the patchset does. + +Use **scripts/checkpatch.pl** to ensure that no coding style issue exists. + +In addition, ensure that your patches comply with the unified openEuler patch format described below. + +**Step 3** Send your patches to the openEuler mailing list. + +To do so, run the following command: + + `git send-email *.patch -to="kernel@openeuler.org" --suppress-cc=all` + +*NOTE*: Add **--suppress-cc=all** if you use git-send-email; otherwise, the email will be copied to all people in the upstream community and mailing lists. + +For details about how to send patches using git-send-email, see [https://git-scm.com/docs/git-send-email%5D(https://git-scm.com/docs/git-send...). 
+ +**Step 4** Mark "v1, v2, v3 ..." in your patch subject if you have multiple versions to send out. + +Use the **--subject-prefix="PATCH v2"** option to add the v2 tag to the patchset. + + `git format-patch --subject-prefix="PATCH v2" -1` + +Subject examples: + + Subject: [PATCH v2 01/27] fork: fix some -Wmissing-prototypes warnings + + Subject: [PATCH v3] ext2: improve scalability of bitmap searching + +**Step 5** Upstream your kernel patches to the kernel community (recommended). openEuler will synchronize with the kernel master in a timely manner. + +**Step 6** Sign your work - the Developer’s Certificate of Origin. + + Similar to the upstream kernel community, you also need to sign your patch. + + For details, see [https://www.kernel.org/doc/html/latest/process/submitting-patches.html%5D(ht...). + + The sign-off is a simple line at the end of the explanation of the patch, which certifies that you wrote it or otherwise have the right to pass it on as an open source patch. The rules are pretty simple. You can certify as below: + + Developer’s Certificate of Origin 1.1 + + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + By making a contribution to this project, I certify that: + + (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; + + (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; + + (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. 
+ + (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. + +Then you add a line saying: + +Signed-off-by: Random J Developer random@developer.example.org + +Use your real name (sorry, no pseudonyms or anonymous contributions). + +### Use the unified patch format +------- + +Reasons: + +1. Long term maintainability + + openEuler will merge massive patches. If all patches are merged by casual + + changelog formats without a unified format, the git logs will be messy, and + + then it is hard to figure out the original patches. + +2. Kernel upgrade + + We definitely will upgrade our openEuler kernel in someday, so strict patch management + + will alleviate the pain to migrate patches during big upgrades. + +3. Easy for script parsing + + Keyword highlighting is necessary for script parsing. + +### Define the patch format +------- + +[M] stands for "mandatory". + +[O] stands for "option". + +$category can be: bug preparation, bugfix, perf, feature, doc, other... + +If category is feature, we need to add a feature name as below: + +```cpp +category: feature +feature: YYY (the feature name) +``` + +If the patch is related to CVE or bugzilla, we need to add the corresponding tag as below (In general, it should include at least one of the following): + +```cpp +CVE: $cve-id +bugzilla: $bug-id +``` + +Additional changelog should include at least one of the following: + +1. Why we should apply this patch + +2. What real problems in the product does this patch resolved + +3. How could we reproduce this bug or how to test + +4. Other useful information for help to understand this patch or problem + +The detailed information is very useful for migrating a patch to another kernel branch. 
+
+Example for mainline patch:
+
+```cpp
+mainline inclusion [M]
+from $mainline-version [M]
+commit $id [M]
+category: $category [M]
+bugzilla: $bug-id [O]
+CVE: $cve-id [O]
+
+additional changelog [O]
+
+--------------------------------
+
+original changelog
+Signed-off-by: $yourname $yourname@huawei.com [M]
+($mainline-version could be mainline-3.5, mainline-3.6, etc...)
+```
+
+### Examples
+-------
+
+```cpp
+mainline inclusion
+from mainline-4.10
+commit 0becc0ae5b42828785b589f686725ff5bc3b9b25
+category: bugfix
+bugzilla: 3004
+CVE: N/A
+
+The patch fixes a BUG_ON in the product: Injecting a single bit ECC error to the memory before system boot using hardware inject tools will cause a large amount of CMCI during system booting.
+[ 1.146580] mce: [Hardware Error]: Machine check events logged
+[ 1.152908] ------------[ cut here ]------------
+[ 1.157751] kernel BUG at kernel/timer.c:951!
+[ 1.162321] invalid opcode: 0000 [#1] SMP
+
+-------------------------------------------------
+
+original changelog
+
+<original S-O-B>
+Signed-off-by: Zhang San zhangsan@huawei.com
+Tested-by: Li Si lisi@huawei.com
+```
+
+### Email client - Thunderbird settings
+-------
+
+If you are a new developer in the kernel community, it is highly recommended that you use the Thunderbird mail client.
+
+1. Thunderbird Installation
+
+ Obtain the English version of Thunderbird from [http://www.mozilla.org/](http://www.mozilla.org/) and install it on your system.
+
+ Download URL: https://www.thunderbird.net/en-US/thunderbird/all/
+
+2. Settings
+
+ 2.1 Use the plain text format instead of the HTML format.
+
+ Choose **Options > Account Settings > Composition & Addressing**, and do **NOT** select Compose message in HTML format.
+
+ 2.2 Editor settings
+
+ Bring up Thunderbird's registry editor via **Tools > Options > Advanced > Config editor**, then apply the following settings:
+
+ - Disable format=flowed: Set **mailnews.send_plaintext_flowed** to **false**.
+
+ - Disable HTML Format: Set **mail.identity.id1.compose_html** to **false**.
+
+ - Enable UTF-8: Set **prefs.converted-to-utf8** to **true**.
+
+ - View messages in UTF-8: Set **mailnews.view_default_charset** to **UTF-8**.
+
+ - Set **mailnews.wraplength** to **9999** to avoid auto-wrap.
+
+# Linux kernel
+-------
+
+There are several guides for kernel developers and users, which can be rendered in a number of formats, like HTML and PDF. You can read **Documentation/admin-guide/README.rst** first.
+
+To build the documentation, use **make htmldocs** or **make pdfdocs**. The formatted documentation can also be read online at: https://www.kernel.org/doc/html/latest/
+
+There are various text files in the Documentation/ subdirectory, several of which use the reStructuredText markup notation. See Documentation/00-INDEX for a list of what is contained in each file.
+
+Read the **Documentation/process/changes.rst** file, as it contains the requirements for building and running the kernel, and information about the problems that may be caused by upgrading your kernel.
+