From: lipeng (Y) Sent: Sunday, April 25, 2021 2:46 AM
在 2021/4/24 22:51, Salil Mehta 写道:
The fallback logic of the set channels when set_channel() fails to configure TX Sched/RSS H/W configuration or for the code which brings down/restores client before/therafter is not handled properly.
I do not really get the problem here, what is not proper.
I should admit that I had loosely put the subject and the matter in earlier RFC patch I shared and I copied it here the same. I need to update the subject and also perhaps re-word it to clarify what is it is precisely doing before finally sending it. Perhaps I will be breaking this patch into further smaller changes and sending it to 'net-next' instead of 'net' repo.
For now, I have explained the problems and inconsistencies in their handling below.
before/therafter --- is any mistake here?
Above is what you call 'change' and the code in the patch reflects it.
Multiple problems: 1. Logic used during recovery is inconsistent. For example, below are the handling cases from the existing/previous code:
Case 1: int hns3_set_channels(struct net_device *netdev, struct ethtool_channels *ch) { [...]
ret = hns3_reset_notify(h, HNAE3_DOWN_CLIENT); if (ret) return ret;
ret = hns3_reset_notify(h, HNAE3_UNINIT_CLIENT); if (ret) return ret;
Problem 1: Here above, we don’t care to handle the error properly or even print the message to the user what caused an error. In this case, user will find the net_device in *down* state after failure to configure the specified number of channels.
Resolution: If we cannot do much with the returned error we should at least intimate Or instruct user what to do to get out of the state.
Problem 2: If we have intentionally fabricated hns3_reset_notify() API to necessarily return errors for all types maybe because we think operations like HNAE3_UNINIT_CLIENT can return error then we cannot assume after this error device will be in sane state.
Resolution: This then again requires user to be intimated about this and user can either reload the driver or reset.
Problem 3: the return types of hns3_reset_notify() for the HNAE3_{DOWN, UINIT}_CLIENT are not . Ideally they should be *void* since netif_tx_stop_all_queues() and netif_carrier_off() are void.
Resolution: If we need to solve this then we require separate patch and should follow this patch. Ideally, none of the operations HNAE3_{DOWN, UINIT}_CLIENT should return error.
Problem 4: Client down and restore logic is spanning across functions i.e. It is brought *down* within hns3_set_channels() and then is restored inside function hns3_change_channels() but then half of the error handling is being done inside the hns3_change_channels() and then other half of restoring to the old TPQ is being initiated from hns3_set_channels(). All of these are against the principles of good "structured programming".
The functions naming and prototype should be such that one does not have to read the entire code inside always. It should do to the entirety (along with any fallbacks/reverts inside) what its name suggest i.e. almost self-contained - all related code inside it!
"Structured Programming" is a paradigm just like "Object-Oriented Programming"
Resolution: Either entire logic including the error handling/fallback should exist within the hns3_set_channels() OR we should make the function hns3_change_channels() self-contained. This patch does later.
[...] }
Case 2: static int hns3_change_channels(struct hnae3_handle *handle, u32 new_tqp_num, bool rxfh_configured) { int ret;
ret = handle->ae_algo->ops->set_channels(handle, new_tqp_num, rxfh_configured); if (ret) { dev_err(&handle->pdev->dev, "Change tqp num(%u) fail.\n", new_tqp_num); return ret; }
Problem 5: If this happens multiple times then earlier code just returns with some error to the user not indicating that this device now is not in a functional state. Hence, user will need to reset or reload the driver.
Handling in hns3_set_channels(): if (ret1) { netdev_err(netdev, "revert to old channel fail\n"); return ret1;
After above, user will find that net device is either missing or not functional.
Resolution: Indicate to the user clearly what went wrong and how to come out of it. This is part and parcel of the handling of errors and fallback logic. This patch does that.
Problem 6: After HNAE3_UP_CLIENT fails(which ideally should never) the error handling is to reverse the earlier successful initialization of the client. Why?
If this happens
ret = hns3_reset_notify(handle, HNAE3_INIT_CLIENT); if (ret) return ret;
ret = hns3_reset_notify(handle, HNAE3_UP_CLIENT); if (ret) hns3_reset_notify(handle, HNAE3_UNINIT_CLIENT);
return ret; }
1. If above operation also fails why should handling be different than when "ifconfig <netdev> up" fails(if ever)? 2. Could we not just ask user to attempt to UP the net_device manually? OR 3. Request user to reset the device or reload the driver if he faces further failure?
Resolution: Just intimate user with appropriate error and advice to reset or reload the driver if further error persists. This is the approach being taken in the patch.
Thanks Salil.
Fix and refactor the code for handling the errors properly and to improve the readibility.
Fixes: 3a5a5f06d4d2 ("net: hns3: revert to old channel when setting new channel
num fail")
Signed-off-by: Salil Mehta salil.mehta@huawei.com Signed-off-by: Peng Li lipeng321@huawei.com
.../net/ethernet/hisilicon/hns3/hns3_enet.c | 77 +++++++++++-------- 1 file changed, 47 insertions(+), 30 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index bf4302a5cf95..fbb0f4c9b98e 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -4692,24 +4692,60 @@ static int hns3_reset_notify(struct hnae3_handle
*handle,
static int hns3_change_channels(struct hnae3_handle *handle, u32
new_tqp_num,
bool rxfh_configured)
{
- int ret;
- const struct hnae3_ae_ops *ops = handle->ae_algo->ops;
- u32 org_tqp_num = handle->kinfo.num_tqps;
- struct device *dev = &handle->pdev->dev;
- u32 req_tqp_num = new_tqp_num;
- bool revert_old_config = false;
- int ret, retval = 0;
- /* bring down the client */
- ret = hns3_reset_notify(handle, HNAE3_DOWN_CLIENT);
- if (ret) {
dev_err(dev, "client down fail, this should'nt have happened!\n");
return ret;
- }
- ret = handle->ae_algo->ops->set_channels(handle, new_tqp_num,
rxfh_configured);
- ret = hns3_reset_notify(handle, HNAE3_UNINIT_CLIENT); if (ret) {
dev_err(&handle->pdev->dev,
"Change tqp num(%u) fail.\n", new_tqp_num);
return ret; }dev_err(dev, "client uninit fail, this should'nt have happened!\n");
+revert_old_tpqs_config:
- /* update the TX Sched and RSS config in the H/W */
- ret = ops->set_channels(handle, req_tqp_num, rxfh_configured);
- if (ret) {
dev_err(dev, "TX Sched/RSS H/W cfg fail(=%d) for %s TPQs\n",
ret, revert_old_config ? "old" : "new");
goto err_set_channel;
- }
- /* restore the client */ ret = hns3_reset_notify(handle, HNAE3_INIT_CLIENT);
- if (ret)
return ret;
if (ret) {
dev_err(dev, "failed to initialize the client again\n");
goto err_set_channel;
}
ret = hns3_reset_notify(handle, HNAE3_UP_CLIENT);
- if (ret)
hns3_reset_notify(handle, HNAE3_UNINIT_CLIENT);
- if (ret) {
dev_err(dev, "Client up fail, this should'nt have happened!\n");
return ret;
- }
- return retval;
+err_set_channel:
if (!revert_old_config) {
dev_warn(dev, "Revert TX Sched/RSS H/W config with old TPQs\n");
req_tqp_num = org_tqp_num;
revert_old_config = true;
retval = ret;
goto revert_old_tpqs_config;
}
dev_err(dev, "Bad, we could'nt revert to old TPQ H/W config\n");
dev_warn(dev, "Device maybe insane. Reload driver/Reset required!\n"); return ret; }
@@ -4720,7 +4756,6 @@ int hns3_set_channels(struct net_device *netdev, struct hnae3_knic_private_info *kinfo = &h->kinfo; bool rxfh_configured = netif_is_rxfh_configured(netdev); u32 new_tqp_num = ch->combined_count;
u16 org_tqp_num; int ret;
if (hns3_nic_resetting(netdev))
@@ -4750,28 +4785,10 @@ int hns3_set_channels(struct net_device *netdev, "set channels: tqp_num=%u, rxfh=%d\n", new_tqp_num, rxfh_configured);
- ret = hns3_reset_notify(h, HNAE3_DOWN_CLIENT);
- if (ret)
return ret;
- ret = hns3_reset_notify(h, HNAE3_UNINIT_CLIENT);
- if (ret)
return ret;
- org_tqp_num = h->kinfo.num_tqps; ret = hns3_change_channels(h, new_tqp_num, rxfh_configured); if (ret) {
int ret1;
netdev_warn(netdev,
"Change channels fail, revert to old value\n");
ret1 = hns3_change_channels(h, org_tqp_num, rxfh_configured);
if (ret1) {
netdev_err(netdev,
"revert to old channel fail\n");
return ret1;
}
netdev_err(netdev, "fail(=%d) to set number of channels to %u\n", ret,
return ret; }new_tqp_num);