On 2021/6/8 3:46, Jakub Kicinski wrote:
On Mon, 7 Jun 2021 09:36:38 +0800 Yunsheng Lin wrote:
On 2021/6/5 2:41, Jakub Kicinski wrote:
On Fri, 4 Jun 2021 09:18:04 +0800 Yunsheng Lin wrote:
My initial thinking is an id from a global IDA pool, which indeed may change on every boot.
I have not thought much deeper about the controller id; I was just mirroring the bus identifiers for PCIe devices and ifindex for netdevs,
devlink instance id seems fine, but there's already a controller concept in devlink so please steer clear of that naming.
I am not sure whether the existing controller concept can be reused for the problem of one devlink instance representing multiple functions that share common resources in the same ASIC. If not, we do need to pick another name.
Another thing I have not really thought through is how the VF is represented by the devlink instance when the VF is passed through to a VM. I was thinking the VF is represented as a devlink port, just like the PF (with a different port flavour), and that the VF devlink port only exists on the same host as the PF (which assumes the PF is never passed through to a VM), so it may mean the PF is responsible for creating the devlink port for the VF when the VF is passed through to a VM?
Or do we need to create a devlink instance for the VF in the VM too when the VF is passed through to a VM? Or more specifically, does the user need to query or configure devlink info or configuration in a VM? If not, then a devlink instance in the VM seems unnecessary?
I believe the current best practice is to create a devlink instance for the VF with a devlink port of type "virtual". Such instance represents a "virtualized" view of the device.
After discussion with Parav in the other thread, I understood it was the current practice, but I am not sure I understand why it is the current *best* practice.
If we allow all PFs of an ASIC to register to the same devlink instance, does it not make sense that all VFs under one PF also register to the same devlink instance that their PF registers to when they are in the same host?
For eswitch legacy mode, whether the VF and PF are in the same host or not, the VF can also provide the serial number of the ASIC to register to the devlink instance; if that devlink instance does not exist yet, just create it according to the serial number, just like the PF does.
For eswitch DEVLINK_ESWITCH_MODE_SWITCHDEV mode, the flavour of the devlink port instance representing the netdev of the VF function is FLAVOUR_VIRTUAL, and the flavour of the devlink port instance representing the representor netdev of the VF is FLAVOUR_PCI_VF. These are different types, so can they register to the same devlink instance even when both devlink port instances are in the same host?
Is there any reason why VF use its own devlink instance?
which may change too if the device is plugged into a different PCI slot on every boot?
Heh. What if someone reflashes the part to change its serial number? :) PCI slot is reasonably stable, as proven by years of experience trying to find stable naming for netdevs.
I suppose that requires a reboot to take effect and a vendor tool to reflash the serial number. It seems reasonable that the vendor/user will try their best not to mess with the serial number, otherwise service and maintenance based on the serial number will not work? I was thinking about adding the vendor name alongside the serial number to identify a devlink instance, to avoid the case where two pieces of hardware from different vendors accidentally have the same serial number.
I'm not opposed to the use of attributes such as serial number for selecting instance, in principle. I was just trying to prove that PCI slot/PCI device name is as stable as any other attribute.
In fact for mass-produced machines using PCI slot is far more convenient than globally unique identifiers because it can be used to talk to a specific device in a server for all machines of a given model, hence easing automation.
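To make the automation argument concrete, here is a minimal sketch. The fleet, host names and serial numbers are made up for illustration; only the pci/DDDD:BB:DD.F handle format is taken from the examples in this thread. It shows that one handle addresses the device on every machine of a model, while serial numbers differ per board.

```python
def devlink_handle(domain: int, bus: int, device: int, function: int) -> str:
    """Format a devlink device handle of the form pci/DDDD:BB:DD.F."""
    return f"pci/{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


# Hypothetical fleet: same server model, so the NIC sits in the same slot,
# but every board carries its own serial number.
fleet = {
    "host-a": {"slot": (0, 0x06, 0x00, 0), "serial": "SN-0001"},
    "host-b": {"slot": (0, 0x06, 0x00, 0), "serial": "SN-0002"},
}

# One handle per host, derived from the slot only.
handles = {host: devlink_handle(*info["slot"]) for host, info in fleet.items()}
serials = {host: info["serial"] for host, info in fleet.items()}

# A single script can use the same handle everywhere; a serial-based
# identifier would need a per-host lookup first.
shared_handle = set(handles.values())
```

With this layout a provisioning script can hard-code the handle once per server model, which is the convenience described above.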
Makes sense.
We could still allow devlink instances to have multiple names, which seems to be more like devlink tool problem?
For example, devlink tool could use the id or the vendor_info/ serial_number to indicate a devlink instance according to user's request.
Typing serial numbers seems pretty painful.
Aliases could be allowed too, as long as devlink core provides a field and ops to set/get that field, mirroring ifalias for netdevices?
I don't understand.
I meant we could still allow the user to provide a more meaningful name to indicate a devlink instance besides the id.
To clarify/summarize my statement above serial number may be a useful addition but PCI device names should IMHO remain the primary identifiers, even if it means devlink instances with multiple names.
I am not sure I understand what "devlink instances with multiple names" means?
Does that mean whenever a devlink port instance is registered to a devlink instance, that devlink instance gets a new name according to the PCI device which the just-registered devlink port instance corresponds to?
In addition I don't think that user-controlled names/aliases are necessarily a great idea for devlink.
On Tue, 8 Jun 2021 20:10:37 +0800 Yunsheng Lin wrote:
Is there any reason why VF use its own devlink instance?
Primary use case for VFs is virtual environments where guest isn't trusted, so tying the VF to the main devlink instance, over which guest should have no control is counter productive.
I meant we could still allow the user to provide a more meaningful name to indicate a devlink instance besides the id.
To clarify/summarize my statement above serial number may be a useful addition but PCI device names should IMHO remain the primary identifiers, even if it means devlink instances with multiple names.
I am not sure I understand what does it mean by "devlink instances with multiple names"?
Does that mean whenever a devlink port instance is registered to a devlink instance, that devlink instance get a new name according to the PCI device which the just registered devlink port instance corresponds to?
Not devlink port, new PCI device. Multiple ports may reside on the same PCI function, some ports don't have a function (e.g. Ethernet ports).
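A small model may help picture this. It is only a sketch: the class and field names are hypothetical and this is not how devlink core stores anything. The instance gains a name per PCI device, while ports are tracked separately and may or may not belong to a PCI function.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    index: int
    flavour: str                        # e.g. "physical", "pci_pf", "pci_vf"
    pci_function: Optional[str] = None  # None: port has no PCI function


@dataclass
class DevlinkInstance:
    names: List[str] = field(default_factory=list)
    ports: List[Port] = field(default_factory=list)

    def add_pci_device(self, bdf: str) -> None:
        """A new PCI device adds one more name to the instance."""
        name = f"pci/{bdf}"
        if name not in self.names:
            self.names.append(name)

    def add_port(self, port: Port) -> None:
        """Registering a port never changes the instance names."""
        self.ports.append(port)


inst = DevlinkInstance()
inst.add_pci_device("0000:05:00.0")
inst.add_pci_device("0000:05:00.1")
inst.add_port(Port(0, "physical"))                # Ethernet port, no function
inst.add_port(Port(1, "pci_pf", "0000:05:00.0"))  # two ports on one function
inst.add_port(Port(2, "pci_vf", "0000:05:00.0"))
```

In this model the instance ends up with two names and three ports, so the number of names follows the PCI devices, not the ports.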
On 2021/6/9 1:29, Jakub Kicinski wrote:
On Tue, 8 Jun 2021 20:10:37 +0800 Yunsheng Lin wrote:
Is there any reason why VF use its own devlink instance?
Primary use case for VFs is virtual environments where guest isn't trusted, so tying the VF to the main devlink instance, over which guest should have no control is counter productive.
The security concern is mainly about the case of a VF used in a container, right? When a VF is used in a VM, it is a different host, which means a different devlink instance for the VF, so there is no security issue in the VM case? But that might not hold for a VF used in a container?
Also, I read the devlink discussion between you and jiri in [1]: "I think we agree that all objects of an ASIC should be under one devlink instance, the question remains whether both ends of the pipe for PCI devices (subdevs or not) should appear under ports or does the "far end" (from ASICs perspective)/"host end" get its own category."
I am not sure if there is already any conclusion about the latter part (I did not find the conclusion in that thread)?
"far end" (from ASICs perspective)/"host end" means PF/VF, right? Which seems to correspond to port flavours FLAVOUR_PHYSICAL and FLAVOUR_VIRTUAL if we try to represent PF/VF using devlink port instances?
It seems the conclusion is very important to our discussion in this thread, as we are trying to represent PF/VF as devlink port instances here (at least that is what I think; hns3 does not support eswitch SWITCHDEV mode yet).
Also, there is a "switch_id" concept in jiri's example, which seems not to be implemented yet?
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
1. https://lore.kernel.org/netdev/20190304164007.7cef8af9@cakuba.netronome.com/...
I meant we could still allow the user to provide a more meaningful name to indicate a devlink instance besides the id.
To clarify/summarize my statement above serial number may be a useful addition but PCI device names should IMHO remain the primary identifiers, even if it means devlink instances with multiple names.
I am not sure I understand what does it mean by "devlink instances with multiple names"?
Does that mean whenever a devlink port instance is registered to a devlink instance, that devlink instance get a new name according to the PCI device which the just registered devlink port instance corresponds to?
Not devlink port, new PCI device. Multiple ports may reside on the same PCI function, some ports don't have a function (e.g. Ethernet ports).
Multiple ports on the same PCI function mainly means subfunctions from mlx, right?
"some ports don't have a function (e.g. Ethernet ports)" does not seem to exist yet? For now a devlink port instance of FLAVOUR_PHYSICAL represents both the PF and the Ethernet port?
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 2:46 PM
[..]
Is there any reason why VF use its own devlink instance?
Primary use case for VFs is virtual environments where guest isn't trusted, so tying the VF to the main devlink instance, over which guest should have no control is counter productive.
The security is mainly about VF using in container case, right? Because VF using in VM, it is different host, it means a different devlink instance for VF, so there is no security issue for VF using in VM case? But it might not be the case for VF using in container?
Devlink instance has net namespace attached to it controlled using devlink reload command. So a VF devlink instance can be assigned to a container/process running in a specific net namespace.
$ ip netns add n1
$ devlink dev reload pci/0000:06:00.4 netns n1
                     ^^^^^^^^^^^^^^^^
                     PCI VF/PF/SF.
Also, there is a "switch_id" concept from jiri's example, which seems to be not implemented yet?
switch_id is present for switch ports in [1] and documented in [2].
[1] /sys/class/net/representor_netdev/phys_switch_id
[2] https://www.kernel.org/doc/Documentation/networking/switchdev.txt "Switch ID"
On 2021/6/9 17:38, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 2:46 PM
[..]
Is there any reason why VF use its own devlink instance?
Primary use case for VFs is virtual environments where guest isn't trusted, so tying the VF to the main devlink instance, over which guest should have no control is counter productive.
The security is mainly about VF using in container case, right? Because VF using in VM, it is different host, it means a different devlink instance for VF, so there is no security issue for VF using in VM case? But it might not be the case for VF using in container?
Devlink instance has net namespace attached to it controlled using devlink reload command. So a VF devlink instance can be assigned to a container/process running in a specific net namespace.
$ ip netns add n1
$ devlink dev reload pci/0000:06:00.4 netns n1
                     ^^^^^^^^^^^^^^^^
                     PCI VF/PF/SF.
Could we create another devlink instance when the net namespace of a devlink port instance is changed? It seems we would need to change the net namespace per devlink port instance instead of per devlink instance. This way the container case would be similar to the VM case?
Also, there is a "switch_id" concept from jiri's example, which seems to be not implemented yet?
switch_id is present for switch ports in [1] and documented in [2].
[1] /sys/class/net/representor_netdev/phys_switch_id
[2] https://www.kernel.org/doc/Documentation/networking/switchdev.txt "Switch ID"
Thanks for the info. I suppose we could use "switch_id" to identify an eswitch, since "switch_id is present for switch ports"? Where does the "switch_id" of a switch port come from? Is it from FW, or does the driver generate it?
Is there any rule for "switch_id"? Or is it vendor specific?
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 4:35 PM
On 2021/6/9 17:38, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 2:46 PM
[..]
Could we create another devlink instance when the net namespace of devlink port instance is changed?
Net namespace of (a) netdevice (b) rdma device (c) devlink instance can be changed. Net namespace of devlink port cannot be changed.
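The granularity described here can be pictured with a toy model. This is a sketch with hypothetical class names, not kernel code: the namespace is stored on the devlink instance and on the netdevice, and a port simply reports the namespace of the instance that owns it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Netdev:
    name: str
    netns: str = "init_net"   # a netdevice can be moved on its own


@dataclass
class DevlinkInstance:
    handle: str
    netns: str = "init_net"
    ports: List[str] = field(default_factory=list)

    def reload(self, netns: str) -> None:
        """Models `devlink dev reload <handle> netns <name>`."""
        self.netns = netns

    def port_netns(self, port: str) -> str:
        # A port has no namespace field of its own; it follows its instance.
        if port not in self.ports:
            raise KeyError(port)
        return self.netns


vf = DevlinkInstance("pci/0000:06:00.4", ports=["pci/0000:06:00.4/0"])
vf.reload("n1")

# The netdevice name below is made up; it only shows that a netdevice
# carries its own namespace, independent of any devlink object.
nd = Netdev("eth0")
nd.netns = "n2"
```

There is deliberately no method to move a single port: in this model, as in the statement above, the only way to change what a port reports is to move the whole instance.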
It seems we would need to change the net namespace per devlink port instance instead of per devlink instance. This way the container case would be similar to the VM case?
I mostly do not understand the topology you have in mind or if you explained previously I missed the thread. In your case what is the flavour of a devlink port?
Also, there is a "switch_id" concept from jiri's example, which seems to be not implemented yet?
switch_id is present for switch ports in [1] and documented in [2].
[1] /sys/class/net/representor_netdev/phys_switch_id
[2] https://www.kernel.org/doc/Documentation/networking/switchdev.txt "Switch ID"
Thanks for the info. I suppose we could use "switch_id" to identify an eswitch, since "switch_id is present for switch ports"? Where does the "switch_id" of a switch port come from? Is it from FW, or does the driver generate it?
Is there any rule for "switch_id"? Or is it vendor specific?
It should be unique enough, usually generated out of board serial id or other fields such as vendor OUI that makes it fairly unique.
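For readers who want to check this on a running system, a minimal sketch of reading the ID from the sysfs attribute in [1] follows. The helper names and the sysfs_root parameter are assumptions added for illustration and testing; netdevs that do not expose the attribute, or return an error when it is read, are treated as having no switch ID.

```python
from pathlib import Path
from typing import Optional


def read_switch_id(netdev: str, sysfs_root: str = "/sys/class/net") -> Optional[str]:
    """Return the phys_switch_id of a netdev, or None if it has none."""
    path = Path(sysfs_root) / netdev / "phys_switch_id"
    try:
        value = path.read_text().strip()
    except OSError:
        # Attribute missing, or the read is not supported for this netdev.
        return None
    return value or None


def same_switch(a: str, b: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Two netdevs belong to the same switch if both report the same ID."""
    id_a = read_switch_id(a, sysfs_root)
    id_b = read_switch_id(b, sysfs_root)
    return id_a is not None and id_a == id_b
```

Comparing the IDs of two representor netdevs in this way is one plausible method of telling whether they sit on the same eswitch, under the uniqueness assumption stated above.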
On 2021/6/9 19:59, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 4:35 PM
On 2021/6/9 17:38, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 2:46 PM
[..]
Could we create another devlink instance when the net namespace of devlink port instance is changed?
Net namespace of (a) netdevice (b) rdma device (c) devlink instance can be changed. Net namespace of devlink port cannot be changed.
Yes, net namespace is changed based on the devlink instance, not devlink port instance, *right now*.
It seems we would need to change the net namespace per devlink port instance instead of per devlink instance. This way the container case would be similar to the VM case?
I mostly do not understand the topology you have in mind or if you explained previously I missed the thread. In your case what is the flavour of a devlink port?
flavour of the devlink port instance is FLAVOUR_PHYSICAL or FLAVOUR_VIRTUAL.
The reason I suggest changing the net namespace per devlink port instance instead of per devlink instance is: I proposed that all the PFs and VFs in the same ASIC register to the same devlink instance, with flavour FLAVOUR_PHYSICAL or FLAVOUR_VIRTUAL, when they are in the same host and in the same net namespace.
If, when a devlink port instance's net namespace is changed, the VF's devlink port instance is unregistered from the old devlink instance in the old net namespace and registered to a new devlink instance in the new net namespace (creating the new devlink instance if needed), then the security concern mentioned by jakub is no longer an issue?
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 6:00 PM
On 2021/6/9 19:59, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 4:35 PM
On 2021/6/9 17:38, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 2:46 PM
[..]
Could we create another devlink instance when the net namespace of devlink port instance is changed?
Net namespace of (a) netdevice (b) rdma device (c) devlink instance can be changed. Net namespace of devlink port cannot be changed.
Yes, net namespace is changed based on the devlink instance, not devlink port instance, *right now*.
It seems we would need to change the net namespace per devlink port instance instead of per devlink instance. This way the container case would be similar to the VM case?
I mostly do not understand the topology you have in mind or if you explained previously I missed the thread. In your case what is the flavour of a devlink port?
flavour of the devlink port instance is FLAVOUR_PHYSICAL or FLAVOUR_VIRTUAL.
The reason I suggest changing the net namespace per devlink port instance instead of per devlink instance is: I proposed that all the PFs and VFs in the same ASIC register to the same devlink instance, with flavour FLAVOUR_PHYSICAL or FLAVOUR_VIRTUAL, when they are in the same host and in the same net namespace.
If, when a devlink port instance's net namespace is changed, the VF's devlink port instance is unregistered from the old devlink instance in the old net namespace and registered to a new devlink instance in the new net namespace (creating the new devlink instance if needed), then the security concern mentioned by jakub is no longer an issue?
It seems that devlink instance of VF is not needed in your case, and if so what is the motivation to even have VIRTUAL port attach to the PF? If only netdevice of the VF is of interest, it can be assigned to net namespace directly.
It doesn’t make sense to me to create new devlink instance in new net namespace, that also needs to be deleted when net ns is deleted. And pre_exit() routine will mostly deadlock holding global devlink_mutex.
On 2021/6/9 21:45, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 6:00 PM
On 2021/6/9 19:59, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 4:35 PM
On 2021/6/9 17:38, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 2:46 PM
[..]
It seems that devlink instance of VF is not needed in your case, and if so what is the motivation to even have VIRTUAL port attach to the PF?
The devlink instance is mainly used to hold the devlink port instance of the VF. If there is only one VF in the VM, we might still need param/health specific to the VF to be registered to the devlink port instance of that VF.
If only netdevice of the VF is of interest, it can be assigned to net namespace directly.
I think that is another option, if there is nothing in the devlink port instance specific to VF that need exposing to the user in another net namespace.
It doesn’t make sense to me to create new devlink instance in new net namespace, that also needs to be deleted when net ns is deleted. And pre_exit() routine will mostly deadlock holding global devlink_mutex.
Would you be more specific about why there is a deadlock? It seems more of an implementation detail, which we can discuss later, once we agree this is the right direction to explore further?
From: Yunsheng Lin linyunsheng@huawei.com Sent: Thursday, June 10, 2021 12:34 PM
On 2021/6/9 21:45, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 6:00 PM
On 2021/6/9 19:59, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 4:35 PM
On 2021/6/9 17:38, Parav Pandit wrote:
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 2:46 PM
[..]
It seems that devlink instance of VF is not needed in your case, and if so what is the motivation to even have VIRTUAL port attach to the PF?
The devlink instance is mainly used to hold the devlink port instance of the VF. If there is only one VF in the VM, we might still need param/health specific to the VF to be registered to the devlink port instance of that VF.
This will cover things uniformly with/without container or VM.
If only netdevice of the VF is of interest, it can be assigned to net namespace directly.
I think that is another option, if there is nothing in the devlink port instance specific to VF that need exposing to the user in another net namespace.
Yes. no need for devlink instance or devlink port.
It doesn’t make sense to me to create new devlink instance in new net namespace, that also needs to be deleted when net ns is deleted. And pre_exit() routine will mostly deadlock holding global devlink_mutex.
Would you be more specific why there is deadlock?
Net namespace exit routine cannot invoke a devlink API that demands acquiring devlink global mutex.
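The shape of the problem can be illustrated outside the kernel. In the sketch below all names are stand-ins, and it assumes, following the remark about pre_exit() above, that the namespace exit path already holds the global mutex while it walks the instances. A non-recursive lock taken again on the same path can never be acquired; a timeout is used here only so the example terminates instead of hanging.

```python
import threading

# Stand-in for the global, non-recursive devlink mutex.
devlink_mutex = threading.Lock()


def devlink_unregister(name: str, timeout: float = 0.1) -> bool:
    """Stand-in for any devlink API that takes the global mutex.

    Returns False if the mutex could not be taken; without the timeout
    the caller would block forever.
    """
    if not devlink_mutex.acquire(timeout=timeout):
        return False
    try:
        return True
    finally:
        devlink_mutex.release()


def netns_pre_exit(instances):
    """Stand-in for a namespace exit path that already holds the mutex."""
    results = {}
    with devlink_mutex:
        for name in instances:
            # Re-entering the mutex from the same path cannot succeed.
            results[name] = devlink_unregister(name)
    return results


outside = devlink_unregister("pci/0000:06:00.4")
inside = netns_pre_exit(["pci/0000:06:00.4"])
```

Called on its own, the unregister stand-in succeeds; called from inside the exit path, it fails for every instance, which is why deleting a per-namespace devlink instance from that path would need a different locking scheme.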
On Wed, 9 Jun 2021 17:16:06 +0800 Yunsheng Lin wrote:
On 2021/6/9 1:29, Jakub Kicinski wrote:
On Tue, 8 Jun 2021 20:10:37 +0800 Yunsheng Lin wrote:
After discussion with Parav in another thread, I understood it was the current practice, but I am not sure I understand why it is the current *best* practice.
If we allow all PFs of an ASIC to register to the same devlink instance, does it not make sense that all VFs under one PF also register to the same devlink instance that their PF registers to when they are on the same host?
For eswitch legacy mode, whether or not the VF and PF are on the same host, the VF can also provide the serial number of the ASIC to register to the devlink instance; if that devlink instance does not exist yet, just create it according to the serial number, just like the PF does.
For eswitch DEVLINK_ESWITCH_MODE_SWITCHDEV mode, the flavour of the devlink port instance representing the netdev of the VF function is FLAVOUR_VIRTUAL, and the flavour of the devlink port instance representing the representor netdev of the VF is FLAVOUR_PCI_VF. These are different types, so they can register to the same devlink instance even when both devlink port instances are on the same host?
Is there any reason why a VF uses its own devlink instance?
The primary use case for VFs is virtual environments where the guest isn't trusted, so tying the VF to the main devlink instance, over which the guest should have no control, is counterproductive.
The security concern is mainly about the case of a VF used in a container, right? A VF used in a VM is on a different host, which means a different devlink instance for the VF, so there is no security issue in the VM case? But that might not hold for a VF used in a container?
How do you differentiate from the device perspective VF being assigned to the host vs VM? Presumably PFs and VFs have a similar API to talk to the FW, if VF can "join" the devlink instance of the PF that'd suggest to me it has access to privileged FW commands.
Also, I read the devlink discussion between you and Jiri in [1]: "I think we agree that all objects of an ASIC should be under one devlink instance, the question remains whether both ends of the pipe for PCI devices (subdevs or not) should appear under ports or does the "far end" (from ASICs perspective)/"host end" get its own category."
I am not sure if there is already any conclusion about the latter part (I did not find the conclusion in that thread)?
"far end" (from ASICs perspective)/"host end" means PF/VF, right? Which seems to correspond to port flavor of FLAVOUR_PHYSICAL and FLAVOUR_VIRTUAL if we try to represent PF/VF using devlink port instance?
No, no, PHYSICAL is a physical port on the adapter, like an SFP port. There wasn't any conclusion to that discussion. Mellanox views devlink ports as eswitch ports, I view them as device ports which is hard to reconcile.
It seems the conclusion is very important to our discussion in this thread, as we are trying to represent PF/VF as devlink port instances here (at least that is what I think; hns3 does not support eswitch SWITCHDEV mode yet).
Also, there is a "switch_id" concept in Jiri's example, which seems not to be implemented yet?
pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
I am not sure I understand what is meant by "devlink instances with multiple names"?
Does that mean that whenever a devlink port instance is registered to a devlink instance, that devlink instance gets a new name according to the PCI device which the newly registered devlink port instance corresponds to?
Not devlink port, new PCI device. Multiple ports may reside on the same PCI function, some ports don't have a function (e.g. Ethernet ports).
Multiple ports on the same PCI function mainly means subfunctions from mlx, right?
Not necessarily, there are older devices out there (older NFPs, mlx4) which have one PF which is logically divided by the driver to service multiple ports.
“some ports don't have a function (e.g. Ethernet ports)” does not seem to exist yet? For now, a devlink port instance of FLAVOUR_PHYSICAL represents both the PF and the Ethernet port?
It does. I think Mellanox cards are incapable of divorcing PFs from Ethernet ports, but the NFP driver represents the Ethernet port/SFP as one netdev and devlink port (PHYSICAL) and the host port by another netdev and devlink port (PCI_PF). Which allows forwarding frames between PFs and between Ethernet ports directly (again, something not supported efficiently by simpler cards, but supported by NFPs).
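The two port layouts contrasted above can be written down as data. This is a minimal sketch, assuming a local illustrative enum rather than the kernel's `enum devlink_port_flavour`; the netdev names and the `toy_*` identifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local, illustrative mirror of two devlink port flavours; not the kernel header. */
enum toy_flavour {
	TOY_FLAVOUR_PHYSICAL,	/* port on the adapter, e.g. an SFP port */
	TOY_FLAVOUR_PCI_PF,	/* host-facing port for a PCI PF */
};

struct toy_port {
	const char *netdev;	/* hypothetical netdev name */
	enum toy_flavour flavour;
};

/* Split layout as described for NFP: separate netdev and port for the
 * Ethernet port and for the host (PF) side.
 */
static const struct toy_port split_model[] = {
	{ "eth-port0", TOY_FLAVOUR_PHYSICAL },
	{ "pf0-host",  TOY_FLAVOUR_PCI_PF },
};

/* Combined layout: one netdev and port stands for both PF and Ethernet port. */
static const struct toy_port combined_model[] = {
	{ "pf0", TOY_FLAVOUR_PHYSICAL },
};

/* Count the ports of a given flavour in a layout. */
static size_t toy_count(const struct toy_port *ports, size_t n,
			enum toy_flavour flavour)
{
	size_t i, count = 0;

	assert(ports != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (ports[i].flavour == flavour)
			count++;
	return count;
}
```

In the split layout the PF side is visible as its own port, so frames can be described as going PF to PF or Ethernet port to Ethernet port; in the combined layout there is no separate PCI_PF entry to refer to.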
On 2021/6/10 0:40, Jakub Kicinski wrote:
I was thinking that info/param/health specific to a VF is only registered to the devlink port instance of that VF, and the same for resources specific to a PF. It seems params can already be registered either per devlink instance (devlink_params_register()) or per devlink port instance (devlink_port_params_register()).
Only a PF will register the privileged common resource on the devlink instance. We may need to ensure that only one PF registers it (maybe the PF probed first does the registering; I am not sure how to ensure or implement that yet).
When the user accesses the common resource in the devlink instance, I think it is OK to pass the access through any one of the PFs (supposing all PFs are at the same privilege level)?
When the user accesses a resource in a devlink port instance of flavour PHYSICAL/VIRTUAL, the access goes through the specific function (PF/VF) corresponding to that devlink port instance?
When the user accesses a resource in a devlink port instance of flavour PCI_PF/PCI_VF/PCI_SF, the access goes through the function where the eswitch is located?
So if a devlink instance only has devlink port instances of VFs, that devlink instance has no privileged common resource registered, and the user is not able to access the privileged common resource?
If the PF and VF are on the same host and in the same net namespace, I suppose it is OK for the PF and VF to share the same devlink instance with the privileged common resource registered?
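The access routing proposed in the questions above can be sketched as a small decision function. This models the proposal only, not existing devlink behaviour; all `toy_*` names and enum values are hypothetical and local to the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local, illustrative flavours; not the kernel's enum devlink_port_flavour. */
enum toy_port_flavour {
	TOY_PHYSICAL,
	TOY_VIRTUAL,
	TOY_PCI_PF,
	TOY_PCI_VF,
	TOY_PCI_SF,
};

/* Which function would carry the access towards the firmware. */
enum toy_route {
	TOY_VIA_ANY_PF,			/* instance-level common resource */
	TOY_VIA_OWN_FUNCTION,		/* the PF/VF behind the port itself */
	TOY_VIA_ESWITCH_FUNCTION,	/* the function hosting the eswitch */
};

struct toy_request {
	bool port_scoped;		/* false: devlink instance level */
	enum toy_port_flavour flavour;	/* only meaningful when port_scoped */
};

static enum toy_route toy_route_request(const struct toy_request *req)
{
	assert(req != NULL);

	if (!req->port_scoped)
		return TOY_VIA_ANY_PF;

	switch (req->flavour) {
	case TOY_PHYSICAL:
	case TOY_VIRTUAL:
		return TOY_VIA_OWN_FUNCTION;
	case TOY_PCI_PF:
	case TOY_PCI_VF:
	case TOY_PCI_SF:
		return TOY_VIA_ESWITCH_FUNCTION;
	}
	return TOY_VIA_ESWITCH_FUNCTION;	/* not reached for valid flavours */
}
```

Under this model a devlink instance that only holds VIRTUAL ports never resolves to `TOY_VIA_ANY_PF` for port-scoped requests, which matches the intent that a VF-only instance exposes no privileged common resource.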
I suppose eswitch ports only exist when DEVLINK_ESWITCH_MODE_SWITCHDEV mode is enabled, right? Does "Mellanox views devlink ports as eswitch ports" mean that the mlx driver will not create any devlink port instance when DEVLINK_ESWITCH_MODE_LEGACY mode is enabled? That does not seem to be the case any more, because the PF is registered as a devlink port instance of FLAVOUR_PHYSICAL and the VF is registered as a devlink port instance of FLAVOUR_VIRTUAL in mlx5e_devlink_port_register(), unless mlx5e_devlink_port_register() is only called in SWITCHDEV mode too.
From the discussion with Parav in the other thread [1], it seems:
1. Whenever there is a PCIe function (PF/VF, maybe SF too?), there is a devlink instance corresponding to that PCIe function.
2. Whenever there is a netdev (netdev of PF/VF, or representor netdev), there is a devlink port instance corresponding to that netdev.
It seems we only need to change (1) to enable "all objects of an ASIC should be under one devlink instance", as below: whenever there is an ASIC (or switch), there is a devlink instance corresponding to that ASIC (or switch)?
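The "one devlink instance per ASIC" idea, with the instance created by whichever function registers first, amounts to a find-or-create lookup keyed by the ASIC serial number. The following is a rough sketch of that lookup only; `toy_registry` and `toy_devlink_get()` are hypothetical, the fixed-size table is a simplification, and nothing here reflects how the devlink core actually stores instances.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_INSTANCES 8
#define TOY_SERIAL_LEN 32

/* Toy devlink instance, identified by the serial number of its ASIC. */
struct toy_devlink {
	char serial[TOY_SERIAL_LEN];
	unsigned int nr_functions;	/* PFs/VFs attached so far */
};

struct toy_registry {
	struct toy_devlink inst[TOY_MAX_INSTANCES];
	size_t count;
};

/* Attach a function to the instance of its ASIC, creating the instance
 * if this is the first function seen for that serial number.
 * Returns NULL if the table is full or the serial is too long.
 */
static struct toy_devlink *toy_devlink_get(struct toy_registry *reg,
					   const char *serial)
{
	struct toy_devlink *dl;
	size_t i;

	assert(reg != NULL && serial != NULL);

	for (i = 0; i < reg->count; i++) {
		if (strcmp(reg->inst[i].serial, serial) == 0) {
			reg->inst[i].nr_functions++;
			return &reg->inst[i];
		}
	}

	if (reg->count == TOY_MAX_INSTANCES || strlen(serial) >= TOY_SERIAL_LEN)
		return NULL;

	dl = &reg->inst[reg->count++];
	strcpy(dl->serial, serial);
	dl->nr_functions = 1;
	return dl;
}
```

Two functions reporting the same serial number end up on the same instance, while a function from a different ASIC gets its own. The sketch ignores locking, net namespaces and teardown, which are exactly the points debated in this thread.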
I am not sure I understand what is meant by "device ports"? A netdev? A "physical port on the adapter, like an SFP port"? Or a "pcie function like PF/VF"? Let's suppose it is in MODE_LEGACY mode.
1. https://patchwork.kernel.org/project/netdevbpf/patch/20210603111901.9888-1-p...
If the rule "Whenever there is a netdev (netdev of PF/VF, or representor netdev), there is a devlink port instance corresponding to that netdev" applies to the above case, then since there is one netdev for the PF and one netdev for the Ethernet port, we have two devlink port instances too: one for the netdev of the PF and one for the netdev of the Ethernet port. That differs from Mellanox, which has one netdev for both PF and Ethernet port, hence one devlink port for both.
It seems we need to clarify whether FLAVOUR_PHYSICAL and FLAVOUR_PCI_PF have different semantics between NFP and Mellanox?
We might need to add another flavour type to indicate the netdev of the PF, if FLAVOUR_PHYSICAL indicates the netdev of the Ethernet port (if that netdev exists) and FLAVOUR_PCI_PF indicates the representor netdev of the PF, as in the comments in the definition of the flavour type:
	DEVLINK_PORT_FLAVOUR_PHYSICAL, /* Any kind of a port physically
	                                * facing the user.
	                                */
	DEVLINK_PORT_FLAVOUR_PCI_PF, /* Represents eswitch port for
	                              * the PCI PF. It is an internal
	                              * port that faces the PCI PF.
	                              */
From: Yunsheng Lin linyunsheng@huawei.com Sent: Tuesday, June 8, 2021 5:41 PM
Is there any reason why VF use its own devlink instance?
Because a devlink instance gives the VF and SF the ability to control itself:
(a) device parameters (devlink dev param show)
(b) resources of the device
(c) health reporters
(d) reload in net ns
These knobs, (a) to (c) etc., are not for the hypervisor to control. They are mainly for VF/SF users to manage their own device.
On 2021/6/9 17:52, Parav Pandit wrote:
Do we need to prevent the user from changing the net ns in a container?
From: Yunsheng Lin linyunsheng@huawei.com Sent: Wednesday, June 9, 2021 4:47 PM
It is not the role of the hw/vendor driver to disable it. Process capabilities such as NET_ADMIN etc take care of it.