This is a reboot of my original series to address the problem of drivers for legacy ISA devices accessing unmapped IO port regions on arm64 systems and causing the system to crash.
There was another recent report of such an issue [0], and some old ones [1] and [2] for reference.
The background is that many systems do not include PCI host controllers, or they do and controller probe may have failed. For these cases, no IO ports are mapped. However, loading drivers for legacy ISA devices can crash the system as there is nothing to stop them accessing those IO ports (which have not been io remap'ed).
My original solution tried to keep the kernel alive in these situations by rejecting logical PIO access to PCI IO regions until PCI IO port regions have been mapped.
This series goes one step further, by just reserving the complete legacy IO port range in 0x0--0xffff for arm64. The motivation for doing this is to make the request_region() calls for those drivers fail, like this:
root@ubuntu:/home/john# insmod mk712.ko [ 3415.575800] mk712: unable to get IO region insmod: ERROR: could not insert module mk712.ko: No such device
Otherwise, in theory, those drivers could initiate rogue accesses to mapped IO port regions for other devices and cause corruptions or side-effects. Indeed, those drivers should not be allowed to access IO ports at all in such a system.
As a secondary defence, for broken drivers who do not call request_region(), IO port accesses in range 0--0xffff will be ignored, again preserving the system.
I am sending as an RFC as I am not sure of any problem with reserving first 0x10000 of IO space like this. There is reserve= commandline argument, which does allow this already.
For reference, here's how /proc/ioports looks on my arm64 system with this change:
root@ubuntu:/home/john# more /proc/ioports 00010000-0001ffff : PCI Bus 0002:f8 00010000-00010fff : PCI Bus 0002:f9 00010000-00010007 : 0002:f9:00.0 00010000-00010007 : serial 00010008-0001000f : 0002:f9:00.1 00010008-0001000f : serial 00010010-00010017 : 0002:f9:00.2 00010018-0001001f : 0002:f9:00.2 00020000-0002ffff : PCI Bus 0004:88 00030000-0003ffff : PCI Bus 0005:78 00040000-0004ffff : PCI Bus 0006:c0 00050000-0005ffff : PCI Bus 0007:90 00060000-0006ffff : PCI Bus 000a:10 00070000-0007ffff : PCI Bus 000c:20 00080000-0008ffff : PCI Bus 000d:30
[0] https://lore.kernel.org/linux-input/20210112055129.7840-1-song.bao.hua@hisil... [1] https://lore.kernel.org/linux-pci/56F209A9.4040304@huawei.com [2] https://lore.kernel.org/linux-arm-kernel/e6995b4a-184a-d8d4-f4d4-9ce75d8f47c...
Difference since v4: https://lore.kernel.org/linux-pci/1560262374-67875-1-git-send-email-john.gar... - Reserve legacy ISA region
John Garry (4): arm64: io: Introduce IO_SPACE_BASE asm-generic/io.h: Add IO_SPACE_BASE kernel/resource: Make ioport_resource.start configurable logic_pio: Warn on and discard accesses to addresses below IO_SPACE_BASE
arch/arm64/include/asm/io.h | 1 + include/asm-generic/io.h | 4 ++++ include/linux/logic_pio.h | 5 +++++ kernel/resource.c | 2 +- lib/logic_pio.c | 20 ++++++++++++++------ 5 files changed, 25 insertions(+), 7 deletions(-)
Introduce IO_SPACE_BASE, which is the base address of the IO port region.
0x10000 is chosen intentionally to exclude legacy ISA IO port address range.
Signed-off-by: John Garry john.garry@huawei.com --- arch/arm64/include/asm/io.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h index fd172c41df90..2bcf55704bee 100644 --- a/arch/arm64/include/asm/io.h +++ b/arch/arm64/include/asm/io.h @@ -148,6 +148,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr) #define arch_has_dev_port() (1) #define IO_SPACE_LIMIT (PCI_IO_SIZE - 1) #define PCI_IOBASE ((void __iomem *)PCI_IO_START) +#define IO_SPACE_BASE (0x10000)
/* * String version of I/O memory access operations.
Add a fallback default for IO_SPACE_BASE (at 0) for when an architecture does not define a value.
Signed-off-by: John Garry john.garry@huawei.com --- include/asm-generic/io.h | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index 9ea83d80eb6f..4d74a0c696ca 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -448,6 +448,10 @@ static inline void writesq(volatile void __iomem *addr, const void *buffer, #define IO_SPACE_LIMIT 0xffff #endif
+#ifndef IO_SPACE_BASE +#define IO_SPACE_BASE 0x0 +#endif + /* * {in,out}{b,w,l}() access little endian I/O. {in,out}{b,w,l}_p() can be * implemented on hardware that needs an additional delay for I/O accesses to
Make IO space base address to be configurable through IO_SPACE_BASE.
This will allow architectures which do not natively support IO ports - like arm64 - to harden against legacy ISA-based drivers which use hardcoded addresses to access IO ports.
Any attempts for these drivers to request a resource region will now fail for architectures with set IO_SPACE_BASE above legacy ISA IO port region (0xffff).
Signed-off-by: John Garry john.garry@huawei.com --- kernel/resource.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/resource.c b/kernel/resource.c index 3ae2f56cc79d..d191c4d796c7 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -29,7 +29,7 @@
struct resource ioport_resource = { .name = "PCI IO", - .start = 0, + .start = IO_SPACE_BASE, .end = IO_SPACE_LIMIT, .flags = IORESOURCE_IO, };
Start the PCI MMIO region at IO_SPACE_BASE, and warn on any accesses below that address. Those accesses are also discarded.
This is only for CONFIG_INDIRECT_PIO currently, and support can be added later for !CONFIG_INDIRECT_PIO.
Signed-off-by: John Garry john.garry@huawei.com --- include/linux/logic_pio.h | 5 +++++ lib/logic_pio.c | 20 ++++++++++++++------ 2 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/include/linux/logic_pio.h b/include/linux/logic_pio.h index 54945aa824b4..425369f2ddd5 100644 --- a/include/linux/logic_pio.h +++ b/include/linux/logic_pio.h @@ -111,7 +111,12 @@ void logic_outsl(unsigned long addr, const void *buffer, unsigned int count); #else #define PIO_INDIRECT_SIZE 0 #endif /* CONFIG_INDIRECT_PIO */ + #define MMIO_UPPER_LIMIT (IO_SPACE_LIMIT - PIO_INDIRECT_SIZE) +#define MMIO_LOWER_LIMIT IO_SPACE_BASE +#if MMIO_LOWER_LIMIT >= MMIO_UPPER_LIMIT +#error MMIO_UPPPER_LIMIT should be above MMIO_LOWER_LIMIT +#endif
struct logic_pio_hwaddr *find_io_range_by_fwnode(struct fwnode_handle *fwnode); unsigned long logic_pio_trans_hwaddr(struct fwnode_handle *fwnode, diff --git a/lib/logic_pio.c b/lib/logic_pio.c index f32fe481b492..cbb12260ede6 100644 --- a/lib/logic_pio.c +++ b/lib/logic_pio.c @@ -36,7 +36,7 @@ int logic_pio_register_range(struct logic_pio_hwaddr *new_range) struct logic_pio_hwaddr *range; resource_size_t start; resource_size_t end; - resource_size_t mmio_end = 0; + resource_size_t mmio_end = MMIO_LOWER_LIMIT; resource_size_t iio_sz = MMIO_UPPER_LIMIT; int ret = 0;
@@ -234,7 +234,9 @@ type logic_in##bwl(unsigned long addr) \ { \ type ret = (type)~0; \ \ - if (addr < MMIO_UPPER_LIMIT) { \ + if (addr < MMIO_LOWER_LIMIT) { \ + WARN_ON_ONCE(1); \ + } else if (addr < MMIO_UPPER_LIMIT) { \ ret = _in##bwl(addr); \ } else if (addr >= MMIO_UPPER_LIMIT && addr < IO_SPACE_LIMIT) { \ struct logic_pio_hwaddr *entry = find_io_range(addr); \ @@ -250,8 +252,10 @@ type logic_in##bwl(unsigned long addr) \ \ void logic_out##bwl(type value, unsigned long addr) \ { \ - if (addr < MMIO_UPPER_LIMIT) { \ - _out##bwl(value, addr); \ + if (addr < MMIO_LOWER_LIMIT) { \ + WARN_ON_ONCE(1); \ + } else if (addr < MMIO_UPPER_LIMIT) { \ + _out##bwl(value, addr); \ } else if (addr >= MMIO_UPPER_LIMIT && addr < IO_SPACE_LIMIT) { \ struct logic_pio_hwaddr *entry = find_io_range(addr); \ \ @@ -266,7 +270,9 @@ void logic_out##bwl(type value, unsigned long addr) \ void logic_ins##bwl(unsigned long addr, void *buffer, \ unsigned int count) \ { \ - if (addr < MMIO_UPPER_LIMIT) { \ + if (addr < MMIO_LOWER_LIMIT) { \ + WARN_ON_ONCE(1); \ + } else if (addr < MMIO_UPPER_LIMIT) { \ reads##bwl(PCI_IOBASE + addr, buffer, count); \ } else if (addr >= MMIO_UPPER_LIMIT && addr < IO_SPACE_LIMIT) { \ struct logic_pio_hwaddr *entry = find_io_range(addr); \ @@ -283,7 +289,9 @@ void logic_ins##bwl(unsigned long addr, void *buffer, \ void logic_outs##bwl(unsigned long addr, const void *buffer, \ unsigned int count) \ { \ - if (addr < MMIO_UPPER_LIMIT) { \ + if (addr < MMIO_LOWER_LIMIT) { \ + WARN_ON_ONCE(1); \ + } else if (addr < MMIO_UPPER_LIMIT) { \ writes##bwl(PCI_IOBASE + addr, buffer, count); \ } else if (addr >= MMIO_UPPER_LIMIT && addr < IO_SPACE_LIMIT) { \ struct logic_pio_hwaddr *entry = find_io_range(addr); \
在 2021/1/16 上午12:58, John Garry 写道:
This is a reboot of my original series to address the problem of drivers for legacy ISA devices accessing unmapped IO port regions on arm64 systems and causing the system to crash.
There was another recent report of such an issue [0], and some old ones [1] and [2] for reference.
The background is that many systems do not include PCI host controllers, or they do and controller probe may have failed. For these cases, no IO ports are mapped. However, loading drivers for legacy ISA devices can crash the system as there is nothing to stop them accessing those IO ports (which have not been io remap'ed).
My original solution tried to keep the kernel alive in these situations by rejecting logical PIO access to PCI IO regions until PCI IO port regions have been mapped.
This series goes one step further, by just reserving the complete legacy IO port range in 0x0--0xffff for arm64. The motivation for doing this is to make the request_region() calls for those drivers fail, like this:
root@ubuntu:/home/john# insmod mk712.ko [ 3415.575800] mk712: unable to get IO region insmod: ERROR: could not insert module mk712.ko: No such device
Otherwise, in theory, those drivers could initiate rogue accesses to mapped IO port regions for other devices and cause corruptions or side-effects. Indeed, those drivers should not be allowed to access IO ports at all in such a system.
As a secondary defence, for broken drivers who do not call request_region(), IO port accesses in range 0--0xffff will be ignored, again preserving the system.
I am sending as an RFC as I am not sure of any problem with reserving first 0x10000 of IO space like this. There is reserve= commandline argument, which does allow this already.
Hi John,
Is it ok with ACPI? I'm not really familiar with ACPI on arm64 but my impression is ACPI would use legacy I/O ports to communicate with kbd controller, EC and power management facilities.
We'd better have a method to detect if ISA bus is not present on the system instead of reserve them unconditionally.
Thanks.
- Jiaxun
For reference, here's how /proc/ioports looks on my arm64 system with this change:
root@ubuntu:/home/john# more /proc/ioports 00010000-0001ffff : PCI Bus 0002:f8 00010000-00010fff : PCI Bus 0002:f9 00010000-00010007 : 0002:f9:00.0 00010000-00010007 : serial 00010008-0001000f : 0002:f9:00.1 00010008-0001000f : serial 00010010-00010017 : 0002:f9:00.2 00010018-0001001f : 0002:f9:00.2 00020000-0002ffff : PCI Bus 0004:88 00030000-0003ffff : PCI Bus 0005:78 00040000-0004ffff : PCI Bus 0006:c0 00050000-0005ffff : PCI Bus 0007:90 00060000-0006ffff : PCI Bus 000a:10 00070000-0007ffff : PCI Bus 000c:20 00080000-0008ffff : PCI Bus 000d:30
[0] https://lore.kernel.org/linux-input/20210112055129.7840-1-song.bao.hua@hisil... [1] https://lore.kernel.org/linux-pci/56F209A9.4040304@huawei.com [2] https://lore.kernel.org/linux-arm-kernel/e6995b4a-184a-d8d4-f4d4-9ce75d8f47c...
Difference since v4: https://lore.kernel.org/linux-pci/1560262374-67875-1-git-send-email-john.gar...
- Reserve legacy ISA region
John Garry (4): arm64: io: Introduce IO_SPACE_BASE asm-generic/io.h: Add IO_SPACE_BASE kernel/resource: Make ioport_resource.start configurable logic_pio: Warn on and discard accesses to addresses below IO_SPACE_BASE
arch/arm64/include/asm/io.h | 1 + include/asm-generic/io.h | 4 ++++ include/linux/logic_pio.h | 5 +++++ kernel/resource.c | 2 +- lib/logic_pio.c | 20 ++++++++++++++------ 5 files changed, 25 insertions(+), 7 deletions(-)
On 18/01/2021 01:59, Jiaxun Yang wrote:
在 2021/1/16 上午12:58, John Garry 写道:
This is a reboot of my original series to address the problem of drivers for legacy ISA devices accessing unmapped IO port regions on arm64 systems and causing the system to crash.
There was another recent report of such an issue [0], and some old ones [1] and [2] for reference.
The background is that many systems do not include PCI host controllers, or they do and controller probe may have failed. For these cases, no IO ports are mapped. However, loading drivers for legacy ISA devices can crash the system as there is nothing to stop them accessing those IO ports (which have not been io remap'ed).
My original solution tried to keep the kernel alive in these situations by rejecting logical PIO access to PCI IO regions until PCI IO port regions have been mapped.
This series goes one step further, by just reserving the complete legacy IO port range in 0x0--0xffff for arm64. The motivation for doing this is to make the request_region() calls for those drivers fail, like this:
root@ubuntu:/home/john# insmod mk712.ko [ 3415.575800] mk712: unable to get IO region insmod: ERROR: could not insert module mk712.ko: No such device
Otherwise, in theory, those drivers could initiate rogue accesses to mapped IO port regions for other devices and cause corruptions or side-effects. Indeed, those drivers should not be allowed to access IO ports at all in such a system.
As a secondary defence, for broken drivers who do not call request_region(), IO port accesses in range 0--0xffff will be ignored, again preserving the system.
I am sending as an RFC as I am not sure of any problem with reserving first 0x10000 of IO space like this. There is reserve= commandline argument, which does allow this already.
Hi Jiaxun,
Is it ok with ACPI? I'm not really familiar with ACPI on arm64 but my impression is ACPI would use legacy I/O ports to communicate with kbd controller, EC and power management facilities.
I tested for ACPI. As far as I'm concerned, fixed IO ports should not be used on arm64 systems.
Indeed, ACPI spec says IO port addresses should be CPU addressable, and it is the job of the kernel to io remap those correctly for systems which do not support IO ports natively, i.e. those that define PCI_IOBASE.
We'd better have a method to detect if ISA bus is not present on the system instead of reserve them unconditionally.
For ISA bus or any IO ports region, they would/should be behind PCI host bridge or modeled as INDIRECT IO host and we should allocate logic PIO region for them, and there should be no assumption on the IO port address in drivers, i.e. not fixed.
Thanks, John
For reference, here's how /proc/ioports looks on my arm64 system with this change:
root@ubuntu:/home/john# more /proc/ioports 00010000-0001ffff : PCI Bus 0002:f8 00010000-00010fff : PCI Bus 0002:f9 00010000-00010007 : 0002:f9:00.0 00010000-00010007 : serial 00010008-0001000f : 0002:f9:00.1 00010008-0001000f : serial 00010010-00010017 : 0002:f9:00.2 00010018-0001001f : 0002:f9:00.2 00020000-0002ffff : PCI Bus 0004:88 00030000-0003ffff : PCI Bus 0005:78 00040000-0004ffff : PCI Bus 0006:c0 00050000-0005ffff : PCI Bus 0007:90 00060000-0006ffff : PCI Bus 000a:10 00070000-0007ffff : PCI Bus 000c:20 00080000-0008ffff : PCI Bus 000d:30
[0] https://lore.kernel.org/linux-input/20210112055129.7840-1-song.bao.hua@hisil...
[1] https://lore.kernel.org/linux-pci/56F209A9.4040304@huawei.com [2] https://lore.kernel.org/linux-arm-kernel/e6995b4a-184a-d8d4-f4d4-9ce75d8f47c...
Difference since v4: https://lore.kernel.org/linux-pci/1560262374-67875-1-git-send-email-john.gar...
- Reserve legacy ISA region
John Garry (4): arm64: io: Introduce IO_SPACE_BASE asm-generic/io.h: Add IO_SPACE_BASE kernel/resource: Make ioport_resource.start configurable logic_pio: Warn on and discard accesses to addresses below IO_SPACE_BASE
arch/arm64/include/asm/io.h | 1 + include/asm-generic/io.h | 4 ++++ include/linux/logic_pio.h | 5 +++++ kernel/resource.c | 2 +- lib/logic_pio.c | 20 ++++++++++++++------ 5 files changed, 25 insertions(+), 7 deletions(-)
.
On Fri, Jan 15, 2021 at 5:58 PM John Garry john.garry@huawei.com wrote:
This is a reboot of my original series to address the problem of drivers for legacy ISA devices accessing unmapped IO port regions on arm64 systems and causing the system to crash.
There was another recent report of such an issue [0], and some old ones [1] and [2] for reference.
The background is that many systems do not include PCI host controllers, or they do and controller probe may have failed. For these cases, no IO ports are mapped. However, loading drivers for legacy ISA devices can crash the system as there is nothing to stop them accessing those IO ports (which have not been io remap'ed).
My original solution tried to keep the kernel alive in these situations by rejecting logical PIO access to PCI IO regions until PCI IO port regions have been mapped.
This series goes one step further, by just reserving the complete legacy IO port range in 0x0--0xffff for arm64. The motivation for doing this is to make the request_region() calls for those drivers fail, like this:
root@ubuntu:/home/john# insmod mk712.ko [ 3415.575800] mk712: unable to get IO region insmod: ERROR: could not insert module mk712.ko: No such device
Otherwise, in theory, those drivers could initiate rogue accesses to mapped IO port regions for other devices and cause corruptions or side-effects. Indeed, those drivers should not be allowed to access IO ports at all in such a system.
As a secondary defence, for broken drivers who do not call request_region(), IO port accesses in range 0--0xffff will be ignored, again preserving the system.
I am sending as an RFC as I am not sure of any problem with reserving first 0x10000 of IO space like this. There is reserve= commandline argument, which does allow this already.
For reference, here's how /proc/ioports looks on my arm64 system with this change:
root@ubuntu:/home/john# more /proc/ioports 00010000-0001ffff : PCI Bus 0002:f8 00010000-00010fff : PCI Bus 0002:f9 00010000-00010007 : 0002:f9:00.0 00010000-00010007 : serial 00010008-0001000f : 0002:f9:00.1 00010008-0001000f : serial 00010010-00010017 : 0002:f9:00.2 00010018-0001001f : 0002:f9:00.2 00020000-0002ffff : PCI Bus 0004:88 00030000-0003ffff : PCI Bus 0005:78 00040000-0004ffff : PCI Bus 0006:c0 00050000-0005ffff : PCI Bus 0007:90 00060000-0006ffff : PCI Bus 000a:10 00070000-0007ffff : PCI Bus 000c:20 00080000-0008ffff : PCI Bus 000d:30
Doesn't this mean we lose the ability to access PCI devices with legacy ISA compatibility? Most importantly, any GPU today should in theory still support VGA frame buffer mode or text console, but both of these stop working if the low I/O ports are not mapped to the corresponding PCI bus. There is of course already a problem if you have multiple PCI host bridges, and each one gets its own PIO range, which means that only one of them can have an ISA bridge with working PIO behind it.
Another such case would be a BMC that has legacy ISA devices behind a (real or emulated) LPC bus, e.g. a 8250 UART, ps2 keyboard, RTC, or an ATA CDROM. Not sure if any of those are ever used on Arm machines.
Regarding the size of the reservation, does this actually need to cover the 0x0fff...0xffff range or just 0x0000...0x0fff? I don't think there are any drivers that hardcode I/O ports beyond 0x0fff because those would not work on ISA buses but require PCI assigned BARs.
One more thought: There are two common ways in which PCI host bridges map their PIO ports: either each host bridge has its own 0x0...0xffff BAR range but gets remapped to an arbitrary range of port numbers in the kernel, or each host bridge uses a distinct range of port numbers, and the kernel can use a 1:1 mapping between hardware and software port numbers, i.e. the number in the BAR is the same as in the kernel.
If all numbers are shifted by 0x10000, that second case no longer works, and there is always an offset.
Arnd
For reference, here's how /proc/ioports looks on my arm64 system with this change:
root@ubuntu:/home/john# more /proc/ioports 00010000-0001ffff : PCI Bus 0002:f8 00010000-00010fff : PCI Bus 0002:f9 00010000-00010007 : 0002:f9:00.0 00010000-00010007 : serial 00010008-0001000f : 0002:f9:00.1 00010008-0001000f : serial 00010010-00010017 : 0002:f9:00.2 00010018-0001001f : 0002:f9:00.2 00020000-0002ffff : PCI Bus 0004:88 00030000-0003ffff : PCI Bus 0005:78 00040000-0004ffff : PCI Bus 0006:c0 00050000-0005ffff : PCI Bus 0007:90 00060000-0006ffff : PCI Bus 000a:10 00070000-0007ffff : PCI Bus 000c:20 00080000-0008ffff : PCI Bus 000d:30
Hi Arnd,
Doesn't this mean we lose the ability to access PCI devices with legacy ISA compatibility? Most importantly, any GPU today should in theory still support VGA frame buffer mode or text console, but both of these stop working if the low I/O ports are not mapped to the corresponding PCI bus.
Hmmm.. so are you saying that there is an expectation that the kernel PIO region assigned for these devices must start at 0x0? If so, I assume it's because fixed IO ports are used.
There is of course already a problem if you have multiple PCI host bridges, and each one gets its own PIO range, which means that only one of them can have an ISA bridge with working PIO behind it.
The answer to my question, above, seems to be 'yes' now.
Another such case would be a BMC that has legacy ISA devices behind a (real or emulated) LPC bus, e.g. a 8250 UART, ps2 keyboard, RTC, or an ATA CDROM. Not sure if any of those are ever used on Arm machines.
Regarding the size of the reservation, does this actually need to cover the 0x0fff...0xffff range or just 0x0000...0x0fff? I don't think there are any drivers that hardcode I/O ports beyond 0x0fff because those would not work on ISA buses but require PCI assigned BARs.
I just chose the complete legacy IO port range, that being 0x0--0xffff. If there would be no hardcoded ports beyond 0x0fff, then reserving 0x0--0xfff would work.
One more thought: There are two common ways in which PCI host bridges map their PIO ports: either each host bridge has its own 0x0...0xffff BAR range but gets remapped to an arbitrary range of port numbers in the kernel, or each host bridge uses a distinct range of port numbers, and the kernel can use a 1:1 mapping between hardware and software port numbers, i.e. the number in the BAR is the same as in the kernel.
If all numbers are shifted by 0x10000, that second case no longer works, and there is always an offset.
Yes, this change would definitely break the second. But does - or could - anyone use it on arm64? I didn't think that it was possible.
Thanks, John