Recently I bought an odroid-x development board powered by an Exynos4412 quad-core Cortex-A9 processor. It is incredibly cheap, costing well under $200, so it was a perfect fit for my purpose: doing some research (s/research/hacking/g).
For this purpose, I wanted to use hardware performance counter interrupts. Each Cortex-A9 core has six counters, each of which can be configured to count any of the 58 available events, including instruction and cache-miss events. Each counter can also be configured to generate an interrupt on overflow, so that, for example, your software handler is called every 1000 cache misses. My recent work uses this capability to control the memory access rate of each core.
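To make the "interrupt every N events" idea concrete, here is a hardware-free C sketch. It assumes 32-bit event counters, as on ARMv7, and models the usual trick (used, as far as I know, in more elaborate form by the Linux ARM perf driver) of preloading the counter with 2^32 - period so that it wraps, and thus raises its overflow interrupt, after exactly `period` events. All `sim_*` names are hypothetical; nothing here touches real PMU registers.

```c
#include <stdint.h>

/* Hypothetical model of one 32-bit PMU event counter (not a real kernel API). */
struct sim_counter {
    uint32_t value;      /* current "hardware" count */
    uint32_t period;     /* events between overflow interrupts (must be nonzero) */
    unsigned overflows;  /* how many times the "interrupt" fired */
};

/* Preload with 2^32 - period so the counter wraps after exactly `period` events. */
static void sim_counter_arm(struct sim_counter *c, uint32_t period)
{
    c->period = period;
    c->value = (uint32_t)0 - period;   /* unsigned wrap: equals 2^32 - period */
}

/* Stand-in for the overflow interrupt handler: record it, then re-arm. */
static void sim_overflow_handler(struct sim_counter *c)
{
    c->overflows++;
    sim_counter_arm(c, c->period);
}

/* Count one event; wrapping to 0 models the counter overflow. */
static void sim_counter_event(struct sim_counter *c)
{
    c->value++;
    if (c->value == 0)
        sim_overflow_handler(c);
}
```

For example, arming with a period of 1000 and feeding 2500 events fires the handler twice and leaves the counter 500 events into the next period.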
Since my code accesses the counters through the perf_event infrastructure, it should, in theory, work without any modification. So I gave it a try on the board. Unfortunately, and perhaps as expected, it did not. I had no trouble compiling the kernel and my code on the board itself, because Ubuntu runs directly on it (thanks, Ubuntu and Linaro). The problems arose when running the code, and there were two of them: one is a Cortex-A9-specific hardware limitation, and the other is an implementation problem specific to the Exynos4412 kernel.
The first problem, the Cortex-A9-specific one, was easy. My kernel code accesses the performance counters through the perf_event infrastructure as follows:
struct perf_event_attr sched_perf_hw_attr = {
    .type           = PERF_TYPE_HARDWARE,
    .config         = PERF_COUNT_HW_CACHE_MISSES,
    .size           = sizeof(struct perf_event_attr),
    .pinned         = 1,
    .disabled       = 1,
    .exclude_kernel = 1,   /* count user-mode events only */
};
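
/* per-CPU counter (task = NULL) with an overflow callback and no context */
event = perf_event_create_kernel_counter(&sched_perf_hw_attr, cpu, NULL,
                                         event_overflow_callback, NULL);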
The trouble was the .exclude_kernel = 1 line. Because I only want to count events in user mode, I set .exclude_kernel = 1 to tell the counter not to count in kernel mode. Unfortunately, the Cortex-A9 does not support mode-dependent counting, so creating the counter failed. The solution in my case was simply not to set the flag (i.e., .exclude_kernel = 0), since the only cost is that kernel-mode events are counted too, which makes the count somewhat less accurate. Not good, but not a deal breaker.
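The same workaround can be written as a try-then-fall-back pattern. The code above runs in the kernel; the sketch below is a user-space analogue using the perf_event_open(2) system call, assumed to run on Linux. It first asks for user-mode-only counting and, if the kernel rejects the event, retries with kernel-mode events included. As far as I know the error code for the rejected exclusion has varied between kernel versions, so the sketch retries on any failure rather than checking a specific errno. The helper names `perf_open` and `open_cache_miss_counter` are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open, so invoke it via syscall(2). */
static int perf_open(struct perf_event_attr *attr, pid_t pid, int cpu)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);
}

/*
 * Open a cache-miss counter for the calling thread on any CPU.
 * First try user-mode-only counting; if that fails (e.g. on a PMU that
 * cannot filter by mode), retry counting in all modes.
 * *user_only is set to 1 if the first attempt succeeded, else 0.
 * Returns a file descriptor, or -1 if both attempts failed.
 */
static int open_cache_miss_counter(int *user_only)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    int fd = perf_open(&attr, 0, -1);
    *user_only = (fd >= 0);
    if (fd < 0) {
        attr.exclude_kernel = 0;   /* less accurate, but more widely supported */
        fd = perf_open(&attr, 0, -1);
    }
    return fd;
}
```

On success, the descriptor is enabled with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) and read with read(2). Note that with a restrictive kernel.perf_event_paranoid setting, an unprivileged process may be refused the second, kernel-inclusive attempt.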
The second problem, the Exynos4412-specific one, was much harder. Although everything was configured correctly through the perf_event infrastructure, the processor simply did not generate interrupts. After digging through the ARM ARM (Architecture Reference Manual) and the Exynos4412 User's Manual, I found that the kernel from the official odroid-x repository enables the wrong interrupt line. The Exynos4412 has a Power Management Unit block, abbreviated PMU. The problem is that each core's Performance Monitoring Unit is also abbreviated PMU. As confusing as that sounds, the kernel developer who wrote the following code seems to have been confused as well: here "arm-pmu", the device for the per-core Performance Monitoring Unit, is given the IRQ number of the Exynos4412's Power Management Unit.

static struct resource s5p_pmu_resource[] = {
    DEFINE_RES_IRQ(IRQ_PMU)   /* IRQ for the Power Management Unit */
};

static struct platform_device s5p_device_pmu = {
    .name          = "arm-pmu",   /* device for the Performance Monitoring Unit */
    .id            = ARM_PMU_DEVICE_CPU,
    .num_resources = ARRAY_SIZE(s5p_pmu_resource),
    .resource      = s5p_pmu_resource,
};

As a result, armpmu_reserve_hardware() in arch/arm/kernel/perf_event.c requests the IRQ_PMU line for the performance counters, and naturally no interrupt ever arrives on counter overflow. Because the per-core PMU interrupts are connected indirectly through an interrupt combiner, and the Exynos4412's interrupt line mappings are somewhat odd, I had to add special initialization routines to route each per-core PMU interrupt properly to its designated core. Here's the patch. I hope it saves someone else from wasting time on this.
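For readers unfamiliar with interrupt combiners: a combiner merges several interrupt sources into a single parent line, and the parent line's handler reads a status register to find out which source actually fired. The hardware-free C sketch below models only that demultiplexing step. The register layout, the 8-source group size, and all `sim_*` names are hypothetical and do not describe the Exynos4412's actual combiner; routing an interrupt to a particular core is not modeled.

```c
#include <stdint.h>

#define SIM_COMBINER_BITS 8   /* hypothetical number of sources per group */

typedef void (*sim_irq_handler)(unsigned source, void *ctx);

/* Hypothetical model of one combiner group feeding a single parent IRQ line. */
struct sim_combiner_group {
    uint32_t status;                              /* pending bits, one per source */
    uint32_t enable;                              /* sources that are unmasked */
    sim_irq_handler handler[SIM_COMBINER_BITS];   /* per-source handlers */
    void *ctx[SIM_COMBINER_BITS];                 /* per-source handler data */
};

/* Example per-source handler: counts how often its source fired. */
static void sim_count_handler(unsigned source, void *ctx)
{
    (void)source;
    (*(unsigned *)ctx)++;
}

/*
 * Parent-line handler: dispatch every pending, enabled source that has a
 * registered handler, then clear (acknowledge) its status bit.
 * Returns the number of sources handled.
 */
static unsigned sim_combiner_dispatch(struct sim_combiner_group *g)
{
    uint32_t pending = g->status & g->enable;
    unsigned handled = 0;

    for (unsigned bit = 0; bit < SIM_COMBINER_BITS; bit++) {
        if (!(pending & (1u << bit)) || !g->handler[bit])
            continue;
        g->handler[bit](bit, g->ctx[bit]);
        g->status &= ~(1u << bit);
        handled++;
    }
    return handled;
}
```

In this model, a handler registered for the wrong source (or a source left masked) is simply never called, which matches the silent failure described above.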