-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dv] Integrity error generation in memory busses #1811
base: master
Are you sure you want to change the base?
Conversation
This does not take into account the type of memory request. That means we won't differentiate between seeing an integrity error on the memory interface while doing a store or a load (either way if rdata integrity bits are faulty, we're seeing an internal NMI -is this okay? WDYT @GregAC ). I believe this might not be ideal and we should only send an integrity error while we are doing a read. Should I change it? |
With the current Ibex RTL integrity errors on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good @ctopal just a few tweaks til it's ready to go.
@@ -1266,6 +1266,7 @@ class core_ibex_mem_error_test extends core_ibex_directed_test; | |||
memory_error_seq_h = memory_error_seq::type_id::create("memory_error_seq_h", this); | |||
`uvm_info(`gfn, "Running core_ibex_mem_error_test", UVM_LOW) | |||
memory_error_seq_h.vseq = vseq; | |||
`DV_ASSERT_CTRL_REQ("NoAlertsTriggered", 1'b0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a little too blunt. We wouldn't want alerts to occur on bus errors for example.
Perhaps randomly choose whether or not a test will generate integrity errors at all and then only disable this assertion if we'll be generating integrity errors?
Oh and we do need a check that an integrity error triggers an alert. This can be done as an assert for the data side. Instruction side requires a little more thought (a fetch may not actually get executed so the integrity error doesn't get consumed and doesn't trigger an alert). Though strictly that's a V2S task. So for this PR at least we don't need to worry about it, but we must not forget about it (best to create an issue to track). |
1a49c7c
to
1e3b1fb
Compare
While simulating with this version I found a minor mismatch regarding two back to back integrity errors (with load instructions). RTL side does not take the faulty data from memory into the register, Spike does. I might need help about how to tell Spike to not go into a fault but also not process the retired instruction. |
Test pass rate for this memory test is now around 60%. We hit internal NMI coverpoints though. I suspect almost all failures are related with the issue that I mentioned in the previous message. |
As discussed in the Ibex DV Sync I think we should avoid generating integrity errors in the With that done this can pass CI and we can get it merged. If the new test pass rate is 60% that should do for our V2 targets and we can fix up remaining issues in V3. |
36a9048
to
5435edf
Compare
I think it might be cleaner to split out the cosim/NMI changes into a separate PR? Those changes are quite significant on their own. And could you add some more detail to the PR comment, the messages for 7222d4a and 66f3e2d are a good start but I think some more context would be really useful in the future. |
55ab9bc
to
2bd646b
Compare
e75daed
to
bc16e84
Compare
Thanks @hcallahan-lowrisc for having a look, I updated the commit messages for almost all of them and dropped that one unnecessary commit about catching external RVFI signals -that one should belong to another PR I agree. Latest changes should make the mem_test pass in CI (as can be seen in Ibex CI + Private CI). Might worth just another quick look @GregAC since now it includes cosim integration too. PS Failures on the CI says cancelled,I'm not sure why |
dv/cosim/cosim.h
Outdated
// Set the state of the internal NMI (non-maskable interrupt) line. | ||
// Behaviour wise this is almost as same as external NMI case explained at | ||
// set_nmi method. Main difference to consider is that this interrupt is | ||
// synchronous because it comes from the integrity checking logic inside the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internal interrupt is effectively synchronous as currently implemented but it should be consider asynchronous (in particular we don't make any guarantees about when it'll occur relative to a memory access). So I think just drop the final statement here or tweak the language so it doesn't call it 'synchronous'
dv/cosim/spike_cosim.cc
Outdated
@@ -200,7 +200,10 @@ bool SpikeCosim::step(uint32_t write_reg, uint32_t write_reg_data, | |||
|
|||
if (processor->get_state()->last_inst_pc == PC_INVALID) { | |||
if (!(processor->get_state()->mcause->read() & 0x80000000) || | |||
processor->get_state()->debug_mode) { // (Async-Traps are disabled in debug mode) | |||
(processor->get_state()->mcause->read() == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internal NMIs should be handled as an async trap here, had you encountered difficulties using internal NMI with the sync trap here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested removing it now, it didn't changed the pass rate. Possibly it was some other problem that lead me to believe this was the correct behaviour. I'll change it to behave just as another NMI now.
14633ae
to
8fc2e59
Compare
I think this PR increases our test time for mem_error_test (after changing it with the new sequence we probably are injecting way more memory errors) and tips it over 30 mins. What should we do about it? |
Let's try dropping the frequency of memory errors and see what that does to the run time. |
Signed-off-by: Canberk Topal <[email protected]>
This will prevent seeing mismatches right at the end of our test. Before this change, mem_error_test could inject error at the store instruction in which we finish up the test, resulting with mismatches with Spike and Ibex on the instructions after finishing. Also do the same prevention for signature_addr as well, since we also don't want to corrupt that memory transaction too. Signed-off-by: Canberk Topal <[email protected]>
Signed-off-by: Canberk Topal <[email protected]>
This test picks between inserting an integrity error or a bus error to the response in the case of a memory request from Ibex. Introduces a control knob `enable_mem_intg_err` which can control the rate of having integrity errors per request. This commit also disables checking for double fault alerts in the scoreboard because they're expected to be seen while simulating and they don't cause infinite loop problems because every time a memory response is requested the error causing part is just randomized. That means Ibex trying to execute same instruction again would have a chance to succeed this time. Signed-off-by: Canberk Topal <[email protected]>
This commit is mainly an extension to cosim environment to drive the newly introduced state variable `nmi_int` in Spike. This commit - Extends RVFI interface by a single bit (ext_nmi_int) - Configures cosim to set nmi_int inside Spike Signed-off-by: Canberk Topal <[email protected]>
Signed-off-by: Canberk Topal <[email protected]>
Signed-off-by: Canberk Topal <[email protected]>
Signed-off-by: Canberk Topal <[email protected]>
I've worked out how I'm going to do this, just need to implement it. I'm going to use a method that doesn't require spike changes (as otherwise these would be quite invasive to suppress a register write on a load). |
I've added some stuff and reworked commits here to give #1811 which supersedes this PR and should get the memory integrity test running and passing. |
1st commit changes the old
memory_error_seq
so that it uses the new sequence library.2nd commit is a bug-fix guaranteeing not injecting an error while handshaking.
3rd commit allows us to stop/start InfiniteRuns type of sequences. Necessary for stopping injecting errors while we are in an IRQ handler.
4th commit improves memory interface to have an integrity error response on demand, also changes the
memory_error_seq
to inject integrity errors as well as bus errors. Also introduces a new seperate test namedriscv_mem_intg_err_test
that randomly picks between having an integrity error or a bus error on a memory response to a request.5th commit integrates internal NMI handling to our cosim environment by driving
nmi_int
in Spike and set internal NMI traps as synchronous.6th commit updates CI to have latest version of ibex_cosim branch of Spike.
7th commit is a simple fix for tying up pc_wdata of RVFI interface
8th commit is an RTL change about allowing the register write in the case of an integrity error. Reason being, if we don't let this instruction execute, this means the interrupt caused by this instruction does need to happen pretty much instantly. And letting Spike know about that behaviour (we retired a faulty instruction that caused an interrupt) was challenging. Instead, after the load/store we now flag the next RVFI package (with the jump instruction) with int_nmi causing Spike to match with Ibex.
For more details please check commit messages.
Resolves #1731
Resolves #1732