Add info on how DS/NetComm safety controls work (the sequel) #2308

Open
wants to merge 9 commits into
base: main
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
# Created by https://www.gitignore.io/api/linux,macos,windows,visualstudiocode
# Edit at https://www.gitignore.io/?templates=linux,macos,windows,visualstudiocode

### Miscellaneous ###
.wpilib/

### Sphinx ###
# Build directories
build/
9 changes: 9 additions & 0 deletions source/docs/hardware/hardware-safety/index.rst
@@ -0,0 +1,9 @@
Hardware Safety
===============

This section describes some of the safety features built into the NI roboRIO Control System.

.. toctree::
:maxdepth: 1

io_safety
27 changes: 27 additions & 0 deletions source/docs/hardware/hardware-safety/io_safety.rst
@@ -0,0 +1,27 @@
Input/Output Safety Mechanisms Built into the 2015-2026 `FRC` Control System
Collaborator

Suggested change
Input/Output Safety Mechanisms Built into the 2015-2026 `FRC` Control System
Input/Output Safety Mechanisms Built into the `FRC` Control System

Calling out the years is overly specific and not really helpful. If the next control system has different safety mechanisms, the document would be updated.

============================================================================
Member

Couple of issues here

  • Underline is too long (legal in the rST spec, but against our style guide)
  • You probably want ``FRC`` instead of `FRC`
  • The title is too long; I think "Control System" and "FRC" are redundant


roboRIO Control System
^^^^^^^^^^^^^^^^^^^^^^

There are multiple safety mechanisms on the robot that handle input / output operations while it is powered on.

Robot side: there are multiple hardware and software components involved. Outputs of the RoboRIO (e.g. :term:`PWM` s) are controlled by the :term:`FPGA` hardware. :term:`NetComm` is a software daemon that talks to the DS, the FPGA, and the user program. Inside of the user process \(the team\’s robot program\), there\’s a NetComm :term:`DLL` component that talks to the FPGA, CAN, and the NetComm daemon. And of course there are CAN motor controllers on the CAN bus.

- The FPGA has a system :term:`watchdog`. This watchdog will time out and force a “disable” of PWM motor outputs if NetComm hasn\’t told it it\’s received an enable packet in the last 125 `ms`.
Member

Suggested change
- The FPGA has a system :term:`watchdog`. This watchdog will time out and force a “disable” of PWM motor outputs if NetComm hasn\’t told it it\’s received an enable packet in the last 125 `ms`.
- The FPGA has a system :term:`watchdog`. This watchdog will time out and force a "disable" of PWM motor outputs if NetComm hasn't told it it's received an enable packet in the last 125 `ms`.

Don't use non-ASCII quotation marks. Replace “ with ". This helps our translators. Additionally, replace the non-ASCII apostrophe with '

- The NetComm DLL in the user process will send a disable broadcast message on the CAN network and then stop sending keep-alive CAN messages after the disabled system watchdog signal is read back from the FPGA \(this is checked on a 20 `ms` loop\). REV pneumatics and motor controllers will stop immediately upon receipt of the disable broadcast. They also stop if no keep-alive is received for 100 `ms` \(pneumatics\) or 220 `ms` \(motor controllers\).

Our specific timeout values are really just implementation details. It's one thing to share them for a Chief Delphi post that isn't expected to be evergreen, and another thing to put them somewhere that's expected to be an up-to-date resource. Here's my suggestion for a more generic replacement.

Suggested change
- The NetComm DLL in the user process will send a disable broadcast message on the CAN network and then stop sending keep-alive CAN messages after the disabled system watchdog signal is read back from the FPGA \(this is checked on a 20 `ms` loop\). REV pneumatics and motor controllers will stop immediately upon receipt of the disable broadcast. They also stop if no keep-alive is received for 100 `ms` \(pneumatics\) or 220 `ms` \(motor controllers\).
- The NetComm DLL in the user process will send a disable broadcast message on the CAN network and then stop sending keep-alive CAN messages after the disabled system watchdog signal is read back from the FPGA \(this is checked on a 20 `ms` loop\). REV pneumatics and motor controllers will stop immediately upon receipt of the disable broadcast. They also stop if no keep-alive is received after a very brief timeout period.

Additionally, there is incorrect information about how the roboRIO's software behaves in this paragraph. While the roboRIO is booted, it never stops sending Universal Heartbeat messages. When the robot is stopped, the roboRIO merely changes the WatchdogEnabled field from a 1 to a 0, but it keeps sending out the heartbeat.

This paragraph should also probably use the official "heartbeat" term consistently, rather than "keep-alive", which I think more strongly implies that receipt of the frame itself indicates an enabled status, rather than the truth, which is that it merely indicates the presence of a roboRIO, and you have to look at a specific byte to check the enabled status.

Contributor

Not to block the PR, but I think it would be nice to have the timeout value(s) documented. Even if implementations will vary and even change, would it be fair to document minimum and maximum timeouts that all implementations fall inside of?

Member

@rzblue Mar 7, 2024

I think it's important to have those timeouts (or at least an upper bound) documented. @Kevin-OConnor maybe you can comment on what FIRST's expectation of motor controllers' timeout period is?

Contributor

It seems odd that this specifically calls out what REV's devices do while not saying anything about other vendors. This should probably be genericized and use language similar to the "brief timeout period" Noah has suggested.

- The PWM disable works by sending a single idle pulse to the motor controller at the start of the next 20 `ms` PWM cycle after the disable condition is set, and following that, stopping output on the PWM signal line.
- When NetComm receives a control packet from the DS with enable set to true, it will immediately enable motors \(and restart the FPGA watchdog timer\).
- A count of watchdog expiration events is sent by NetComm to the DS, so this data is in the DS log.
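
The watchdog behavior described in the list above can be illustrated with a minimal sketch. This is a toy Python model, not the FPGA or NetComm implementation: the class name ``OutputWatchdog`` and its methods are hypothetical, and the only value taken from the text is the 125 ms timeout.

```python
class OutputWatchdog:
    """Toy model of an output watchdog (hypothetical, for illustration only)."""

    def __init__(self, timeout_ms=125):
        self.timeout_ms = timeout_ms  # 125 ms is the figure quoted in the text
        self.last_feed_ms = None      # time of the most recent enable packet
        self.enabled = False          # outputs start disabled
        self.expirations = 0          # count of timeouts, as reported to the DS

    def feed(self, now_ms):
        """Record that an enable packet arrived; outputs are enabled immediately."""
        self.last_feed_ms = now_ms
        self.enabled = True

    def poll(self, now_ms):
        """Return True if outputs may remain enabled at time now_ms."""
        if self.enabled and now_ms - self.last_feed_ms >= self.timeout_ms:
            # No enable packet within the timeout window: force a disable
            # and count the expiration exactly once.
            self.enabled = False
            self.expirations += 1
        return self.enabled
```

In this model, calling ``feed(0)`` and then ``poll(100)`` leaves outputs enabled, while ``poll(125)`` disables them and increments the expiration count once. A later ``feed`` call re-enables outputs right away, matching the "immediately enable motors" bullet.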

Software Side:
Member

This should be "Driver Station side" or similar


- The DS sends a control packet to the robot every 20 `ms`. This is on a high-priority timed loop. Other loops in the DS, including the joystick and GUI loops, run at lower priority. What this means is that under poor CPU conditions or rendering delays \(e.g. large amounts of console output\), it\’s possible for the DS to have internal delay between disable being clicked, a key being hit, or joystick inputs being read to those changes being reflected in the control packets being sent to the robot.
- Control \(DS->Robot\) and status \(Robot->DS\) packets have an embedded sequence number. The DS uses these to compute round-trip-time and packet loss. A status packet that\’s returned is marked as “lost” if the RTT is greater than ~250 ms. This does not mean it was actually missing \(no response received\). The DS does keep a separate count of truly missing \(e.g. no response\) packets and disables \(starts sending control packets with enable=false\) after ~10 drops occur \(so I think this works out to ~450 ms, assuming it\’s 250+10*20\).
- High CPU / GUI delays result in the DS continuing to send packets with enable=true for a period of time until that loop is notified a disable occurs.
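
The timing estimate in the list above can be written out as a short sketch. It only reproduces the author's own back-of-the-envelope assumption (250 + 10 * 20); the function and constant names are hypothetical, the thresholds are the approximate values quoted in the text, and nothing here reflects the actual Driver Station code.

```python
CONTROL_PERIOD_MS = 20       # DS sends a control packet every 20 ms (from the text)
LOST_RTT_THRESHOLD_MS = 250  # approximate RTT above which a packet is marked "lost"
MISSING_BEFORE_DISABLE = 10  # approximate count of missing packets before DS disables


def classify_status_packet(rtt_ms):
    """Classify a status packet by round-trip time; None means no response."""
    if rtt_ms is None:
        return "missing"  # never came back
    if rtt_ms > LOST_RTT_THRESHOLD_MS:
        return "lost"     # arrived, but too late to count as received
    return "ok"


def estimated_ds_disable_delay_ms(
    rtt_threshold_ms=LOST_RTT_THRESHOLD_MS,
    missing_limit=MISSING_BEFORE_DISABLE,
    period_ms=CONTROL_PERIOD_MS,
):
    """Estimate the time until the DS starts sending enable=false."""
    return rtt_threshold_ms + missing_limit * period_ms
```

With the default values, ``estimated_ds_disable_delay_ms()`` evaluates to 450, the "~450 ms" figure in the text. Note that a "lost" packet and a "missing" packet are distinct categories here, as the text describes.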

Known Issues / Potential Fault Conditions:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- There is no upper limit to control lag. As long as packets keep arriving, they may be several seconds delayed from the DS, so a disable command from the DS would take the same amount of time to be reflected in robot operation. Once it\’s delayed, all controls, including disable/estop, will be delayed. We\’ve all seen delays increase either slowly or quickly\–the robot was controllable until it\’s suddenly much more laggy, or even been laggy from the start.
- Packet buffering / wifi retransmits of control packets result in sporadic enable packets making it to the robot after some delay. The watchdog would disable after 125 `ms`, but a single enable packet would re-enable motors for another 125 `ms` at a time.
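
The second fault condition above can be made concrete with a small sketch that computes when motors would be enabled for a given set of enable-packet arrival times. It assumes a simplified model in which each enable packet keeps outputs enabled for exactly one 125 ms watchdog period; the function name ``enabled_windows`` is hypothetical.

```python
def enabled_windows(arrival_times_ms, timeout_ms=125):
    """Return merged (start, end) intervals during which outputs are enabled.

    Each enable packet arriving at time t keeps outputs enabled until
    t + timeout_ms, unless another packet arrives first and extends it.
    """
    windows = []
    for t in sorted(arrival_times_ms):
        end = t + timeout_ms
        if windows and t <= windows[-1][1]:
            # Packet arrived before the previous window expired: extend it.
            start, prev_end = windows[-1]
            windows[-1] = (start, max(prev_end, end))
        else:
            # Watchdog had already expired: this packet re-enables outputs.
            windows.append((t, end))
    return windows
```

For example, delayed packets arriving at 0, 400, and 450 ms produce two separate enabled windows, (0, 125) and (400, 575), so the robot would re-enable well after the first watchdog timeout.
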
15 changes: 15 additions & 0 deletions source/docs/software/frc-glossary.rst
@@ -53,6 +53,9 @@ FRC Glossary
DHCP
Dynamic Host Configuration Protocol, the protocol that allows a central device to assign unique IP addresses to all other devices.

DLL
An acronym which stands for "Dynamic-Link Library", a shared library or resource used for programs or files within the Microsoft Windows operating system. See `Dynamic-link library <https://en.wikipedia.org/wiki/Dynamic-link_library/>`__ for more information.

encapsulation
A software design pattern which uses a class to hide the implementation details of other classes. See `encapsulation <https://en.wikipedia.org/wiki/Encapsulation_(computer_programming)>`__ on Wikipedia for more info.

@@ -116,6 +119,9 @@ FRC Glossary
mutable
An object that can be modified after it is created.

NetComm
A software daemon running on the NI roboRIO controller to maintain communications with the Driver Station, :term:`FPGA`, and user program.

permanent-magnet DC motor
The classification of all legal motors for the FIRST robotics competition. This type of motor takes direct current as input, and uses it to create a magnetic field. In turn, this magnetic field interacts with a physical magnet to create a force that turns the output shaft. Electrical ("brushless") or mechanical ("brushed") means are used to ensure the electrically-generated magnetic field always points in a direction that creates forces when it interacts with the physical magnet, even as the motor's shaft rotates. See `permanent-magnet motor <https://en.wikipedia.org/wiki/Brushed_DC_electric_motor#Permanent-magnet_motors>`__ on Wikipedia for more info.

@@ -131,6 +137,9 @@ FRC Glossary
pose
The collection of position and rotation information that describes how a rigid body is oriented in space, relative to some fixed reference point.

PWM
An acronym which stands for "Pulse Width Modulation", a method for motor controllers, sensors, and other components to transmit or receive their operational status. For motor controllers, PWM is often used to affect the power output to the motor or connected actuator. See :ref:`PWM` for more information.

RAII
Resource Acquisition Is Initialization; a language behavior (in C++, but not in Java) where holding a resource is tied to object lifetime.

@@ -178,3 +187,9 @@ FRC Glossary

transitory
In :term:`NetworkTables`, a :term:`topic` that will disappear after the last :term:`publisher` stops publishing.

user program
In the context of the roboRIO control system, the primary FRC runtime program that handles all communication and robot operations.

watchdog
A timer mechanism often built into embedded devices or software to monitor the runtime status of a program and reset it if crashes or errors occur. The watchdog will reset the main program if the timer itself is not reset, an activity which occurs during normal operations of the program.
2 changes: 2 additions & 0 deletions source/docs/software/hardware-apis/motors/pwm-controllers.rst
@@ -1,5 +1,7 @@
.. include:: <isonum.txt>

.. _PWM:

PWM Motor Controllers in Depth
==============================

1 change: 1 addition & 0 deletions source/index.rst
@@ -385,6 +385,7 @@ Community translations can be found in a variety of languages in the bottom-left

docs/hardware/hardware-basics/index
docs/hardware/hardware-tutorials/index
docs/hardware/hardware-safety/index
docs/hardware/sensors/index

.. toctree::