Exception handling

During the OCSP renewal process many things can go wrong. Some errors are recoverable, others can be ignored, and still others are caused by temporary issues, e.g. a service interruption of the OCSP server in question. Extensive error handling is therefore done to keep the daemon's threads running.

The following is an overview of what can be expected when exceptions occur.

| Exception | Source | Raised when? | Action |
|---|---|---|---|
| IOError/OSError | certfinder | Directory can’t be read. | Ignore, certfinder will try at every refresh. |
| CertFileAccessError | certfinder | Certificate file can’t be read. | Schedule retry 3x at n*60s, then 3x every hour, then ignore. [1] |
| CertParsingError | certparser | Can’t access the certificate file, it doesn’t parse, or part of the chain is missing. | Ignore, certfinder will try at every refresh. |
| StapleBadResponse | staplerenewer | The response is empty, invalid, or the status is not “good”. | Schedule retry 3x at n*60s, then 3x every hour, then twice a day indefinitely. If it’s not a server issue, wait for the file to change. [1] |
| urllib.error.URLError | staplerenewer | An OCSP URL can’t be opened. | We can try again later, maybe there is a server-side issue. Some certificates contain multiple URLs, so we try each one at 10-second intervals and then start from the first again. Schedule retry 3x at n*60s, then 3x every hour, then twice a day. |
| requests.exceptions.Timeout | (as above) | Data didn’t reach us within the expected time frame. | (as above) |
| requests.exceptions.ReadTimeout | (as above) | (as above) | (as above) |
| requests.exceptions.ConnectTimeout | (as above) | A connection can’t be established because the server doesn’t reply within the expected time frame. | (as above) |
| requests.exceptions.TooManyRedirects | (as above) | The OCSP server redirects us too many times. The limit is quite high, so probably something is wrong with the OCSP server. | (as above) |
| requests.exceptions.HTTPError | (as above) | An HTTP error code was returned; this can be a 4xx or 5xx status code. | (as above) |
| requests.exceptions.ConnectionError | (as above) | A connection to the OCSP server can’t be established. | (as above) |
| SocketError | stapleadder | A HAProxy socket cannot be opened. | Log a critical error. Every “send” action will try to re-open the socket. |
| BrokenPipeError | (as above) | A HAProxy socket consistently has a broken pipe. | (as above) |
| StapleAdderBadResponse | (as above) | HAProxy does not respond with ‘OCSP Response updated!’ | Schedule a retry 3x at n*60s, then 3x every hour, then ignore. |

[1] When the certificate file is changed, certfinder will add the file back to the parsing queue.
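
The retry schedule that recurs in the Action column can be sketched as a small function. This is only an illustration under an assumed reading of the table: the first three retries wait n*60 seconds (n being the retry number), the next three wait one hour each, and after that the task is either dropped or retried twice a day. The name `retry_delay` and its `final` parameter are hypothetical and do not come from stapled itself.

```python
def retry_delay(attempt, final="ignore"):
    """Return the seconds to wait before retry number `attempt` (1-based).

    Returns None when no further retry should be scheduled.
    Assumed reading of the table (not taken from the stapled source):
      retries 1-3: attempt * 60 seconds
      retries 4-6: one hour each
      afterwards:  give up ("ignore") or every 12 hours ("twice_daily")
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if attempt <= 3:
        return attempt * 60
    if attempt <= 6:
        return 3600
    if final == "twice_daily":
        return 12 * 3600
    return None


# Print both variants of the schedule side by side.
for n in range(1, 9):
    print(n, retry_delay(n), retry_delay(n, final="twice_daily"))
```

Under this reading, a CertFileAccessError would use the default `final="ignore"`, while a StapleBadResponse would use `final="twice_daily"`.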

stapled.core.exceptions

This module holds the application specific exceptions.

exception stapled.core.exceptions.OCSPBadResponse

Raised when an OCSP staple is not valid.

exception stapled.core.exceptions.RenewalRequirementMissing

Raised when an OCSP renewal is run while not all requirements are met.

exception stapled.core.exceptions.SocketError

Raised by the StapleAdder when it is impossible to connect to or use its socket.

exception stapled.core.exceptions.StapleAdderBadResponse

Raised when HAProxy does not respond with “OCSP Response updated”.

exception stapled.core.exceptions.CertFileAccessError

Raised when a file can’t be accessed at all.

exception stapled.core.exceptions.CertParsingError(msg, *args, **kwargs)

Raised when something went wrong while parsing the certificate file.

exception stapled.core.exceptions.CertValidationError

Raised when validation of the certificate chain fails.

stapled.core.excepthandler

This module defines a context in which we can run actions that are likely to fail because they have intricate dependencies (e.g. network connections, file access, parsing certificates and validating their chains) without stopping execution of the application. Additionally, it logs these errors and, depending on the nature of the error, reschedules the task at a time that seems reasonable, i.e. a time by which we can reasonably expect the issue to be resolved.
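
The general idea of such a context can be sketched with `contextlib.contextmanager`. This is a minimal illustration, not the actual implementation: the `Context` class, its `reschedule` method, the name `except_handle` and the 60-second delay are all hypothetical, and `ValueError` merely stands in for a parsing error.

```python
import contextlib
import logging

LOG = logging.getLogger(__name__)


class Context:
    """Hypothetical action context: remembers a task and when to retry it."""

    def __init__(self, name):
        self.name = name
        self.retry_in = None  # seconds until the next attempt, None = no retry

    def reschedule(self, seconds):
        self.retry_in = seconds


@contextlib.contextmanager
def except_handle(ctx):
    """Run the body; log known failures and reschedule instead of crashing."""
    try:
        yield ctx
    except (IOError, OSError) as exc:
        # File or network trouble is often temporary, so try again soon.
        LOG.error("%s failed: %s", ctx.name, exc)
        ctx.reschedule(60)
    except ValueError as exc:
        # Stand-in for a parsing error: retrying unchanged input is pointless.
        LOG.error("%s can't be parsed: %s", ctx.name, exc)


ctx = Context("renew example.pem")
with except_handle(ctx):
    raise OSError("connection refused")
print(ctx.retry_in)  # 60
```

Because the generator handles the exception itself, the `with` block suppresses it and execution continues after the block, which is what keeps a worker thread alive.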

It is generally considered bad practice to catch all remaining exceptions, however this is a daemon. We can’t afford for it to get stuck or crash. So in the interest of staying alive, if an exception is not caught specifically, the handler will catch it, generate a stack trace and save it in a file in the current working directory. A log entry will be created explaining that there was an exception, giving the location of the stack trace dump, and stating that the context will be dropped. It will also kindly request the administrator to contact the developers so the exception can be caught in a future release, which will probably increase stability and might result in a retry rather than just dropping the context.

Dropping the context effectively means that a retry won’t occur and, since the context will have no more references, it will be garbage collected. There is however still a reference to the certificate model in core.daemon.run.models. With no scheduled actions it will just sit idle until the finder detects that the certificate file is either removed, which will cause the entry in core.daemon.run.models to be deleted, or changed. If the certificate file is changed, the finder will schedule a parsing action for it and it will be picked up again. Hopefully the issue that caused the uncaught exception will be resolved; if not, it will be caught again and the cycle continues.

stapled.core.excepthandler.LOG_DIR = '/var/log/stapled/'

This is a global variable that is overridden by stapled.__main__ with the command line argument: --logdir
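
As an illustration of this override pattern, the sketch below parses a `--logdir` argument and assigns it to a module-level variable. It assumes that `--logdir` takes a directory path as its value. The `excepthandler` object is a stand-in module created for the example, and `main` is a hypothetical entry point, not the real stapled.__main__.

```python
import argparse
import types

# Stand-in for the stapled.core.excepthandler module (assumption for the demo).
excepthandler = types.ModuleType("excepthandler")
excepthandler.LOG_DIR = "/var/log/stapled/"


def main(argv):
    """Parse the command line and override the module-level default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--logdir", default=excepthandler.LOG_DIR)
    args = parser.parse_args(argv)
    # Code that reads excepthandler.LOG_DIR afterwards sees the new value.
    excepthandler.LOG_DIR = args.logdir


main(["--logdir", "/tmp/stapled-logs/"])
print(excepthandler.LOG_DIR)
```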

stapled.core.excepthandler.stapled_except_handle(*args, **kwds)

Handle lots of potential errors and reschedule failed action contexts.

stapled.core.excepthandler.handle_file_error(exc)

Wrapper for handling IOError and OSError logging.

Can’t use FileNotFoundError and PermissionError because they don’t exist in Python 2.7.x yet. This won’t be required after we remove Python 2.7.x support.

Parameters: exc (Exception): OSError or IOError to handle logging for.
Returns: str: Reason for OSError/IOError.
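
One way to tell these cases apart without FileNotFoundError and PermissionError is to inspect the exception's `errno` attribute, which is available on both Python 2.7 and Python 3. The sketch below shows that technique only; `file_error_reason` is a hypothetical name and the returned strings are illustrative, not the messages stapled actually logs.

```python
import errno
import os
import tempfile


def file_error_reason(exc):
    """Return a short reason string for an IOError/OSError based on its errno."""
    if exc.errno == errno.ENOENT:
        return "file not found"
    if exc.errno in (errno.EACCES, errno.EPERM):
        return "permission denied"
    # Fall back to the operating system's own description.
    return exc.strerror or str(exc)


# Opening a file that does not exist yields errno.ENOENT.
missing = os.path.join(tempfile.mkdtemp(), "missing.pem")
try:
    open(missing)
except (IOError, OSError) as exc:
    print(file_error_reason(exc))
```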

stapled.core.excepthandler.delete_ocsp_for_context(ctx)

When something bad happens, sometimes it is good to delete a related bad OCSP file so it can’t be served any more.

Todo

Check that HAProxy doesn’t cache this, it probably does, we need to be able to tell it not to remember it.
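
Removing a stale staple file could look roughly like the sketch below. It assumes the OCSP response is stored next to the certificate under the same name with an `.ocsp` suffix, and it takes a plain path where the real function takes a context object. The name `delete_ocsp_file` is hypothetical.

```python
import os
import tempfile


def delete_ocsp_file(cert_path):
    """Delete `<cert_path>.ocsp` if it exists; return True if a file was removed."""
    ocsp_path = cert_path + ".ocsp"  # assumed naming convention
    try:
        os.remove(ocsp_path)
    except FileNotFoundError:
        # Nothing to delete, so there is no stale staple on disk.
        return False
    return True


# Demo with a throwaway directory.
cert = os.path.join(tempfile.mkdtemp(), "example.pem")
with open(cert + ".ocsp", "wb") as handle:
    handle.write(b"stale response")
print(delete_ocsp_file(cert))  # True
print(delete_ocsp_file(cert))  # False
```

Deleting the file only affects what is on disk. Whether HAProxy still holds the old response in memory is exactly the open question in the Todo above.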

stapled.core.excepthandler.dump_stack_trace(ctx, exc)

Examine the last exception and dump a stack trace to a file. If that fails due to an IOError or OSError, log the failure so that a sysadmin can make the directory writeable.
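
The behaviour described here can be approximated as follows. This is a sketch under assumptions, not the real function: `dump_trace` is a hypothetical name, it takes a target directory instead of a context, and the file naming scheme is invented for the example.

```python
import logging
import os
import tempfile
import time
import traceback

LOG = logging.getLogger(__name__)


def dump_trace(exc, log_dir):
    """Write the traceback of `exc` to a file in `log_dir`.

    Returns the path of the dump, or None if the file could not be written.
    """
    path = os.path.join(log_dir, "trace-{}.txt".format(int(time.time())))
    text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    try:
        with open(path, "w") as handle:
            handle.write(text)
    except (IOError, OSError) as err:
        # Tell the sysadmin why the dump is missing instead of failing again.
        LOG.critical("Can't write stack trace to %s: %s", path, err)
        return None
    return path


try:
    1 / 0
except Exception as exc:
    trace_path = dump_trace(exc, tempfile.mkdtemp())
print(trace_path)
```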