Time Series Data

https://moshez.wordpress.com/2016/08/25/time-series-data/

http://moshez.wordpress.com/?p=1216

When operating computers, we are often exposed to so-called “time series”. Whether it is database latency, page fault rate or total memory used, these are all exposed as numbers that are usually sampled at frequent intervals.

However, not only computer engineers are exposed to such data. It is worthwhile to know what other disciplines are exposed to such data, and what they do with it. “Earth sciences” (geology, climate, etc.) have a lot of numbers, and often need to analyze trends and make predictions. Sometimes these predictions have, literally, billions dollars’ worth of decision hinging on them. It is worthwhile to read some of the textbooks for students of those disciplines to see how to approach those series.

Another discipline that needs to visually inspect time series data is physicians. EKG data is often vital to analyze patients’ health — and especially when compared to their historical records. For that, that data needs to be saved. A lot of EKG research has been done on how to compress numerical data, but still keep it “visually the same”. While the research on that is not as rigorous, and not as settled, as the trend analysis in geology, it is still useful to look into. Indeed, even the basics are already better than so-called “roll-ups”, which preserve none of the visual distinction of the data, flattening peaks and filling hills while keeping a score of “standard deviation” that is not as helpful as is usually hoped for.

Extension API: An exercise in a negative case study

https://moshez.wordpress.com/2016/08/20/extension-api-an-exercise-in-a-negative-case-study/

http://moshez.wordpress.com/?p=1165

I was idly contemplating implementing a new Jupyter kernel. Luckily, they try to provide facility to make it easier. Unfortunately, they made a number of suboptimal choices in their API. Fortunately, those mistakes are both common and easily avoidable.

Subclassing as API

They suggest subclassing IPython.kernel.zmq.kernelbase.Kernel. Errr…not “suggest”. It is a “required step”. The reason is probably that this class already implements 21 methods. When you subclass, make sure to not use any of these names, or things will break randomly. If you do not want to subclass, good luck figuring out what the assumption that the system makes about these 21 methods because there is no interface or even prose documentation.

The return statement in their example is particularly illuminating:

        return {'status': 'ok',
                # The base class increments the execution count
                'execution_count': self.execution_count,
                'payload': [],
                'user_expressions': {},
               }

Note the comment “base class increments the execution count”. This is a classic code smell: this seems like this would be needed in every single overrider, which means it really belongs in the helper class, not in every kernel.

None

The signature for the example do_execute is:

    def do_execute(self, code, silent, store_history=True, 
                   user_expressions=None,
                   allow_stdin=False):

Of course, this means that user_expressions will sometimes be a dictionary and sometimes None. It is likely that the code will be written to anticipate one or the other, and will fail in interesting ways if None is actually sent.

Optional Overrides

As described in this section there are also ways to make the kernel better with optional overrides. The convention used, which is nowhere explained, is that do_ methods mean you should override to make a better kernel. Nowhere it is explained why there is no default history implementation, or where to get one, or why a simple stupid implementation is wrong.

Dictionaries

 

All overrides return dictionaries, which get serialized directly into the underlying communication platform. This is a poor abstraction, especially when the documentation is direct links to the underlying protocol. When wrapping a protocol, it is much nicer to use an Interface as the documentation of what is assumed — and define an attr.s-based class to allow returning something which is automatically the correct type, and will fail in nice ways if a parameter is forgotten.

Summary

If you are providing an API, here are a few positive lessons based on the issues above:

  • You should expect interfaces, not subclasses. Use composition, not subclassing.If you want to provide a default implementation in composition, just check for a return of NotImplemeted(), and use the default.
  • Do the work of abstracting your customers from the need to use dictionaries and unwrap automatically. Use attr.s to avoid customer boilerplate.
  • Send all arguments. Isolate your customers from the need to come up with sane defaults.
  • As much as possible, try to have your interfaces be side-effect free. Instead of asking the customer to directly make a change, allow the customer to make the “needed change” be part of the return type. This will let the customers test their class much more easily.