pass through attributes when creating metrics, dimensions, identifiers by danfrankj · Pull Request #18 · matthewwardrop/mensor

danfrankj · 2018-08-07T00:11:54Z

Summary

When configuring measures, I need to be able to configure additional properties, especially transforms. This change passes additional configuration up the chain.

e.g.

enables answering questions like ...

mensor_registry.evaluate('user', measures=['transfer/traders'], segment_by=['transfer/ds'])

Testing Done

Manual

danfrankj · 2018-08-07T00:12:35Z

mensor/measures/types.py

    @transforms.setter
    def transforms(self, transforms):
        # TODO: Check structure of transforms dict
-        if not transforms:


Non-substantive, but I do like one-liners.

Aye... I like this better too. I think I left it uncollapsed because I was expecting to do more logic than would be elegant in a one-liner. For now, this works well.

matthewwardrop · 2018-08-07T03:52:09Z

mensor/measures/types.py

            else:
-                raise KeyError("No such attribute {}.".format(attr))
+                raise AttributeError(
+                    "Cannot initialize {}<{}> with attribute {}.".format(self.__class__.__name__, self.name, attr)


Can you quote the attribute name in single quotes?
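For illustration, a minimal sketch of the message format being asked for. The `Feature` class and its `ALLOWED` set are hypothetical stand-ins rather than mensor's actual classes; the `{!r}` conversion is one way to get the attribute name wrapped in single quotes.

```python
class Feature:
    """Hypothetical stand-in for a mensor feature class (not the real API)."""

    ALLOWED = {"desc", "default"}  # assumed set of settable attributes

    def __init__(self, name, **attrs):
        self.name = name
        for attr, value in attrs.items():
            if attr not in self.ALLOWED:
                # {!r} formats a str via repr(), which adds the single quotes.
                raise AttributeError(
                    "Cannot initialize {}<{}> with attribute {!r}.".format(
                        self.__class__.__name__, self.name, attr
                    )
                )
            setattr(self, attr, value)


try:
    Feature("traders", transforms={})
except AttributeError as exc:
    message = str(exc)
```

Here `message` reads `Cannot initialize Feature<traders> with attribute 'transforms'.`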

matthewwardrop · 2018-08-07T03:59:10Z

I don't know how I feel about passing through all extra keyword arguments like this. There are some attributes which have been intentionally masked by the subclass constructors. Doing this re-opens direct access to these attributes, but does on the other hand simplify setting them. Most of these attributes are definitely not supposed to be touched.

The transforms were intentionally not made something that measure providers could set, because that can lead to unintuitive and confusing behaviour. Can you give me some examples of where it is necessary?

danfrankj · 2018-08-30T20:52:59Z

mensor/backends/sql.py

    @classmethod
    def _on_registered(cls, key):
-        for agg in ['sum', 'mean', 'sos', 'count']:
+        for agg in ['sum', 'mean', 'sos', 'count', '1']:


perhaps rename to "any"?

danfrankj · 2018-08-30T20:54:48Z

mensor/measures/types.py

-    def __init__(self, name, expr=None, default=None, desc=None, shared=False, partition=False, requires_constraint=False, provider=None):
-        _ProvidedFeature.__init__(self, name, expr=expr, default=default, desc=desc, shared=shared, provider=provider)
+    def __init__(self, name, expr=None, default=None, desc=None, shared=False, partition=False, requires_constraint=False, provider=None, **attrs):
+        _ProvidedFeature.__init__(self, name, expr=expr, default=default, desc=desc, shared=shared, provider=provider, **attrs)


as discussed, perhaps we should be more explicit with which attrs get sent through
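One way to be explicit about this, sketched under assumptions: the subclass keeps an allowlist of extra attributes it is willing to forward and rejects everything else. `ProvidedFeature`, `Measure` and `PASSTHROUGH_ATTRS` below are hypothetical simplifications, not the real mensor classes or signatures.

```python
class ProvidedFeature:
    """Hypothetical base class; mensor's real _ProvidedFeature may differ."""

    def __init__(self, name, expr=None, desc=None, **attrs):
        self.name = name
        self.expr = expr
        self.desc = desc
        for key, value in attrs.items():
            setattr(self, key, value)


class Measure(ProvidedFeature):
    # Assumed allowlist: only these extras are forwarded to the base class.
    PASSTHROUGH_ATTRS = frozenset({"transforms"})

    def __init__(self, name, expr=None, desc=None, **attrs):
        unexpected = set(attrs) - self.PASSTHROUGH_ATTRS
        if unexpected:
            raise TypeError(
                "Unexpected attributes for Measure<{}>: {}".format(
                    name, sorted(unexpected)
                )
            )
        super().__init__(name, expr=expr, desc=desc, **attrs)


# Allowed: 'transforms' is on the allowlist (the dict content is a placeholder).
measure = Measure("traders", transforms={"_default": {}})
```

With this shape, `Measure("traders", shared=True)` raises a `TypeError` instead of silently re-opening an attribute the subclass constructor had masked.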


danfrankj · 2018-08-30T21:01:51Z

mensor/measures/types.py

        }
+
        if isinstance(self.transforms, dict):
+            transforms.update(self.transforms.get('_default', {}))


default transforms for all unit types?

I remain unconvinced, I'm afraid, that this is a good idea, as it leads to confusing (and usually wrong) behaviour :/. Suppose we chose count as the default aggregation for some measure. That might make sense the first time it is aggregated, but on subsequent aggregations (which may occur due to multiple rebase operations, for example), you would be taking the count of counts at each aggregation, which would likely make the resulting measure meaningless.
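A small self-contained sketch of the count-of-counts problem, using plain Python rather than mensor. The rows and the `rollup` helper are made up for illustration.

```python
from collections import Counter

# One row per transfer: (user, day). Illustrative data only.
transfers = [
    ("alice", "2018-08-01"),
    ("alice", "2018-08-01"),
    ("alice", "2018-08-02"),
    ("bob", "2018-08-01"),
]

# First aggregation: count transfers per (user, day).
per_user_day = Counter(transfers)


def rollup(values_by_key, key_fn, agg):
    """Group values by key_fn(key) and apply agg to each group."""
    groups = {}
    for key, value in values_by_key.items():
        groups.setdefault(key_fn(key), []).append(value)
    return {group: agg(values) for group, values in groups.items()}


# Second aggregation, from (user, day) up to user.
per_user_sum = rollup(per_user_day, lambda key: key[0], sum)    # sum of counts
per_user_count = rollup(per_user_day, lambda key: key[0], len)  # count of counts
```

Summing the counts gives alice 3 transfers, which matches the raw rows. Counting the counts gives alice 2, which is the number of days on which she transferred, not the number of transfers.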

matthewwardrop · 2018-08-31T05:33:38Z

@danfrankj I still need to think about this a bit more deeply. It seems important to me that MeasureProviders not be able to influence the aggregation of the metric beyond aggregations that occur over the MeasureProvider in question. Anything else seems to break the natural and intuitive composition of arbitrary MeasureProviders, each ignorant of what the others provide. And yet... and yet... I can see why something like this is valuable. I'm going to be working pretty regularly on Mensor in the evening over the next couple of weeks (while at home on Dad duty), cleaning things up and implementing a few more higher-level components (such as a revised Metric and stats layer). As I do that, I'll keep this use-case in mind and see how much of a concession it actually is.

One alternative approach to that pursued here is to encourage things like this to be implemented as metrics, rather than measures. Is there a reason why this would not work well in this case?

matthewwardrop · 2018-09-07T05:32:51Z

@danfrankj We've spoken about this a little through other channels, but just to keep it all in one place:

I have given some more thought to the aggregation configuration issue. Are there any cases that you can think of where it makes sense to change the aggregation past the first aggregation of a given measure?

I'm thinking of adding support at the provider level for specifying what the next aggregation will be; perhaps via something like mp.provides_measure(..., next_agg='sum', next_post_agg=None). Would that deal with your use cases?
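A rough sketch of what such a setting could mean, under stated assumptions: the provider's choice applies to the first aggregation only, and every later aggregation uses a fixed default (assumed here to be a sum). `MeasureSpec`, `AGGS` and `aggregate_twice` are hypothetical names; only `next_agg` comes from the proposal above, and nothing here is implemented mensor behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Assumed registry of aggregation functions.
AGGS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "count": len,
}


@dataclass
class MeasureSpec:
    """Hypothetical record of a provided measure (not mensor's real class)."""

    name: str
    next_agg: str = "sum"  # provider's choice, used for the first aggregation only


def aggregate_twice(spec: MeasureSpec, groups: List[List[float]]) -> float:
    # First aggregation: the provider-specified next_agg is applied per group.
    first = [AGGS[spec.next_agg](group) for group in groups]
    # Later aggregation: assumed fixed default, outside the provider's control.
    return AGGS["sum"](first)


# Two groups of raw rows; counting first, then summing the counts.
total = aggregate_twice(MeasureSpec("traders", next_agg="count"), [[1, 1], [1]])
```

Here `total` is 3: the count is taken once, and the second step sums those counts rather than counting them again.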

If we allow further configuration, it seems to me that there would be an explosion of complexity, not just in building the providers, but in understanding the behaviour of new providers. Measures can be aggregated over different unit types, potentially multiple times, and if a measure provider can affect downstream aggregations beyond the first aggregation, then a lot of weird (and probably incorrect) operations will occur.

danfrankj requested a review from matthewwardrop August 7, 2018 00:14

danfrankj force-pushed the df_configure_transforms branch from ef3ba16 to 8009d35 on August 30, 2018 20:52

5a33132

- enable configuration of default transforms for measures
- add '1' SQL transform, aka "any"

danfrankj force-pushed the df_configure_transforms branch from 8009d35 to 5a33132 on August 30, 2018 21:01

danfrankj wants to merge 1 commit into master from df_configure_transforms

3 participants
