"Data munging" is the process of manipulating raw data into a different form
to make it more useful for a downstream purpose.  Munging doesn't change the
source data; it only transforms the data for specific uses.

TripleBlind supports two methods of munging, one performed by the data owner
and one by the data consumer.  Both can use the well-known SQL language to
perform the munging.  Data consumers can also use Python for preprocessing.

Data Owner Munging:
-------------------
A data owner can transform, filter and limit the data published as a data asset
when they create the asset.  The SQL query against a database allows the owner
of the data to expose exactly what they want, in the form they want, and
nothing more.
Consumers of these data assets have no way of knowing what the original source
looks like.

The 1a through 1c examples show how raw datasets are transformed in various
ways to engineer new features:

 * Calculating a new average feature from existing high and low values
 * Calculating a moving average
 * Reclassifying a textual label into a numerical label (commonly needed for
   training a neural network)

All of these are performed using SQL commands.  SQL is a broadly known
language and can be applied to all kinds of tabular data, either directly
against an existing SQL database to form a new asset, or as part of a
preprocessing pipeline.
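
The three kinds of feature engineering listed above can be sketched with
Python's built-in sqlite3 module.  This is only an illustration, not the code
from the 1a through 1c examples: the "readings" table, its columns and the
sample rows are hypothetical, and the window function assumes SQLite 3.25 or
newer.

```python
import sqlite3

# Hypothetical raw data standing in for an owner's source table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (day INTEGER, high REAL, low REAL, label TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (1, 10.0, 6.0, "cold"),
        (2, 14.0, 8.0, "mild"),
        (3, 20.0, 12.0, "warm"),
        (4, 18.0, 10.0, "mild"),
    ],
)

QUERY = """
SELECT
    day,
    -- New average feature from the existing high and low values
    (high + low) / 2.0 AS avg_value,
    -- Moving average of `high` over the current row and the two before it
    AVG(high) OVER (
        ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS high_ma3,
    -- Textual label reclassified as a numerical label
    CASE label WHEN 'cold' THEN 0 WHEN 'mild' THEN 1 WHEN 'warm' THEN 2 END
        AS label_id
FROM readings
ORDER BY day
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Only the columns named in the SELECT appear in the result, which is the same
mechanism that lets an owner expose derived features without revealing the
shape of the source table.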

The 2_run_report.py example shows how to use one of these assets as a report,
retrieving only a portion of the columns.  For this to work, the fields that
will be reported must be "unmasked" by the data owner, as shown in the 1a
example.


Data Consumer Munging:
----------------------
The second method of manipulation allows the user of a dataset to filter and
manipulate data before it is fed into an operation.  The operation can use the
full details of the munged data, but this intermediate result remains
completely private and hidden from the consumer's view: the munged data feeds
straight into the operation.

The simple 2a_model_train.py example included here shows how datasets can be
normalized (in this case adjusting units of measurement), how subsets of data
can be extracted (in this case participants older than 50), and how a new
feature can be engineered by combining fields inside the dataset.
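
As a rough illustration of these three steps expressed in SQL, the sketch
below runs a single query locally with Python's sqlite3 module.  It is not
taken from 2a_model_train.py: the "participants" table, its columns, the
sample values and the BMI feature are all hypothetical.

```python
import sqlite3

# Hypothetical participant data in imperial units.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participants "
    "(id INTEGER, age INTEGER, height_in REAL, weight_lb REAL)"
)
conn.executemany(
    "INSERT INTO participants VALUES (?, ?, ?, ?)",
    [
        (1, 45, 70.0, 180.0),
        (2, 62, 65.0, 150.0),
        (3, 51, 72.0, 200.0),
    ],
)

QUERY = """
SELECT
    id,
    age,
    -- Normalize units of measurement: inches to meters, pounds to kilograms
    height_in * 0.0254 AS height_m,
    weight_lb * 0.45359237 AS weight_kg,
    -- Engineer a new feature by combining two fields (body mass index)
    (weight_lb * 0.45359237)
        / ((height_in * 0.0254) * (height_in * 0.0254)) AS bmi
FROM participants
-- Extract the subset of participants older than 50
WHERE age > 50
ORDER BY id
"""

munged = conn.execute(QUERY).fetchall()
for row in munged:
    print(row)
```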

The data consumer has two tools for performing munging: sql_transform() and
python_transform().  The example illustrates both, using a different method on
each of the two input datasets.  The Python transformer is particularly
powerful, exposing the richness of the Python language and the allowed Pandas
and NumPy libraries for manipulating tabular data.  The security features of
the preprocessor harness prevent abuse by limiting the script's capabilities at
the kernel level, blocking file system and internet access.
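
The kind of logic a Python transform might contain can be sketched with plain
Pandas and NumPy.  This is an assumption-laden illustration only: the function
name "transform", its DataFrame-in, DataFrame-out shape, and every column name
are hypothetical, and the way a script is actually handed to
python_transform() is not shown here.  It assumes pandas and numpy are
installed, and it touches neither the file system nor the network.

```python
import numpy as np
import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical munging step: returns a new frame, input is untouched."""
    out = df.copy()
    # Normalize units of measurement: centimeters to meters
    out["height_m"] = out["height_cm"] / 100.0
    # Extract a subset: participants older than 50
    out = out[out["age"] > 50].copy()
    # Engineer a feature by combining two existing fields
    out["pulse_pressure"] = out["systolic"] - out["diastolic"]
    # Derive a numeric flag with NumPy
    out["hypertensive"] = np.where(out["systolic"] >= 140, 1, 0)
    return out.reset_index(drop=True)


# Hypothetical sample data for a local dry run.
sample = pd.DataFrame(
    {
        "age": [45, 62, 51],
        "height_cm": [178.0, 165.0, 183.0],
        "systolic": [120, 140, 130],
        "diastolic": [80, 90, 85],
    }
)
result = transform(sample)
print(result)
```

Trying the logic on a small local frame like this can catch mistakes before
the script runs where its intermediate results are no longer visible.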
