Bolster Documentation - Release 0.1.1 Andrew Bolster
Bolster Documentation
Release 0.1.1
Andrew Bolster
Aug 20, 2021
CONTENTS:
1 Bolster
1.1 Features
1.2 Credits
2 Installation
2.1 Stable release
2.2 From sources
3 Usage
4 Contributing
4.1 Types of Contributions
4.2 Get Started!
4.3 Pull Request Guidelines
4.4 Tips
4.5 Deploying
5 Credits
5.1 Development Lead
5.2 Contributors
6 History
6.1 0.1.0 (2021-03-11)
6.2 0.1.1 (2021-05-18)
7 API Reference
7.1 bolster
8 Indices and tables
Python Module Index
Index
CHAPTER
ONE
BOLSTER
Bolster’s Brain, you’ve been warned
• Free software: GNU General Public License v3
• Documentation: https://bolster.readthedocs.io.
1.1 Features
• Efficient tree/node traversal and iteration
• Datetime helpers
• Concurrency Helpers
• Web safe Encapsulation/Decapsulation helpers
• pandas-esque aggregate/transform_r functions
• “Best Practice” AWS service handling
1.2 Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
CHAPTER
TWO
INSTALLATION
2.1 Stable release
To install Bolster, run this command in your terminal:
$ pip install bolster
This is the preferred method to install Bolster, as it will always install the most recent stable release.
If you don’t have pip installed, this Python installation guide can guide you through the process.
2.2 From sources
The sources for Bolster can be downloaded from the GitHub repo.
You can either clone the public repository:
$ git clone https://github.com/andrewbolster/bolster
Or download the tarball:
$ curl -OJL https://github.com/andrewbolster/bolster/tarball/master
Once you have a copy of the source, you can install it with:
$ python setup.py install
CHAPTER
THREE
USAGE
To use Bolster in a project:
import bolster
CHAPTER
FOUR
CONTRIBUTING
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
4.1 Types of Contributions
4.1.1 Report Bugs
Report bugs at https://github.com/andrewbolster/bolster/issues.
If you are reporting a bug, please include:
• Your operating system name and version.
• Any details about your local setup that might be helpful in troubleshooting.
• Detailed steps to reproduce the bug.
4.1.2 Fix Bugs
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants
to implement it.
4.1.3 Implement Features
Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to
whoever wants to implement it.
4.1.4 Write Documentation
Bolster could always use more documentation, whether as part of the official Bolster docs, in docstrings, or even on
the web in blog posts, articles, and such.
4.1.5 Submit Feedback
The best way to send feedback is to file an issue at https://github.com/andrewbolster/bolster/issues.
If you are proposing a feature:
• Explain in detail how it would work.
• Keep the scope as narrow as possible, to make it easier to implement.
• Remember that this is a volunteer-driven project, and that contributions are welcome :)
4.2 Get Started!
Ready to contribute? Here’s how to set up bolster for local development.
1. Fork the bolster repo on GitHub.
2. Clone your fork locally:
$ git clone git@github.com:your_name_here/bolster.git
3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up
your fork for local development:
$ mkvirtualenv bolster
$ cd bolster/
$ python setup.py develop
4. Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other
Python versions with tox:
$ flake8 bolster tests
$ python setup.py test  # or: pytest
$ tox
To get flake8 and tox, just pip install them into your virtualenv.
6. Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
7. Submit a pull request through the GitHub website.
4.3 Pull Request Guidelines
Before you submit a pull request, check that it meets these guidelines:
1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function
with a docstring, and add the feature to the list in README.rst.
3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check https://travis-ci.com/andrewbolster/bolster/pull_requests and make sure that the tests pass for all supported Python versions.
4.4 Tips
To run a subset of tests:
$ pytest tests/test_bolster.py
4.5 Deploying
A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in
HISTORY.rst). Then run:
$ bump2version patch # possible: major / minor / patch
$ git push
$ git push --tags
Travis will then deploy to PyPI if tests pass.
CHAPTER
FIVE
CREDITS
5.1 Development Lead
• Andrew Bolster
5.2 Contributors
None yet. Why not be the first?
CHAPTER
SIX
HISTORY
6.1 0.1.0 (2021-03-11)
• First release on PyPI.
6.2 0.1.1 (2021-05-18)
• Decouple from pg_config requirement (now using psycopg2-binary, which while less than ideal, can run in zee
clouds)
• Dependency updates for security vulns
• Companies House API
CHAPTER
SEVEN
API REFERENCE
This page contains auto-generated API reference documentation [1].
7.1 bolster
Top-level package for Bolster.
7.1.1 Subpackages
bolster.data_sources
Submodules
bolster.data_sources.companies_house
Module Contents
Functions
get_basic_company_data_url() – Parse the companies house website to get the current URL for the ‘BasicCompanyData’
query_basic_company_data(query_func=always) – Grab the url for the basic company data, and walk through the CSV files within, and for each row in each CSV file, parse the row data through the given query_func such that if query_func(row) is True it will be yielded
companies_house_record_might_be_farset(r) – A heuristic function for working out if a record in the companies house registry might be based in Farset Labs
get_companies_house_records_that_might_be_in_farset()
bolster.data_sources.companies_house.get_basic_company_data_url()
Parse the companies house website to get the current URL for the ‘BasicCompanyData’
Currently uses the ‘one file’ method but it could be split into the multi files for memory efficiency
Return type AnyStr
bolster.data_sources.companies_house.query_basic_company_data(query_func=always)
Grab the url for the basic company data, and walk through the CSV files within, and for each row in each CSV file, parse the row data through the given query_func such that if query_func(row) is True it will be yielded
[1] Created with sphinx-autoapi
Parameters query_func (Callable[..., bool]) –
Return type Iterator[Dict]
bolster.data_sources.companies_house.companies_house_record_might_be_farset(r)
A heuristic function for working out if a record in the companies house registry might be based in Farset Labs
Almost certainly incomplete and needs more testing/validation
Parameters r (Dict) –
Return type bool
bolster.data_sources.companies_house.get_companies_house_records_that_might_be_in_farset()
Return type Iterator[Dict]
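The query_func filtering pattern described above can be sketched without touching the network. This is a minimal illustration, not the library's implementation: query_rows, the in-memory CSV and its column names (CompanyName, PostTown) are hypothetical stand-ins for the Companies House download.

```python
import csv
import io


def always(row, **kwargs):
    """Default filter: accept every row."""
    return True


def query_rows(csv_text, query_func=always):
    """Yield each CSV row (as a dict) for which query_func(row) is true."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if query_func(row):
            yield row


# Hypothetical sample data; the real BasicCompanyData columns may differ.
SAMPLE = "CompanyName,PostTown\nAlpha Ltd,BELFAST\nBeta Ltd,LONDON\n"

belfast = list(query_rows(SAMPLE, lambda row: row["PostTown"] == "BELFAST"))
everything = list(query_rows(SAMPLE))
```

Because the function is a generator, rows that fail the filter are never held in memory, which matters for a registry-sized file.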
bolster.stats
Submodules
bolster.stats.distributions
Module Contents
Functions
best_fit_distribution(data, bins=200, ax=None, include_slow=False, discriminator='sse') – Model data by finding best fit distribution to data
bolster.stats.distributions.best_fit_distribution(data, bins=200, ax=None, include_slow=False, discriminator='sse')
Model data by finding best fit distribution to data
bolster.utils
Subpackages
bolster.utils.aws
AWS based Asset handling
Includes S3, Kinesis, SSM, SQS, Lambda self-invocation and Redshift querying helpers
Package Contents
Classes
KinesisLoader – Kinesis batchwise insertion handler with chunking and retry
Functions
chunks(iterable, size=10) – Outputs chunks of size N from an iterable (generator)
start_session(*args, restart=False, **kwargs)
get_s3_client()
put_s3(obj, key, bucket, keys=None, gzip=True, client=None) – Take either a list of dicts (and dump them as csv to s3) or a StringIO buffer (and dump-as-is to s3)
get_s3(key, bucket, gzip=True, log_exception=True, client=None) – Get Object from S3, generally with gzip decompression included.
check_s3(key, bucket, client=None) – https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3
get_matching_s3_objects(bucket, prefix='', suffix='', client=None) – Generate objects in an S3 bucket.
get_matching_s3_keys(bucket, **kwargs) – Generate the keys in an S3 bucket.
select_from_csv(bucket, key, fields, client=None)
get_latest_key(prefix, bucket, key=None, client=None) – Walk a given S3 bucket for the lexicographically highest item in the given bucket (defaults to the analysis store defined in utils.env)
get_sqs_client()
send_to_sqs(records, queue, chunksize=1, client=None) – Send records in chunks of chunksize for a given sqs queue in json-serialised format
get_ssm_client()
get_ssm_param(param_name, client=None) – Locally memoized getter for configuration parameters stored in the AWS “Simple Systems Manager” (now just systems manager) Parameter Store
fh_json_decode(content) – Customised JSON Decoder for consuming Firehose batched records
decapsulate_kinesis_payloads(event) – Decapsulate base64 encoded kinesis data records to a list
iterate_kinesis_payloads(event) – Iterate over a base64 encoded kinesis data record, yielding entries
send_to_kinesis(records, stream, partition_key=None) – Accessory function for the KinesisLoader class
get_sns_client()
invoke_self_async(event, context) – Have the Lambda invoke itself asynchronously, passing the same event it received originally, and tagging the event as ‘async’ so it’s actually processed
query(q, redshift_conn_dict, named_cursor='bolster_query_cursor', **kwargs) – Helper for making queries to redshift (or any postgres compatible backend)
SQSWrapper(event, context, queuename, function, timeout=60000, reinvokelimit=10, maxmessages=1, raise_exceptions=True, deduplicate=False, fkwargs={}, client=None)
Attributes
logger
session
_ssm_params
bolster.utils.aws.chunks(iterable, size=10)
Outputs chunks of size N from an iterable (generator)
Parameters
• iterable (Iterable) –
• size – (Default value = 10)
Return type Generator[List, None, None]
Returns:
>>> next((b for b in chunks(range(10), 2)))
[0, 1]
>>> [b for b in chunks(list(range(10)), 2)]
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
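One way to get the behaviour shown in the doctest is to pull fixed-size slices off a shared iterator. The sketch below is an assumption about how such a helper could be written, not the package's source; chunks_sketch is a hypothetical name.

```python
from itertools import islice


def chunks_sketch(iterable, size=10):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # nothing left to consume
        yield chunk


first = next(chunks_sketch(range(10), 2))
all_chunks = list(chunks_sketch(range(10), 2))
uneven = list(chunks_sketch(range(5), 2))
```

Working from an iterator rather than slicing means the helper also accepts generators, at the cost of the final chunk possibly being shorter than `size`.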
bolster.utils.aws.logger
bolster.utils.aws.session: Optional[boto3.Session]
bolster.utils.aws.start_session(*args, restart=False, **kwargs)
Return type boto3.Session
bolster.utils.aws.get_s3_client()
bolster.utils.aws.put_s3(obj, key, bucket, keys=None, gzip=True, client=None)
Take either a list of dicts (and dump them as csv to s3) or a StringIO buffer (and dump-as-is to s3)
Parameters
• obj (Union[Sequence[Dict], io.StringIO]) – List of records to be written to CSV (or StringIO for direct upload)
• key (str) – Destination Key
• bucket (str) – Destination Bucket (Default value = S3_ANALYSIS_STORE)
• keys – List of expected keys, can be used to filter or set the order of key entry in the resultant file (Default value = None)
• gzip (bool) – Compress the object (Default value = True)
Return type dict
bolster.utils.aws.get_s3(key, bucket, gzip=True, log_exception=True, client=None)
Get Object from S3, generally with gzip decompression included.
Parameters
• key (str) –
• bucket (str) – (Default value = S3_ANALYSIS_STORE)
• gzip (bool) – (Default value = True)
Return type io.StringIO
bolster.utils.aws.check_s3(key, bucket, client=None)
https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3
Parameters
• key (str) –
• bucket (str) – (Default value = S3_ANALYSIS_STORE)
Return type bool
bolster.utils.aws.get_matching_s3_objects(bucket, prefix='', suffix='', client=None)
Generate objects in an S3 bucket.
https://alexwlchan.net/2018/01/listing-s3-keys-redux/
Parameters
• bucket (AnyStr) – Name of the S3 bucket.
• prefix – Only fetch objects whose key starts with this prefix (optional).
• suffix – Only fetch objects whose keys end with this suffix (optional).
Return type Iterator
bolster.utils.aws.get_matching_s3_keys(bucket, **kwargs)
Generate the keys in an S3 bucket. https://alexwlchan.net/2018/01/listing-s3-keys-redux/
Parameters
• bucket (AnyStr) – Name of the S3 bucket.
• prefix – Only fetch keys that start with this prefix (optional).
• suffix – Only fetch keys that end with this suffix (optional).
Return type Iterator
bolster.utils.aws.select_from_csv(bucket, key, fields, client=None)
Return type List
bolster.utils.aws.get_latest_key(prefix, bucket, key=None, client=None)
Walk a given S3 bucket for the lexicographically highest item in the given bucket (defaults to the analysis store
defined in utils.env)
Accepts a key callable that can be used to decide how the candidate keys are sorted.
For example, to use loose-versioning, distutils.version.LooseVersion can be passed as the key argument
Parameters
• prefix (str) –
• bucket (str) – (Default value = S3_ANALYSIS_STORE)
• key (Optional[Callable]) – (Default value = None)
Return type str
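The effect of the key callable is easiest to see on a list of key names, with no S3 involved. In this sketch latest_key_sketch, version_number and the sample key names are all hypothetical; it only illustrates why plain lexicographic ordering can pick the wrong "latest" item.

```python
def latest_key_sketch(keys, key=None):
    """Return the highest candidate, ordered lexicographically unless `key` is given."""
    return max(keys, key=key)


def version_number(name):
    """Hypothetical sort key: the integer after the final 'v' in the name."""
    return int(name.rsplit("v", 1)[1])


candidates = ["model-v9", "model-v10", "model-v2"]

lexicographic = latest_key_sketch(candidates)
by_version = latest_key_sketch(candidates, key=version_number)
```

Lexicographically "model-v9" sorts above "model-v10" because "9" is greater than "1", which is the situation a version-aware key is meant to avoid.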
bolster.utils.aws.get_sqs_client()
bolster.utils.aws.send_to_sqs(records, queue, chunksize=1, client=None)
Send records in chunks of chunksize for a given sqs queue in json-serialised format
Parameters
• records (Iterator) –
• queue (str) –
• chunksize (int) – (Default value = 1)
Return type None
bolster.utils.aws._ssm_params
bolster.utils.aws.get_ssm_client()
bolster.utils.aws.get_ssm_param(param_name, client=None)
Locally memoized getter for configuration parameters stored in the AWS “Simple Systems Manager” (now just
systems manager) Parameter Store
Parameters param_name (str) –
Return type str
bolster.utils.aws.fh_json_decode(content)
Customised JSON Decoder for consuming Firehose batched records;
Firehose doesn’t include entry separators between entries, so we intercept the raw_decoder on JSONDecodeError and ‘skip’ over the ‘where is my comma?’ issue and continue to parse the rest of the content until we reach the end of the given content string.
Parameters content (AnyStr) –
Return type Iterator[Union[Dict, List]]
>>> list(fh_json_decode('{"test":"value"}{"test":"othervalue"}'))
[{'test': 'value'}, {'test': 'othervalue'}]
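Concatenated JSON documents can also be walked with the standard library's json.JSONDecoder.raw_decode, which returns each decoded value along with the index where it ended. The sketch below uses that approach rather than intercepting JSONDecodeError as the description states, so it should be read as an alternative illustration; fh_json_decode_sketch is a hypothetical name.

```python
import json


def fh_json_decode_sketch(content):
    """Yield each JSON document from a string of back-to-back documents."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(content):
        if content[position].isspace():
            position += 1  # raw_decode does not skip leading whitespace itself
            continue
        obj, position = decoder.raw_decode(content, position)
        yield obj


decoded = list(fh_json_decode_sketch('{"test":"value"}{"test":"othervalue"}'))
spaced = list(fh_json_decode_sketch('[1, 2]\n{"a": 1} '))
```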
bolster.utils.aws.decapsulate_kinesis_payloads(event)
Decapsulate base64 encoded kinesis data records to a list
Parameters event (Dict) –
Return type List[Dict]
bolster.utils.aws.iterate_kinesis_payloads(event)
Iterate over a base64 encoded kinesis data record, yielding entries
Parameters event (Dict) –
Return type Generator[Dict, None, None]
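A Lambda function triggered by Kinesis receives records whose payload sits base64-encoded under Records[i]["kinesis"]["data"]. The sketch below decodes that structure; it assumes each payload is a single JSON document, which may not match what the library expects, and iterate_kinesis_payloads_sketch is a hypothetical name.

```python
import base64
import json


def iterate_kinesis_payloads_sketch(event):
    """Yield the decoded JSON payload of every record in a Kinesis Lambda event."""
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        yield json.loads(raw)


def encode(payload):
    """Helper for the demo: encode a payload the way Kinesis delivers it."""
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


# Hand-built event carrying only the fields this sketch reads.
sample_event = {"Records": [{"kinesis": {"data": encode({"id": n})}} for n in range(2)]}

payloads = list(iterate_kinesis_payloads_sketch(sample_event))
```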
class bolster.utils.aws.KinesisLoader(batch_size=500, maximum_records=None, stream=None)
Bases: object
Kinesis batchwise insertion handler with chunking and retry
generate_and_submit(self, items, partition_key=None)
Submit batches of items to the configured stream
Parameters
• items (Iterator) –
• partition_key (str) – (Default value = None)
Return type SupportsInt
submit_batch_until_successful(self, this_batch, response)
If needed, retry a batch of records, backing off exponentially until it goes through
Parameters
• this_batch (List) –
• response (Dict) –
bolster.utils.aws.send_to_kinesis(records, stream, partition_key=None)
Accessory function for the KinesisLoader class
Parameters
• records (Iterator[Sequence]) –
• stream (str) –
• partition_key (str) – (Default value = None)
Return type int
bolster.utils.aws.get_sns_client()
bolster.utils.aws.invoke_self_async(event, context)
Have the Lambda invoke itself asynchronously, passing the same event it received originally, and tagging the
event as ‘async’ so it’s actually processed
THIS DOES NOT WORK FROM WITHIN A VPC! (There is no lambda-invoke endpoint accessible without poking lots of holes in the VPC.)
Parameters
• event (Dict) –
• context (Any) –
bolster.utils.aws.query(q, redshift_conn_dict, named_cursor='bolster_query_cursor', **kwargs)
Helper for making queries to redshift (or any postgres compatible backend)
{
"user":"USERNAME",
"host":"HOSTNAME",
"connect_timeout":3,
"dbname":"DATABASE",
"port":5439,
"password":"SUPERSECRETPASSWORD1111"
}
This function implements the ‘is_local’ check if it is getting its configuration dictionary from the parameter store, and will overwrite the ‘host’ in the store with a resolvable hostname for the ALDS datastore.
Basically, if you’re not working on ALDS, in a few very specific locations, or are outside the ALDS VPC, give
this a sensible dictionary.
kwargs are passed through as vars to the SQL execution, i.e. to be used with substitution queries, eg:
query("select * from table where id = %(my_id)s", my_id = 14228)
NOTE! If you use % wildcards (i.e. LIKE '%string'), you’re gonna have a bad time... (Use the POSIX regex
instead: https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions-posix.html)
Parameters
• q (str) –
• redshift_conn_dict (dict) – (Default value = None)
• **kwargs –
Return type Iterator[Dict]
bolster.utils.aws.SQSWrapper(event, context, queuename, function, timeout=60000, reinvokelimit=10, maxmessages=1, raise_exceptions=True, deduplicate=False, fkwargs={}, client=None)
Submodules
bolster.utils.deco
Module Contents
Functions
timed(func) – This decorator prints the execution time for the decorated function.
bolster.utils.deco.timed(func)
This decorator prints the execution time for the decorated function.
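A timing decorator of this kind usually wraps the call between two clock readings. The following is a minimal sketch under that assumption; timed_sketch and the output format are hypothetical and not taken from the package.

```python
import functools
import time


def timed_sketch(func):
    """Print how long each call to `func` takes, then return its result."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.6f}s")

    return wrapper


@timed_sketch
def add(a, b):
    return a + b


total = add(2, 3)
```

Reporting inside `finally` means the duration is printed even when the wrapped function raises.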
bolster.utils.dt
Module Contents
Functions
round_to_week(dt) – Return a date for the Monday before the given date
round_to_month(dt) – Return a date for the first day of the month of a given date
utc_midnight_on(dt) – Some services don’t like timezones, so this helper function converts datetime.date and datetime.datetime objects to a datetime.datetime object corresponding to UTC Midnight on that date.
bolster.utils.dt.round_to_week(dt)
Return a date for the Monday before the given date
Parameters dt (Union[datetime.datetime, datetime.date]) –
Return type datetime.date
>>> round_to_week(datetime(2018,8,9,12,1))
datetime.date(2018, 8, 6)
>>> round_to_week(date(2018,8,9))
datetime.date(2018, 8, 6)
bolster.utils.dt.round_to_month(dt)
Return a date for the first day of the month of a given date
Parameters dt (Union[datetime.datetime, datetime.date]) –
Return type datetime.date
>>> round_to_month(datetime(2018,8,9,12,1))
datetime.date(2018, 8, 1)
>>> round_to_month(date(2018,8,9))
datetime.date(2018, 8, 1)
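Both rounding helpers can be expressed with plain datetime arithmetic. The sketch below reproduces the doctest results above; the _sketch names are hypothetical, and treating a Monday input as its own week start is an assumption.

```python
from datetime import date, datetime, timedelta


def _as_date(dt):
    """Reduce a datetime to its date; leave plain dates untouched."""
    return dt.date() if isinstance(dt, datetime) else dt


def round_to_week_sketch(dt):
    """Return the Monday of the week containing dt."""
    day = _as_date(dt)
    return day - timedelta(days=day.weekday())  # weekday() is 0 on Mondays


def round_to_month_sketch(dt):
    """Return the first day of dt's month."""
    return _as_date(dt).replace(day=1)


week_start = round_to_week_sketch(datetime(2018, 8, 9, 12, 1))
month_start = round_to_month_sketch(date(2018, 8, 9))
```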
bolster.utils.dt.utc_midnight_on(dt)
Some services don’t like timezones, so this helper function converts datetime.date and datetime.datetime objects
to a datetime.datetime object corresponding to UTC Midnight on that date.
Pays primary attention to the actual Date of the input, regardless of if the combination of given-time and timezone would roll over into another date.
Parameters dt (datetime.datetime) –
Return type datetime.datetime
>>> utc_midnight_on(datetime(2018,9,1,12,12))
datetime.datetime(2018, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)
>>> utc_midnight_on(datetime(2018,9,1,12,12, tzinfo=timezone(timedelta(hours=-13))))
datetime.datetime(2018, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)
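The "date as written wins" behaviour can be sketched by taking the calendar date of the input and attaching UTC midnight to it, without converting between timezones first. utc_midnight_on_sketch is a hypothetical name and this is only one way to obtain the doctest results above.

```python
from datetime import date, datetime, time, timedelta, timezone


def utc_midnight_on_sketch(dt):
    """Return 00:00 UTC on the calendar date carried by dt."""
    # .date() reads the date as written, ignoring any timezone offset.
    day = dt.date() if isinstance(dt, datetime) else dt
    return datetime.combine(day, time.min, tzinfo=timezone.utc)


naive = utc_midnight_on_sketch(datetime(2018, 9, 1, 12, 12))
offset = utc_midnight_on_sketch(
    datetime(2018, 9, 1, 12, 12, tzinfo=timezone(timedelta(hours=-13)))
)
plain_date = utc_midnight_on_sketch(date(2018, 9, 1))
```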
bolster.utils.web
Module Contents
Functions
download_extract_zip(url) – Download a ZIP file and extract its contents in memory
bolster.utils.web.download_extract_zip(url)
Download a ZIP file and extract its contents in memory; yields (filename, file-like object) pairs
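The in-memory extraction half of this can be shown with the standard library alone. In the sketch below the download step is replaced by an archive built in memory, so extract_zip_bytes and the sample file are hypothetical; a real caller would pass the bytes of the HTTP response body instead.

```python
import io
import zipfile


def extract_zip_bytes(data):
    """Yield (filename, file-like object) pairs from ZIP bytes held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                yield name, member


# Stand-in for a downloaded archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello world")

contents = {name: member.read() for name, member in extract_zip_bytes(buffer.getvalue())}
```

Each member must be read before the generator advances, because the file-like object is closed when the next pair is requested.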
7.1.2 Submodules
bolster.cli
Console script for bolster.
Module Contents
Functions
main(args=None) – Console script for bolster.
bolster.cli.main(args=None)
Console script for bolster.
7.1.3 Package Contents
Classes
memoize – cache the return value of a method
Functions
_dumb_passthrough(x, **kwargs) – Pointless passthrough replacement for tqdm (and similar) fallback
always(x, **kwargs) – Pointless passthrough replacement for ‘always true’ filtering
poolmap(f, iterable, max_workers=None, progress=None, **kwargs) – Helper function to encapsulate a ThreadPoolExecutor mapped function workflow
batch(seq, n=1) – Split a sequence into n-length batches (is still iterable, not list)
chunks(iterable, size=10) – Outputs chunks of size N from an iterable (generator)
arg_exception_logger(func) – Helper Decorator to provide info on the arguments that cause the exception of a wrapped function
backoff(exception_to_check, tries=5, delay=0.2, backoff=2, logger=logger) – Retry calling the decorated function using an exponential backoff.
tag_gen(seq, **kwargs) – Generator stream that adds a kwargs to each entry yielded
exceptional_executor(futures, exception_handler=None, timeout=None) – Generator for concurrent.Futures handling
working_directory(path) – Contextmanager that changes working directory and returns to previous on exit.
compress_for_relay(obj) – Compress json-serializable object to a gzipped base64 string
decompress_from_relay(msg) – Uncompress gzipped base64 string to a json-serializable object
pretty_print_request(req, expose_auth=False, authentication_header_blacklist=None) – At this point it is completely built and ready to be fired; it is “prepared”.
get_recursively(search_dict, field) – Takes a dict with nested lists and dicts, and searches all dicts for a key of the field provided.
transform_(r, rule_keys) – Generic Item-wise transformation function
diff(new, old, excluded_fields=None) – Perform a one-depth diff of a pair of dictionaries
aggregate(base, group_key, item_key, condition=None) – Abstracted groupby-sum for lists of dicts
breadth(d) – Get the total ‘width’ of a tree
depth(d) – Get the maximum depth of a tree
set_keys(d) – Extract the set of all keys of a nested dict/tree
keys_at(d, n, i=0) – Extract the keys of a tree at a given depth
items_at(d, n, i=0) – Extract the elements from a tree at a given depth
leaves(d) – Iterate on the leaves of a tree
leaf_paths(d, path=None)
flatten_dict(d, head='', sep=':')
uncollect_object(d)
dict_concat_safe(d, keys, default=None) – Really Lazy Func because dict.get(‘key’,default) is a pain in the ass for lists
Attributes
__author__
__email__
__version__
logger
bolster.__author__ = Andrew Bolster
bolster.__email__ = me@andrewbolster.info
bolster.__version__ = 0.1.1
bolster.logger
bolster._dumb_passthrough(x, **kwargs)
Pointless passthrough replacement for tqdm (and similar) fallback
Parameters x –
bolster.always(x, **kwargs)
Pointless passthrough replacement for ‘always true’ filtering
>>> always('false')
True
>>> always(False)
True
>>> always(True)
True
Return type bool
bolster.poolmap(f, iterable, max_workers=None, progress=None, **kwargs)
Helper function to encapsulate a ThreadPoolExecutor mapped function workflow. Accepts an (assumed to be tqdm style) progress monitor callback.
kwargs are passed identically to all f(i) calls for each i in iterable
Parameters
• f (Callable) – function to map across
• iterable (Iterable) –
• max_workers (Optional[int]) – (Default value = None)
• progress (Callable) – (Default value = None)
• **kwargs – passed as arguments to f
Return type Dict
Returns:
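The workflow described for poolmap can be approximated with concurrent.futures directly. The sketch below is not the package's code: poolmap_sketch is a hypothetical name, and keying the returned dict by input item is an assumption, since the reference only states that a Dict is returned.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def _passthrough(futures, **kwargs):
    """Fallback used when no progress callback is supplied."""
    return futures


def poolmap_sketch(f, iterable, max_workers=None, progress=None, **kwargs):
    """Run f(item, **kwargs) for every item in a thread pool; map item -> result."""
    progress = progress or _passthrough
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which input produced which future (inputs must be hashable).
        futures = {executor.submit(f, item, **kwargs): item for item in iterable}
        for future in progress(as_completed(futures), total=len(futures)):
            results[futures[future]] = future.result()
    return results


def add_offset(x, offset=0):
    return x + offset


mapped = poolmap_sketch(add_offset, range(3), max_workers=2, offset=10)
```

A tqdm-style callable fits the `progress` slot because it accepts an iterable plus a `total` keyword and returns an iterable.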
bolster.batch(seq, n=1)
Split a sequence into n-length batches (is still iterable, not list)
Parameters
• seq (Sequence) –
• n (int) – (Default value = 1)
Return type Generator[Iterable, None, None]
Returns:
>>> next((b for b in batch(range(10), 2)))
range(0, 2)
>>> [b for b in batch(list(range(10)), 2)]
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
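The doctest output (a range for range input, lists for list input) is what plain slicing produces, so batching can be sketched as stepping through the sequence by index. batch_sketch is a hypothetical name and slicing is an assumed implementation.

```python
def batch_sketch(seq, n=1):
    """Yield consecutive slices of seq, each of length n (the last may be shorter)."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]  # slicing preserves the input's own type


first = next(batch_sketch(range(10), 2))
as_lists = list(batch_sketch(list(range(10)), 2))
```

Unlike chunks, this needs a true sequence (something with len() and slicing), which is why the two helpers coexist.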
bolster.chunks(iterable, size=10)
Outputs chunks of size N from an iterable (generator)
Parameters
• iterable (Iterable) –
• size – (Default value = 10)
Return type Generator[List, None, None]
Returns:
>>> next((b for b in chunks(range(10), 2)))
[0, 1]
>>> [b for b in chunks(list(range(10)), 2)]
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
bolster.arg_exception_logger(func)
Helper Decorator to provide info on the arguments that cause the exception of a wrapped function
Parameters func (Callable) –
Return type Callable
Returns:
bolster.backoff(exception_to_check, tries=5, delay=0.2, backoff=2, logger=logger)
Retry calling the decorated function using an exponential backoff.
http://www.saltycrane.com/blog/2009/11/trying-out-retry-decorator-python/ original from: http://wiki.python.org/moin/PythonDecoratorLibrary#Retry
Parameters
• exception_to_check (Union[BaseException, Sequence[BaseException]]) – the exception to check. may be a tuple of exceptions to check
• tries (SupportsInt) – number of times to try (not retry) before giving up (Default value = 5)
• delay (SupportsFloat) – initial delay between retries in seconds (Default value = 0.2)
• backoff (SupportsFloat) – backoff multiplier e.g. value of 2 will double the delay each retry (Default value = 2)
• logger (Optional[logging.Logger]) – logger to use. If None, print (Default value = local utils logger)
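The retry-with-exponential-backoff pattern described by these parameters can be sketched as a decorator factory. This follows the general shape of the retry recipe linked above rather than the package's exact code; backoff_sketch is a hypothetical name and logging is left out for brevity.

```python
import functools
import time


def backoff_sketch(exception_to_check, tries=5, delay=0.2, backoff=2):
    """Retry the decorated function, multiplying the wait by `backoff` each time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            remaining, wait = tries, delay
            while remaining > 1:
                try:
                    return func(*args, **kwargs)
                except exception_to_check:
                    time.sleep(wait)
                    remaining -= 1
                    wait *= backoff
            # Final attempt: any exception now propagates to the caller.
            return func(*args, **kwargs)

        return wrapper

    return decorator


attempts = []


@backoff_sketch(ValueError, tries=3, delay=0.01)
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("not yet")
    return "ok"


result = flaky()
```

Note that `tries` counts total attempts, so tries=3 means two retries after the first failure.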
exception bolster.MultipleErrors(errors=None)
Bases: BaseException
Exception Class to enable the capturing of multiple exceptions without interrupting control flow, i.e. catch the
exception, but carry on and report the exceptions at the end.
E.g.
exceptions = MultipleErrors()
try:
    do_risky_thing_with(this)  # raises ValueError
except Exception:
    exceptions.capture_current_exception()
try:
    do_other_thing_with(this)  # raises AttributeError
except Exception:
    exceptions.capture_current_exception()
exceptions.do_raise()
Traceback (most recent call last):
...
ValueError
Traceback (most recent call last):
...
AttributeError
classmethod _traceback_for(cls, exc_info)
Formatting!
__str__(self)
Return str(self).
capture_current_exception(self)
Gathers exception info from the current context and retains it
do_raise(self)
Raises itself if it contains any errors
bolster.tag_gen(seq, **kwargs)
Generator stream that adds the given kwargs to each entry yielded
The below example shows the creation of an empty dict generator where tag_gen is used to insert a new key/value
(k=1) in each item on the fly
>>> all([i['k'] == 1 for i in tag_gen(({} for _ in range(4)), k=1)])
True
Parameters
• seq (Iterator[Dict]) –
• **kwargs –
Return type Iterator[Dict]
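Tagging a stream lazily amounts to updating each dict as it passes through a generator. A minimal sketch, assuming the entries are mutated in place (the reference does not say whether copies are made); tag_gen_sketch is a hypothetical name.

```python
def tag_gen_sketch(seq, **kwargs):
    """Yield each dict from seq after adding the given key/value pairs to it."""
    for item in seq:
        item.update(kwargs)  # mutates the entry in place
        yield item


tagged = list(tag_gen_sketch(({"n": n} for n in range(3)), source="demo", k=1))
```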
bolster.exceptional_executor(futures, exception_handler=None, timeout=None)
Generator for concurrent.Futures handling
When an exception is raised in an executing Future, f.result() called on its own will raise that exception in the parent thread, killing execution and causing loss of ‘future local’ scope.
Instead, query the future for its exception state first, and handle that separately, by default by logging it as an exception.
Parameters
• futures (Sequence[concurrent.futures.Future]) –
• exception_handler –
• timeout –
Return type Iterator
Returns:
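The "check the exception state before asking for the result" approach can be sketched with Future.exception(). The names below are hypothetical, and yielding only successful results while routing failures to a handler is an assumption about the intended behaviour.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def exceptional_executor_sketch(futures, exception_handler=None, timeout=None):
    """Yield results of completed futures; hand failures to a handler instead of raising."""
    for future in as_completed(futures, timeout=timeout):
        error = future.exception()  # None if the future finished cleanly
        if error is None:
            yield future.result()
        elif exception_handler is not None:
            exception_handler(error)
        else:
            logger.error("Future raised %r", error)


def reciprocal(x):
    return 1 / x


errors = []
with ThreadPoolExecutor(max_workers=2) as pool:
    jobs = [pool.submit(reciprocal, x) for x in (1, 0, 4)]
    results = sorted(exceptional_executor_sketch(jobs, exception_handler=errors.append))
```

One failing job no longer aborts the loop, so the remaining results are still collected.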
bolster.working_directory(path)
Contextmanager that changes working directory and returns to previous on exit.
Parameters path (Union[str, pathlib.Path]) –
Return type contextlib.AbstractContextManager
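A context manager with this behaviour is commonly built from contextlib.contextmanager and os.chdir. The sketch below is one such construction, with working_directory_sketch as a hypothetical name.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory_sketch(path):
    """Change into `path` for the duration of the with-block, then change back."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # restored even if the block raises


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory_sketch(tmp):
        moved = os.path.samefile(os.getcwd(), tmp)
end = os.getcwd()
```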
bolster.compress_for_relay(obj)
Compress json-serializable object to a gzipped base64 string
Parameters obj (Union[List, Dict]) –
Return type AnyStr
>>> decompress_from_relay(compress_for_relay(['test']))
['test']
>>> decompress_from_relay(compress_for_relay({'test':'test'}))
{'test': 'test'}
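The round trip shown above corresponds to three standard-library steps: JSON-encode, gzip, base64. The sketch below assumes UTF-8 text and default gzip settings, neither of which is stated in the reference; the _sketch names are hypothetical.

```python
import base64
import gzip
import json


def compress_for_relay_sketch(obj):
    """JSON-encode obj, gzip it, and return the result as a base64 string."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_from_relay_sketch(msg):
    """Reverse compress_for_relay_sketch."""
    raw = gzip.decompress(base64.b64decode(msg))
    return json.loads(raw.decode("utf-8"))


message = compress_for_relay_sketch({"test": "test"})
restored = decompress_from_relay_sketch(message)
```

The base64 layer keeps the compressed bytes safe to embed in JSON bodies, query strings or message queues that expect text.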
bolster.decompress_from_relay(msg)
Uncompress gzipped base64 string to a json-serializable object
Parameters msg (AnyStr) –
Return type Union[List, Dict]
class bolster.memoize(func)
Bases: object
cache the return value of a method
This class is meant to be used as a decorator of methods. The return value from a given method invocation
will be cached on the instance whose method was invoked. All arguments passed to a method decorated with
memoize must be hashable.
If a memoized method is invoked directly on its class the result will not be cached. Instead the method will be
invoked like a static method:
class Obj(object):
    @memoize
    def add_to(self, arg):
        return self + arg
Obj.add_to(1) # not enough arguments
Obj.add_to(1, 2) # returns 3, result is not cached
Source: http://code.activestate.com/recipes/577452-a-memoize-decorator-for-instance-methods/
Augmented with cache hit/miss population Counters
__get__(self, obj, objtype=None)
__call__(self, *args, **kw)
bolster.pretty_print_request(req, expose_auth=False, authentication_header_blacklist=None)
At this point it is completely built and ready to be fired; it is “prepared”.
However, pay attention to the formatting used in this function, because it is programmed to be pretty printed and may differ from the actual request.
Parameters
• req –
• expose_auth – (Default value = False)
• authentication_header_blacklist (Optional[List]) –
Return type None
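As a rough illustration of the idea, the sketch below renders any object exposing method, url, headers and body attributes. Several things are assumptions made for the example: the function returns a string instead of printing, headers named in the blacklist are replaced with a placeholder unless expose_auth is set, and the blacklist defaults to the Authorization header. None of this is confirmed as bolster's behaviour.

```python
from types import SimpleNamespace
from typing import List, Optional


def format_request_sketch(
    req,
    expose_auth: bool = False,
    authentication_header_blacklist: Optional[List[str]] = None,
) -> str:
    """Hypothetical formatter: request line, headers, blank line, body."""
    # Assumption: redact "Authorization" when no blacklist is supplied.
    names = authentication_header_blacklist or ["Authorization"]
    blacklist = {name.lower() for name in names}
    lines = [f"{req.method} {req.url}"]
    for name, value in req.headers.items():
        if not expose_auth and name.lower() in blacklist:
            value = "<redacted>"
        lines.append(f"{name}: {value}")
    lines.append("")
    lines.append("" if req.body is None else str(req.body))
    return "\n".join(lines)


# A stand-in object; a real prepared request would carry the same attributes.
fake_request = SimpleNamespace(
    method="POST",
    url="https://example.invalid/items",
    headers={"Authorization": "Bearer secret", "Content-Type": "application/json"},
    body='{"a": 1}',
)
rendered = format_request_sketch(fake_request)
rendered_exposed = format_request_sketch(fake_request, expose_auth=True)
```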
bolster.get_recursively(search_dict, field)
Takes a dict with nested lists and dicts, and searches all dicts for a key of the field provided.
Originally taken from https://stackoverflow.com/a/20254842
Parameters
• search_dict (Dict) – the (possibly nested) dict to search
• field (str) – the key to look for
Return type List
>>> get_recursively({'id': 5, 'children': {'id': 6, 'children': {'id': 7, 'children': {}}}}, 'id')
[5, 6, 7]
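A recursive search of this kind can be sketched as below. The function name is hypothetical, and whether the real function also descends into a value whose key already matched is an assumption here.

```python
from typing import Any, Dict, List


def find_key_recursively(search_dict: Dict, field: str) -> List[Any]:
    """Collect every value stored under `field`, at any nesting level."""
    found = []
    for key, value in search_dict.items():
        if key == field:
            found.append(value)
        if isinstance(value, dict):
            # Nested dict: search it the same way.
            found.extend(find_key_recursively(value, field))
        elif isinstance(value, list):
            # Nested list: only dict elements can contain further keys.
            for item in value:
                if isinstance(item, dict):
                    found.extend(find_key_recursively(item, field))
    return found


nested = {"id": 5, "children": {"id": 6, "children": {"id": 7, "children": {}}}}
ids = find_key_recursively(nested, "id")
with_lists = find_key_recursively({"items": [{"id": 1}, {"x": {"id": 2}}]}, "id")
```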
bolster.transform_(r, rule_keys)
Generic item-wise transformation function. The values in r are updated based on key-matching in rule_keys, i.e.
out[k] = rule_keys[k](r[k])
However, this can do more than a straight callable mapping; it can also update the key. For a given rule such
that R = rule_keys[k]:
R can be used to select a field for the output
>>> r = {'a': '1', 'b': '2', 'c': '3'}
>>> transform_(r, {'a': None})
{'a': '1'}
Rename a key
>>> transform_(r, {'a': ('A', None)})
{'A': '1'}
Apply a function to a key's value
>>> transform_(r, {'a': ('a', int)})
{'a': 1}
Or a combination of these
>>> transform_(r, {'a': ('A', int), 'b': None})
{'A': 1, 'b': '2'}
Parameters
• r (Dict) –
• rule_keys (Dict[AnyStr, Optional[Tuple]]) –
Return type Dict
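The rule forms described above (None, a bare callable, or a (new_key, function) tuple) can be sketched like this. The name transform_sketch is hypothetical, and silently skipping keys that are absent from r is an assumption.

```python
from typing import Any, Dict


def transform_sketch(r: Dict[str, Any], rule_keys: Dict[str, Any]) -> Dict[str, Any]:
    """Build a new dict from `r`, keeping only keys that have a rule."""
    out = {}
    for key, rule in rule_keys.items():
        if key not in r:
            continue  # assumption: keys missing from r are skipped
        if rule is None:
            # Select the field unchanged.
            out[key] = r[key]
        elif callable(rule):
            # Plain callable mapping: out[k] = rule(r[k]).
            out[key] = rule(r[key])
        else:
            # Tuple rule: optionally rename, optionally convert.
            new_key, func = rule
            out[new_key] = r[key] if func is None else func(r[key])
    return out


r = {"a": "1", "b": "2", "c": "3"}
selected = transform_sketch(r, {"a": None})
renamed = transform_sketch(r, {"a": ("A", None)})
converted = transform_sketch(r, {"a": ("a", int)})
combined = transform_sketch(r, {"a": ("A", int), "b": None})
```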
bolster.diff(new, old, excluded_fields=None)
Perform a one-depth diff of a pair of dictionaries
Parameters
• new (Dict) –
• old (Dict) –
• excluded_fields (Optional[set]) –
Return type Dict
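The return shape is not documented here, so the sketch below makes a loud assumption: it returns the entries of new whose values are missing from, or different to, old, ignoring any excluded fields. The name shallow_diff is hypothetical.

```python
from typing import Any, Dict, Optional, Set


def shallow_diff(new: Dict, old: Dict, excluded_fields: Optional[Set] = None) -> Dict[Any, Any]:
    """Assumed semantics: keys of `new` that are absent or changed in `old`."""
    excluded = set(excluded_fields or ())
    missing = object()  # sentinel so that a stored None is not mistaken for "absent"
    return {
        key: value
        for key, value in new.items()
        if key not in excluded and old.get(key, missing) != value
    }


changed = shallow_diff({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 5})
changed_excluding_c = shallow_diff({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 5}, {"c"})
```

"One-depth" here means nested values are compared as whole objects with ==, not diffed key by key.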
bolster.aggregate(base, group_key, item_key, condition=None)
Abstracted groupby-sum for lists of dicts, operationally equivalent to
    df = pd.DataFrame(base)
    df.where(condition).groupby(group_key)[item_key].sum()
Parameters
• base (List[Dict]) –
• group_key (Union[AnyStr, Tuple[AnyStr], List[AnyStr]]) –
• item_key (AnyStr) –
• condition (Optional[Callable]) –
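A dependency-free version of the same groupby-sum can be sketched as below. It assumes that condition is a per-row predicate and that the result is a plain dict keyed by group; the name aggregate_sketch is hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional


def aggregate_sketch(
    base: List[Dict],
    group_key,
    item_key: str,
    condition: Optional[Callable[[Dict], bool]] = None,
) -> Dict[Any, Any]:
    """Sum `item_key` per group, optionally filtering rows first."""
    totals = defaultdict(int)
    for row in base:
        if condition is not None and not condition(row):
            continue  # assumption: rows failing the predicate are dropped
        if isinstance(group_key, (tuple, list)):
            # Several grouping columns: use a tuple of their values as the key.
            key = tuple(row[k] for k in group_key)
        else:
            key = row[group_key]
        totals[key] += row[item_key]
    return dict(totals)


rows = [
    {"team": "a", "year": 2020, "points": 3},
    {"team": "a", "year": 2021, "points": 4},
    {"team": "b", "year": 2021, "points": 5},
]
by_team = aggregate_sketch(rows, "team", "points")
by_team_year = aggregate_sketch(rows, ("team", "year"), "points")
recent = aggregate_sketch(rows, "team", "points", condition=lambda row: row["year"] >= 2021)
```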
bolster.breadth(d)
Get the total ‘width’ of a tree
> Why was this a thing? No idea
bolster.depth(d)
Get the maximum depth of a tree
Parameters d (Dict) –
Return type SupportsInt
bolster.set_keys(d)
Extract the set of all keys of a nested dict/tree
Parameters d (Dict) –
Return type Set
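Both the depth and the key set of a nested dict fall out of a short recursion. The sketch below assumes that an empty dict or a non-dict value has depth 0 and that a flat dict has depth 1; the real functions may count differently. The function names are hypothetical.

```python
from typing import Any, Dict, Set


def depth_sketch(d: Any) -> int:
    """Assumed convention: non-dicts and empty dicts have depth 0."""
    if not isinstance(d, dict) or not d:
        return 0
    return 1 + max(depth_sketch(value) for value in d.values())


def all_keys_sketch(d: Dict) -> Set:
    """Union of the keys found at every level of the nested dict."""
    keys = set()
    for key, value in d.items():
        keys.add(key)
        if isinstance(value, dict):
            keys |= all_keys_sketch(value)
    return keys


tree = {"a": {"b": {"c": 1}}, "d": 2}
tree_depth = depth_sketch(tree)
tree_keys = all_keys_sketch(tree)
```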
bolster.keys_at(d, n, i=0)
Extract the keys of a tree at a given depth
Parameters
• d (Dict) –
• n (SupportsInt) –
• i (SupportsInt) –
Return type Iterator
bolster.items_at(d, n, i=0)
Extract the elements from a tree at a given depth
Parameters
• d (Dict) –
• n (SupportsInt) –
• i (SupportsInt) –
Return type Iterator[Tuple]
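The roles of n (target depth) and i (current depth) can be illustrated with a pair of generators. The sketch assumes the top level is depth 0 and that recursion stops once the target depth is reached; the names are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def items_at_sketch(d: Dict, n: int, i: int = 0) -> Iterator[Tuple[Any, Any]]:
    """Yield (key, value) pairs found exactly `n` levels below the root."""
    for key, value in d.items():
        if i == n:
            yield key, value
        elif isinstance(value, dict):
            # Not deep enough yet: descend, tracking the current level in i.
            yield from items_at_sketch(value, n, i + 1)


def keys_at_sketch(d: Dict, n: int, i: int = 0) -> Iterator[Any]:
    """Same traversal, keys only."""
    for key, _ in items_at_sketch(d, n, i):
        yield key


tree = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}
level0 = list(keys_at_sketch(tree, 0))
level1 = list(keys_at_sketch(tree, 1))
level2_items = list(items_at_sketch(tree, 2))
```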
bolster.leaves(d)
Iterate on the leaves of a tree
Parameters d (Dict) –
Return type Iterator
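Leaf iteration is the same traversal with the roles reversed: descend into dicts, yield everything else. A minimal sketch, assuming any non-dict value counts as a leaf (the name iter_leaves is hypothetical):

```python
from typing import Any, Dict, Iterator


def iter_leaves(d: Dict) -> Iterator[Any]:
    """Yield every non-dict value, depth first, in insertion order."""
    for value in d.values():
        if isinstance(value, dict):
            yield from iter_leaves(value)
        else:
            yield value


tree = {"a": {"b": 1, "c": {"d": 2}}, "e": 3}
leaf_values = list(iter_leaves(tree))
```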
bolster.leaf_paths(d, path=None)
Parameters
• d (Dict) –
• path (Optional[List]) –
Return type Iterator[Tuple[List, Dict]]
bolster.flatten_dict(d, head='', sep=':')
Parameters
• d (Dict) –
• head (AnyStr) –
• sep (AnyStr) –
Return type Dict
bolster.uncollect_object(d)
Parameters d (Dict) –
Return type Dict
bolster.dict_concat_safe(d, keys, default=None)
Really lazy func, because dict.get('key', default) is a pain in the ass for lists
Parameters
• d (Dict) –
• keys (List[Hashable]) –
• default (Optional) –
Return type Iterator
CHAPTER
EIGHT
INDICES AND TABLES
• genindex
• modindex
• search
PYTHON MODULE INDEX
b
bolster
bolster.cli
bolster.data_sources
bolster.data_sources.companies_house
bolster.stats
bolster.stats.distributions
bolster.utils
bolster.utils.aws
bolster.utils.deco
bolster.utils.dt
bolster.utils.web
INDEX
Symbols
__author__ (in module bolster)
__call__() (bolster.memoize method)
__email__ (in module bolster)
__get__() (bolster.memoize method)
__str__() (bolster.MultipleErrors method)
__version__ (in module bolster)
_dumb_passthrough() (in module bolster)
_ssm_params (in module bolster.utils.aws)
_traceback_for() (bolster.MultipleErrors class method)
A
aggregate() (in module bolster)
always() (in module bolster)
arg_exception_logger() (in module bolster)
B
backoff() (in module bolster)
batch() (in module bolster)
best_fit_distribution() (in module bolster.stats.distributions)
bolster (module)
bolster.cli (module)
bolster.data_sources (module)
bolster.data_sources.companies_house (module)
bolster.stats (module)
bolster.stats.distributions (module)
bolster.utils (module)
bolster.utils.aws (module)
bolster.utils.deco (module)
bolster.utils.dt (module)
bolster.utils.web (module)
breadth() (in module bolster)
C
capture_current_exception() (bolster.MultipleErrors method)
check_s3() (in module bolster.utils.aws)
chunks() (in module bolster)
chunks() (in module bolster.utils.aws)
companies_house_record_might_be_farset() (in module bolster.data_sources.companies_house)
compress_for_relay() (in module bolster)
D
decapsulate_kinesis_payloads() (in module bolster.utils.aws)
decompress_from_relay() (in module bolster)
depth() (in module bolster)
dict_concat_safe() (in module bolster)
diff() (in module bolster)
do_raise() (bolster.MultipleErrors method)
download_extract_zip() (in module bolster.utils.web)
E
exceptional_executor() (in module bolster)
F
fh_json_decode() (in module bolster.utils.aws)
flatten_dict() (in module bolster)
G
generate_and_submit() (bolster.utils.aws.KinesisLoader method)
get_basic_company_data_url() (in module bolster.data_sources.companies_house)
get_companies_house_records_that_might_be_in_farset (in module bolster.data_sources.companies_house)
get_latest_key() (in module bolster.utils.aws)
get_matching_s3_keys() (in module bolster.utils.aws)
get_matching_s3_objects() (in module bolster.utils.aws)
get_recursively() (in module bolster)
get_s3() (in module bolster.utils.aws)
get_s3_client() (in module bolster.utils.aws)
get_sns_client() (in module bolster.utils.aws)
get_sqs_client() (in module bolster.utils.aws)
get_ssm_client() (in module bolster.utils.aws)
get_ssm_param() (in module bolster.utils.aws)
I
invoke_self_async() (in module bolster.utils.aws)
items_at() (in module bolster)
iterate_kinesis_payloads() (in module bolster.utils.aws)
K
keys_at() (in module bolster)
KinesisLoader (class in bolster.utils.aws)
L
leaf_paths() (in module bolster)
leaves() (in module bolster)
logger (in module bolster)
logger (in module bolster.utils.aws)
M
main() (in module bolster.cli)
memoize (class in bolster)
module
    bolster
    bolster.cli
    bolster.data_sources
    bolster.data_sources.companies_house
    bolster.stats
    bolster.stats.distributions
    bolster.utils
    bolster.utils.aws
    bolster.utils.deco
    bolster.utils.dt
    bolster.utils.web
MultipleErrors
P
poolmap() (in module bolster)
pretty_print_request() (in module bolster)
put_s3() (in module bolster.utils.aws)
Q
query() (in module bolster.utils.aws)
query_basic_company_data() (in module bolster.data_sources.companies_house)
R
round_to_month() (in module bolster.utils.dt)
round_to_week() (in module bolster.utils.dt)
S
select_from_csv() (in module bolster.utils.aws)
send_to_kinesis() (in module bolster.utils.aws)
send_to_sqs() (in module bolster.utils.aws)
session (in module bolster.utils.aws)
set_keys() (in module bolster)
SQSWrapper() (in module bolster.utils.aws)
start_session() (in module bolster.utils.aws)
submit_batch_until_successful() (bolster.utils.aws.KinesisLoader method)
T
tag_gen() (in module bolster)
timed() (in module bolster.utils.deco)
transform_() (in module bolster)
U
uncollect_object() (in module bolster)
utc_midnight_on() (in module bolster.utils.dt)
W
working_directory() (in module bolster)