feat: adds libraries for data processing #28
0x6861746366574 wants to merge 19 commits into symbol:main
Conversation
Force-pushed from 6c77252 to 7cde8ea
# Conflicts:
#   block/block/extractor/extract.py
#   block/block/extractor/process.py
gimre-xymcity left a comment
general comments:
- we usually do not shortcut things if there's no need, so: `account` rather than `acc`, `transaction` rather than `tx` (although I do this one all the time as well), `rcpt` -> `recipient`, `b_series`/`f_series` -> `balance_series`/`fee_series`
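As a quick illustration of the convention (function and field names here are hypothetical, not from the PR):

```python
# Shortcut names (discouraged): acc, tx, rcpt
def total_fees_short(txs):
    return sum(tx['fee'] for tx in txs)


# Descriptive names (preferred): account, transaction, recipient
def total_fees(transactions):
    return sum(transaction['fee'] for transaction in transactions)
```

Both behave identically; the longer names just make call sites self-documenting.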
> _finds current delegates associated with one or more nodes using serialized state data_

> This script requires a JSON containing accounts similar to what is received from the /node/info API endpoint; see example in `resources/accounts.json`.
this file is not present (?)
```python
block_format_pattern = re.compile('[0-9]{5}' + args.block_extension)
block_paths = glob.glob(os.path.join(args.input, '**', '*' + args.block_extension), recursive=True)
block_paths = tqdm(sorted(list(filter(lambda x: block_format_pattern.match(os.path.basename(x)), block_paths))))
```
you're getting extra points for tqdm ;), heck we should be throwing it everywhere
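For context, wrapping any iterable in `tqdm` is all it takes to get a progress bar; a minimal sketch with hypothetical file names:

```python
from tqdm import tqdm

# Wrapping an iterable in tqdm adds a progress bar; iteration is otherwise unchanged.
block_paths = ['00002.blk', '00001.blk', '00003.blk']  # hypothetical paths
processed = []
for path in tqdm(sorted(block_paths)):
    processed.append(path)  # real code would parse each block file here
```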
block/block/extractor/extract.py (Outdated)
```python
parser.add_argument('--block_save_path', type=str, default='block_data.msgpack', help='file to write the extracted block data to')
parser.add_argument('--statement_save_path', type=str, default='stmt_data.msgpack', help='file to write extracted statement data to')
parser.add_argument('--state_save_path', type=str, default='state_map.msgpack', help='file to write the extracted chain state data to')
```
not sure, but would probably drop those options and use hardcoded filenames - given that you can set output directory
(or could hide like this https://stackoverflow.com/questions/37303960/show-hidden-option-using-argparse)
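For reference, the trick from that link passes `argparse.SUPPRESS` as the help text; a small sketch (option name borrowed from the diff above):

```python
import argparse

parser = argparse.ArgumentParser()
# help=argparse.SUPPRESS keeps the option fully functional but omits it
# from the --help listing.
parser.add_argument('--block_save_path', type=str, default='block_data.msgpack',
                    help=argparse.SUPPRESS)

args = parser.parse_args([])  # the default still applies
hidden = '--block_save_path' not in parser.format_help()
```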
```python
for chunk in tx_chunks:
    filtered.append(filter_transactions(chunk, address, tx_types, start_datetime, end_datetime))
return pd.concat(filtered, axis=0)
```
none of `process_tx_file`, `filter_transactions`, `guarded_convert` are used here, so would move to some other file?
> ## block
>
> Running block extraction scripts requires the installation of the local **block** package. This can be accomplished as follows:
btw, it should be possible (and it is already - assuming you pip install all requirement files) to run the tools like:

```shell
PYTHONPATH=. python3 block/delegates/find_delegates.py
```
```python
    escapechar='\\',
    quoting=csv.QUOTE_MINIMAL)
```

```python
unpacker = msgpack.Unpacker(open(args.input, 'rb'), unicode_errors=None, raw=True)
```
is unicode_errors actually needed? msgpack docs have this scary warning:

> This option should be used only when you have msgpack data which contains invalid UTF-8 string.
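To illustrate the point (a sketch, assuming the standard `msgpack` package): with `raw=True` strings come back as bytes and are never UTF-8 decoded, so `unicode_errors` only matters when decoding actually happens:

```python
import msgpack

payload = msgpack.packb({'hash': 'abc123'})

decoded = msgpack.unpackb(payload)        # strings decoded as UTF-8 (default)
raw = msgpack.unpackb(payload, raw=True)  # keys and values stay as bytes
```

With `raw=True`, `unicode_errors` is effectively a no-op since no decoding is performed.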
```python
# pylint: disable=too-many-nested-blocks, too-many-branches

with open(args.input, 'rb') as file:
    blocks = msgpack.unpack(file, unicode_errors=None, raw=True)
```
I've tried running extract, but it fails for me during unpack - not sure what I did wrong:

```
Traceback (most recent call last):
  File "block/nft/nember_extract.py", line 90, in <module>
    main(parsed_args)
  File "block/nft/nember_extract.py", line 22, in main
    blocks = msgpack.unpack(file, unicode_errors=None, raw=True)
  File "/usr/local/lib/python3.8/dist-packages/msgpack/__init__.py", line 58, in unpack
    return unpackb(data, **kwargs)
  File "msgpack/_unpacker.pyx", line 208, in msgpack._unpacker.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
```
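For what it's worth, `ExtraData` is what `unpack` raises when the file holds more than one top-level msgpack object; a sketch of the difference, using an in-memory stream:

```python
import io

import msgpack

# Write three consecutive msgpack objects into one stream.
buf = io.BytesIO()
for i in range(3):
    buf.write(msgpack.packb({'block': i}))
buf.seek(0)

# msgpack.unpack() expects exactly one object and raises ExtraData on such
# a stream; an Unpacker iterates over it object by object instead.
blocks = list(msgpack.Unpacker(buf))
```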
block/block/nft/nember_extract.py (Outdated)
```python
gen_tx = gen_tx[0]
meta_tx = meta_tx[0]
supply_tx = supply_tx[0]
```
we usually don't reuse variables like this.
things like `blocks = sorted(blocks)` are fine - that does not change the type.
this one changes from array to single entity (all 3: `gen_tx`, `meta_tx`, `supply_tx`)
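A minimal sketch of the point (hypothetical data): reassigning `x = x[0]` silently turns a list into a dict, while binding a fresh name keeps each variable's type stable:

```python
gen_txs = [{'type': 'generation'}]  # hypothetical query result (a list)

# Discouraged: gen_txs = gen_txs[0] would change gen_txs from list to dict.
# Preferred: give the single entity its own name.
gen_tx = gen_txs[0]
```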
…s.json using old NGL nodes
This pull request carves off individual portions of the extractor tool into their own Python modules for better maintainability.
This includes:
- `extractor/extract`: for pulling raw block and statement data from .blk files.
- `extractor/process`: for streaming data output from `/extract` into block headers and chain states.
- `delegates/find_delegates`: for quickly searching for delegates associated with one or more nodes during a specific period of time.
- `harvester/get_harvester_stats`: for collecting harvesting data of an individual node.
- `nft/nember_extract`: for extracting NFT descriptions and transactions related to NEMberArt NFTs.
- `nft/nember_scrape`: for pulling NEMberArt transactions directly from API nodes.