Add CLI functionality for log preprocessing
Prior versions of the package used raw shell scripts to split the raw GPFS log and convert it to a usable parquet dataset. These scripts have been converted to Python submodules so they can be used from the Python REPL as well as from the shell CLI. Each command can also submit a separate batch job to perform the processing if the user wants. This gives more flexibility in how the preprocessing steps are run.
Minor changes:
- Added CLI commands `split-log` and `convert-to-parquet` to replace `split-info-file.sh` and `run-convert-to-parquet.sh`, respectively.
- Converted contents of `split-info-file` to be interfaced through Python. The actual processing is still done via bash through the subprocess module
- Added options to run each command using either the local compute resources or submitted through a batch job.
- Add imports to `policy` for `split`, `compress_logs`, and `convert`
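
For reviewers, the local-versus-batch pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this change: `build_command` and `run_step` are hypothetical names, the bash command is a placeholder, and the scheduler is assumed to be Slurm (with `sbatch` on the `PATH`), which is only inferred from the `parse_scontrol` helper.

```python
import subprocess
from typing import List, Sequence


def build_command(bash_cmd: str, batch: bool = False,
                  sbatch_opts: Sequence[str] = ()) -> List[str]:
    """Build the argv that runs `bash_cmd` locally or as a batch job.

    Hypothetical helper: the real package may structure this differently.
    """
    if batch:
        # Assumption: Slurm scheduler. `sbatch --wrap` wraps a command
        # string in a minimal batch script and submits it.
        return ["sbatch", *sbatch_opts, f"--wrap={bash_cmd}"]
    # Local execution: hand the command string to bash directly.
    return ["bash", "-c", bash_cmd]


def run_step(bash_cmd: str, batch: bool = False,
             sbatch_opts: Sequence[str] = ()) -> subprocess.CompletedProcess:
    """Run one preprocessing step through the subprocess module."""
    cmd = build_command(bash_cmd, batch=batch, sbatch_opts=sbatch_opts)
    # check=True raises CalledProcessError if the command exits non-zero.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Calling `run_step("echo hello")` would run the placeholder command on the local machine, while `run_step("echo hello", batch=True, sbatch_opts=["--mem=8G"])` would hand the same command to the scheduler. Keeping the command construction separate from execution makes the choice between the two modes easy to test without a cluster.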
Ancillary:
- Moved `parse_scontrol` from `compute.utils` to `utils`
- Removed `create-symlinks.sh` script since CLI functions are defined in `pyproject.toml` now
Edited by Matthew K Defenderfer