Restructure data dir and policy output names
The current naming scheme and directory structure for output policy files are confusing for anyone coming in fresh. Most of our past interaction with these data has been through the symlink directories named list-policy_<device>_<date>, each of which points to a directory that already contains the chunked and parquet-converted policy data. The raw policy logs, however, are not named like those directories, which makes it difficult to orient yourself when starting from the initial policy run step. Each policy run also leaves essentially 3 entries in the data directory (the initial policy log, the directory with the chunks, and the symlink pointing to the chunk dir), which increases clutter.

Instead, I propose we organize the data directory so that a single subdirectory contains all relevant files for each policy run. The directory name would describe the type of policy, the device the policy was applied to, and the corresponding job ID and run datetime. The raw policy log would be named similarly and stored at the top level of the subdirectory. The split parquet dataset would be given its own subdirectory at the same level as the policy log. See below for an example.
/data/rc/gpfs-policy/data/
└── list-policy_<job_id>_<device>_%Y%m%dT%H%M%S_<policy_type>/
    ├── list-policy_<job_id>_<device>_%Y%m%dT%H%M%S_<policy_type>.list.gather-info.gz
    ├── [gz-chunks]
    └── parquet/
        ├── list-000.parquet
        ├── list-001.parquet
        └── ...
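To make the proposed layout concrete, here is a minimal sketch of how the shared run name and the directory skeleton could be built. The function names policy_run_name and make_policy_run_dir are hypothetical and do not exist in run-mmpol.sh; the field order follows the tree above, and the timestamp uses the same %Y%m%dT%H%M%S format.

```shell
# Hypothetical helpers; these names are illustrative and not part of run-mmpol.sh.

# Print the shared run name:
#   list-policy_<job_id>_<device>_<timestamp>_<policy_type>
# The 4th argument (timestamp) is optional and defaults to the current time.
policy_run_name() {
    job_id="$1"
    device="$2"
    policy_type="$3"
    stamp="${4:-$(date +%Y%m%dT%H%M%S)}"
    printf 'list-policy_%s_%s_%s_%s\n' "$job_id" "$device" "$stamp" "$policy_type"
}

# Create <data_root>/<run_name>/parquet and print the run directory path.
make_policy_run_dir() {
    data_root="$1"
    run_name="$2"
    run_dir="${data_root%/}/${run_name}"
    mkdir -p "${run_dir}/parquet" || return 1
    printf '%s\n' "$run_dir"
}
```

For instance, calling policy_run_name with a job ID, a device, and a policy type, then passing the result to make_policy_run_dir along with /data/rc/gpfs-policy/data, would create the run directory with an empty parquet/ subdirectory ready for the converted chunks.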
This would necessitate multiple changes for run-mmpol.sh. An initial look suggests the following:
- Probably converting to getopt to pass options instead of relying on environment variable inheritance. While not necessary for the restructuring, it would improve clarity
- Need to actually apply the file tag. The current output log only has the job ID as an identifier (ex. list-29582179.list.gather-info). I don't see anything resembling the tag in the file names in /data/rc/gpfs-policy/data.
  - It's verified the mv command in line 57 is not being run. See the end of /data/rc/list-gpfs-dirs/src/run-policy/out/pol-29582179-list-path-external-scratch.out where it only says outfile= and [[ '' != '' ]]. If anything was assigned to outfile, it would appear in the log.
- No idea what LIST_OUTPUT_FILE is referring to since that string doesn't appear in the list-path-external or list-path-dirplus policy definitions. -M is just a string replacement in the policy definition based on what's passed to it. Not sure that line is doing anything
- Need to check mmapplypolicy to see exactly how to get the name of the log file. If that's not possible, can just continue to use the current bones and then perform all of the renaming and organization after the fact.
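As a rough illustration of the first and last points, the sketch below parses options explicitly and then does the renaming after the fact. It is only a sketch under several assumptions: it uses the POSIX shell builtin getopts rather than the external getopt mentioned above (a substitution on my part), the option letters -d, -t and -j and both function names are hypothetical, and the raw log is assumed to still be named list-<job_id>.list.gather-info as in the example above. Compression to .gz is left out.

```shell
# Hypothetical sketch; option letters and function names are illustrative only.

# Parse -d <device>, -t <policy_type>, -j <job_id> into shell variables
# instead of inheriting them from the environment.
parse_policy_opts() {
    OPTIND=1    # reset so the function can be called more than once
    device="" policy_type="" job_id=""
    while getopts ":d:t:j:" opt; do
        case "$opt" in
            d) device="$OPTARG" ;;
            t) policy_type="$OPTARG" ;;
            j) job_id="$OPTARG" ;;
            :) echo "option -$OPTARG requires a value" >&2; return 2 ;;
            *) echo "unknown option -$OPTARG" >&2; return 2 ;;
        esac
    done
    if [ -z "$device" ] || [ -z "$policy_type" ] || [ -z "$job_id" ]; then
        echo "usage: -d <device> -t <policy_type> -j <job_id>" >&2
        return 2
    fi
}

# Move the raw log into its run directory and rename it to match.
# Assumes the log is currently <data_root>/list-<job_id>.list.gather-info
# and that parse_policy_opts has already set device, policy_type and job_id.
organize_policy_log() {
    data_root="${1%/}"
    stamp="$2"
    src="${data_root}/list-${job_id}.list.gather-info"
    name="list-policy_${job_id}_${device}_${stamp}_${policy_type}"
    run_dir="${data_root}/${name}"
    if [ ! -f "$src" ]; then
        echo "raw policy log not found: $src" >&2
        return 1
    fi
    mkdir -p "${run_dir}/parquet" || return 1
    mv -- "$src" "${run_dir}/${name}.list.gather-info"
}
```

Doing the move as a separate step after the policy run keeps it independent of however mmapplypolicy names its output, which matches the fallback described in the last bullet.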