Modify churn structure to account for storage affected (!43)

Matthew K Defenderfer requested to merge mdefende/gpfs-policy:enh-add-storage-amount-for-churned-files into main Jan 22, 2025

This MR extends the previous churn algorithm and database to record the number of bytes affected by file deletion, creation, and modification, showing exactly how file volatility impacts storage. Doing so required changing how churn is calculated: instead of the duplicate-marking method described in !42 (merged), policy dataframes are merged and their size, access, and modification data are compared. Additionally, two measures of storage change are calculated for modified files: the total size of the new versions of the files, and the net change in storage between the old and new versions.
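
The merge-based comparison described above can be sketched roughly as follows. This is a minimal pandas illustration, not the code from this MR: the column names (`path`, `size`, `modify`), the function name `churn_storage`, and the rule that a file counts as modified when its modification time or size differs are all assumptions made for the example.

```python
import pandas as pd


def churn_storage(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Compare two policy snapshots keyed on a hypothetical 'path' column."""
    # An outer merge keeps every path; the indicator column records whether a
    # path appears only in the old snapshot, only in the new one, or in both.
    merged = old.merge(
        new, on="path", how="outer", suffixes=("_old", "_new"), indicator=True
    )

    created = merged[merged["_merge"] == "right_only"]
    deleted = merged[merged["_merge"] == "left_only"]
    both = merged[merged["_merge"] == "both"]

    # Assumed rule: a file present in both snapshots is modified if its
    # modification time or its size changed.
    changed = (both["modify_old"] != both["modify_new"]) | (
        both["size_old"] != both["size_new"]
    )
    modified = both[changed]

    total_modified = int(modified["size_new"].sum())
    net_modified = int(modified["size_new"].sum() - modified["size_old"].sum())

    return {
        "created": len(created),
        "created_bytes": int(created["size_new"].sum()),
        "deleted": len(deleted),
        "deleted_bytes": int(deleted["size_old"].sum()),
        "modified": len(modified),
        "modified_bytes": total_modified,      # total size of the new versions
        "modified_bytes_net": net_modified,    # new size minus old size
    }


# Toy snapshots: /a is unchanged, /b is modified, /c is deleted, /d is created.
old = pd.DataFrame(
    {
        "path": ["/a", "/b", "/c"],
        "size": [100, 200, 300],
        "modify": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-01-01"]),
    }
)
new = pd.DataFrame(
    {
        "path": ["/a", "/b", "/d"],
        "size": [100, 250, 50],
        "modify": pd.to_datetime(["2025-01-01", "2025-01-10", "2025-01-12"]),
    }
)

result = churn_storage(old, new)
print(result)
```

In this toy case the modified file `/b` contributes 250 bytes to the total-modified figure but only 50 bytes to the net figure, which is the distinction between the two measures.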

Major Changes

Churn algorithm is changed to use dataframe merges instead of concatenation and marking duplicates
The following were added to the churn database table:
Fields for storage affected by file changes (total modified, net modified, total deleted, and total created)
Fields for files which were accessed but not churned and the corresponding sizes of those files
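
As an illustration of the accessed-but-not-churned fields, the sketch below counts files that appear unchanged in both snapshots but have a newer access time, along with their total size. It is a hedged example only: the column names (`path`, `size`, `modify`, `access`), the function name `accessed_not_churned`, and the exact definition of "accessed" are assumptions, not the MR's implementation.

```python
import pandas as pd


def accessed_not_churned(old: pd.DataFrame, new: pd.DataFrame) -> tuple[int, int]:
    """Return (file count, bytes) for files accessed but otherwise unchanged."""
    # An inner merge keeps only paths present in both snapshots, which
    # excludes created and deleted files.
    both = old.merge(new, on="path", how="inner", suffixes=("_old", "_new"))

    # Assumed rule: unchanged means same modification time and same size.
    unchanged = (both["modify_old"] == both["modify_new"]) & (
        both["size_old"] == both["size_new"]
    )
    # Assumed rule: accessed means the access time moved forward.
    accessed = both["access_new"] > both["access_old"]

    hits = both[unchanged & accessed]
    return len(hits), int(hits["size_new"].sum())


# Toy snapshots: /a is read but not changed, /b is modified, /c is untouched.
old_snap = pd.DataFrame(
    {
        "path": ["/a", "/b", "/c"],
        "size": [100, 200, 300],
        "modify": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-01-01"]),
        "access": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-01-01"]),
    }
)
new_snap = pd.DataFrame(
    {
        "path": ["/a", "/b", "/c"],
        "size": [100, 250, 300],
        "modify": pd.to_datetime(["2025-01-01", "2025-01-10", "2025-01-01"]),
        "access": pd.to_datetime(["2025-01-15", "2025-01-10", "2025-01-01"]),
    }
)

n_accessed, bytes_accessed = accessed_not_churned(old_snap, new_snap)
print(n_accessed, bytes_accessed)
```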

Minor Changes

Added example plots using churned storage and files accessed to the churn-analysis.ipynb notebook
