Add --no-clobber to hive conversion tools

Added a --no-clobber option to convert_flat_to_hive and hivize. The option operates per tld + acq combination: for each combination in the input, it looks up the cell directory in the hive that holds the data for that tld and acq. If that cell directory already contains any parquet files, all rows with that tld + acq combination are removed from the dataframe. If the dataframe is empty after this filtering, the tool exits without writing any data. Otherwise, it writes the remaining rows to the hive dataset as usual.
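
The filtering step can be sketched roughly as follows. This is an illustrative sketch, not the code from this merge request: the function names `cell_has_parquet` and `drop_existing` are hypothetical, the `tld=<value>/acq=<value>` directory layout is an assumption about how the hive is partitioned, and a plain list of dicts stands in for the dataframe so the example needs only the standard library.

```python
from pathlib import Path


def cell_has_parquet(hive_root, tld, acq):
    """Return True if the cell for this tld + acq already holds parquet files.

    Assumed layout (not confirmed by the MR): <hive_root>/tld=<tld>/acq=<acq>/
    """
    cell = Path(hive_root) / f"tld={tld}" / f"acq={acq}"
    return cell.is_dir() and any(cell.glob("*.parquet"))


def drop_existing(rows, hive_root):
    """Remove every row whose tld + acq combination is already in the hive."""
    # Check each distinct combination once rather than once per row.
    combos = {(row["tld"], row["acq"]) for row in rows}
    existing = {c for c in combos if cell_has_parquet(hive_root, *c)}
    return [row for row in rows if (row["tld"], row["acq"]) not in existing]
```

With a helper like this, a caller would skip the write entirely when `drop_existing` returns an empty list, and otherwise write only the returned rows.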
