Repository: rules-of-origin-builder-xi
Description: Downloads RoOs
GitHub: rules-of-origin-builder-xi
Ownership: #trade-tariff-write
Category: Utilities
README
Implementation steps
- Create and activate a virtual environment, e.g.
python -m venv venv/
source venv/bin/activate
- Install the necessary Python modules via
pip install -r requirements.txt
Environment variable settings
Database
- DATABASE_UK=postgres connection string
- SAVE_TO_DB=[0|1]
- OVERWRITE_DB=[0|1]
Templates for accessing the raw data
- url_template=URL template for the modern JSON source
- url_template_classic=URL template for the legacy / classic JSON source
Paths
- commodity_code_folder=folder for commodity codes
- EXPORT_PATH=string
- EXPORT_PATH_UK=string
- EXPORT_PATH_XI=string
Configuration
- CHAPTERS_TO_PROCESS=""
- INSERT_HYPERLINKS=[0|1]
- MIN_CODE=Replace this with a valid 10-digit commodity code to restart processing from a specific code
- MAX_CODE=Replace this with a valid 10-digit commodity code to stop processing at a specific code
- SPECIFIC_COUNTRY=Overrides the country list to pull data from a specific country only
- SPECIFIC_CODE=Overrides the commodity list to pull data for a single commodity only
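Taken together, a .env file for this project might look like the sketch below. All of the values shown (the connection string, URLs, paths and codes) are illustrative placeholders, not defaults taken from the repository:

# Illustrative values only
DATABASE_UK=postgresql://user:password@localhost:5432/roo
SAVE_TO_DB=1
OVERWRITE_DB=0
url_template=https://example.org/roo/product/{code}.json
url_template_classic=https://example.org/roo/classic/{code}.json
commodity_code_folder=resources/commodity_codes
EXPORT_PATH=resources/export
EXPORT_PATH_UK=resources/export/uk
EXPORT_PATH_XI=resources/export/xi
CHAPTERS_TO_PROCESS=""
INSERT_HYPERLINKS=1
MIN_CODE=0101210000
MAX_CODE=9999999999
SPECIFIC_COUNTRY=
SPECIFIC_CODE=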
Usage
The countries.json configuration file
- This file contains a list of all of the countries for which we are scraping data
- Each country item is structured as follows:
{ "code": "AF", "prefix": "gsp", "omit": 0, "source": "classic" }
where:
- code is the 2-digit ISO country code that is used as an input to the scrape function
- prefix is the code against which the data is registered in the local database
- omit (0 | 1) determines whether the country is to be skipped
- source indicates the following, dependent on the value:
- if ‘classic’, then the RoO are structured using the old-fashioned Trade Helpdesk structures
- if ‘product’, then using the newer MADB / ROSA product-specific rules (PSR)
- At the time of writing, GSP, Turkey and Kenya use the old (classic) style
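For example, a minimal countries.json might look like the following. The AF entry is the one shown above; the TR and JP entries (their prefixes, omit flags and sources) are hypothetical illustrations:

[
  { "code": "AF", "prefix": "gsp", "omit": 0, "source": "classic" },
  { "code": "TR", "prefix": "turkey", "omit": 0, "source": "classic" },
  { "code": "JP", "prefix": "japan", "omit": 1, "source": "product" }
]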
To scrape the source data:
python scrape_roo.py
- This downloads the RoO from the source
- The results are saved locally as complete JSON documents
- No processing is done in this step (downloading only); use the process_roo scripts below to process them
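As a rough sketch of what this step involves (illustrative only, not the repository's actual code; the output file naming and the {code} placeholder in the URL templates are assumptions):

import json
import os
import urllib.request

# Load the country configuration described above
with open("countries.json") as f:
    countries = json.load(f)

for country in countries:
    if country["omit"] == 1:
        continue  # skip countries flagged with omit = 1
    # Pick the legacy or modern URL template depending on the source field
    template = (os.environ["url_template_classic"]
                if country["source"] == "classic"
                else os.environ["url_template"])
    url = template.format(code=country["code"])
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # Save the raw JSON locally; all processing happens in a later step
    with open(f"{country['prefix']}_{country['code']}.json", "wb") as out:
        out.write(data)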
To process the JSONs:
python process_roo_product.py
- This processes the more modern, structured RoO documents
or
python process_roo_classic.py
- This processes the older-style (classic) RoO documents (e.g. Kenya, Turkey)
- Either script takes the downloaded RoO JSON source files, converts them into the necessary data objects and stores them in the local Postgres database
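The database write might conceptually resemble the sketch below. It is illustrative only: the roo_rules table and its columns are invented for the example, psycopg2 is assumed as the Postgres driver, and DATABASE_UK, SAVE_TO_DB and OVERWRITE_DB are the environment variables described above:

import os
import psycopg2

def save_rule(country_prefix, heading, rule_text):
    if os.environ.get("SAVE_TO_DB", "0") != "1":
        return  # saving to the database is disabled
    conn = psycopg2.connect(os.environ["DATABASE_UK"])
    with conn, conn.cursor() as cur:
        if os.environ.get("OVERWRITE_DB", "0") == "1":
            # Remove any previously stored rules for this country first
            cur.execute("DELETE FROM roo_rules WHERE country = %s",
                        (country_prefix,))
        cur.execute(
            "INSERT INTO roo_rules (country, heading, rule) VALUES (%s, %s, %s)",
            (country_prefix, heading, rule_text),
        )
    conn.close()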
To export to new JSON:
python export_to_json.py
- This runs through the Postgres database and creates a JSON file that can be used in the OTT prototype
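Conceptually, the export is the reverse of the processing step: read the stored rules back out of Postgres and serialise them to a JSON file under EXPORT_PATH. A minimal sketch, under the same assumptions as the previous example (psycopg2 and the invented roo_rules table):

import json
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_UK"])
with conn.cursor() as cur:
    cur.execute("SELECT country, heading, rule FROM roo_rules "
                "ORDER BY country, heading")
    rows = [
        {"country": country, "heading": heading, "rule": rule}
        for country, heading, rule in cur.fetchall()
    ]
conn.close()

# Write a single JSON document for use in the OTT prototype
with open(os.path.join(os.environ["EXPORT_PATH"], "roo.json"), "w") as f:
    json.dump(rows, f, indent=2)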
What process_roo_classic.py does
For classic Rules of Origin (e.g. GSP)
This is considerably harder than the equivalent work on Word documents, as you cannot change the original source files
- Having downloaded all of the JSONs for each chapter in a previous step …
- Create a new ClassicRoo object against each
- Main cleansing rules are stored in classic_roo > cleanse_rules
- Main code to form relevant data are stored in classic_roo > deconstruct_rules_html
- deconstruct_rules_html creates a ClassicRooRow object for every table row in the MADB html
- ClassicRooRow is where the actual formation of destination data is completed
- It creates a series of ClassicRooCell objects, stored in its self.cells attribute
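To make the class relationships concrete, here is a minimal sketch of the ClassicRoo > ClassicRooRow > ClassicRooCell structure described above. It is illustrative only: the real classes carry far more cleansing and parsing logic, and the use of BeautifulSoup here is an assumption:

from bs4 import BeautifulSoup

class ClassicRooCell:
    # The cell level is where the destination data is actually formed
    def __init__(self, html_cell):
        self.text = html_cell.get_text(strip=True)

class ClassicRooRow:
    # One ClassicRooRow is created per table row in the MADB HTML
    def __init__(self, html_row):
        self.cells = [ClassicRooCell(td) for td in html_row.find_all("td")]

class ClassicRoo:
    def __init__(self, html):
        self.html = self.cleanse_rules(html)
        self.rows = []
        self.deconstruct_rules_html()

    def cleanse_rules(self, html):
        # Stand-in for the main cleansing rules
        return html.replace("\xa0", " ")

    def deconstruct_rules_html(self):
        # Creates a ClassicRooRow object for every table row
        soup = BeautifulSoup(self.html, "html.parser")
        for tr in soup.find_all("tr"):
            self.rows.append(ClassicRooRow(tr))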