Task
You want to convert a JSON file that Unstructured produces into a separate JSON file that uses a different JSON schema than the one that Unstructured uses.
Approach
Use a Python package such as json-converter in your Python code project to transform your source JSON file into a target JSON file that conforms to your own schema.
The json-converter package is not owned or supported by Unstructured. For questions and requests, see the Issues tab of the json-converter repository in GitHub.
Code
1
Install dependencies
In your local Python code project, install the json-converter package, for example by running pip install json-converter.
2
Identify the JSON file to transform
- Find the local source JSON file that you want to transform.
- Note the JSON field names and structures that you want to transform. For example, each element in the JSON file might contain the fields type, element_id, and text, along with a metadata object that contains the fields filetype, languages, page_number, and filename.
3
Create the JSON field mappings file
- Decide what you want the JSON schema in the transformed file to look like. For example, each element in the transformed JSON file might contain the fields content_type, content_id, and content, along with a content_properties object that contains the field page.
- Create the JSON field mappings file. For example, a mappings file for this transformation declares the following mappings:
  - The type field is renamed to content_type.
  - The element_id field is renamed to content_id.
  - The text field is renamed to content.
  - The page_number field nested inside metadata is renamed to page and is nested inside content_properties.
  - All of the other fields (filetype, languages, and filename) are dropped.
  For more information about the mappings format, see the json-converter page on PyPI or the README in the json-converter repository in GitHub.
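As a rough illustration of what these mappings do to a single element, the following sketch applies them in plain Python, without json-converter. The function name remap_element and the sample values are hypothetical; they are not taken from the json-converter package or from real Unstructured output.

```python
def remap_element(element: dict) -> dict:
    """Apply the renames described above to one source element."""
    metadata = element.get("metadata", {})
    return {
        "content_type": element.get("type"),
        "content_id": element.get("element_id"),
        "content": element.get("text"),
        "content_properties": {
            # page_number moves out of metadata and is renamed to page.
            "page": metadata.get("page_number"),
        },
        # filetype, languages, and filename are intentionally not copied.
    }


# Hypothetical sample element; the values are made up for illustration.
sample = {
    "type": "Title",
    "element_id": "abc123",
    "text": "Example heading",
    "metadata": {
        "filetype": "application/pdf",
        "languages": ["eng"],
        "page_number": 1,
        "filename": "example.pdf",
    },
}

remapped = remap_element(sample)
```

Because the target object is built only from the fields that are named explicitly, any field that is not listed is dropped by default.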
4
Add and run the transform code
- Set the following local environment variables:
  - Set LOCAL_FILE_INPUT_PATH to the local path to the source JSON file.
  - Set LOCAL_FILE_OUTPUT_PATH to the local path to the target JSON file.
  - Set LOCAL_FIELD_MAPPINGS_PATH to the local path to the JSON field mappings file.
- Add a Python code file to your project that loads the source JSON file and the JSON field mappings file, applies the mappings by using json-converter, and writes the result to the target JSON file.
- Run the Python code file.
- Check the path specified by LOCAL_FILE_OUTPUT_PATH for the transformed JSON file.
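The overall flow of such a transform script can be sketched as follows, assuming that the source file holds a JSON array of elements. To stay self-contained, this sketch does not call json-converter. Instead, it uses a small stand-in, apply_mappings, that reads a simplified mappings format invented for this example: each target field maps to a dotted source path, and a nested object in the mappings produces a nested target object. The real json-converter mappings format and API may differ, so check the package's README before adapting this.

```python
import json
import os


def get_by_path(element, dotted_path):
    """Follow a dotted path such as 'metadata.page_number'; return None if absent."""
    current = element
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_mappings(element, mappings):
    """Build a target object from one source element (hypothetical format)."""
    target = {}
    for target_field, source in mappings.items():
        if isinstance(source, dict):
            # A nested dict in the mappings produces a nested target object.
            target[target_field] = apply_mappings(element, source)
        else:
            target[target_field] = get_by_path(element, source)
    return target


def transform_file(input_path, output_path, mappings_path):
    """Read the source elements, remap each one, and write the target file."""
    with open(input_path, encoding="utf-8") as f:
        elements = json.load(f)
    with open(mappings_path, encoding="utf-8") as f:
        mappings = json.load(f)
    transformed = [apply_mappings(element, mappings) for element in elements]
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(transformed, f, indent=2)
    return transformed


def main():
    """Run the transform by using the three environment variables from this step."""
    transform_file(
        os.environ["LOCAL_FILE_INPUT_PATH"],
        os.environ["LOCAL_FILE_OUTPUT_PATH"],
        os.environ["LOCAL_FIELD_MAPPINGS_PATH"],
    )
```

To run the sketch as a script, call main() at the end of the file after the three environment variables are set. A missing variable raises a KeyError, which surfaces a configuration mistake before any file is written.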
Troubleshooting
Error when trying to import Mapping from collections
Issue: When you run your Python code file, the following error message appears: “ImportError: cannot import name ‘Mapping’ from ‘collections’”.
Cause: The json-converter package contains an outdated import of Mapping from the collections module. Newer versions of Python, such as 3.11 and later, no longer support this import.
Solution: Update the json-converter package’s source code to use a different import, as follows:
- In your Python project, find the json-converter package’s source location by running the pip show command for the package. Note the path in the Location field.
- Use your code editor to open the path to the json-converter package’s source code.
- In the source code, open the file named json_mapper.py.
- Change the line of code that imports Mapping from collections so that it imports Mapping from collections.abc instead, by adding .abc.
- Save this source code file.
- Run your Python code file again.
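For illustration, the import change typically looks like the following. The exact line in json_mapper.py may differ, so treat the commented-out "before" line as an assumption based on the error message.

```python
# Before (assumed from the error message; fails on newer Python versions):
# from collections import Mapping

# After: Mapping is imported from collections.abc instead.
from collections.abc import Mapping

# Quick check that the new import works: a dict is a Mapping.
is_mapping = isinstance({}, Mapping)
```

Because this edits an installed package in place, the change is lost if the package is reinstalled or upgraded, and it must then be applied again.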

