Transforming an Eighteen-Year-Old E-Commerce Platform – Technical Insight
All The Data In One Place
The ecommerce platform we were tasked with tackling had been operational for nearly two decades, built on a do-it-yourself solution that over time became outdated and inefficient – the standard case with organic platform growth. One of the primary issues was the architecture; all product data was stored in a single field, making it nearly impossible to filter or extract specific attributes programmatically.
Think about the fairly standard scenario where a t-shirt from a particular brand is available in three colors and different sizes (S, M, L), with shared attributes like fabric composition i.e. “100% cotton”. In an ideal database schema, such details would be stored in separate fields to allow efficient querying and filtering. However, in the existing system, this information was embedded within a single string: the product description.
Using AI To Add Structure
To transition from highly unstructured data to a structured format, we initially considered using regular expressions for data parsing. However, this approach was not scalable given the volume of products – about 29,000 with 90,000 variants – and the number of data fields for each product. It would have taken two people working full-time at least three months to do it manually.
Instead, we turned to AI, leveraging ChatGPT 4o to extract and categorize product attributes dynamically. The choice of ChatGPT 4o was strategic: it provided more stable and accurate responses compared to its predecessors — and provided faster API responses and greater cost-effectiveness.
Our first step was to build a prototype that demonstrated the feasibility of using AI for data transformation. We developed a simple system that fed product data and corresponding prompts into ChatGPT and then stored the output in a staging database. Tests using approximately 100 products showed promising results, proving that AI could accurately extract product attributes.
Scaling the Solution
With the prototype validated, we scaled the solution to handle the entire product catalog. This involved several key steps:
Data Migration: We implemented a dual-level migration process, focusing on both product-level and master-level data. This involved separating general attributes from unique ones, and restructuring text formats while removing hard-linked product descriptions.
Handling Multiple Languages: Given the platform's bilingual nature (Dutch and English), our prompts were designed to be dynamic, catering to different product categories and languages. This adaptability ensured that attributes were accurately extracted, irrespective of language variations.
Minimizing Hallucinations: One of the challenges was ensuring consistency in extracted data. For instance, the washing instructions of certain clothing products proved to be a real challenge to consistently extract. We iterated on our prompt engineering to minimize hallucinations and eventually had a series of prompts that worked over 99% of the time.
Final Data Parsing: After the initial data transformation, we conducted a final parse in the staging area to ensure completeness and accuracy. This step took about three days, with a total project timeline of 1.5 months from start to finish.
Process one
Retrieving unstructured product data from source
Process two
Transforming unstructured product data
Process three
Moving transformed and reviewed product data to the commerce engine
Lessons Learned
The project demonstrated the potential that off-the-shelf LLMs can have for data transformation in modernizing legacy systems. Our approach not only solved the immediate challenge but also provided a reusable framework for future projects. We got the customer a great result in way less time than doing it manually, that’s a win for both of us.
Here are some key takeaways and next steps:
Reusability – The process we developed is adaptable and can be used as a starting point for other ecommerce transformations. We’ve already got our sights set on some other customer systems that could benefit from this.
Automation Opportunities – Future improvements could focus on automating background processes, such as integrating generative AI for product image variations as well as advanced search algorithm development.
This project was a lot of fun because we started it mostly as a test before ever showing it to the customer. We all hear a lot about platforms like ChatGPT but to be able to build a solution quickly that offered a lot of value was fantastic and we can’t wait for the next opportunity to make it even better.
Got an aging ecommerce platform in need of an update? We use the best tools for the job, including AI, so get in touch with us today to find out how we can help.