Explained: The Product Matching Process

As a brand, you might want to monitor how your products compare across retailers, for instance in terms of availability, content completeness scores, review ratings and review counts. Or you might want to check which part of your assortment is actually live on your resellers' websites.

To do that reliably and to compare products across channels, it is not enough to 'just find all products of your brand' on these sites: we need to identify each product uniquely and link its listings together. In other words, we need to recognize that these are the same product, rather than a collection of separate product detail pages (PDPs) on different retailers that happen to be of the same brand. SiteLucent uses unique product identifiers, such as a GTIN (EAN, UPC) or a brand/MPN combination, to identify a product and link it across channels. In most cases this product matching process goes well. In some cases it does not, because crucial information on an eCommerce product page is missing or displayed incorrectly.

In this article we explain what we can do to identify products and corresponding eCommerce product pages correctly, so that you see the correct data in your dashboards. 


Unique product identifiers

Unique product identifiers are assigned to each product by the manufacturer. A unique product identifier makes it easier for shoppers to find your products or fitting spare parts, and it also makes it easier for SiteLucent to identify 'your products': the products whose data you ultimately want to see in your dashboards.

The most commonly used identifiers are the EAN and UPC codes (both GTIN formats, used mainly in Europe and North America respectively) and the brand's MPN (Manufacturer Part Number). Most online retailers show one or both of these codes on their product detail pages. When that is the case, SiteLucent uses the code(s) to identify the product and link it to the proper internal (SiteLucent) product ID.

Types of unique product identifiers:

Each entry lists the identifier name, its attribute in parentheses, and a description.

UPC (attribute: gtin)

  • Used primarily in North America.
  • Universal Product Code (UPC), also called GTIN-12 and UPC-A.
  • 12 numeric digits.
  • A unique numerical identifier for commercial products that's usually associated with a barcode printed on retail merchandise.

EAN (attribute: gtin)

  • Used primarily outside of North America.
  • European Article Number (EAN), also called GTIN-13.
  • Typically 13 numeric digits (can occasionally be either 8 or 14 numeric digits).
  • A unique numerical identifier for commercial products that's usually associated with a barcode printed on retail merchandise.

JAN (attribute: gtin)

  • Used only in Japan.
  • Japanese Article Number (JAN), also called GTIN-13.
  • 8 or 13 numeric digits.
  • A unique numerical identifier for commercial products that's usually associated with a barcode printed on retail merchandise.

ISBN (attribute: gtin)

  • Used globally.
  • International Standard Book Number (ISBN).
  • ISBN-10: 10 numeric digits (last digit may be "X", which represents the number "10").
  • Note that this format was deprecated in 2007, and not all books can be represented using ISBN-10.
  • ISBN-13 (recommended): 13 numeric digits, typically starting with either 978 or 979.
  • A unique numerical identifier for commercial books published since 1970 that can be found on the back of the book along with the barcode.

Brand (attribute: brand)

  • Used globally.
  • The brand of the product.
  • Should be clearly visible on the front of the product packaging or label.

MPN (attribute: mpn)

  • Used globally.
  • Manufacturer Part Number (MPN).
  • Alphanumeric characters (various lengths).
  • The number which uniquely identifies the product to its manufacturer.

Source: https://support.google.com/merchants/answer/160161?hl=en

 

Let's look at an example of a product detail page on retailer bol.com (image 1), where the unique identifier is clearly visible, so SiteLucent has no problem identifying this specific product.

Image 1: unique product identifier at bol.com product detail page
 

We recommend that retailers display the official brand/MPN and/or GTIN on every product detail page whenever possible. This helps customers find and identify products more easily when they are comparing products or shops, and helps them recognize a product they are seeking support for, potentially reducing unnecessary customer support contacts or even returns.

It also helps us to properly identify the products and match them to our existing database of products, and to products on the product lists you might have uploaded and are using within our tooling to segment, filter or analyze your assortment. 

 

What if products are not automatically recognized?

In some cases, SiteLucent will not be able to properly identify the product that is displayed on a product detail page (PDP). The reasons range from partial or incorrect identification information to none at all.

Common situations that make automatic identification impossible:

  • No GTIN or brand/MPN present at all on the PDP (nor in the underlying source code).
  • Misspelled or partial MPN codes, for instance with additional characters in between, or with parts of the official code missing.
  • Invalid GTIN code(s). We always check the validity of GTIN codes we find. Codes that fail the 'valid GTIN code test' get rejected, to avoid polluting our system with incorrect identifiers.
  • Incorrect brand name. An MPN code is only unique in combination with the brand name. If the brand name is misspelled, or different from the official brand name, our system will 'think' it's a different product and record it as such.
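
The article does not spell out what the 'valid GTIN code test' involves. A common form of such a test is the standard GTIN check-digit calculation, so the minimal sketch below assumes that is roughly what is meant. The function name is_valid_gtin is illustrative and not part of SiteLucent's tooling.

```python
def is_valid_gtin(code: str) -> bool:
    """Return True if `code` is an 8-, 12-, 13- or 14-digit GTIN with a correct check digit."""
    # Only plain ASCII digits and the standard GTIN lengths are accepted.
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    digits = [int(ch) for ch in code]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    # The check digit brings the weighted sum up to the next multiple of 10.
    return (10 - total % 10) % 10 == check


for candidate in ("4006381333931", "4006381333932", "036000291452"):
    print(candidate, is_valid_gtin(candidate))
```

A check like this only catches typing and extraction errors. A code can pass it and still belong to a different product.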

In such cases, SiteLucent will not be able to automatically identify and recognize these products. We will still record the PDP and all the product information on it (e.g. available content, price, availability, reviews), but we won't be able to link it to the proper SiteLucent product ID in our database, and as a consequence we cannot link it to the same product on other channels. In addition, such products will not appear in your dashboards whenever you use a product list (or tag(s) from a product list) to filter data, because that filter relies on the same product identifiers to link products on the product list to products in our database that were extracted from product detail pages.

If automatic identification is not possible, we can help the system by telling it exactly which product a product detail page contains. Whenever such information is available, the 'matching algorithm' gives it priority over any other information found on the product detail page itself.

Every retailer uses a proprietary, unique code (or number) to identify a product in their own systems. They need such a unique number themselves to ensure they are, for instance, putting the correct product in your basket, picking the correct product from their shelves, etc.

We call this proprietary, retailer-specific product code the "Shop Product ID". Shop Product IDs come in various formats and flavors, but they all have two important things in common: each one uniquely identifies an individual product on the retailer's site, AND every retailer has such numbers, either visibly on the page and/or 'hidden' in the page's source code.

A well-known example of such a Shop Product ID is the "ASIN" used by Amazon.

This process of 'telling the system which product is on a PDP' is called 'product mapping'. A mapping is added to our system by uploading a so-called 'mapping file'. With such a file, you essentially establish a link between a PDP's Shop Product ID and the common identifiers GTIN and/or brand/MPN, which our system will then use to identify and 'map' the product correctly to our database of products.
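
To make the idea of mapping concrete, here is a minimal sketch of the precedence described above: a mapping entry for a Shop Product ID, when present, wins over whatever was found on the page. Everything in it is hypothetical, including the data structure, the function name product_key, the sample IDs, and the case normalization of brand and MPN. It does not describe SiteLucent's actual matching algorithm.

```python
from typing import Optional

# Hypothetical mapping data: (retailer, Shop Product ID) -> identifiers from a mapping file.
MAPPING = {
    ("bol.com", "9200000012345678"): {
        "gtin": "4006381333931",
        "brand": "ExampleBrand",
        "mpn": "AB-123",
    },
}


def product_key(retailer: str, shop_product_id: str, page_ids: dict,
                mapping: dict = MAPPING) -> Optional[tuple]:
    """Pick the identifier used to link a PDP to a product.

    Mapped identifiers take priority over identifiers found on the page.
    """
    ids = mapping.get((retailer, shop_product_id)) or page_ids
    gtin = (ids.get("gtin") or "").strip()
    if gtin:
        return ("gtin", gtin)
    # Assumption: brand and MPN are compared case-insensitively after trimming.
    brand = (ids.get("brand") or "").strip().casefold()
    mpn = (ids.get("mpn") or "").strip().upper()
    if brand and mpn:
        # An MPN is only unique in combination with the brand.
        return ("brand_mpn", brand, mpn)
    return None  # no usable identifier: the PDP stays unmatched


print(product_key("bol.com", "9200000012345678", {"brand": "Wrong", "mpn": "x"}))
print(product_key("bol.com", "1111", {"brand": " ExampleBrand ", "mpn": "ab-123"}))
print(product_key("bol.com", "1111", {}))
```

The point of the sketch is the order of the lookups: the Shop Product ID is checked first, and only unmapped PDPs fall back to the identifiers shown on the page.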

 

How to create a mapping file?

Here are the steps to create a mapping file:

  1. Log in to your SiteLucent account.
  2. Open the default 'Products Listing & Pricing' dashboard (or any other dashboard that contains a data table widget).
  3. Set your filters: choose the retailer whose products are not being identified correctly (we recommend doing one retailer at a time).
  4. Scroll down until you see a data table widget (image 2).
  5. Export the widget data to a CSV file by clicking the three-line icon (hamburger button) in the upper right corner of the data table widget and choosing 'data export' > CSV (image 3). This CSV file is the mapping file, but a few more steps are needed to get it ready!
  6. Open the exported CSV mapping file and delete all columns except: Shop Product ID, MPN code, Brand name and Retailer name (see an example in image 4).

 

Make sure you don't change or delete the values in the column 'Shop Product ID'!
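
If you prefer to trim the export with a script instead of a spreadsheet, the column clean-up in step 6 could look like the sketch below. The column headers in KEEP are assumptions based on the names used in this article, so replace them with the headers that actually appear in your export. The function name trim_export and the sample data are illustrative only.

```python
import csv
import io

# Assumed column headers; check them against the header row of your own export.
KEEP = ["Shop Product ID", "MPN code", "Brand name", "Retailer name"]


def trim_export(src, dst, keep=KEEP):
    """Copy only the `keep` columns from CSV stream `src` to `dst`, leaving values untouched."""
    reader = csv.DictReader(src)
    missing = [col for col in keep if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"export is missing expected columns: {missing}")
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Values are copied as text, so Shop Product IDs keep leading zeros.
        writer.writerow(row)


# In-memory demonstration with made-up data.
sample = io.StringIO(
    "Shop Product ID,Title,MPN code,Brand name,Retailer name,Price\n"
    "00123,Widget,AB-1,Acme,bol.com,9.99\n"
)
out = io.StringIO()
trim_export(sample, out)
print(out.getvalue())

# With real files (file names are placeholders):
# with open("export.csv", newline="", encoding="utf-8") as src, \
#      open("mapping.csv", "w", newline="", encoding="utf-8") as dst:
#     trim_export(src, dst)
```

Handling every value as text avoids a common spreadsheet pitfall, where long numeric IDs are reformatted or lose leading zeros, which would change the Shop Product ID.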

Image 2: A data table widget in the SiteLucent dashboard 'Products Listing & Pricing'.


Image 3: How to make a data export of a data table widget in SiteLucent

Image 4: Example of CSV mapping file with 5 mandatory columns

 

The final (and most crucial) steps

Once we have a list of Shop Product IDs, we can manually add the correct unique product identifiers to make it easier and faster for the algorithm to match a product to the corresponding eCommerce product page of the retailer in question.

  1. Fill in or correct all MPN / GTIN codes. Again: please make sure that you don't change the Shop Product ID!
  2. Save the CSV mapping file and send it to support@sitelucent.com.
  3. Sit back and relax. We will take it from here! 

 

Identify Products on Amazon

An ASIN (Amazon Standard Identification Number) is the unique identifier that Amazon uses for the products on its platforms. It consists of 10 characters and is unique within a specific marketplace. This means the ASIN for an identical product sold on Amazon Germany can differ from the one used on, for instance, Amazon UK.
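
Because an ASIN is only unique within one marketplace, any list of ASINs should record the marketplace alongside each ASIN. The sketch below illustrates that with a combined key. The pattern assumes ASINs consist of 10 uppercase letters and digits, where this article only states the length, and the function name asin_key and the sample ASIN are hypothetical.

```python
import re

# Assumption: 10 characters, uppercase letters and digits only.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def asin_key(marketplace: str, asin: str) -> tuple:
    """Return a (marketplace, ASIN) key, since an ASIN alone is not unique across marketplaces."""
    cleaned = asin.strip().upper()
    if not ASIN_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a 10-character ASIN: {asin!r}")
    return (marketplace.strip().lower(), cleaned)


# The same ASIN text on two marketplaces yields two distinct keys.
print(asin_key("amazon.de", "B00EXAMPLE"))
print(asin_key("amazon.co.uk", "B00EXAMPLE"))
```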

To track and monitor products that are listed on one of the Amazon platforms, a list of ASINs for a specific Amazon marketplace is needed. The easiest way to create a list of ASINs is to export it via Vendor Central or Seller Central. In addition to the ASIN, SiteLucent needs the brand name and the EAN and/or MPN.

If you only have the ASIN and do not have the EAN of a product, you can make use of this tool to convert ASIN to EAN. 

Do you have an Amazon seller account? Then you can easily export your listings. Read more about it here: How to Export Amazon Listings? - General Selling Questions