Amazon Forecast now supports accuracy measurements for individual items

November 25, 2020

We’re excited to announce that you can now measure the accuracy of forecasts for individual items in Amazon Forecast, allowing you to better understand your forecasting model’s performance for the items that most impact your business. Improving forecast accuracy for specific items, such as those with higher prices or higher costs, is often more important than optimizing for all items. With this launch, you can now view accuracy for individual items and export forecasts generated during training. This information allows you to better interpret results by comparing performance against observed historical demand, aggregating accuracy metrics across custom sets of SKUs or time periods, or visualizing results, all without needing to hold out a separate validation dataset. From there, you can tailor your experiments to further optimize accuracy for the items that matter most to your needs.

If a smaller set of items is more important for your business, achieving a high forecasting accuracy for those items is imperative. For retailers specifically, not all SKUs are treated equally. Often, roughly 80% of revenue is driven by 20% of SKUs, and retailers look to optimize forecasting accuracy for that top 20% of SKUs. Although you can create a separate forecasting model for the top 20% of SKUs, that model can't learn from relevant items outside of the top 20%, and accuracy may suffer. For example, a bookstore company looking to increase forecasting accuracy of best sellers can create a separate model for best sellers, but without the ability to learn from other books in the same genre, the accuracy for new best sellers might be poor. Evaluating how a model trained on all the SKUs performs against the top 20% of SKUs provides more meaningful insight into how a better forecasting model can directly affect business objectives.

You may instead look to optimize your forecasting models for specific departments. For example, for an electronics manufacturer, the departments selling the primary products may be more important than the departments selling accessory products, encouraging the manufacturer to optimize accuracy for those departments. Furthermore, the risk tolerance for certain SKUs might be higher than for others. For items with a long shelf life, you may prefer to overstock because you can easily store excess inventory. For items with a short shelf life, you may prefer a lower stocking level to reduce waste. Ideally, you train one model but assess forecasting accuracy for different SKUs at different stocking levels.

Previously, to evaluate forecasting accuracy at an item level or department level, you typically had to hold out a validation dataset outside of Forecast and feed only the remaining training dataset to Forecast to create an optimized model. After the model was trained, you generated multiple forecasts and compared them to the validation dataset. This approach incurs costs during the experimentation phase and reduces the amount of data that Forecast has to learn from.

Shivaprasad KT, Founder and CEO of Ganit, an analytics solution provider, says, “We work with customers across various domains of consumer goods, retail, hospitality, and finance on their forecasting needs. Across these industries, we see that for most customers, a small segment of SKUs drive most of their business, and optimizing the model for those SKUs is more critical than overall model accuracy. With Amazon Forecast launching the capability to measure forecast accuracy at each item, we are able to quickly evaluate the different models and provide a forecasting solution to our customers faster. This helps us focus more on helping customers with their business operation analysis and less on the manual and more cost-prohibitive tasks of generating forecasts and calculating item accuracy by ourselves. With this launch, our customers are able to experiment faster incurring low costs with Amazon Forecast.”

With today’s launch, you can now access the forecasted values that Forecast generates during its internal backtesting, along with item-level accuracy metrics, and compare those forecasts against observed data. This eliminates the need to maintain a holdout test dataset outside of Forecast. When training a model, Forecast automatically splits the historical demand data into a training dataset and a backtesting dataset. Forecast trains a model on the training dataset, forecasts at the stocking levels (quantiles) you specify for the backtesting period, and compares those forecasts to the observed values in the backtesting dataset.
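To make the split concrete, the following is a minimal sketch of holding out the most recent observations of each item as a backtest window. It is only an illustration of the idea, not Forecast's internal implementation; the function name `split_for_backtest`, the `horizon` parameter, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Series = Dict[str, List[float]]


def split_for_backtest(series: Series, horizon: int) -> Tuple[Series, Series]:
    """Hold out the last `horizon` observations of every item.

    Illustrative only: the names and the fixed-length window are assumptions,
    not a description of how Forecast performs its split internally.
    """
    if horizon <= 0:
        raise ValueError("horizon must be a positive number of time steps")
    train: Series = {}
    backtest: Series = {}
    for item_id, values in series.items():
        if len(values) <= horizon:
            raise ValueError(f"{item_id} has too little history for horizon={horizon}")
        train[item_id] = values[:-horizon]     # data the model learns from
        backtest[item_id] = values[-horizon:]  # observed values used for scoring
    return train, backtest


# Hypothetical demand history for two items.
history = {
    "station_1": [12, 15, 11, 18, 20, 17],
    "station_2": [3, 4, 2, 5, 6, 4],
}
train, backtest = split_for_backtest(history, horizon=2)
```

The model only ever sees `train`; forecasts for the held-out period are then scored against `backtest`.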

You can also now export the forecasts from the backtesting for each item and the accuracy metrics for each item. To evaluate the strength of your forecasting model for specific items or a custom set of items based on category, you can calculate the accuracy metrics by aggregating the backtest forecast results for those items.

You may group your items by department, sales velocity, or time periods. If you select different stocking levels, you can choose to assess the accuracy of certain items at certain stocking levels, while measuring accuracy of other items at different stocking levels.
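As a rough sketch of this kind of aggregation, the following Python groups backtest rows by a custom category and scores each group at its own stocking level using the weighted quantile loss. The row layout (`item_id`, `target_value`, `p10`, `p90`), the group names, and the sample values are assumptions made for illustration; check the columns in your own export before reusing it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def wql(rows: List[dict], quantile_col: str, tau: float) -> float:
    """Weighted quantile loss at quantile `tau` over a set of backtest rows."""
    numerator = 0.0
    denominator = 0.0
    for row in rows:
        y, q = row["target_value"], row[quantile_col]
        # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
        numerator += tau * max(y - q, 0.0) + (1 - tau) * max(q - y, 0.0)
        denominator += abs(y)
    return 2 * numerator / denominator if denominator else float("nan")


# Hypothetical backtest rows; column names are assumed, not guaranteed.
rows = [
    {"item_id": "milk", "target_value": 10.0, "p10": 8.0, "p90": 13.0},
    {"item_id": "yogurt", "target_value": 20.0, "p10": 15.0, "p90": 22.0},
    {"item_id": "rice", "target_value": 30.0, "p10": 24.0, "p90": 33.0},
    {"item_id": "pasta", "target_value": 10.0, "p10": 9.0, "p90": 14.0},
]

# Custom grouping and the stocking level each group is judged at.
group_of = {
    "milk": "short_shelf_life",
    "yogurt": "short_shelf_life",
    "rice": "long_shelf_life",
    "pasta": "long_shelf_life",
}
stocking_level: Dict[str, Tuple[str, float]] = {
    "short_shelf_life": ("p10", 0.1),  # prefer understocking to reduce waste
    "long_shelf_life": ("p90", 0.9),   # prefer overstocking
}

grouped: Dict[str, List[dict]] = defaultdict(list)
for row in rows:
    grouped[group_of[row["item_id"]]].append(row)

group_wql = {
    group: wql(members, *stocking_level[group])
    for group, members in grouped.items()
}
```

The same pattern works for grouping by department, sales velocity, or time period: only the `group_of` mapping changes.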

Lastly, you can now easily visualize the forecasts compared to your historical demand by exporting the backtest forecasts to Amazon QuickSight or any other visualization tool of your preference.

Forecast provides different model accuracy metrics for you to assess the strength of your forecasting models. We provide the weighted quantile loss (wQL) metric for each selected distribution point, and weighted absolute percentage error (WAPE) and root mean square error (RMSE), calculated at the mean forecast. For more information about how each metric is calculated and recommendations for the best use case for each metric, see Measuring forecast model accuracy to optimize your business objectives with Amazon Forecast.
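For reference, the three metrics can be written as follows. The notation is our own shorthand rather than anything fixed by the service: $y_{i,t}$ is the observed demand for item $i$ at time $t$ in the backtest window, $q^{(\tau)}_{i,t}$ is the forecast at quantile $\tau$, $\hat{y}_{i,t}$ is the mean forecast, $n$ is the number of items, and $T$ is the number of time points.

```latex
\begin{align*}
\mathrm{wQL}[\tau] &= 2\,\frac{\sum_{i,t}\left[\tau\,\max\!\left(y_{i,t}-q^{(\tau)}_{i,t},\,0\right)
  + (1-\tau)\,\max\!\left(q^{(\tau)}_{i,t}-y_{i,t},\,0\right)\right]}{\sum_{i,t}\left|y_{i,t}\right|} \\[4pt]
\mathrm{WAPE} &= \frac{\sum_{i,t}\left|y_{i,t}-\hat{y}_{i,t}\right|}{\sum_{i,t}\left|y_{i,t}\right|} \\[4pt]
\mathrm{RMSE} &= \sqrt{\frac{1}{nT}\sum_{i,t}\left(\hat{y}_{i,t}-y_{i,t}\right)^{2}}
\end{align*}
```

As a small worked example, with observed values 10, 20, 30 and mean forecasts 12, 18, 33, WAPE is (2 + 2 + 3) / 60 ≈ 0.117 and RMSE is √(17 / 3) ≈ 2.38.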

Although Forecast provides these three industry-leading forecast accuracy measures, you might prefer to calculate accuracy using different metrics. With the launch of this feature, you can use the export of forecasts from backtesting to calculate the model accuracy using your own formula, without the need to generate forecasts and incur additional cost during experimentation.
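As one illustration of a custom formula, the following sketch reads exported backtest rows as CSV and computes a per-item forecast bias (total signed error divided by total demand). The column names `item_id`, `target_value`, and `mean`, as well as the sample rows, are assumptions for the sake of the example; confirm them against the header of your own export file.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO

# Hypothetical sample of an exported backtest file; real exports may contain
# additional columns (for example, backtest window boundaries and quantiles).
SAMPLE_EXPORT = """item_id,timestamp,target_value,mean
station_1,2020-11-01T00:00:00Z,10,12
station_1,2020-11-01T01:00:00Z,20,18
station_2,2020-11-01T00:00:00Z,5,8
station_2,2020-11-01T01:00:00Z,5,6
"""


def forecast_bias_by_item(csv_file: TextIO, forecast_col: str = "mean") -> Dict[str, float]:
    """Return sum(forecast - actual) / sum(|actual|) for each item.

    Positive values mean the model over-forecasts the item on balance;
    negative values mean it under-forecasts.
    """
    signed_error = defaultdict(float)
    demand = defaultdict(float)
    for row in csv.DictReader(csv_file):
        actual = float(row["target_value"])
        forecast = float(row[forecast_col])
        signed_error[row["item_id"]] += forecast - actual
        demand[row["item_id"]] += abs(actual)
    return {
        item: (signed_error[item] / demand[item] if demand[item] else float("nan"))
        for item in signed_error
    }


bias = forecast_bias_by_item(io.StringIO(SAMPLE_EXPORT))
```

To run this against a real export, pass an open file handle instead of the `io.StringIO` sample.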

After you experiment and finalize a forecasting model that works for you, you can continue to generate forecasts on a regular basis using the CreateForecast API.
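A minimal sketch of that call with the AWS SDK for Python (boto3) might look like the following. The forecast name, the predictor ARN, the Region, and the chosen quantiles are placeholders we made up for illustration; the request is built separately so you can inspect it before anything is sent to AWS.

```python
from typing import Dict, List


def build_create_forecast_request(
    forecast_name: str, predictor_arn: str, forecast_types: List[str]
) -> Dict:
    """Assemble the parameters for the CreateForecast API."""
    return {
        "ForecastName": forecast_name,
        "PredictorArn": predictor_arn,
        # Quantiles (stocking levels) to generate, passed as strings.
        "ForecastTypes": forecast_types,
    }


def create_forecast(request: Dict, region_name: str) -> str:
    """Send the request; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above runs without the SDK

    client = boto3.client("forecast", region_name=region_name)
    response = client.create_forecast(**request)
    return response["ForecastArn"]


# Placeholder values: replace with your own names and ARNs.
request = build_create_forecast_request(
    forecast_name="bike_demo_forecast",
    predictor_arn="arn:aws:forecast:us-east-1:123456789012:predictor/bike_demo_predictor",
    forecast_types=["0.10", "0.50", "0.90"],
)
# forecast_arn = create_forecast(request, region_name="us-east-1")
```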

Exporting forecasts from backtesting and accuracy metrics for each item

To use this new capability, call the newly launched CreatePredictorBacktestExportJob API after training a predictor. In this section, we walk through the steps on the Forecast console using the Bike Sharing dataset example in our GitHub repo. You can also refer to this notebook in our GitHub repo to follow these steps using the Forecast APIs.
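If you prefer the API to the console, a call to CreatePredictorBacktestExportJob through boto3 can be sketched as follows. The job name, ARNs, S3 path, and Region are placeholders rather than values from the example repo, and the IAM role is assumed to already have write access to the bucket.

```python
from typing import Dict


def build_backtest_export_request(
    job_name: str, predictor_arn: str, s3_path: str, role_arn: str
) -> Dict:
    """Assemble the parameters for CreatePredictorBacktestExportJob."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must be an S3 URI such as s3://bucket/prefix")
    return {
        "PredictorBacktestExportJobName": job_name,
        "PredictorArn": predictor_arn,
        # Where Forecast writes the backtest forecasts and item-level metrics.
        "Destination": {"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
    }


def start_backtest_export(request: Dict, region_name: str) -> str:
    """Send the request; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above runs without the SDK

    client = boto3.client("forecast", region_name=region_name)
    response = client.create_predictor_backtest_export_job(**request)
    return response["PredictorBacktestExportJobArn"]


# Placeholder values: replace with your own names, ARNs, and bucket.
request = build_backtest_export_request(
    job_name="bike_demo_backtest_export",
    predictor_arn="arn:aws:forecast:us-east-1:123456789012:predictor/bike_demo_predictor",
    s3_path="s3://my-forecast-bucket/backtest-export/",
    role_arn="arn:aws:iam::123456789012:role/ForecastExportRole",
)
# export_job_arn = start_backtest_export(request, region_name="us-east-1")
```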

The bike sharing example forecasts the number of bike rides expected at each location. There are more than 400 locations in the dataset.

On the Forecast console, create a dataset group.
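For readers following along with the APIs instead of the console, the equivalent step can be sketched with boto3 as shown below. The dataset group name, the `CUSTOM` domain, and the Region are assumptions chosen for illustration; use whichever domain matches your data.

```python
from typing import Dict


def build_dataset_group_request(name: str, domain: str = "CUSTOM") -> Dict:
    """Assemble the parameters for the CreateDatasetGroup API."""
    return {"DatasetGroupName": name, "Domain": domain}


def create_dataset_group(request: Dict, region_name: str) -> str:
    """Send the request; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above runs without the SDK

    client = boto3.client("forecast", region_name=region_name)
    response = client.create_dataset_group(**request)
    return response["DatasetGroupArn"]


# Placeholder name: replace with your own.
request = build_dataset_group_request("bike_demo")
# dataset_group_arn = create_dataset_group(request, region_name="us-east-1")
```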