Mass updating product prices

Let’s tackle a common yet deceptively complex task: updating prices for a large number of products in Magento.

If you've ever tried to update prices for, say, a million products, you might have run into some serious performance issues. Let's explore why that happens, look at some examples of code that can cause trouble, and most importantly, how to do it right.

Updating product prices in bulk isn't just about getting the job done; it's about doing it efficiently and safely. Poor code can lead to performance bottlenecks, crashes, or even data corruption. Let's look at some common mistakes developers make and how to avoid them.

Bad Practice 1: Updating Products Individually in a Loop

The Code

<?php

declare(strict_types=1);

namespace Vendor\Module\Model;

use Magento\Catalog\Api\ProductRepositoryInterface;
use Magento\Catalog\Model\ResourceModel\Product\CollectionFactory as ProductCollectionFactory;
use Magento\Framework\Exception\CouldNotSaveException;
use Magento\Framework\Exception\InputException;
use Magento\Framework\Exception\StateException;

readonly class PriceUpdater
{
    public function __construct(
        private ProductRepositoryInterface $productRepository,
        private ProductCollectionFactory $productCollectionFactory,
    ) {}

    /**
     * @throws StateException
     * @throws CouldNotSaveException
     * @throws InputException
     */
    public function updatePrices(): void
    {
        $productCollection = $this->productCollectionFactory->create();

        foreach ($productCollection->getItems() as $product) {
            $product->setPrice(10);
            $this->productRepository->save($product);
        }
    }
}

What's Going On Here?

At first glance, this code seems straightforward. We're looping through each product in the collection, setting the new price, and saving it. Simple, right? But under the hood, it's doing a lot more than you might realize.

Why It Sucks

  • Performance Bottleneck: Each call to $this->productRepository->save($product) initiates a separate database transaction. Imagine doing that a million times—it's like trying to fill a swimming pool with a teaspoon.
  • Resource Hog: Loading entire product models for each product consumes a significant amount of memory and CPU resources. Magento's product model is heavy, and when you're dealing with large datasets, this can bring your server to its knees.
  • Time Sink: The process can take hours or even days to complete, which is impractical for any real-world application. During this time, your system's performance may degrade, affecting other users.
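
To put the "hours or even days" figure in perspective, here is a back-of-the-envelope estimate. The 0.2 seconds per save is an assumed, illustrative number, not a measurement; the real cost depends on your hardware, installed modules, and indexer mode. estimateLoopHours() is a hypothetical helper written only for this calculation.

```php
<?php

declare(strict_types=1);

/**
 * Rough estimate of how long a one-save-per-product loop would run.
 * Both inputs are assumptions for illustration, not benchmarks.
 */
function estimateLoopHours(int $productCount, float $secondsPerSave): float
{
    return ($productCount * $secondsPerSave) / 3600;
}

// Assumed: 1,000,000 products at 0.2 s per repository save.
$hours = estimateLoopHours(1_000_000, 0.2);

printf("Estimated runtime: %.1f hours (%.1f days)\n", $hours, $hours / 24);
```

Under those assumptions the loop needs about 200,000 seconds, or a little over two days of continuous saving.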

How to Fix It

Instead of processing each product individually, we can batch the updates to minimize database interactions and reduce resource consumption. We'll explore how to do this effectively in the "The Right Way: Efficient Batch Processing" section.

Bad Practice 2: Bulk Saving the Entire Collection

The Code

<?php

declare(strict_types=1);

namespace Vendor\Module\Model;

use Magento\Catalog\Model\ResourceModel\Product\CollectionFactory as ProductCollectionFactory;

readonly class PriceUpdater
{
    public function __construct(
        private ProductCollectionFactory $productCollectionFactory,
    ) {}

    public function updatePrices(): void
    {
        $productCollection = $this->productCollectionFactory->create();

        foreach ($productCollection->getItems() as $product) {
            $product->setPrice(10);
        }

        $productCollection->save();
    }
}

What's Going On Here?

In an attempt to optimize, this code avoids saving each product individually. Instead, it sets the new price on each product and then tries to save the entire collection in one go.

Why It Sucks

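  • No Real Batching, Plus Lock Contention: Despite appearances, the collection's save() method simply loops over the loaded items and saves each one in turn, so you still pay the full per-product save cost from Bad Practice 1. A long-running job that writes row after row also raises the chance of lock waits, deadlocks, and timeouts when other processes touch the catalog at the same time.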
  • High Memory Usage: Loading and modifying the entire collection means all products are held in memory simultaneously. This can easily exceed your memory limits, causing crashes or slowdowns.

How to Fix It

Just like before, the solution lies in processing smaller batches. We'll break the collection into manageable chunks and process them sequentially.
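
The chunking idea itself is independent of Magento. Below is a minimal sketch in plain PHP; processInBatches() and the handler are hypothetical names used for illustration, and in a real module the handler would call a Magento bulk API for each chunk.

```php
<?php

declare(strict_types=1);

/**
 * Split a list of IDs into fixed-size chunks and hand each chunk to a handler.
 * Only one chunk is passed to the handler at a time.
 *
 * @param int[] $ids
 * @param callable(int[]): void $handler
 * @return int number of batches processed
 */
function processInBatches(array $ids, int $batchSize, callable $handler): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batches = 0;
    foreach (array_chunk($ids, $batchSize) as $chunk) {
        $handler($chunk);
        $batches++;
    }

    return $batches;
}

// Usage: 2,500 IDs in chunks of 1,000 gives batches of 1000, 1000 and 500.
$sizes = [];
$count = processInBatches(range(1, 2500), 1000, function (array $chunk) use (&$sizes): void {
    $sizes[] = count($chunk);
});
```

A plain list of integer IDs is cheap to hold in memory even for large catalogs, which is why the sketch chunks IDs rather than loaded product objects.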

Bad Practice 3: Using Direct SQL Queries for Price Updates

The Code

<?php

declare(strict_types=1);

namespace Vendor\Module\Model;

use Magento\Framework\App\ResourceConnection;

readonly class PriceUpdater
{
    public function __construct(
        private ResourceConnection $resourceConnection,
    ) {}

    public function updatePrices(): void
    {
        $connection = $this->resourceConnection->getConnection();
        // ResourceConnection::getTableName() applies the configured table prefix
        $table = $this->resourceConnection->getTableName('catalog_product_entity_decimal');
        $priceAttributeId = $this->getPriceAttributeId();

        $connection->update(
            $table,
            ['value' => 10],
            ['attribute_id = ?' => $priceAttributeId]
        );
    }

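    private function getPriceAttributeId(): int
    {
        // Look up the ID of the "price" attribute for the catalog_product entity type
        $connection = $this->resourceConnection->getConnection();
        $select = $connection->select()
            ->from(['a' => $this->resourceConnection->getTableName('eav_attribute')], ['attribute_id'])
            ->join(
                ['t' => $this->resourceConnection->getTableName('eav_entity_type')],
                'a.entity_type_id = t.entity_type_id',
                []
            )
            ->where('t.entity_type_code = ?', 'catalog_product')
            ->where('a.attribute_code = ?', 'price');

        return (int) $connection->fetchOne($select);
    }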
}

What's Going On Here?

This code bypasses Magento's ORM entirely, executing a direct SQL update on the price attribute in the database.

Why It Sucks

  • Bypasses Magento's ORM and Events: Magento's ORM handles more than just data storage. It manages data integrity, triggers events, and ensures that other parts of the system are notified of changes. Bypassing it can lead to inconsistent application state.
  • High Risk: Direct database manipulation is risky. If the table structure changes in future Magento updates, this code could break, leading to errors or data loss.
  • No Cache Invalidation: Magento uses various caches to improve performance. Direct SQL updates won't invalidate these caches, meaning the frontend may not reflect the updated prices until the caches are manually cleared.

How to Fix It

We need to update prices using Magento's APIs, which handle all the necessary background work. We'll use methods designed for batch updates that respect the application's integrity.

The Right Way: Efficient Batch Processing

The Solution

Here's how to update product prices efficiently:

<?php

declare(strict_types=1);

namespace Vendor\Module\Model;

use Exception;
use Magento\Catalog\Model\ResourceModel\Product\Action;
use Magento\Catalog\Model\ResourceModel\Product\CollectionFactory;

readonly class PriceUpdater
{
    public function __construct(
        private Action $productAction,
        private CollectionFactory $productCollectionFactory
    ) {}

    /**
     * @throws Exception
     */
    public function updatePrices(): void
    {
        $batchSize = 1000;
        $collection = $this->productCollectionFactory->create();
        $collection->setPageSize($batchSize);
        $pages = $collection->getLastPageNumber();

        for ($page = 1; $page <= $pages; $page++) {
            $collection->setCurPage($page);
            $collection->clear();
            $collection->load();

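            // getAllIds() ignores the page limit and returns every ID in the
            // collection, so read the IDs of the items loaded for this page instead.
            $productIds = $collection->getLoadedIds();
            $this->productAction->updateAttributes(
                $productIds,
                ['price' => 10],
                0 // Store ID 0 is the default (admin) scope
            );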
        }
    }
}

What's Going On Here?

  • Batch Size: We're defining a manageable batch size (e.g., 1000 products per batch) to limit memory usage.
  • Pagination: We use pagination to process one batch at a time.
  • Update Attributes: Instead of loading each product model, we use $this->productAction->updateAttributes() to update the price attribute for all products in the batch.

Why It Doesn’t Suck

  • Efficient Use of Resources: By processing smaller batches, we reduce memory consumption and avoid overloading the database.
  • Faster Execution: Updating attributes in bulk using Magento's API is much faster than saving each product individually.
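  • Data Integrity: The update goes through Magento's own attribute-handling code, so values land in the correct EAV tables and store scope without hand-written SQL. Note that this resource-level call does not run the full product save flow, so do not rely on it to fire save events or refresh indexes and caches; handle those explicitly, as described under Managing Indexers.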

Step-by-Step Explanation

  1. Set Batch Size: Decide how many products to process at once based on your server's capabilities.
  2. Load Collection: Create a product collection and set the page size to the batch size.
  3. Calculate Pages: Determine the total number of pages needed to process all products.
  4. Iterate Through Pages: Loop through each page, loading only that subset of products.
  5. Collect Product IDs: Retrieve the IDs of the products in the current batch.
  6. Update Prices: Use $this->productAction->updateAttributes() to update the price attribute for all products in the batch.
  7. Repeat: Continue the loop until all products have been updated.
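
One variation on steps 2 to 5 is to page by ID rather than by page number, so that each batch starts after the last ID already seen. This is a sketch of that idea in plain PHP, not the code used above: iterateIdBatches() and $fetchIdsAfter are hypothetical names, and in Magento the callback would typically select entity_id from the catalog_product_entity table, filtered to IDs greater than the last one, ordered by entity_id, and limited to the batch size.

```php
<?php

declare(strict_types=1);

/**
 * Yield batches of IDs using keyset pagination.
 *
 * $fetchIdsAfter(int $lastId, int $limit) must return up to $limit IDs
 * greater than $lastId, in ascending order.
 *
 * @param callable(int, int): int[] $fetchIdsAfter
 * @return Generator<int, int[]>
 */
function iterateIdBatches(callable $fetchIdsAfter, int $batchSize): Generator
{
    $lastId = 0;

    while (true) {
        $ids = $fetchIdsAfter($lastId, $batchSize);
        if ($ids === []) {
            return;
        }

        yield $ids;
        $lastId = (int) end($ids);
    }
}

// Stand-in data source for the example: an in-memory list of IDs with gaps.
$allIds = [1, 2, 5, 8, 13, 21, 34];
$fetch = static function (int $lastId, int $limit) use ($allIds): array {
    $greater = array_filter($allIds, static fn (int $id): bool => $id > $lastId);
    return array_slice(array_values($greater), 0, $limit);
};

$batches = iterator_to_array(iterateIdBatches($fetch, 3), false);
// $batches is [[1, 2, 5], [8, 13, 21], [34]]
```

Because each query starts from the last ID seen, the cost of fetching a batch does not grow with the page number, and gaps in the ID sequence are handled naturally.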

Managing Indexers

Why Indexes Matter

Magento uses indexes to optimize performance for frontend operations like product listings, searches, and more. When you bulk update products, the related indexes need to be updated to reflect those changes.

Failing to manage indexes can result in outdated data being displayed to customers, slow frontend performance, and even errors.

Disable Automatic Indexing Before Updates

Before starting mass updates, set indexers to "Update by Schedule" to prevent them from running after every small change.

Reindex After Batch Operations

Once all updates are complete, manually trigger the reindexing process.

<?php

declare(strict_types=1);

namespace Vendor\Module\Model;

use Magento\Indexer\Model\Indexer\CollectionFactory as IndexerCollectionFactory;

readonly class IndexerManager
{
    public function __construct(
        private IndexerCollectionFactory $indexerCollectionFactory,
    ) {}

    public function disableIndexers(): void
    {
        $indexerCollection = $this->indexerCollectionFactory->create();

        foreach ($indexerCollection as $indexer) {
            $indexer->setScheduled(true);
        }
    }

    public function reindexAll(): void
    {
        $indexerCollection = $this->indexerCollectionFactory->create();

        foreach ($indexerCollection as $indexer) {
            $indexer->reindexAll();
        }
    }
}

Benefits

  • Performance Improvement: Prevents indexers from slowing down the update process by running after each change.
  • Data Consistency: Ensures that all indexes are up-to-date once the updates are complete.
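
Put together, the order of operations might look like the sketch below. It uses plain callables so it runs on its own; runWithDeferredIndexing() is a hypothetical helper, and in a module the three callables would wrap IndexerManager::disableIndexers(), PriceUpdater::updatePrices(), and IndexerManager::reindexAll(). Reindexing in a finally block is a design choice made here, not a Magento requirement: if the update fails halfway, the indexes still reflect whatever was written.

```php
<?php

declare(strict_types=1);

/**
 * Run a bulk update with indexing deferred until the end.
 *
 * @param callable(): void $switchToSchedule puts indexers in "Update by Schedule" mode
 * @param callable(): void $update           performs the batched price update
 * @param callable(): void $reindex          rebuilds the indexes afterwards
 */
function runWithDeferredIndexing(
    callable $switchToSchedule,
    callable $update,
    callable $reindex
): void {
    $switchToSchedule();

    try {
        $update();
    } finally {
        // Runs whether or not the update threw, so partial writes get indexed too.
        $reindex();
    }
}

// Usage with stand-in callables that only record the call order.
$log = [];
runWithDeferredIndexing(
    function () use (&$log): void { $log[] = 'schedule'; },
    function () use (&$log): void { $log[] = 'update'; },
    function () use (&$log): void { $log[] = 'reindex'; }
);
// $log is ['schedule', 'update', 'reindex']
```

The sketch leaves the indexers in scheduled mode afterwards; whether to switch them back is a separate decision for your store.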

When a "Bad" Practice Might Be the Right Choice

Direct SQL Queries for High-Performance Requirements

While we've discussed why using direct SQL queries to update product prices can be risky, there are situations where this approach might be justified—especially for experienced Magento developers dealing with extreme performance demands.

When to Consider Direct SQL Updates

  • High Throughput Needs: If you're handling an enormous dataset and need to perform updates as quickly as possible, direct SQL queries can offer significant speed advantages over Magento's ORM. The overhead of the ORM can be bypassed, allowing for faster execution times.
  • Controlled Environments: In scenarios where you have full control over the database and a deep understanding of Magento's data structures, you might safely execute direct SQL queries without adversely affecting the system.
  • One-Time Bulk Operations: For infrequent, large-scale updates—such as a major price adjustment across the entire catalog—using direct SQL can be efficient if proper precautions are taken.

Precautions and Best Practices

  • Comprehensive Backups: Always ensure you have a full backup of your database before running direct SQL queries. This way, you can restore the original state if something goes wrong.
  • Account for Skipped Events: Recognize that bypassing the ORM also means skipping any events or observers that normally trigger during standard operations. Be prepared to handle any processes that these events would have covered.
  • Manually Clear Caches and Reindex: After performing direct updates, manually clear relevant caches and run all necessary indexers to ensure the frontend reflects the changes.

<?php

declare(strict_types=1);

namespace Vendor\Module\Model;

use Magento\Framework\App\Cache\Manager as CacheManager;
use Magento\Framework\App\Cache\TypeListInterface;
use Magento\Indexer\Model\Indexer\CollectionFactory as IndexerCollectionFactory;

readonly class CleanupManager
{
    public function __construct(
        private TypeListInterface $cacheTypeList,
        private IndexerCollectionFactory $indexerCollectionFactory,
        private CacheManager $cacheManager,
    ) {}

    public function clearCache(): void
    {
        // Clean only the full page cache
        $this->cacheTypeList->cleanType('full_page');
    }

    public function flushAllCaches(): void
    {
        // Heavier alternative: flush every available cache type
        $this->cacheManager->flush($this->cacheManager->getAvailableTypes());
    }

    public function reindexAll(): void
    {
        $indexerCollection = $this->indexerCollectionFactory->create();

        foreach ($indexerCollection as $indexer) {
            $indexer->reindexAll();
        }
    }
}

  • Thorough Testing: Test your SQL queries extensively in a staging environment identical to production. Validate not only that the data updates correctly but also that the application functions as expected afterward.
  • Limit Scope and Impact: Be precise with your SQL queries to affect only the intended records. Avoid broad updates that could unintentionally modify more data than expected.
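
As an illustration of limiting scope, the sketch below restricts an update to one attribute, one store scope, and an explicit list of entity IDs, then rolls back if the number of affected rows is not what was expected. It is self-contained and uses an in-memory SQLite database through PDO (this assumes the pdo_sqlite extension is available), with a simplified product_decimal table standing in for catalog_product_entity_decimal. The attribute ID 77 and the function name updateScopedPrices() are made up for the example; against a real Magento database you would go through the ResourceConnection adapter instead.

```php
<?php

declare(strict_types=1);

/**
 * Update the price value only for the given attribute, store scope and entity IDs.
 * Rolls back and throws if the affected row count differs from the ID count.
 *
 * @param int[] $entityIds
 */
function updateScopedPrices(
    PDO $pdo,
    int $attributeId,
    int $storeId,
    array $entityIds,
    float $price
): int {
    if ($entityIds === []) {
        return 0;
    }

    $placeholders = implode(',', array_fill(0, count($entityIds), '?'));
    $sql = "UPDATE product_decimal SET value = ?
            WHERE attribute_id = ? AND store_id = ? AND entity_id IN ($placeholders)";

    $pdo->beginTransaction();
    try {
        $statement = $pdo->prepare($sql);
        $statement->execute([$price, $attributeId, $storeId, ...array_values($entityIds)]);
        $affected = $statement->rowCount();

        if ($affected !== count($entityIds)) {
            throw new RuntimeException('Expected ' . count($entityIds) . " rows, got $affected.");
        }

        $pdo->commit();
        return $affected;
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}

// Demo data: attribute 77 stands in for "price"; attribute 78 is something else.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE product_decimal (attribute_id INT, store_id INT, entity_id INT, value REAL)');
$pdo->exec('INSERT INTO product_decimal VALUES (77, 0, 1, 5.0), (77, 0, 2, 6.0), (77, 0, 3, 7.0), (78, 0, 1, 1.5)');

$updated = updateScopedPrices($pdo, 77, 0, [1, 2], 10.0);
$untouched = $pdo->query('SELECT COUNT(*) FROM product_decimal WHERE value <> 10.0')->fetchColumn();
```

One caveat if you adapt the row-count check to MySQL: by default PDO reports rows that actually changed rather than rows matched, so products already at the target price would not be counted and the check would need to allow for that.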

Additional Thoughts

Using direct SQL queries is a double-edged sword—while they can offer performance benefits, they come with significant risks. This approach should be reserved for advanced developers who thoroughly understand Magento's database schema and the potential implications.

For most use cases, sticking with Magento's standard practices is the safer and more maintainable choice.

Recap

Updating product prices in bulk can be tricky, but with the right approach, you can do it efficiently and safely. We've looked at common mistakes that can lead to performance issues and how to avoid them. By using efficient batch processing and properly managing indexes, you ensure that your Magento store remains fast, reliable, and consistent.

Remember, writing good code is about more than just making it work—it's about making it work well. Always consider the impact of your code on system resources, performance, and maintainability.
