Loading problems...
TL;DR: Use 2D Dynamic Programming to compute the size of the largest square ending at every cell, then sum these sizes to get the total count.
The LeetCode 1277 problem, "Count Square Submatrices with All Ones," asks us to analyze a binary matrix (containing only 0s and 1s) and determine the total number of square submatrices that consist entirely of 1s. A square submatrix can be of any size (, , etc.), provided it fits within the grid boundaries.
This is a popular interview question because it requires optimizing a geometric search problem into a linear-time pass using state accumulation.
A naive approach involves iterating through every cell in the matrix and treating it as the top-left corner of a potential square. For each corner, we attempt to expand the square size () as long as all cells within that boundary are 1s.
Naive Algorithm:
(i, j) in the matrix.k.(i, j) to (i+k-1, j+k-1) contains only ones.python# Pseudo-code for Brute Force count = 0 for r in range(rows): for c in range(cols): if matrix[r][c] == 1: max_side = min(rows - r, cols - c) for k in range(1, max_side + 1): if is_all_ones(r, c, k): # This check takes O(k^2) count += 1 else: break
Why it fails: The time complexity of this approach is roughly . For a matrix, this results in roughly operations in the worst case, leading to a Time Limit Exceeded (TLE) error. The fundamental inefficiency is the redundant checking of cells; knowing that a square exists implies a square also exists, but the brute force re-verifies this from scratch.
The key intuition is to change the definition of what we are counting. Instead of asking "Does a square of size start here?", we ask "What is the maximum size of a square that ends at this cell (bottom-right corner)?"
Let dp[i][j] represent the side length of the largest square whose bottom-right corner is at (i, j).
If matrix[i][j] is 1, the size of the square ending at (i, j) is limited by its three neighbors:
(i-1, j).(i, j-1).(i-1, j-1).The Constraint:
To form a square of size at (i, j), the neighbors (Top, Left, and Top-Left) must all support squares of size at least . If any neighbor has a smaller square, the square at (i, j) is bottlenecked by that minimum size.
Therefore, the recurrence relation is:
Visual Description: Imagine building a square block by block. To place a block ending at , you need a block at the top, left, and top-left positions. To place a block, those neighbors must themselves be the corners of blocks. The smallest neighbor dictates the maximum extension possible.
Crucially, the value dp[i][j] also tells us exactly how many squares end at (i, j). If dp[i][j] = 3, it means a , a , and a square all end there. Thus, the sum of all values in the DP table is the answer.

total_squares = 0.i from 0 to m-1 and column index j from 0 to n-1.matrix[i][j] == 1:
i == 0 or j == 0, the max square size is 1. We just add 1 to total_squares. (In an in-place modification, we leave the 1 as is).i > 0 and j > 0, calculate the value: val = min(matrix[i-1][j], matrix[i][j-1], matrix[i-1][j-1]) + 1.matrix[i][j] (or the DP table) with val.val to total_squares.matrix[i][j] == 0, do nothing (it contributes 0).total_squares.The correctness relies on the geometric property of squares. A square of size ending at covers the region from to . For this to exist, the submatrix defined by must support a square of size , as must and .
By taking the minimum of these three neighbors and adding 1, we guarantee that we extend the square only as far as the "weakest" link allows, ensuring no 0s are included. Summing these values correctly counts all squares because a max-square of size implies exactly valid squares ending at that specific coordinate (sizes ).
cppclass Solution { public: int countSquares(vector<vector<int>>& matrix) { int m = matrix.size(); int n = matrix[0].size(); int totalSquares = 0; for (int i = 0; i < m; ++i) { for (int j = 0; j < n; ++j) { // Only process if the current cell is a potential corner of a square if (matrix[i][j] == 1) { // If we are not on the boundary, apply DP transition if (i > 0 && j > 0) { int top = matrix[i-1][j]; int left = matrix[i][j-1]; int diag = matrix[i-1][j-1]; // Update current cell with size of largest square ending here matrix[i][j] = min({top, left, diag}) + 1; } // Accumulate the size (which equals the count of squares ending here) totalSquares += matrix[i][j]; } } } return totalSquares; } };
javaclass Solution { public int countSquares(int[][] matrix) { int m = matrix.length; int n = matrix[0].length; int totalSquares = 0; for (int i = 0; i < m; i++) { for (int j = 0; j < n; j++) { // If the cell is 1, it contributes to the count if (matrix[i][j] == 1) { // Check boundaries to avoid IndexOutOfBounds if (i > 0 && j > 0) { int top = matrix[i-1][j]; int left = matrix[i][j-1]; int diag = matrix[i-1][j-1]; // Update the matrix in-place with the recurrence relation matrix[i][j] = Math.min(Math.min(top, left), diag) + 1; } // Add the value of the current cell to total totalSquares += matrix[i][j]; } } } return totalSquares; } }
pythonclass Solution: def countSquares(self, matrix: List[List[int]]) -> int: rows = len(matrix) cols = len(matrix[0]) total_squares = 0 for i in range(rows): for j in range(cols): # We only care if the cell initiates or continues a square if matrix[i][j] == 1: # If inside the grid boundaries (not first row/col) if i > 0 and j > 0: # Find the bottleneck among the three neighbors matrix[i][j] = min(matrix[i-1][j], matrix[i][j-1], matrix[i-1][j-1]) + 1 # Add the size of the square ending at (i, j) to the total total_squares += matrix[i][j] return total_squares
Time Complexity: We iterate through each cell of the matrix exactly once. Inside the loop, we perform constant-time arithmetic operations (min and addition).
Space Complexity: or
matrix in-place to store our DP states. This is the optimal space complexity.The logic used in LeetCode 1277 Solution is a classic application of DP on grids. This specific "look back at neighbors" pattern appears in several other problems:
min of Top and Left plus current cost.Why does my solution get Index Out Of Bounds?
i-1 or j-1 without checking if i > 0 and j > 0.Why is my answer lower than expected?
min(top, left)) and ignoring the Top-Left diagonal.Why is my answer just the count of 1s?
Why does the naive solution TLE?