I ran into an interesting problem at work the other day. One of our nightly ETL steps was failing with this error: “The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID . Dropping and re-creating the index may resolve this; otherwise, use another clustering key.” I admit I had never seen it before, so of course a quick trip to Google was in order. This was my first stop, and a little while later I found the relevant documentation here as well.
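
If you ever hit this and wonder which table the message is even complaining about, the partition ID it mentions can be looked up in sys.partitions. Something along these lines does the trick (the partition ID below is just a placeholder for whatever your error text contains):

/*Not part of the fix itself - just maps the partition ID from the error message
  back to a table and index. Swap in the ID from your own error.*/
SELECT	OBJECT_NAME(p.object_id)	AS TableName,
		i.name						AS IndexName,
		p.partition_number,
		p.rows
FROM	sys.partitions AS p
JOIN	sys.indexes AS i
		ON	i.object_id = p.object_id
		AND	i.index_id = p.index_id
WHERE	p.partition_id = 72057594048217088	/*placeholder - use the ID from your error*/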

Basically, we had a large table of over 1.4 billion rows, and a stored procedure that would delete some recent records and insert a bunch more each night. This is a simplified version of the schema:

CREATE TABLE dbo.Large_Table (
				ModelKey		INT NOT NULL,
				CustomerKey		INT NULL,
				MonthEndDate	DATE NULL,
				Exposure		DECIMAL(20,10) NULL,
				INDEX idx_Large_Table_MK CLUSTERED ( ModelKey )
				)
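
The procedure itself isn’t the interesting part, but for context the pattern was roughly delete-then-reload. A rough sketch is below - the variable and source table names here are made up, and the real procedure does quite a bit more:

/*Rough sketch of the nightly pattern, not the real procedure. Every insert into
  the non-unique clustered index consumes a fresh uniqueifier value for its key.*/
DECLARE @CurrentMonthEnd DATE = EOMONTH(GETDATE())

DELETE	lt
FROM	dbo.Large_Table lt
WHERE	lt.MonthEndDate = @CurrentMonthEnd

INSERT INTO dbo.Large_Table ( ModelKey, CustomerKey, MonthEndDate, Exposure )
SELECT	ModelKey,
		CustomerKey,
		MonthEndDate,
		Exposure
FROM	dbo.Large_Table_Source	/*hypothetical source of the refreshed rows*/
WHERE	MonthEndDate = @CurrentMonthEnd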

Things had been running fine up until December of last year, I believe, when the analyst who built it started getting failures. We generally let analysts handle their own errors, but eventually he asked for help. Since the clustered index on this table was not unique - it was on a column with only 5 unique values - the engine tacks a 4-byte uniqueifier onto duplicate rows internally to tell them apart, and it looks like that had filled up after all the deletes and inserts. 4 bytes can store values up to 4,294,967,295, and I believe the uniqueifier will not backfill the deleted spots.
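
If you want a sense of how big each duplicate group is, a query like the one below shows the current row count per clustering key value. Keep in mind it can’t show the uniqueifier values already burned by rows that have since been deleted, which is exactly how the ceiling gets hit with “only” 1.4 billion rows in the table:

/*Current rows per clustering key value - each of these groups shares one pool of
  uniqueifier values, and (as far as I can tell) deleted rows don't give theirs back.*/
SELECT	ModelKey,
		COUNT_BIG(*) AS RowsInGroup
FROM	dbo.Large_Table
GROUP BY ModelKey
ORDER BY RowsInGroup DESC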

Anyway, we needed to solve this. My first thought was to rebuild the index and hope that would cause the internal uniqueifier to fill in the gaps. I dismissed that idea, though, since rebuilding such a large index in one go would be too costly, and it would only kick the can down the road. My second thought was to introduce a clustered primary key on the table, since I was told there was a combination of columns that would uniquely identify a row. However, that required altering several of the columns to make them NOT NULL, and that operation is extremely slow on 1.4 billion records. I tried this and killed it after a while:

ALTER TABLE 	dbo.Large_Table
ALTER COLUMN	CustomerKey INT NOT NULL

ALTER TABLE 	dbo.Large_Table
ALTER COLUMN	MonthEndDate DATE NOT NULL
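
For completeness, the rebuild I ruled out as my first thought would have been a single statement, something like this - trivial to write, just not something I wanted to run against 1.4 billion rows in one go:

/*The dismissed first option: rebuilding the clustered index in place.
  Simple to write, but heavy on a table this size.*/
ALTER INDEX idx_Large_Table_MK ON dbo.Large_Table REBUILD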

My third solution was to create a brand new table with the correct schema and clustered primary key, then insert all of the old records into it. Once that was done, I could drop the old table and use sp_rename to rename the new one to match the old one. I knew that inserting that many records would mean an enormous sort to build the clustered primary key, which would eat a lot of memory. Thankfully, the old table’s clustered index on a column with only 5 distinct values came in handy - I could break the migration into 5 stages, one per ModelKey value. Something like this:

CREATE TABLE dbo.Large_Table_Staging (
				ModelKey		INT NOT NULL,
				CustomerKey		INT NOT NULL,
				MonthEndDate	DATE NOT NULL,
				Exposure		DECIMAL(20,10) NULL,
				CONSTRAINT PK_Large_Table PRIMARY KEY CLUSTERED ( CustomerKey, ModelKey, MonthEndDate )
				)
				
INSERT INTO dbo.Large_Table_Staging ( ModelKey, CustomerKey, MonthEndDate, Exposure )
SELECT	ModelKey,
		CustomerKey,
		MonthEndDate,
		Exposure
FROM	dbo.Large_Table
WHERE	ModelKey = 1 /*Repeat this for ModelKey = 2, 3, 4, and 5*/
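
Before dropping anything, it’s worth a quick sanity check that every row made it across. Something as simple as comparing the two counts does the job:

/*Sanity check before the switcheroo - these two numbers should match.*/
SELECT	( SELECT COUNT_BIG(*) FROM dbo.Large_Table )			AS OldTableRows,
		( SELECT COUNT_BIG(*) FROM dbo.Large_Table_Staging )	AS NewTableRows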

Once everything is in and the counts check out, it’s time for the old switcheroo:

/*Once the above has been run for all ModelKey values, run this*/
DROP TABLE dbo.Large_Table

EXEC sys.sp_rename @objname = N'dbo.Large_Table_Staging',
                   @newname = N'Large_Table',
                   @objtype = N'OBJECT'

In the end, this worked. It took about 9 hours in total, but it all inserted and we had no duplicate records, thankfully. Fingers crossed I don’t have to deal with that again. It’s a good reminder to set up tables the right way the first time - if you know a certain combination of columns will uniquely identify a row, it’s probably worth creating a clustered primary key from the get-go.