Troubleshooting TFS DB Growth

Spoiler: This was caused by ReleaseManager 2013 retaining builds. Read on for the work around that was put in place.

As part of a project I was working on recently; I put in a lot of work around TFS Builds, Unit/Integration Tests & continous delivery using ReleaseManager.

It was a fairly standard pipeline:
→ Check in → Unit & Integration Tests Run → Build → Release To Test

Up until this project TFS Build hadn’t been used, TFS was just used for source control and some workitem tracking, it was hosted on a VM and space was never an issue.

After a few months of the project I started getting reports that the TFS DB size was growing constantly. The IT team had to keep allocating more space to the disks used for storing the DB backups.

It turns out that the Tfs_DefaultCollection had grown from about 6GB to over 50GB and the growth was constant.

After doing a little research online I found the following query that gives some insight into which tables are taking up the most space in the db.

USE Tfs_DefaultCollection

SELECT
    t.NAME AS TableName,
    s.Name AS SchemaName,
    p.rows AS RowCounts,
    SUM(a.total_pages) * 8 AS TotalSpaceKB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB,
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM
    sys.tables t
INNER JOIN
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
    sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN
    sys.schemas s ON t.schema_id = s.schema_id
GROUP BY
    t.Name, s.Name, p.Rows
ORDER BY
    TotalSpaceKB desc
Table 1. Tables taking up the most space
Table Row Count Size (GB)

tbl_TestResult

367386

19.660805

tbl_TestMessageLogEntry

22354468

12.085457

tbl_Content

62475

4.4598694

tbl_BuildInformation2

22736801

7.6754837

Normally TFS retains builds as defined by a build retention policy, ours tells it to only keep the last 10 builds.
However, there is a known issue with the Microsoft Release Management 2013 that causes every build that is involved in a release to be retained indefinitely, this doesn’t really fit with the continous delivery model and has been changed in newer versions.
Every check in we do triggers a build which triggers a release so all our builds are retained.

This has issues for the storage capacity on the drops folder where builds are dropped and also storage capacity in the database.

The issue was exacerbated for this project because of the logging setup in the integration tests. Each test class library gets a DB deployed and dropped and seed scripts run, all the logging from these activities is echoed to the console. The application logging is also set to console output.
All this log info for each test run is stored in the TFS database which grew unmanageably.
The logging was turned off on the automated builds so this should help keep the size down for newer builds

The application logs are driven by a config file but for tests I set the logs to ConsoleOut in [SetUpFixture] as the log information is available for each Unit Test. When I realised that all the logging info was retained in the DB I replaced the ConsoleOut logger with a NoOp logger on the build server.
    public static class TestLogSetup
    {
        public static void Logger()
        {
            Console.WriteLine(Environment.MachineName);
            if (Environment.MachineName.Contains("<ServerName>"))
            {
                Console.WriteLine("No Op Logger");
                LogManager.Adapter = new Common.Logging.Simple.NoOpLoggerFactoryAdapter();
            }
            else
            {
                Console.WriteLine("Console Logger");
                LogManager.Adapter = new Common.Logging.Simple.ConsoleOutLoggerFactoryAdapter();
            }
        }
    }

To manage the retained builds I wrote a powershell script that does the following steps for every build older than 2 weeks:

  1. Update the Retain Indefinitely flag to false

  2. Delete the build

  3. Destroy the build

Delete the build
This is what is available through the UI, it is only a logical delete and while it does delete artifacts from the network drop folders, it does not delete rows from the db

Destroy the build
This is the delete from the db. This is not available though the UI and is only available through the API

This is the script that I wrote to destroy the builds.
param ($serverName = 'http://<snip>:8080/tfs/DefaultCollection')

[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.TeamFoundation.Client")
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.TeamFoundation.Build.Client")
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.TeamFoundation.VersionControl.Client")
$tfs = [Microsoft.TeamFoundation.Client.TeamFoundationServerFactory]::GetServer($serverName)

$buildServer =  $tfs.GetService([Microsoft.TeamFoundation.Build.Client.IBuildServer])
$vcs =  $tfs.GetService([Microsoft.TeamFoundation.VersionControl.Client.VersionControlServer])

$projects = $vcs.GetAllTeamProjects($true) # We will run this for all team projects

foreach ($project in $projects){
    Write-Host $project.Name

    # Get all the build definitions for a given project
    $buildDefs = $buildServer.QueryBuildDefinitions($project.Name)

    foreach ($buildDef in $buildDefs){
        Write-Host "-" + $buildDef.Name
        # Delete all builds older than 14 days. This is a bit aggressive and can be pushed to a month once the logs are smaller on automated builds.
        $endDate = (Get-Date).AddDays(-14)
        # Keep going back until we hit dec 2014
        while($endDate.Year -ge 2014){
            $year = $endDate.Year
            $month = $endDate.Month
            write-host $year $month
            # Create a search object for the project/build def
            $buildDetailSpec = $buildServer.CreateBuildDetailSpec($project.Name, $buildDef.Name)
            $startDate = $endDate.addMonths(-1)
            $buildDetailSpec.MaxFinishTime = $endDate # Search criteria
            $buildDetailSpec.MinFinishTime = $startDate # Search criteria
            # only build info, not workitems, labels, etc
            $buildDetailSpec.InformationTypes = $null

            ## This is important for the query, builds that are deletd by retention
            ## or the ui are only logically deleted, not destroyed in the db
            $buildDetailSpec.QueryDeletedOption = [Microsoft.TeamFoundation.Build.Client.QueryDeletedOption]::IncludeDeleted

            # Search all the things
            $builds = $buildServer.QueryBuilds($buildDetailSpec)
            if($builds.Builds.Length -ge 1){
                Write-Host "before update" $builds.Builds
                foreach($build in $builds.Builds){
                    # Need to turn off the Keep Forever (Retain) flag set by Release Manager.
                    if ($build.KeepForever -eq $true){
                        Write-Host "Updating build " $build.Uri
                        # Gets an updatable ref to the build
                        $buildToEdit = $buildServer.GetBuild($build.Uri)
                        # Edit the build flag
                        $buildToEdit.KeepForever = $false;
                        # save the build back to the tfs
                        $buildServer.SaveBuilds(@($buildToEdit))
                    }
                }
                # refresh the query given that some of the builds have been updated
                $builds = $buildServer.QueryBuilds($buildDetailSpec)
                Write-Host "after refresh" $builds.Builds
                # Delete the build, test results, symbols, drop, etc.
                $buildServer.DeleteBuilds($builds.Builds,
                [Microsoft.TeamFoundation.Build.Client.DeleteOptions]::All)
                # Destroy the db records
                $buildServer.DestroyBuilds($builds.Builds)
            }
            # rolling back the years.
            $endDate = $endDate.addMonths(-1)
        }
    }
}
dummyproject.png
Figure 1. I added the script to a dummy project in source control.
schedule.png
Figure 2. This project was set to run every morning at 3am
script.png
Figure 3. The powershell script is set to run after the build
logs.png
Figure 4. The output from the script is available in the diagnostics tab of the build info from TFS.

The script could have been set to run as a scheduled task somewhere but I liked the idea of having it run within TFS as it made it easy for the dev team to monitor.

The impact of destroying the old builds was significant.

Table

Row Count

Size (GB)

tbl_Content

69733

1.889136

tbl_TestResult

174868

0.323192

tbl_LocalVersion

413449

0.122072

tbl_BuildCodeChange

148950

0.078016

tbl_Version

101484

0.068872

tbl_BuildInformation2

88673

0.064232

tbl_Command

92974

0.039192

Tracking down the cause of the size increase was an interesting exercise, hopefully some of this detail may help someone else in future.

comments powered by Disqus