Recursively Updating Git Submodules

I had fun writing about how I work with Git yesterday. I thought I’d continue on that thread.

I have a solid set of code libraries that I’ve written that hook into the WordPress themes we produce at iThemes. Each time code is duplicated across different repositories, I break that code out into a separate repository and link it back into each project as a submodule. This makes it extremely easy to keep shared code across numerous repositories updated with little or no fuss.

After cloning a repository, simply run git submodule init followed by git submodule update to initialize all the submodules and fill each submodule’s folder with the content of its repository. For a long time, this is exactly what I did whenever I cloned a theme repository to start working on it. However, this soon stopped being enough.
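
As a rough illustration of those two steps in Perl, here is a minimal sketch. It is not taken from the script shared below; the function name init_and_update_here and the $run hook are made up for this example so that the command runner can be swapped out.

```perl
use strict;
use warnings;

# Run "git submodule init" and then "git submodule update" in the current
# directory, stopping if the first step fails.
# $run is a hypothetical hook for this sketch; it defaults to system().
sub init_and_update_here {
    my ($run) = @_;
    $run ||= sub { return system(@_) == 0 };

    for my $step ( 'init', 'update' ) {
        $run->( 'git', 'submodule', $step )
            or die "git submodule $step failed\n";
    }

    return 1;
}
```

Calling init_and_update_here() with no arguments from the top of a freshly cloned repository would run both commands through system().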

The problem appeared as soon as I added a submodule to a repository that was itself a submodule of other repositories. Running the submodule init and update process at the top level doesn’t do everything I need in this case, since the nested submodules sitting in some subfolder are left without being set up.

I didn’t want to get into a habit of always switching to other directories and doing the submodule processes there as well since I 1) knew that I would forget all-too-often, thus wasting my time, and 2) knew that this would not be the last time that a submodule had submodules. Heck, there is even the possibility that I’ll have a submodule that has a submodule that has a submodule. It was immediately clear that I needed a script to do all this dirty work for me. The rest of this post will be about the script I created.

The Script

First, I’ll share the script itself. If you are interested in how it works, continue reading.

git-submodule-recursive-update (right-click > “Save Link As…” to download)

It is also available as a gist on GitHub.

The script is written in Perl and should work on most systems. I’ve only tested it on Linux and OS X, so please let me know your results if you try it out on Windows.

The Description

The functionality should look very straightforward to anyone who knows Perl.

I store the current directory in the $start_path variable in order to always know where home is.
I start a wrapper loop that keeps running until all the possible submodules are initialized and updated.
Using the find command, I look for all the instances of .gitmodules and store the results in $data. The .gitmodules file exists if a repo has submodules.
I remove all the .gitmodules file references from the $data to leave just the paths.
I split the paths into an array and initialize the %paths hash to have a blank value for new paths (stored in the key). Setting this value to blank will flag the following loop that the submodules in that path have not been set up yet.
I create a tracker variable, $updated, to check if anything happened in the loop.
I then loop through the %paths hash to work on each path. If the path’s hash value is blank, I process that path.
I cd into the repo path, init and update the submodules, and switch back to the starting folder.
If the script is called with the optional --remove-gitmodules argument, I remove the .gitmodules file while I’m working in that folder. I use this for other automation scripts, so it may or may not be of value to you.
I then set the path’s hash value to 1 to flag it as done.
Closing out the loop, I update the $updated variable to show that something was updated in this pass.
Finishing up the do loop towards the top, I have while($updated). Basically, as long as something was updated in the core update loop, I’ll run everything again. This means that the loop keeps running until a pass finds nothing else that needs to be updated. When that point is reached, the main loop ends, and the script is finished.
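
The steps above can be sketched roughly as follows. This is not the script linked above, just a minimal approximation of the described flow. It uses Perl’s File::Find module in place of the external find command, and the function names, the $run hook, and the $remove flag are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use File::Find;

# Return every directory under $root that holds a .gitmodules file.
# (File::Find stands in for the external find command here.)
sub find_submodule_roots {
    my ($root) = @_;
    my @roots;
    find( sub { push @roots, $File::Find::dir if $_ eq '.gitmodules' }, $root );
    @roots = sort @roots;
    return @roots;
}

# Keep initializing and updating until a full pass finds nothing new.
# $run is a hypothetical hook that receives each command string; $remove
# mirrors the --remove-gitmodules option described above.
sub update_all {
    my ( $root, $run, $remove ) = @_;
    my $start_path = cwd();
    my %paths;       # path => '' while pending, 1 once processed
    my $total = 0;
    my $updated;

    do {
        $updated = 0;

        # Flag newly discovered paths as pending.
        for my $path ( find_submodule_roots($root) ) {
            $paths{$path} = '' unless exists $paths{$path};
        }

        for my $path ( sort keys %paths ) {
            next if $paths{$path};    # already set up in an earlier pass

            chdir $path or die "Cannot enter $path: $!\n";
            $run->('git submodule init');
            $run->('git submodule update');
            unlink '.gitmodules' if $remove;
            chdir $start_path or die "Cannot return to $start_path: $!\n";

            $paths{$path} = 1;
            $updated++;
        }

        $total += $updated;
    } while ($updated);

    return $total;
}

# Example call (commented out so loading the sketch has no side effects):
# update_all( '.', sub { system( $_[0] ) } );
```

Each pass can uncover .gitmodules files that only exist once the previous pass has checked out a submodule, which is why the outer loop repeats until the counter stays at zero.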

I know that there are a number of things I could have done to make for a much more brief, compact script, but I was going for quick production with solid functionality, not brevity. In addition, there are unnecessary elements such as incrementing the $updated variable rather than just setting it to some value. I thought I might want to know how many things were updated at some point, so I left it as a counter.

If you found this script helpful, please leave a comment. The more interest these Git-related posts receive, the more motivated I’ll be to share other processes and developments I’ve made to make working with Git easier.


Comments

Dustin Bolton says:

February 26, 2010 at 9:01 am

This script works perfectly for me on Windows Server 2003 R2 Standard Edition SP2 running Perl 5.10.0.1004 [ActiveState].

naja says:

November 5, 2010 at 7:52 pm

Wow, cool stuff. I might test that on Windows as soon as the situation occurs for me. Why not put this script on GitHub in its own repository? I am currently doing that with my code snippets, and then anybody can easily compile their own super-repository with all the snippets they like as submodules. This way we get nice version control for all the nice snippets out there. Put the text of this blog page in the wiki page on GitHub and hopla…

I would happily pull in this script in my own snippets library.

- gaarai says:
  
  November 8, 2010 at 9:03 am
  
  Yeah. I have a number of projects that I really need to add to repos on GitHub. I’m in a crunch for time this week. Hopefully I can get it done soon.
  
naja says:

November 8, 2010 at 7:01 pm

note that on windows git-extensions also has this functionality

Aaron Forgue says:

December 2, 2010 at 8:43 am

This is fantastic and will help us solve a similar scenario. Thank you so much for posting this!

Oliver Schrenk says:

March 31, 2011 at 8:30 am

Thanks a bunch. I have a deeply nested project, and when I started writing the script myself I thought there had to be someone with a similar problem.

I just added one line

[…]
`git submodule init 2>&1`;
`git submodule update 2>&1`;
`git submodule foreach 'git checkout master' 2>&1`;

- Chris Jean says:
  
  April 4, 2011 at 8:14 am
  
  That’s a good addition. Thanks for sharing it.
  
Chris M. says:

June 20, 2011 at 7:30 am

Howdy,

Quick typo correction on your new line. This is in the script:

`git submodule foreach 'git checkout master' 2>&1

It should be:

`git submodule foreach 'git checkout master' 2>&1`;

- Chris Jean says:
  
  June 23, 2011 at 1:09 pm
  
  Updated. Thanks for pointing that out.
  
Chris M. says:

June 21, 2011 at 3:22 pm

Just for the heck of it, I decided to work on a version with a more direct hierarchy. I want to make absolutely sure that nested submodules are followed properly. This should also be easier to convert to other languages (I’ll bet that you’d never guess that Perl’s not my native language ;):

#!/usr/bin/perl

use strict;
use Cwd;

# You can specify a starting directory as the script argument.
# (Script arguments arrive in @ARGV; @_ is only filled inside subroutines.)
if ( $ARGV[0] )
{
    chdir ( $ARGV[0] );
}

my $global_indent = 0;
print ( 'Searching the base project at "', cwd(), '"' );
init_and_update();
print ( "\n" );

exit;

# This function is a recursive function that will traverse a submodule hierarchy,
# and will update them, from the bottom up.
sub init_and_update
{
    my @submodules;
    # First, you must have submodules.
    if ( open ( GITFILE, '.gitmodules' ) )
    {
        my $heading;

        # If so, we parse the .gitmodules file, and get the important parts.
        # This is a REAL DUMB parser. It counts on the lines being in a particular order.
        # This is Perl. We can do better.
        while ( $heading = <GITFILE> )
        {
            my $pathname;
            my $url;

            # The heading is the submodule header (and name).
            chomp ( $heading );
            $heading =~ s/^\s+//;
            $heading =~ s/\s+$//;

            # If we have a submodule…
            if ( $heading =~ m/\[submodule / )
            {
                # Strip off the extra
                $heading =~ s/\[submodule "(.*?)"\]/$1/;

                # Get the pathname
                $pathname = <GITFILE>;
                chomp ( $pathname );
                $pathname =~ s/^\s+//;
                $pathname =~ s/\s+$//;
                $pathname =~ s/path = (.*?)$/$1/;

                # Get the URL of the origin
                $url = <GITFILE>;
                chomp ( $url );
                $url =~ s/^\s+//;
                $url =~ s/\s+$//;
                $url =~ s/url = (.*?)$/$1/;

                # Add it to our stack.
                push @submodules, { 'submodule' => $heading, 'pathname' => $pathname, 'url' => $url };
            }
        }

        close ( GITFILE );
    }

    # Make sure that we got some submodules.
    if ( @submodules > 0 )
    {
        output_indents();
        print ( "This directory has the following submodules:" );

        $global_indent++;
        for my $index ( 0 .. $#submodules )
        {
            output_indents();
            print ( $index + 1, ') ', $submodules[$index]{'submodule'} );
        }
        $global_indent--;

        # Now, we simply go through the list, recursing all the way…
        for my $index ( 0 .. $#submodules )
        {
            # Recursion
            my $start_path = cwd();
            chdir ( $submodules[$index]{'pathname'} );
            output_indents();
            print ( "Looking for submodules under the ", $submodules[$index]{'submodule'}, " submodule" );
            $global_indent++;
            init_and_update();
            $global_indent--;
            chdir ( $start_path );
        }

        # Lets do our own.
        `git submodule init 2>&1`;
        `git submodule update 2>&1`;
        `git submodule foreach 'git checkout HEAD' 2>&1`;
        output_indents();
        print ( "Updated the submodules in the \"", cwd(), "\" directory" );
    }
    else
    {
        output_indents();
        print ( "No further submodules under this directory" );
    }
}

# This simply helps the user to see the hierarchy of the operation.
sub output_indents
{
    print ( "\n" );

    for my $index ( 0 .. $global_indent )
    {
        print ( " " );
    }
}

Chris M. says:

June 23, 2011 at 1:08 pm

I guess this comment will never see the light of day (might as well delete the previous two, if you ever get around to it).

You guys might find this script useful. I read up on submodules, figgered out where I was wrong, and adjusted the script accordingly. I also figured out that I like the way Git handles submodules a bit better, after reading this article.

- Chris Jean says:
  
  June 23, 2011 at 1:10 pm
  
  A bit dramatic are we?
  
  - Chris M. says:
    
    June 23, 2011 at 1:43 pm
    
    Yes and no.
    
    My apologies. That part wasn’t really meant to be posted. I figgered you would remove it. I mainly added it because I just noticed that the last couple of posts had been hanging around for a couple of days, and wanted to find out if this thing was on.
    
    It is. Thanks for unclogging the pipes.
    
    I did send an apology to you privately, but I believe that apologies should be delivered in the same venue as the transgression, so I’ll repeat it here.
    
    - Chris Jean says:
      
      June 23, 2011 at 1:44 pm
      
      No biggie. I wasn’t put off. Just commentating. 🙂
