Counting files in multiple directories

Last week I discussed counting files in the current directory. It was
a quick and dirty way to get the number of files in a directory with some
pretty strong assumptions about the type of files / directories that your
current directory contained.

Sure, it was a bit short-sighted in the sense that it didn’t filter out directories
and didn’t traverse the file structure at all.

Thing is though, there’s nothing wrong with that. Keeping it simple means less
to remember, and less cognitive load is always a good thing.

That said, I was still curious what my annual breakdown of posts was, so I
expanded things a bit this week.

If you remember, my blog runs on Jekyll and my posts are grouped by
year. My _posts directory looks something like this:

% tree
.
├── 2008
│   ├── 2008-09-26-re-establishing-my-web-presence.md
│   └── more posts
├── many many years
└── 2020
    ├── 2020-01-06-vps-showdown-digitalocean-lightsail-linode-upcloud-vultr.md
    ├── even more posts
    └── 2020-12-27-counting-files-in-multiple-directories.md

13 directories, 747 files

If I were to hit that directory with ls | wc -l, it would count the number of
year directories. If I were to pipe find . -type f to wc -l, it would count the
total number of files across all of the directories.
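To see the difference side by side, here’s a minimal sketch that builds a throwaway directory with two hypothetical year folders and three made-up post files, then runs both commands against it. The names are for illustration only; nothing here touches a real _posts directory.

```shell
# Scratch tree that mimics a year-grouped _posts directory.
# Directory and file names are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/2019" "$tmp/2020"
touch "$tmp/2019/2019-05-01-a.md" \
      "$tmp/2020/2020-01-06-b.md" \
      "$tmp/2020/2020-12-27-c.md"
cd "$tmp" || exit 1

# ls only lists the top level (one entry per line when piped),
# so this counts the year directories.
dirs=$(ls | wc -l | tr -d ' ')

# find walks the whole tree; -type f keeps regular files only.
files=$(find . -type f | wc -l | tr -d ' ')

echo "ls | wc -l:             $dirs"    # 2
echo "find . -type f | wc -l: $files"   # 3
```

The tr -d ' ' is only there because some wc implementations pad the count with leading spaces.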

To get the number of files in each directory, we can build on find and
manipulate its output with the cut command. Using cut lets us grab the “year”
directory from each path that find prints.
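Here’s a quick way to see what cut does to find’s output on its own, feeding it a few made-up paths shaped like the ones in the tree above (the file names are hypothetical):

```shell
# A few lines shaped like find's output (paths are hypothetical).
paths='./2019/2019-05-01-a.md
./2020/2020-01-06-b.md
./2020/2020-12-27-c.md'

# Split each line on "/" and keep field 2, the year directory.
years=$(printf '%s\n' "$paths" | cut -d / -f 2)
printf '%s\n' "$years"
# 2019
# 2020
# 2020
```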

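We can then pipe that output to sort to get things in order and, finally, to
uniq with the count argument (-c) to count how many times each directory
appears. The sort matters: uniq only collapses adjacent duplicate lines, so the
years need to be grouped together before they’re counted.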
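To illustrate why sort comes before uniq, here’s a small sketch with a hypothetical list of years where the duplicates aren’t next to each other:

```shell
# Hypothetical year list where the two 2020 lines are NOT adjacent.
years='2020
2019
2020'

# uniq only collapses adjacent duplicates, so unsorted input
# reports 2020 twice, once per run of identical lines.
unsorted=$(printf '%s\n' "$years" | uniq -c)

# Sorting first groups identical lines, so each year is counted once.
sorted=$(printf '%s\n' "$years" | sort | uniq -c)

printf 'without sort:\n%s\n\nwith sort:\n%s\n' "$unsorted" "$sorted"
```

Without the sort you get three lines, each with a count of 1; with it you get two lines, with 2019 counted once and 2020 counted twice. The exact padding in front of the counts varies between uniq implementations.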

To put it all together, it looks like this:

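find . -type f | cut -d / -f 2 | sort | uniq -c

The explicit . is the starting path: GNU find assumes it when no path is given,
but BSD and macOS find require it.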
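If you find yourself running this often, one option is to wrap the pipeline in a small shell function. The name count_per_dir is hypothetical, not something from the original post; it takes the directory to inspect as its argument and runs the same pipeline inside it. This sketch tries it on a scratch tree with made-up year folders.

```shell
# Hypothetical helper: count regular files per top-level
# subdirectory of the directory given as $1 (defaults to ".").
count_per_dir() {
  # Run in a subshell so the caller's working directory is unchanged.
  (
    cd "${1:-.}" || exit 1
    find . -type f | cut -d / -f 2 | sort | uniq -c
  )
}

# Scratch tree: one hypothetical post in 2019, two in 2020.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/2019" "$tmp/2020"
touch "$tmp/2019/2019-05-01-a.md" \
      "$tmp/2020/2020-01-06-b.md" \
      "$tmp/2020/2020-12-27-c.md"

counts=$(count_per_dir "$tmp")
printf '%s\n' "$counts"
```

From a Jekyll site’s root you’d call it as count_per_dir _posts. One caveat: a file sitting directly in the top directory (say ./README.md) has its file name in field 2, so it shows up as its own one-item “directory” in the output.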

I’d recommend running find on its own first, then tacking on the cut piece,
then adding each of the final two pipes in turn, to get a feel for what’s
actually happening at each step.
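If you’d rather see every stage at once, here’s a sketch that captures the output of each step in its own variable and feeds it into the next one, using a scratch tree with hypothetical file names:

```shell
# Scratch tree with hypothetical year folders and file names.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/2019" "$tmp/2020"
touch "$tmp/2019/a.md" "$tmp/2020/b.md" "$tmp/2020/c.md"
cd "$tmp" || exit 1

stage1=$(find . -type f)                             # ./2020/b.md ...
stage2=$(printf '%s\n' "$stage1" | cut -d / -f 2)    # just the years
stage3=$(printf '%s\n' "$stage2" | sort)             # identical years grouped
stage4=$(printf '%s\n' "$stage3" | uniq -c)          # one count per year

# Print each stage, separated by a divider line.
for s in "$stage1" "$stage2" "$stage3" "$stage4"; do
  printf '%s\n---\n' "$s"
done
```

The order of find’s own output depends on the filesystem, which is exactly why the sort step is there.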

I’d say the most complex piece is the cut command. -d tells it which character
to use as the delimiter when splitting each line (it defaults to the TAB
character). Then the -f argument tells it which field(s) to keep. In this case,
we’re grabbing the second field, which is the year in ./2020/name-of-post.md
(the first field is the . before the first slash).
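A few one-liners make the -d and -f behavior concrete. The TAB-separated words are made up; the path is the example from the paragraph above.

```shell
# With no -d, cut splits on TAB characters.
tab_field=$(printf 'alpha\tbeta\tgamma\n' | cut -f 2)
echo "$tab_field"    # beta

# The example path split on "/":
#   field 1 = "."   field 2 = "2020"   field 3 = "name-of-post.md"
path='./2020/name-of-post.md'
year=$(echo "$path" | cut -d / -f 2)
echo "$year"         # 2020

# -f also accepts lists and ranges; the delimiter is kept between fields.
prefix=$(echo "$path" | cut -d / -f 1-2)
echo "$prefix"       # ./2020
```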

Josh Sherman - The Man, The Myth, The Avatar

About Josh

Husband. Father. Pug dad. Musician. Founder of Holiday API, Head of Engineering and Emoji Specialist at Mailshake, and author of the best damn Lorem Ipsum Library for PHP.

