Counting files in multiple directories

Josh Sherman
2 min read
Command-line Interface

Last week I discussed counting files in the current directory. It was a quick and dirty way to get the number of files in a directory with some pretty strong assumptions about the type of files / directories that your current directory contained.

Sure, it was short-sighted in the sense that it didn’t filter out directories and didn’t traverse the file structure at all.

Thing is, though, there’s nothing wrong with that. Keeping it simple means less to remember, and less cognitive load is always a good thing.

That said, I was still curious what my annual breakdown of posts was, so I expanded things a bit this week.

If you remember, my blog runs on Jekyll and my posts are grouped by year. My _posts directory looks something like this:

% tree
.
├── 2008
│   ├── 2008-09-26-re-establishing-my-web-presence.md
│   └── more posts
├── many many years
└── 2020
    ├── 2020-01-06-vps-showdown-digitalocean-lightsail-linode-upcloud-vultr.md
    ├── even more posts
    └── 2020-12-27-counting-files-in-multiple-directories.md

13 directories, 747 files

If I were to hit that directory with ls | wc -l it would count the entries at the top level, which here are just the year directories. If I were to pipe find -type f to wc -l it would count the files across all of the directories combined.
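To see the difference between those two counts, here is a minimal sketch that runs both against a throwaway tree. The directory and file names are made up for the demo, and the explicit `.` after find is an addition so the command also runs on find implementations that require a starting path.

```shell
# Build a small throwaway tree that mimics the year-based layout.
# (Hypothetical names, not the real _posts directory.)
demo=$(mktemp -d)
mkdir -p "$demo/2019" "$demo/2020"
touch "$demo/2019/a.md" "$demo/2020/b.md" "$demo/2020/c.md"
cd "$demo"

# Counts the entries at the top level: here, the two year directories.
top_level=$(ls | wc -l)

# Counts every regular file below the current directory, recursively.
all_files=$(find . -type f | wc -l)

echo "top-level entries: $top_level"
echo "files in total: $all_files"
```

Neither number says how many files live in each year, which is the gap the rest of the pipeline fills.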

To be able to get the number of files in each directory, we can expand on find and manipulate the output with the cut command. Using cut will allow us to grab the “year” directory from the output.

We can then pipe that output to sort to get things in order (uniq only collapses duplicate lines that sit next to each other) and finally, uniq with the count argument to count how many times each directory appears.

To put it all together, it looks like this:

find -type f | cut -d / -f 2 | sort | uniq -c

I’d recommend starting with find on its own, then tacking on cut, then sort, and finally uniq -c, running the command at each step to get a feel for what’s actually happening here.
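As a rough illustration of that stage-by-stage approach, the sketch below builds the pipeline up one command at a time in a scratch directory. The tree and file names are hypothetical, and find is given an explicit `.` so it runs on implementations that insist on a starting path.

```shell
# Scratch tree with made-up posts: one in 2019, two in 2020.
demo=$(mktemp -d)
mkdir -p "$demo/2019" "$demo/2020"
touch "$demo/2019/a.md" "$demo/2020/b.md" "$demo/2020/c.md"
cd "$demo"

# Stage 1: one path per line, e.g. ./2020/b.md (order is not guaranteed).
find . -type f

# Stage 2: keep only the second "/"-separated field, which is the year.
find . -type f | cut -d / -f 2

# Stage 3: sort so that identical years end up on adjacent lines.
find . -type f | cut -d / -f 2 | sort

# Stage 4: collapse adjacent duplicates and prefix each year with its count.
per_year=$(find . -type f | cut -d / -f 2 | sort | uniq -c)
echo "$per_year"
```

The last stage prints one line per year, with the count first and the directory name second. The amount of padding in front of the count varies between uniq implementations.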

I’d say the most complex piece is the cut command. -d tells it what character to use as the delimiter to split on (the default is the TAB character). Then the -f argument tells it which field(s) to grab. In this case, we’re grabbing the second field, which is the year from ./2020/name-of-post.md. The first field is just the leading dot.
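To make the field numbering concrete, here is a small sketch that feeds that one sample path through cut and pulls out each field in turn. The variable names are only for illustration.

```shell
# A sample path in the shape find prints (the file name is a placeholder).
path="./2020/name-of-post.md"

# Split on "/": field 1 is ".", field 2 is the year, field 3 is the file name.
first=$(printf '%s\n' "$path" | cut -d / -f 1)
year=$(printf '%s\n' "$path" | cut -d / -f 2)
file=$(printf '%s\n' "$path" | cut -d / -f 3)

echo "field 1: $first"
echo "field 2: $year"
echo "field 3: $file"
```

Because the paths start with ./, the year lands in field 2 rather than field 1. If the paths did not have that prefix, the field number would need to shift accordingly.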
