Last week I discussed counting files in the current directory. It was
a quick and dirty way to get the number of files in a directory with some
pretty strong assumptions about the type of files / directories that your
current directory contained.
Sure, it was a short sighted in the sense that it didn’t factor out directories
and didn’t traverse the file structure at all.
Thing is though, there’s nothing wrong with that. Keeping it simple means less
to remember and less cognitive load is always a good thing.
That said, I was still curious what my annual breakdown of posts was, so I
expanded things a bit this week.
If you remember, my blog runs on Jekyll and my posts are grouped by
year. My _posts
directory looks something like this:
% tree
.
├── 2008
│ ├── 2008-09-26-re-establishing-my-web-presence.md
│ └── more posts
├── many many years
└── 2020
├── 2020-01-06-vps-showdown-digitalocean-lightsail-linode-upcloud-vultr.md
├── even more posts
└── 2020-12-27-counting-files-in-multiple-directories.md
13 directories, 747 files
If I were to hit that directory with ls | wc -l
it would count the number of
directories. If I were to pipe find -type f
to wc -l
it would count the
number of files across all of the directories.
To be able to get the number of files in each directory, we can expand on find
and manipulate the output with the cut
command. Using cut
will allow us to
grab the “year” directory from the output.
We can then pipe that output to sort
to get things in order and finally,
uniq
with the count argument to count how many times each directory appears.
To put it all together, it looks like this:
find -type f | cut -d / -f 2 | sort | uniq -c
I’d recommend starting with find
and running it, then tacking on the cut
piece and repeating with the final two pipes to get a feel for what’s actually
happening here.
I’d say the most complex piece is the cut
command. -d
tells it what
character to use as the delimiter to split (which defaults to the TAB
character). Then the -f
argument tells it which field(s) to grab. In this
case, we’re grabbing the second field, which is the year from
./2020/name-of-post.md
.