Last week I discussed counting files in the current directory. It was
a quick and dirty way to get the number of files in a directory with some
pretty strong assumptions about the type of files / directories that your
current directory contained.
Sure, it was short-sighted in the sense that it didn’t factor out directories
and didn’t traverse the file structure at all.
Thing is though, there’s nothing wrong with that. Keeping it simple means less
to remember, and less cognitive load is always a good thing.
That said, I was still curious what my annual breakdown of posts was, so I
expanded things a bit this week.
If you remember, my blog runs on Jekyll and my posts are grouped by year. My
_posts directory looks something like this:
│ ├── 2008-09-26-re-establishing-my-web-presence.md
│ └── more posts
├── many many years
├── even more posts
13 directories, 747 files
If I were to hit that directory with ls | wc -l, it would count the number of
directories. If I were to pipe find -type f to wc -l, it would count the
number of files across all of the directories.
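To see the difference on a small scale, here is a minimal sketch that builds a throwaway directory and runs both counts. The year folders and file names are made up for illustration, and find is given an explicit starting path, which non-GNU versions of find generally expect.

```shell
# Throwaway tree: two year directories holding three posts in total.
# The directory and file names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/2008" "$demo/2009"
touch "$demo/2008/a.md" "$demo/2008/b.md" "$demo/2009/c.md"

# Counts the entries at the top level only (the two year directories).
top_level=$(ls "$demo" | wc -l)

# Counts every regular file anywhere below the starting directory.
all_files=$(find "$demo" -type f | wc -l)

echo "top-level entries: $top_level"
echo "files in total: $all_files"
```

On this sample tree the first count is 2 and the second is 3, which is the same gap as between the 13 directories and 747 files above.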
To be able to get the number of files in each directory, we can expand on
find -type f and manipulate the output with the cut command. Using cut will
allow us to grab the “year” directory from the output.
We can then pipe that output to sort to get things in order and, finally, to
uniq with the count argument (-c) to count how many times each directory appears.
To put it all together, it looks like this:
find -type f | cut -d / -f 2 | sort | uniq -c
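As a rough illustration, here is the same pipeline run against a throwaway tree. The names are invented and are not the real _posts contents. The only change is the explicit . passed to find, which still produces paths in the ./year/file form that cut relies on.

```shell
# Throwaway tree with hypothetical names: two posts in 2008, one in 2009.
demo=$(mktemp -d)
mkdir -p "$demo/2008" "$demo/2009"
touch "$demo/2008/a.md" "$demo/2008/b.md" "$demo/2009/c.md"

# Run the pipeline from inside the tree so paths look like ./2008/a.md.
# The command substitution runs in a subshell, so the cd does not leak out.
counts=$(cd "$demo" && find . -type f | cut -d / -f 2 | sort | uniq -c)

echo "$counts"
```

Each output line is a count followed by a year, here 2 for 2008 and 1 for 2009. The amount of padding in front of the count varies between uniq implementations.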
I’d recommend starting with find and running it, then tacking on the cut
piece and repeating with the final two pipes to get a feel for what each stage
is actually doing.
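One way to follow that advice without touching a real directory is to feed a few sample lines through each stage. A minimal sketch, assuming made-up paths in the ./year/file shape that find prints:

```shell
# Hypothetical paths standing in for what find prints, one per line.
paths='./2008/first-post.md
./2009/second-post.md
./2008/third-post.md'

# Stage 1: keep only the second /-separated field (the first field is ".").
years=$(printf '%s\n' "$paths" | cut -d / -f 2)

# Stage 2: sort so identical years end up on adjacent lines.
sorted=$(printf '%s\n' "$years" | sort)

# Stage 3: collapse adjacent duplicates and prefix each with its count.
counted=$(printf '%s\n' "$sorted" | uniq -c)

printf '%s\n' "$years" "---" "$sorted" "---" "$counted"
```

The first stage leaves 2008, 2009, 2008. Sorting puts the two 2008 lines next to each other, which matters because uniq only collapses adjacent duplicates.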
I’d say the most complex piece is the cut command. The -d argument tells it
what character to use as the delimiter to split on (which defaults to the tab
character). Then the -f argument tells it which field(s) to grab. In this
case, we’re grabbing the second field, which is the year from