Basic analytics in VIM using file searches

2020-01-26 @Technology
By Vitaly Parnas

VIM provides a handful of lesser-known search features that I generally don’t see in crash-course tutorials.

The standard search commands are fairly well known: ‘/’ and ‘?’ for an explicit search, ‘*’ and ‘#’ to search for the word under the cursor, and ‘n’/‘N’ to move between matches.

In addition, VIM offers a grep/vimgrep interface to search over multiple files, rendering the results to the ‘quickfix list’ (accessible via :clist) or, with the lgrep/lvimgrep variants, to the ‘location list’ (:llist).
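As a minimal sketch of that interface (the *.csv glob is only an assumption about where the data files live):

```vim
" Search every CSV file in the current directory; results go to the quickfix list
:vimgrep /James Joyce/ *.csv
" List the quickfix entries, or open them in their own window
:clist
:copen

" The 'l' variants fill the window-local location list instead
:lvimgrep /James Joyce/ *.csv
:llist
:lopen
```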

However, I found something else even more seamless for my purposes. Rather than explain, I’ll demonstrate.

For the task I’ve used the CSV data from a ‘greatest books’ list aggregator. Each line of the CSV is a tuple of rank,title,author,year.

The include-file search

The second-ranked book is Ulysses. Suppose I want a quick overview of all other works by James Joyce. Two methods are available:

  1. [I : With the cursor on the word Joyce, type the left bracket followed by a capital I. A result window listing every line that contains the word immediately appears at the bottom:

    tgb_1.csv
      1:    3 2,Ulysses,James Joyce,1922
      2:   45 44,A Portrait of the Artist as a Young Man,James Joyce,1917
      3:  240 237,Dubliners,James Joyce,1914
      4:  340 337,Finnegans Wake,James Joyce,1939
      5:  569 566,Them,Joyce Carol Oates,1969
      6:  890 887,The Horse's Mouth,Joyce Cary,1944
      7: 2351 2348,Black Water,Joyce Carol Oates,1992
    Press ENTER or type command to continue
    

    This has become one of my most frequently used keystrokes in VIM.

  2. :ilist /pattern/ : The search above also returned other authors named ‘Joyce’. For a more specific search, type the command with an explicit pattern, abbreviated as :il /James Joyce/, to obtain exactly what we wanted:

    tgb_1.csv
      1:    3 2,Ulysses,James Joyce,1922
      2:   45 44,A Portrait of the Artist as a Young Man,James Joyce,1917
      3:  240 237,Dubliners,James Joyce,1914
      4:  340 337,Finnegans Wake,James Joyce,1939
    Press ENTER or type command to continue
    

Both of these methods are much quicker for most purposes than the grep options. Using them, we can run some rudimentary analysis, similar to SELECT statements in a relational database.
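The difference between the two methods comes down to how the argument is interpreted. A short sketch of the variants, using the same example term:

```vim
" Without slashes the argument is matched as a whole word, like [I
:ilist Joyce
" With slashes it is treated as a regular expression
:ilist /James Joyce/
" Add ! to include lines that Vim would otherwise skip as comments
:ilist! /James Joyce/
```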

Examples:

Books published between 1300 and 1799, ranked in the first 200:

Note that we prepend the command with a custom range, in this case corresponding to the first 200 lines (VIM treats line 0 as line 1 for this command, so 1,200 is equivalent).

:0,200il /1[3-7]..$/

tgb_1.csv
  1:    4 3,Don Quixote,Miguel de Cervantes,1605
  2:   10 9,Hamlet,William Shakespeare,1601
  3:   16 15,The Divine Comedy,Dante Alighieri,1472
  4:   33 32,Gulliver's Travels,Jonathan Swift,1726
  5:   38 37,One Thousand and One Nights,India/Iran/Iraq/Egypt,1706
  6:   48 47,Tristram Shandy,Laurence Sterne,1759
  7:   51 50,Candide,Voltaire,1759
  8:   76 75,The Canterbury Tales,Geoffrey Chaucer,1380
  9:   91 90,Paradise Lost,John Milton,1667
 10:   92 91,Gargantua and Pantagruel,Francois Rabelais,1532
 11:   99 98,Robinson Crusoe,Daniel Defoe,1719
 12:  101 100,Tom Jones,Henry Fielding,1749
 13:  132 131,Decameron,Giovanni Boccaccio,1350
 14:  137 136,Dangerous Liaison,Pierre Choderlos de Laclos,1782
 15:  150 148,First Folio,William Shakespeare,1623
 16:  159 157,King Lear,William Shakespeare,1608
 17:  180 178,The Princess of Cleves,Madame de La Fayette,1678
 18:  196 194,Clarissa,Samuel Richardson,1748
Press ENTER or type command to continue

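All Charles Dickens works not published in the 1850s. Note that the lowercase ‘dickens’ matches only when the ‘ignorecase’ option is set; otherwise capitalize the name or add \c to the pattern: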

:il /dickens.*18[^5].$/
tgb_1.csv
  1:   32 31,Great Expectations,Charles Dickens,1861
  2:  306 303,Our Mutual Friend,Charles Dickens,1865
  3:  319 316,The Pickwick Papers,Charles Dickens,1837
  4:  371 368,The Adventures of Oliver Twist,Charles Dickens,1837
  5:  664 661,A Christmas Carol,Charles Dickens,1843
  6: 1529 1526,Dombey and Son,Charles Dickens,1848
  7: 2247 2244,Nicholas Nickleby,Charles Dickens,1838
Press ENTER or type command to continue

Effectively, anything we can express as a regular expression can be rendered to this result window, whose numbered entries also show the total number of matches.
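A few more queries in the same spirit, as a sketch only (I have not listed their output, and the choice of authors and ranges is arbitrary):

```vim
" Works published in the 1900s among the first 100 lines
:1,100il /,19..$/
" Works by either of two authors (\| is alternation in Vim's default regex syntax)
:il /William Shakespeare\|John Milton/
" Force a case-insensitive match regardless of the 'ignorecase' setting
:il /\cdickens/
```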

Jump-based search

There’s an even more obscure search technique using the jump list.

  1. [ CTRL+I and ] CTRL+I : With the cursor on a word, type one of these combinations. [ CTRL+I jumps to the first line in the file containing the word; ] CTRL+I jumps to the first such line after the cursor, so repeating it walks forward through the matches. Prefix either with a count to jump to that numbered match instead.

  2. :ijump /pattern/ : The explicit variant, abbreviated as ‘ij’. It jumps to the first line matching the pattern, after which you can move back and forth through the jump list.

These ‘jump’ searches are recorded in the jump list and leave the last search pattern untouched. You can therefore step back and forth over the previous jumps using CTRL+O and CTRL+I, entirely separately from the primary ‘/’ and ‘?’ search and its ‘n’/‘N’ navigation. This lets you conduct two searches in parallel, one with each method.
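A minimal sketch of such a session, again with ‘James Joyce’ as the example pattern:

```vim
" Jump to the first line matching the pattern
:ijump /James Joyce/
" Jump straight to the third matching line
:ijump 3 /James Joyce/
" Show the jump list that CTRL-O and CTRL-I walk through
:jumps
```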

Count

For a pure count of matches, the following substitute command does the trick:

:%s/<pattern>/&/gn

Alternatively, with a search pattern already in place (from a previous ‘/’ or ‘*’ search, for instance), the pattern can be left empty. It suffices to type the following, or better yet, assign it to a mapping:

:%s//&/gn

To decipher: the command targets the whole file (the ‘%’ range) and replaces each match with ‘&’, which stands for the matched text itself. The ‘g’ flag counts every match on a line rather than only the first, and the ‘n’ flag reports the count without performing any replacement.

With ‘James Joyce’ as the last search pattern, executing the above renders the following:

4 matches on 4 lines
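The mapping suggested above could look like the following line in a vimrc. The key choice (<Leader>c) is purely hypothetical; pick whatever is free in your setup.

```vim
" Hypothetical mapping: <Leader>c counts matches of the last search pattern
nnoremap <Leader>c :%s//&/gn<CR>
```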

Conclusion

The simple analytics functions above are handy in that you can execute them directly within the VIM buffer, with very few keystrokes and without additional plugins, vimscript functions, or external piping.
