RegExp101 Exercise
1. Go to this website (regex101.com) and explore it thoroughly. The "Quick Reference" section on the bottom right may be of particular interest.
2. Now, make sure you are on the RegEx editor component of the website. It is the page that first comes up when you go to the website. You should see a box in which to insert a Regular Expression and a box in which to insert a Test String.
3. Insert the following into the Test String box: We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
4. On the left-hand side of the page, make sure the "Python" Flavor is checked.
5. Write a regular expression into the box on top that finds every instance in your text that consists of the letter e followed by the letter s followed by the letter t.
6. Now find every instance in your text that consists of the letter e followed by any single character (including whitespace characters) followed by the letter t.
7. Notice that there is a tool on the page labeled "Code Generator." Click it. Now make sure that the left-hand side of the new page has "Python" checked. The result should be some computer code that looks through the passage from the Constitution and finds all the matches. Look through that computer code and see if you can roughly figure out what it is doing.
8. Now click on a different computer language on the left, such as "AutoIt." It generates a different form of computer code that accomplishes the same task: finding all instances in the text that consist of the letter e followed by any single character (including whitespace characters) followed by the letter t. Click through all the available languages. Which computer language produces code that is easiest to understand? Which produces the most inscrutable code? Which produces the shortest code?
9. As you know, I like Wolfram Language. Here's the Wolfram Language code to perform the same task:
StringCases["We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America", RegularExpression["e.t"]]
10. Find every instance in your text that consists of two consecutive vowels.
11. Do a "case sensitive" match on the letter P followed by the letter e.
12. Do a "case insensitive" match on the letter p followed by the letter e.
13. Now let's start ratcheting up the difficulty. Find every capital letter in the document.
14. Now find every word in the document that starts with a capital letter. Hints: Here's how I would designate the capital letters from B through F: [B-F]. Here's how I designate a character that is not whitespace: \S.
15. Now "capture" every capital letter in the document that starts a word.
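Once you have tried the steps above on regex101, you may want to check your answers outside the website. Here is a minimal Python sketch using the standard re module. It shows one possible set of patterns, not the only correct one. The variable names are my own. For step 14 it assumes \w (word characters) rather than the \S from the hint, so that trailing commas and periods are left out of the matches. For step 10 it assumes that "vowel" means the lowercase letters a, e, i, o and u.

```python
import re

# Test string from step 3 (the Preamble to the U.S. Constitution).
PREAMBLE = (
    "We the People of the United States, in Order to form a more perfect "
    "Union, establish Justice, insure domestic Tranquility, provide for the "
    "common defence, promote the general Welfare, and secure the Blessings "
    "of Liberty to ourselves and our Posterity, do ordain and establish this "
    "Constitution for the United States of America."
)

# Step 5: the letters e, s, t in a row.
est_matches = re.findall(r"est", PREAMBLE)

# Step 6: e, then any single character, then t (same pattern as the
# Wolfram Language example in step 9).
e_any_t_matches = re.findall(r"e.t", PREAMBLE)

# Step 10: two consecutive lowercase vowels (an assumption about what
# counts as a vowel; add re.IGNORECASE to include capitals).
vowel_pairs = re.findall(r"[aeiou]{2}", PREAMBLE)

# Step 11: matching is case sensitive by default in Python.
case_sensitive = re.findall(r"Pe", PREAMBLE)

# Step 12: the IGNORECASE flag makes the match case insensitive.
case_insensitive = re.findall(r"pe", PREAMBLE, flags=re.IGNORECASE)

# Step 13: every capital letter.
capitals = re.findall(r"[A-Z]", PREAMBLE)

# Step 14: every word that starts with a capital letter.
# \b marks a word boundary; \w* takes the rest of the word.
capitalized_words = re.findall(r"\b[A-Z]\w*", PREAMBLE)

# Step 15: parentheses "capture" only the capital letter that starts a word.
initials = re.findall(r"\b([A-Z])\w*", PREAMBLE)

if __name__ == "__main__":
    print(est_matches)
    print(e_any_t_matches)
    print(vowel_pairs)
    print(case_sensitive, case_insensitive)
    print("".join(capitals))
    print(capitalized_words)
    print("".join(initials))
```

When a pattern contains one pair of parentheses, re.findall returns only the captured text rather than the whole match. That is why the patterns for steps 14 and 15 differ by nothing but the parentheses yet return whole words in one case and single letters in the other.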
This book, and all H2O books, are Creative Commons licensed for sharing and re-use with the exception of certain excerpts. Any excerpts from the Restatements of the Law, Principles of the Law, and the Model Penal Code are copyright by The American Law Institute. Excerpts are reproduced with permission, not as part of a Creative Commons license.