Modern Computation

RegExp101 Exercise

1. Go to this website (regex101.com) and explore it thoroughly. The "Quick Reference" section on the bottom right may be of particular interest.

2. Now, make sure you are on the RegEx editor component of the website. It is the page that first comes up when you go to the website. You should see a box in which to insert a Regular Expression and a box in which to insert a Test String.

3. Insert the following into the Test String box: We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.

4. On the left hand side of the page, make sure the "Python" Flavor is checked.

5. Write a regular expression into the box on top that finds every instance in your text that consists of the letter e followed by the letter s followed by the letter t. 

6. Now find every instance in your text that consists of the letter e followed by any single character (including whitespace characters) followed by the letter t.

7. Notice that there is a tool on the page that says "Code Generator." Click it. Now make sure that the left hand side of the new page has "Python" checked. The result should be some computer code that looks through the passage from the Constitution and finds all the matches. Look through that computer code and see if you can roughly figure out what it is doing.

8. Now click on a different computer language up on the left, such as "AutoIt." It generates a different form of computer code that accomplishes the same task: finding all instances in the text that consist of the letter e followed by any single character (including whitespace characters) followed by the letter t. Click through all the available languages. Which computer language produces code that is easiest to understand? Which produces the most inscrutable code? Which produces the shortest code?

9. As you know, I like Wolfram Language. Here's the Wolfram Language code to perform the same task: 

StringCases["We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America", RegularExpression["e.t"]]
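For comparison, here is a rough Python sketch of the same task using the standard re module. It is not the code that regex101 generates; the variable names preamble and matches are my own choices for illustration.

```python
import re

# The Preamble as a single line, the same Test String used in step 3.
preamble = (
    "We the People of the United States, in Order to form a more perfect "
    "Union, establish Justice, insure domestic Tranquility, provide for the "
    "common defence, promote the general Welfare, and secure the Blessings "
    "of Liberty to ourselves and our Posterity, do ordain and establish this "
    "Constitution for the United States of America."
)

# "e.t" means: the letter e, then any one character, then the letter t.
# By default "." matches spaces but not newlines; passing flags=re.DOTALL
# would make it match newlines as well.
matches = re.findall(r"e.t", preamble)

print(matches)
# ['e t', 'ect', 'est', 'est', 'e t', 'e t', 'ert', 'est']
```

Note that re.findall returns non-overlapping matches, scanning from left to right.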

10. Find every instance in your text that consists of two consecutive vowels.

11. Do a "case sensitive" match on the letter P followed by the letter e.

12. Do a "case insensitive" match on the letter p followed by the letter e.

13. Now let's start ratcheting up the difficulty. Find every capital letter in the document.

14. Now find every word in the document that starts with a capital letter. Hints: Here's how I would designate the capital letters from B through F: [B-F]. Here's how I designate a character that is not whitespace: \S.

15. Now "capture" every capital letter in the document that starts a word.
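
Once you have tried steps 10 through 15 yourself on regex101, you may want to compare your patterns against the sketch below. It is one possible set of answers in Python, not the only correct one, and the variable names are hypothetical ones chosen for illustration. The pattern for step 14 follows the \S hint, so it also picks up punctuation attached to a word (for example "States,").

```python
import re

# The Preamble as a single line, the same Test String used in step 3.
preamble = (
    "We the People of the United States, in Order to form a more perfect "
    "Union, establish Justice, insure domestic Tranquility, provide for the "
    "common defence, promote the general Welfare, and secure the Blessings "
    "of Liberty to ourselves and our Posterity, do ordain and establish this "
    "Constitution for the United States of America."
)

# Step 10: two consecutive vowels (upper or lower case).
two_vowels = re.findall(r"[aeiouAEIOU]{2}", preamble)

# Step 11: case-sensitive match on P followed by e.
pe_sensitive = re.findall(r"Pe", preamble)

# Step 12: case-insensitive match on p followed by e.
pe_insensitive = re.findall(r"pe", preamble, flags=re.IGNORECASE)

# Step 13: every capital letter.
capitals = re.findall(r"[A-Z]", preamble)

# Step 14: every word that starts with a capital letter.
# \b marks a word boundary; \S* takes the rest of the non-whitespace run.
capitalized_words = re.findall(r"\b[A-Z]\S*", preamble)

# Step 15: the parentheses "capture" only the leading capital letter.
# With one capture group, re.findall returns the captured text, not the
# whole match.
initials = re.findall(r"\b([A-Z])\S*", preamble)

print(pe_sensitive)       # ['Pe']
print(pe_insensitive)     # ['Pe', 'pe']
print("".join(capitals))  # WPUSOUJTWBLPCUSA
print(capitalized_words[:4])  # ['We', 'People', 'United', 'States,']
```

In this particular passage every capital letter happens to start a word, so steps 13 and 15 return the same letters. That would not be true of a text containing something like "McDonald".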