Codebox Software
OSX Shell Script to Read a Web Page Aloud
Published:
This one-line shell script for OSX will do its best to read a web page aloud to you. The results can be a little variable, depending on exactly how the page's HTML has been structured, but it usually works quite well if semantic markup has been used.
curl -L $URL | tr '\n' ' ' | egrep -o '<(title|h\d|p|li)( [^>]*>|>).*?</\1>' | sed -E 's/<[^>]*>//g' | say
Notes
Before running the script you will need to set the URL variable to the address of the page you want it to read, for example:
URL=http://pun.me/pages/dad-jokes.php
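If you would rather pass the address as an argument than set URL first, the one-liner can be wrapped in a shell function. This is a minimal sketch, not the author's version. The name read_page_aloud is hypothetical. The -s flag, which hides curl's progress meter, and the quotes around the address are small additions that the original one-liner does not have. The pipeline itself is unchanged, so it still needs OS X for say.

```shell
#!/bin/bash
# read_page_aloud is a hypothetical helper name: it wraps the one-liner so the
# page address is passed as an argument instead of a pre-set URL variable.
read_page_aloud() {
  local url="$1"
  if [ -z "$url" ]; then
    echo "usage: read_page_aloud URL" >&2
    return 1
  fi
  # -s hides curl's progress meter (an addition); quoting "$url" stops the
  # shell from word-splitting or glob-expanding the address.
  curl -sL "$url" \
    | tr '\n' ' ' \
    | egrep -o '<(title|h\d|p|li)( [^>]*>|>).*?</\1>' \
    | sed -E 's/<[^>]*>//g' \
    | say
}

# Example (OS X only, needs network access):
# read_page_aloud http://pun.me/pages/dad-jokes.php
```

Called without an address, the function prints a usage message and returns a non-zero status instead of running curl.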
The script contains 5 separate commands, explained below:
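- curl -L $URL
  Downloads the HTML source code for the page (-L makes curl follow any redirects) and sends it on to the next command.
- tr '\n' ' '
  Replaces every newline character in the HTML with a space, so that all the markup is on a single line.
- egrep -o '<(title|h\d|p|li)( [^>]*>|>).*?</\1>'
  This rather hairy regular expression strips away everything that is not contained inside one of the following elements: <title>, <h1> to <h6>, <p> or <li>. The ( [^>]*>|>) part allows attributes on the opening tag while stopping <p from also matching tags such as <pre>. The \1 requires the closing tag to match the opening one. The -o option prints each matching element on a line of its own.
- sed -E 's/<[^>]*>//g'
  Removes any remaining HTML tags from the matched text (for example <a> or <em> nested inside a paragraph), but leaves the text that was inside them.
- say
  Reads the resulting text out loud. The say command has a few nice options, such as -v to change the voice that it uses and -o to save the audio into a file rather than playing it aloud.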
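You can try the text-extraction stages (tr, grep and sed) on their own by feeding them a small piece of saved HTML. This needs no network access and no speech. The sketch below is a portable variant rather than the original command. The helper name page_text and the sample HTML are hypothetical. It uses grep -E instead of egrep and h[1-6] instead of h\d. It also uses a plain greedy .* instead of .*?, because POSIX extended regular expressions define neither \d nor a lazy *?. One side effect of the greedy match is worth knowing. If an element type appears more than once on the single line, one match can run from the first opening tag to the last closing tag of that type, so any text between them would be read out too.

```shell
#!/bin/bash
# page_text is a hypothetical helper: it runs only the text-extraction stages
# on HTML read from stdin. Portable variant: grep -E instead of egrep,
# h[1-6] instead of h\d, and greedy .* instead of .*?
page_text() {
  tr '\n' ' ' \
    | grep -Eo '<(title|h[1-6]|p|li)( [^>]*>|>).*</\1>' \
    | sed -E 's/<[^>]*>//g'
}

# Hypothetical sample page; the <script> contents should not be extracted.
sample='<html><head><title>Dad Jokes</title>
<script>var x = 1;</script></head>
<body><h1>Jokes</h1><p class="joke">Joke one.</p></body></html>'

# Prints three lines: Dad Jokes, Jokes, Joke one.
printf '%s\n' "$sample" | page_text

# On OS X the text could then be spoken, or saved with -o instead of played;
# say -v '?' lists the installed voices that -v accepts:
# printf '%s\n' "$sample" | page_text | say -o jokes.aiff
```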