edbrowse documentation

edbrowse Documentation, a User's Guide

Chapter 0, Contents

Chapter 1, Preface

Author

Karl Dahlke eklhad@gmail.com. Please see the Wikipedia article for a history of edbrowse and an overview of its features.

This program is copyright © Karl Dahlke (and other authors and contributors), 2000-2014. It is made available by the authors under the terms of the GNU General Public License (GPL), as articulated by the Free Software Foundation. It may be used for any purpose, and redistributed, provided this copyright notice is included.

Disclaimer

Edbrowse is provided as-is, with no implied warranty of fitness for any particular purpose. It might trash your precious files. It might send bad data across the Internet, causing you to buy a $37,000 elephant instead of $37 worth of printer supplies. It may delete all the rows in your mysql customer table. This is a spare-time project written by a couple of volunteers, that (understandably) cannot attain the quality and rigor of corporate or government software. By using this program, you agree to use it as-is.

Acknowledgements

Chris Brannon wrote or modified at least 20% of the code, and provided valuable ideas for the overall design. He now maintains edbrowse.

Adam Thompson converted the javascript interface from C to C++, as required by Mozilla JS versions 2.4 and higher. He has become a valuable member of the development team.

Jeremy O'Brien ported the software to Mac OS X.

Several people translated the output and error messages into other languages, and they are given due credit in the Languages section below.

Overview

This program is, at first glance, a reimplementation of /bin/ed. In fact you might issue a few ed commands and not realize that you are actually running this program. But as you proceed you will eventually discover some discrepancies, areas where edbrowse differs from ed. These are discussed below.

Reinventing ed seems like a complete waste of time, until you realize that this program also acts as a browser - a browser embedded inside ed. You can edit a URL as easily as a local file, and activate browse mode to render the html tags in a manner that is appropriate for a command-response program such as this. In other words, we discard most of the formatting information and retain the links and fill-out forms. This allows blind users to access the Internet through an application that is compatible with the linear nature of speech or braille.

If edbrowse is not included in your distribution, there is a perl version, with fewer features, that you can bring up right away on any computer: Linux, Unix, Mac, Windows, etc. Give edbrowse.pl a try, and if you like it then you can git the package and build it from source to realize the full-featured C version.

If you are a Linux user, and your distribution doesn't package edbrowse, you can use the aforementioned perl version of course, but you can also run the full-featured C version, without going through the hassle of building it yourself. Statically-linked executables for 32 bit and 64 bit architectures are maintained on the edbrowse home page.

This documentation assumes you are familiar with ed. In fact it helps if you are fluent in ed. Experience with internet browsers and the associated terminology is also helpful.

Other Languages

First, a few words about character sets (charsets). English is easily contained within a byte stream, one letter per byte. In fact, each letter fits in 7 bits; the eighth bit is not needed, and is set to 0. This system is called ascii, and as you can see, it is English specific.

Indo-European languages bring in more characters, such as ñ (Spanish), è (French), and ö (German). These can still fit within single bytes, by setting the eighth bit to 1, according to the iso8859-1 standard. Still other languages, such as Czech and Hungarian, fit within the iso8859-2 standard, which assigns different characters to the bytes above 127. And there is iso8859-3, and so on. Select your code page, and all the letters of your language still fit within one byte. This is the iso8859 standard, and it is backward compatible with ascii. In other words, z is 122 in ascii and in every iso8859-x code page, but the higher numbers, above 127, could represent different letters in different languages.

This worked well for a while, but what if you want to write one paragraph in French and one in Czech? You either switch from iso8859-1 to iso8859-2 in mid stream, or we come up with a new standard that represents all letters in all languages simultaneously. This charset, which obviously will not fit into a single byte, is known as utf8, and it has become the new standard across the computer industry. Software does not have to select an iso8859 page and map numbers to letters in a manner that depends on the country you live in. Instead, ñ is represented by two bytes, not one. This is transparent to you; you see the same letter on the screen, and you hear the same sounds if your screen reader passes these letters to a speech synthesizer. A quick way to tell which system you are on is to echo $LANG. If it ends in utf8 or utf-8, and it probably will, then your console is using utf8, and it expects two-byte sequences. Your files will contain these underlying sequences, and you probably aren't even aware of it. Older pre-utf8 systems store each character in a single byte, with an iso8859 codepage doing the translation.
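The byte counts described above are easy to check. The following is a small Python sketch, not part of edbrowse; it only relies on Python's standard codecs to show that the same byte means different letters under different iso8859 pages, and that utf8 spends two bytes on ñ.

```python
# The byte 0xE8 is "è" under iso8859-1 but "č" under iso8859-2.
ambiguous = bytes([0xE8])
as_page_1 = ambiguous.decode("iso-8859-1")   # French e-grave
as_page_2 = ambiguous.decode("iso-8859-2")   # Czech c-caron

# The letter ñ (U+00F1) takes one byte in iso8859-1 and two in utf8.
letter = "\u00f1"
latin1_bytes = letter.encode("iso-8859-1")
utf8_bytes = letter.encode("utf-8")

# Plain ascii letters are identical in every one of these charsets.
z_everywhere = {"z".encode(cs) for cs in ("ascii", "iso-8859-1", "iso-8859-2", "utf-8")}

print(as_page_1, as_page_2)
print(latin1_bytes.hex(), utf8_bytes.hex())
print(z_everywhere)
```

The set printed on the last line has a single member, the byte 122, which is the backward compatibility with ascii mentioned earlier.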

There's much more to say about charsets; this is merely a brief introduction. I need not go further, because edbrowse only supports iso8859 ⇔ utf8 at this time. Chinese, for example, requires 3 and 4 byte sequences, which map into unicode. Edbrowse doesn't handle this level of complexity at this time.

The output and error messages, such as "search string not found", have been internationalized, so that edbrowse can support most European languages. Set the environment variable LANG to interact with edbrowse in your home language. Supported languages are shown below. If you can translate edbrowse into additional languages, please let me know.

English: LANG=en (this is the default)

French: LANG=fr by Erwin Bliesenick including documentation

Brazilian Portuguese: LANG=pt_br by Cleverson Casarin Uliana

Polish: LANG=pl_pl by Wojciech Gac

When an output or error message is displayed, accented letters are printed using single bytes, vectoring through an iso8859 page, unless the string utf8 or utf-8 appears in $LANG, whence the nonascii characters are generated using utf8. LANG=fr_FR.UTF-8 is a common setting in France. Similarly, the contents of a buffer, be it a local document or an internet website, are displayed as single bytes or two-byte sequences, according to $LANG. Bear in mind, utf8 has become the standard, and edbrowse may not display text or error messages through iso8859 for long. In other words, iso8859 is deprecated.
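The $LANG test can be sketched in a few lines of Python. This is an illustration only, not the edbrowse source; the function name console_uses_utf8 is hypothetical, and treating the match as case insensitive is my assumption, made so that a value such as UTF-8 is recognized.

```python
import os

def console_uses_utf8(lang=None):
    """Return True if the LANG value mentions utf8 or utf-8.

    The case-insensitive comparison is an assumption of this sketch.
    """
    if lang is None:
        lang = os.environ.get("LANG", "")
    lowered = lang.lower()
    return "utf8" in lowered or "utf-8" in lowered

if __name__ == "__main__":
    print(console_uses_utf8())
```

Called with no argument, it inspects the environment of the running process, which is the same information that echo $LANG shows you.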

If an older file is read into edbrowse, i.e. read into an empty buffer, and that file is in iso8859, while your computer is set to run in utf8, then that file is converted on the fly, before you ever see it. Thus it will look normal to you. If I did everything right, you shouldn't notice any difference. (Use the iu command to disable this feature.)

When you write data out to the same file, e.g. if you have made some corrections or additions, I convert it back to its original single-byte iso8859 charset. Thus you can send the edited file back to your friend, and it will be in his charset as he expects. However, if you write the data, or any portion of that data, into a new file, I will leave it in the charset that is used by your computer.
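The round trip just described can be modeled as follows. This is a hedged Python sketch, not the edbrowse implementation: the function names are hypothetical, and the detection rule (anything that is not valid utf8 is taken to be iso8859-1) is an assumption made for the example.

```python
def read_text(raw):
    """Decode file bytes; return (text, charset the file was stored in)."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: a file that is not valid utf8 is iso8859-1.
        return raw.decode("iso-8859-1"), "iso-8859-1"

def write_same_file(text, original_charset):
    """Writing back to the same file restores its original charset."""
    return text.encode(original_charset)

def write_new_file(text):
    """A new file is written in the charset of this (utf8) computer."""
    return text.encode("utf-8")

# An older file from a friend, stored as single-byte iso8859-1.
old_file = b"ni\xf1o"
text, charset = read_text(old_file)
edited = text + "s"

print(charset)
print(write_same_file(edited, charset))
print(write_new_file(edited))
```

The same edited text comes out as five bytes when it goes back to the old file, and six bytes when it goes to a new one.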

These conversions should never take place on zip files, or executable files, or other forms of binary data. If you see the words "converting to iso8859" or "converting to utf8", and the file is something other than international text, we have a problem. Don't try to run the converted executable; it won't work.

If your world is utf8, the search function can lead to some confusion. Consider the Spanish word niño, for a boy child. If you search for ni.o, you may not find this line of text. The dot stands for one character, and should match ñ, but this accented letter takes up two bytes. Ironically, you have to search for ni..o, and you will find what you are looking for. Needless to say, this is very confusing.

Search and substitute is performed by the pcre library, and fortunately for us, the latest version supports utf8. I pass pcre an option that tells it to treat certain two-byte sequences as single letters, and it behaves the way you want it to. Searching for ni.o works again. If you want to disable utf8 search and substitute temporarily, use the su8 command.
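You can reproduce the niño puzzle outside of edbrowse. In the Python sketch below, the standard re module stands in for pcre (an assumption made for illustration; edbrowse itself calls the pcre library). Matching against raw bytes mimics a search without utf8 support, and matching against a decoded string mimics a utf8-aware search.

```python
import re

line = "El ni\u00f1o juega"      # "El niño juega"
raw = line.encode("utf-8")       # ñ becomes the two bytes c3 b1

# Byte-oriented search: the dot matches exactly one byte.
one_dot_bytes = re.search(rb"ni.o", raw)
two_dot_bytes = re.search(rb"ni..o", raw)

# Character-oriented search: the dot matches the whole letter.
one_dot_text = re.search(r"ni.o", line)

print(one_dot_bytes is None)
print(two_dot_bytes is not None)
print(one_dot_text.group(0))
```

The first search fails and the second succeeds, which is the confusing behavior described above; the third is what you get once the regular expression engine knows about utf8.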

Some websites offer their contents in multiple languages. For example, twitter.com has an English version, a French version, and so forth. It is possible to select the language when requesting the page. Edbrowse supports this via the localizeweb keyword in its configuration file. For instance, the following entry in .ebrc indicates that you want the French version, when it is available.

localizeweb = fr

Chapter 2, Quick Reference Guide

Quick Reference Guide

Here are the ed and edbrowse commands, all in one place. This is a quick reference guide. Most of these commands will not make sense until you read the rest of the documentation.

q: quit the current session
qt: quit the program completely, whether you've written your files or not
!command: shell escape
p: print the current line
4,7p: print lines 4 through 7
+3p: advance 3 lines and print
+3: same as +3p, print is default command
-: previous line
---: back up three lines
'a,'bp: print a range of lines, marked with labels a and b
kb: mark the current line as b
l: list the current line, showing invisible chars and end markers
eo: end markers off
el: show end markers ^$ when a line is listed
ep: show end markers when a line is listed or printed
lna: expand all nonascii chars into hex when a line is listed (toggle)
n: print the current line with its line number
z22: print the next 22 lines
z: print another 22 lines
=: print the number of lines in the file
X: make this the current line (a no-op)
s/x/y/: replace x with y on the current line
s/x/y: replace x with y and print the result
s//y/: use the last substitution string, in this case x
s/x/%/: use the last replacement string
s: repeat the previous substitute command
s/x/y/2: replace the second instance of x with y on the current line
4,7s/x/y/g: replace all instances of x with y on lines 4 through 7
/x/: look for the line containing x
/x: same as /x/
/x/i: look for the line containing x or X
?x?: look backwards for x
ci: searches and substitutions are case insensitive
cs: searches and substitutions are case sensitive
sg: substitution strings are global across sessions
sl: substitution strings are local to their sessions
su8: search and substitute uses utf8 character sequences (toggle)
lc: convert line to lower case
mc: convert line to mixed case
uc: convert line to upper case
s/x/uc: convert x to X on the current line
h: help, explain the last question mark
f: print the name of the current file
f foo: set the file name to foo
f/: retain only the last component of the filename
e: print the number of the current session
e3: move to session 3
e foo: edit the file named foo
r foo: read the contents of foo into the current buffer
w foo: write the current buffer to foo
w+ foo: append to foo
w/: write to the last component of the filename
d: delete the current line
1,$d: delete all the lines, 1 through eof
D: delete the current line and print the next line
dr: directory is readonly
dw: directory is writable, and d moves files to your trash bin
dx: directory is writable, and d deletes files
hf: show hidden files in directory listing (toggle)
u: undo the last command
i: insert text before the current line, end with a period
c: change the current line, enter a new block of text, end with period
a: add text after the current line, end with a period
a+: include the line you just typed in, when you thought you were in append mode
4,7m11: move lines 4 through 7 to line 11
4,7t11: copy lines 4 through 7 to line 11
3,4j: join lines 3 and 4 together
3,4J: join lines 3 and 4 together with a space between
g/x/ p: print every line that has an x
v/x/ p: print every line that does not have an x
B: find the line with the balancing brace
b: browse the current file, which is assumed to be in html
b foo.html: edit the file foo.html and browse it
b url: fetch url from the internet and browse it
ub: unbrowse a file
ft: show the title of the current web page
fd: show the description of the current web page
fk: show the keywords of the current web page
hr: http redirection (toggle)
js: allow javascript (toggle)
sr: send referrer (toggle)
fma: ftp mode active
fmp: ftp mode passive
rf: refresh the web page or directory listing
et: edit this web page as pure text
vs: verify ssl connections (toggle)
ua3: pretend to be the third user agent in your config file
g: go to the link on the current line
g2: go to the second link on the current line
^: the back key, go back to the web page you were looking at before
i=xyz: set the input field on the current line to xyz
i2=xyz: set the second input field on the current line to xyz
i2*: push the second button on the current line, usually submit or reset
i3?: describe the third input field on the current line
db: set debug level [0-7]
cd: change directory
bl: break line into sentences and phrases
bd: binary detection on files (toggle)
rl: use readline() on input (toggle)
pb: play buffer (audio)
iu: automatically convert between iso8859 and utf8 (toggle)
su8: search/substitute using utf8 (toggle)
ds=source,login,password: set the data source
sht: show tables
shc: show columns (and primary key) for the current table
shf: show foreign keys for the current table
fbc: fetch blob columns (toggle)
sm: send mail [account number]
re: reply to a mail message
rea: reply to all
ip: show referenced ip numbers, usually for saved mail messages
Tips for Avoiding Line Numbers

If you're new to ed, you may find this program awkward. I often receive complaints about line numbers. People hate line numbers. (Remember BASIC and Fortran?) They don't want to read the first page line by line, 1p 2p 3p 4p 5p etc. Well I hate line numbers too, and I never use them. Haven't for years.

If you just want to read the whole document, type ,p. That works, if you use a command line speech adapter. The whole document is in buffer, and you can read through it using the function keys on your adapter. Now I realize most people still use screen readers, so this won't work. Still, there's an easy way to step through screen by screen. Start with 0z24 to get the first 24 lines. Then the z command will give you the next 24, and the next 24, and so on. You may want to use 22, or 23, or whatever makes sense relative to your screen.

Another approach is to simply hit return, again and again, and proceed line by line.

Once you are used to the regular expressions, you can jump to any part of the document, even a large document, in record time simply by searching for a unique text fragment. This comes with practice. Sometimes I guess wrong, and my search string is not unique. I wind up somewhere else and have to search again. This doesn't happen very often. I usually get to the right place in one or two tries.

If you want to mark certain lines of text, please don't try to remember the line numbers. Use the k command to mark them. I usually use ka and kb to mark the start and end of a block of text, while kc marks the new location. The move command is then 'a,'bm'c - with absolutely no line numbers. This is standard ed fare, though most people never take advantage of it.

To look for links on a web page, search for the left brace. Yes, you may stumble across a literal left brace in the text, but this doesn't happen very often. You might access a particular link by typing /{Next}/g. Similarly, you can look for input fields by searching for the less than sign. This will make sense as you read about the representation of web pages below. And of course, multiple operations can be scripted, a feature unique to this browser.

These are just some of the tips and tricks that will make you as fast and efficient as anybody using a screen editor or browser, provided you are familiar with the page. My wife is always amazed at how quickly I can negotiate websites, or edit the common documents that we work on together. However, you will never be faster than your sighted colleague when traveling through unfamiliar territory, no matter what system you use. That is a pipe dream.

Mailing List

There is a mailing list for users of edbrowse and other command line utilities. You can join by sending mail to commandline-subscribe@yahoogroups.com.

Chapter 3, The Editor

Important Deviations From /bin/ed

Certain search/substitute commands may behave differently under this editor. This is because the regular expressions are interpreted by the perl compatible regular expression (pcre) library, rather than the traditional regexp library. Hence regular expressions have more features, and more power, than the regular expressions employed by /bin/ed. The syntax is also somewhat different. For instance, perl uses bare parentheses where ed uses escaped parentheses -- to delimit sections of matched text. And perl uses $1 ... $9 to reference the matched substrings, whereas ed uses \1 ... \9. Also, perl supports the i suffix, for case insensitive search, along with the traditional g suffix for global substitute. There is no reason to describe all the nuances here. Please read the perlre man page `man perlre' for a full description of regular expressions under perl. Once you are accustomed to their power and flexibility, you'll never go back to ed.

Great! You've read the perlre man page, and you're back. Here are a few changes that I've made to perl regular expressions. I have found that ( and ) are almost always meant to be literal, as in searching for myFunction(), so I reverse the sense of escaped parentheses in perl. That is, ( and ) now match the literal characters, and \( and \) are used to demarcate substrings of the matched text. These substrings are then referenced, in the replacement string, by $1 through $9. Similarly, | means a literal |, and \| is alternation. I also change the sense of &, on the right hand side, to mean what it means in ed. I leave ^ $ . [ ] + * ? and {m,n} alone, to be interpreted by perl, as described in the perlre man page. However, if * is the first character, it is treated as a literal star. This makes sense, as there is no previous character to modify. Some versions of ed do this, some don't. I find it convenient; when I want to replace * + or ? I don't have to escape it just because it is a modifier. Similarly, an open bracket by itself is treated as literal. These changes to regular expressions, to look more like ed, may be confusing if you are a perl expert. Sorry about that, but I think these changes make this editor easier to use for everyone, especially the experienced ed users. Below are some additional differences between this program and /bin/ed.
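To make the parenthesis and bar conventions concrete, here is a rough Python sketch that rewrites an edbrowse-style search pattern into conventional perl-style syntax. It is not the translation code inside edbrowse; the function name to_perl_pattern is hypothetical, and the sketch deliberately covers only three of the rules from the paragraph above: swapped ( ) and |, and a literal leading star. The & convention and the lone open bracket are left out.

```python
import re

SWAPPED = "()|"   # characters whose escaped and bare meanings trade places

def to_perl_pattern(pattern):
    """Rewrite an edbrowse-style pattern into perl-style syntax (partial sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( \) \| become the special forms; other escapes pass through.
            out.append(nxt if nxt in SWAPPED else ch + nxt)
            i += 2
        elif ch in SWAPPED:
            out.append("\\" + ch)      # bare ( ) | are literal characters
            i += 1
        elif ch == "*" and i == 0:
            out.append("\\*")          # a leading star has nothing to modify
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(to_perl_pattern("myFunction()"))
print(to_perl_pattern(r"\(cat\|dog\)house"))
match = re.search(to_perl_pattern(r"x = \([0-9]+\)"), "x = 42")
print(match.group(1))
```

Python's re module is used here only to show that the rewritten pattern behaves as expected; note that Python writes the captured substring as group(1) where edbrowse writes $1.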

Lines beginning with # are ignored, making it easier to comment your edbrowse scripts. The # character has no special significance in the middle of a line.
Lines beginning with ! implement a shell escape. The ! character has no special significance in the middle of a line. The ! alone spawns an interactive subshell - type exit to return to edbrowse. The word "ok" is printed when the shell command is finished - thus you can tell when a no-output command is done.
Type `cd dirname' to change directories. The new directory is always printed. Type cd alone to find out where you are. I don't know what happens under dos if you type cd f:/this/that, I never tested it.
Unlike bash, edbrowse does not retrace your steps back through symbolic links. Thus .. is always the physical parent directory.
Environment variables are expanded before the cd command is applied, including the leading ~. Thus cd ~/work takes you to the work directory under your home directory.
This command does not change any filenames that may be active. You can edit foo, cd .., and write, and foo will be copied to the parent directory. That's probably not what you want, so be careful.
r operates on the current line by default, rather than the last line. Use $r to read a file at the end of your working text.
The w+ command appends to the file. Some versions of ed use w> for this operation, but for 40 years > has been the industry standard for write with truncate, so using > for append is somewhat confusing. And w>> is just too clunky, so I use w+.
w/ writes the data into a file whose name is the last component of the current file name. This is useful when you've just downloaded this.that.com/foo/bar/package-2.7.7-22.tar.gz, and you want to write the file locally, but don't want to retype the stuff at the end. Alternatively, f/ changes the filename, keeping only the last component.
Whenever a file is read from or written to disk, $var, in the filename, is replaced with the corresponding environment variable. Thus you can edit your address book at any time via `e $adbook', provided $adbook has been set in your environment. Also, a leading ~/ is replaced with $HOME/, making it easy to edit files in your home directory such as ~/.profile.
Shell meta characters are also expanded, provided the result is one file name. You can read or write a file by typing a minimal portion of its name. Neither $variables nor stars are expanded for files on the command line, as this expansion is already done for you by the Unix shell. Windows users should compile using the setargv.obj utility, which performs wildcard expansion on command line arguments. Thus you should be able to edit *.c in any operating system and get all the C source files in the current directory.
Many versions of ed place a $ at the end of a listed line, but this is not one of them, at least not by default. I use a linear speech adapter, rather than a screen reader, so the embedded newlines tell me exactly where the line boundaries are. The extraneous $ character just gets in my way. However, I realize most people still use screen readers, where trailing whitespace is indistinguishable from the blank screen, and a wrapped fragment is sometimes mistaken for a second line. Therefore, you can use the command `el' to place end markers around listed lines. Listed lines begin with ^ and end with $. Enter `ep' to place end markers around all printed lines. Use `eo' to turn end markers off.
q quits without a warning message if the text has never been associated with a file.
Capital Q does not quit the editor absolutely. This is because I often hit caps lock by mistake, or even shift q by mistake, and if I've forgotten about some important changes that I've made, those changes are gone! I know, this seems contrived, like it would never happen, but it has happened to me many times, so I disabled capital Q. Type qt to quit absolute.
Capital J joins lines together with spaces between them.
x (encryption) is not implemented.
P (prompt) is not implemented.
Missing line numbers before or after the comma are assumed to be 1 and $. This is consistent with ,p -- to print the entire file.
You cannot enter one command across two physical lines by putting a backslash at the end of the first line. And there's no need to in any case, because perl supports \n translation. To split a line in the middle of the word doghouse, you would type:
s/doghouse/dog-\nhouse/
Only the first 500 characters of a line are displayed. The rest of the line is in the buffer, and can even be modified via a substitute command, but if you want to see it, you will need to split it, as in the doghouse example above. You can change the number of characters displayed by setting the linelength parameter in your config file.
a+ adds text, like a, but also adds the line you last typed, when you thought you were in append mode, but you weren't.
This program is less tolerant of whitespace than /bin/ed.
57 , 63 p will not fly.
A single % on the right hand side of a substitution is replaced with the last right hand side. Some versions of ed do this, some don't; I find it a convenient feature.
s, is shorthand for s/, +/,\n This is used to split lines at phrase boundaries. You can also use s. to split a line after the first period -- at a sentence boundary. s; s: s) and s" can also be used. s,3 splits the line after the third comma. You might need to use s.2 if the sentence begins with Mr. Flintstone.
Type s by itself for s//%.
The commands sg and sl make the remembered substitution and replacement strings global and local respectively. If you want to look at all instances of "foo" in all the files in the current directory, and change some of them to bar at your discretion, edit *, then enter sg to make substitution strings global to all edit sessions. In the first session, search for foo, and replace some of them with bar. Type e2 to move to the next session, whence you can search using slash alone, because the string "foo" is applied to all sessions. Similarly, you can use % to refer to "bar". The sl command returns this editor to its local behavior, where each file has its own search/replace strings.
Errors associated with reading or writing files, or switching sessions, are always printed. Other errors elicit the usual question mark, whence you must type h to read the explanation. Type capital H if you always want to see the error messages.
In most versions of ed, the command z7 means .,+6p, making the current line +7. I think this is inconsistent, having one and only one ed command that leaves dot somewhere other than the last line printed. The confusion is compounded when z prints the last lines in the file, whence dot actually is the last line printed. So I have changed the z command slightly. In this program z7 means +,+7p, and the current line becomes the last line printed, just like the other commands you know and love. Without a number, z prints the previous number of lines. Thus you can read your file a chunk or screen at a time.
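The revised z command in the last item is easier to follow with a small model. The Python sketch below is not taken from edbrowse; the class name Pager is hypothetical, and the starting chunk size of 22 is an assumption. It implements only the rule stated above: z with a count prints that many lines starting after dot, dot becomes the last line printed, and z alone reuses the previous count.

```python
class Pager:
    """Toy model of the edbrowse z command (a sketch, not the real code)."""

    def __init__(self, lines, chunk=22):
        self.lines = lines
        self.dot = 0          # current line, 1-based; 0 means before line 1
        self.chunk = chunk    # assumed starting chunk size

    def z(self, count=None):
        """Print the next chunk (+,+count p) and leave dot on its last line."""
        if count is not None:
            self.chunk = count
        first = self.dot + 1
        last = min(self.dot + self.chunk, len(self.lines))
        shown = self.lines[first - 1:last]
        if shown:
            self.dot = last
        return shown

doc = ["line %d" % n for n in range(1, 11)]
pager = Pager(doc)
print(pager.z(4), pager.dot)   # like 0z4: lines 1 through 4
print(pager.z(), pager.dot)    # z alone: the next 4 lines
print(pager.z(), pager.dot)    # the final, shorter chunk
```

After every call, dot sits on the last line printed, so repeated z commands walk through the file without gaps or overlap.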

Subsequent sections describe new and interesting features, completely foreign to ed. These include the simultaneous edit of multiple files similar to emacs and vi, and the ability to browse an html file and edit its fill-out form. That's why I wrote the program in the first place.

Balancing Braces

The capital B command is of interest to programmers, and will probably not be used by casual home users. It locates the line with the balancing brace, parenthesis, or bracket. Consider the following code fragment.

    if(x == 3 &&
    y == 7) {
        printf("hello\n");
    } else {
        printf("world\n");
        exit(1);
    }

The capital B command, on either the second or the last line, moves to the middle line "} else {", because that balances the open brace. On the first line, B moves to the second line, which balances the open parenthesis. The second line balances {, rather than ), because braces have precedence over parentheses, which have precedence over brackets. You can force a parenthesis match by typing B), which moves from line 2 back to line 1.

The B command on the else line is ambiguous - I don't know whether to look backwards or forwards. You must type B{ or B}.

You can explicitly balance <>, as in multiline html tags, or `', used in some preprocessors such as m4.

Comments or literal strings that contain balancing punctuation marks will definitely throw edbrowse off the track. If you are the author of the source, you might want to avoid braces in comments, or use comments to keep braces in balance.

static char openstring[] = "{block"; /* closing } is found elsewhere */
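For the curious, here is one way the balancing search could work, written as a Python sketch. It is my own reconstruction from the behavior described in this section, not the edbrowse source; the names balance, unmatched, and scan are hypothetical, and line indexes are 0-based. Like the B command as described, it makes no attempt to skip comments or string literals.

```python
PAIRS = [("{", "}"), ("(", ")"), ("[", "]")]   # precedence order from the text

def unmatched(line, opener, closer):
    """Return (openers left unclosed, closers left unopened) on one line."""
    opens = closes = 0
    for ch in line:
        if ch == opener:
            opens += 1
        elif ch == closer:
            if opens:
                opens -= 1
            else:
                closes += 1
    return opens, closes

def scan(lines, index, opener, closer, forward, depth):
    """Walk away from lines[index] until depth returns to zero."""
    step = 1 if forward else -1
    up, down = (opener, closer) if forward else (closer, opener)
    i = index + step
    while 0 <= i < len(lines):
        for ch in (lines[i] if forward else lines[i][::-1]):
            if ch == up:
                depth += 1
            elif ch == down:
                depth -= 1
                if depth == 0:
                    return i
        i += step
    return None

def balance(lines, index, want=None):
    """Index of the balancing line, or None if ambiguous or not found."""
    for opener, closer in PAIRS:
        opens, closes = unmatched(lines[index], opener, closer)
        if want is not None:
            if want not in (opener, closer):
                continue
            forward = want == opener
            depth = opens if forward else closes
        elif opens and closes:
            return None                 # ambiguous, as on "} else {"
        elif not opens and not closes:
            continue
        else:
            forward, depth = bool(opens), opens or closes
        return scan(lines, index, opener, closer, forward, depth) if depth else None
    return None

code = ["if(x == 3 &&", "y == 7) {", 'printf("hello\\n");', "} else {",
        'printf("world\\n");', "exit(1);", "}"]
print(balance(code, 1), balance(code, 0), balance(code, 1, ")"), balance(code, 3))
```

On the sample fragment this reproduces the moves described earlier: from the second line or the last line to the else line, from the first line to the second, back to the first with an explicit ), and no answer on the else line unless you say which brace you mean.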

Context Switch

This program allows you to edit multiple files at the same time, and transfer text between them. This is similar to the world of virtual terminals (Linux), where you switch between sessions via alt-f1 through alt-f6. In this case you switch to a different editing session via the commands e1 through e6. Note that `e 2' edits a file whose name is "2", whereas `e2' (without the space) switches to session 2. Similarly, you can read the contents of session 3 into the current buffer via r3, and you can write the current buffer into session 5 via w5. The latter command will produce a warning if session five already exists, and you have made changes to its text, but have not saved those changes. In other words, you are about to lose your edits in session 5. Typing h will produce the explanation: "Expecting `w' on session 5".

If you quit a session you are moved to the next valid editing session, wrapping around to session 1 if necessary. The program exits when the last session quits.

Warning, the program contains a bug regarding the undo command. If you switch to another session, then switch back, you cannot undo your last edit. You'd think this would be easy to fix, but it is trickier than it seems, so I haven't gotten around to it. I just wanted you to know. Make sure everything is copacetic before you switch to another session.

Let's run through a cut&paste example. You are editing file foo in session 1, and you realize that a paragraph from file bar would fit perfectly right here. Here is how it might look. Lines beginning with < are the user's input, and lines beginning with > form the program's responses. The # sign delimits my injected comments, which would not normally appear in the middle of a line.

< e2   # switch to session 2
> new session
#  Unlike ed, the r command does not establish a file name, even if the
#  buffer is empty.
#  Thus "r bar" is safer than "e bar".
#  The text is not linked to the file bar,
#  and we cannot accidentally corrupt this file.
#  After all, we don't want to change bar, we just want to steal from it.
< r bar
> 28719
< /start/
> This is the start of the cool paragraph that you want to copy.
< 1,-d  # don't need the stuff before it
< /end/
> This is the end of the cool paragraph that you want to copy.
< +,$d  # don't need the stuff after it
< e1
> foo
< r2
> 3279  # size of text read from session 2
< q2  # clean house, get rid of session 2
< w  # write foo, with the new paragraph included
> 62121

The following moves the data from one file to another.

< e2
> new session
< e bar  # this time I'm going to change bar
> 28719
< /start/
> This is the start of the cool paragraph that you want to move.
< ka  # mark the paragraph
< /end/
> This is the end of the cool paragraph that you want to move.
< kb
< 'a,'bw3
> 3279
< 'a,'bd
< w  # write bar, without the cool paragraph
> 25440
< q
> no file  # now in session 3
< e1
> foo  # back to session 1
< r3
> 3279
< q3  # quit session 3 remotely, while still in session 1
< w  # write foo, with the new paragraph included
> 62121

An e command, by itself, tells you the current session, in case you've forgotten. This is similar to f, by itself, which tells you the current file.

Usage

Type `edbrowse -h' to produce the usage message. You will see the -f, -fm, and -m options used in several different ways; just ignore them for now. These three options cause edbrowse to act as a mail retriever or interactive mail client. This will be discussed later.

The -dx option sets the debug level to x, where x is between 0 and 9. The default is -d1, which prints the sizes of files as they are written and read. Some people like -d2, which prints the URLs as you jump to new web pages or submit forms online. Unless you are debugging the program, you probably don't want to go any higher than -d3. On rare occasions you might want to set -d4, to see the http headers in and out. Remember, the debug level can be changed on the fly by using the dbx command (x between 0 and 9).

The -e option causes edbrowse to exit when it encounters an error. This is usually used by batch scripts. If there is a problem, you don't want to march on, executing the rest of the edbrowse commands. Note that set -e has the same effect in a bash script.

Use -c to suppress processing of, and edit, the .ebrc configuration file. This config file will be described later. And why would you want to do this? Suppose you have made a change to this file, and thereby produced a syntax error, so that edbrowse cannot even get started. Now you can't use edbrowse to fix your config file. Of course you could rename the config file to something else, fix it, and put it back; but then you might discover another syntax error, and so on. Instead, use the -c option to edit the config file directly. It is automatically loaded into buffer 1. Note that -c must be the first option.

The arguments are the files to edit. Edbrowse reads these files into corresponding sessions and starts you off in session 1. If there are no arguments, you start in session 1, but there is no text and no associated file.

If you like this program, and you want it to be your primary editor, you can set the following Unix alias.

alias e="/usr/local/bin/edbrowse"

If you do this, you can use `e filename' to edit a new file, whether you are inside edbrowse or at the shell prompt. Very convenient.

Binary Characters

At all times, even when entering a file name, this program scans its input for binary codes. Use the three character sequence ~bd to enter the nonascii character 0xbd, which is the code for 1/2. Similarly, if you list a line with lna active, the 1/2 character is displayed as ~bd. All nonascii and most control characters are entered and displayed in this manner. Tab and newline must be entered directly from the keyboard. Tab and backspace are displayed as > and < respectively. If the following line is entered,

Hello~07 ~x is ~bd of y

And then listed, you will see the very same text, but there is a bell and a 1/2 character inside. The ~x is not encoded into anything, because x is not a hex digit. If you want to force a ~, even though there are hex digits following, use two tildes, ~~.

Be careful however; the above example assumes iso8859, and as mentioned earlier, utf8 is the new standard. The control g bell is still a single byte, but 1/2 is a two-byte sequence, and must be entered as ~c2~bd. Of course characters are not usually entered in this awkward manner. There are easier ways to enter accented letters into your document, depending on your native language and the configuration of your system.
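The ~xx convention described above can be sketched in a few lines of Python. This is only an illustration, not the code edbrowse uses; the function name decode_tilde is made up, and two details are assumptions: that upper case hex digits are accepted, and that ~~ always collapses to a single tilde.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"  # accepting upper case is an assumption


def decode_tilde(raw: bytes) -> bytes:
    """Turn ~xx escapes into single bytes; ~~ yields a literal tilde."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i:i + 1] != b"~":
            out.append(raw[i])          # ordinary byte, copy it through
            i += 1
        elif raw[i + 1:i + 2] == b"~":
            out += b"~"                 # ~~ forces a literal tilde
            i += 2
        elif (len(raw) >= i + 3
              and raw[i + 1] in HEX_DIGITS
              and raw[i + 2] in HEX_DIGITS):
            out.append(int(raw[i + 1:i + 3], 16))  # ~bd becomes byte 0xbd
            i += 3
        else:
            out += b"~"                 # ~x is left alone, x is not hex
            i += 1
    return bytes(out)
```

Feeding it the sample line gives a bell byte and a single 0xbd byte, while ~c2~bd gives the two-byte utf8 form of 1/2.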

When you are entering a regular expression, you have the choice, hex or octal. This program converts ~xx, as a hex value, and the perl regexp machinery converts \nnn, as octal. Thus any of the following will undos a file, i.e. strip the carriage return from the end of each line. The first is translated by my software, the second and third by perl regular expressions.

,s/~0d$//

,s/\15$//

,s/\r$//

Embedded escape characters are always displayed in hex, whether the line is listed or not. Most terminals and terminal emulators, including the Linux console, interpret various escape sequences as control commands. Thus an errant escape sequence from a binary file could send your terminal into an unexpected state, making recovery difficult. At times I have had to log out and log in again, to reset the tty. Thus it seems prudent to render escapes as ascii characters all the time. If you have no idea where that ~1b came from, it's probably a literal escape character in your file.

Returns and nulls are also converted into hex all the time. Thus an embedded return will not make one line look like two lines. You will usually see this when importing a dos or Windows text file. Every line ends in ~0d. Issue one of the three commands shown above to undos the file.

Binary Files

Data is considered binary if it is sufficiently large (more than 50 bytes) and it contains a significant fraction of non-ascii or null characters (more than 25%). International text may contain scattered binary codes for accented letters, but most of the characters should still be ascii. Therefore binary data is not international text. In fact you probably won't be able to display or edit binary data effectively, at least not by this program. But hey, don't let that stop you. As an exercise, create an executable program that prints "hello world", then edit the executable using this editor. Look for the string "hello world" and replace world with jorld. Write the file and run the executable. You should now see "hello jorld".
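The binary test can be written down as a short Python sketch. The two thresholds (50 bytes, 25%) come from the paragraph above; the name looks_binary and the exact way bytes are counted are assumptions, and the real program may differ in the details.

```python
def looks_binary(data: bytes) -> bool:
    """Guess whether data is binary, using the thresholds given in the text."""
    if len(data) <= 50:
        return False                    # too small to judge, treat as text
    # Count null bytes and bytes outside the ascii range.
    suspicious = sum(1 for b in data if b == 0 or b >= 0x80)
    return suspicious > len(data) / 4   # more than 25% looks binary
```

A Latin-1 paragraph with an accented letter every few words stays well under 25% and is treated as text, whereas text in an Asian encoding is mostly high bytes and trips the test, which is why the bd command exists.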

When binary data is first read into the buffer, you will see the words "binary data". After that the buffer remains "binary", even if you delete all the data and read in ascii text. You must use the `e' command to get a fresh text buffer.

For the most part it doesn't really matter if the data is considered binary or text. Either way you can display and edit the data, and write it to a file.

This program tries to "do the right thing" under DOS/Windows. That is, it converts crlf to and from newline if it believes the file is text; and it leaves binary data alone. These distinctions are not relevant on Unix/Linux.

Although this approach is satisfactory for English and most European languages, it fails miserably for Asian languages, which definitely look like binary data. You can disable binary detection by entering the `bd' command. If you speak an Asian language, you may want to put this command in your init script, so edbrowse comes up the way you want - treating your international files as text files.

If you speak an Asian language, and you are running Windows, and binary detection is disabled, don't use this program to manipulate binary files, as they will get corrupted!

Directory Scan, File Manager

If you edit a directory you will see a list of all the visible files in that directory, in alphabetical order. Use the `hf' option to see the hidden files as well. Type g to go to one of these files or sub directories. Type ^ to return to the parent directory. This is consistent with the browser, where g is the go command and ^ is the back key; more on this later. Thus you can traverse an entire directory tree as though you were inside a file manager.

Like `ls -F', a subdirectory is indicated by a trailing slash. This slash is not part of the filename. Similarly, a named pipe is indicated by |, a symbolic link by @, a block special file by *, a character special file by <, and a socket by ^. If a regular file ends in one of these characters, it may confuse you, but it won't confuse this program. Edbrowse knows whether that trailing | is part of the filename or a pipe indicator. Since each file is represented by a single line of text, files with newlines embedded in their names cannot be accessed.
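The mapping from file type to trailing character can be sketched with Python's stat module. The characters are the ones listed above; the function name type_suffix is hypothetical, and this is not how edbrowse itself is written.

```python
import stat


def type_suffix(mode: int) -> str:
    """Return the indicator character for a file mode, or '' for a regular file."""
    if stat.S_ISDIR(mode):
        return "/"      # subdirectory
    if stat.S_ISFIFO(mode):
        return "|"      # named pipe
    if stat.S_ISLNK(mode):
        return "@"      # symbolic link
    if stat.S_ISBLK(mode):
        return "*"      # block special
    if stat.S_ISCHR(mode):
        return "<"      # character special
    if stat.S_ISSOCK(mode):
        return "^"      # socket
    return ""
```

To see a symbolic link as a link, the mode would have to come from os.lstat(path).st_mode rather than os.stat, which follows the link.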

If you read a directory into a preexisting file it is just text. You can't visit any of the underlying files, because they are just words. You must edit a directory in its own session or read a directory into an empty session if you want to access the underlying files. Note that you can write the buffer to another editing session, and in that session the words are just words. This distinction is important as we start to edit the text.

By default, directories are readonly. If you try to delete a line, and hence the associated file, it will tell you that you are still in directory read mode. I'm trying to save you from yourself! Type dw to enable directory writes, and dr to make directories readonly again.

When directory writes are enabled, you can remove files using the d command. For instance, g/\.o$/d removes all the object files. Since these edits have implications outside the scope of this program, there is no undo capability. When you make a change it is made. With this in mind, I borrowed a good idea from Microsoft / Apple. The deleted file isn't actually deleted; it is moved to your trash bin, located in $HOME/.Trash. This is consistent with the Mac and many versions of Linux. So if you accidentally type ,d and remove all your files, you can recover them from your trash bin. You may want to set up a cron job that removes all the files from your trash bin once a week. This directory is created mode 700, so nobody else can look at your deleted files. If you create this directory yourself, please make it 700. After all, some of your deleted files might be private.

Because this operation is a move, rather than a true delete, there are a few restrictions based on your operating system. If your OS can move directories, this program will be able to delete a subdirectory as easily as a file. The entire subtree is moved to your trash bin. Make sure your cleanup cron job is capable of removing directory trees, not just files.

If the trash bin is on another file system, the file is copied, rather than moved. It's practically the same; though the file will have your permissions, and a current time stamp. Directories cannot be copied in this way. You must copy the directory tree yourself, then delete it, using cp -r and rm -r.
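Here is a rough Python sketch of the move-to-trash behavior described in the last few paragraphs: create the trash bin mode 700, try a rename, and fall back to copy-and-delete for plain files when the trash bin is on another file system. The function name move_to_trash and the refusal to overwrite a same-named file already in the trash are assumptions; the text does not say how edbrowse handles such a collision.

```python
import errno
import os
import shutil


def move_to_trash(path, trash_dir=None):
    """Move path into the trash bin and return its new location."""
    if trash_dir is None:
        trash_dir = os.path.join(os.path.expanduser("~"), ".Trash")
    os.makedirs(trash_dir, mode=0o700, exist_ok=True)   # private to the owner
    target = os.path.join(trash_dir, os.path.basename(os.path.normpath(path)))
    if os.path.lexists(target):
        raise FileExistsError(target)   # assumption: never clobber trash
    try:
        os.rename(path, target)         # works for files and whole subtrees
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise
        # Trash bin is on another file system: only plain files are copied.
        if os.path.isdir(path):
            raise IsADirectoryError(path) from err
        shutil.copyfile(path, target)   # new owner, current time stamp
        os.unlink(path)
    return target
```

The copy in the fallback branch is why the trashed file ends up with your permissions and a fresh time stamp, and why a directory on another file system has to be handled by hand.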

Note that the dx command, wherein files are truly deleted, is the only way to free up space on the disk. Symbolic links and special files are always deleted; there isn't much point in moving a link to the trash bin.

"What's the point of all this?" you may ask. "What's wrong with the shell?"

Nothing, as long as the file names are small and familiar. But sometimes the file names are long and cumbersome, and it is nearly impossible to type those names into the shell, character for character, upper and lower case, with no mistakes. Meta characters such as the * can help, but only when the file you want has a name radically different from the other files in the directory. This isn't always the case. Suppose an application generates log files as follows.

ProgramFooBar.-04-04-1998.06:31:59.log
ProgramFooBar.-04-11-1998.11:37:14.log
ProgramFooBar.-04-18-1998.16:22:51.log

How do you delete the old ones and keep the most recent, or rename them to something more manageable? Stars are a bit risky; you can access multiple files without realizing it. And we're not even talking about those pesky files with spaces or invisible control characters in their names. Our sighted friend calls up his file manager and simply clicks on the file he wants to view or edit or remove. Sometimes I want/need that kind of power.

When the substitute command changes text, it renames the underlying file. This won't move the file on top of another existing file, so you can't lose any data this way. Once again I am saving you from yourself.

The search and substitute commands ignore the trailing filetype characters. If you want to rename a directory from foo/ to foobar/, you can type s/$/bar/. The bar will be placed at the end of the word foo, because the trailing / isn't really there.

Now suppose you want to run an arbitrary program on some of these files. This could be a print utility, a compiler, whatever. Sometimes you can rename the files for your convenience, then work in the shell. But sometimes you don't own the files, and sometimes they must retain their original names. This happens when several html documents reference each other through hyperlinks, using their existing filenames. So you can't rename the files, yet you still want to run your program on one or two of them.

You can run any program on any file without retyping that filename via the shell escape. Use kx to assign the label x to the file you are interested in. This is standard ed syntax. Then run !program 'x to invoke your program on that file. This sounds involved, but it is merely macro substitution, implemented in a few lines of code. If 'x is present in a shell escape, and is not next to any letters or digits, I replace it with the text on the line labeled x. Thus if your filename contains spaces, you'd better run !program "'x", to make sure the entire file name is one argument to the running program.

The token '. is replaced with the text on the current line, and the token '_ is replaced with the current filename. If you try to write a file, and remember that you left it readonly, you can make it writable via !chmod +w '_, then write the text to the file.

You can expand multiple tokens in one shell command. Use kx and ky to mark two files that you want to compare, then run !diff 'x 'y.
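The macro substitution behind 'x, '. and '_ can be sketched with one regular expression. This is an illustration in Python, not the actual edbrowse code; the name expand_tokens and its arguments are made up, and leaving an unset label untouched is an assumption.

```python
import re

# A quote followed by a label, a dot, or an underscore,
# with no letter or digit on either side.
TOKEN = re.compile(r"(?<![A-Za-z0-9])'([a-z._])(?![A-Za-z0-9])")


def expand_tokens(command, labels, current_line, filename):
    """Expand 'x, '. and '_ in a shell escape before it is run."""
    def substitute(match):
        key = match.group(1)
        if key == ".":
            return current_line         # text of the current line
        if key == "_":
            return filename             # current file name
        # Unset labels are left as typed (assumption).
        return labels.get(key, match.group(0))
    return TOKEN.sub(substitute, command)
```

With labels x and y set, expand_tokens("diff 'x 'y", ...) yields the diff command on the two marked files, while the apostrophe in a word like don't is next to a letter and is left alone.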

This feature is not limited to directory scans. You may be editing a simple file, but you can still paste the contents of a line into your shell command. Offhand I don't know why you'd want to do this, but you can.

Upper/Lower Case

The `lc' command converts a line to lower case, and `uc' converts it to upper case. Perl users will recognize these directives. As an extension, `mc' converts to mixed case, capitalizing the first letter of each word, and the d in mcdonald.
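As a rough idea of what mixed case means, here is a small Python sketch. The function name mixed_case is hypothetical, and generalizing the mcdonald rule to every word that starts with "mc" is an assumption; the text only promises the d in mcdonald.

```python
import re

# A word is a run of letters, possibly with internal apostrophes.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def _mix(match):
    word = match.group(0).lower()
    word = word[0].upper() + word[1:]           # capitalize the first letter
    if word.startswith("Mc") and len(word) > 2:
        word = "Mc" + word[2].upper() + word[3:]  # McDonald, not Mcdonald
    return word


def mixed_case(line: str) -> str:
    """Capitalize each word, lower-casing everything else."""
    return WORD.sub(_mix, line)
```

Digits, dots and other punctuation pass through unchanged, so a file name such as README.TXT would come out as Readme.Txt under this sketch.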

This is especially useful in a directory scan. The last thing a blind person wants to worry about is whether some of the letters in a file name are upper case. If directory write mode is enabled, type ,lc to convert all the file names to lower case. It's that simple.

If you want to upcase a particular word, type s/word/uc/. This converts the word to upper case. All the other substitution suffixes apply. To change foo, Foo, FOo, and FOO to FOO, everywhere, type ,s/\bfoo\b/uc/ig.

Break Line

The `bl' command breaks the current line into sentences and phrases, each about 70 characters long. It also compresses white space and strips white space from the end of the line. If the line contains return characters, these are turned into line separators - places where the line will definitely be cut. The only white space that is preserved is the tabs or spaces at the beginning of the line, or after each return character. This is a modest attempt to keep indented text indented, if that makes any sense.
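The core of this idea, cutting at punctuation and packing phrases up to about 70 characters, can be sketched in Python. This is a simplified illustration under stated assumptions, not the bl algorithm itself: the name break_line is made up, the set of punctuation marks is a guess, and the sketch ignores return characters and leading indentation, which the real command handles.

```python
import re


def break_line(text: str, width: int = 70) -> list:
    """Split text at punctuation and pack the phrases into short lines."""
    text = re.sub(r"\s+", " ", text).strip()    # compress white space
    if not text:
        return []
    # Cut after punctuation that is followed by a space.
    phrases = re.split(r"(?<=[.!?,;:]) ", text)
    lines, current = [], ""
    for phrase in phrases:
        if current and len(current) + 1 + len(phrase) > width:
            lines.append(current)               # line is full, start another
            current = phrase
        else:
            current = f"{current} {phrase}" if current else phrase
    lines.append(current)
    return lines
```

A phrase longer than the width is kept whole rather than cut in the middle, since the goal is semantic units, not a fixed screen width.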

I use this feature in two different ways. If I am familiar with the document, (I probably wrote it), I may use the bl command on a line of text that seems rather long. I typed it in quickly, as an uninterrupted thought, and now I want to break it up. But I don't want to count punctuation marks and say, "I think we need a break after the third comma and the period following that and then at the next comma", issuing the s punctuation commands along the way. Oh I like the s commands well enough - they put you in complete control - but it's easier to type bl - and bl usually does the right thing. Also, bl compresses accidental double spaces, a typo that I will never hear if I simply read the line as a whole.

When the document comes in from the outside, usually from another word processor such as MS-Word, bl serves a completely different function. Paragraphs are often stored on a single physical line. Sometimes the entire document is on a single line, with return characters, \r, separating paragraphs. Wysiwyg word processors don't worry about separating sentences and phrases - that's what word wrap is for. Well - bl is my version of word wrap. It doesn't try to conform to any screen; it merely cuts the text into manageable chunks, each piece a separate semantic unit. When bl is issued, physical lines will contain sentences or phrases, as delimited by punctuation, or by the newline/return characters embedded in the original document.

If one of the original lines, delimited by newline or return, is long, i.e. more than 120 characters, it is assumed to be a self-contained paragraph, and a blank line is added before and after. Thus a disassembled paragraph containing 20 sentences does not simply flow into the next disassembled paragraph containing 18 more sentences. An empty line separates the two paragraphs. This is only applicable if bl is applied to a range of lines, or the entire document, as might occur when making an outside document readable.

Don't apply the bl command to a preformatted section, such as a table or ascii art. If you're not sure what to expect, i.e. you didn't write the file, scan through it first, and apply bl to the range of lines that actually represents text. Often this is the entire document (,bl). The following commands do a pretty good job of cleaning up a typical Microsoft Word document.

e whatever.doc or whatever.wps
# change filename, so you don't accidentally overwrite the microsoft document
f _
,s/[~80-~ff~00-~0c~0e-~1f]//g  # strip out non ascii control/formatting codes
g/^\s*$/d  # these blank lines used to contain non ascii codes
,bl  # break lines and paragraphs
1,20p  # first couple lines are often garbage, but then the text begins.

Of course the program catdoc does a better job of converting word documents into text. This is often bundled with xls2csv. These are must-have programs for people who want a command line environment.

Race Conditions

Suppose you are writing a file, and edbrowse truncates the existing file, then the computer crashes before edbrowse can write the new data. When you bring your computer back to life, your file is empty, zero bytes, and all your work is lost.

This is a narrow window to be sure; the computer has to fail at precisely the wrong millisecond. To guard against this improbable calamity, some editors write your data to a temp file, remove the true file, and move the temp file over to the true file. This way your data cannot be lost. Either the new or the old file will survive.
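For comparison, the temp-file technique used by those other editors looks roughly like this in Python. It is a sketch of the general pattern, not something edbrowse does; the name write_via_temp is hypothetical. Note that os.replace renames the temp file over the true file in one step, rather than removing the true file first.

```python
import os
import tempfile


def write_via_temp(path, data: bytes) -> None:
    """Write data to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same file system as the target.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())   # push the bytes to disk first
        os.replace(tmp, path)           # old or new file survives a crash
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)              # do not leave debris behind
        raise
```

If path is a symbolic link, the rename replaces the link itself with a regular file and leaves the file it pointed to unchanged.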

Then links came on the scene, hard links, and then symbolic links. Authors of ed, and other editors, had to scramble. You can't remove a link, write to temp, and move the temp file over to the link. It isn't a link any more, it's a regular file, and your filesystem is not what it used to be. For one thing, the true file, pointed to by the (symbolic) link, has not been changed at all. This is not what you want! So people rewrote their editors to disable this feature if the named file is a link to some other file. They had to revert to the old truncate and write paradigm, and hope that nothing bad happens in between. And you know what, it never does. The window is just too small.

With this in mind, edbrowse doesn't mess with temp files at all. I just don't bother. I truncate the file and write out the data, and I don't expect anything to go wrong during the critical millisecond.

Another race condition is more subtle. Suppose you are editing a file and your friend, or a system program, edits the same file. Your file has actually been changed out from under you, while you held it in memory. When you go to write your changes, they will clobber any changes made by your friend, or the system utility. Most text editors guard against this by watching the timestamp. When you first edit the file foo, an editor might remember the timestamp on foo. Then, when you are ready to write your changes, it checks the timestamp, and if foo has been updated in the interim, it issues a warning message. "File has been updated by someone else - do you really want to write?"
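The timestamp check that such editors perform amounts to very little code. Here is a minimal Python sketch; the function names are hypothetical, and this shows the general technique rather than any particular editor's implementation.

```python
import os


def remember_stamp(path) -> int:
    """Record the modification time when the file is first read."""
    return os.stat(path).st_mtime_ns


def changed_since(path, stamp: int) -> bool:
    """True if the file was modified after the stamp was taken."""
    return os.stat(path).st_mtime_ns != stamp
```

An editor would call remember_stamp when it loads the file and again after each successful write, and call changed_since just before writing to decide whether to issue the warning.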

This is a good feature, but edbrowse doesn't have it, simply because I haven't gotten round to writing it. I'm the only user on my PC, and you're probably the only user on your PC too, so this feature is not in high demand. Still, I should implement it some day.

Chapter 4, Web Browser

Accessing A URL

Instead of invoking `e filename', you can invoke `e http://this.that.com/file.html', and the editor will retrieve the named file using the http protocol. The source (i.e. raw html) is made available for edit. You can modify it or save it on your local machine. Because the text was retrieved from another machine, it cannot be written back to that machine, hence the `w' command will not work. You must specify a local file `w myfile.html', or another editing session `w3'.

Note that this is not browsing, we are simply retrieving text from another machine and editing it locally. The text need not be html, it could be (for instance) a plain ascii document. Many people, myself included, put various types of files, even executables, on their websites for retrieval. Of course you wouldn't want to edit a binary file, but you can still use this editor to retrieve the file and save it locally, thus implementing an http download.

While inside the editor, you can type `e URL' to leave the current buffer and retrieve text from a remote machine. Or you can type `r URL' to retrieve remote text and add it to the current buffer. There is no `w URL' command, because the http protocol does not allow you to write html source back to a remote machine.

As a convenience, any filename with two or more embedded dots and a standard suffix (such as .com or .net) is treated as a URL. You can usually omit the http:// prefix. Try invoking `e www.space.com', as an example. But again, you are looking at html source, which probably isn't what you want. Browsing will be discussed later.
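The filename test can be approximated as follows. This Python sketch only illustrates the rule as stated; the name looks_like_url is made up, the suffix list beyond .com and .net is an assumption, and the real program may apply further checks.

```python
# Only .com and .net are named in the text; the rest are illustrative.
STANDARD_SUFFIXES = (".com", ".net", ".org", ".edu", ".gov")


def looks_like_url(name: str) -> bool:
    """Guess whether a file name should be fetched as a URL."""
    if "://" in name:
        return True                     # explicit protocol, no guessing
    host = name.split("/", 1)[0]        # ignore any path after the host
    return host.count(".") >= 2 and host.lower().endswith(STANDARD_SUFFIXES)
```

Under this rule www.space.com is a URL, but notes.tar.gz is an ordinary file, since .gz is not a standard suffix.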

Whenever you retrieve data from a URL, the editor, directed by the http protocol, might change the filename out from under you. This is because the resource has moved and the original computer was kind enough to give you the new address. If debugging is set to 2 or higher, you might see a series of three or four different URLs as the editor is redirected across the internet. Finally it retrieves your document, and the current file name holds the correct and latest URL. You might want to update your bookmark file accordingly. Then again, you might not. Sometimes the initial url is the "public" location of the web page, and subsequent redirections occur inside the company. In this case you'll want to retain the public url, which will always work, even if the company relocates its web server. Use your best judgment.

Browse Mode

If the editor contains html text, from any source, even html that you wrote yourself, you can type `b' to activate browse mode. The command will be rejected only if the buffer is lacking in common html tags, or the editor is already in browse mode. You can force its hand by adding <html> at the top, or any other tag we recognize - it will always try to convert such a file. Now the transformed text is readable, without any visible html tags. In other words, <P> has been turned into a paragraph break, <OL> has become an ordered list, and so on. The filename is also changed; a .browse suffix has been appended. If you write the transformed data, deliberately or accidentally, the reformatted text will be saved in a new file, whatever.html.browse, without disturbing the original html. This protects you if you are developing your own web pages. BTW, I believe blind people should write raw html, rather than wielding a wysiwyg web development tool such as Front Page. In fact I write all my documents in html, even short business letters. I can create headings, lists, tables, etc, without using a wysiwyg editor or a screen reader. This excellent tutorial will get you started.

When the browse conversion is executed, the system checks for common syntax errors, such as a numbered list that is never closed. If the file name is a URL, these syntax errors are not reported. After all, it's not your web page, and there's nothing you can do about it. However, if the web page is yours, as indicated by a local filename, the first syntax error is displayed, whence you can return to the html source and fix it. Type `ub' to undo the browse conversion. This takes you back to the raw html text under its original filename. Now you can correct the error and try the `b' command again. For your convenience, the label 'e is set to the line containing the error. Repeat this process until `b' runs without errors.

If you try to quit, and the editor says "expecting `w'", remember that you should be back in raw html before you issue the write command. You could write the browsed text into file.browse, and that will satisfy the "write" criteria, but this isn't really what you want. You've corrected errors in the html source, and that's what you need to save, so remember to undo the browse reformatting before you write the file.

Note that you can issue the unbrowse command even if there were no errors. If, for instance, you are looking at a well-constructed page on some other website, and you'd like to read or save the raw html, just type ub. As an exercise, invoke `e www.space.com', and use the `b' and `ub' commands to switch between the raw html and the browsable text.

The browse reformatting is relatively simple, because a blind person doesn't want complexity. We don't care about fonts and italics etc, and if we do, the best way to obtain this information is by reading the raw html. So most tags are discarded, except those related to headers, paragraphs, and lists.

I don't indent subsections or list items. The visual effect is lost on us, and sometimes the extra spaces get in the way.

Because the physical line is, for us, the unit of thought, i.e. the atomic construct that is modified or moved or copied, lines are cut at approximately 80 characters, give or take a few, usually at a sentence or phrase boundary. Thus reading line by line often reveals a sequence of sentences, or at least self-contained phrases within a larger sentence. I consider this the optimal way to view or edit a document -- any document. If you read this manual raw, without doing the browse on the file, you'll see what I mean. Review the break line command above.

The layout of a preformatted section, <pre>, is honored, although sequences of blank lines are compressed down to one blank line, and whitespace at the end of lines is stripped. This preserves the structure of street addresses, and other preformatted blocks.

Tables are formatted like an ascii unload from a spreadsheet or sql database. Pipes separate the fields on each row. There is no whitespace around the pipes, and the fields of a given row probably won't line up with the fields below. It isn't pretty, but a blind user can't really trace down a column in any case, especially when using a line editor such as this. Better to write the table to a local file and use cut, sort, join, etc. Here is a sample table.

part number|quantity|price
2635|2|$34.80
1398|1|$67.50
8118|5|$125.00

Empty fields at the end of a row are dropped. These are almost always images -- sometimes an entire row of images -- sometimes an entire table of images. The blind user doesn't need to read the no-content pipes.
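The row format is simple enough to state as a few lines of Python. The function name format_row is hypothetical, and stripping white space inside each cell is an assumption drawn from the remark that there is no whitespace around the pipes.

```python
def format_row(cells) -> str:
    """Join the cells of one table row with pipes, dropping empty trailers."""
    fields = [cell.strip() for cell in cells]   # no white space around pipes
    while fields and fields[-1] == "":
        fields.pop()                            # drop empty fields at the end
    return "|".join(fields)
```

An empty field in the middle of a row is kept, so the remaining fields still line up with the header; only the trailing ones are dropped.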

Note that the browsable text is readonly. After all, it's not the "source" -- why should you edit it? There are ways to enter and edit the input fields of an on-line form, but this will be discussed later. For now, you can think of the text as readonly. Issue a copy or insert or substitute command, and you'll get an error.

If you do want to edit the text, as pure text, enter the `et' command (edit as text). You will not be able to return to the html that produced this page. Nor can you follow a hyperlink or submit a fill-out form. The browsable text has become plain text, with no internet semantics.

The command `b file.html' is shorthand for `e file.html', followed by `b'. Remember that the ub command reverses the browse conversion, and reproduces the original html text, as though you had entered `e file.html' alone.

If a url is opened from the command line, as in `e www.google.com', it is automatically browsed. Type `ub' to revert back to the raw html.

Technical, Math

Most people never read technical web pages, but if you do...

A subscript, as indicated by html tags, is enclosed in brackets. Thus x<span class=sub>n</span> becomes x[n]. This transformation is not done if the subscript is a one or two digit number. Thus x subscript 1 is rendered x1, just like your professor would say it. This is not ambiguous, as you might first think; only programmers use x1 as a variable name, not mathematicians. If you see x1 in a formula, it means x subscript 1. Even 17a3b3 is not ambiguous; it is a translation of 17 times a[3] times b[3].

Superscripts, as indicated by <span class=sup>, are enclosed in parentheses, with a preceding arrow. The parentheses are omitted if the superscript is a number. Thus x cubed looks like x^3, while x to the n-1 power looks like x^(n-1).
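The two rules can be summarized in a short Python sketch. The function names are hypothetical, and treating "a number" as a run of ascii digits is an assumption; the text does not say how signs or decimals are handled.

```python
def render_subscript(base: str, sub: str) -> str:
    """x with subscript n becomes x[n]; short numeric subscripts are bare."""
    if sub.isascii() and sub.isdigit() and len(sub) <= 2:
        return base + sub               # x subscript 1 is rendered x1
    return f"{base}[{sub}]"


def render_superscript(base: str, sup: str) -> str:
    """x to the n-1 becomes x^(n-1); numeric exponents lose the parentheses."""
    if sup.isascii() and sup.isdigit():
        return f"{base}^{sup}"          # x cubed is rendered x^3
    return f"{base}^({sup})"
```

Chaining the subscript rule reproduces the example from the text: 17, then a subscript 3, then b subscript 3, comes out as 17a3b3.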

There are, sad to say, three different ways to encode mathematical symbols in html. At present edbrowse only supports one of them, though it is the most common, and the most portable among all browsers. This is the unicode system, where the Greek letter theta is specified as &theta;. Explorer turns this expression into θ, one character on the screen, while edbrowse turns it into the word theta. I also put spaces around the word if its neighbors are also words. This is illustrated by the circumference of a circle, which is 2 times pi times r. These three tokens are usually squashed together, and there is no confusion in the sighted world, where pi is a separate Greek letter. But if pi is spelled out, and the tokens are left together, the result is 2pir. Now pir looks like a three letter word. To avoid this, edbrowse inserts spaces, giving 2 pi r.

These translations are designed to work with the pages of the Math Reference Project, an archive of advanced mathematics that attempts to be both sighted and blind friendly at the same time.

Title, Description, Keywords

While in browse mode, the commands ft, fd, and fk produce the title, description, and keywords of the current web page respectively. These are normally not visible to the user. The title describes the web page in 80 characters or less. The description is a more complete explanation, which is displayed by a search engine such as yahoo or altavista. The user reads the description via the search engine and decides whether to read that web page. Finally, the keywords are used by search engines to facilitate keyword searches. Like the rest of the browsable text, these three attributes are readonly. If it is your web page, you can modify them by returning to the raw html. Web designers should pay close attention to the description and the keywords, else your pages will not be accessible via the standard search engines.

Note that `ft' prints the title of the web page, whereas `f t' (with a space) renames the current file to "t", which is probably not what you want.

The Refresh Command

Type `rf' to refresh the current file. This rereads the file or url into the current buffer. It does not push a new editing session onto the stack. This is analogous to the refresh button on Netscape and Explorer.

If a web page is updated every minute, e.g. with the latest stock prices for your favorite companies, you can type rf to fetch the latest copy of this web page. This assumes the intervening internet servers are not caching the web page and handing you the same out-of-date copy over and over again.

On your local machine, you can use this feature to read the latest version of a dynamic file, such as a log file. Or you can reread a directory, to incorporate any new files that have been placed in that directory. For example, you might use the shell escape to execute `cat x y >z', yet z will not appear in your directory scan until you type rf.

Hyperlinks

A link to another web page is enclosed in braces, like this:

{Recent reports} suggest a connection between health and intestinal bacteria.

Behind the scenes, "recent reports" is linked to http://www.sciam.com/article.cfm?id=jeremy-nicholsons-gut-instincts, but you don't see that unless you activate the link or view the raw html.

Of course the browsable text might also contain words inside braces, especially if the web page is technical in nature. Hence there is some ambiguity. However, I believe it is clear from context. {More information} is probably a link, whereas ${HOME}/.profile is probably not.

Some web pages present a series of icons that are actually links to other pages. That is, you click on a picture, rather than a phrase, to go somewhere else. These icons are supposed to be intuitive. Sometimes they are - sometimes they're not. Sometimes the web designer is kind enough to supply a text phrase that roughly describes the image. In this case the phrase is used as the link. If there is no alternate phrase, the filename of the hyperlink reference is used. This name can be surprisingly helpful, or it can be utterly useless, as in "index.html". If this name cannot be determined, the generic link {image} is used. In this case you will have to go to the web page to find out what it contains.

Note, an image that is not part of a hyperlink has its alt text enclosed in brackets, as in [girl with a long red braid].

To follow a link, enter the `g' (go) command. Yes, `g' also initiates a global substitute command, but only when it is followed by a regular expression. By itself, g follows the link on the current line, g2 follows the second link on the current line, and 4g follows the link on line 4. If a link spreads across multiple lines, you must be on the first of these lines, the line containing the left brace.

The g command can also act on a link that is written in raw text, as long as it looks like a valid url. If your friend sends you an interesting url via email, and you save it to a text file, you can go to that link, even though the file is not html and you've never issued a browse command.

Internal Links

Although most links lead to other web pages, some links point to other sections within the current web page. Again, you will be able to tell by context. Links in the table of contents are usually shortcuts to chapters in the current document. The same holds for links that look like: see {Appendix I}, or, see the section on {Hardware Configuration}.

The g command follows an internal link or an external link. Either way you find yourself in a different place. However, if the link is internal, you are still browsing the same file. In fact, the only thing that has changed is the current line number. The new line is displayed, and should correspond to the link you activated. Often the words are the same. Activate {Appendix I}, and you'll probably see the section heading "Appendix I". Enter z10 to read the first few lines of the appendix.

The Back Key

If you edit a new file via the `e', `b', or `g' commands, and you already have text in the buffer, that text is bundled up and pushed onto an internal stack. You can pop the stack by issuing the `^' command. This is supposed to be intuitive -- the up arrow points to the previous page that rolled off your screen.

This feature seems rather silly if you're just editing files, but it makes sense when surfing the net. Often we descend through two or three links, only to find ourselves at a dead end. "I didn't want to go here." So we hit the back key again and again, until we reach familiar territory. We can now proceed in a new direction. The command ^3 or ^^^ backs up through three pages. Don't use this iterative feature unless you know exactly how many times you need to back up.
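The push and pop behavior can be modeled in a few lines. The following Python sketch is only an illustration of the idea, not the edbrowse source; the class name `Session` and its methods are hypothetical, and a real session saves far more state than a page name.

```python
class Session:
    """Toy model of one editing session and its internal stack (illustration only)."""

    def __init__(self, name):
        self.current = name   # what is in the buffer right now
        self.stack = []       # earlier buffers, most recent last

    def go(self, name):
        # e, b, or g: push the current buffer, then switch to the new one
        self.stack.append(self.current)
        self.current = name

    def back(self, count=1):
        # ^ pops once; ^3 or ^^^ pops three times
        for _ in range(count):
            if not self.stack:
                raise IndexError("the stack is empty")
            self.current = self.stack.pop()
        return self.current


session = Session("home page")
for page in ("products", "printers", "dead end"):
    session.go(page)
print(session.back(3))
```

Note that in this sketch a count that is too large pops whatever it can before failing, which is one reason to know how far back you want to go.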

Note that the entire state of an editing session is saved and reproduced, including the file name, the last search/replace strings for substitutions, the hyperlinks and forms, the compiled javascript, everything!

Unlike lynx, I don't keep a running history of every web page visited. I never really saw a need for this feature. 99% of the time I simply want to back up one or two pages, and that's it.

The stack should not be confused with parallel edits, as described in an earlier section. In fact each editing session, e1 e2 e3 ..., has its own internal stack. Parallel sessions are appropriate when you need to move back and forth between two files, or cut&paste between them. However, one session, with its internal stack, is usually sufficient to surf the net.

If a browse command fails completely, giving you a rather uninteresting empty buffer, the stack is popped automatically, taking you back to the previous web page. Now you can retry the link by typing `g' again, or follow a different link on the page. Note that a browse command can fail, and still give you text explaining why it failed, if the remote server is well-designed. In this case you may see the error message "file not found", yet you will be viewing a new web page, which explains the problem. After you've read the explanation, follow its directions, or type ^ to back up and try again.

If you are presented with a number, even 0, the stack has been pushed, and you are in a new file or url. The number is the size of the new file. Use the ^ command to get back. If there is no number, merely an error message, then edbrowse did not create a new buffer. It probably didn't get that far. Typing . will produce the same line you saw before.

Following an internal link to another section in the current document does not push anything onto the stack. In other words, ^ will not take you back to where you were. In fact, it will take you up to the previous web page, which is not what you want. If you want to take a glance at Appendix I, and then return, mark the current position with `kr'. After you've visited the appendix, use the label 'r to return to your original location in the file.

The M Command

If you want to read and/or interact with several web pages in parallel, pages that would normally stack up, you can move each one to another session using the capital M command. The tags and links are transferred along with the rendered text. Once the web page has moved to another session, edbrowse issues the ^ command for you. Now you are back to the previous page.

It is generally unsafe to make a copy of a running web page, with all its javascript objects etc, so the M command moves the page out of the way, and takes you back to the previous page. Note, this command works just as well with files.

Suppose a web page presents

{planes}
{trains}
{automobiles}

If you are curious about all three topics, issue these commands in this order.

1g
M2
2g
M3
3g
M4

Now sessions 2, 3, and 4 are the subpages about planes, trains, and automobiles respectively. You can fill out forms or follow hyperlinks in any of them, or stay in session 1 and do something else.

Background Music

If you are trying to listen to a speech synthesizer, the last thing you need is background music. Instead of playing the song, I make it available to you through a hyperlink.

{Background Music}

This always appears at or near the top of the page. Click on this link and download the wave or mp3 file, and play it at your convenience. Use the play buffer `pb' command. Normally pb uses the name of the file to infer the audio format. If the filename ends in .wav, it's a wave file, and so on. If the filename is not particularly helpful, and you know the audio format, you can specify it by typing pb.wav for a wave file, pb.mp3 for an mp3 file, and so on.
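The rule for choosing the audio format can be sketched as follows. This is a minimal Python illustration, not the code edbrowse uses; the function name `audio_format` and its `override` parameter are hypothetical stand-ins for the suffix given in pb.wav or pb.mp3.

```python
import os

def audio_format(filename, override=None):
    """Return the audio format: the explicit suffix if given, else the file's suffix."""
    if override:
        # pb.wav or pb.mp3: the user states the format directly
        return override.lower()
    suffix = os.path.splitext(filename)[1]
    # A name with no suffix gives no hint; the caller must ask the user
    return suffix[1:].lower() if suffix else None

print(audio_format("theme.wav"))
print(audio_format("download", override="mp3"))
```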

The config file (described below) includes mime type descriptors, which tell edbrowse how to play wave and mp3 files etc. These must be set up, or the pb command won't work. It will say something like, "I don't know how to process an mp3 file". This is consistent with other browsers, which use "plugins" to play multimedia files that are retrieved from the internet.

Input Fields

The input fields of an on-line form are usually indicated by angle brackets. For example, a search engine might present the following form.

Keywords: <>
Advanced parsing: <->
Language: <en>
Search now: <GO>
Clear form: <RESET>

The first line in this sample form is a simple text field, which is initially empty. You supply the keywords to search for. Entering and editing input fields is discussed later.

The second line is a checkbox. This field tells the search engine to use advanced boolean features, such as this keyword and that, or this, but not that, etc. The feature is disabled, indicated by -. (Most people don't know how to use advanced search anyways.) A + means the checkbox is on.

The third line determines the language of the keywords, English by default. This isn't a free text field, you can't just type in anything you want. It is a dropdown list of languages. I'll describe how to view the options later.

The fourth line is the submit button, which sends the form to the search engine and retrieves the results. This field cannot be edited; it is merely a button to push.

The fifth line is also a button to push. It clears all the data you have entered, so you can start over. Default values will be restored. Thus the third line goes back to <en>, rather than <>.

Data Entry

Filling out a form is relatively easy, once you are familiar with the overloaded `i' command. Yes, i by itself means insert text, but in browse mode, i refers to the input fields.

If there is only one input field on the current line, i? displays information about that input field. If the line contains multiple input fields, you will need to use a number, as in i3? for the third field. The type of input field is displayed, then its size, then the field name. If the input field is drawn from a set of options, the option list is displayed as well, with menu numbers prepended. When you want to select an option, you can either type in a substring that determines that option uniquely, such as mich for Michigan, or you can type in its menu number. Needless to say, the latter is often easier. Recall the sample form in the previous section. If you type i? at the third field, you might see the following.

select[7] language
1: english
2: french
3: german
4: italian
5: spanish

If a select list contains hundreds of options, type i?string to see only those options that contain the specified string. Type i?mi in a state field and get Michigan, Mississippi, Missouri, and Minnesota. Then you can select the option you want by name or by number.

Now let's do some data entry. Type i=xyz to place xyz in the input field. Type i3=xyz to put information into the third input field on the current line. If you get an error, it is probably because the field has a fixed set of options, and you didn't pick one of those options. You can either type in one of the options or its menu number. You can also type in a fragment of the option you want, and edbrowse will fill in the rest. This is done whenever one and only one option contains a copy (case insensitive) of the string you entered. Thus you could enter tali above and get Italian, as that is the only language with those four letters. This is useful when you are entering your address, and they ask for the state. Type in a few letters of your state name, enough to be unique, and you'll probably glom onto the correct option in the list. Note the paradigm here: blind people don't want to wade through a menu unless they absolutely have to!

There is some ambiguity when the option is itself a number. In this case I perform three matches. If you type in the number exactly as it appears, that option is selected. If the number you entered is not a perfect match for one of the options, it is treated as a menu number. If it is not a valid menu number (e.g. out of range), I perform a partial match on the options, looking for those digits as a substring. This may seem confusing, but it is usually what you want.
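The three matches can be written out as a short routine. The sketch below is in Python and is an approximation under stated assumptions, not the edbrowse implementation: the function name `match_option` is hypothetical, and it applies the exact match first for every entry, not just numeric ones.

```python
def match_option(options, entry):
    """Return the 1-based menu number selected by entry, or raise ValueError."""
    lowered = [option.lower() for option in options]
    text = entry.lower()

    # 1. The entry appears exactly as one of the options
    if text in lowered:
        return lowered.index(text) + 1

    # 2. The entry is a valid menu number
    if entry.isdecimal() and 1 <= int(entry) <= len(options):
        return int(entry)

    # 3. The entry is a fragment contained in one and only one option
    hits = [i for i, option in enumerate(lowered) if text in option]
    if len(hits) == 1:
        return hits[0] + 1
    raise ValueError(f"{entry!r} does not select a single option")


languages = ["english", "french", "german", "italian", "spanish"]
print(match_option(languages, "tali"))   # fragment of italian
print(match_option(languages, "3"))      # menu number for german

quantities = ["5", "10", "25"]
print(match_option(quantities, "10"))    # exact match wins over menu numbers
print(match_option(quantities, "3"))     # no exact match, so menu number 3
```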

You can use i<7 to pull the contents of session 7 into the current input field. Session 7 must have one line of text. Similarly, i<filename reads the contents of the file into the current input field. Again, the file should contain one line of text. The filename is expanded in the usual way. This includes wildcard expansion, as long as the expansion leads to one and only one file. Put enough characters around the * to designate a single file.
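The two requirements, one file and one line, can be illustrated with a small Python sketch. It is not taken from edbrowse; the function name `field_value_from` is hypothetical, and only wildcard expansion is modeled here, not the other expansions edbrowse may perform.

```python
import glob

def field_value_from(pattern):
    """Expand pattern to exactly one file and return its single line of text."""
    matches = glob.glob(pattern)
    if len(matches) != 1:
        raise ValueError(f"{pattern!r} matches {len(matches)} files, need exactly one")
    with open(matches[0]) as handle:
        lines = handle.read().splitlines()
    if len(lines) != 1:
        raise ValueError(f"{matches[0]!r} has {len(lines)} lines, need exactly one")
    return lines[0]
```

A pattern such as card*.txt works only while a single file matches it, which is why it pays to put enough characters around the *.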

Now suppose you are entering your credit card number, all 16 digits, into a free text field. If you've made a typo, you don't really want to enter the entire string again. No problem -- use the substitute command. You can write this as i/x/y/ or s/x/y/ -- as you prefer. Remember, you may need to specify a field, as in s3/x/y/. The usual substitution syntax is honored. Don't overgeneralize the g suffix. s3/x/y/g changes every x to y in the third input field, but does not affect the other fields on the current line.
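To make the scope of the g suffix concrete, here is a minimal Python sketch. It uses Python's own regular expressions, which are not identical to those in edbrowse, and the function name `substitute_field` is hypothetical.

```python
import re

def substitute_field(fields, number, pattern, replacement, every=False):
    """Apply a substitution to one input field; the other fields are left alone."""
    updated = list(fields)
    count = 0 if every else 1   # in re.sub, a count of 0 means replace all matches
    updated[number - 1] = re.sub(pattern, replacement, updated[number - 1], count=count)
    return updated

fields = ["axa", "bxb", "xxx"]
print(substitute_field(fields, 3, "x", "y"))               # like s3/x/y/
print(substitute_field(fields, 3, "x", "y", every=True))   # like s3/x/y/g
```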

If the submit button is the third field on the current line, you can press it via i3*. However, i* is sufficient when there is only one button on the line. Similarly, you can establish a text field by entering i=kangaroo, rather than i1=kangaroo, if the second field on the current line is a submit button. You only need specify a field number when there are multiple input fields, or multiple buttons, on the current line.

Text Areas

Some internet forms allow you to type freely, as in "Please enter your comments here." This is done inside a window within the screen, having a fixed number of rows and columns, although that is usually an artificial constraint. The sighted user can type more lines than the window will hold, and the window scrolls appropriately. Fortunately the blind user can ignore the artificial window and type freely. Still, the i? directive tells you how big the window would be if you were running a visual browser. You might see something like "area[7x40]", which indicates a window 7 rows by 40 columns.

The lynx implementation of the text area is particularly hideous. This is not surprising, since lynx is not an editor. You can correct small typos on the current line, but you can't actually edit the text you are working on. Once you hit return, that line is done, and you're on to the next line. You can't move lines around or insert lines, nor can you prepare your comments ahead of time and read them into the text area from a file.

In edbrowse, the text area is managed from another editing session. This allows you to use the full power of the editor. You can move text, make global substitutions, or read comments in from a prepared file. The editing session is chosen for you, and appears in the input field. Consider the following form.

Enter your email address: <>
Enter your comments: <buffer 2>

In this example, session 2 was not active when browsing began. The browser allocated session 2 specifically for this input field. Type e2 to move to session 2, prepare your comments, and type e1 to return to the input form. On most web pages the text area starts out blank, whence buffer 2 will be empty, but this is not always the case. Be sure to check for pre-existing text before you start typing your thoughts. A particularly arrogant site might preload the text area with: "I love your website because:".

When you finally submit the form, as discussed in the next section, text buffer 2, associated with the second editing session, will replace the words "buffer 2" in the input field. Thus your carefully crafted comments are on their way.

Push The Button

If the third input field on the current line is a reset or submit button, you can press the button via i3*. The reset button puts the input fields back to their original values, as supplied by the web page when it was first loaded.

The submit button sends the form to the remote server and waits for a response. This is similar to following an internet link, but in this case you are sending some data along with the request. Type "kangaroo" into a search engine and you'll soon be reading a web page about kangaroos. As with any other link, you can use the ^ key to go back. In this case you will return to the on-line form. You can change the data and submit the form again, asking about another animal.

I have implemented the "get" and "post" methods, the most common http request methods, and they seem to work on most sites.

Once you have submitted your form, and you are viewing the results, you may notice some strange characters at the end of the filename. If you have retrieved information on kangaroos, the filename might look like: www.search-engine.com?keywords=kangaroo. The text after the question mark is an encoded version of the data you entered into the form. It becomes part of the virtual URL. This is actually a good thing, as we shall see in the next section.

Web And Email Addresses

The capital A command shows you the web addresses behind the links on the current line. Each web address will be surrounded by <a> and </a> tags, ready to be pasted into a bookmark file, if that is what you wish. These addresses exist in a new editing session; the previous session has been pushed onto the stack. You can add these to your bookmark file via w+ $bookmarks, assuming you have set the environment variable bookmarks appropriately. They will be appended at the end; you can move them to a more appropriate place in the file later on, when you're not "on line". For those with dial up connections, connect time is precious, and should not be spent rearranging bookmark files. Finally, use the ^ key to return to the web page you were viewing. Here is how it might look.

< b this.that.com/whatever  # browse a web page
> 16834  # size of the raw html
> 7855  # size of the browsable text
< /kangaroo/i  # looking for kangaroo on the page
> Click here for {more information about kangaroos}, or {send us mail}.
< A  # capture the URLs
> 144  # size of the URLs
< ,p  # let's see them
> <A HREF=www.kangaroo-info.com>
> more information about kangaroos
>  </A>
> send us mail:info@kangaroo.org
< 4d  # don't need the email address
< w+ $bookmarks  # append this url to the bookmark file
> 336
< ^  # back to browsing
> Click here for {more information about kangaroos}, or {send us mail}.

I suppose I could interrogate the environment variable $bookmarks myself, and append the URL to that file automatically, but as this example shows, you might not want all the links. In fact the email link makes no sense in a bookmark file. Also, you may want to change the description of the link, though in this example the description is pretty reasonable.

Alternatively, you might discard the url and retain the email address, appending it to your address book. Again, you will want to change the generic phrase "send us mail" to a brief string that is meaningful to you, such as kangaroo-mail. This becomes the alias, which you can use to send mail to that recipient. Subsequent sections will describe the use of edbrowse as a mail client.

If there are no links on the current line, or you are not in browse mode, the current filename is used. This is useful when you want to bookmark the current page, rather than some other page pointed to by a link.

If the current page is the result of a form submission, the filename may include your input fields after the question mark. If it does, that's a feature, not a bug. This exact URL, with the data at the end, can be stored as a bookmark and activated again and again, as though you had filled out the form each time. Every week you can call up this virtual URL to see if there is any new information on kangaroos. A more practical example might be a canned query that retrieves the weather for a certain city or the stock prices for the companies in your portfolio. You can also write concise scripts that fill in the virtual form, simply by modifying the information after the question mark. This provides a simple command to retrieve the weather from any major city or the current price of any stock.
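A script that fills in the virtual form needs only to build the text after the question mark. Here is a minimal sketch in Python using the standard urllib.parse module; the function name `virtual_url` is hypothetical, and the site and field name are the made-up ones from the kangaroo example, so a real search engine will use different names.

```python
from urllib.parse import urlencode

def virtual_url(base, **fields):
    """Append form fields to base, encoded the way a get submission would be."""
    return base + "?" + urlencode(fields)

print(virtual_url("www.search-engine.com", keywords="kangaroo"))
print(virtual_url("www.search-engine.com", keywords="red kangaroo"))
```

Spaces and other special characters are encoded for you, which is the part most easily gotten wrong when editing such a URL by hand.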

If the form uses the post, rather than the get method, the same data will appear, but the question mark is replaced with a control a. Unfortunately the control a is not visible, and this could cause confusion. When in doubt, list the line.

One last warning about adding links to your bookmark file. Let's say you've issued the A command, and tweaked the description just a bit. Now the link is just right, and you want to save it. You accidentally type `w $bookmarks', forgetting the plus. Instead of appending the link to the end, you have clobbered your entire bookmark file. Years of accumulated links are gone. To avoid this disastrous typo, create a macro to append to your bookmark file. I know, we haven't talked about user defined macros yet, but we will. And when we do, you should write a "bookmark append" macro that looks like this.

function+bma {
  w+ $bookmarks
}

Now you can type <bma to add a link to your favorites, and you don't have to worry about typos. It's shorter than `w+ $bookmarks' anyways. We'll return to this topic when I introduce macros, actually functions, that are defined in your config file.

Cookies

Some websites serve cookies, which your browser is expected to retain and pass back during subsequent exchanges. In fact many websites simply won't work without cookie support. Therefore edbrowse always accepts cookies.

Note that only Netscape-style cookies are supported. However, this is the most common flavor of cookie. It will probably meet your needs.

Persistent cookies are stored in a file, usually $HOME/.cookies, and are thus available for subsequent edbrowse sessions. These cookies are used to store long-term information about you, such as your login and password into amazon.com. Hence your .cookies file should be mode 0600. In fact the file is created mode 0600 for your own protection.
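If you want to check the permissions yourself, or create a similarly protected file, the following Python sketch shows the idea on a Unix-like system. It is not how edbrowse does it internally; the function names are hypothetical.

```python
import os
import stat

def create_private_file(path):
    """Create an empty file that only its owner can read and write (mode 0600)."""
    # O_EXCL refuses to reuse an existing file that might have looser permissions
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)

def is_private(path):
    """True if neither group nor other has any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

For example, is_private applied to your .cookies file should return True; if it does not, chmod 600 on the file puts things right.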

You probably won't need to view your .cookies file, ever, but it is text based, and can be edited directly if you wish.

Secure Connections

Edbrowse supports the most common method of encrypting web traffic, HTTP over SSL/TLS, colloquially known as secure http. Websites that support secure http have URLs of the form: https://secure.server.com. Notice the protocol is https:// rather than http://. The extra s stands for "secure". The traffic is encrypted, i.e. mathematically scrambled, and cannot be read by a nefarious third party who intercepts it.

Edbrowse will verify ssl connections if you supply a file of ssl certificates. This is an antispoofing measure, to make sure a hacker isn't posing as your bank, trying to steal your account numbers and passwords. You can grab a certificate file here, but I don't always keep it up to date. On some Linux distributions, you can run `cd /etc/ssl/certs ; cat * >../edbrowse-certs' to capture ssl certificates that are as current as your linux system. If you don't have or create this file, or, if you don't specify its location in your config file, you will not be able to verify secure connections, and you will be warned accordingly. Some browsers don't have this feature at all, so it's not the end of the world, but in general it's a good idea to verify your secure connections, unless it prevents you from getting to a website whose authenticity you accept at face value. In that case you can use the vs command to turn the feature off. This is a toggle command; type vs again to turn the feature on. For another method of disabling verification on a site-by-site basis, see the novs directive in the configuration file.

Never send sensitive information, such as social security numbers or credit card numbers, over an insecure channel. Make sure the form is using ssl. How can you tell? The submit button will have the word "secure" added to its text.

This is similar to the lock icon that Explorer uses to tell you that your connection is secure, although my system is not quite as foolproof. A website could fake you out by putting the word secure in the submit text.

Note that generic buttons, besides the submit button, can also submit your form, through javascript. I don't know if that button is going to submit the form or not, and I don't want to put the word "secure" on every button on the page. I only add it to the submit button, but if that button is secure then the others are probably secure too.

If you have logins on secure servers, such as PayPal.com, you must keep your password absolutely safe. Never send that password over an insecure connection. It becomes as valuable as your credit card numbers. I have a special password that I use for my secure logins, and only for those logins. I use other, expendable passwords when the connection is not secure.

Please don't fall for all those phishing email scams that tell you your login has expired, and would you please log in again using this convenient form. The mail is forged to look legitimate, and the form actually sends your secret password to a thief, who then raids your account. A reputable company will never ask you to login through an email form. They will always tell you to go back to the website and log in there.

Internet security is complex, to say the least, and it is beyond the scope of this document. If you have any questions about it, please send them to me directly. As a general rule, secure http is really quite safe, and you can use it to send sensitive information across the Net. It's probably safer than giving your credit card number to the clerk on the phone, who used to take your order before there was e-commerce. So it's ok to be a little bit paranoid, in fact it's probably a good idea, but don't let that stop you from making your online purchases.

FTP Retrievals

This browser supports the retrieval of ftp files and directories. You can provide an FTP URL like: ftp://ftp.random.com/tarball.tar.gz and the file will be fetched. It doesn't matter whether you type in the url yourself, or it is a hyperlink on a web page. The file is retrieved, and placed in a new buffer. Type w/ to save it locally, which is what a traditional ftp client would do. Of course the download could fail, in which case you will receive an error message. If it was simply interrupted, due to some internet glitch, you can always issue the command again and hope for better luck.

By default, edbrowse uses the account name "anonymous" and the password "ftp@example.com" for ftp connections. However, you can override this in the url, and some web pages take advantage of this feature. For example, let's say you want to access the file /opt/foobar on whatever.localdomain. This file isn't readable by anonymous users. You have to log in as a real person. Within edbrowse, you might use the command:

e ftp://chris:xxx@whatever.localdomain/opt/foobar

The ftp connection will be made as user "chris", with password "xxx".
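The pieces of such a URL can be pulled apart with Python's standard urllib.parse module, as in the sketch below. This only illustrates the URL layout and the default login described above; the function name `ftp_login` is hypothetical, and sending an empty password when a user name is given without one is an assumption of this sketch.

```python
from urllib.parse import urlsplit

DEFAULT_USER = "anonymous"
DEFAULT_PASSWORD = "ftp@example.com"

def ftp_login(url):
    """Return (user, password, host, path) for an ftp URL."""
    parts = urlsplit(url)
    if parts.username is None:
        # No login in the URL, so fall back to the anonymous defaults
        return DEFAULT_USER, DEFAULT_PASSWORD, parts.hostname, parts.path
    return parts.username, parts.password or "", parts.hostname, parts.path

print(ftp_login("ftp://chris:xxx@whatever.localdomain/opt/foobar"))
print(ftp_login("ftp://ftp.random.com/tarball.tar.gz"))
```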

Some ftp URLs point at directories, not files. If you visit one of these, and it is located on a Unix-like server, you will receive the listing as an html file with hyperlinks. You can visit the directory members just as though you were exploring a website. If the server does not run some flavour of Unix, you will receive the directory listing in plain text.

The ftp mode, i.e. the style of data connection, can be either active or passive. One works well when the client is behind a router, and the other works well when the server is behind a router. You can specify ftp mode active by entering the command `fma', or ftp mode passive by `fmp'.

Proxy Servers

A proxy server is a web server that sits between your web browser and remote websites. It intercepts your requests for web pages and forwards them to the system that hosts the site you are browsing. Proxy servers are used for a variety of reasons. Here are just a few of them:

Efficiency. The proxy server may be able to store previously-accessed webpages (known as caching). If your connection to the proxy is faster than your connection to the rest of the network, then caching ensures that frequently-accessed web pages load quickly.
Policies. Some firewall administrators require their users to use a proxy server.
Anonymity. There are so-called anonymizing proxy servers that hide your IP address from the websites that you browse.

If you wish to use a proxy server for http traffic, simply set the proxy option in your configuration file. Provide the proxy's hostname and port, separated by a colon. For example:

proxy = http * proxy.campus.edu:3128

All http traffic, for any domain (indicated by *), is routed through proxy.campus.edu on port 3128. Note that proxies often listen on ports other than port 80. Squid is a proxy server that comes bundled with some Linux distributions, and it uses port 3128 by default.

Protocol and domain can be specified, or either can be replaced with a * for any protocol or any domain. A missing domain is treated as a * (all domains), and a missing protocol and domain matches everything. Such an entry should be last in the list of proxies in your config file, since proxies beyond this point have no meaning.

The word DIRECT in the third position is a direct connection, with no proxy server. These are usually placed at the top of the list, to access certain internal domains; then the proxy server is specified for all others.

proxy = http|https hr.mycompany.com DIRECT
proxy = http|https|ftp * proxy.mycompany.com

As shown in this example, different protocols can be separated by pipes. Beware, placing a * in the protocol field embraces all protocols, including ftp, pop3, and smtp. Mail will attempt to pass through this proxy, just like web traffic.
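The way a list of proxy entries is consulted can be sketched in Python. This is an illustration under assumptions, not the edbrowse source: the function names are hypothetical, the first matching entry is taken to win, and a domain is assumed to match a host that equals it or ends with it after a dot.

```python
def parse_proxy(value):
    """Split the text after 'proxy =' into (protocols, domain, target)."""
    words = value.split()
    if len(words) == 3:
        protocols, domain, target = words
    elif len(words) == 2:
        (protocols, target), domain = words, "*"    # missing domain means all domains
    elif len(words) == 1:
        protocols, domain, target = "*", "*", words[0]   # matches everything
    else:
        raise ValueError(f"cannot parse proxy entry {value!r}")
    return protocols.split("|"), domain, target

def choose_proxy(rules, protocol, host):
    """Return the proxy for this request, or None for a direct connection."""
    for protocols, domain, target in rules:
        protocol_ok = "*" in protocols or protocol in protocols
        domain_ok = domain == "*" or host == domain or host.endswith("." + domain)
        if protocol_ok and domain_ok:
            return None if target == "DIRECT" else target
    return None   # no entry applies, so connect directly

rules = [
    parse_proxy("http|https hr.mycompany.com DIRECT"),
    parse_proxy("http|https|ftp * proxy.mycompany.com"),
]
print(choose_proxy(rules, "https", "hr.mycompany.com"))
print(choose_proxy(rules, "http", "www.example.com"))
print(choose_proxy(rules, "smtp", "mail.example.com"))
```

Because the second entry lists its protocols explicitly, mail traffic falls through and connects directly, which is usually what you want.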

Frames

Frames are a mechanism whereby a web page can fetch and display several other web pages on the screen at once. Each subpage is called a frame, and lives in its own space on the screen. Sometimes the frames are top middle and bottom; sometimes they are left middle and right. Edbrowse presents these frames as hyperlinks, and you can call up each in turn, or jump straight to a specific frame if you are familiar with the website. Usually the top frame is navigational in nature, and the bottom is a legal disclaimer/copyright notice, so you're just as happy to skip these and cut to the chase. On rare occasions, and I've only seen this once, you must open the top frame, whether you are interested in it or not, because that particular html page sets some cookies that you need to run the website.

A page of frames might look like this. I think you can guess which one to click on.

Frame {navigation}
Frame {main}
Frame {bottom}

I thought about a FetchFrame feature that fetches all the frames and presents them in one go, just as they are all displayed on the screen for a sighted user, but this feature is very difficult to implement, and so far nobody seems to want it. So as you might imagine, it's way down on the list.

PDF

PDF is a portable document format developed by adobe.com. A document, using this format, and designated with the suffix .pdf, prints the same way, every time, on every printer. That is its principal advantage. However, pdf files are often unreadable by blind individuals. Fortunately, there is a utility, pdftohtml, that converts pdf to html. Sometimes it works well, and sometimes lines are broken in bizarre places; but it's better than nothing. If you install this package, edbrowse will automatically convert pdf files into html files, and then render them as text.

Note, there are also pdf to text converters that skip the middle html step, but I wanted to preserve the functionality of any hyperlinks that might be embedded within pdf. So I thought it worthwhile going through html, even though it adds a little complexity.

Chapter 5, Javascript

Introduction to Javascript

Javascript is software, embedded in the web page, that runs on your computer. These functions do not run on the web server, they run right on your box. Hence it is sometimes called client side javascript. And javascript can do almost anything. You could, for instance, download a web page that includes a javascript function to compute the digits of pi, right on your computer, although that would be rather silly. Most of the time javascript is used to validate and/or modify forms or create fancy visual effects.

The first version of edbrowse, written in perl, ignored javascript completely, and that was ok for a while, but more and more sites use javascript, and these websites were simply inaccessible. Most of the e-commerce sites fall into this category. If you want to make purchases, or manage your bank account online, you need a javascript enabled browser.

The second version of edbrowse, written in C, and indicated by a version number that starts with 2, included a home grown javascript compiler and engine that I wrote myself. This worked pretty well, for a spare time project, but javascript evolves, like any other language or standard, and I just couldn't keep up.

The third version, which is the "latest and greatest", uses a javascript engine from Mozilla.com, which is open source under the Mozilla Public License. This allows me to leverage, rather than reinvent, some 70,000 lines of code - and somebody else is maintaining that code as javascript evolves. This illustrates the power of the open source community.

Edbrowse does not support all the features of client side DOM javascript, and it never will. For example, many websites use javascript to change images on the fly as you move your mouse around the screen. This has no meaning in edbrowse. Other websites bring up multiple windows, and let you control the contents of subwindows using icons in a master window. This would be very difficult to simulate in a command-line environment - so I don't try.

As a rough approximation, I expect to implement about half of javascript, hoping that will satisfy most of the websites we care about. This is still a work in progress. If you submit a form, or go to a hyperlink, and nothing happens, absolutely nothing, then the web page is probably trying to use javascript features that are not yet implemented. Raise the debug level to 2 or higher and push the button again to see the javascript errors. Then, if you wish, disable javascript with the js command and try again. However, the website may not behave properly or as expected with javascript disabled. See the disclaimer at the top of this user's guide.

You can also disable javascript for specific domains. This will be discussed later in the edbrowse config file.

Validating Forms

When a web page asks for user input, it often includes a "validate&submit" function. This function checks your entries: have you filled in all the required fields - is there an @ sign in your email address - are there 5 digits in your zip code - and so on. If there are no errors, it submits the form. These functions usually behave well under edbrowse. When you push the button, you will either see the error message, or the form will be submitted, and a confirmation page should appear shortly.

In some cases the javascript function reformats your data. It may fill in some of the hidden fields for you, or it may compute sales tax and adjust the purchase price accordingly. This is more than form validation, this is active javascript, and the data won't be right unless the javascript runs properly on your computer. More and more sites are using active javascript, so a javascript enabled browser is a must.

Some javascript functions manage menus dynamically. Make a primary selection, and javascript populates a second dropdown list with options corresponding to your first selection. You can now make a second selection, which further refines your search. If the first menu presents "meats", "vegetables", "fruits", and "grains", and you select fruits, the second menu might contain "apples", "oranges", "lemons", etc. Javascript makes this possible. Unfortunately these dynamic menus are not yet supported by edbrowse.

Popups and Popunders

A popup is a window that suddenly appears in front of the window you really want to see. It usually advertises something, and is often annoying, although in rare cases it is a necessary aspect of the website.

You have a distinct advantage over all those other surfers with their graphical browsers. The popup window does not open automatically. Windows are not well defined in a command line browser anyways, so it would be folly to try to implement this feature, or any other aspect of multi-windowing for that matter. Instead, the popup appears as a hyperlink at or near the top of the page, and you can click on it if you like, or ignore it. This is similar to the background music, described in an earlier section. The popup link might look like this.

{Popup: specials()}

Popunders are not as common. They appear after you have closed the window. In some sense they are hidden "under" your web page, and when you close the page they pop out. In edbrowse, this does not happen automatically. When you type q, you quit, and that's the end of it. As you might expect, the popunder function appears as a hyperlink. It might look like this.

{On Close: foo()}

Remember, the popup link is a simple html link to another web page, while the Close link calls a javascript function on the current page. However, this javascript function usually links to another web page, so don't be surprised if you find yourself somewhere else on the internet. In either case, popup or popunder, you can use the back key to return to the page you were looking at. If you need access to a popup window and the main page in parallel, use the M command.

Javascript also includes timer functions, that fire after a specified number of seconds. These are implemented as hyperlinks at the top of the page. They usually manage visual effects, and are rather pointless. The following timer might draw a star burst on the screen in 16 seconds.

{Timer 16: starburst()}

Onchange and Undo

When you set or change the value of an input field, the form can optionally call a javascript routine. It doesn't usually, but it can. In an earlier example, I described a primary and secondary menu. When the first selection is made, e.g. fruits, javascript sets up the second menu commensurate with your primary selection, using the onchange feature.

This is all well and good, but edbrowse has something your graphical browser does not, the undo command. And in this context, it doesn't really work. Change fruits to vegetables, and the second menu presents carrots and peas and the like. Now type u for undo, and the first field reverts back to fruits, but the second menu still contains vegetables. This is because the undo feature was originally written for the text editor. It simply puts the text back the way it was, and has no capacity to "undo" the side effects of javascript code. So the moral of the story, if there is one, is to set and change the values of an input form directly, and avoid the undo command, unless you're pretty sure there are no javascript side effects associated with that field.

Chapter 6, Edbrowse Scripts and the Configuration File

Config File

At startup, edbrowse reads and parses a config file. It's ok if this file is missing, but if it is present, it has to be syntactically correct. An error in your config file will cause edbrowse to abort. If this happens, use the -c option to edit the config file directly. This bypasses initialization, and places you in the editor, with the config file preloaded. Make your corrections, write the file, quit, and invoke edbrowse again. Repeat these steps until there are no errors, and you see the words "edbrowse ready".

The config file is in $HOME/.ebrc. The "eb" is shorthand for edbrowse. You cannot rename the config file; it is what it is. This is true on Windows as well, so make sure HOME is set.

The config file is line oriented. Lines beginning with # are comments, and are ignored. Blank lines are also ignored. All other lines fall into one of 6 categories.

Define an option, using the keyword=value syntax.
Define an edbrowse script that can be invoked from the command line, or from another script.
An edbrowse command, that becomes part of an edbrowse script.
Establish an email account. This will be described later, when we talk about the email client.
A mail filtering rule.
Describe a table or a view in an sql database.

Keyword = Value

The best documentation is an example, so let's dive right in.

Recall the section on cookies. You'll need a file, often called a cookie jar, to store your cookies. The line that establishes this cookie jar might look like this.

jar = /home/eklhad/.ebsys/cookie-jar

This is a simple keyword = value syntax. It's ok if the filename has embedded spaces, or even an equals sign. No need to quote it.

When edbrowse sees this line in its config file, it records the location of the cookie jar, and it checks the validity of that file. If the file is a directory (or something weird), or is otherwise inaccessible, edbrowse prints an error message and aborts. If this happens, use the -c option to edit your config file, and change the cookie jar.

Here are some additional name=value directives. Some of these are used to set up an email account. This will become clearer when we talk about the mail client.

certfile = /home/eklhad/.ebsys/ssl-certs

Specify the file that holds the certificates for secure connections. This was explained in the section on secure connections.

maildir = /home/eklhad/mbox

Go to this directory when fetching mail. Thus, if you save a mail message, you'll always know where it is.

webtimer = 30
mailtimer = 180

Wait 30 seconds for a response from a web server, and 3 minutes for a response from the mail server. A time value of 0 waits forever. Sorry, there seems to be no way to interrupt a socket call, other than control backslash (quit), which kills the entire program. That's why these timers are here - so you don't hang forever. The defaults are 20 and 0 respectively.

linelength = 1000

Change the length of a printed line. The default is 500, and the minimum is 80. A line that exceeds its length is faithfully represented internally, but is truncated on the display, as indicated by trailing dots...

nojs = space.com

Specify domains that don't need javascript. You can eliminate annoying error messages and speed up access by disabling javascript for certain websites. Javascript will not be run on pages within these domains, nor will it be fetched from these domains.

The above directive will also drop javascript from subdomains such as www.space.com.

You can include a path or partial path after the domain, as in space.com/popups. This will block the popup ads that you don't want to see, which often generate edbrowse errors in any case. Subdomains are not considered when a path is given; the domain must match exactly.
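
The matching rules for nojs can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the code edbrowse actually runs. The function name js_blocked is made up for this example, and treating the partial path as a simple prefix is an assumption.

```python
def js_blocked(rule, host, path="/"):
    """Return True if a nojs rule such as 'space.com' or
    'space.com/popups' applies to the given host and path."""
    rule_host, slash, rule_path = rule.partition("/")
    host = host.lower()
    rule_host = rule_host.lower()
    if not slash:
        # Domain only: the domain itself and any subdomain match.
        return host == rule_host or host.endswith("." + rule_host)
    # Domain plus path: the domain must match exactly, and the path
    # must begin with the partial path (prefix matching is assumed).
    return host == rule_host and path.startswith("/" + rule_path)
```

With this sketch, the rule space.com covers www.space.com, while space.com/popups covers space.com/popups/ad.js but not www.space.com/popups/ad.js.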

jspool = 32

Allocate this many megabytes for javascript use. The default is 32 meg, as shown above. The minimum is 2 and the maximum is 1000. A couple of youtube pages will consume 4 meg of javascript, so don't aim low unless you are just using edbrowse to edit files. If you spend all day browsing, you better aim high, because edbrowse could unceremoniously exit if it runs out of javascript space.

novs = somesite.com

Indicate hostnames for which SSL certificate verification should never be performed. This directive is useful for sites that use self-signed certificates, since these cannot be verified. It should probably not be used for anything serious, such as a site that is going to receive your credit card number.

inserver = pop3.some-domain.com
inport = 110
outserver = smtp.some-domain.com
outport = 25

Specify the machines and ports that you use to fetch mail and send mail respectively. You can use the fully qualified domain names, or aliases, as defined in /etc/hosts. The ports shown here are standard, and usually correct. They are also default in edbrowse, so you need not set inport and outport unless they are different from that shown above. Note, these keywords are only valid in the context of a mail account, as indicated by mail{}.

A star in front of the port number, e.g. outport *465, means the socket is to be encrypted for security. When the smtp port is encrypted, login authentication is assumed. No other authentication method is implemented at this time.

An arrow in front of the port number, e.g. outport ^587, encrypts the socket, but only after an initial handshake in the clear. This is the hotmail protocol, and it is as secure as *465; just different.

Use +587 to authenticate yourself without encryption. This is sometimes done when you are directly connected to the mail server, and traffic is not flowing across the internet; but the server still wants to make sure you are you.
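
To summarize the three prefixes, here is a small Python sketch that decodes an outport value. It is a reading aid under stated assumptions, not edbrowse's own parser: the name parse_outport and the labels it returns are invented for this example, and a bare port number is assumed to mean no encryption and no login.

```python
def parse_outport(value):
    """Split an outport setting such as '*465', '^587', '+587' or '25'
    into (port number, encryption, authenticate)."""
    value = value.strip()
    modes = {
        "*": ("encrypted", True),                  # encrypted from the start
        "^": ("encrypted after handshake", True),  # handshake in the clear first
        "+": ("none", True),                       # login, but no encryption
    }
    if value[:1] in modes:
        encryption, authenticate = modes[value[0]]
        return int(value[1:]), encryption, authenticate
    # Assumption: a plain port number means no encryption and no login.
    return int(value), "none", False
```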

nofetch

Do not fetch mail from this account through the -f option.

login = eklhad
password = secret

Specify the login and password that edbrowse uses to fetch your mail.

from = Karl Dahlke
reply = karl.dahlke@some-domain.com

These lines are added to the emails that you send. They tell the recipient who you are, and how to reply. It is illegal to use these lines for deceptive purposes. Make sure they identify you, and that the reply address is indeed one of your email accounts.

adbook = /home/eklhad/.ebsys/address-book

When specifying recipients, you can use aliases instead of full email addresses. Aliases are checked against your address book, a line oriented text file that is specified here. If your address book contains the line

fred : fred.flintstone@bedrock.us : 226 cobblestone way : 5553827

then you can use the alias fred, and edbrowse will substitute Fred's email address when sending mail. Only the first two fields in the address book are significant as far as edbrowse is concerned. Other fields might hold phone/fax numbers, street address, anything you like.
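
The lookup can be pictured with a short Python sketch. It assumes fields are separated by colons, as in the line above, and that a name not found in the address book is used as is. The function names are made up for this illustration, and whether edbrowse matches aliases with or without regard to case is not covered here.

```python
def load_address_book(lines):
    """Map each alias to its email address, ignoring any further fields."""
    book = {}
    for line in lines:
        fields = [f.strip() for f in line.split(":")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            book[fields[0]] = fields[1]
    return book

def expand_recipient(name, book):
    """Return the address for an alias, or the name unchanged
    if it is not an alias (assumed behavior)."""
    return book.get(name, name)
```

Given the sample line, expand_recipient("fred", book) yields fred.flintstone@bedrock.us, and a full address passes through untouched.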

User Agent

Every time you fetch a web page from the internet, your browser identifies itself to the host. This is done automatically. Edbrowse identifies itself as "edbrowse/2.2.9", where the number after the slash indicates the current version of edbrowse.

All well and good, but some websites have no respect for edbrowse, and no concern for Internet accessibility. They won't even let you in the door unless you look like Explorer or Netscape or one of the major players. StartPage.com, a front end to Google, is one example.

So what do we do? We lie of course.

agent = Lynx/2.8.4rel.1 libwww-FM/2.14
agent = Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)

You can specify different agents in your .ebrc file, and activate them with the `ua' (user agent) command. If the above lines are in your .ebrc file, you can type ua1 to pretend to be lynx, and ua2 to pretend to be Mozilla. Type ua0 to resurrect the standard edbrowse identification. Let's hope there aren't too many asinine websites out there that force us to lie.

This feature was written pre-javascript, and is not 100% compatible with the navigator object. Navigator.userAgent returns the correct string, according to the agent you select, but other aspects of the navigator object do not change with the agent, and they should.

Edbrowse Functions

You can bundle a set of edbrowse commands together under one name, similar to a macro. If the following appears in your .ebrc file, you can type <ud to "undos" a file, i.e. strip the DOS carriage return from the end of each line.

function:ud {
,s/\r$//
}

The new < command is supposed to remind you of redirection, i.e. read input commands from this macro. And macros can invoke other macros, by using a < command in the body. Almost any edbrowse command is fair game. A macro can fetch web pages from the internet, fill out forms, submit requests, and send mail.

Unlike many things in the Unix world, macro names are case-insensitive. Thus: dostuff, DoStuff, doStuff, ad nauseam, are all equivalent. Also, if you define a macro with the same name multiple times, the first definition wins. Edbrowse silently ignores subsequent definitions.

Normally, edbrowse marches along, whether a command succeeds or not. However, you can tell a macro to stop if it encounters an error by using this syntax.

function+hw {
/hello/p
/world/p
}

The plus sign after the word function means each command in that function must succeed. If there is no line containing the word hello, the function stops. If there is such a line, then the function moves on, and looks for a line containing the word world.

Other than some indenting, the format is fixed and unforgiving. You cannot, for instance, put the opening brace on its own line, as K&R would suggest.

These functions, or macros, can accept parameters. Let's make the previous function a bit more general.

function+hw {
/~1/p
/~2/p
}

Reproduce the earlier behavior by typing <hw hello world. Or search for different lines by invoking <hw foo bar. The latter looks for a line containing foo and prints it, and if this succeeds it looks for a line containing bar and prints that. Let's build a more useful function, a shortcut to google. The variable ~0 represents all the arguments together. In this case ~0 is the keywords you pass to google, for your search.

function+gg {
  b http://www.google.com
  /<>/ i=~0
  +i1*
  /^1/+
}

With this in place, you simply type `<gg kangaroo habitat' to find out where kangaroos live.
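
To make the ~0, ~1, ~2 substitution concrete, here is a minimal Python sketch of how a command line in a function body might be expanded before it is run. It illustrates the rule described above and nothing more. The name expand_args is invented, and two details are assumptions: parameter numbers are a single digit, and a missing argument expands to nothing.

```python
import re

def expand_args(command, args):
    """Replace ~0 with all arguments joined by spaces,
    and ~1, ~2, ... with the individual arguments."""
    def substitute(match):
        n = int(match.group(1))
        if n == 0:
            return " ".join(args)
        # Assumption: a missing argument expands to nothing.
        return args[n - 1] if n <= len(args) else ""
    return re.sub(r"~(\d)", substitute, command)
```

So `<hw hello world' turns /~1/p into /hello/p, and `<gg kangaroo habitat' turns i=~0 into i=kangaroo habitat.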

Finally, an edbrowse function can branch, based upon the success or failure of the previous command. Use if(*) for success, and if(?) for failure. The ? is supposed to remind you of the question mark that you get when an edbrowse command fails. The following looks for a line containing foo, and if it finds one, it advances to the next line, and if that line contains bar, it deletes it.

function+silly {
  /foo/
  if(*) {
    +s/bar//
    if(*) {
      d
    }
  }
}

I deliberately used function+ instead of function: in the above example. Normally the + will cause the function to abort if an edbrowse command fails. However, if the result of that command is used by a control statement, the function does not abort. This is similar to set -e in the shell, which causes a script to abort, unless the result of the command is used by an if or while statement.

Other control statements include while(*) while(?) until(*) and until(?). The following deletes lines from the top of the file, as long as they contain foo or bar. It then deletes the blank lines at the top.

function+topclean {
  until(?) {
    1g/foo\|bar/d
  }
  until(?) {
    1g/^$/d
  }
}

You can use loop(100){ ... } to repeat a set of commands 100 times. I haven't used this feature very often.

The Init Script

The script named "init" is run at edbrowse startup. Use this to establish your default settings - even read in your bookmark file, so your favorites are close at hand. Here is an example.

function+init {
#  turn debug off, so we don't see any status messages from this script
db0
#  Assume directories can be modified
dw
#  Put beginning and end markers around listed lines
el
#  Let session 99 hold your favorites, ready to surf.
e99
b $bookmarks
#  back to session 1, ready to go to work
e1
#  Restore debug level to something reasonable, 1 or 2
db1
}

This is just a sample. Put anything you like in your init script, or leave it out altogether if you are happy with edbrowse out of the box.

Mail Accounts

The next chapter describes edbrowse as a mail client, so let's use the config file to define some email accounts. You can define several accounts as necessary. They are implicitly numbered, in the order they appear in the config file. So the first mail account becomes #1, the second becomes #2, and so on.

We already discussed the relevant keywords for an email account. All you have to do is enclose them in mail{...}, like this.

mail {
  default
  inserver = pop3.some-domain.com
  outserver = smtp.some-domain.com
  login = eklhad
  password = secret
  from = Karl Dahlke
  reply = karl.dahlke@some-domain.com
}

The "default" directive makes this account the default. One and only one account should be labeled default. If you do not specify an account when fetching or sending mail, the default account is used. Beyond this, the default smtp server is always used to send mail, no matter which account you specify. If account #1 is default, and you send mail using account #3, the name and reply address from account #3 will be sent to the recipient, and if he replies, his reply will be sent to your third email account. However, the smtp server from your default account is used to physically transmit the message. There are technical reasons for doing this, having to do with security. However, if an account has its sendmail stream encrypted, then security is not an issue, and we can use these settings to send and receive mail. Here is a typical configuration for Google's gmail.

mail {
  inserver = pop.gmail.com
  outserver = smtp.gmail.com
  inport = *995
  outport = *465
#  Google also supports outport = ^587
  login = somebody@gmail.com
  password = secret
  reply = somebody@gmail.com
  from = Karl Dahlke
}
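
The rule for choosing the smtp server, as described before the gmail example, can be sketched in Python as follows. This is a rough model under stated assumptions, not the real implementation: accounts are plain dictionaries, the name sending_server is invented, the host smtp.other.example in the usage note is a placeholder, and an outport that starts with * or ^ is taken to mean an encrypted stream.

```python
def sending_server(accounts, n):
    """Return the outserver used when sending from account number n
    (accounts are numbered from 1, in config file order)."""
    chosen = accounts[n - 1]
    outport = str(chosen.get("outport", "25"))
    if outport.startswith(("*", "^")):
        # Encrypted stream: this account's own server is used.
        return chosen["outserver"]
    # Otherwise the default account's smtp server carries the message.
    default = next(a for a in accounts if a.get("default"))
    return default["outserver"]
```

For instance, if account 1 is the default with smtp.some-domain.com, account 2 is the gmail account with outport *465, and account 3 is an unencrypted account on smtp.other.example, then mail sent from accounts 1 and 3 goes through smtp.some-domain.com, and mail sent from account 2 goes through smtp.gmail.com.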

Mail filtering, by sender and/or subject, is controlled by your config file as well. This will be described later, as part of the fetchmail client.

Mime Descriptors

Mime types are determined by the extension of the file, or in some cases the protocol. They might tell edbrowse to use /usr/bin/play to play file.wav or file.voc, and /usr/bin/mpg123 to play file.mp3, and so on. Rather than repeat it all here, I suggest you look at the mime {...} sections in the sample config file provided below. Linux users can probably copy this part directly into their own config file. It generally does the right thing. Here is one example.

mime {
type = audio/mp3
desc = audio file in mp3 format
suffix = mp3
program = mpg123 -q -
}

If you have pulled down a file from the internet that ends in .mp3, type `pb' to play the contents of the buffer. The data is piped into the program, whose options tell it to expect data from standard in. If the mp3 player works better from a file, use a trailing percent to turn the buffer into a temp file with the proper suffix.

program = mpg123 -q %

The command `pb' could mean process buffer, as well as play buffer. For instance, a mime descriptor might unzip a zip archive.

mime {
type = data/zip
desc = a compressed zip archive
suffix = zip
#  use %, since unzip cannot read data from standard in
program = unzip %
}
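
The difference between a program line that reads standard in and one that ends in a percent sign can be sketched in Python. This is a simplified model of the behavior described in this section. The name plan_command is hypothetical, and the details of how the temp file is named, beyond carrying the proper suffix, are assumptions.

```python
import os
import tempfile

def plan_command(program, suffix, data):
    """Return (command, stdin_data). If the program line ends in %,
    the buffer is written to a temp file whose name replaces the %;
    otherwise the buffer is meant to be piped to the program."""
    program = program.rstrip()
    if program.endswith("%"):
        fd, path = tempfile.mkstemp(suffix="." + suffix)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return program[:-1] + path, None
    return program, data
```

With program = mpg123 -q - the data is handed back to be piped in, while program = unzip % yields a command that names a temp file ending in .zip.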

A Sample Config File

The best documentation is an example, so I have provided a sample config file with fake data. It is well commented. You can download a copy here.

Chapter 7, Mail Client

Send Mail

Email the contents of your current editing session to someone else via the `sm' command. Your email accounts are described in your config file.

Most mail clients can automatically append a signature to outgoing email messages; this one is no exception. In fact, you may have a different signature for each of your mail accounts. Thus, you can use one signature for work email, and another for personal email. When sending mail from account N, edbrowse first checks for a file named .signatureN, in your home directory. E.g., when sending from account 2, edbrowse looks for .signature2. If that file is not found, edbrowse looks for a file named .signature in your home directory, appending its contents if it is found.
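
The search order for the signature file can be written out as a short Python sketch. It mirrors the description above and is not taken from the edbrowse source; the function name find_signature is made up, and the home directory is passed in explicitly so the example stays self-contained.

```python
import os

def find_signature(account, home):
    """Return the path of the signature file to use for this account,
    or None if neither .signatureN nor .signature exists."""
    for name in (".signature%d" % account, ".signature"):
        path = os.path.join(home, name)
        if os.path.isfile(path):
            return path
    return None
```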

The recipients, attachments, and subject must appear at the top of your file. The sm command is picky, so observe the following syntax carefully.

To: fred.flintstone@bedrock.us
CC: barney.rubble@bedrock.us
account: 1
attach: hollyrock-brochure.pdf
Subject: Hollyrock Vacation
Come visit Hollyrock.
Brochure attached.
Sincerely,
Rock studios incorporated.

The account line is optional. In this example it tells edbrowse to use the first mail account specified in your .ebrc config file. If you don't include an account: line, edbrowse uses the default account, indicated by "default" in your .ebrc file.

Typing sm5 causes edbrowse to use account number 5. This overrides the account: line, if there is one. It is often easier to type sm5 than to insert an account:5 line. Note, sm-5 is the same as sm5, but the .signature file is not included. Sometimes you want a different ending on your email, for a particular situation.

Use the attach: lines to add attachments to your email. Each line should specify a file to attach, and they must appear before the subject line. If the filename is simply a number, the corresponding edbrowse session is used instead. Return to the earlier example, where we are trying to attach a Hollyrock brochure. Another way to do this is to switch to session 2 and read in the pdf file. This is a binary file, but that doesn't matter. Don't try to edit it, just hold it in session 2. Then switch back to session 1 and use the line attach:2.

If you use attach:2, instead of attach:hollyrock-brochure.pdf, Fred will notice one difference. The attachment is not prenamed for him. If he wants to save the attachment, he'll have to come up with a filename himself. Other than that, the email looks the same.

The alt: directive is almost the same as the attach: directive. If you use alt:, the attachment is not treated as an adjunct file. Instead, it is an alternate representation of the same email. The mail client will use the alternate representation if it can. This is usually used to send multimedia email, with hyperlinks and pictures etc. The primary email is in plain text, but the alternate attachment is in html or rich text. Unless something is amiss, the user sees the alternate presentation, complete with graphics and hyperlinks.

Like attachments, the alt: line can refer to a file or an edbrowse session.

As you may have guessed, the to: lines establish the recipients. Please don't specify more than a few recipients. Some servers, my mail server included, set a hard limit of 100 on the number of recipients. If you exceed this number, the remaining recipients simply don't get their mail. Best to limit your "to:" lines to a couple dozen.

Remember that CC stands for carbon copy. This tells the recipient, in this case Barney Rubble, that he is receiving a copy of the email for his convenience; he need not respond. Use BCC for blind carbon copy, so that each person does not see all the other email addresses.

When specifying recipients, you can use aliases instead of full email addresses. Aliases are checked against your address book, a text file that is specified in your .ebrc file. If your address book contains the line

fred : fred.flintstone@bedrock.us : 226 cobblestone way : 5553827

then you can simply write "To:fred" at the top of your file. Only the first two fields in the address book are significant as far as edbrowse is concerned. Other fields might hold phone/fax numbers, street address, etc.

Note that "Reply to fred" is an alternate syntax for "to: fred".
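
As a recap of the header syntax, here is a rough Python sketch that reads the lines at the top of an outgoing message up to the subject line. It is a model of the rules described in this section, not the parser that sm uses. The name parse_mail_header is invented, the "Reply to" syntax is not handled, and rejecting any unrecognized line before the subject is an assumption based on sm being picky.

```python
def parse_mail_header(lines):
    """Collect the header lines of an outgoing message, stopping at the
    subject line. Returns (header, body_lines)."""
    header = {"to": [], "cc": [], "bcc": [], "attach": [], "alt": [],
              "account": None, "subject": None}
    for i, line in enumerate(lines):
        key, colon, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if not colon or key not in header:
            # Assumption: anything unexpected before the subject is an error.
            raise ValueError("unexpected line before subject: " + line)
        if key == "subject":
            header["subject"] = value
            return header, lines[i + 1:]
        if key == "account":
            header["account"] = int(value)
        elif key in ("attach", "alt"):
            # A bare number names an edbrowse session rather than a file.
            kind = "session" if value.isdigit() else "file"
            header[key].append((kind, value))
        else:
            header[key].append(value)
    raise ValueError("no subject line found")
```

Fed the Hollyrock example, this returns one recipient, one CC, account 1, one file attachment, the subject, and the remaining lines as the body. A line attach:2 is recorded as session 2 instead of a file.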

Some web pages include sendmail links. They look just like other hyperlinks, but they send email to the appropriate person. Click here for technical details.

If you activate a sendmail link, you will be placed in a new editing session with the "to" and "subject" lines preloaded. If the url did not specify a subject, the subject is simply "Hello". You will probably want to replace this with a better subject line. Write your mail message and type `sm' to send it on its way. Then type ^ to return to the web page you were looking at. Note that the body of your email may also be preloaded with some default text, so be sure to check before you write and send.

You can include attachments by placing "attach:" lines at the top of the file, assuming the recipient can handle these attachments. This might make sense when the sendmail link is asking for {bug reports} - you might attach a program and/or its output. Yet this is somewhat unusual. Most sendmail links expect a few sentences of feedback, and nothing more.

Some web forms are submitted via email, rather than a direct http transmission. Edbrowse handles this properly. It shows you the destination email address, sends the mail through smtp, and tells you to watch for a reply. This reply could be an email response, or even a phone call if you provided your phone number in the form. But remember, nothing happens immediately. You are still on the same web page, still looking at the same submit button. Don't push the button again! The mail has been sent, and you'll be hearing from the company in the next few days.

Send Mail Client

As described in the previous section, edbrowse incorporates the features of a mail client. In addition to the interactive `sm' command, you can send mail in a batch fashion from the command line. If fred and barney are in your address book, and you want to send them mail from the command line, with an attachment, using your primary email account, do this. (I'm assuming e has been aliased to an edbrowse executable.)

e -m1 fred ^barney hollyrock-notice +hollyrock-brochure.pdf

The ^ in front of barney means he is a CC recipient. Use "?barney" for BCC.

Files with a leading + are assumed to be attachments. If they are binary they will be encoded properly, according to the mime standard. A leading - indicates an alternate format, like this.

e -m1 fred ^barney hollyrock-notice -hollyrock-graphical.html
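
The leading characters on these arguments can be summarized with a small Python sketch that sorts the arguments following -m1 into groups. It is an illustration only. The name classify_mail_args is hypothetical, and the rule that the last unmarked argument is the file holding the message is an assumption drawn from the examples above.

```python
def classify_mail_args(args):
    """Sort the arguments that follow -mN into recipients, the message
    file, attachments, and alternate formats."""
    result = {"to": [], "cc": [], "bcc": [], "attach": [], "alt": []}
    plain = []
    for arg in args:
        if arg.startswith("^"):
            result["cc"].append(arg[1:])
        elif arg.startswith("?"):
            result["bcc"].append(arg[1:])
        elif arg.startswith("+"):
            result["attach"].append(arg[1:])
        elif arg.startswith("-"):
            result["alt"].append(arg[1:])
        else:
            plain.append(arg)
    # Assumption: the last unmarked argument is the file holding the message.
    result["to"] = plain[:-1]
    result["body"] = plain[-1] if plain else None
    return result
```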

Remember, you can specify several mail accounts in your .ebrc file. The first account is indicated by index 1, as in -m1, and so on. You can make life easier with some aliases in your .bashrc file.

#  My mail, home account
alias mymail="e -m1"
#  My wife's account.
alias wifemail="e -m2"
#  My work account.
alias workmail="e -m3"
#  mail is obsolete
alias mail="echo use mymail, wifemail, or workmail"

Retrieving Mail

If edbrowse is invoked with the -f option, it will fetch mail from all accounts, except the ones that you have marked nofetch. Alternatively, you may specify a number following -f, in order to fetch mail from a single account. For instance, -f1 will fetch mail from your first pop3 account, ignoring all the rest. When it has finished retrieving mail, the program prints the total number of messages that it retrieved. Fetched messages are stored in a directory named unread/, relative to the directory specified with the maildir setting in your .ebrc file. You may read them, as described in the next section.

Interactive Mail Reader

If edbrowse is run with the -m option, and no other arguments, it is an interactive mail reader, allowing you to examine mail from your directory of unread messages. If you wish to retrieve and read in one step, you can combine the -f and -m options. In edbrowse version 3.4.6 and earlier, fetching and reading were not separate operations; -m performed both. Existing users beware, this is a notable change to the interface!

The first thing it tells you is how many messages you have. If there are no messages it says "No mail", and exits. If there are unread messages, it shows each one in turn. For each message, it displays some header information (such as subject and sender) and the first page of text, and then presents a prompt. A '?' prompt means the message is complete -- a '*' prompt means there is more text to read. You respond by hitting a key. Keys have the following meaning.

?	summary of key commands
q	quit the program
space	display more text
n	read the next message
d	delete this message
j	junk this message, and any messages with this subject, for 10 days
J	junk this subject for a year
w	write this message to a file and delete it
u	write this message unformatted to a file and delete it

The last two commands, w and u, require a filename, which you enter. The reserved filename "x" is essentially /dev/null, whence the mail message is discarded. You can save the mail message to x (discard) and still save the attachments. If the file is anything other than x, and the program cannot write to the specified file, it asks you for a new filename. Note that capital X will do the same thing.

In practice, you might save a message with w, later realizing that you need something, such as a hyperlink or attachment, which is only available in the unformatted message. When you use the w command to write a formatted message to a file, edbrowse retains an unformatted copy as well. These copies are placed in the directory $HOME/.Trash/rawmail, with file names consisting of 5 digit numbers. When you save a formatted message, you'll notice some text like "Unformatted 12345" at the end of the file. This tells you where to find the original, unformatted message: $HOME/.Trash/rawmail/12345. As mentioned previously, it's a good idea to run a weekly cron job to clean out the trash bin; if that cron job removes subdirectories, it will ensure that raw mail does not accumulate indefinitely.

The junk command adds a filter rule to your config file, sending any message with this subject to oblivion. This is useful when you don't want to read a particular discussion thread in a mailing list. Use the j command to junk it for ten days. If the subject pops up again in two months, you might be interested. Use the capital J command to junk a subject for a year. This is typically used for spam subjects, such as "Cheap meds for you."

Formatted Mail

When mail is retrieved, it is saved in the directory of unread messages without any formatting applied. In other words, it is a faithful copy of the message, as it existed on the server. When you read it by invoking edbrowse with the -m option, edbrowse displays it, after applying various formatting rules. You can save the message, in either its raw or formatted state. Selecting `w' at the interactive mail prompt writes the formatted version to disk, while selecting `u' saves the unformatted version.

When an html mail message is rendered, javascript is disabled. Thus a bug in the javascript machinery cannot crash the client, and malicious javascript cannot cause problems. There really isn't much loss here, because you couldn't activate the links or fill out the form anyways. If you want to "interact" with this email message, you have to save it unformatted to a file, finish your email session, edit that file, and type b to browse. Now the html is active, as though you were looking at a web page on somebody's site.

Mail Filtering

The config file supports a modest level of mail filtering. You can redirect incoming mail based upon the sender, the receiver, or the subject. These parameters are established in your config file. A mail filtering rule has the form:

matchString > destinationFile

Actually the > is a bit misleading. If the file exists, the email is appended to the end; the file is not truncated. So perhaps we should use >>, but I didn't want to bother with the extra greater, over and over again.

The destination file is interpreted relative to the mail directory, which is set in your config file. Of course you can override with an absolute path if you wish.

A mail filtering rule always occurs in the context of a filter block. For instance, if you wish to redirect mail from certain people, do this.

fromfilter {
fred flintstone > fredmail
fred.flintstone@bedrock.us > fredmail
jerk@hotmail.com > x
word@m-w.com > -wod
}

You can specify the sender's name, or his email address. It's not a bad idea to do both, in case he sends mail from some other account.

Notice that I didn't capitalize Fred Flintstone. Matches are case insensitive.

The file name "x" is special; it discards the mail entirely. You can use this to throw away mail from people who are constantly harassing you or sending you spam.

The last entry sends mail to -wod. The leading - is special; it means the mail should be saved to wod unformatted. This happens to be the word of the day from Merriam Webster. I like to save it unformatted, so I can browse it, and click on {audio} to hear the word pronounced. If an email contains hyperlinks, you may want to save it unformatted, so you can browse it later.

You can also filter mail based on the to: field. This is useful if you have several mail accounts, or mail aliases that are forwarded to your primary account. Here is a sample block.

tofilter {
support@my-side-business.com > support
sales@my-side-business.com > sales
@my-side-business.com > business
me@my-regular-dayjob.com > work
}

The third entry is a catchall address, saving any mail that is sent to that domain. Since rules are applied in order, support requests are stored in a file called "support", sales are stored in a file called "sales", and all other emails sent to your business are stored in "business".

You can use catchall addresses in the fromfilter block as well. Anything from this domain goes here.
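
For example, a block like the following would gather everything from one domain into a single file. The domain and the file name are made up for illustration; the rule simply follows the pattern of the catchall entry in the tofilter block above.

```
fromfilter {
@bedrock.us > bedrock
}
```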

You can filter based on subject, using the subjfilter{...} block. This can close the door on the virus du jour. If a virus uses a subject line of "Come Kiss Me", you can redirect "come kiss me" to x, and it's gone.

You can also use this feature to block warnings from other ISPs, complaining that you sent them emails with virus attachments. You didn't, of course, because you run linux, and a nonstandard mail client to boot. Your reply address was forged, so the virus warning was sent back to you, but you really had nothing to do with it. This is called backscatter. Lines like this one can throw these spurious warnings away.

subjfilter {
Come Kiss Me > x
Net Integrator Virus Alert > x
}

Finally, the reply address is checked against your address book. If there is a match, the mail is saved in a file whose name is the email alias. Consider a line in your address book that looks like fred:Fred.Flintstone@SomeDomain.com. When you receive email from this particular address, it is saved to the file fred. Thus you don't have to enter and maintain redundant entries in the filter. There is no need to include Fred.Flintstone@SomeDomain.com > fred. It's taken care of.

If you want to save mail from Fred unformatted, place a minus sign, i.e. -fred, in your address book. This is the same convention as the from filter. If you don't want mail from Fred to be redirected, but you still want to use the alias fred when sending mail, place an exclamation mark at the start, i.e. !fred.

If an email is redirected to a file, and it includes attachments, edbrowse will ask you what to do with those attachments, as though you had used the w command to save the mail yourself. If your friend has sent you a program (attached) that he wants you to look at, just hit return to save it to the default filename. If your friend's mail has some kind of logo, or background image, that you don't care about, just type x and it will go away. If the image has a recognizable suffix, such as gif, I discard it automatically. If you really want these images, you'll have to save the email unformatted, and browse it later.

When browsing an email inside the editor, edbrowse offers you all the attachments, be they images or not. You can discard a single attachment by entering x, or all the image attachments by entering capital X.

Use the -p option to pass over the filters, as in `e -pm1'. I set this when looking at other people's mail, such as my wife's account. I don't want her mail sent somewhere else because it matches one of my filter rules.

Mail Reply

The `re' command prepares a formatted email for reply. The "Reply to" line (which must exist) is moved to the top. This contains the email address that you will reply to, and it is created when you format (i.e. browse) your email message. If this line is not present, the reply command will fail.

The "Subject:" line must also be present. This too is created when the email is formatted. After the re command is issued, the subject may move down the page, to make room for other email headers as follows.

If this email has just been browsed, and the unformatted data still exists within the current edbrowse session, re looks for the message id of the original email. This should be referenced in the reply. The resulting lines might look like this.

Reply to somebody@foo.bar.com
references: <4387A55E6AF43C4F9830C74EFECE9132022D0638@foo-bar.net>
Subject: What's in a name?

The reference line is not a line you should ever type in, edit, or delete. Just leave it be. If you participate in a discussion list, this line is important. It tells the server that your reply is indeed a reply, and that it should be linked to the referenced message. Using this information, the server maintains discussion threads. If you delete this line before sending your response, you will create a new thread, and that will only confuse and annoy the other participants. So - if you are going to reply to a message on a discussion list, take the time to save it unformatted, then browse, then reply. Leave the References: line alone, edit the body of the email, add your comments, and send.

Sometimes the references line will have two IDs separated by white space. The first is the beginning of the thread, the message that started this topic, and the second is the comment that you are replying to directly. Again, this helps list servers organize the emails into threads.

The command `rea' means reply to all, and this also uses the original email headers. All the recipients will appear at the top of your file. Some will be indicated by cc, if they were carbon copied. You can delete any of these recipients before sending your response. Of course you probably don't want to delete the first line, as that is the reply to address.

Note that the re command takes the file out of browse mode, and turns it into a plain text file. This supports text editing, to write your reply in the body of the message. If you want to start over from scratch, you can't just unbrowse, because it is not in browse mode. You must re-edit the saved mail message, browse, and reply.

Like everything else in edbrowse, you'll get used to it once you play with it.

Chapter 8, Database Access

Building edbrowse with Database Access

If you simply type make, you get edbrowse, with no database functionality. A separate target supports database access through odbc. Run `make edbrowseodbc' to link edbrowse with odbc. This assumes you have the unixODBC and unixODBC-devel packages installed on your machine. Another target, edbrowseinf, provides a direct link to an Informix database. This works, but is not generally supported. Other database specific edbrowse connectors could be built. You are basically implementing the interface described in dbapi.h, using the C database development toolkit provided by the vendor. For now, I will concentrate on the odbc interface.

Reading Tables

When a file name is of a certain format, with the http:// in front etc, it is deemed to be a url. Edbrowse does not look on your computer for the file; it goes out to the internet. Similarly, when the file name has a certain format, it is assumed to be a table or view in the database. If you have a table called customers, follow it up with a right bracket.

e customers]

This allows you to bring in the entire table, or portions thereof, one row per line, with fields delimited by pipes. If the result looks like a bunch of numbers and pipes, and you have forgotten the structure of the table, use the shc (show columns) command. The output might look like this.

Table customers, 536281 rows
1 *custnum int
2 firstname string
3 lastname string
4 birthdate date
5 sex char
6 email string
7 picture blob

The first column is a unique number that designates this particular customer. After all, two customers could have the same first and last name, and even the same birthdate. Serial numbers are always a good idea, and that usually becomes the primary key. This is indicated by a star just before the column name. When edbrowse changes or deletes a record, the primary key is used. I assume, at all times, that the key determines a unique record in the database, and that each record appears at most once in an editing session. You could read customer 37 in twice, thus having two copies in your buffer, but don't do it!

Note that edbrowse can support a primary key with two or three columns, such as a serial number and a modifier. I actually have some tables at work that look like this. However, more than three key columns are not supported. If the primary key comprises more than three columns, or if the table has no primary key, you will not be able to update or delete. Rows in the table are readonly.

The table syntax is more than just an identifier and a right bracket. You can follow the right bracket with a where clause. This is important if you don't want the entire table, especially if there are millions of rows. Here are some table commands and their meanings.

customers]
Set the buffer up for the customers table, but don't fetch any rows.

customers]*
Fetch all the rows in the table.

customers]37
Fetch the customer whose serial number is 37. The primary key is assumed; your table has to have a primary key if you are going to use this syntax.

customers]1=37
Fetch the row whose first column is 37.

customers]37-59
Fetch the customers with serial numbers between 37 and 59 inclusive.

customers]3=Smith
Fetch the customers whose last name is Smith.

customers]lastname=Smith
Same as above.

customers]last=Smith
Same as above. If the string uniquely gloms onto a column name, we're all set.

customers]last=Barn*
Fetch the customers whose last names begin with Barn.

customers]birth=01/01/1960-12/31/1960
Fetch the customers who were born in 1960.

It is usually best to edit with a blank template, i.e. without a where clause. Then you can read in whatever rows you like. Type an r before any of the strings shown above to read rows into your buffer. Note, you cannot read data from different tables into the same buffer, but you can switch to another editing session to look at another table, without losing the rows you are working on.

When reading rows into a growing buffer, you can usually omit the table, since it has to be customers] every time. For instance, you can bring in customer #738 by typing `r customers]738' or `r 738'.

If you want a clean slate, type `rf' to refresh the buffer. This brings you back to a template for the table, with no rows. WARNING - do not clear your buffer by deleting all the rows, as that will delete the corresponding entries in the database. This feature works just like directory mode - your edits are translated into actions in the real world, so be careful! Referential integrity might save you from this accidental delete disaster, if you routinely use this sql feature to link tables together, which is a good idea at many levels. But don't rely on it!

Now, how about the seventh column in our example, the one called "picture"? This is the customer's picture, a jpg image that is in binary, and cannot be easily folded into an editing session. Instead, it is stored in another buffer, e.g. buffer 9, and this is indicated by <9>. You can switch to session 9 and save the file, or throw it away.

2139|Fred|Flintstone|08/21/1969|M|foo@bar.bar.com|<9>

Binary columns are not fetched by default. You usually don't want them anyways. To fetch binary columns, use the fbc command. It is not possible to fetch more than one binary column at a time, so make sure your select only grabs one such column.

Data source

To do anything with the database, your config file must specify the name of the data source, the login, and the password. Data source must match one of the entries in your .odbc.ini file. Login and password can sometimes be omitted, if they are inferred from your identity on the computer, or they are set in the data source in your .odbc.ini file. Here is how the line might look if you are tapping into the retail database, where the customers table resides.

datasource = retail,mylogin,mypassword
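
For reference, the matching entry in .odbc.ini might look something like this. The driver name and the connection keys vary from one database driver to another, so treat every value here as a placeholder. Only the section name, retail, has to agree with the datasource line.

```
; placeholder values, assumed for illustration
[retail]
Description = retail database
Driver = PostgreSQL
Servername = localhost
Database = retail
Port = 5432
```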

This can be changed at run time by the ds= command. Make sure you do not refer to any old rows in the buffer after you have switched to a new data source.

In some cases you can access other databases without changing the data source. For instance, you can read the parts table in the inventory database by calling up inventory:parts]. This is standard sql syntax for looking at tables in another database; I just pass it through.

Insert, Update, Delete

Now that we have run a few selects, let's modify some data. These operations are known as insert, update, and delete in the database world.

Adding database rows is substantially different from adding text. Since a row may contain a dozen fields, and you may not remember what goes where, edbrowse prompts you for each field in turn. It also checks the integrity of each field as you go, e.g. a date has to look like mm/dd/yyyy etc. If a row cannot be added because of a database error, edbrowse prints the error, and data entry continues, giving you a chance to reenter the row. Data entry stops when you enter a period all by itself, no matter what field you are on. The rows that were entered successfully will be present in your buffer, and the current line is the last entered row. Note that blobs cannot be entered at this time.

A row appears as you typed it, and this may differ from the actual values in the database. For instance, you might put a null into a field that is "default 3". Within the database, the value is 3, but there is nothing in that field in your buffer. Another field might truncate a floating point number, according to the precision of that column. Another field might be type serial, and turn 0 into the next serial number. And then there are triggers. There are many ways data can be modified as it enters the database. It would be better to refresh each row as it is inserted, so you could see exactly what is there, but this is not implemented yet. Remember, you can always type `rf' to get an empty buffer, and then reread the rows you just inserted.

If the first column of the primary key is an integer, and you enter a 0, I select the next number in sequence. Some databases do this internally; some don't. I thought it would be convenient if I did it at the front end. There is a possible race condition here, if you and somebody else glom onto the same serial number, but it's not likely, and it will create a "duplicate key" error in any case.

Use the substitute command to update a row. Make sure you don't accidentally introduce an additional pipe, or remove a pipe. Key columns cannot be modified. If you are updating many rows with one command, through a range or through g//s, and an error occurs while updating the database, substitution stops in its tracks. The editing session will reflect the database, with some rows changed and others untouched. There are many reasons for these update errors, including datatype mismatch (e.g. pushing an integer into a date field), and check constraints (e.g. putting J in for sex instead of M or F). If you have any say in the database design, apply check constraints wherever they make sense. They will protect you from erroneous substitutions that would produce inconsistent updates.

Delete works as you would expect; delete a row, and the corresponding entry disappears. There is no undo command. It couldn't be done in any case, since you may have selected only part of the row (see below), and I wouldn't have all the data to put the row back. As mentioned before, referential integrity should be employed wherever it makes sense. As a last check, I only let you delete 100 rows at a time. Be careful, and run regular backups.

Table Descriptors

Suppose a table contains 100 fields. Displaying all those fields is awkward, to say the least. Sometimes you are interested in a group of 6 fields, and sometimes you are interested in another group of 8. You can set up virtual tables, similar to views, in your config file. The short name is the alias, and you can call up the table using this alias. It will contain only the columns you specify. Here are two descriptors for the aforementioned customers table.

table {
    tname = customers
#  cnm is my cryptic shorthand for customer name
#  I want to be cryptic here, cause I'm going to be typing this a lot.
    tshort = cnm
    cols = custnum,firstname,lastname
#  Specify the primary key, in this case, the first column selected.
    keycol = 1
}

table {
    tname = customers
#  All I care about here is customer and birthdate.
    tshort = cbd
    cols = birthdate,custnum
    keycol = 2
}

When inserting a row through one of these descriptors, remember that you are only specifying a subset of the columns in the table. The other columns will be null, or they will take on their default values as specified by the schema. If you receive a Not-Null error, it could be due to one of the other columns, which requires an entered value. It is usually safer to insert a row using the complete table.

Go SQL

If you know the trick, you can feed sql statements directly to the database, similar to the isql program that ships with odbc. Within a text buffer (not a table buffer), place a right bracket at the beginning of a line, then write your sql statement. Your statement can run across many lines, but it must have a semicolon at the end of the last line, or a leading right bracket at the beginning of the following line. Type g by itself to go, thus sending the statement to the database. This is similar to g on a web page, which goes to a hyperlink. Edbrowse reports any errors, or the number of rows modified. In a select statement, the fetched rows will appear just below the statement, with pipes delimiting the columns. All this happens in the current buffer. Delete what you don't need (it's just text), or save the data to a file and import it into a spreadsheet. For your convenience, fetched rows will be delimited by the labels 'a and 'b. Thus you can save the data with a 'a,'bw command.

Canned queries can be saved in a file for future use. Call them up, modify parameters, and go again, like a qbe screen.

] select * from customers, address
where custnum = addrnum and addrtype = "HOME"
and custnum between 500 and 600;
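
If the fetched rows are headed for a spreadsheet, a small filter can turn the pipe delimited text into csv. This is a sketch, not part of edbrowse; the function name pipes_to_csv is made up, and it assumes that no field contains a literal pipe.

```shell
#!/bin/sh
# pipes_to_csv: read pipe delimited rows on stdin, write csv on stdout.
# Every field is wrapped in double quotes, and any double quote inside
# a field is doubled, which is the usual csv convention.
pipes_to_csv() {
    awk -F'|' '{
        for (i = 1; i <= NF; i++) {
            gsub(/"/, "\"\"", $i)
            printf "%s\"%s\"", (i > 1 ? "," : ""), $i
        }
        printf "\n"
    }'
}
```

After saving the rows with 'a,'bw rows.txt, something like pipes_to_csv <rows.txt >rows.csv produces a file that most spreadsheets will import.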

Wrap up

That concludes the user's guide. As you can see, edbrowse is a difficult program to master, but an easy program to use. I believe this is the key to success for any blind user or programmer. One can certainly paste a screen reader on top of an existing 2 dimensional program such as emacs or lynx, and get up and running quickly, but to be truly competitive in the workplace, or efficient at home, you need a command line interface. Edbrowse is an important step in this direction. It doesn't address the speech adapter, or other common applications such as spreadsheets, finances, or audio systems, but it does provide a high quality text editor and a fair browser and mail client. It's a good start.

Chapter 9, Other Command Line Utilities

A Linear Speech Adapter

There are many screen readers for many operating systems, but only one linear adapter, the Jupiter Speech System, which is available from github.

git clone https://github.com/eklhad/acsint

This is specific to Linux, and has not been ported to any other operating system. The user-space program should compile on any unix-like platform, but the trick is porting the device driver, which requires an in-depth knowledge of the OS internals.

Jupiter is unique among all adapters in that it does not (by default) transform the words or icons on the screen into speech or braille. It is not typically used as a screen reader. Yes, it is capable of reading screen memory and interacting with certain curses programs, but it typically runs in linear mode, capturing all tty output like a paper teletype and making it available to the blind user through speech. The buffer is 64K, and could represent several hours of work, depending on the nature and quantity of output during that time. Or it could represent just a few seconds of output, if a program prints "hello world" in an infinite loop.

I call this a linear adapter, rather than a screen reader, because it maintains a linear buffer of tty output, and allows the blind user to move about within this buffer and read the accumulated text. This is perfect if you spend most of your day in bash, edbrowse, sic, and other command line programs. You might wonder exactly what happened two hours ago, when you typed rm *. Were you in the right directory? Did you accidentally delete something important? Use the search function to find rm * in the tty log, then read the line before to see what directory you were in, then read the line after to review the computer's response, exactly as it took place two hours ago (assuming not a lot of output between then and now).

The tty buffers are maintained per console - thus Jupiter is most effective when combined with the 12 virtual consoles that come standard in Linux, accessible by alt F1 through alt F12. Switch to console 7 and you will be reading the tty buffer associated with that console, i.e. the output of the programs that have been run from that console. If you use screen(1) to maintain parallel sessions, the output of all your sessions will intermix in the tty buffer, and the result will be confusing. I therefore recommend 12 or 24 virtual text-based consoles to manage your parallel sessions.

Texting

My three kids are often not near their computers, but I like to stay in touch with them in real time, and texting is the perfect way to do that. We happen to be on sprint, which works really well. If my daughter's number is 2485551111, then I text her by sending email to

2485551111@messaging.sprintpcs.com

Because of the way edbrowse works, every mail out has to have a subject line, but if there are no words after "subject:" then the generated email has no subject. This is appropriate when sending a text. The body of the message contains the text, but it is cut off at 120 characters, so be brief. Use sm- to suppress the .signature file. It would probably be cut off anyways.

When she replies it could come from the same address, or sometimes from this address if she has a smart phone.

2485551111@pm.sprint.com

So these are good lines to have in your address book.

# text to my daughter Beth and get replies therefrom
btxt:2485551111@messaging.sprintpcs.com
bpm:2485551111@pm.sprint.com

AT&T is just as easy.

2485551111@txt.att.net

If all you have is a cell number and don't know the carrier you might start with txtdrop.com. It's an easy, edbrowse friendly site. If that works and they reply then you've got a handle on it.

Setting up Wifi

Wifi can be seriously annoying, because all the setup procedures are graphical. Select the SSID from a dropdown list, set up passwords from a screen, and so on. Hey, I don't even have X on my computer. What am I supposed to do? I spent two months finding a solution, and I have sort of put all my eggs in that basket. It's a little usb wifi stick, Ralink Tenda 2870/3070, no larger than a cigarette lighter. I have about 4 of them, and I can pop them onto any computer or laptop. I don't care what wifi device might be built into the machine; I don't use it. I don't build the linux modules for them either, only rt3070.ko. That driver comes standard with linux, but oops, it doesn't support any of the command line features that I want. In other words, you can't set wep or wpa passwords with command line tools. Don't ask me why?! I got the source and made a few changes, and it has worked for me since the 2.6 kernels, and through 3.12. You can grab my Ralink source here. If you have kernel sources then you can just type make to get rt3070sta.ko. Install this module and plug in the usb wifi and you have the interface ra0. Run ifconfig or iwconfig and see.

Then I have little scripts in /etc/sysconfig/network-scripts for the wifi channels I typically connect to. Here is the one for my home wifi.

# wifi-home, connect to my home wifi
# This is invoked with one argument, usually ra0
iwconfig $1 nick `hostname` mode managed
iwpriv $1 set NetworkType=Infra
iwpriv $1 set AuthMode=WPA2PSK
iwpriv $1 set EncrypType=AES
# Use wpa_passphrase to build the encoded password
iwpriv $1 set WPAPSK=hex-encoded-password
iwpriv $1 set SSID="my home ssid"
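
The encoded password mentioned in the script can be pulled out of the wpa_passphrase output with a one line filter. This is a sketch; the function name extract_psk is made up, and the ssid and passphrase shown in the comment are placeholders.

```shell
#!/bin/sh
# extract_psk: read wpa_passphrase output on stdin, print the hex key.
# The line that begins with #psk= holds the clear text passphrase,
# and is skipped because of the leading #.
extract_psk() {
    sed -n 's/^[[:space:]]*psk=//p'
}

# Typical use:
#   wpa_passphrase "my home ssid" "my passphrase" | extract_psk
```

The string that comes out is what goes after WPAPSK= in the script.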

Then wifi-current is just a link to wifi-home or whatever I am using. Then /sbin/ifup-local calls wifi-current on ra0. Course all this may be fedora specific. With all that in place, connect to wifi like this.

ifup ra0

And yet one time in five it does not work, I don't know why. I just wait 30 seconds, ifdown ra0, wait 30 seconds, ifup ra0, and then it usually works. On rare occasions I have had to unload and reload the module. So it's a tad finicky, but once connected, you're good.
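
The wait and retry routine can be wrapped in a small shell function. This is only a sketch: the name retry is made up, and it assumes that the up command returns a nonzero status when the connection fails, which may not hold for ifup on every distribution.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY UP_CMD DOWN_CMD
# Run UP_CMD until it succeeds, at most ATTEMPTS times. After each
# failure, wait DELAY seconds, run DOWN_CMD, and wait DELAY seconds more.
# Returns 0 on success, 1 if every attempt failed.
retry() {
    attempts=$1
    delay=$2
    up_cmd=$3
    down_cmd=$4
    n=1
    while [ "$n" -le "$attempts" ]; do
        if $up_cmd; then
            return 0
        fi
        sleep "$delay"
        $down_cmd
        sleep "$delay"
        n=$((n + 1))
    done
    return 1
}

# Example: retry 3 30 "ifup ra0" "ifdown ra0"
```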

In a hotel, with my laptop, I can usually bring up ra0 as though I were home. It won't connect of course, but then iwlist shows me what is there, all in command mode. If one of the networks is public, I can tweak a public wifi script and reconnect using that. Now I'm on the air through the public wifi.

IRC

Internet Relay Chat, or IRC, is a system that lets you send small text messages to other members of a group. It is a convenient way to participate in an on-line discussion forum. Of course you have to know the server and the channel, like calling a particular conference line for a multiway phone conversation.

I use a simple, command line IRC client called sic. There is one line in the original source that I changed to make it a little more user friendly. The line prints out date and time for every message, and we don't really want to read that every time. You can get my sic source here.

If you use this program straight out of the box it can be very confusing. Suppose you are typing a message to a friend, and suddenly his message comes rolling out at you. You read it, but don't remember what you typed, because what you typed so far is above the message from him that you just read, and you are now typing the other half of your message after his message to you. It's all interleaved and very confusing. This is one time where the linear mode fails us. Of course this is nicely separated in the sighted world. They are in separate windows on the screen. You type in the bottom and incoming messages roll in at the top. I always try to simulate this separation in some other way. In this case you can separate input and output by consoles. First make a fifo specifically for sic.

mkfifo -m 666 /etc/sicfifo

Then in your .bashrc file, define these aliases for sic input and sic output. These tap into the Speakup group, which has become a more general forum for screen readers, adapters, and modified utilities.

alias sicinput="cat >/etc/sicfifo"
alias sicoutput="sic -h irc.freenode.net -p 6667 -n YourLogin < /etc/sicfifo"

Now do this, and no you don't need to be root. Switch to console 9 and run sicinput. Switch to console 10 and run sicoutput. Of course the latter is tailored to the server that runs the Speakup chat, using your login on that server. On the input side you need to enter

:m nickserv identify password
:j #speakup

The first authenticates you to the server, and the second joins the speakup list. Now you're in. Type your messages in on console 9, and read messages from the group on console 10. Type control D on console 9 when done.

When running sic, you always have to specify the port, even if it is the default 6667.

Play the Music

If you like music as much as I do, mpg123 is the greatest thing since sliced bread, especially when combined with the following alias.

alias mpgplay="mpg123 -q -C"

If you have purchased all 27 Mozart piano concertos by Murray Perahia, perhaps the best musical investment I've ever made, and if they are given filenames like Moz-pc01-1.mp3, then you can play them all with mpgplay Moz-pc*. Sit back and revel in the music, or just let it play in the background while you do your work. I like to let this run in console 11, with console 12 reserved for superuser functions (logged in as root). Simple keystrokes let you restart the song (in this case songs are movements of concertos), go back a song, go forward a song, pause the music, etc. Beyond this, the author was kind enough to add a feature for me, wherein the music pauses when mpg123 receives a signal from another process. If your version does not have this feature, you can grab my source here. With this in place, my jupiter adapter binds the three keys at the upper right of the keyboard to "increase volume", "decrease volume", and "pause music", just like the Mac. I can be in any console, in any application, and adjust the volume of my music, or pause it altogether if I have to concentrate on my work or answer the phone, using the three keys at the upper right. No X, no screen, just play the music and adjust it in real time. Here are the Jupiter key bindings that control the music.

sysrq | aumix -v-2
scroll | aumix -v+2
pause | killall -q -s10 mpg123

If you want to play a virtual jukebox / radio, using a smart shuffle play with memory, then this perl script may be helpful.

Edbrowse and mpg123 can play nicely together via the pb or pb.mp3 commands, if this is in your .ebrc file.

mime {
type = audio/mp3
desc = audio file in mp3 format
suffix = mp3
program = mpg123 -q -
}

Streaming Audio

To listen to streaming audio, or even streaming video, install mplayer on your system. This is included in some distributions, but not in others. In particular, it is not part of fedora, unless you bring in these repositories, and then run yum install mplayer.

The manual page for mplayer is quite long; I'm not going to describe its full functionality here. The simplest invocation is mplayer -quiet foo.mp3. Yes, you can use it just like mpg123, but it doesn't have as many interactive features, such as pause via signal 10 from another process, so you probably want to stick with mpg123 for local music.

To play nicely with edbrowse, put something like this in your .ebrc file.

mime {
# the < forces it to be a stream, hence the url is passed to the program
type = <audio/x-pn-realaudio
desc = streaming audio
suffix = rm,ra,ram,pls
protocol = rtsp,pnm,sdp
program = /usr/bin/mplayer -quiet
}

Here is an assortment of radio stations. Select one, and you will reach a file with just one line, the url for the station. If it presents a standard protocol such as rtsp, or a standard suffix such as .pls, then just type g for go, and listen to the music. Type q when finished. If you like the station then make it an alias, like this.

alias play70='mplayer -quiet "http://www.181.fm/tunein.pls?station=181-70s&style=mp3&description=Super%2070s"'

This station was selected from the aforementioned website, and it's very good. It has a great variety of music from the 70's, with very few commercials, though you will see some patterns if you listen for several days. There are also stations for the 60's, the 80's, classical, jazz, sports, and even other languages, so enjoy.
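If you want the raw stream url out of a .pls file, perhaps to hand to a player that does not read playlists, a small shell function can pull it out. This is only a sketch: pls_first_url is a made-up name, and it assumes the usual .pls layout, where entries appear as File1=, File2=, and so on.

```shell
# Print the first stream url found in a .pls playlist read from standard input.
# Assumption: entries are written as File1=..., File2=..., one per line.
# The tr strips carriage returns, since some stations serve DOS line endings.
pls_first_url() {
    sed -n 's/^File[0-9]*=//p' | head -n 1 | tr -d '\r'
}
```

For example, mplayer -quiet "$(curl -s "$station" | pls_first_url)" would play the first entry, assuming curl is installed and $station holds the url of the .pls file.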

Podcast

Create a directory in your home directory called .podcast, and within that directory create a file called pod.conf that contains the podcasts you are interested in. Each line is a url of a web page that holds current podcasts. These web pages should have a certain xml format that is machine readable. In fact they often end in .xml, rather than .html. Finding the podcast page is not always easy. Sometimes you have to roam around the website to find it. Here is my pod.conf file, reading podcasts from Scientific American, Nature, and NASA.

www.sciam.com/podcast/sciam_podcast_r_d.xml
www.sciam.com/podcast/sciam_podcast_r.xml
rss.sciam.com/sciam/60-second-psych
rss.sciam.com/sciam/60-second-space
rss.sciam.com/sciam/60-second-tech
www.nature.com/nature/podcast/rss/nature.xml
www.nasa.gov/rss/NASAcast_podcast.rss

With this set up, I then use a bash script called podgo, which I found, well, somewhere, to fetch the new podcasts that I have not yet listened to. You can get the script here.

Every couple of years a website may change its files, establishing a new naming convention, and at that point all the podcasts look like they are new. The next invocation of podgo fetches them all, and it's kind of annoying. Just delete them and move on; it doesn't happen very often.
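The heart of any podcast fetcher is pulling the audio urls out of each feed. The following is a rough sketch of that step, and it is not the podgo script: rss_enclosures and podcast_list are hypothetical names, the feeds are assumed to carry their audio in enclosure tags of the form url="..." with double quotes, and curl and a grep that supports -o are assumed to be installed.

```shell
# Print the url attribute of every <enclosure> element in an RSS feed on stdin.
# Newlines are flattened first so that a tag split across lines is still found.
rss_enclosures() {
    tr '\n' ' ' | grep -o '<enclosure[^>]*>' | sed -n 's/.*url="\([^"]*\)".*/\1/p'
}

# List the enclosure urls for every feed named in ~/.podcast/pod.conf.
# Assumption: entries in pod.conf have no scheme, so http:// is prepended.
podcast_list() {
    while read -r feed; do
        [ -n "$feed" ] && curl -s -L "http://$feed" | rss_enclosures
    done < "$HOME/.podcast/pod.conf"
}
```

Running podcast_list prints one audio url per line. Comparing that list against the files already in your podcast directory tells you what is new.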

Youtube

Search for something on YouTube, then note the url of the page. Pass this as an argument to clive (Perl version) or cclive (C version), as you prefer. This will download the video as a file, in one of several formats. From here you can probably convert it to mp3 using ffmpeg.

ffmpeg -i foobar.3gpp foobar.mp3

Then play the mp3 file with mpg123. This is a recurring theme, as we saw with podcasts above: download the file, convert it to mp3, and use mpg123, which has nice features like pause, fast forward, rewind, etc. These functions may help.

# youtube search, look for something on youtube
function+yts {
b http://www.youtube.com/results?search_query=~0
}

# If a link looks like what you want, don't go to it, run this
# youtube extract function.
function+yte {
A
1s/^.*?href=//
s/>$//
!clive "'."
^
}
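
The download-and-convert routine described above can also be wrapped in a shell function, so that several downloaded videos are converted in one pass. This is a minimal sketch, assuming ffmpeg is installed. The names mp3_name and to_mp3 are hypothetical, and are not part of edbrowse or clive.

```shell
# Derive the mp3 filename for a downloaded video: foobar.3gpp -> foobar.mp3.
# Only the last suffix is replaced, so my.video.flv becomes my.video.mp3.
mp3_name() {
    printf '%s.mp3\n' "${1%.*}"
}

# Convert each named video to mp3, skipping any whose mp3 already exists.
# The -vn option tells ffmpeg to drop the video stream and keep the audio.
to_mp3() {
    for f in "$@"; do
        out=$(mp3_name "$f")
        [ -e "$out" ] || ffmpeg -i "$f" -vn "$out"
    done
}
```

With these defined, to_mp3 *.3gpp converts every matching file in the current directory, and the results can be played with mpg123 as before.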

Please send email if you know of any additional command line utilities, or if you can offer enhancements to the above.