DiggerDiff v1.2

  1. What it is
  2. How to use it
    1. The main interface
      1. Site list rows
      2. Site list columns
      3. Page ranges
    2. Session configuration
      1. Main window config
      2. Diff config
      3. Misc config
      4. Site list config
      5. Edit config
    3. Site list editor
      1. Select list
      2. Edit page
  3. How to set it up

  1. What it is

    DiggerDiff is a program that helps you watch for changes on web pages you are interested in. I wrote it to watch the job pages of local employers while job-hunting. It fetches each page in its list, compares the contents against an older copy of that page, and displays the results. If a page has changed, you can view the differences between the old and new versions to see exactly what changed.

    You can specify multiple search and replace expressions to apply to the local copies of each page before comparison. This can be useful to minimize the changes displayed for pages that often have predictable, small changes that are not important, and to clean up the display of the page changes.

  2. How to use it

    1. The main interface

      The main interface consists of a list of sites and associated status information, in a table format with configurable rows and columns. It also optionally displays the range of site names for each page of the list.

      1. Site list rows

        Each page of the site list is composed of some combination of the following row options:

        1. Blank: Displays a blank row in the table.
        2. Config: This row displays a link to configure the settings for the current session. Details of the session settings are described in the Session configuration section.
        3. Edit: This row displays a link to edit the list of pages being monitored. Use of the editor is discussed in the Site list editor section.
        4. Filter: This row displays a control to filter the list of pages. If you enter text here and click "Filter site titles", the site list will only display those sites that contain the given text in their titles. The filter is case-insensitive. When a filter is active, this row will have a "Clear filter" link, which turns off filtering.
        5. List: This option displays the actual list of sites for the current page of the list. The list is displayed using the columns described in the Site list columns section.
        6. Pages: This row displays a navigation bar consisting of links to each page of the site list, as well as next and previous page links. The current page is highlighted to stand out.
        7. Save: This row displays a link for copying the new local copy of all sites with status 'NEW' or 'CHANGED' into the old local copy of that site, and resetting the status for all those sites to 'SAME'. Note that this link will affect ALL sites in the list, not just the current page.
        8. Search: This row displays a control to search for text in the local copies of the sites in the list. You can search in the Archive (old local copies), the Current (new local copies), or Both. You search for text by entering a regular expression that conforms to the regular expression Syntax and Modifier reference documents.
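
        As a rough illustration of the Filter row (not DiggerDiff's actual code), a case-insensitive title filter could look like the sketch below. The function name filter_sites() and the plain array of titles are assumptions made for the example.

```php
<?php
// Hypothetical helper: keep only the titles that contain $filter,
// ignoring case. An empty filter means "no filtering".
function filter_sites(array $titles, string $filter): array
{
    if ($filter === '') {
        return $titles;
    }
    return array_values(array_filter($titles, function ($title) use ($filter) {
        // stripos() is the case-insensitive version of strpos().
        return stripos($title, $filter) !== false;
    }));
}

$titles = ['Acme Corp Jobs', 'City of Springfield', 'ACME Labs'];
print_r(filter_sites($titles, 'acme'));
```

        Here the filter "acme" keeps both "Acme Corp Jobs" and "ACME Labs", because the match ignores case.
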
      2. Site list columns

        Each page of the site list is composed of some combination of the following column options:

        1. Archive: If there is an old local copy of the web page, this column displays the date it was created, with a link to view that file.
        2. Blank: Displays a blank column in the table.
        3. Current: If there is a new local copy of the web page, this column displays a link to view that file.
        4. Edit: This column displays a link to edit the details of this web page.
        5. Fetch: If the status of a web page is 'ERROR' or the new local copy is older than the age limit set in config.php, this column displays a link to fetch the latest version of the page as the new local copy.
        6. Save: If the status of a web page is 'NEW' or 'CHANGED', this column displays a link to copy the new local copy into the old local copy (causing the status to change to 'SAME').
        7. Site: This column displays the title of a web page as either a link or a button to load that page, depending on whether the page has a regular URL or uses POST data.
        8. Status: This column displays one of four values, with these meanings:
          1. CHANGED: The new local copy of the web page is different from the old local copy. Displayed in red, with a link to view the changes in the page.
          2. ERROR: An error occurred on the previous attempt to fetch a local copy of the web page. Displayed in red.
          3. NEW: An old local copy of the web page has not yet been created to compare against. Displayed in black.
          4. SAME: The new and old local copies of the web page are identical. Displayed in green.
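
        The relationship between the Status, Fetch and Save columns can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a missing copy is represented by null, the age limit is assumed to be in seconds, and ERROR is assumed to take precedence over the other states.

```php
<?php
// Hypothetical sketch of the four status values described above.
// $old / $new are the contents of the old and new local copies,
// or null when that copy does not exist.
function site_status(?string $old, ?string $new, bool $fetchFailed): string
{
    if ($fetchFailed) {
        return 'ERROR';            // assumption: errors win over everything
    }
    if ($old === null) {
        return 'NEW';              // nothing to compare against yet
    }
    return $old === $new ? 'SAME' : 'CHANGED';
}

// Fetch link: shown on error, or when the new copy is older than the limit.
// The unit of $ageLimit (seconds) is an assumption.
function show_fetch_link(string $status, int $copyTime, int $ageLimit, int $now): bool
{
    return $status === 'ERROR' || ($now - $copyTime) > $ageLimit;
}

// Save link: shown only when there is something new to accept.
function show_save_link(string $status): bool
{
    return $status === 'NEW' || $status === 'CHANGED';
}

echo site_status('<p>old</p>', '<p>new</p>', false), "\n";
```
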
      3. Page ranges

        The page ranges table is optionally displayed at the bottom of each page of the site list. This table lists all the pages in the site list, along with the unique first part of the first and last site titles on each page. This allows you to quickly jump to the page containing the site you want to see.
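
        One possible reading of "unique first part" is the shortest prefix of a title that differs from the neighbouring title across the page break. The sketch below implements that reading only; it is an assumption, not a description of DiggerDiff's own algorithm, and the function names are hypothetical. It expects the titles to be sorted already.

```php
<?php
// Hypothetical helper: shortest prefix of $title that $other does not
// share (compared case-insensitively).
function distinguishing_prefix(string $title, ?string $other): string
{
    if ($other === null) {
        return substr($title, 0, 1);   // no neighbour: one character will do
    }
    for ($i = 1; $i <= strlen($title); $i++) {
        if (strncasecmp($title, $other, $i) !== 0) {
            return substr($title, 0, $i);
        }
    }
    return $title;   // $other starts with $title, so use the whole title
}

// Returns one [first, last] pair of shortened titles per page.
function page_ranges(array $titles, int $perPage): array
{
    $pages = array_chunk($titles, max(1, $perPage));
    $ranges = [];
    foreach ($pages as $n => $page) {
        $prev = $n > 0 ? $pages[$n - 1] : null;
        $next = $pages[$n + 1] ?? null;
        $prevLast  = $prev === null ? null : $prev[count($prev) - 1];
        $nextFirst = $next === null ? null : $next[0];
        $ranges[] = [
            distinguishing_prefix($page[0], $prevLast),
            distinguishing_prefix($page[count($page) - 1], $nextFirst),
        ];
    }
    return $ranges;
}

print_r(page_ranges(['Acme', 'Apex', 'Apple', 'Bolt', 'Borg', 'Cisco'], 2));
```

        With two sites per page, this example yields the ranges "A" to "Ape", "App" to "Bol", and "Bor" to "C".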

    2. Session configuration

      Clicking on the "Configure session" link in the site list loads a page that allows you to modify some program settings. The settings are grouped into five categories, as described below.

      1. Main window config

        This category has three settings in one section. The settings control the frame behavior of the main interface window.

        1. Frame - Layout: When set to "Single", the site list is displayed in one window, and pages and diffs are loaded in another window. When set to "Split", the window is split into two frames, with the site list in one frame and pages and diffs loaded into the other frame.
        2. Frame - Divider: This setting only takes effect if "Layout" is set to "Split". When this is set to "Horizontal", the frame divider is horizontal, with site list above and pages and diffs below. When set to "Vertical", the frame divider is vertical, with site list on the left and pages and diffs on the right.
        3. Frame - Size: This setting also only takes effect when "Layout" is "Split". This controls the percentage of the window occupied by the site list frame. For example, when set to "45", the top (or left) frame occupies 45% of the window, and the bottom (or right) frame occupies 55%.
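
        The Divider and Size settings map naturally onto an HTML frameset. The sketch below assumes the split layout is built with a <FRAMESET> element, which this document does not confirm; the function name and the clamping of Size to 1-99 are also assumptions.

```php
<?php
// Hypothetical helper: build the FRAMESET attribute for a split layout.
// A horizontal divider stacks the frames (rows); a vertical divider
// places them side by side (cols).
function frameset_attribute(string $divider, int $size): string
{
    $size = max(1, min(99, $size));   // assumption: keep both frames visible
    $attr = ($divider === 'Horizontal') ? 'rows' : 'cols';
    return sprintf('%s="%d%%,%d%%"', $attr, $size, 100 - $size);
}

echo '<FRAMESET ', frameset_attribute('Horizontal', 45), '>', "\n";
```

        With Divider set to "Horizontal" and Size set to "45", this gives rows="45%,55%", matching the 45% / 55% example above.
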
      2. Diff config

        This category has seven settings in two sections. The "Frame" section controls the behavior of the frame or window containing the page diffs, and the "Format" section controls how the diffs are displayed.

        1. Frame - Layout: When set to "Single", just one view of the page diff is displayed in a single window or frame. When set to "Split", the diff window or frame is split into two frames, with the rendered diff view in one frame and the source diff view in the other frame.
        2. Frame - Divider: This setting only takes effect if "Layout" is set to "Split". When this is set to "Horizontal", the frame divider is horizontal, with rendered view above and source view below. When set to "Vertical", the frame divider is vertical, with rendered view on the left and source view on the right.
        3. Frame - Style: This setting only takes effect if "Layout" is "Single". When set to "Rendered", the diff is displayed as-is, so that HTML elements are rendered by the browser. When set to "Source", special HTML characters are encoded, so that the browser displays the HTML code in the diff.
        4. Frame - Size: This setting also only takes effect when "Layout" is "Split". This controls the percentage of the window or frame occupied by the rendered view frame. For example, when set to "75", the top (or left) frame occupies 75% of the window, and the bottom (or right) frame occupies 25%.
        5. Format - Rendered: This setting controls the method used to format the rendered view diff. When set to "<BR>", line endings are replaced with an HTML <BR> tag. In this mode, text will generally be displayed in a proportional font, and whitespace will be collapsed into a single space. When set to "<PRE>", line endings are left alone and the whole diff is enclosed in an HTML <PRE> block. In this mode, text will generally be displayed in a fixed-width font, and whitespace will be left as-is.
        6. Format - Source: This setting controls the method used to format the source view diff. When set to "<BR>", line endings are replaced with an HTML <BR> tag. In this mode, text will generally be displayed in a proportional font, and whitespace will be collapsed into a single space. When set to "<PRE>", line endings are left alone and the whole diff is enclosed in an HTML <PRE> block. In this mode, text will generally be displayed in a fixed-width font, and whitespace will be left as-is.
        7. Format - Filter: This setting determines which lines of a diff are displayed. Only lines which begin with one of the characters in this field are displayed. The choice of characters to use here depends on what options are used with the diff command in config.php, and what you are interested in seeing. For example, if you are using unified diff and only care about seeing what is added to pages, not what is removed, you can set Filter to "+". The table below shows what filter settings will display all lines of the diff for different options in the version of diff I am using. This may vary on your system.
          Diff Command                             Filter
          "diff"                                   123456789<>-
          "diff --context" or "diff -c"            *! +-
          "diff --context=0" or "diff -C 0"        *!+-
          "diff --unified" or "diff -u"            @ +-
          "diff --unified=0" or "diff -U 0"        @+-
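
        The Style, Format and Filter settings can be pictured as three small steps applied to the raw diff output. The following is a minimal sketch under assumptions: format_diff() and its parameters are hypothetical, and the real program may order or implement the steps differently. Note that the filters "*! +-" and "@ +-" in the table contain a space, which is what matches the unchanged context lines.

```php
<?php
// Hypothetical sketch: filter, encode and wrap the output of diff.
//   $filter : characters allowed as the first character of a line
//   $style  : 'Rendered' (leave HTML alone) or 'Source' (encode it)
//   $format : '<BR>' or '<PRE>'
function format_diff(string $diff, string $filter, string $style, string $format): string
{
    $kept = [];
    foreach (explode("\n", rtrim($diff, "\n")) as $line) {
        // Keep a line only if its first character appears in the filter.
        if ($line !== '' && strpos($filter, $line[0]) !== false) {
            $kept[] = ($style === 'Source') ? htmlspecialchars($line) : $line;
        }
    }
    if ($format === '<PRE>') {
        // Line endings are left alone; the whole diff goes in one block.
        return "<PRE>\n" . implode("\n", $kept) . "\n</PRE>";
    }
    // Otherwise every line ending becomes a <BR> tag.
    return implode("<BR>\n", $kept);
}

$diff = "@@ -1,2 +1,2 @@\n unchanged\n-<b>old</b>\n+<b>new</b>\n";
echo format_diff($diff, '+', 'Source', '<BR>'), "\n";
```

        With a unified diff and Filter set to "+", only the added line survives, and in "Source" style its HTML is encoded so the browser shows the markup instead of rendering it.
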
      3. Misc config

        This category has two settings in one section. The settings control which site list is in use.

        1. URL File - Select File: This control lets you choose which of the available site list files to use.
        2. URL File - Create File: This allows you to create a new site list file. Only letters and numbers are allowed here. All other characters will be stripped out.
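
        The clean-up applied to a new file name can be sketched in one line. This assumes "letters and numbers" means ASCII letters and digits; the function name is hypothetical.

```php
<?php
// Hypothetical helper: strip everything except ASCII letters and digits.
function clean_list_name(string $name): string
{
    return preg_replace('/[^A-Za-z0-9]/', '', $name);
}

echo clean_list_name('job sites_2004!.txt'), "\n";
```

        For example, entering "job sites_2004!.txt" would produce "jobsites2004txt".
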
      4. Site list config

        This category has six settings in two sections. The "Options" section controls various options for displaying the site list. The "Page Elements" section allows you to change which rows and columns are used in the site list.

        1. Options - Sites Per Page: This setting controls how many sites are listed on each page of the site list.
        2. Options - Toggles - Sort List: This setting determines whether or not the site list is sorted alphabetically. You generally would not want to use "Sort List" and "Show Comments" at the same time.
        3. Options - Toggles - Show Comments: This setting determines whether or not any comments contained in the site list are displayed. You generally would not want to use "Show Comments" and "Sort List" at the same time.
        4. Options - Toggles - Show Page Ranges: This setting determines whether or not the page ranges table, described in the Page ranges section, is displayed on each page of the site list.
        5. Page Elements - Rows: This control allows you to edit the table rows used to build each page of the site list. You may use any of the options listed in the Site list rows section, in any order desired and as many times as desired.
        6. Page Elements - Columns: This control allows you to edit the table columns used to build each page of the site list. You may use any of the options listed in the Site list columns section, in any order desired and as many times as desired.
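
        The interaction between "Sites Per Page", "Sort List" and "Show Comments" can be illustrated with a short sketch. It is not DiggerDiff's actual code: build_pages() is a hypothetical name, each entry is reduced to a single string, and comments are assumed to be the entries that start with '#'.

```php
<?php
// Hypothetical sketch: optionally drop comments, optionally sort,
// then split the list into pages of $perPage entries.
function build_pages(array $entries, int $perPage, bool $sort, bool $showComments): array
{
    if (!$showComments) {
        // Drop comment entries (assumed here to start with '#').
        $entries = array_values(array_filter($entries, function ($entry) {
            return strncmp($entry, '#', 1) !== 0;
        }));
    }
    if ($sort) {
        // Case-insensitive alphabetical sort. Comments are sorted along
        // with everything else, which separates them from the entries
        // they were written next to.
        usort($entries, 'strcasecmp');
    }
    return array_chunk($entries, max(1, $perPage));
}

$entries = ['# Local', 'Zeta', 'acme', '# Remote', 'Bolt'];
print_r(build_pages($entries, 2, true, true));
```

        With both toggles on, the two comments end up together on the first page, away from the sites they described. This is why the two options are usually not combined.
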
      5. Edit config

        This category has four settings in two sections. The "Select List" section controls the dimensions of the line selection page of the site list editor. The "Fields" section controls the dimensions of the entry fields used when adding or modifying a site in the list. Use of the site list editor is covered in the Site list editor section.

        1. Select List - Lines: This setting determines the number of lines displayed in the line selection control.
        2. Select List - Width: This setting determines the width of the line selection control.
        3. Fields - Lines: This setting determines the number of lines displayed in the "Search and Replace" field in the editor.
        4. Fields - Width: This setting determines the width of the input fields in the editor.
    3. Site list editor

      The site list editor is used to maintain the list of pages being monitored. It consists of two pages: the select list page and the edit page.

      1. Select list

        This page displays the contents of the URL list file. You can select any line in the file and perform one of the following operations. There is also a link to close this window.

        1. Insert line: This action inserts a new blank line before the currently selected line. It then loads the new line into the editor.
        2. Copy line: This action adds a copy of the currently selected line after the selected line. It then loads the copy into the editor.
        3. Delete line: This action deletes the currently selected line.
        4. Move up: This action moves the currently selected line up one line.
        5. Move down: This action moves the currently selected line down one line.
        6. Edit line: This action loads the currently selected line into the editor.
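
        The line operations above amount to simple array manipulation on the lines of the URL list file. The sketch below is an illustration only; edit_lines() and its action names are hypothetical, and the step that loads a line into the editor is left out.

```php
<?php
// Hypothetical sketch: apply one select-list action to the line at
// zero-based index $i and return the modified list of lines.
function edit_lines(array $lines, string $action, int $i): array
{
    switch ($action) {
        case 'insert':   // new blank line before the selected line
            array_splice($lines, $i, 0, ['']);
            break;
        case 'copy':     // duplicate placed after the selected line
            array_splice($lines, $i + 1, 0, [$lines[$i]]);
            break;
        case 'delete':
            array_splice($lines, $i, 1);
            break;
        case 'up':       // swap with the line above, if there is one
            if ($i > 0) {
                [$lines[$i - 1], $lines[$i]] = [$lines[$i], $lines[$i - 1]];
            }
            break;
        case 'down':     // swap with the line below, if there is one
            if ($i < count($lines) - 1) {
                [$lines[$i + 1], $lines[$i]] = [$lines[$i], $lines[$i + 1]];
            }
            break;
    }
    return $lines;
}

print_r(edit_lines(['a', 'b', 'c'], 'copy', 1));
```
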
      2. Edit page

        After selecting either Insert line, Copy line or Edit line, you are brought to the edit page, where you can edit the details of the selected page. You can also get to the edit page by clicking on the link in the Edit column of the site list. Once on the edit page, there are four fields you can edit, and two buttons.

        1. Title: This field is the name that will be displayed in the site list. It can only be blank if "Post data" and "Search and replace" are also blank; in other words, when "URL / Comment" is the only non-blank field, or when ALL fields are blank. To convert a regular entry with a title into a comment, put a '#' as the first character of this field.
        2. URL / Comment: This field is the URL for the page to be monitored, or a comment if this entry is a comment. To make a new entry a comment, fill out only this field, using '#' as the first character. Putting a '#' in front of this field in an existing entry will only turn it into a comment if this is the ONLY non-blank field. This is the only mandatory field, and MUST be non-blank unless ALL fields are blank.
        3. Post data: This field contains data to be POSTed to a URL. Any text in this field causes the URL to be handled as a POST. The data will be sent as-is, so be sure that it is properly encoded.
        4. Search and replace: This field is a list of expressions that will be matched against the contents of the page. They can be used to strip out predictable, unimportant changes from a page before comparison, or to reformat a page to produce a clearer diff. The expression format is: "/search/[replace]/[modifiers]". You can enter multiple expressions here, one per line, and just one line per expression. The separator can be any character, not just '/', but MUST appear EXACTLY three times in the expression.
          'Search' is a pattern that is matched against the contents of a page. Matching is done with PHP's preg_replace() function, so this pattern must fit the rules for that function.
          If 'replace' is blank, anything matching 'search' will be removed; otherwise matching text will be replaced with the contents of 'replace'. As with 'search', this must fit the rules for preg_replace(). In addition, 'replace' is passed through stripcslashes() before use, so C-style escape sequences such as '\n' and '\t' are converted, and a literal backslash must be written as '\\'.
          'Modifiers' is an optional parameter containing Pattern Modifiers for use in preg_replace(). The possible modifiers are listed under Pattern Modifiers in the PHP manual. Among others, 'i' makes the search case-insensitive, 'm' makes '^' (beginning of line) and '$' (end of line) match at every line rather than only at the start and end of the whole page, so it is needed whenever you anchor a search to individual lines, 's' makes '.' match newlines as well, and 'U' makes quantifiers non-greedy by default.
          To sum up, an expression '/search/replace/modifiers' is used internally like this:
          $page = preg_replace('/search/modifiers', stripcslashes('replace'), $page);
        5. Undo changes: Resets the form to its original condition and original data.
        6. Save changes: Saves any changes to the site list, and returns you to either the select list or the site list, depending on how you got here.
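
        To make the "Search and replace" format concrete, here is a minimal sketch of how a list of expressions could be split and applied. The preg_replace() and stripcslashes() calls follow the line shown above; everything else (the function names, skipping malformed expressions, falling back to the unchanged page when the pattern is invalid) is an assumption for illustration.

```php
<?php
// Hypothetical sketch: apply one "/search/replace/modifiers" expression.
function apply_expression(string $expr, string $page): string
{
    if ($expr === '') {
        return $page;
    }
    $sep = $expr[0];                          // separator = first character
    if (substr_count($expr, $sep) !== 3) {
        return $page;                         // assumption: skip bad lines
    }
    // explode() yields ['', search, replace, modifiers].
    [, $search, $replace, $modifiers] = explode($sep, $expr);
    $result = preg_replace($sep . $search . $sep . $modifiers,
                           stripcslashes($replace), $page);
    return $result === null ? $page : $result;   // null means pattern error
}

// Apply every expression in the field, one per line, in order.
function clean_page(string $expressions, string $page): string
{
    foreach (preg_split('/\R/', $expressions, -1, PREG_SPLIT_NO_EMPTY) as $expr) {
        $page = apply_expression($expr, $page);
    }
    return $page;
}

$expressions = '/\d{2}:\d{2}:\d{2}/TIME/' . "\n" . '/<!--.*-->//sU';
echo clean_page($expressions, "Updated 12:34:56 <!-- ad 1 -->jobs<!-- ad 2 -->"), "\n";
```

        In this example the first expression replaces a changing timestamp with the fixed text "TIME", and the second removes HTML comments; the 'U' modifier stops ".*" from swallowing everything between the first "<!--" and the last "-->". Because 'replace' goes through stripcslashes(), a backreference written as '\1' would be read as an octal escape, so writing it as '$1' is the safer form.
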
  3. How to set it up

    [Not yet written.]


This site and its contents are Copyright © 2003-2004 by Dave Walton (diggerdiff@digger.net).
All rights reserved.