Testing against QWidget Appearance Regression in PyQt

schollii

Sep 30, 2016

MIT

Recipe that can be included in a pytest or nose test suite to compare a PyQt widget's appearance with a reference saved on disk, or generate the reference if it does not exist.

Introduction

Unit tests for a GUI can benefit from comparing a widget's appearance against an "approved" snapshot, since not every minute detail of a widget is easy to test programmatically. The recipe in this tip is specific to PyQt widgets: it either generates an image of a PyQt widget, or compares the current widget to a saved image. It has been tested with the following configuration: PyQt 5.7, Python 3.5, and Windows 7 64-bit. It will likely also work with any combination of PyQt 5.x and Python 3.x, on any platform supported by that combination.

Requirements

The basic requirements of a widget appearance test function are the following:

It should be easy to generate a snapshot of the widget the first time, or whenever a widget has been intentionally modified.
If the image already exists, the test function should assume that the current widget appearance must be compared against it.
If the appearance has not changed (beyond a specified tolerance, if any), the function should return True.
If the appearance has changed beyond a certain tolerance (which defaults to 0), the test function should save the actual snapshot of the widget and should generate a difference image, in case these help understand what has been broken. It should then return False.
The test function should delete any previous actual and difference images saved to the filesystem: if the test succeeds, they are obsolete (leaving them around would be misleading); if the test fails, they will get overwritten.
The test function should allow the check to be repeated at a certain interval for a short period of time, since some multi-threaded apps may delay the point at which a widget reaches its final appearance.
Several types of appearance differences should be checked:
- very slight change in colors over a large area. Example: background color change
- medium change in colors, in small area. Example: speckling effect around characters in widgets embedded in graphics scene
- large change in colors, in very small area. Example: text change by even one character
It should be easy to save the widget snapshot in the same folder as a test.
The name of the file to compare against, or to save as reference, should be easy to provide.
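
The repeat-at-an-interval requirement can be sketched independently of Qt. The helper below is hypothetical (its name, parameters, and defaults are assumptions, not part of the recipe): it calls a check until the check passes or a timeout expires. In a real Qt test, `time.sleep` blocks the event loop, so the `wait` argument would need to be replaced by something that keeps processing events, for example a wrapper around `QTest.qWait` from `PyQt5.QtTest`.

```python
import time
from typing import Callable


def check_with_retries(check: Callable[[], bool], timeout_s: float = 1.0,
                       interval_s: float = 0.1,
                       wait: Callable[[float], None] = time.sleep) -> bool:
    """Call check() until it returns True or timeout_s has elapsed.

    Hypothetical helper: 'wait' defaults to time.sleep, which is only
    suitable outside a running Qt event loop.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        wait(interval_s)


# Stand-in for a widget that only looks right on the third attempt
attempts = []


def flaky_check() -> bool:
    attempts.append(1)
    return len(attempts) >= 3


result = check_with_retries(flaky_check, timeout_s=1.0, interval_s=0.01)
print(result, len(attempts))
```

With a widget, the `check` argument could be a lambda that calls check_widget_snapshot() with the desired tolerances.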

Using the Code

The function requires a PyQt widget, a path to a folder or file, and a name for the snapshot. Optionally, it can take a tolerance on the RMS mismatch (in percent) and a maximum percentage of pixels that are allowed to differ.

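import logging
from math import sqrt
from pathlib import Path

from PyQt5.QtGui import QColor, QImage, QPixmap
from PyQt5.QtWidgets import QWidget

log = logging.getLogger(__name__)


def check_widget_snapshot(widget: QWidget, ref_path: str, fig_name: str,
                          rms_tol_perc: float = 0.0, num_tol_perc: float = 100.0) -> bool: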
    """
    Get the difference between a widget's appearance and a file stored on disk. If the file doesn't
    exist, the file is created.
    :param widget: the widget for which appearance must be verified
    :param ref_path: the path to folder containing reference image (can be a file name too, then the
        parent is used as folder -- useful for tests where __file__ can be passed)
    :param fig_name: the name of the reference image (will be saved in ref_path folder if it doesn't
        exist, or read if it does)
    :param rms_tol_perc: percentage difference allowed between widget's snapshot and the reference, i.e.
        a floating point value in range 0..100
    :param num_tol_perc: the maximum percentage of mismatched pixels allowed, relative to total number
        of pixels in the reference image; two pixels differ if any of their R, G, B and A values differ
    :return: True if reference didn't exist (and was hence generated); True if widget matches reference;
        True if not matched but RMS diff is at most rms_tol_perc % and the percentage of differing
        pixels is at most num_tol_perc; False otherwise

    Example use:

        # file some_test.py in folder 'testing'
        app = QApplication([])
        widget = QLabel('test')
        assert check_widget_snapshot(widget, __file__, 'label', rms_tol_perc=0.5, num_tol_perc=4)

    The first time the test is run, it will save a snapshot of the widget to label.png in 
    Path(__file__).parent and return True. The next time it is run, the widget snapshot will be compared 
    to the image saved. If the image colors have changed by more than 0.5% when averaged over all pixels, 
    or if more than 4% of pixels have changed in any way, check_widget_snapshot() will save a snapshot 
    of the widget to label_actual.png and the difference image to label_diff.png and will return False, 
    thus causing the test to fail. Otherwise (no differences, or differences within stated bounds), 
    check_widget_snapshot() will just return True and the test will pass.
    """
    actual_pixmap = widget.grab()
    image = actual_pixmap.toImage()
    rms_tol_perc = float(rms_tol_perc)  # ensure FPV so logging can format
    num_tol_perc = float(num_tol_perc)  # ensure FPV so logging can format

    ref_path = Path(ref_path)
    if ref_path.is_file():
        ref_path = ref_path.parent
    assert ref_path.is_dir()
    ref_pixmap_path = (ref_path / fig_name).with_suffix('.png')
    if not ref_pixmap_path.exists():
        # ref doesn't exist, generate it:
        log.info('Generating ref snapshot %s in %s for widget %s',
                 ref_pixmap_path.name, ref_pixmap_path.parent, widget)
        actual_pixmap.save(str(ref_pixmap_path))
        return True

    ref_pixmap = QPixmap(str(ref_pixmap_path))
    ref_image = ref_pixmap.toImage()
    if ref_image == image:
        return True

    diff_image = QImage(ref_image.width(), ref_image.height(), ref_image.format())
    DIFF_REF_PIX = 255  # value corresponds to "no difference"
    DIFF_MAX = 255
    REF_RGBA = [DIFF_REF_PIX] * 4
    diff_rms = 0
    num_diffs = 0
    for i in range(ref_image.width()):
        for j in range(ref_image.height()):
            pixel = image.pixelColor(i, j).getRgb()
            ref_pixel = ref_image.pixelColor(i, j).getRgb()
            if pixel != ref_pixel:
                diff = [pow((x - y) / DIFF_MAX, 2) for x, y in zip(pixel, ref_pixel)]
                diff_rms += sqrt(sum(diff) / len(pixel))
                diff = [DIFF_REF_PIX - abs(x - y) for x, y in zip(pixel, ref_pixel)]
                diff_image.setPixelColor(i, j, QColor(*diff))
                num_diffs += 1
            else:
                diff_image.setPixelColor(i, j, QColor(*REF_RGBA))

    total_num_pixels = ref_image.width() * ref_image.height()
    diff_rms /= total_num_pixels
    # use the following instead, for possibly improved representation of average difference over 
    # differing pixels:
    # diff_rms /= num_diffs 
    diff_rms_perc = diff_rms * 100
    num_diffs_perc = num_diffs * 100 / total_num_pixels
    log.info("Widget %s vs ref %s in %s:",
             widget.objectName() or repr(widget), ref_pixmap_path.name, ref_pixmap_path.parent)
    log.info("    RMS diff=%s%% (rms_tol_perc=%s%%), # pixels changed=%s%% (num_tol_perc=%s%%)",
             diff_rms_perc, rms_tol_perc, num_diffs_perc, num_tol_perc)

    if diff_rms_perc <= rms_tol_perc and num_diffs_perc <= num_tol_perc:
        return True

    # save the data

    actual_pix_path = ref_pixmap_path.with_name(fig_name + '_actual.png')
    # log.warn() is a deprecated alias; log.warning() is the supported spelling
    log.warning("    Snapshot has changed beyond tolerances, saving actual and diff images to folder %s:",
                actual_pix_path.parent)
    actual_pixmap.save(str(actual_pix_path))

    diff_pix_path = ref_pixmap_path.with_name(fig_name + '_diff.png')
    assert actual_pix_path.parent == diff_pix_path.parent
    diff_pixmap = QPixmap(diff_image)
    diff_pixmap.save(str(diff_pix_path))

    log.warning("    Actual image saved to %s", actual_pix_path.name)
    log.warning("    White - |ref - widget| image saved to %s", diff_pix_path.name)

    return False

The function's docstring gives an example of use. If the RMS difference percentage is larger than tolerated, or if the number of pixels that did not match between the widget snapshot and the reference is more than num_tol_perc % of total number of pixels, then two PNG images will be saved: the actual widget snapshot, and the absolute difference between the two images, relative to white.
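
The listing above does not show the requirement of deleting stale actual and difference images from an earlier run. A minimal sketch of that step follows, assuming the same `_actual.png` and `_diff.png` naming as the listing; the helper name `remove_stale_results` is hypothetical.

```python
import tempfile
from pathlib import Path


def remove_stale_results(ref_dir: Path, fig_name: str) -> list:
    """Delete <fig_name>_actual.png and <fig_name>_diff.png if they exist.

    Hypothetical helper; returns the names of the files that were removed.
    The reference image <fig_name>.png is never touched.
    """
    removed = []
    for suffix in ('_actual.png', '_diff.png'):
        stale = ref_dir / (fig_name + suffix)
        if stale.exists():
            stale.unlink()
            removed.append(stale.name)
    return removed


# Demonstration in a throwaway folder, using empty files as stand-ins for images
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / 'label.png').write_bytes(b'')
    (folder / 'label_actual.png').write_bytes(b'')
    removed = remove_stale_results(folder, 'label')
    reference_kept = (folder / 'label.png').exists()

print(removed, reference_kept)
```

Such a call would fit near the top of the test function, before the comparison, so that a passing run leaves only the reference image behind.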

Points of Interest

There are many ways to compute a scalar that represents the difference between two images. For widget appearance regression testing, it is likely sufficient to compute the average, over all pixels, of the RMS difference of each pixel, scaled between 0 and 1:

diff_for_one_pixel = sqrt [(r1-r2)²/255² + (g1-g2)²/255² + (b1-b2)²/255² + (a1-a2)²/255² ] / 2

where r, g, b, a are the integer values (0-255) of the RGB and alpha channels of the pixel. This produces a number between 0 and 1. The total difference is then:

total_diff = sum(diff_for_one_pixel for all pixels) / number of pixels
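
The two formulas can be tried out without Qt by treating an image as a flat list of (R, G, B, A) tuples. This is only an illustrative sketch: the function names are chosen here for the example, and the listing above works on QImage pixels instead.

```python
from math import sqrt

DIFF_MAX = 255  # largest possible difference on one channel


def pixel_diff(pixel, ref_pixel) -> float:
    """RMS of the per-channel differences of one pixel, scaled to 0..1."""
    squares = [((x - y) / DIFF_MAX) ** 2 for x, y in zip(pixel, ref_pixel)]
    return sqrt(sum(squares) / len(pixel))


def total_diff(pixels, ref_pixels) -> float:
    """Average of pixel_diff over all pixels of the reference."""
    return sum(pixel_diff(p, r) for p, r in zip(pixels, ref_pixels)) / len(ref_pixels)


# Four-pixel "image" in which a single pixel differs on every channel by 255
ref = [(0, 0, 0, 0)] * 4
actual = [(0, 0, 0, 0)] * 3 + [(255, 255, 255, 255)]

one_pixel = pixel_diff(actual[3], ref[3])
whole_image = total_diff(actual, ref)
print(one_pixel, whole_image)
```

Dividing the sum of squares by the number of channels (4) before taking the square root is the same as dividing the square root by 2, which is the form used in the formula.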

The rms_tol_perc is a "percentage difference". Very localized but strong differences get washed out. For example, a difference of 1 on each of the RGBA values (0-255) of every pixel will give a percentage difference of about 0.4% over the entire image, whereas a difference of 255 on R, G, B, and A for 1000 pixels in a 1000x1000 pixel image will give a percentage difference of 0.1%. Yet the latter may well represent a more severe regression. In GUI widgets, this is the common case of widget layout, words, or content having changed: much of the widget remains unchanged, but some portions have shifted or been replaced in various "pockets" of the widget.

It is useful to average the RMS difference over differing pixels only, rather than over the total number of pixels, because identical pixels don't carry useful information for regression testing; this is the commented-out `diff_rms /= num_diffs` alternative in the listing. In the previous example of 1000 pixels being completely different, i.e. color = (0, 0, 0, 0) in one image and (255, 255, 255, 255) in the other, check_widget_snapshot() would then indicate that 0.1% of all pixels were unmatched, on average by 100%. In the image that has all pixels just one notch down from the reference image (color = (R, G, B, A) in the reference image and (R-1, G-1, B-1, A-1) in the actual), the function would indicate that 100% of the image pixels were different, on average by 0.4%.
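
The two examples can be reproduced numerically. The sketch below is not the article's function: `summarize` is a hypothetical helper that works on lists of RGBA tuples, and the image is scaled down to 100x100 pixels (with 10 inverted pixels, so still 0.1%) to keep the pure-Python loop quick.

```python
from math import sqrt


def summarize(pixels, ref_pixels):
    """Return (% of pixels changed, average diff % over all pixels,
    average diff % over changed pixels only)."""
    diffs = [sqrt(sum(((x - y) / 255) ** 2 for x, y in zip(p, r)) / len(r))
             for p, r in zip(pixels, ref_pixels)]
    changed = [d for d in diffs if d > 0]
    perc_changed = 100 * len(changed) / len(diffs)
    avg_all = 100 * sum(diffs) / len(diffs)
    avg_changed = 100 * sum(changed) / len(changed) if changed else 0.0
    return perc_changed, avg_all, avg_changed


NUM_PIXELS = 100 * 100
ref = [(255, 255, 255, 255)] * NUM_PIXELS

# Case 1: 0.1% of the pixels differ by 255 on all four channels
inverted = [(0, 0, 0, 0)] * 10 + ref[10:]
# Case 2: every pixel is one notch down on every channel
one_notch = [(254, 254, 254, 254)] * NUM_PIXELS

case1 = summarize(inverted, ref)
case2 = summarize(one_notch, ref)
print(case1)
print(case2)
```

Averaged over all pixels, case 1 scores lower (about 0.1%) than case 2 (about 0.39%); averaged over changed pixels only, case 1 scores 100% and stands out as the more severe change.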

If the tolerances (RMS and percentage of pixels changed) are set to non-zero values because of noise (such as machine-dependent speckling in a graphics view with embedded widgets), another measure is needed to catch real differences such as a character changing in a label. Such a change produces a large color difference in only a few pixels, and that difference can get diluted among the noise pixels. A tolerance on the maximum difference allowed for any single pixel guards against such changes going unnoticed; the listing above does not include it, but the History section lists it as added in Version 3.
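
The dilution effect can be illustrated with made-up numbers. In the sketch below, the pixel counts, the tolerance values (0.5% RMS, 10% of pixels, 5% maximum), and the helper name `measure` are all assumptions chosen for the example: a noisy image and an image with a few strongly changed pixels both pass the two averaged tolerances, and only the maximum per-pixel difference tells them apart.

```python
from math import sqrt


def measure(pixels, ref_pixels):
    """Return (average diff %, % of pixels changed, maximum per-pixel diff %)."""
    diffs = [sqrt(sum(((x - y) / 255) ** 2 for x, y in zip(p, r)) / len(r))
             for p, r in zip(pixels, ref_pixels)]
    avg = 100 * sum(diffs) / len(diffs)
    changed = 100 * sum(1 for d in diffs if d > 0) / len(diffs)
    return avg, changed, 100 * max(diffs)


ref = [(255, 255, 255, 255)] * 10000

# Noise only: 5% of the pixels are off by 3 on every channel
noisy = list(ref)
for i in range(500):
    noisy[i] = (252, 252, 252, 252)

# Same noise, plus 20 pixels that went from white to black (e.g. changed text)
regressed = list(noisy)
for i in range(500, 520):
    regressed[i] = (0, 0, 0, 255)

RMS_TOL, NUM_TOL, MAX_TOL = 0.5, 10.0, 5.0  # hypothetical tolerances, in percent
noise_avg, noise_changed, noise_max = measure(noisy, ref)
regr_avg, regr_changed, regr_max = measure(regressed, ref)

print(noise_avg <= RMS_TOL and noise_changed <= NUM_TOL, noise_max <= MAX_TOL)
print(regr_avg <= RMS_TOL and regr_changed <= NUM_TOL, regr_max <= MAX_TOL)
```

Both images satisfy the RMS and pixel-count tolerances, but the regressed image exceeds the maximum per-pixel tolerance, so a check that also enforces it would fail as intended.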

Another measure of the difference between two images is the Structural Similarity Index (SSIM). It is significantly more complicated to compute, and available implementations are based on numpy. To accommodate it, check_widget_snapshot() could take a parameter that allows a "difference computation" to be passed in, a form of dependency injection.
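
One possible shape for such a parameter is sketched below. This is not the GitHub implementation: the names `images_match`, `diff_func`, `mean_rms_diff`, and `max_channel_diff` are hypothetical, and images are again represented as lists of RGBA tuples so that the example runs without Qt.

```python
from math import sqrt
from typing import Callable, Sequence, Tuple

Pixel = Tuple[int, int, int, int]
DiffFunc = Callable[[Sequence[Pixel], Sequence[Pixel]], float]


def mean_rms_diff(pixels: Sequence[Pixel], ref_pixels: Sequence[Pixel]) -> float:
    """Default measure: per-pixel RMS difference averaged over all pixels, in %."""
    total = sum(sqrt(sum(((x - y) / 255) ** 2 for x, y in zip(p, r)) / len(r))
                for p, r in zip(pixels, ref_pixels))
    return 100 * total / len(ref_pixels)


def max_channel_diff(pixels: Sequence[Pixel], ref_pixels: Sequence[Pixel]) -> float:
    """Alternative measure: largest difference on any single channel, in %."""
    return 100 * max(abs(x - y) for p, r in zip(pixels, ref_pixels)
                     for x, y in zip(p, r)) / 255


def images_match(pixels: Sequence[Pixel], ref_pixels: Sequence[Pixel],
                 tol_perc: float = 0.0,
                 diff_func: DiffFunc = mean_rms_diff) -> bool:
    """Compare two images using whichever difference computation is injected."""
    return diff_func(pixels, ref_pixels) <= tol_perc


ref = [(200, 200, 200, 255)] * 100
actual = list(ref)
actual[0] = (0, 0, 0, 255)  # one pixel changed strongly

default_result = images_match(actual, ref, tol_perc=1.0)
strict_result = images_match(actual, ref, tol_perc=1.0, diff_func=max_channel_diff)
print(default_result, strict_result)
```

An SSIM-based callable with the same signature could be passed in the same way, keeping the numpy dependency out of the test helper itself.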

The check_widget_snapshot() function is available on GitHub in a significantly more comprehensive form than the above: it supports dependency injection for the difference computation, and implements all the requirements mentioned at the beginning of this article.

History

First version: 2016/09/28
Version 2: additional tolerance for number of differences
Version 3: more requirements, additional tolerance for max difference, dependency injection of difference computation