How to test an application that implements an ML algorithm?

To test a software program, one arrives at a set of test steps, the test data to be provided at each testable step, and the output expected from the program for that step and data. If the actual output matches the expected output, we declare that the program is functioning correctly. The program is also tested for correctness in boundary and exception scenarios, for both program behaviour and data input.

Having specified algorithms, coded them, unit tested them and tested them as part of an application in my earlier days, I want to understand how people test machine learning programs. This is my current understanding, which I want to improve.

When it comes to testing a machine learning program, directly applying the conventional software engineering process may not work. It is a challenge to detect errors, faults and defects in a machine learning program that takes arbitrary input to generate its output, and to determine whether that output is correct or reliable for the given data inputs. Are ML programs non-testable?

Should testing of a machine learning program focus less on whether the ML algorithm learns well and more on whether the application using the algorithm implements the specification and fulfills the user’s expectations?

First, understand the problem domain and the suitability of the algorithm in the problem context, based on the potential range of data inputs arriving in real time, in terms of real-world data sets. Thinking about data sets can start with the following characteristics: small vs large, repeating vs non-repeating values, missing vs non-missing attribute values, repeating vs non-repeating attribute labels, predictable vs non-predictable attribute values, attributes that take only non-negative values vs attributes that can also take negative values, and the precision required for floating point numbers.
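As a rough illustration of the characteristics listed above, here is a minimal Python sketch that profiles a single attribute. The function name `profile_column`, the choice of `None`/NaN as the marker for a missing value, and the sample columns are all assumptions made for this example, not part of any particular library.

```python
import math

def profile_column(values):
    """Summarize one attribute given as a list; None or NaN counts as missing."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    present = [v for v in values if not is_missing(v)]
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "count": len(values),                             # small vs large
        "missing": len(values) - len(present),            # missing vs non-missing
        "has_repeats": len(set(present)) < len(present),  # repeating values
        "has_negative": any(v < 0 for v in numeric),      # sign of the values
    }

# Hypothetical sample columns.
ages = [34, 51, None, 34, 29]
balances = [120.5, -40.0, 0.0, float("nan"), 7.25]

print(profile_column(ages))
print(profile_column(balances))
```

A profile like this can be run on real-world samples early on, to decide which boundary cases the test data needs to cover.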

Second, test the working of the algorithm itself. Third, test the algorithm by providing data inputs.

  • Are you implementing the algorithm? Design a series of primitive tests for the various sub-parts of the algorithm, and an end-to-end test of the final output or algorithm behavior.
  • Are you making use of an existing algorithm? Understand the algorithm, the validation required on user inputs to get the best possible results, and how to judge, in the problem context, whether the algorithm’s results are sensible.
  • Check the reported upper bounds on the time and space used by the algorithm, and get a measure of its efficiency in terms of the size or complexity of its input (Big O notation).
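The three points above can be sketched on a deliberately tiny example. The one-nearest-neighbour classifier below is only a stand-in chosen for illustration; `euclidean`, `predict_1nn` and `count_distance_calls` are hypothetical names. The sketch shows a primitive test for a sub-part, an end-to-end test, and a deterministic operation count used as a proxy for the Big O behaviour (counting calls avoids the flakiness of wall-clock timing).

```python
import math

def euclidean(a, b):
    """Sub-part: straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_1nn(train, point, distance=euclidean):
    """End-to-end behaviour: label of the closest (features, label) pair."""
    _, label = min(train, key=lambda pair: distance(pair[0], point))
    return label

def count_distance_calls(n):
    """Count distance evaluations for one prediction over n training points."""
    calls = 0

    def counting_distance(a, b):
        nonlocal calls
        calls += 1
        return euclidean(a, b)

    train = [((float(i), 0.0), i) for i in range(n)]
    predict_1nn(train, (0.0, 0.0), distance=counting_distance)
    return calls

# Primitive test for the sub-part (3-4-5 triangle).
assert euclidean((0, 0), (3, 4)) == 5.0

# End-to-end test on a case a person can verify by eye.
train = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
assert predict_1nn(train, (1.0, 1.0)) == "a"
assert predict_1nn(train, (9.0, 8.0)) == "b"

# Work per prediction grows linearly with the training set size: O(n).
assert count_distance_calls(10) == 10
assert count_distance_calls(1000) == 1000
```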

Think in terms of unit tests and regression tests for machine learning programs.

  • Add unit tests to your code and compare against expected results approximately, within a tolerance, rather than exactly.
  • Create multiple data sets with different difficulty levels, such as easy, difficult and adversarial. Whenever the code changes to add a feature or fix a bug, run it against all of these data sets to ensure that the outputs stay within a reasonable error range and that existing functionality is not broken.
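A minimal sketch of such a regression suite follows, assuming a least-squares line fit as the stand-in "learning" code. The data sets, seeds and tolerances are invented for the example: the easy set has no noise, the difficult set has bounded noise, and the adversarial set adds two large outliers, so each gets its own acceptable error range around the true slope of 2.0.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def make_dataset(noise, outliers, seed):
    """Points on y = 2x + 1 with bounded noise; the first `outliers` points are shifted."""
    rng = random.Random(seed)  # seeded, so every run sees the same data
    xs = [float(i) for i in range(50)]
    ys = [2.0 * x + 1.0 + rng.uniform(-noise, noise) for x in xs]
    for i in range(outliers):
        ys[i] += 100.0
    return xs, ys

# name -> (data set, allowed absolute error on the slope); values are assumptions
SUITE = {
    "easy": (make_dataset(0.0, 0, seed=1), 1e-9),
    "difficult": (make_dataset(1.0, 0, seed=2), 0.1),
    "adversarial": (make_dataset(1.0, 2, seed=3), 0.6),
}

def run_suite(fit=fit_line):
    """Return the names of the data sets whose result falls outside its tolerance."""
    failures = []
    for name, ((xs, ys), tolerance) in SUITE.items():
        slope, _ = fit(xs, ys)
        if not math.isclose(slope, 2.0, rel_tol=0.0, abs_tol=tolerance):
            failures.append(name)
    return failures

print(run_suite())
```

Running `run_suite()` after every code change gives a quick signal: an empty list means all data sets are still within their error ranges, and any name in the list points at the difficulty level that regressed.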

Working with domain specialists, arrive at criteria that define what correctness means.

Discuss, decide and determine the margin of error or correctness percentage before testing the machine learning program. For example, if the program interprets 75% of the test data correctly, the program is considered good enough. Remember that it might not be possible to demand 100% correctness in test validation, as the intent of machine learning is to tolerate ambiguity.
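The agreed threshold can then be written down as an executable acceptance check. This is a small sketch using the 75% figure from the example above; the function names and the spam/ham labels are made up for illustration.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

def meets_threshold(predicted, expected, minimum=0.75):
    """Acceptance check: is the agreed correctness percentage reached?"""
    return accuracy(predicted, expected) >= minimum

# Hypothetical labels: 6 of the 8 predictions are correct.
expected = ["spam", "ham", "spam", "ham", "ham", "spam", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "ham", "spam", "spam", "ham"]

print(accuracy(predicted, expected))
print(meets_threshold(predicted, expected))
```

Keeping the threshold as a named parameter makes the decision taken with the domain specialists visible in the test itself, instead of being buried in someone's notes.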

Testing would benefit from software engineers’ ability to provide a data set generator and tools that help compare output results and check their correctness against the data inputs. You also need methods to capture and view traces inserted into the ML program, and tools to analyse those traces to debug, test and validate intermediate results at specific steps of the algorithm.
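To make this concrete, here is one possible shape for a data set generator and a trace hook, sketched in Python. Everything in it is an assumption for illustration: the two-class synthetic data, the toy "learn a cut point between the class means" algorithm, and the convention of passing an optional list that collects `(step, value)` pairs.

```python
import random

def generate_two_class_data(n_per_class, seed):
    """Reproducible synthetic data: class 0 near 0.0, class 1 near 10.0."""
    rng = random.Random(seed)
    data = [(rng.uniform(-1.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.uniform(9.0, 11.0), 1) for _ in range(n_per_class)]
    return data

def class_mean(data, label):
    values = [x for x, y in data if y == label]
    return sum(values) / len(values)

def train_threshold(data, trace=None):
    """Learn a cut point halfway between the class means, tracing each step."""
    def record(step, value):
        if trace is not None:  # tracing is optional and off by default
            trace.append((step, value))

    mean0 = class_mean(data, 0)
    record("mean_class_0", mean0)
    mean1 = class_mean(data, 1)
    record("mean_class_1", mean1)
    threshold = (mean0 + mean1) / 2
    record("threshold", threshold)
    return threshold

trace = []
data = generate_two_class_data(20, seed=7)
threshold = train_threshold(data, trace)

# View the captured intermediate results step by step.
for step, value in trace:
    print(f"{step}: {value:.3f}")
```

Because the generator is seeded, a failing run can be reproduced exactly, and the trace lets a tester validate the intermediate values (here, that each class mean lies in the range it was generated from) rather than only the final output.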