RFC 6 Static, Unit and Integration testing
Authors: M. Ubeda Garcia
Last Modified: 15/VI/2012
This proposal accompanies the adoption of the Continuous Integration tool Jenkins.
The "raison d'être" of this proposal is the lack of standardized tests at any level in the DIRAC framework. In other words, the goal is to put together all the bits of code that run tests and launch them in a coherent way. The sooner errors are spotted in the development / testing phase, the better. This requires different approaches and granularities where tests are concerned.
In this proposal three testing levels are described: static, unit and integration. We omit system testing and system integration testing, as they would need a requirements specification that we currently lack.
You can find the running prototype used for LHCbDirac here - use your NICE credentials to log in. We understand that this prototype "lives" in an LHCb environment. LHCb is willing to volunteer to maintain such prototypes. Access by members of other VOs would be granted on request.
Every developer MUST ensure there are no bugs or typos in the code. That is easily achievable, and it can be argued that no external tool is needed for it. Certainly true, but since there is a specific DIRAC code convention, all developers must follow it, which is not always the case.
This step ensures the quality of the code written by developers. More interesting is the check for bad programming practices, such as the use of mutable objects as default arguments, or the well-known 'catch Exception'. These are just two examples, but under certain conditions they can hide real problems ( or introduce a problem into perfectly healthy code ). Sometimes small checks like these can spot unimaginable amounts of errors.
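As an illustration, the two pitfalls mentioned above can be reproduced in a few lines ( the function names here are made up for the example ):

```python
def add_job(job, queue=[]):  # BUG: the default list is created once and shared across calls
    queue.append(job)
    return queue

first = add_job("jobA")   # ["jobA"]
second = add_job("jobB")  # ["jobA", "jobB"] - "jobB" lands in the same shared list!

def add_job_fixed(job, queue=None):
    # The fix: create a fresh list per call instead of reusing a mutable default.
    if queue is None:
        queue = []
    queue.append(job)
    return queue

# A blanket 'except Exception' silently swallows real errors:
try:
    result = int("not a number")
except Exception:  # hides the ValueError - and any unrelated bug - from the caller
    result = 0
```

Both patterns look harmless in isolation, which is exactly why an automated checker such as pyLint is useful for catching them.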
The proposed tools are:
- pyLint: a source code bug and quality checker for the Python programming language. It follows the style recommended by PEP 8, the Python style guide.
- clonedigger: aimed at detecting similar / duplicated / cloned code in Python programs.
- sloccount: a set of tools for counting physical Source Lines of Code (SLOC).
- cyclomatic complexity: analyzes the linearly independent paths through a program.
The first one ( pyLint ) is already running, and shows there is a LOT of work to do. Putting the others in place is not a time-consuming task, but until the first tests are successful there is no point in moving forward. pyLint reports h ( high ), m ( medium ) and l ( low ) warnings. The first are bugs, which must be fixed straight away. The second are usually bad coding practices, which can be dangerous. The last are mostly warnings about missing documentation.
clonedigger, sloccount and cyclomatic complexity are not running, nor have they been actively tried out. Possible actions on them will come out of this RFC.
Unit tests must be able to run without a database back-end, a service up and running, or a running agent. They test the simplest pieces of code, although this is not always easy. Unit tests are a very good indicator of spaghetti code - if there is no way to write a unit test for a function, it is better to rewrite it and save future headaches.
When developing, it must be taken into account that writing tests takes on average as much time as writing the code. ( IMHO ) there is no reason not to write them. For code that is already mature, this point is subject to discussion.
Unit testing goes hand in hand with mocking and faking code. So far, we lack guidelines for doing so. Once they are established, writing the unit tests is a piece of cake. If the mocking is done properly, the tests will provide a very good description of what the code is doing behind the scenes.
For those unit tests that are already written within DIRAC, the tool used is the Python library unittest. Within Jenkins, we have set up the nose tool for automatic runs: Python unit tests are launched with nose, which is the tool returning a complete report of successes / failures. As for mocking, it is done using the mock library, which is already part of the DIRAC externals. While the usage of these tools is subject to formal approval with this RFC, we believe there is no concrete reason for changing. The use of unittest, mock and nose is well documented, and no deviations from their standard usage are proposed.
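A minimal sketch of what such a test could look like with unittest and mock ( the CatalogClient class and countReplicas function are hypothetical stand-ins, not actual DIRAC API; the import below is the standard-library form of the same mock package shipped in the DIRAC externals ):

```python
import unittest
from unittest import mock  # DIRAC externals ship this as the standalone 'mock' package

class CatalogClient:
    """Hypothetical client that would normally talk to a real service."""
    def getReplicas(self, lfn):
        raise NotImplementedError("talks to a real service")

def countReplicas(client, lfn):
    # Function under test: interprets the usual S_OK / S_ERROR style dict.
    res = client.getReplicas(lfn)
    if not res["OK"]:
        return 0
    return len(res["Value"])

class Test_CountReplicas(unittest.TestCase):
    def setUp(self):
        # spec= makes the mock reject attributes the real client does not have.
        self.client = mock.Mock(spec=CatalogClient)

    def test_success(self):
        self.client.getReplicas.return_value = {"OK": True, "Value": ["SE1", "SE2"]}
        self.assertEqual(countReplicas(self.client, "/lhcb/some/file"), 2)

    def test_failure(self):
        self.client.getReplicas.return_value = {"OK": False, "Message": "boom"}
        self.assertEqual(countReplicas(self.client, "/lhcb/some/file"), 0)

    def tearDown(self):
        self.client = None
```

A file like this needs no database or service to run, and nose picks it up automatically by name.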
Together with the unit tests, within the Jenkins prototype we run cobertura to know what percentage of the code is actually tested. In a perfect world, it should be 100%.
For some LHCbDirac packages the results of nose (and unittest with mock) and cobertura are already available.
This is the most complex test by far. It requires a fully functional system ( which includes databases, services and very probably running agents ), and it must also be reproducible, which is what makes it problematic.
In order to ensure repeatability, we need the same information in the database ( and CS ). This means we need a snapshot of the databases and CS at the time the integration tests are written. Getting all the data snapshots will be expensive the first time; for future code modifications, it should be the developer who updates the test data if needed.
The proposal is to ship the snapshots with the code, populate the test database with them, and run the needed integration tests.
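A sketch of the snapshot approach, using an in-memory sqlite database as a stand-in for the real back-end ( the table, data and test name are invented for the example; in DIRAC the snapshot would be an SQL dump of the real database taken when the test was written ):

```python
import sqlite3
import unittest

# Hypothetical snapshot shipped alongside the code.
SNAPSHOT_SQL = """
CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Status TEXT);
INSERT INTO Jobs VALUES (1, 'Done');
INSERT INTO Jobs VALUES (2, 'Failed');
"""

class Test_JobDB_Integration(unittest.TestCase):
    def setUp(self):
        # Populate a fresh test database from the snapshot, so every run
        # starts from exactly the same state and the test is reproducible.
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(SNAPSHOT_SQL)

    def test_failed_jobs(self):
        rows = self.db.execute(
            "SELECT JobID FROM Jobs WHERE Status = 'Failed'").fetchall()
        self.assertEqual(rows, [(2,)])

    def tearDown(self):
        self.db.close()
```

The key point is that setUp rebuilds the state from the shipped snapshot on every run, so the test never depends on what a previous run left behind.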
How and where to run them is still to be decided.
As it is now, Jenkins checks every hour whether there are changes in the repository ( svn or git ). If there are, it schedules a new Jenkins job. This job runs static and unit tests against trunk (SVN) / integration (Git). It is intended to be used by developers to see whether their new code is in good shape or not. If a system does not pass this first test, there is no need to propose a release candidate until it is successful. As of today (14 Jun 2012), the Jenkins server is running tests for only this use case.
Same use case, but can be done on demand through the web portal. It schedules a test automatically for those impatient developers that do not want to wait for the cron-job mode Jenkins job.
Once there is a pre-release candidate, a tag is created by whoever is in charge. Jenkins picks it up automatically and generates a new Jenkins job for that tagged code. It automatically runs all integration tests ( on top of the static and unit tests ).
If needed, commissioning of a new environment can be done with a few clicks. Jenkins can run if needed over a set of nodes, which are configurable to emulate any HW / SW.
The proposed implementation follows these guidelines:
- tests are located in a subdirectory named 'tests' in each system ( e.g. Core/tests )
- test names follow the pattern Test_.py ( e.g. Test_DMS_Client_Dataset.py )
- no sys.modules redefinition in the tests! Overwrite imported modules with mocked ones if needed.
- do not forget about tearDown
- feel free to use fixtures
- unittesting & mocking guide by Krzys
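For the 'no sys.modules redefinition' guideline, mock.patch offers a clean alternative: patch the dependency where it is looked up, and let the framework undo it. A sketch ( the function under test and the patched value are invented for the example ):

```python
import os
import unittest
from unittest import mock

def workDirName():
    # Toy function under test: depends on the current working directory.
    return os.path.basename(os.getcwd())

class Test_Core_WorkDir(unittest.TestCase):
    def setUp(self):
        # Patch the imported function instead of overwriting sys.modules;
        # addCleanup guarantees the patch is removed after each test, even
        # on failure, so there is no tearDown bookkeeping to forget.
        patcher = mock.patch("os.getcwd", return_value="/scratch/jenkins")
        self.addCleanup(patcher.stop)
        patcher.start()

    def test_workdir(self):
        self.assertEqual(workDirName(), "jenkins")
```

Once the patched test finishes, os.getcwd behaves normally again for every other test in the suite.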
to be discussed:
- a fake subdirectory with a proper fake implementation of all modules, maintained by the developer. This would solve the problem of a fake piece of code not being updated when the original is. It requires some work and coordination. ( e.g. Test_ModuleA uses ModuleB, and runs with its own fake version of ModuleB. ModuleB is updated, but not the fakeModuleB used by ModuleA in its tests; if ModuleB already provides its own fake version, we avoid false positives ). The problem boils down to the following: if there is a single entry point to the data per system ( the clients ), then maintaining the fake code is easy; if the data is accessed in multiple places, it is a nightmare. Are we in a position to guarantee a single entry point ?
The testing framework can notify users by email if tests are failing. This way there is no need to go through the portal to check the status. The proposal is the following:
- notify developers when their changes are crashing ( this is used in the cron-job mode ). User A commits some modifications, Jenkins picks up this latest code with some regression errors, and it notifies User A.
- notify the developer(s) in charge of the system whenever there is a crash in their code, independently of the source. In this respect, we would need a list of developers for each system. We have such a list in an informal way, but why not write it down ? This way, there will be no room for forgotten / unseen bugs.
- notify a 'power user', most likely the person in charge of preparing next release. After all, this person needs to know whether the code he / she is about to release is in good shape or not.