Question

I follow TDD religiously. My projects typically have 85% or better test coverage, with meaningful test cases.

I do a lot of work with HBase, and the main client interface, HTable, is a real pain to mock. It takes me 3 or 4 times longer to write my unit tests than it does to write tests that use a live endpoint.

I know that, philosophically, tests that use mocks should take priority over tests that use a live endpoint. But mocking HTable is a serious pain, and I'm not really sure it offers much of an advantage over testing against a live HBase instance.

Everyone on my team runs a single-node HBase instance on their workstation, and we have single-node HBase instances running on our Jenkins boxes, so it's not an issue of availability. The live endpoint tests obviously take longer to run than the tests that use mocks, but we don't really care about that.

Right now, I write live endpoint tests AND mock-based tests for all my classes. I'd love to ditch the mocks, but I don't want quality to decline as a result.

What do you all think?


Solution

  • My first recommendation would be to not mock types you don't own. You mentioned HTable being a real pain to mock - maybe you should wrap it in an Adapter instead, one that exposes the 20% of HTable's features you actually need, and mock that wrapper where needed (a rough sketch follows this list).

  • That being said, let's assume we're talking about types you all own. If your mock-based tests are focused on happy path scenarios where everything goes smoothly, you won't lose anything ditching them because your integration tests are probably already testing the exact same paths.

    However, isolated tests become interesting when you start thinking about how your system under test should react to every little thing that could happen as defined in its collaborator's contract, regardless of the actual concrete object it's talking to. That's part of what some call basic correctness. There could be many of those little cases and many more combinations of them. This is where integration tests start getting lousy while isolated tests will remain fast and manageable.

    To be more concrete: what happens if one of your HTable adapter's methods returns an empty list? What if it returns null? What if it throws a connection exception? Whether any of those things can happen should be defined in the Adapter's contract, and any of its consumers should be prepared to deal with those situations, hence the need for tests for them.
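
A rough sketch of that kind of adapter, in Java. The RowStore interface, the StoreException wrapper and the HBaseRowStore class are invented names for illustration; only the HTable/Get/Put/Result calls come from the HBase client API, and the exact method names vary a little between HBase versions:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;

    // Adapter: exposes only the narrow slice of HTable the application
    // actually uses, so tests can mock or fake this interface instead of
    // wrestling with HTable itself.
    public interface RowStore {
        // Returns the cell value, or null if the row/column is absent.
        byte[] get(byte[] rowKey, byte[] family, byte[] qualifier) throws StoreException;

        void put(byte[] rowKey, byte[] family, byte[] qualifier, byte[] value) throws StoreException;
    }

    // Illustrative wrapper for HBase client errors (not part of the HBase API).
    class StoreException extends Exception {
        StoreException(String message, Throwable cause) { super(message, cause); }
    }

    // Thin production implementation delegating to HTable. It stays almost
    // logic-free and is covered by the live-endpoint tests, not by mocks.
    class HBaseRowStore implements RowStore {
        private final HTable table;

        HBaseRowStore(HTable table) { this.table = table; }

        @Override
        public byte[] get(byte[] rowKey, byte[] family, byte[] qualifier) throws StoreException {
            try {
                Get request = new Get(rowKey);
                request.addColumn(family, qualifier);
                Result result = table.get(request);
                return result.getValue(family, qualifier);
            } catch (IOException e) {
                throw new StoreException("HBase get failed", e);
            }
        }

        @Override
        public void put(byte[] rowKey, byte[] family, byte[] qualifier, byte[] value) throws StoreException {
            try {
                Put request = new Put(rowKey);
                request.addColumn(family, qualifier, value); // add(...) on pre-1.0 clients
                table.put(request);
            } catch (IOException e) {
                throw new StoreException("HBase put failed", e);
            }
        }
    }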

To sum up: you won't see any quality decline by removing your mock-based tests if they tested the exact same things as your integration tests. However, trying to imagine additional isolated tests (and contract tests) can help you think through your interfaces/contracts more thoroughly and increase quality by catching defects that would have been hard to anticipate and/or slow to test with integration tests.
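
For example, an isolated test of a hypothetical consumer of such an adapter can pin down one of those contract cases explicitly. UserProfileReader and its "default display name" behaviour are invented purely for illustration; the hand-rolled stub stands in for the RowStore interface sketched above:

    import static org.junit.Assert.assertEquals;

    import java.nio.charset.StandardCharsets;

    import org.junit.Test;

    public class UserProfileReaderTest {

        // Hypothetical consumer of the RowStore adapter from the previous sketch.
        static final class UserProfileReader {
            private static final byte[] FAMILY = "profile".getBytes(StandardCharsets.UTF_8);
            private static final byte[] QUALIFIER = "displayName".getBytes(StandardCharsets.UTF_8);

            private final RowStore store;

            UserProfileReader(RowStore store) { this.store = store; }

            // Contract decision under test: a missing cell maps to a default name.
            String displayNameFor(String userId) throws StoreException {
                byte[] value = store.get(userId.getBytes(StandardCharsets.UTF_8), FAMILY, QUALIFIER);
                return value == null ? "anonymous" : new String(value, StandardCharsets.UTF_8);
            }
        }

        // Hand-rolled stub: simulates "row not found" without touching HBase.
        private static final class EmptyRowStore implements RowStore {
            @Override
            public byte[] get(byte[] rowKey, byte[] family, byte[] qualifier) { return null; }

            @Override
            public void put(byte[] rowKey, byte[] family, byte[] qualifier, byte[] value) { }
        }

        @Test
        public void missingRowFallsBackToDefaultDisplayName() throws Exception {
            UserProfileReader reader = new UserProfileReader(new EmptyRowStore());
            assertEquals("anonymous", reader.displayNameFor("user-42"));
        }
    }

Getting the same coverage against a live HBase instance would mean contriving missing rows, broken connections and so on in real tables, which is exactly the part that gets slow and awkward.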

Other tips

philosophically, tests that use mocks should take priority over tests that use a live endpoint

I think at the very least, that's a point of current ongoing controversy amongst TDD proponents.

My personal view goes beyond that to say that a mock-based test is mostly a way of representing a form of interface contract; ideally it breaks (i.e. fails) if and only if you change the interface. And as such, in a reasonably strongly typed language like Java, when using an explicitly defined interface, it is almost entirely superfluous: the compiler will already have told you if you have changed the interface.

The main exception is when you are using a very generic interface, perhaps based on annotations or reflection, that the compiler isn't able to usefully police automatically. Even then you should check whether there is a way of doing that validation programmatically (e.g. a SQL syntax-checking library) rather than by hand using mocks.

It is that latter kind of validation you are getting when you test against a 'live' local database: the HTable implementation kicks in and applies much more comprehensive validation of the interface contract than you would ever think to write out by hand.

Unfortunately, a much more common use of mock-based testing is the test that:

  • passes for whatever the code was at the time the test was written
  • provides no guarantees about any properties of the code other than that it exists and kind of runs
  • fails any time you change that code

Such tests should of course be deleted on sight.

How much longer does an endpoint-based test take to run than a mock-based test? If it's significantly longer, then yes, it's worth the investment of your test-writing time to make the unit tests quicker - because you'll have to run them many, many times. If it's not significantly longer, even though the endpoint-based tests are not "pure" unit tests, as long as they're doing a good job of testing the unit, there's no reason to be religious about it.

I agree completely with guillaume31's answer: never mock types that you don't own!

Normally, pain in a test (such as mocking a complex interface) reflects a problem in your design. Perhaps you need some abstraction between your model and your data access code; for example, using a hexagonal architecture with the repository pattern is the most common way to solve this kind of problem.
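
As an illustration of that kind of boundary (all names here are invented, not from the answer): the domain code depends only on a repository interface expressed in domain terms, while the HBase-backed implementation lives at the edge and is covered by integration tests:

    // Port: defined in domain terms, no HBase types leak through it.
    public interface OrderRepository {
        Order findById(String orderId);

        void save(Order order);
    }

    // Minimal domain object for the sketch.
    final class Order {
        private final String id;
        private boolean shipped;

        Order(String id) { this.id = id; }

        String id() { return id; }

        void markShipped() { shipped = true; }

        boolean isShipped() { return shipped; }
    }

    // Domain logic depends only on the port, so it can be unit tested with a
    // trivial in-memory fake; the HBase-backed class implementing
    // OrderRepository is exercised separately by integration tests.
    final class OrderService {
        private final OrderRepository orders;

        OrderService(OrderRepository orders) { this.orders = orders; }

        void markShipped(String orderId) {
            Order order = orders.findById(orderId);
            order.markShipped();
            orders.save(order);
        }
    }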

If you want an integration test to check how things work together, write an integration test; if you want a unit test because you are testing your logic, write a unit test and isolate the persistence. But writing an integration test because you don't know how to isolate your logic from an external system (or because isolating it is a pain) is a big smell: you are choosing integration over unit testing because of a limitation in your design, not because of a real need to test the integration.

Take a look at this talk from Ian Cooper: http://vimeo.com/68375232. He talks about hexagonal architecture and testing, and about when and what to mock; it's a really inspiring talk that answers many questions like yours about real TDD.

TL;DR - The way I see it, it depends on how much effort you end up spending on tests, and whether it would have been better to spend more of it on your actual system instead.

Long version:

Some good answers here, but my take on it is different: testing is an economic activity that needs to pay for itself, and if the time you spend isn't returned in development speed and system reliability (or whatever else you look to get out of tests) then you may be making a bad investment; you're in the business of building systems, not writing tests. Therefore, reducing the effort to write and maintain tests is crucial.

For example, some main values I gain from tests are:

  • Reliability (and therefore development speed): refactor code/integrate a new framework/swap a component/port to a different platform, be confident that stuff still works
  • Design feedback: classic TDD/BDD "use your code" feedback on your low/mid-level interfaces

Testing against a live endpoint should still provide these.

Some drawbacks for testing against a live endpoint:

  • Environment setup - configuring and standardizing the test running environment is more work, and subtly different environment setups could result in subtly different behavior
  • Statelessness - working against a live endpoint can end up promoting tests that rely on mutating endpoint state, which is fragile and hard to reason about (i.e. when something is failing, is it failing because of weird state?)
  • Test running environment is fragile - if a test fails, is it the test, the code, or the live endpoint?
  • Run speed - a live endpoint is usually slower, and sometimes is harder to parallelize
  • Creating edge cases for testing - usually trivial with a mock, sometimes a pain with a live endpoint (e.g. transport/HTTP errors are tricky to set up; a short example follows this list)
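
For instance, simulating a store failure takes one stubbing call against a narrow adapter interface like the RowStore sketched in the accepted answer. Mockito is used here purely as an illustration, and CachingReader is an invented consumer:

    import static org.junit.Assert.assertEquals;
    import static org.mockito.Mockito.any;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import java.nio.charset.StandardCharsets;

    import org.junit.Test;

    public class ConnectionFailureTest {

        // Hypothetical consumer: falls back to a cached value when the store is down.
        static final class CachingReader {
            private final RowStore store;
            private final byte[] lastKnownValue;

            CachingReader(RowStore store, byte[] lastKnownValue) {
                this.store = store;
                this.lastKnownValue = lastKnownValue;
            }

            byte[] read(byte[] row, byte[] family, byte[] qualifier) {
                try {
                    return store.get(row, family, qualifier);
                } catch (StoreException e) {
                    return lastKnownValue; // degrade gracefully instead of failing the request
                }
            }
        }

        @Test
        public void fallsBackToCachedValueWhenTheStoreIsUnreachable() throws Exception {
            // Forcing a transport failure is one line with a mock; making a real
            // single-node HBase instance fail on cue is considerably harder.
            RowStore store = mock(RowStore.class);
            when(store.get(any(byte[].class), any(byte[].class), any(byte[].class)))
                    .thenThrow(new StoreException("connection refused", new java.io.IOException()));

            CachingReader reader = new CachingReader(store, "cached".getBytes(StandardCharsets.UTF_8));
            byte[] value = reader.read(new byte[] {1}, new byte[] {2}, new byte[] {3});
            assertEquals("cached", new String(value, StandardCharsets.UTF_8));
        }
    }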

If I were in this situation, and the drawbacks didn't seem to be a problem whereas mocking the endpoint slowed down my test writing considerably, I'd test against a live endpoint in a heartbeat, as long as I'd be sure to check again after a while to see that the drawbacks don't turn out to be a problem in practice.

From a testing perspective there are some requirements that are an absolute must:

  • Testing (unit or otherwise) must never have a way to touch production data
  • The results from one test must never affect the results of another test
  • You must always start from a known position

That's a big challenge when connecting to any source that maintains state outside of your tests. It's not "pure" TDD, but the Ruby on Rails crew solved this issue in a way that could probably be adapted for your purposes. The Rails test framework worked in this way:

  • Test configuration was automatically selected when running unit tests
  • The database was created and initialized at the start of running unit tests
  • The database was deleted after unit tests were run
  • If using SQLite, the test configuration used an in-memory database

All this work was built into the test harness, and it works reasonably well. There's a lot more to it, but the basics are enough for this conversation.
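
Something analogous is possible for HBase itself. For example, the HBaseTestingUtility mini-cluster helper that ships with HBase's test artifacts can stand up a throwaway instance for a test run; a rough sketch (exact methods vary a little between HBase versions):

    import org.apache.hadoop.hbase.HBaseTestingUtility;
    import org.junit.AfterClass;
    import org.junit.BeforeClass;

    public class HBaseBackedTest {

        // Spins up an in-process HBase (plus the ZooKeeper/HDFS bits it needs)
        // under a temporary directory, so tests never go near production data.
        private static final HBaseTestingUtility HBASE = new HBaseTestingUtility();

        @BeforeClass
        public static void startThrowawayCluster() throws Exception {
            HBASE.startMiniCluster(); // fresh, known state for the whole run
        }

        @AfterClass
        public static void stopThrowawayCluster() throws Exception {
            HBASE.shutdownMiniCluster(); // nothing survives the test run
        }

        // Individual tests would create or truncate their own tables so that
        // no test depends on state left behind by another.
    }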

On the different teams I've worked with over time, we made choices that promoted getting code tested even when it wasn't the most "correct" path. Ideally, we would wrap all calls to a data store with code we controlled. In theory, if any of these old projects got new funding we could go back and move them from being database bound to Hadoop bound by focusing our attention on just a handful of classes.

The important aspects are not to mess with production data, and make sure you are truly testing what you think you are testing. It's really important to be able to reset the external service to a known baseline on demand--even from your code.
