A Fitting Conclusion

The final week of my internship was a gratifying one. Not that the other weeks weren’t gratifying (au contraire!), but as luck would have it, one of the final bugs I grappled with turned out to require pretty much everything I’ve learned about exploratory testing during my time with the Wikimedia Foundation.

The bug had to do with translating musical scores.

A little background: MediaWiki has an extension that converts LilyPond or ABC musical notation into an engraved score, with an option to produce an audio file as well. In the wikitext version of a page, these elements are surrounded by <score> tags, so we usually refer to them as “score elements.”

A week or so before the end of the internship, one of my mentors asked me to look into a bug in which a page containing several score elements was not translating correctly. Normally, you can translate a page paragraph by paragraph – but the page containing the score elements would only translate as one monolithic block.

I began by finding and testing other pages that contained score elements. I wanted to get complete coverage of all the variables, so I tried all these permutations (as listed in my initial testing report):

  • Tested on these language pairs: English>Hebrew, English>Chinese, English>Guarani, English>Spanish, English>Tajik, French>Russian
  • Tested both LilyPond and ABC input.
  • Tested both block and inline <score> elements.
  • Tested all translation engines available for each language pair.
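For anyone who wants to keep track of a matrix like this, here is a minimal sketch of how the combinations above could be enumerated. It is only an illustration, not the tooling I actually used: the engine names and the `engines_by_pair` mapping are made-up placeholders, since the engines that are really available differ from one language pair to the next.

```python
from itertools import product

# Language pairs taken from the testing report above.
LANGUAGE_PAIRS = [
    ("English", "Hebrew"),
    ("English", "Chinese"),
    ("English", "Guarani"),
    ("English", "Spanish"),
    ("English", "Tajik"),
    ("French", "Russian"),
]
NOTATIONS = ["LilyPond", "ABC"]
PLACEMENTS = ["block", "inline"]


def build_test_matrix(engines_by_pair):
    """Return one test combination per (pair, notation, placement, engine)."""
    matrix = []
    for pair, notation, placement in product(LANGUAGE_PAIRS, NOTATIONS, PLACEMENTS):
        # Only engines offered for this particular pair are included.
        for engine in engines_by_pair.get(pair, []):
            matrix.append(
                {
                    "source": pair[0],
                    "target": pair[1],
                    "notation": notation,
                    "placement": placement,
                    "engine": engine,
                }
            )
    return matrix


# Hypothetical engine availability: two placeholder engines for every pair.
example_engines = {pair: ["engine-a", "engine-b"] for pair in LANGUAGE_PAIRS}
matrix = build_test_matrix(example_engines)
print(len(matrix))
```

Each entry in `matrix` is one thing to try by hand; ticking them off as you go makes it easy to see which combinations have not been covered yet.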

Interestingly, I encountered almost no problems translating score elements! There was a single translation engine (of 5) that was unable to translate a score element from English to Chinese, but overall support was very robust. Not once did I encounter the issue described in the bug report – a page that would only translate as a single block.

That made me wonder if something other than <score> elements could be to blame.

There were other pages by the same author and in the same category (Tone Poems of Richard Strauss) as the reported page. I tried translating these pages and discovered that every last one of them had the same problem. Clearly they had something in common that was tripping up the parser. If it wasn’t the score elements, what was it?

I examined the wikitext of each of the pages, looking for common features. I found that each of the pages began with a particular template whose purpose was to italicize the page title, followed by a thumbnail image.

Wikipedia tracks which pages — among all the tens of thousands that are posted there — use a particular template, so I began testing pages that, like the Strauss Tone Poem pages, began with the Italic title template. I encountered no problems in translating those pages. Next I tracked down and began testing the subset of pages that began with an Italic title template immediately followed by a thumbnail image.

Voilà! Every last example that I found translated as a monolithic block rather than translating paragraph by paragraph! And as further evidence that score elements were not to blame, most of those pages did not include even a single score element.
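The narrowing-down step could also be approximated in code. The sketch below checks whether a page's wikitext opens with an Italic title template followed directly by a thumbnail image. The regular expression is my own assumption about what that combination looks like in wikitext (`{{Italic title}}` and `[[File:...|thumb|...]]`); it says nothing about how the translation tool's parser actually works, and I did the real comparison by reading the wikitext by eye.

```python
import re

# Assumed shape of the problem lead: an {{Italic title}} template at the very
# start of the page (optional parameters allowed), then only whitespace, then
# an image link that carries the "thumb" or "thumbnail" option.
LEAD_PATTERN = re.compile(
    r"\A\s*\{\{\s*italic[ _]title\s*(?:\|[^{}]*)?\}\}\s*"
    r"\[\[\s*(?:File|Image)\s*:[^\[\]]*?\|\s*(?:thumbnail|thumb)\s*(?:\||\]\])",
    re.IGNORECASE,
)


def starts_with_italic_title_then_thumbnail(wikitext):
    """Return True if the wikitext opens with the template-plus-thumbnail pattern."""
    return LEAD_PATTERN.match(wikitext) is not None


# Made-up wikitext samples for illustration only.
suspect = "{{Italic title}}\n[[File:Example.jpg|thumb|A caption]]\n'''Don Juan''' is a tone poem."
template_only = "{{Italic title}}\n'''Don Juan''' is a tone poem.\n[[File:Example.jpg|thumb|A caption]]"

print(starts_with_italic_title_then_thumbnail(suspect))
print(starts_with_italic_title_then_thumbnail(template_only))
```

A check like this would only be a way of shortlisting candidate pages; each one still has to be opened in the translation tool to confirm that it really translates as a single block.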

I can’t tell you why that particular combination of elements – Italic Title plus thumbnail image – causes a problem for the parser, but I can tell you that it DOES, and that brings us a step closer to a solution. (Remember, my single arrow does not have to bring down the entire buffalo; see my last blog post: Teamwork.)

It was a fantastic puzzle to test my newfound testing skills, and a very gratifying way to end the internship!

Teamwork

By now I’ve found and filed a number of bugs, and it’s been very gratifying (not to mention fun), but finding bugs is not the only way of contributing to a testing team, or even the most important. It’s equally important to help add to the group’s understanding of existing bugs. The more precisely we can diagnose a problem, the easier it will be for the development team to locate the part of the code that’s creating it. You can see this collaborative investigation going on for virtually every item on our work board; go to any task page and you will find, beneath the bug description, a comment section where others check in to report their additional test results and observations.

Last week I got a taste of how hard you can work without having a bug to show for it at the end! One of my mentors suggested I do a little investigative work on several possibly interrelated bugs, all having to do with the way math formulas are rendered (or NOT rendered, as the case may be). I went deep into the weeds on one particular bug, laboriously comparing the HTML output for different machine translations of a particular block of source material. Although it hasn’t led anywhere definite yet, it does add to our knowledge. We know that the output differs from what we would expect, and we know which parameter (the machine translation engine!) triggers the difference. Stay tuned!
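I did this comparison by hand, but a line-by-line diff would take some of the pain out of it. Here is a minimal sketch using Python's standard `difflib` module. The two HTML snippets and the engine labels are invented for illustration; they are not the actual output I was looking at.

```python
import difflib


def diff_html(html_a, html_b, label_a="engine-a", label_b="engine-b"):
    """Return the unified-diff lines between two HTML strings, compared line by line."""
    return list(
        difflib.unified_diff(
            html_a.splitlines(),
            html_b.splitlines(),
            fromfile=label_a,
            tofile=label_b,
            lineterm="",
        )
    )


# Invented output from two hypothetical translation engines.
output_a = '<p>Text</p>\n<span class="math">x^2</span>'
output_b = "<p>Text</p>\n<span>x^2</span>"

for line in diff_html(output_a, output_b):
    print(line)
```

Lines starting with `-` appear only in the first output and lines starting with `+` only in the second, so the places where the engines disagree stand out right away.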

Toward the middle of the week, my mentor guided me to a task that at one and the same time rescued me from the migraine-inducing, HTML-comparison work I’d been doing AND dovetailed with it perfectly. She showed me a math formula bug in which the original reporter (an actual user involved in translating math-related articles from English to Portuguese) had listed ten different pages on which he’d encountered a particular problem, and she asked me to go back to those pages and check the status of the bug. This is an important part of bug list maintenance! The MediaWiki code changes literally every week, and as time goes by, you need to periodically verify whether a particular bug is still present and whether it is behaving the same as originally reported. So again, this isn’t as “glamorous” as discovering a brand new bug, but it is important to the team effort.

“No member of a crew is praised for the rugged individuality of his rowing.”
– Ralph Waldo Emerson

Yep! That about sums it up.

My goals evolve…

These days I am dividing my time between two projects: testing for the CX2 translation tool and testing for VE, the visual editor. At the beginning of the internship I was working only on CX2, and one of my main goals was to create a set of test cases for it. It was a terrific project for a noob like me, because it helped me to learn the product – you have to understand how something works (and a fair amount about how it can FAIL to work) before you can turn around and write test cases for it! Also, the end result (120 test cases that together exercise every part of the UI) will be a very helpful tool for anyone else who wants to learn the product. It’s pretty much a guided tour.

But the more familiar I grew with the product, the more constraining the test cases felt. I mean, picture having to follow a set of detailed instructions for tying your shoe – something you automatically do without thinking, because you’ve done it so often before. It would make you a little crazy, right? I had a sneaking suspicion that the test cases I’d poured so much energy into were not going to be much use to the people who’d been testing the product for weeks, months, and years.

Then I started my new project – testing the visual editor. VE is such a sprawling product, with so many dialogs and widgets within it, that I wouldn’t be able to write a complete set of test cases for it by the end of the internship even if I wanted to. Luckily, that’s not on my “to do” list. (I think my mentors wanted me to learn to do test cases but not spend ALL my time on it – that wouldn’t leave much time for exploratory testing, which is the main focus of the internship.) My mentor for VE suggested that, at the very most, I create a “checklist” to guide my testing. I began one and immediately saw what a lightweight and flexible tool it could be – especially for a tester who already knows the product. The detailed, 22-step instructions for tying your shoe are replaced with a single, handy reminder: “Tie your shoe.” (Even old hands can use the occasional reminder!)

As soon as I finish my VE checklist, I’m going to go back and make a couple for CX, because it’s my new-and-improved goal to leave behind some tools that are truly useful – not just to new testers (as the test cases will be) but to those who have been around a while longer as well.

More about the kind of testing I’m doing

My job during this internship is to help the QA team expand its test coverage of various software features (to date, the ContentTranslation and Visual Editor tools). These two tools each have dedicated QA people who perform smoke tests and regression tests. A smoke test is a quick inspection of the major functionality of the software to ensure that it’s working as expected. For example, with the ContentTranslation tool, you’d check to see if you could enable it, navigate to the main page, select an article to translate, specify what language to use, navigate to a secondary page where the actual translation takes place, successfully translate a few paragraphs of text, and finally publish the translated article. There are more steps than that, but you get the idea. In regression testing, you go through the functionality in a more detailed way, making sure that nothing has been broken by any recent changes that have been made to the code. For example, instead of just translating a few paragraphs of text, you test other page elements such as tables, templates, captioned images, different types of references, and specially-formatted items like math functions – and you check them using three different translation engines!

Many (maybe most?) companies use test cases – actual scripted instructions – to guide the testing. At Wikimedia, it’s a little different! The type of testing we’re doing (at least in the two groups I’ve worked with) is called “exploratory testing.” I told you a little bit about that in my second blog post, but to recap, exploratory testing is not as linear as traditional testing, which goes from test design to test execution and then to evaluation. In exploratory testing, rather than following a script, the tester is free to investigate and experiment with the product, and follow her own instincts about what to look into at any given time. This is a very practical approach, especially when you consider that most testers have extensive experience with their product and know exactly how to evaluate it without having to refer to instructions. Not only that, but they are perfectly aware of the priorities and risks attached to each individual release – so they know where to focus their energies to make the best use of limited time. That’s something that a set of test cases can’t do!

But don’t get me wrong, it’s not that test cases are tossed aside.*  The exploratory tester is free to make use of any tool. It’s just that the tester’s skills and experience – and not some spreadsheet – drive the process.

* In fact, my very first project was to create a set of test cases for ContentTranslation, and it was that exercise that taught me all the ins and outs and bells and whistles of the feature. Invaluable! But now that I can find my way around, I use the test cases mainly as a reference and as a launching pad for further discovery.

On Diving In

There’s a story my dad used to tell about when he was a kid in the first grade learning to read. The teacher’s method of instruction was to call on the children one-by-one to stand up in front of everybody and read a few sentences from a story book, and then to correct them and help them along if they ran into trouble – which of course they always did, it being a first grade reading class! Well, on this day she called on my father first. He slowly got to his feet, red with embarrassment, and respectfully declined her request. “I’m sorry, teacher.  You’ll have to ask someone else.  I can’t read.”

I just LOVE that story, because it shows the deep roots (already present when we’re only 6 years old!) of our reluctance to publicly attempt something that we’re not completely confident about – in other words, our fear of making mistakes in front of others. There he was, in a setting DESIGNED to help him learn, and he felt he couldn’t participate because he hadn’t already mastered the subject!

Not to unfairly single out my dad, let me tell you MY first grade story. My friend Patty walked up to me during recess with an untied shoe and asked me to tie it for her. I didn’t know how to tie shoes – but no way was I going to tell HER that (or the rest of the kids on the playground). Knowing, in general, what a tied shoelace was supposed to look like, I set to work and engineered something truly monstrous, although it did have two floppy loops that looked vaguely bow-like.  I shudder to think what her mother went through that night when she tried to undo the mass of knots and free her child from the shoe. She probably had to cut the lace out entirely.

Oh, the things we do in our youth to avoid seeming less than perfect in public! Fortunately, we outgrow that, right?

HA!

My hair is now mostly gray, but last week I found myself stuck on a problem that I increasingly suspected was somehow my own doing, and you would think I’d know better by now, but I found myself delaying and finding hinky work-arounds rather than pestering my mentor. (“Pestering” was the way I put it to myself, but the fact is that I was discouraged that I couldn’t figure out the problem myself, and also I didn’t want to look dumb.) The problem involved a test environment that I for some reason was unable to log into. Fortunately for me and my sudden reversion back to my first-grade mind set, my mentor happened to mention that the environment had been down for a while, and so we began to talk about it. And when the test environment came back and I STILL couldn’t log in, she collaborated with me to diagnose the problem, showing by example different steps that I could take to determine whether the problem was with the server, my machine, a specific browser on my machine, the number of browser tabs open at the time of the login attempt, or even the state of the cache on a particular browser. (It turned out that clearing the cache on Chrome and rebooting was enough to enable me to log in to the testing server again.)

It was very instructive watching my mentor tick through various diagnostic tests on her end — and now I have all that in my toolbox if some variation of the problem arises in the future — but do you want to know what was actually the most inspiring part? At some point in the thick of it all, she made the remark “This is all very interesting.”

Wait. What?  Interesting?  WOW!  In one off-hand remark, she completely reframed the notion of having a problem! It’s not the end of the world, it’s not failure, it’s not Evidence of My Ineptitude, it’s not obstruction (well, yeah, it’s temporarily obstruction, but not ONLY that) – it’s interesting! A puzzle to be solved. Looking at it that way not only changes how I feel about being stuck, it makes me much more willing to ask others for input on my problem, er, interesting puzzle.

Okay, one last story. It takes place on a movie set, but like open source projects, movie-making is an intensely collaborative effort.

I read about this 20 years ago and it’s stayed with me ever since. There was a film-making company that was losing a lot of money during production because mistakes would happen on the set and go undetected until the director viewed the dailies (the raw footage from that day’s shooting). They’d have to go back and reshoot the scene at enormous expense. The director looked into why this was happening, and he found that people were so worried about messing up that, if they ran into a problem or made a mistake, they would not MENTION it to anybody and would instead just hope that it would go unnoticed. So the director instituted something new – he told the crew that prizes would be awarded on the spot to anybody on the set who ran into a problem and PROCLAIMED it right away. Boom! Everything changed. People shouted out when they were stuck, and others would rush over to help problem-solve. The whole operation became much more efficient and joyful, and bad dailies became a thing of the past.

I hope any or all of these stories help loosen the hold of whatever inside you thinks you need to be perfect before you can dive in. (And I hope you remind ME to reread them the next time I’m hemming and hawing and futzing around with hinky work-arounds rather than taking it to the team, lol!)

My First Project

Last week was the first full week of my internship, and the project I was given was nearly ideal for the purpose. I was given an initial set of test cases for one of the products I will be testing (called Content Translation) and asked to review, edit, and expand it, while at the same time using it to actually test the product as it passed through various test environments on its way to production.

The reason that this project was (and continues to be) so ideal is that it teaches many important things simultaneously. First, it is teaching me how to write test cases – how to organize them on a spreadsheet, group them logically, think out the prerequisites, and plan the flow of the tests in the most efficient way possible. Secondly, it is teaching me the product itself – after all, you can’t test something if you don’t know how it works! You have to dive in and use it, trying every bell and whistle until you come to understand everything the product is supposed to do and all the ways it can behave. Finally, the project is helping me to understand the process of testing. This particular product has a set weekly cycle. Changes first happen in the developers’ local environments, and on a certain day those changes are merged into a non-production master, and the whole thing is tested. The next day, the release is moved to another test site that is a perfect replica of the largest of the Wikimedia-based wikis (Wikipedia, of course), and there it is fully tested again. The day after that, the release is deployed to a limited number of (smaller) production wikis – and it’s tested AGAIN. Finally, if everything is good to go, the next day the release is deployed to ALL production sites, including Wikipedia.

In the week ahead, I will continue working on Content Translation while taking on a new product as well (not sure which one yet).

Onward!

Context-Driven Testing (compared to other schools)

I’ve been doing some reading this week in preparation for the internship kick-off on January 4. Elena (one of my mentors) suggested some articles, including Four Schools of Software Testing by Bret Pettichord, which compares and contrasts these approaches:

  • Analytical—Focuses on the internal structure of the software and aims for comprehensive code coverage. Requires a detailed specification. Tests are technical (usually requiring programming skills) and have a binary pass/fail result.
  • Factory—Testing for which there’s a predictable template that can be applied to different projects. In other words, something that is standardized and can be easily managed. Additional goal of cost-effectiveness. Focuses on requirements testing.
  • Quality Assurance—Focuses on enforcing and improving development processes. Notion of having to “protect users from bad software.”
  • Context-Driven—Takes into account the context of a software project (stakeholders, resource & schedule constraints, etc.) when determining the right testing strategy for this particular project at this particular time. In other words: flexible and pragmatic. Focuses on “exploratory testing,” in which each tester is highly engaged in understanding the stakeholder interests and the specs (both explicit and implicit) and aims to design tests that will advance the whole team’s understanding of the software.

Pettichord says right up front that his purpose in comparing these schools is to highlight how his own school (Context-Driven) differs from the others, so there’s extra attention devoted to that approach in the final third of the presentation. It interested me enough to follow some links and do some extra reading. Here’s what I learned:

Context-Driven or Exploratory testing starts with the acknowledgment that it is impossible to test everything about the software. (If you don’t believe that, check out You Are Not Done Yet, a testing checklist by Michael Hunter. I just skimmed it but was still overwhelmed!)

Given the almost limitless potential size of the task, the most effective and efficient way to proceed is to focus on, as Cem Kaner (a founder of the school) puts it, “risks or issues that are of critical interest TODAY.”

That’s not all there is to it, of course. Another tenet of Context-Driven testing concerns gradually expanding test coverage to more and more avenues of exploration rather than simply repeating the same tests over and over again.

But the emphasis on immediate risks and issues along with the team focus and the idea of having ALL testers fully engaged in the problem-solving made me think of this scene from Apollo 13: