May 24, 2012
The Importance of Development Documentation
Overview
Lately I've found myself harping on the importance of documenting code, program execution, and SCM items (i.e. JIRA issues and Perforce changelists). Documentation can be a controversial topic, particularly when mixing people from opposite camps on the subject. It has even been referred to as a philosophical difference.
Typical arguments against producing documentation for internal consumption tend to fall into the following two categories:
- The documentation is superfluous with respect to the source code.
- The resources spent producing documentation are better spent elsewhere.
While I could continue to espouse the benefits of good documentation, in many ways the discussion reduces to a he-said/she-said disagreement. So instead of proselytizing, I will provide scientific evidence in support of documentation. It is not a difference of philosophy.
The majority of evidence presented here applies to software developers, but analogous benefits apply to anyone involved in the development process, including QA, technical writers, and anyone else who may need to synthesize information about the product. Only evidence indicated as statistically significant is included.
This essay will not cover the benefits of clean code although those benefits may be discussed in the referenced papers. For more on clean code, see Clean Code: A Handbook of Agile Software Craftsmanship [Google Books] or Writing clean code [IBM developerWorks].
The Importance of Comprehension
To identify the value associated with a variable or attribute in a scientific experiment, a metric must be defined. For documentation, that metric is comprehension and the resulting benefits of improved comprehension.
Debugging Efficiency
Leo Gugerty and Gary M. Olson. 1986. Comprehension Differences in Debugging by Skilled and Novice Programmers. In Papers presented at the first workshop on empirical studies of programmers on Empirical studies of programmers, Elliot Soloway and Sitharama Iyengar (Eds.). Ablex Publishing Corp., Norwood, NJ, USA, 13-27.
Gugerty and Olson conducted an experiment to determine differences in debugging skill between novice and expert programmers. Experts were able to identify and fix the bugs in roughly half the time or less (18.2m/17.3m for novices, 7.0m/9.3m for experts), with fewer attempts (4.5/2.2 for novices, 1.9/1.1 for experts), and with a lower probability of introducing new bugs (23%/30% for novices, 17%/0% for experts). The results indicated that this was in large part because experts generated high-quality hypotheses with less study of the code, owing to their superior ability to comprehend the program.
Murthi Nanja and Curtis R. Cook. 1987. An analysis of the on-line debugging process. In Empirical studies of programmers: second workshop, Gary M. Olson, Sylvia Sheppard, and Elliot Soloway (Eds.). Ablex Publishing Corp., Norwood, NJ, USA 172-184.
Nanja and Cook studied differences in the debugging process of expert, intermediate, and novice programmers and measured their performance when debugging. Their results support the conclusions of Gugerty and Olson's study: experts relied on superior program comprehension to fix bugs faster (19.8m for experts, 36.55m/56.0m for intermediates and novices) with fewer code changes (8.83 LOC for experts, 10.33/23.16 LOC for intermediates and novices) and without introducing as many new bugs (1 for experts, 2.33/4.83 for intermediates and novices).
Robert W. Holt, Deborah A. Boehm-Davis, and Alan C. Shultz. 1987. Mental representations of programs for student and professional programmers. In Empirical studies of programmers: second workshop, Gary M. Olson, Sylvia Sheppard, and Elliot Soloway (Eds.). Ablex Publishing Corp., Norwood, NJ, USA 33-46.
Holt et al. examined the relationship between a programmer's perceived difficulty and complexity of code and that programmer's debugging performance. They found a small but significant correlation between debugging time/attempts and the difficulty in finding information (0.235/0.184/0.237) and the difficulty in recognizing program units (0.291/0.177/0.205). A somewhat less significant correlation was found between difficulty in working with the code and time to debug (0.210) and between program formatting being too condensed and number of debugging transactions (0.197).
Poor comprehension increased the time to fix bugs and correlated with the introduction of new bugs or incorrect fixes.
Systematic Understanding
David C. Littman, Jeannine Pinto, Stanley Letovsky, and Elliot Soloway. 1987. Mental models and software maintenance. Journal of Systems and Software. 7, 4 (December 1987), 341-355. DOI=10.1016/0164-1212(87)90033-1 http://dx.doi.org/10.1016/0164-1212(87)90033-1
Littman et al. analyzed the development process of experienced programmers tasked with modifying a program and identified two categories for understanding programs.
- Systematic developers trace data and control flow to understand global program behavior. The programmer detects causal interactions between program components and designs a modification taking these interactions into account.
- As-needed developers limit the scope of their understanding to the code that must be modified to implement the change. Data and control flow and interactions that may be affected due to the modification are unlikely to be found.
In their experiment all five developers who used the systematic strategy successfully modified the program while all five developers who used the as-needed strategy failed to modify the program correctly.
Failure to understand global program behavior and interactions between components resulted in incorrect implementation every time.
Code Reuse
Hoadley, C.M., Mann, L.M., Linn, M.C., & Clancy, M.J. (1996). When, Why and How do Novice Programmers Reuse Code? In W. Gray & D. Boehm-Davis (Eds.), Empirical Studies of Programmers, Sixth Workshop (pp. 109-130). Norwood, NJ: Ablex.
Among developers who are pre-disposed towards code reuse, comprehension influenced both the frequency of and form of reuse. Two mechanisms of reuse were examined:
- Direct is reuse of a function by calling it from new code.
- Cloning is copying code out of an existing function into new code.
An abstract understanding of functions resulted in 65% reuse (both direct and cloned) while an understanding that was only algorithmic resulted in 12% reuse. Misunderstood functions had low direct reuse of 5% but were reused by cloning 40% of the time.
Code that is not well understood is less likely to be reused. Code that is misunderstood is likely to result in incorrect code.
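To make the two reuse mechanisms concrete, here is a minimal Python sketch. The function names (mean_score, report_direct, report_cloned) are hypothetical and are not taken from the study; the sketch only illustrates the difference between calling a function and copying its body without fully understanding it.

```python
def mean_score(scores):
    """Return the arithmetic mean of scores, or 0.0 for an empty list."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def report_direct(scores):
    # Direct reuse: call the existing function. Its empty-list guard,
    # and any later fix to mean_score, applies here automatically.
    return f"mean={mean_score(scores):.1f}"


def report_cloned(scores):
    # Cloning: the arithmetic was copied out of mean_score, but the
    # empty-list guard was left behind, so an empty list raises
    # ZeroDivisionError here.
    return f"mean={sum(scores) / len(scores):.1f}"
```

Both functions agree on ordinary input, which is why a clone of misunderstood code can look correct until an edge case exposes the difference.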
Improving Comprehension
Beacons
Beacons are key features in code that indicate the presence of a structure or operation and strengthen the reader's hypothesis of functional behavior. They serve as shortcuts towards comprehension; failing to recognize a beacon requires a developer to spend additional time on comprehension.
Susan Wiedenbeck. 1986. Processes in Computer Program Comprehension. In Papers presented at the first workshop on empirical studies of programmers on Empirical studies of programmers, Elliot Soloway and Sitharama Iyengar (Eds.). Ablex Publishing Corp., Norwood, NJ, USA, 48-57.
Wiedenbeck's experiments found that experienced programmers were able to recall 77.75% of the beacons versus 47.50% of the non-beacons in the code while novices only recalled 13.83% of the beacons and 30.42% of the non-beacons.
Martha E. Crosby, Jean Scholtz, and Susan Wiedenbeck. 2002. The Roles Beacons Play in Comprehension for Novice and Expert Programmers. In Programmers, 14th Workshop of the Psychology of Programming Interest Group, Brunel University. 18-21.
Comment beacons indicative of functionality are quickly processed by experienced programmers. Pure code beacons (i.e. important lines of code) require more time to process and might benefit from comprehension aids.
Edward M. Gellenbeck and Curtis R. Cook. 1991. An Investigation of Procedure and Variable Names as Beacons During Program Comprehension. Technical Report. Oregon State University, Corvallis, OR, USA.
Gellenbeck and Cook found that meaningful procedure and variable names resulted in higher rates (52% and 74%) of correct behavior identification compared to combinations with neutral procedure and variable names. However, this still shows a large percentage of incorrect identification (48% and 26%) for undocumented source code.
Add documentation beacons (comments, mnemonic hints, or whitespace and formatting) to highlight important operations and logical concepts, both to speed up comprehension and to ensure the code is understood correctly.
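As an illustration of what such beacons might look like, here is a short Python sketch. It is my own example rather than code from any of the cited studies: a meaningful function name, a one-line summary, and a comment marking the key operation (the swap) all act as beacons for a reader skimming the code.

```python
def sort_ascending(values):
    """Return a new list with values sorted in ascending order (bubble sort)."""
    result = list(values)  # work on a copy so the caller's list is untouched

    # Each pass moves the largest remaining element to position unsorted_end.
    for unsorted_end in range(len(result) - 1, 0, -1):
        for i in range(unsorted_end):
            if result[i] > result[i + 1]:
                # BEACON: swap adjacent out-of-order elements.
                result[i], result[i + 1] = result[i + 1], result[i]

    return result
```

Compare this with the same logic written as a function named f with variables a, b, and c and no comments: the behavior is identical, but the reader has to reconstruct the intent from the loop structure alone.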
Plausible Slot Filling
Stanley Letovsky. 1986. Cognitive Processes in Program Comprehension. In Papers presented at the first workshop on empirical studies of programmers on Empirical studies of programmers, Elliot Soloway and Sitharama Iyengar (Eds.). Ablex Publishing Corp., Norwood, NJ, USA, 58-79.
Plausible slot filling is an attempt to explain an unknown based on existing incomplete knowledge. It is a result of abductive inference (http://en.wikipedia.org/wiki/Abductive_inference), i.e. guessing, where one tries to explain something through reversed logical deduction. In other words:
if "Q" and "P implies Q" then "maybe P"
The inferred explanation may be incorrect. In Letovsky's experiment a developer incorrectly guessed that a memory allocation within a database function was for a database record. In another example the developer did not immediately understand why only six elements were displayed when the record array contained seven elements.
Document background information and the purpose of code to prevent incorrect conclusions, even when the issue appears isolated or minor.
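A small Python sketch of how a purpose comment can head off this kind of guess. The scenario is invented for illustration: the source does not say why the program in Letovsky's experiment displayed six of seven elements, so the "scratch slot" rationale and all names below are assumptions.

```python
RECORD_SLOTS = 7


def new_record_table():
    # PURPOSE (hypothetical rationale): the table has 7 slots, but only
    # the first 6 hold user records. The final slot is a scratch buffer
    # used while a record is being edited, so it is never displayed.
    return [None] * RECORD_SLOTS


def displayable_records(table):
    """Return the user-visible records, excluding the trailing scratch slot."""
    return table[:RECORD_SLOTS - 1]
```

Without the comment, a reader who sees seven slots and six displayed rows has to guess between an off-by-one bug and a deliberate design, and either guess may be wrong.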
Program-Dependent Items
Mark Thomas and Stuart Zweben. 1986. The Effects of Program-Dependent and Program-Independent Deletions on Software Cloze Tests. In Papers presented at the first workshop on empirical studies of programmers on Empirical studies of programmers, Elliot Soloway and Sitharama Iyengar (Eds.). Ablex Publishing Corp., Norwood, NJ, USA, 138-152.
A cloze test is a comprehension and vocabulary test where words are removed from a larger body of text. Removed items fall into one of two categories:
- Program-independent items can be resolved correctly without understanding the functionality (e.g. by process of elimination or to meet compilation requirements).
- Program-dependent items require functional understanding for correct resolution.
In the tests conducted by Thomas and Zweben, cloze test error rates for program-dependent items were 41.14%/32.11%, compared with only 12.41%/5.75% for program-independent items. Stated differently, participants had a much harder time deciphering the correct meaning of the code when lacking program-dependent information.
Document considerations (global, external, state) to reduce the chance of incorrect conclusions due to missing context.
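Here is a minimal Python sketch of a program-dependent item and the kind of comment that supplies its missing context. The limit of 500 and the downstream service are hypothetical; they stand in for any external fact that cannot be recovered from the code itself.

```python
# EXTERNAL CONSTRAINT (hypothetical): the downstream service rejects any
# request containing more than 500 items, so batches must stay at or
# below this size. Any integer would run here; only this context says
# which one is correct.
MAX_ITEMS_PER_REQUEST = 500


def split_into_requests(items):
    """Split items into consecutive batches of at most MAX_ITEMS_PER_REQUEST.

    The input list is not modified; order is preserved across batches.
    """
    return [
        items[start:start + MAX_ITEMS_PER_REQUEST]
        for start in range(0, len(items), MAX_ITEMS_PER_REQUEST)
    ]
```

If the constant were blanked out as in a cloze test, a reader could fill in a value that runs, but could not know the right value without the documented external constraint.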
Abstract Comprehension
Hoadley, C.M., Mann, L.M., Linn, M.C., & Clancy, M.J. (1996). When, Why and How do Novice Programmers Reuse Code? In W. Gray & D. Boehm-Davis (Eds.), Empirical Studies of Programmers, Sixth Workshop (pp. 109-130). Norwood, NJ: Ablex.
Experiments found that students having difficulty summarizing code were less likely to reuse code. Additionally, abstract comprehension resulted in 65% function reuse versus 12% function reuse with only algorithmic comprehension. Code that was not understood either abstractly or algorithmically was cloned 40% of the time, which likely resulted in incorrect code.
Documentation should be written towards both abstract and algorithmic comprehension to increase code reuse and prevent incorrect code cloning.
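One way to write towards both levels is to separate them explicitly in the function's documentation. The Python sketch below is my own example, not something prescribed by the study: the first part of the docstring says what the function is for (abstract), and the second says how it works (algorithmic).

```python
def median(values):
    """Return the middle value of a non-empty sequence of numbers.

    Abstract: gives a "typical" value that is less sensitive to outliers
    than the mean. Call this instead of re-implementing the logic.

    Algorithm: sort a copy of the input, then take the middle element,
    or the mean of the two middle elements when the count is even.
    The input is not modified. Raises ValueError on empty input.
    """
    if not values:
        raise ValueError("median() requires at least one value")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

A reader who only needs to call the function can stop after the abstract summary, while a reader who must modify it has the algorithmic notes to check against the code.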
Encouraging Documentation
While the benefits and mechanisms of improved development documentation may be clear, it is also important to take action that will result in the production of this documentation.
Herb Krasner, Bill Curtis, and Neil Iscoe. 1987. Communication breakdowns and boundary spanning activities on large programming projects. In Empirical studies of programmers: second workshop, Gary M. Olson, Sylvia Sheppard, and Elliot Soloway (Eds.). Ablex Publishing Corp., Norwood, NJ, USA 47-64.
Krasner et al. conducted an informal analysis of the communication issues affecting large programming projects and identified areas in which the culture and environment discouraged effective communication. These areas include communication skills, incentive systems, representational formats, rapid change, jargon, information overload, scheduling pressure, and peer/management expectations.
Encouraging the production of documentation and effective communication must be accomplished through a combination of peer pressure and management behavior.
- Hire or foster developers with high communication and technical competence who exhibit an attitude of egoless programming.
- Reward documentation, communication, and long-term goals instead of short-term performance.
- Use similar/standard documentation formats and minimize the use of jargon.
Posted by josuah at 12:55 AM UTC+00:00
May 22, 2012
Transmit SFTP Failure
I ran into a strange problem today where my attempts to SFTP to my server were failing but I could SSH in just fine. My login credentials were correct, and my server logs weren't indicating a failure. They seemed to indicate a problem with the client.
May 21 10:56:55 binibik systemd-logind[979]: New session 21304 of user wesley.
May 21 10:56:55 binibik sshd[7714]: subsystem request for sftp by user wesley
May 21 10:56:55 binibik sshd[7714]: Received disconnect from 69.53.237.65:11: disconnected by user
May 21 10:56:55 binibik systemd-logind[979]: Removed session 21304.
I am using the wonderful Transmit FTP client and version 3 of the client displayed an error dialog stating 'permission denied' while version 4 of the client displayed an error dialog stating the username or password was incorrect.
So both the server logs and the client error messages were incorrect and therefore misleading. I turned on Transmit verbose logging, which showed authentication succeeded. I think the log messages might have hinted that something was wrong, but there wasn't a clear message saying so.
Turns out the problem was my sshd_config configuration. The sftp subsystem configuration line was pointing at an old file location that no longer existed. I fixed that so it pointed at the correct location and everything works now.
Subsystem sftp /usr/lib/ssh/sftp-server
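Since neither the server log nor the client pointed at the real cause, a quick sanity check of the configuration would have saved me some time. Below is a rough Python sketch of such a check; the function names are my own, the parsing is deliberately simplified (it only skips lines that start with '#'), and the usual config location of /etc/ssh/sshd_config is an assumption that may not match your system.

```python
import os


def find_sftp_subsystem(config_text):
    """Return the command configured for the sftp subsystem, or None."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        # sshd_config keywords are case-insensitive.
        if len(fields) >= 3 and fields[0].lower() == "subsystem" and fields[1] == "sftp":
            return fields[2]
    return None


def sftp_subsystem_ok(config_text, path_exists=os.path.exists):
    """Return True if the sftp subsystem points at something usable."""
    command = find_sftp_subsystem(config_text)
    if command is None:
        return False
    # "internal-sftp" is built into sshd and is not a file on disk.
    return command == "internal-sftp" or path_exists(command)
```

To use it, read the config file into a string and pass it to sftp_subsystem_ok; a result of False means the subsystem line is missing or points at a file that does not exist, which was exactly my problem.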
Posted by josuah at 4:23 AM UTC+00:00
May 20, 2012
Married to Christina
Christina and I had our marriage celebration today with a small group of friends and family. Everything went very well and Christina looked beautiful in her wedding gown and Chinese dress. We held the ceremony beside a lake (water hazard) at Summitpointe Golf Club and then had the reception afterwards in the main hall. It wasn't the fanciest venue but the outdoor ceremony was very nice and, most importantly, everyone enjoyed it.
My friend Anthony did a great job as our ceremony officiant, reading a script prepared by one of my other friends Matt who unfortunately couldn't make it because he got stuck in Indonesia on business. We had a sand ceremony of red and blue sand that we poured into a heart-shaped vase Christina found in China. Christine, one of Christina's friends, was her maid of honor and Jasmine was her bridesmaid. Calvin was my best man and Dennis was my groomsman. Naomi wore a pretty little white dress and was our flower girl.
Christina was particularly happy with the flowers that we had that day. That was probably the best decoration of the entire celebration. We took all the flowers home and they're all over our bedroom right now. Some of the other stuff didn't go as well. We didn't find out that the room wasn't going to be decorated until the day before and had to make rush arrangements to get that done, and the changing room was very small and was primarily the event coordinator's office.
At the reception, Dennis was the DJ and he did a really good job at it. We played 'Eyes on Me' by Faye Wong as the song for our first dance. During our dinner at the sweetheart table we had a special guest because Caitlin came to eat with us. Calvin and my mom said a few words for the toast. Christina didn't like the food that much but other people said it was good. I played a lot with Naomi and Caitlin, which was lots of fun, and was happy to see Shannon and Mei-Ling again for the first time in over a year.
We all had a good time but we're glad it's all over so we can relax again.
Posted by josuah at 6:05 AM UTC+00:00