The following article was written by Jon Byous for Sun Microsystems, Inc. and appeared on the JavaSoft home page. Reprinted with permission.

Genetic Code: Java™ Technology and the Human Genome

Sidebar: Tools for Deciphering the Genetic Puzzle

by Jon Byous

You're sitting in a developer's lab in Berkeley, California, surrounded by Ph.D.s in biology, physics, mathematics, and computer science, staring at an on-screen representation of a five-foot section of human genetic material.

In less than two seconds, in one continuous zoom, you're taken from a top-level screen-sized graphic of genetic material down to an individual gene sequence, revealing new levels of zoom-sensitive detail all along the way. Something that was represented as an eight-inch graphic across the screen has now stretched to miles in length to display the basic component building blocks of genetic material -- Adenine, Cytosine, Thymine, and Guanine (ACTG) -- in scrollable detail, along with associated data.

The experience of this zoom is similar to that of starting with a space photo of the planet Earth and continuously zooming in to stop finally at the faces of four kids on a school playground. You then scroll across the playground to count how many boys and girls are on the jungle gym, while seeing their heights, weights, and ages displayed in a data window.

An API Packed with Power

The Genome Software Developer's Kit (SDK), Neomorphic's primary visualization platform, is a class library that can be customized for Neomorphic's clients to display gene sequence data and to capture and annotate collaborative data for their specific bioinformatics requirements.

The application's scrolling and semantic zooming capabilities work well on very large graphics, such as five-foot gene segments, partially because only the visible portions of the graphics are rendered. The developers at Neomorphic, mostly scientists, built it themselves from scratch.
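The two ideas in that paragraph, drawing only what is visible and changing the level of detail with the zoom, can be sketched in a few lines of Java. This is an illustration only: the widget source is not public, so the class name, the Feature type, and the pixel thresholds below are all assumptions rather than anything taken from the Genome SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Genome SDK. */
final class VisibleRangeSketch {

    /** An annotated stretch of sequence covering bases [start, end). */
    static final class Feature {
        final String name;
        final long start;
        final long end;

        Feature(String name, long start, long end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns only the features that overlap the visible window [viewStart, viewEnd). */
    static List<Feature> visible(List<Feature> all, long viewStart, long viewEnd) {
        List<Feature> out = new ArrayList<>();
        for (Feature f : all) {
            if (f.start < viewEnd && f.end > viewStart) {
                out.add(f);
            }
        }
        return out;
    }

    /** Semantic zoom: choose what to draw from the number of screen pixels per base. */
    static String detailLevel(double pixelsPerBase) {
        // Thresholds are arbitrary assumptions chosen for the example.
        if (pixelsPerBase >= 8.0) {
            return "BASES";     // room to draw individual A, C, T, G letters
        }
        if (pixelsPerBase >= 0.01) {
            return "FEATURES";  // draw genes and annotations as boxes
        }
        return "OVERVIEW";      // draw a coarse summary only
    }

    public static void main(String[] args) {
        List<Feature> all = new ArrayList<>();
        all.add(new Feature("geneA", 100, 500));
        all.add(new Feature("geneB", 10_000, 12_000));
        all.add(new Feature("geneC", 900_000, 905_000));
        // Only features overlapping the window are handed to the drawing code.
        for (Feature f : visible(all, 9_000, 20_000)) {
            System.out.println(f.name + " drawn at level " + detailLevel(0.05));
        }
    }
}
```

A linear scan keeps the sketch short; a production viewer working on very large sequences would more likely keep features in a sorted or interval-indexed structure so that the visible subset can be found without touching every feature.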

Gregg Helt, CTO

"It's a fairly extensive API," says Gregg Helt, Neomorphic's chief technology officer, who holds a Ph.D. in biology. "It gives you options such as zooming in on a selected area on the screen instead of just the center of the screen, and it can resize with or without changing the zoom level, which would reveal more data about the structure. The scaling can be linear or exponential or some other transform."
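Two of the options Helt mentions, zooming about a chosen point and choosing a linear or exponential scale, come down to a little arithmetic. The sketch below is one plausible way to do it, not Neomorphic's code: the class, its fields, and the idea of a slider position t between 0 and 1 are assumptions made for the example.

```java
/** Hypothetical sketch of zooming about a focus point; not the Genome SDK API. */
final class FocusZoomSketch {
    double offset;         // base coordinate drawn at pixel 0
    double pixelsPerBase;  // current zoom level

    FocusZoomSketch(double offset, double pixelsPerBase) {
        this.offset = offset;
        this.pixelsPerBase = pixelsPerBase;
    }

    /** Base coordinate currently drawn at the given pixel. */
    double baseAt(double pixel) {
        return offset + pixel / pixelsPerBase;
    }

    /** Changes the zoom level while keeping the base under focusPixel where it is. */
    void zoomAbout(double focusPixel, double newPixelsPerBase) {
        double focusBase = baseAt(focusPixel);
        pixelsPerBase = newPixelsPerBase;
        offset = focusBase - focusPixel / newPixelsPerBase;
    }

    /** Linear transform: equal slider steps add equal amounts of scale. */
    static double linearScale(double t, double min, double max) {
        return min + t * (max - min);
    }

    /** Exponential transform: equal slider steps multiply the scale by equal factors. */
    static double exponentialScale(double t, double min, double max) {
        return min * Math.pow(max / min, t);
    }

    public static void main(String[] args) {
        FocusZoomSketch view = new FocusZoomSketch(1000.0, 0.5);
        double before = view.baseAt(200.0);
        view.zoomAbout(200.0, exponentialScale(0.75, 0.01, 100.0));
        System.out.println("base under pixel 200 before: " + before
                + ", after: " + view.baseAt(200.0));
    }
}
```

The exponential mapping is a natural fit when the range of scales is very wide, as it is when going from a whole genome segment down to single bases, because each slider step then feels like the same amount of zoom.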

Neomorphic's three major applications share five Java™ technology widgets developed in house to deliver industry-first graphic visualization capabilities: NeoMap, NeoAssembler, NeoQualler, NeoSeq, and NeoTracer.

Eric Blossom, Senior Software Designer

How the widgets work is a company secret. "We've included the source code for many demonstration applications with the SDK tools so customers can get an idea of how they can really exercise the API, but the source code for the underlying widgets isn't included," says Eric Blossom, a senior software designer with an M.A. in mathematics.

Cyrus Harmon, CEO

"This is more than a browsing and visualization tool," says Cyrus Harmon, chief executive officer and president. "It's also a visual query interface. If I am connected to one of the genetic databases and I select a region of a displayed genome, I can get features of the sequence and query for other similar sequences, or genes that are also involved in the same biological process, for example. That's very powerful."

[Figure: Neomorphic's Genome Browser, a glimpse of Neomorphic's visual query interface]

Neomorphic's Java Technology Toolkit

"Because of user browser limitations, and several clients, we have stayed with JDK™ 1.0," says Harmon. "However, several clients have moved to 1.1 and are ready for an upgrade in capabilities."

Instead of a full upgrade, which would have caused problems for their 1.0 clients, they developed their own features. Helt explains, "One of the things we did was to emulate the 1.1 event model because we really liked the delegation capabilities instead of the percolation of 1.0."

Blossom adds: "We're looking forward to branching to 1.1, at which time we will be able to get rid of some of our code, such as the event model emulation. At this point, we're ready to lose some code."
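The delegation style Helt prefers can be emulated without the JDK 1.1 event classes: a component keeps its own list of registered listeners and notifies them directly, instead of letting an event percolate up through its containers. The sketch below shows the pattern only. Every name in it is hypothetical, and it is written in present-day Java (generics, lambdas) for brevity, so it is not what Neomorphic's JDK 1.0 code would have looked like.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a hand-rolled delegation event model. */
final class DelegationSketch {

    /** Event describing a selected range of bases. */
    static final class RangeEvent {
        final Object source;
        final long start;
        final long end;

        RangeEvent(Object source, long start, long end) {
            this.source = source;
            this.start = start;
            this.end = end;
        }
    }

    /** Anything that wants to hear about selections implements this. */
    interface RangeListener {
        void rangeSelected(RangeEvent e);
    }

    /** Event source: delivers events only to listeners that registered with it. */
    static final class SequenceView {
        private final List<RangeListener> listeners = new ArrayList<>();

        void addRangeListener(RangeListener l) {
            listeners.add(l);
        }

        void removeRangeListener(RangeListener l) {
            listeners.remove(l);
        }

        void select(long start, long end) {
            RangeEvent e = new RangeEvent(this, start, end);
            // Iterate over a copy so a listener may unregister during dispatch.
            for (RangeListener l : new ArrayList<>(listeners)) {
                l.rangeSelected(e);
            }
        }
    }

    public static void main(String[] args) {
        SequenceView view = new SequenceView();
        view.addRangeListener(e ->
                System.out.println("selected bases " + e.start + " to " + e.end));
        view.select(1200, 1260);
    }
}
```

Once a team moves to JDK 1.1, scaffolding of this kind can be dropped in favor of the built-in listener interfaces, which is the code Blossom says they are ready to lose.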

[Figure: Sequence Browser with selected ACTG sequence]

The three main Neomorphic applications are built with a combination of three programming languages. The gene-finding calculations in each application are processed in C, and the Labtrack application is written mostly in Perl -- "although we're in the process of moving that to Java servlets," says Blossom.

[Figure: Labtrack application displaying a detail of a sequence read]

All of the visualization work is written in pure Java technology. For future development, they're considering WebLogic's Tengah for JavaBeans™. "This may be the case where somebody has already developed what we want and we won't have to do it ourselves," says Blossom.

For demonstrations of some of Neomorphic's application capabilities, see http://www.neomorphic.com/products/overview.html.

Taking the Plunge

In the early days of Neomorphic, the choice of Java technology as the development language wasn't obvious or unanimous. "During our earliest development efforts, part of the team wanted to experiment with Java technology, and I didn't think it would work," says Helt. "I wanted to use TkPerl. So as a negative proof of concept, to prove that Java technology wouldn't work, I took on a pilot development project. After about one month, I decided that yeah, this will work after all. Then a working prototype of the visualization tool took another month. Then a working pre-alpha prototype of the component system took another six months. Each time we would toss out what we had done and use what we had learned and do it again."

One factor in the final decision to use Java technology was compatibility with other genome vendors. "Several genome centers were working together to move away from proprietary, monolithic database and analysis tools that only ran on UNIX machines," says Helt. "Distributing data on CD-ROMs wasn't an option, and FTP-ing applications between platforms wasn't very appealing either. A new application needed to work in a browser. So the multiplatform aspect of Java technology became an important part of the decision."

Several of the Neomorphic team members had previous experience with the Java language and welcomed the decision. As Blossom puts it, "You can do a lot more with it than with Visual Basic or PowerBuilder."

Groundbreaking Gains

"Bioinformatics is a new space with a lot of ongoing, ground-breaking research," says Helt. "There are so many exciting things happening in this field that you really need to be out on the edge of technology to innovate. And we want to use the best technology."

In this case, innovation can bring enormous rewards. "What genetic science has discovered is that diseases, such as various types of cancers, can be distinguished from each other very effectively at the molecular level and thus treated differently," says Harmon. "In the near future, doctors will be able to analyze your genome and decide which therapies and drugs you will respond to best -- in other words, personalized medicine. But until we can identify and categorize genes by looking at each one molecularly, you can't treat them differently."

The Neomorphic Genome SDK is based on work originally done by Dr. Helt for the Berkeley Drosophila Genome Project (BDGP) at the University of California (http://fruitfly.berkeley.edu). Neomorphic has collaborated with the BDGP to continue work on the Genome SDK and to develop applications for the scientific community.

Visit Neomorphic's Web site at http://www.neomorphic.com.

See Also

Sidebar: Tools for Deciphering the Genetic Puzzle

Demonstrations of Neomorphic applications
(http://www.neomorphic.com/products/overview.html)

Berkeley Drosophila Genome Project (BDGP)
(http://fruitfly.berkeley.edu)

The Neomorphic Web site
(http://www.neomorphic.com)