Curriculum Vitae

[Photo of Isabelle Guyon]
Isabelle GUYON
Clopinet Consulting
955 Creston Road
Berkeley, CA 94708
(510) 524-6211


Present Position
Professional Experience
Additional Information
Achievements and Interests
Selected Publications

Present Position

Independent engineering consultant specializing in statistical data analysis, pattern recognition and machine learning techniques. Applications include handwriting recognition, knowledge discovery in databases, and bioinformatics.

Professional Experience

1996 to present:
Clopinet, Berkeley, California.
Independent engineering consultant.
Worked on 12 different projects for small and large companies, including:
- the design and implementation of a program to teach cursive handwriting to children,
- a fingerprint verification system,
- data analysis of DNA microarray, antibody array and mass-spectrometric data,
- design of data formats and organization of benchmarks.
Experience with project coordination and telecommuting.

1999 to 2002:
BIOwulf Technologies, Berkeley, California.
Vice President.
Assembled a team of world recognized experts in Machine Learning and SVMs. Initiated several research projects with Universities and Industry to analyze genomics and proteomics data, including DNA microarray data and mass spectrometric data.

1994 to 1996:
AT&T Bell Laboratories, Holmdel, New Jersey.
Information services department (headed by Dr. John Denker).
Research on search engines for World Wide Web applications.

1989 to 1994:
AT&T Bell Laboratories, Holmdel, New Jersey.
Adaptive systems department (headed by Dr. Larry Jackel).
Research on artificial neural networks and the theory of learning, with application to pattern recognition and in particular on-line handwriting recognition.


1985 to 1988:
Université Pierre et Marie Curie, Paris, France.
Doctoral studies on artificial neural networks, with several contributions on architecture design, learning algorithms and applications to character recognition.
Professor Gérard Dreyfus, advisor.
Ph.D. degree in Electrical Engineering in December 1988.

Ecole Supérieure de Physique et Chimie, Paris, France.
Engineering diploma in physics and chemistry in June 1985.
Master's degree in electrical engineering in June 1985.

Additional Information

Born August 15, 1961, in Paris, France.
Married, three children.
Citizenships: French and Swiss.
Permanent US Resident.
Languages: French, English, Spanish.
Computer Skills include: C/C++, Java, Lisp, Windows and MS Office, Unix, HTML, XML, Perl, CGI, Adobe Photoshop, Matlab.
Hobbies: Painting, ski touring, Flamenco dancing.


Consulting project, 1997-1998: Design and implementation in C++ of a program that analyzes cursive handwriting to be part of a handwriting teaching software.
Chip Gierhart
Penmanship Inc.
1099 Los Robles
Palo Alto, CA 94306, USA
+1 (415) 493-3668

Consulting project, 1996-1997: Signal filtering of an electronic pen using accelerometers and gyroscopes.
David Stork
Ricoh Silicon Valley
2882 Sand Hill Road, Suite 115
Menlo Park, CA 94025-7022, USA
+1 (415) 496-5720

Research on Neural Networks and Handwriting Recognition, 1989-1994.
Department head:
Doctor Larry Jackel
AT&T Labs Research
Room 3-140
100 Schulz Dr
Red Bank, NJ 07701-7033, USA
+1 (732) 345-3370

Research on Neural Networks, 1985-1988.
Thesis director:
Professor Gérard Dreyfus
Laboratoire d'Electronique
10, rue Vauquelin
75005 Paris
+33 (1) 47-07-13-93

Achievements and Interests

My work focuses on learning systems, in particular Support Vector Machines, Neural Networks and Markov Models. I also have strong interests in pattern recognition algorithms in general, in classical statistics and in learning theory. I like addressing all the aspects of a learning problem: theoretical, algorithmic and practical. My research work has always had a strong emphasis on applications. As an independent consultant, I participate in the development of new applications of the technologies that I investigated during my years of research, and I also make use of the latest research conducted by others. This section is divided into the following subsections:
First contributions to neural networks
On-line handwriting recognition and "pen computing"
Learning theory and SVMs
Computational linguistics
World Wide Web applications
Genomics and cancer research
Teaching and graduate student supervision

First contributions to neural networks

For my thesis, entitled "Réseaux de Neurones pour la Reconnaissance de Formes: Architecture et Apprentissage" (Neural Networks for Pattern Recognition: Architecture and Learning), I proposed and studied neural network algorithms for the Hopfield network. My first publication, "Information Storage and Retrieval in Spin Glass Like Neural Networks" (1985), attracted the attention of the statistical physics community and is still cited.

From a practical standpoint, I demonstrated that classical recognition algorithms such as "nearest neighbors" outperform the best Hopfield networks on a handwritten digit recognition benchmark. I proposed a novel perceptron architecture (a one-layer neural network) based on pairwise separations between classes. My architecture outperformed nearest neighbors and all the other classical statistical techniques. Today it would be called an error-correcting code method: because the outputs of the perceptron are redundantly encoded, simple postprocessing reduces the error rate substantially.

I pursued the last part of my thesis work, devoted to learning from sequences of data, when I joined Bell Laboratories in 1989. I explored various applications in speech recognition, handwriting recognition and control. It became apparent that an "all neural network" solution was not viable, so I started working on hybrids of neural networks and hidden Markov models.

On-line handwriting recognition and "pen computing"

One of the applications I have concentrated on most is on-line handwriting recognition, the branch of handwriting recognition that uses the pen trajectory information coming from a digitizer. This field, also called "pen computing" in the industry, has grown considerably in the past few years since the first pen computers (e.g. the Palm Pilot) appeared on the market.

The first system I developed, described in "Design of a Neural Network Character Recognizer for a Touch Terminal" (1991), has inspired many systems in use today and was granted a patent in 1992. In 1992, I won an AT&T award for leadership in pen computing. In 1993, in collaboration with other people from my department, we won a benchmark organized by AT&T GIS (formerly NCR) in which we were compared with the best commercially available systems for on-line handprinted sentences. AT&T started to commercialize our system but abandoned the effort when it phased out of the pen computing business.

Following a project that I initiated with a summer student at AT&T in 1992, a collaborator, building on one of my original ideas, implemented a signature verification system that won a best paper award at the 1993 NIPS conference.

In my consulting practice, since 1996, customers have recognized my expertise in pen computing by hiring my services. I have conducted several projects on pen computer interfaces (Ricoh, 1996; Baron, 1996; University of Dublin, 1997). One of my more recent projects was the design and implementation of a program to teach cursive handwriting to children (Penmanship, 1997-1998). The resemblance between the signature verification and fingerprint verification problems allowed me to design and implement a fingerprint verification system (WhoVision, 1999). I also contributed to the design of an XML format for storing and annotating handwriting pieces of evidence to be analyzed by forensic experts (Wanda, 2003).

Learning theory and SVMs

In my collaboration with the Russian mathematician Vladimir Vapnik, well known for his work in learning theory, I went deeper into the theoretical aspects of learning. Central to the Vapnik-Chervonenkis theory (VC theory) is the notion of capacity, or VC dimension, of a learning system. Roughly speaking, a system with too small a capacity cannot even learn the training data, while a system with too large a capacity does not generalize well on unseen data.
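For illustration, this tradeoff can be made concrete with the classical VC generalization bound from the standard learning theory literature (textbook notation, not quoted from the papers cited here): with probability at least 1 - eta over the draw of a training set of size l, for every function alpha in a family of VC dimension h,

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
   + \sqrt{\frac{h\left(\ln\frac{2\ell}{h} + 1\right) - \ln\frac{\eta}{4}}{\ell}}
```

where R is the expected error and R_emp the empirical (training) error; the second term grows with h and shrinks with l, which formalizes the capacity tradeoff described above.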

In a paper entitled "Structural Risk Minimization for Character Recognition'' (1992), we developed a practical method to control the VC dimension.

We went further and proposed, in a paper entitled "A training algorithm for optimal margin classifiers" (1992), a novel algorithm based on the principles of learning theory. This algorithm, now called "Support Vector Machines" or SVMs, has seen considerable development in the past few years. It is considered a successor of Neural Networks and has dozens of successful applications. In an article entitled "Comparison of Classifier Methods: a Case Study in Handwritten Digit Recognition", this algorithm reaches, without any a priori knowledge about the task at hand (i.e. working directly on pixel images), the same performance as the best system, a neural network with a sophisticated architecture that had been optimized over several years of human effort. Our invention was granted a patent in 1997. It has become a standard textbook technique, described in the new edition of the classical textbook "Pattern Classification" by Duda, Hart and Stork (2001).

In a paper published in 1998, "What Size Test Set Gives Good Error Rate Estimates?", we address the difficult problem of finding an optimum split of the data into a training set and a test set. My present research addresses the problem of input variable selection.

Computational linguistics

In 1994, I designed with the Bell Labs linguist Fernando Pereira a linguistic engine which uses the statistics of English to make predictions about the next character, given a variable length window of past characters. The predictions of this engine were used to improve handwriting recognition by favoring choices that are consistent with the statistics of English.

The classical measure of performance for linguistic engines is the cross-entropy on a large test set, that is, the per-character length of the shortest code needed to encode the prediction errors. I ran experiments comparing our model with a model designed at IBM, the best model reported in the literature. Using a standard benchmark (the Brown corpus), our system reached approximately 2 bits per character, compared to 1.75 for the IBM system and 1 for humans (as estimated by Shannon in 1950). While our performance is slightly worse, our model has the tremendous advantage of being 200 times smaller than the IBM system (160,000 parameters).
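The bits-per-character metric can be sketched in a few lines (a toy unigram character model is used here for illustration; the actual engines condition on a variable-length window of past characters and are evaluated on large corpora):

```python
import math

# Minimal sketch of the cross-entropy (bits per character) metric:
# the average number of bits an optimal code based on the model would
# need per character of the test text. Lower means better prediction.

def bits_per_character(model, text):
    """Average -log2 p(char) over the text under the given model."""
    return -sum(math.log2(model[c]) for c in text) / len(text)

# Toy model over a three-character alphabet (probabilities sum to 1).
model = {"a": 0.5, "b": 0.25, "c": 0.25}
print(bits_per_character(model, "aabc"))  # prints 1.5
```

A model that assigned every character of the test text probability 1 would score 0 bits per character; the 2 bits per character quoted above means our engine needed, on average, a 2-bit code per character of English.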

In our design, we controlled the tradeoffs among memory, speed and accuracy by varying a single information-theoretic parameter. Finite state automata composition allowed us to reconfigure flexibly the language model according to the needs of different applications, without having to retrain the core of the model.

Combining the model with on-line neural network handwriting recognizers designed in our group showed substantial reduction in error rate (up to a factor of 2 to 3 depending on the writer).

World Wide Web applications

For a brief period, I was interested in the World Wide Web. In 1995, I explored information retrieval problems at Bell Labs from both the theoretical and practical points of view. This research was later applied when I worked on matching customers with products and services at CyberGold (1996). In 1998, I was hired by AT&T to contribute to the design of a help system for their One Mail portal. In 1999, I conducted for another customer (Intraware) a comparison of different encryption and key management software packages.

Genomics and cancer research

In recent years, I have started applying Support Vector Machines and other kernel methods to the analysis of biological data. With the advent of new assays that measure several thousand variables in parallel, such as gene expression DNA microarrays, biology has moved into a new, data analysis intensive era. SVMs and kernel methods have proved to be particularly well suited to analyzing such data.

While working with BIOwulf (1999 to 2002), I was confronted with a wide variety of medical diagnosis and prognosis problems and collaborated with a number of universities, biotech companies and pharmaceutical companies. Central to this research is the problem of variable or feature selection: determining which input variables contribute most to making correct predictions. Applied to diagnosis or prognosis, variable selection is called "biomarker discovery". Drug target discovery is a related problem that also benefits from the identification of valid biomarkers.

The initial results of gene selection using SVM Recursive Feature Elimination, a feature selection technique that I invented, demonstrated that predictors using only a few genes can be built to diagnose leukemia, colon cancer, lymphoma, and prostate cancer ("Gene Selection for Cancer Classification using Support Vector Machines", Machine Learning, 2002). The SVM prediction performance also compares favorably with that of other methods, and our group won several benchmarks organized by our customers.
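The recursive feature elimination loop can be sketched compactly (a ridge-regularized linear model stands in here for the linear SVM of the paper, and the data are synthetic, for illustration only):

```python
import numpy as np

# Minimal sketch of recursive feature elimination (RFE): repeatedly
# train a linear model, then discard the feature whose squared weight
# is smallest, until no features remain. The order of elimination
# ranks the features; the survivors are the most informative.

def rfe_ranking(X, y, lam=1e-3):
    """Return feature indices ordered from least to most useful."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        Xs = X[:, remaining]
        # Ridge solution: w = (Xs'Xs + lam*I)^-1 Xs'y
        w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(remaining)), Xs.T @ y)
        worst = int(np.argmin(w ** 2))
        eliminated.append(remaining.pop(worst))
    return eliminated  # later entries = more informative features

# Toy data: only feature 0 determines the target; features 1, 2 are noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))
y = X[:, 0]
print(rfe_ranking(X, y)[-1])  # prints 0: feature 0 survives longest
```

Retraining after each elimination is the essential point: a feature that looks weak in the presence of a correlated partner can become important once the partner is removed, which a one-shot ranking would miss.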

I am presently consulting for several companies designing new biomedical assays. In this way, I have direct access to fresh, good-quality data and some control over experimental design and data collection. One of my main charters has been instrument characterization and quality control. At Pointilliste (2002 to present), I have been working on patterns of activity from antibody arrays. At Biospect (2002 to present), I am working on two-dimensional patterns resulting from the combination of capillary electrophoresis and mass spectrometry (CE-MS).

Teaching and graduate student supervision

I have given tutorial lectures and courses at several summer schools and workshops, including the conference on Numerical Methods for Engineers (Lausanne, 1989), the summer school on Computational Physics, Parallel Architectures and Applications (Lausanne, 1990), the NIPS workshop on Comparison and Unification of Algorithms, Loss Functions and Complexity Measures that I co-organized with M. Kearns and E. Levin (Vail, 1992), the summer school on Fundamentals in Handwriting Recognition (Bonas, 1993), and the workshop on Model Selection and Inference Principles (Dagstuhl, 2001). I am the author of several tutorial articles on neural networks, kernel methods and feature selection.

In the course of the pen computing project, I co-advised two Ph.D. students and one Master's student:
- Nada Matic graduated with honors in 1993 from the City University of New York for her work entitled "Exploration in On-Line Handwriting Recognition". Her thesis comprises two parts: (1) innovative methods of computer-aided data cleaning and (2) writer adaptation.
- Markus Schenkel graduated with honors in 1994 from the Swiss Federal Institute of Technology of Zürich. His work, entitled "Handwriting Recognition using Neural Networks and Hidden Markov Models", presents the design and study of a hybrid of neural networks and hidden Markov models for cursive handwriting recognition.
- Claudia Medina obtained a Master's degree in 1995 from Mills College, California, for her work entitled "Demographic and Linguistic Study of Handwriting Recognition". For her thesis, she conducted a survey of the uses of handwriting and a linguistic analysis of handwritten notes.
- Asa Ben Hur spent a year with me as a post-doc, working on applications of clustering to DNA microarray data.


In the course of my career at Bell Labs, I demonstrated leadership capabilities that were recognized by an AT&T award for "leadership in pen-based systems and pen computing". I led a group of six people working on the pen computing project for two years and raised the salary of three of them. I instigated and coordinated for several years a worldwide effort of data exchange and benchmarks for on-line handwriting recognition: the UNIPEN project, in which forty companies and universities from all over the world are involved. At BIOwulf, I assembled and coordinated a team of world-renowned Machine Learning scientists in my capacity as Vice President. In the past few years, I have been organizing workshops at the NIPS conference, one of which included a benchmark on feature selection.


I moved to Berkeley in 1992 when my husband obtained a position at UCB. Because of my key role in the pen computing project, I was allowed to telecommute four weeks out of five and to spend only ten weeks per year in New Jersey. This arrangement lasted four years, until I voluntarily left Bell Labs to join CyberGold Inc.

I have considerable experience participating in and leading teleconference calls, and even giving presentations over the telephone. I have scientific collaborators all over the world with whom I interact by electronic mail, some of whom I have never met in person. At BIOwulf, I ran a regular reading group to discuss scientific papers over the phone, with participants living on 3 continents and across 4 different time zones.

In my consulting practice, I mostly work from home. My customers appreciate that they do not need to provide me with office space and computer equipment. They get a very clear understanding of project progress because I thoroughly document everything I do. They occasionally rely on me to work with remotely located associates or to coordinate the efforts of several people in different locations.

Selected Publications

1. Structural risk minimization for character recognition.
     I. Guyon, V. Vapnik, B. Boser, L. Bottou, and S.A. Solla.
     In J. E. Moody et al., editor, Advances in Neural Information Processing Systems 4 (NIPS 91), pages 471--479,
     San Mateo CA, Morgan Kaufmann. 1992.
2. Automatic capacity tuning of very large VC-dimension classifiers.
     I. Guyon, B. Boser, and V. Vapnik.
     In S. Hanson et al., editor, Advances in Neural Information Processing Systems 5 (NIPS 92) , pages 147--155,
     San Mateo CA, Morgan Kaufmann. 1993.
3. Discovering informative patterns and data cleaning.
     I. Guyon, N. Matic , and V. Vapnik.
     In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge
     Discovery and Data Mining, pages 181--203. MIT Press. 1996.
4. A training algorithm for optimal margin classifiers.
     B. Boser, I. Guyon, and V. Vapnik.
     In Fifth Annual Workshop on Computational Learning Theory, pages 144--152, Pittsburgh, ACM. 1992.
5. What size test set gives good error rate estimates?
     I. Guyon, J. Makhoul, R. Schwartz, and V. Vapnik.
     PAMI, 20 (1), pages 52--64, IEEE. 1998.
6. Linear discriminant and support vector classifiers.
     I. Guyon and D. Stork.
In A. Smola et al., editors, Advances in Large Margin Classifiers, pages 147--169, MIT Press, 2000.
7. Gene selection for cancer classification using support vector machines.
     I. Guyon, J. Weston, S. Barnhill and V. Vapnik.
     Machine Learning Volume 46, number 1/2/3, January 2002.
8. An introduction to variable and feature selection.
     I. Guyon and A. Elisseeff.
    Journal of Machine Learning Research, Volume 3. Pages 1157-1182, March 2003.

(This is a partial list; the total number of publications is 30.)

Book and journal edited:
9. Advances in Pattern Recognition Systems using Neural Network Technologies ,volume 7.
     I. Guyon and P.S.P. Wang, editors.
     World Scientific, Series in Machine Perception and Artificial Intelligence, Singapore, 1994.
10. Variable and feature selection.
    I. Guyon and A. Elisseeff, editors.
    Journal of Machine Learning Research, Volume 3, March 2003.

Patents granted:
11. Time delay neural network for printed and cursive handwritten character recognition.
     I. Guyon, J. S. Denker, and Y. Le Cun.
     US Patent 5,105,468. 1992.
12. Pattern recognition system using support vectors.
     B. Boser, I. Guyon, and V. Vapnik.
     US Patent 5,649,068. 1997.

Isabelle Guyon

Last updated Oct. 1, 2003