Ben Clifford CV/Resume
Email:
benc@hawaga.org.uk
Web page: http://www.hawaga.org.uk/ben/
Employment and Professional Activity
CQX Limited, August 2011-present
I supply my services as a contractor/freelancer. Primarily I work
with functional languages, build distributed systems, and help people with their development infrastructure.
Examples of projects are:
- Implementing LSST-specific functionality for the parsl Python parallel scripting library.
- Working with tweag.io on a Cloud Haskell based consensus algorithm.
- With a collaboration of institutions working on a US Department of Energy
grant, prototyped a new API for high performance data storage targeted
at exa-scale systems.
- Worked with a California-based startup to develop their mathematical model
and prototyped that model in Haskell; and advised on their ongoing
development process.
- Assisted a major retail point-of-sale equipment vendor with the prototype
of a new version of their cloud-based point of sale system as they moved from
a Microsoft-based environment to Linux.
- For an online shop in the Netherlands, retained for ad-hoc advice and custom
PHP coding in support of in-house technical staff.
- Emergency system restoration for a South African publicity company when
their website and email systems were attacked.
University of Chicago, May 2018-February 2019
I worked as a programmer on parsl, a parallel scripting library for Python.
Beautiful Destinations, April 2016-March 2017
I worked as a programmer and systems person in a group focused on machine
learning and analytics for Instagram images and other social media artefacts.
The languages I worked in were primarily Haskell, Python and SQL, with some
use of others such as R and PureScript.
Notably:
- I supported data scientists in moving their research code (in languages
such as Python and R) towards production-ready, maintainable products.
- I codified our diverse and ad-hoc build and deployment practices into a single tool,
using Shake and Docker to allow the production environment, the test environment
and developer laptops to have very similar environments that were easily
version controlled.
- I developed a tool for producing consistent subsets of Postgres databases,
respecting possibly cyclic foreign key relations, to allow our continuous testing
environment to run against realistic-looking databases.
- I diagnosed a serious performance bug in ghcjs, a Haskell-to-JavaScript
compiler; the bug occurred with non-trivial use of Template Haskell.
- I worked on short-notice emergency projects as technical and customer
needs arose.
- I mentored a pair of junior employees in developing their skills, both
as Haskell programmers and in systems work.
University of Chicago Department of Astronomy and Astrophysics, September 2010 - August 2011
Title: Computer Systems Programmer
I worked on a simulation of combustion and detonation, mostly on
IBM BlueGene/P supercomputers scaling up to 65536 cores. The code uses a
combination of MPI and OpenMP parallelisation implemented in
a hybrid of C and Fortran. I concentrated on
improving runtime efficiency of the core simulation and of IO;
and on 3D visualisation of results using VisIt.
Contractor for USC/Information Sciences Institute, November 2009-April 2010
Working with ISI's Center for Health Informatics (CHI),
I was involved in build/release generation of
CHI's appliance DVDs, and in development of a
Nagios-based monitoring system for CHI's distributed multi-site environment.
Short-term visitor at Monash University, Melbourne (Australia) with a group
working on topics similar to those of previous groups at the University of
Chicago and the University of Southern California, primarily for skills and
knowledge exchange between those groups.
Title: Computer Systems Programmer
I worked primarily on
Swift, a programming language and execution environment for coarse grained distributed parallel data-centric computation.
Development included the integration of many different components:
at the hardware level, a range of architectures from laptops to IBM
Blue Gene supercomputers and large scale grids of clusters; at the
middleware level,
a range of execution and data transfer systems including the Globus Toolkit
and PBS; at the application layer, interfacing existing scientific application
codes to the programming model provided by Swift; and at the social level,
co-ordinating the needs of various application and infrastructure groups
each with their own vested interests.
I developed a small suite of performance analysis tools to aid in
characterisation and debugging of the use of Swift for specific applications
and environments. This was especially useful as the application space changed
from early use with hundreds of minute-long tasks to a hundred thousand
tasks of sub-second duration composed of many smaller subtasks often of
millisecond duration.
In addition to production-level development, I prototyped the use of Swift
in new environments, including the South African National Grid (SAGrid) using
gLite middleware, and on campus workstation pools using Condor; I also
developed a prototype to record provenance of output data produced by
Swift and contributed to the Open Provenance Model (OPM).
I was also involved in education, outreach and training activities for
Globus and the
Open Science Grid.
In the summer of 2008, I mentored a Google Summer of Code student
in a project to add stronger static type checking
to Swift.
Title: Programmer Analyst III / Programmer Analyst IV
My activities under Dr. Carl Kesselman in the Center for Grid
Technologies included:
- Developer and technical co-ordinator for MDS (the Monitoring and
Discovery component of the Globus Toolkit) in a team of ~4 programmers.
Worked on three versions of MDS:
- MDS2: OpenLDAP, multi-platform information gathering scripts
(Linux, Solaris, IRIX, AIX); OpenLDAP database backends in C.
- MDS3 and MDS4: Web-services (WSDL, SOAP, OGSI, WSRF) in a
Java environment under
Linux, with some C and shell script coding. XML was used heavily,
with a focus on XSD and XPath.
- Creation and presentation of technical
tutorial and demo material on MDS and the whole Globus Toolkit
(notably a full day 'Build A Service' tutorial) at conferences and
other events nationally and internationally
- Administered grid applications running on CGT's unix servers
- End-user support for the Globus Toolkit, online and in-person
- Participated in the Global Grid Forum, representing some of ISI
and Globus' monitoring interests.
At Custom Networks, a small IT company in the Greater London area, I
worked in several areas:
- Internet Service Provider (ISP) server deployment and management:
Linux, sendmail, BIND, Apache httpd.
- Design and implementation of business-specific database applications:
MS SQL Server, MS Access, Visual Basic for Applications.
- Office IT system deployment and management for small businesses:
Windows NT/2000; Microsoft BackOffice and Office suites.
Education
Queen Mary,
University of London, 1997-2001
M.Sci in Mathematics, Upper second class honours. In particular, I
focused on discrete mathematics, whilst for my masters project I
learned some Latin and translated part of Newton's
Principia Mathematica.
EPCC,
University of Edinburgh, Summer 2000
Summer Scholarship Programme, consisting first of introductory courses
on high performance and parallel computing
followed by an eight week individual project
investigating the use of Jini in a grid environment. See report below.
University of California's Education Abroad Program.
GPA: 3.644. Twice on Provost's Honors List.
Publications, papers, articles, reports, tutorials, talks
Haskell
Swift and Parsl workflow systems
- The Open Provenance Model Core Specification v1.1, submitted to
Future Generation Computer Systems
- Parallel Scripting for Applications at the Petascale and Beyond, Michael Wilde, Ian Foster, Kamil Iskra, Pete Beckman, Zhao Zhang, Allan Espinosa, Mihael Hategan, Ben Clifford, Ioan Raicu, Computer, vol. 42, no. 11, pp. 50-60, Nov. 2009, doi:10.1109/MC.2009.365
- Swift: Fast, Reliable, Loosely Coupled Parallel Computation,
Zhao, Y., Hategan, M., Clifford, B., Foster, I., von Laszewski, G., Raicu, I., Stef-Praun, T. and Wilde, M.,
IEEE International Workshop on Scientific Workflows, 2007
- Accelerating Medical Research using the Swift Workflow System,
Stef-Praun, T., Clifford, B., Foster, I., Hasson, U., Hategan, M., Small, S., Wilde, M. and Zhao, Y., Health Grid, 2007
- Non-Rigid Registration for Image-Guided Neurosurgery on the TeraGrid: A Case Study,
Andriy Fedorov, Benjamin Clifford, Simon K. Warfield, Ron Kikinis, Nikos Chrisochoides,
College of William and Mary Technical Report, 2009
- Tracking provenance in a virtual data grid, Ben Clifford, Ian T. Foster, Jens-S. Voeckler, Michael Wilde, Yong Zhao, Concurrency and Computation: Practice and Experience, Volume 20, Issue 5, pp 565-575, April 2008
- Towards Loosely Coupled Programming on Petascale Systems, Ioan Raicu, Zhao Zhang, Mike Wilde, Ian Foster, Pete Beckman, Kamil Iskra, Ben Clifford, IEEE/ACM Supercomputing 2008
- Extreme-scale scripting: Opportunities for large task-parallel applications on petascale computers, Michael Wilde, Ioan Raicu, Allan Espinosa, Zhao Zhang, Ben Clifford, Mihael Hategan, Sarah Kenny, Kamil Iskra, Pete Beckman, Ian Foster, to appear in Journal of Physics: Conference Series, 2009
- Parsl: Pervasive Parallel Programming in Python, Yadu Babuji, Anna Woodard, Zhuozhao Li, Daniel S. Katz, Ben Clifford, Rohan Kumar, Lukasz Lacinski, Ryan Chard, Justin M. Wozniak, Ian Foster, Michael Wilde and Kyle Chard. 28th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC). 2019.
Grid monitoring:
Replica location:
- Implementation and Evaluation of a ReplicaSet Grid Service
Mary Manohar, Ann Chervenak, Ben Clifford, Carl Kesselman,
presented at 5th IEEE/ACM International Workshop on Grid Computing (Grid 2004),
Pittsburgh, PA, November 8, 2004.
- A Replica Location Grid Service Implementation
Mary Manohar, Ann Chervenak, Ben Clifford, Carl Kesselman,
Data Area Workshop, Global Grid Forum 10, March 2004.
Tutorials:
I participated in the Grid2003 project which resulted in this paper:
- The Grid2003 Production Grid: Principles and Practice,
The Grid2003 Project
The Grid2003 Project has deployed a multi-virtual
organization, application-driven grid laboratory (Grid3) that has
sustained for several months the production-level services required
by physics experiments of the Large Hadron Collider at CERN (ATLAS
and CMS), the Sloan Digital Sky Survey project, the gravitational
wave search experiment LIGO, the BTeV experiment at Fermilab, as well
as applications in molecular structure analysis and genome analysis,
and computer science research projects in such areas as job and data
scheduling. The deployed infrastructure has been operating since
November 2003 with 27 sites, a peak of 2800 processors, work loads from
10 different applications exceeding 1300 simultaneous jobs, and data
transfers among sites of greater than 2 TB/day. We describe the principles
that have guided the development of this unique infrastructure and the
practical experiences that have resulted from its creation and use.
We discuss application requirements for grid services deployment and
configuration, monitoring infrastructure, application performance,
metrics, and operational experiences. We also summarize lessons learned.
As a student, I worked on
JiniGrid
at EPCC in the University of Edinburgh,
producing the following final report:
- JiniGrid: Specification and Implementation of a Task Farm Service
for Jini.
Edinburgh Parallel Computing Centre
SSP Report,
EPCC-SS-2000-02, September 2000.
Download: [PDF 974k] [PS 2055k]
There is also a poster: [PDF 612k] [PS 1244k] [CiteSeer]
Jini is a technology that allows networks of heterogeneous services
to organise themselves with little human intervention.
It provides for fault tolerance and for the automatic transfer
of driver code, written in Java, when and where it is needed.
These abilities will be extremely useful, if not essential, in a
large scale environment such as the Grid.
In this project, Jini technology is used to share compute servers
using the task farm paradigm.
A standard API that all compute servers must implement is defined.
Four implementations with different capabilities are presented, as
well as a selection of clients and helper classes.
For my M.Sci project, I was persuaded to
refresh my Latin and read some of Newton's work under the watchful eye of
Charles Leedham-Green.
- A translation of part of Newton's Principia
M.Sci Project, Queen Mary and Westfield College, University of London,
April 2001.
(n.b. There are a number of cosmetic differences between this text and the text
submitted for the degree)
Download: [PDF 332k]
[PS 353k]
[DVI 139k]
Newton's Principia provides the basis for modern Newtonian
dynamics. Many of the results have survived the last three hundred years
with little or no change.
In this report, an English translation of a part of the text is presented,
along with commentary.