'Hooked on data': Clear Capital's chief data scientist

Projects underway include a 'data lake'

Data first came alive for Erik Allen while he was a Ph.D. student at the Massachusetts Institute of Technology, focused on numerical modeling and simulation during the early 2000s. While there, he and some friends wrote a model that predicted the outcome of baseball games, using data they scraped from the web.

The model led to a sharp increase in their betting wins. More than that, it grabbed Allen's attention and never let go.

“At that point I was hooked on data and the value that data could bring,” said Allen (pictured), Clear Capital’s chief data scientist. “I’ve been a convert ever since.”

Allen has been chief data scientist at Clear Capital for nearly two years. The company, based in Reno, Nevada, makes mortgage industry software and partners with banks and appraisers to deliver valuations and appraisals to customers. It has approximately 1,500 employees.

Chief Information Officer Deepak Sachdeva sets the company’s broader technology agenda, but Allen and his team of 35 technologists enable a key part of that agenda by focusing on data and analytics products, as well as providing much of Clear Capital’s appraisal product data infrastructure.

“We run the gamut of data, data products, and analytics,” Allen said. “If it’s data that touches data, we’re probably working on it. That includes how we ingest data from a variety of sources external and internal, how we merge and clean that data and then how we build insights on top of that data.”

Those insights, he added, can include analytics or machine learning models.

Data dreams

Allen admits he thinks about data a lot. Recently, he said, he dreamt about data quality and how to ensure it.

“I am passionate about data, as is a lot of our team,” Allen said.

While many experts got into the field through data science and machine learning, Allen started out as a chemical engineer. His Ph.D. work, along with the project in which he and his friends applied numerical modeling and simulation to Major League Baseball games, opened his eyes to the enormous potential of data and how powerful it can be in certain situations.

“If you’re looking at a single row of data, it would be interesting and important, but you may not be able to derive much insight from that,” Allen said. “But when you aggregate that data to a huge level – I’m talking about 150 million residential properties or millions of appraisals – all of a sudden, these patterns appear in the data and you learn new things about the world that you cannot see in that single record.”

Ultimately, data is dynamic, Allen said.

“I find data very interesting… both how you move it, how you use it and how you keep it from fooling you,” Allen said. “Everything about data is pretty fascinating on its own.”

That sentiment wasn't common five or 10 years ago, when companies capturing data during business transactions focused more on how to store it than on how to use it for maximum gain. In recent years, Allen said, businesses have become more confident in the power of data as technology has made it possible to unlock that value.

“Data itself has tremendous value and the insights derived from it have tremendous value, and there are a lot of people in the mortgage industry – certainly throughout Clear Capital – that would fully agree with that,” Allen observed.

Data priorities

Data and analytics have long been important at Clear Capital, Allen said. Since he came on board, the company has been building infrastructure to realize and maximize its value.

“As data has become more prominent… you recognize that you need a new way of storing and cataloging the data to view it as a valuable piece of the business as opposed to something that needs to be stored,” Allen said. “It’s about breaking down silos between legacy systems, bringing that data together and being able to collect it – thinking more and more about the quality and cleanliness of the data, being able to observe the data and that sort of thing. That’s really what we’ve been pushing on.”

One favorite project Allen has worked on since he arrived is ClearPhoto, a system that extracts information from photo imagery to help the appraisal process. Another is in progress: the creation of a so-called data lake that combines all of Clear Capital's property data into a single repository, backed by security controls that enable safe storage and use.

“When you do all of this work, and you get to the end point,” Allen said, “the insights you can derive from that are just immense.”