Technology News

latest updates from easySERVICE™

Making sense of “big data” from identity management

Providing employees with access to applications and information is a complex operational challenge. Users require broad and varied access to be productive, but that incurs risk. IT must control access, enforcing the principle of “least privilege” in the face of compliance regulations and the threat of security breaches.

Do it right and business runs efficiently with risks understood, mitigated, and rewarded. Do it wrong and catastrophe looms.

To understand business risk effectively, you must have visibility into the access approved, access granted (which may be different than what was approved), the resources and data behind the access granted, and how access is being used. Years ago this was less complex: Employee and customer data lived in the data center, was accessed during work hours, and was less heavily regulated and audited.

Today, data resides not only in the data center but also in mobile devices and the cloud. It’s also regulated, audited, and available to many more audiences than just your employees. Here’s one way to break down the problem:

More and different types of identities. In the past, IAM (identity and access management) was primarily concerned with workers. Now contractors, suppliers, customers, partners, affiliates, and even devices have identities.

Data explosion. We’re generating and archiving more data than ever before. Recent coverage of the NSA’s data analysis efforts reveals just how much data we generate as a nation: 1.8 petabytes daily!

Flexible access. In the past, access was largely consolidated in a data center, but then came desktops, then laptops, then mobile and cloud. Today, users expect access anywhere, everywhere, all the time.

Need for speed. The United States is no longer the only “I want it now!” society. Every globally competitive company is keenly aware of the need to provide access and information immediately, whether to a shop floor employee or to a customer who needs current order status.

Increased security expectations. In the past, security was considered a specialized area, but today, government and industry regulators, auditors, board members, media, and consumers are expected to know the ropes. Increasingly, CISOs are calling for staff to flag new risks as they arise.

Logging everything
What does this all mean to a CISO who is concerned with providing only the right access to the right people at the right time? A whole lot of information about a rapidly expanding universe of electronic identities and their context. At Courion, we call this “big identity data.”

By way of example, consider a hypothetical 10,000-employee company:

  • 10,000 users with access to 10 applications results in 100,000 accounts
  • Logging in to these applications at least twice per day yields 200,000 login activity records per day
  • Keeping a data store of one month of activity (20 working days) creates a total of 4 million login activity records

Now let’s consider how worker interaction with files and folders enters the equation:

  • 10,000 workers accessing 50 data assets per day creates 500,000 activity records per day
  • Distributed over an eight-hour workday, this results in 62,500 activity records per hour, or roughly 1,042 per minute (about 17 per second)
  • Keeping a data store of one month of activity (20 working days) creates a total of 10 million unstructured data activity records

Just think: 14 million data elements, and that’s the tip of the identity and access data iceberg! One might contend that 4 million (or 10 million) records in the examples above are not really indicative of “big data” per se. That’s true. We used simple, conservative numbers to show how things grow.
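The counts above can be verified with simple arithmetic. This sketch assumes a month of 20 working days, which is the only assumption under which the article’s 4 million and 10 million figures add up:

```python
# Back-of-the-envelope check of the record counts above,
# assuming a 20-working-day month (calendar-month totals would be larger).
USERS = 10_000
APPS = 10
LOGINS_PER_ACCOUNT_PER_DAY = 2
WORKDAYS_PER_MONTH = 20

accounts = USERS * APPS                                      # 100,000 accounts
logins_per_day = accounts * LOGINS_PER_ACCOUNT_PER_DAY       # 200,000 per day
login_records_month = logins_per_day * WORKDAYS_PER_MONTH    # 4,000,000

FILE_EVENTS_PER_WORKER_PER_DAY = 50
file_events_day = USERS * FILE_EVENTS_PER_WORKER_PER_DAY     # 500,000 per day
per_hour = file_events_day / 8                               # 62,500 per hour
per_minute = per_hour / 60                                   # ~1,042 per minute
file_records_month = file_events_day * WORKDAYS_PER_MONTH    # 10,000,000

total = login_records_month + file_records_month             # 14,000,000
```

Changing any single assumption (more apps, more logins, a calendar month) multiplies the totals, which is the article’s point about how quickly this data grows.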

In a real-world environment, data accumulates much faster. The two previous examples covered only data for people, applications (accounts), and activity (logins and file shares). Every business has many more applications that matter, and each has a wealth of data to be collected and analyzed. Data must be collected regarding access, roles, inheritance, permissions, assignment, and denial, and for key systems such as financials, HR, CRM, databases, email, SharePoint, and so on. We also need to collect activity for those systems, beyond just logon events. The data collected grows very rapidly.

Next, we need to prepare the data for analysis, using ETL (extract, transform, load).

Extract: Pull some or all data from a multitude of sources that hold information about identities, accounts, rights, activities, and resources. Expect the data to exist in various repositories, with different storage formats and data representations, each with its own security challenges. Anticipate needing different techniques and technologies to connect to each source and extract the needed data. Most systems have data that helps answer the who, what, where, when, why, and how. For example, an HCIS (health care information system) typically has information about the following:

  • Accounts for workers, clinicians, researchers, affiliates (Dr. Smith)
  • Rights assigned to the accounts (Dr. Smith can schedule appointments, dispense medication)
  • Resources accessible via the assigned rights (schedules for Dr. Smith’s team of clinicians and records for Dr. Smith’s patients)
  • Activity done within the HCIS (Dr. Smith logged in and viewed the records of patient X)

The extraction phase may be performed in a batch/bulk manner, or it may be conducted in real time, where data is extracted as it changes.
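A minimal sketch of batch extraction: each source stores the same kinds of facts in a different format, so the extractor’s only job at this stage is to pull raw records and tag them with their origin. The source names, feeds, and fields here are invented for illustration:

```python
import csv
import io
import json

# Hypothetical sources: an HR CSV feed and an application's JSON activity feed.
def extract_csv_accounts(text):
    """Yield account records from a CSV feed, tagged with their source."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"source": "hr_csv", **row}

def extract_json_logins(text):
    """Yield login records from a JSON feed, tagged with their source."""
    for rec in json.loads(text):
        yield {"source": "app_json", **rec}

hr_feed = "user,department\ndsmith,Cardiology\n"
app_feed = '[{"user": "dsmith", "event": "login"}]'

# Batch extraction: pull everything from both sources into one working set.
records = list(extract_csv_accounts(hr_feed)) + list(extract_json_logins(app_feed))
```

Note that nothing has been normalized yet; that is the transform step’s job. Tagging each record with its source now is what later lets the load step decide which source is authoritative.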

Transform: Next, the data must be converted and normalized into an understandable format. A simple example is date and time: Data may show 9:00 a.m., but in what time zone? Is it Daylight Saving Time or Standard Time? Does all the data conform to the same level of granularity in minutes, seconds, or microseconds? Typically you transform all timestamps to Greenwich Mean Time (GMT). The time stamp format for logon events may vary with each system extract and needs to be converted to a consistent format for analysis.
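The time-zone problem above can be sketched in a few lines: two systems both log "9:00" in different local zones and different string formats, and both are normalized to UTC (the modern equivalent of GMT for this purpose) in a single ISO 8601 format. The timestamps and zone choices are illustrative:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib in Python 3.9+

def to_utc(raw, fmt, tz_name):
    """Parse a local timestamp string and normalize it to UTC ISO 8601."""
    local = datetime.strptime(raw, fmt).replace(tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc).isoformat()

# Same wall-clock time, different zones and formats -> different UTC instants.
a = to_utc("2014-09-17 09:00", "%Y-%m-%d %H:%M", "America/New_York")  # EDT, UTC-4
b = to_utc("17/09/2014 09:00", "%d/%m/%Y %H:%M", "Europe/London")     # BST, UTC+1
```

Because both inputs fall in September, daylight saving is in effect in both zones, and `ZoneInfo` resolves that automatically; hand-rolled fixed offsets would get exactly this wrong twice a year.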

Many other data transformations may be done to prepare the extracted data for storage and analysis. The data may need to be augmented from another repository, split into new data elements, validated against other repositories, or changed to a new value. Here’s how a ZIP code might be transformed:

  • ZIP codes may be either the five-number format or ZIP+4
  • Split ZIP+4 records into two fields
  • Data without a ZIP code or with alpha characters should be discarded
  • Verify that the numeric five-digit ZIP code is valid, verify ZIP+4 if present
  • ZIP code lookup populates and corrects city and state information
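The ZIP-code steps above can be sketched as one small transform function. Splitting, discarding, and format validation are shown; the lookup that corrects city and state requires a postal database and is elided here:

```python
import re

def transform_zip(raw):
    """Normalize a raw ZIP value per the rules above, or return None to discard."""
    raw = (raw or "").strip()
    # Accept five digits, optionally followed by a hyphen and four digits (ZIP+4).
    m = re.fullmatch(r"(\d{5})(?:-(\d{4}))?", raw)
    if not m:
        return None  # missing, wrong length, or alpha characters: discard
    # Split ZIP+4 records into two fields.
    return {"zip5": m.group(1), "plus4": m.group(2)}

transform_zip("01720-3549")  # -> {'zip5': '01720', 'plus4': '3549'}
transform_zip("ABCDE")       # -> None (discarded)
```

A production version would additionally check the five-digit code against a postal reference table, both to verify it exists and to populate or correct the city and state fields.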

Load: The last step is to store the transformed data in a repository for analysis and determine what data is overwritten and what data is changed. For example, is the “load” data authoritative, or is the data already present in the repository authoritative? Expect to collect a large amount of data and then, depending on your data retention policy, add it to an already large data set. When activity data is collected, not only is it likely to be large, but it may also arrive quickly (in real time).
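The “which copy is authoritative” question can be made concrete with a small upsert sketch: incoming records are merged into the repository, but an existing record is only overwritten when the incoming record’s source outranks it. The source names and ranking are hypothetical:

```python
# Hypothetical authority ranking: the HR system outranks an app extract.
AUTHORITY = {"hr_system": 2, "app_extract": 1}

def load(repo, record):
    """Upsert a record keyed by user, keeping the more authoritative source."""
    key = record["user"]
    existing = repo.get(key)
    if existing is None or AUTHORITY[record["source"]] >= AUTHORITY[existing["source"]]:
        repo[key] = record
    return repo

repo = {}
load(repo, {"user": "dsmith", "source": "app_extract", "dept": "Unknown"})
load(repo, {"user": "dsmith", "source": "hr_system", "dept": "Cardiology"})
# hr_system outranks app_extract, so its record wins regardless of arrival order.
```

Real repositories usually resolve authority per field rather than per record (HR may be authoritative for department, the application for last-login time), but the shape of the decision is the same.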

Furthermore, the need to do forensics often drives the need to retain detailed records, resulting in larger data sets. Expect that your disk storage needs may increase based on the size of your organization and your forensics needs.

To get answers: Analyze, relate, infer, and visualize
With the data normalized and loaded, it’s ready to analyze. The analysis itself generates new data in the form of facts, relationships, indicators, trends, and inferences.

Multidimensional analysis reorganizes the data and provides new ways to pivot, view, and analyze. IAI (identity and access intelligence) analytics solutions are specifically tailored to provide analysis and visualization specific to IAM, making the connections between identities, the access assigned, permissions, and ultimately the resulting access that a person has to a given resource.
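The “pivot” idea can be illustrated with a tiny example: flat activity records are reorganized along two dimensions, identity-to-resources and resource-to-identities, so we can ask questions the flat log cannot answer directly. The records are invented for illustration:

```python
from collections import defaultdict

# Flat activity records, as they might emerge from the ETL pipeline.
activity = [
    {"user": "dsmith", "resource": "patient_records", "action": "view"},
    {"user": "dsmith", "resource": "schedules", "action": "edit"},
    {"user": "jdoe", "resource": "patient_records", "action": "view"},
]

by_user = defaultdict(set)      # pivot: identity -> resources touched
by_resource = defaultdict(set)  # pivot: resource -> identities touching it
for rec in activity:
    by_user[rec["user"]].add(rec["resource"])
    by_resource[rec["resource"]].add(rec["user"])
```

From the first pivot you can ask “what does Dr. Smith actually use?” (and compare it with what was approved); from the second, “who is touching patient records?”. Real IAI tools do this across many more dimensions, such as roles, permissions, time, and location.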

Customizing IAI to reveal business risk
Already we can see that by aggregating disparate data types and looking at the context or relationship between those elements, we are revealing new information. Next let’s look at how we can highlight information and uncover knowledge rather than just showing data in a static report.

Source: Associated Press


