A few posts back, we encountered the evolving term big data, which describes the gigantic masses of data that big business enterprises are eyeing to mine for whatever value can be extracted from them.
Examples of big data may be found in the unimaginable collection of facts, figures, and image/video/multimedia data that the Google search engine has piled up from 1997 to the present, as well as in the staggering amount of personal and related data that Facebook has collected from its more than 1.35 billion registered users worldwide since Mark Zuckerberg established it in 2004. Other organizations have their own sets of big data from their own sources.
The process of collecting big data is itself an enormous effort, one that requires the backend support of data centers running on a 24/7 basis the whole year, along with the advanced technology packed inside those data centers. With the extremely high cost of collecting big data, it is only natural for the business enterprise involved to recover that cost by making use of the Godzilla-sized data waiting to be tapped in the enterprise’s storage devices. An important step in using big data is data analytics, and this too requires the use of advanced technology.
Fortunately such a technology exists, thanks to hardware/software vendors and open-source software developers who are coming up with more powerful processing capability, increased levels of memory, advances in bandwidth, and highly distributed architectures that measure up to the challenge of big data.
One particular technology that stands out from the many offerings in the market is Apache Hive, which the Apache Software Foundation itself describes as “a data warehouse software (that) facilitates querying and managing large datasets residing in distributed storage”.
Hive does not work alone. It is built on top of — and works with — Apache Hadoop, an open-source software framework that allows distributed processing of large data sets across clustered computers using simple programming models. Hadoop is designed for scalability: user organizations can start with single server machines and scale up to hundreds or thousands, with each machine capable of local computation and storage. The Hadoop software library is designed to detect and handle failures at the application layer, which means highly available service over clustered machines.
Hive has tools to easily extract, transform, and load subsets of big data that are stored in HDFS (Hadoop Distributed File System) or in other compatible storage systems such as Apache HBase. It can impose structure on various data formats, which makes it possible to query the data using HiveQL (a query language that resembles SQL). The ability to query, in turn, provides the ability to analyze the data and extract value out of it.
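Hive’s “schema on read” idea, in which structure is imposed on raw files only at query time, can be sketched in plain Python. The sample data and column names below are hypothetical, and the snippet is only an analogy for what Hive does at vastly larger scale:

```python
import csv
import io

# Raw, loosely structured text as it might sit in HDFS (hypothetical sample).
raw = """alice,2014-11-02,37
bob,2014-11-02,18
carol,2014-11-03,55"""

# "Impose structure" on the raw bytes at read time (schema on read),
# much as Hive maps files in distributed storage to table columns.
schema = ["user", "visit_date", "page_views"]
rows = [dict(zip(schema, rec)) for rec in csv.reader(io.StringIO(raw))]

# The HiveQL-style question "SELECT user FROM visits WHERE page_views > 30"
# then becomes a simple filter over the structured rows.
heavy_users = [r["user"] for r in rows if int(r["page_views"]) > 30]
print(heavy_users)  # ['alice', 'carol']
```

The real HiveQL query expresses the same filter declaratively, and Hive turns it into distributed jobs rather than an in-memory loop.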
Data queries on Hive are executed via Hadoop MapReduce, a software framework for easily writing applications that process multi-terabyte data sets in parallel on clusters consisting of thousands of nodes. (Apache Pig, a related data analysis platform, likewise compiles its scripts into sequences of MapReduce programs behind the scenes.) MapReduce and HDFS run on the same set of nodes.
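The MapReduce model itself is simple enough to mimic in a few lines of Python. This is only a single-process sketch of the word-count example commonly used to introduce the framework; a real Hadoop job would run the map and reduce phases in parallel across cluster nodes:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Shuffle/reduce: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Two "splits" that a real cluster would hand to separate nodes.
splits = ["big data big value", "big cluster"]
pairs = [p for s in splits for p in map_phase(s)]
print(reduce_phase(pairs))  # {'big': 3, 'data': 1, 'value': 1, 'cluster': 1}
```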
Apache Hive and all the collaborating software need appropriate IT infrastructure to host them. Unless you have the necessary talent in your business, you need to consult qualified IT professionals to help you plan infrastructure acquisition and configuration, because there will be plenty of technical details to attend to before Apache Hive can make big data analytics a reality in your business.
Yes. Cisco Systems, the dominant giant in the networking equipment industry, migrated to Active Directory. But before we look into the why or how of this story, let’s have a little background on Active Directory and on Cisco.
Active Directory (AD), according to its maker Microsoft, is a special-purpose database designed to handle high-volume data searches and reads but relatively few changes and updates. Microsoft developed this directory service for Windows domain networks, and wherever you find Windows Server operating systems, you are likely to find AD as well. Within the AD database are collections of objects (such as users, groups, computers, printers) and their corresponding attributes, the definitions of which are stored in a data structure called a schema. Before a user is given access to any object, he or she must first be authenticated. The authentication process is handled by a domain controller (DC).
AD data can be described as hierarchical, replicated, and extensible.
Hierarchical. A collection of related objects defined by AD is confined within an administrative boundary called a domain. Information about objects in each domain is often arranged into a hierarchy of parent-child relationships; a parent domain is superior to a child domain, but the child domain can be a parent of its own child domain, and so on.
Replicated. Since AD is a distributed directory service, objects in the directory are distributed across DCs. Changes that are made on one DC are synchronized with those in all other DCs in a systematic way using automatically created connection topology; this synchronization process is called replication.
Extensible. AD uses ESE (Extensible Storage Engine) to store and retrieve information using indexed and sequential storage methods along with transactional processing. (Transactional processing keeps a data transaction open as long as the transaction is in process, so that when an error occurs, the data can be rolled back to its original state prior to processing.)
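The rollback behavior described above can be illustrated with a toy in-memory sketch. This is an analogy only, not how ESE is actually implemented, and the directory entries are made up:

```python
import copy

def apply_transaction(db, updates, fail=False):
    # Keep a snapshot so the data can be rolled back on error,
    # as transactional processing does while a transaction is open.
    snapshot = copy.deepcopy(db)
    try:
        for key, value in updates.items():
            db[key] = value
        if fail:
            raise RuntimeError("simulated mid-transaction error")
    except RuntimeError:
        # Roll the data back to its original state prior to processing.
        db.clear()
        db.update(snapshot)
    return db

directory = {"cn=jdoe": {"dept": "sales"}}
apply_transaction(directory, {"cn=jdoe": {"dept": "marketing"}}, fail=True)
print(directory)  # unchanged: {'cn=jdoe': {'dept': 'sales'}}
```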
Cisco Systems pioneered the production of routers that supported multiple network protocols and became a leader in the market. The company rose to prominence with the wide adoption of IP (Internet Protocol) in the wake of the phenomenal growth of the Internet around the early 2000s. Since then, Cisco has maintained its leadership in the industry.
Migration to Active Directory
It has been a Cisco tradition to maintain separate network operating systems and LDAP (Lightweight Directory Access Protocol) directories for each desktop OS and application. However, it soon became difficult to manage the numerous user accounts and passwords used to log into diverse systems. There came a point when each system had to have its own trained administrator, and in-house software developers had to write different code for every directory accessed by their applications. IT costs soared, and on top of this the presence of many different systems made compliance with regulations difficult. Cisco had to find an effective solution.
The solution that Cisco’s IT leaders found was a Microsoft product — Active Directory. AD was a perfect fit for Cisco’s NOS (network operating system) and enterprise data directories. So, the decision to migrate to AD was made, and within a short period AD was deployed in Cisco’s 12 worldwide WAN locations. Now, Cisco employees can authenticate locally through the nearest site regardless of where in the world they happen to be.
Many layers of virtualization are currently at work, and one of them is application virtualization.
Application virtualization is software technology that implements the encapsulation of an application so that it can be isolated from its host OS. The resulting isolation gives rise to operating system independence, the first benefit that can be cited for application virtualization.
Encapsulation is the basic factor that enables an application to run in an artificial, or “virtual”, environment.
There are different ways of implementing a virtualized application. One that is 100% virtualized is not installed the way users of Windows, Linux, and Mac applications know it; the approach is entirely different, even though execution appears the same on the surface. To the untrained eye, the runtime behavior of a virtualized application appears similar to that of a traditional one — intercommunicating directly with the base OS — but in reality it is not, because the application can be decoupled from the OS in varying degrees.
Two kinds of application virtualization exist: client-side application virtualization and server-side application virtualization.
Client-side application virtualization isolates an application from other applications and from its operating system. When the base OS is updated, application virtualization enables an application that was developed on the old version to continue working on the new one. In a Windows environment, this situation would trigger an error caused by mismatch of software library versions, but not so in a virtual environment. Application isolation is the second benefit of application virtualization.
Server-side application virtualization offers benefits similar to those found in client-side virtualization, but there are more. It allows multiple instances of an application to be started automatically on other machines when service-level benchmarks can no longer be met by the current workload — an action that translates to a performance boost. Server-side virtualization also enables an application to be restarted on another server when a failure occurs in one server; the result is enhanced availability, the third benefit that can be mentioned for application virtualization.
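The restart-on-failure benefit can be sketched as follows. The server functions here are stand-ins for real machines, and the snippet only illustrates the failover logic, not an actual virtualization product:

```python
def run_with_failover(app, servers):
    # Try each server in turn; if the app fails on one,
    # restart it on the next — the availability benefit described above.
    for server in servers:
        try:
            return server(app)
        except RuntimeError:
            continue  # this server failed; restart the app elsewhere
    raise RuntimeError("no server could run the application")

def failed_server(app):
    raise RuntimeError("hardware failure")

def healthy_server(app):
    return f"{app} running"

result = run_with_failover("payroll", [failed_server, healthy_server])
print(result)  # payroll running
```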
Application virtualization offers two more advantages aside from those already pointed out, and these are: better performance scalability and cost reduction.
The key to performance boost in server-based application virtualization is the presence of a workload management function, which allows one application to start automatically on multiple systems when the need arises. Built-in workload management also allows more people to access a single application simultaneously.
Cost reduction is achieved through more efficient copying or streaming of encapsulated applications to remote systems. With application virtualization, one can avoid additional costs for provisioning, installing, updating, and administering applications, which would be incurred in non-virtualized environments every time more users need to be supported in order to use the application simultaneously.
Application virtualization has its downsides, too. Here are three examples. First, the possibility exists that the application vendor might withhold support when told that its product is run in a virtual environment. Second, unforeseen problems could occur, especially with Windows applications not designed to run in a virtual environment. Third, applications that install their own system drivers might cause the virtual environment to produce unexpected results.
Weighing the benefits and disadvantages of application virtualization with the help of expert IT professionals can help one arrive at a sound decision on whether to use the technology in one’s business.
A user doing daily routine work in front of a virtualized system may not be aware of the fact that there is some “illusion” taking place right before his eyes. He may be running Linux applications one moment and Windows applications the next while using exactly the same machine the whole time, without being conscious that behind the scenes his work is serviced by different operating systems coming from machines other than the one he is working on that particular moment. It’s as if everything is taking place in just the one device his hands are currently busy with. This is a typical scenario encountered while actively engaging with a virtualized system.
The illusion experienced by the user would be impossible without a special software entity installed right at the heart of the system: the hypervisor. The hypervisor is the “magician” that makes virtualization work. In reality there is no magic involved — just honest-to-goodness technical work. The hypervisor manages many things: processor, memory, resource allocation, multiple instances of a single operating system, even multiple operating systems running simultaneously.
There are two types of hypervisors: native or bare-metal hypervisors (a.k.a. Type 1 hypervisors) and hosted hypervisors (otherwise called Type 2 hypervisors).
The first type controls the hardware that hosts it and manages the operating systems that the hardware accommodates as guests. If one were to draw two rectangles stacked one over the other, the bottom layer represents the host machine (server hardware), while the one directly on top of it is the hypervisor. Adding three smaller rectangles to sit on an upper layer above the hypervisor would be comparable to connecting three virtual machines to the underlying hardware through the hypervisor. A hypervisor set up this way is Type 1 (see image at the top of this blog post).
The second type provides virtualized input/output devices and memory management from the operating system that hosts it. If one were to add a new rectangle to the stack described in the previous paragraph and place it between the server hardware and the hypervisor layers, the new rectangle layer would represent the operating system providing virtual services to the virtual machines at the topmost layer through the hypervisor. This is a Type 2 hypervisor setup, and a graphical illustration of it is presented below.
The basic difference between a Type 1 and a Type 2 hypervisor, as the illustrations show, lies in where the hypervisor runs. Type 1 runs directly on the server hardware, while Type 2 runs on a host operating system. However, there is apparently no formal definition of Type 1 and Type 2 hypervisors based on established standards.
XenServer (from Citrix) and ESXi (from VMware) are typical examples of Type 1 hypervisors, while VMware Server, Microsoft Virtual PC, and Sun VirtualBox are examples of Type 2.
It never fails. For every gain in technology there is always a matching specter of loss close by. For every capability, a hindrance. For every moment of celebration, a cause for regret.
There are currently spectacular advances in IT brought about by such technologies as virtualization and cloud computing. But together with them comes the potential, if not reality, of big security risks. Take the cloud, for example, and its accessibility to ubiquitous mobile devices such as tablets and smartphones.
The modern generation of workers love to carry their mobiles around and use them at work. Because these devices are designed to communicate easily with the cloud, a modern phenomenon is born: bring your own cloud (BYOC). BYOC is apparently an extended version of a predecessor issue, bring your own device (BYOD). The worries created by BYOD centered on employees carrying laptops and thumb drives to work, saving company data on them, and returning home with the saved data. The concerns over BYOC focus on the power of mobile devices to remotely access and save business data in many ways with the help of cloud technology.
Unfortunately, BYOC has brought about a polarization, with employee-productivity adherents and many mobile device owners in one “camp”, and business enterprises and IT security experts in another. The former support BYOC because they perceive its advantages to the workforce, while the latter reject it because of its serious security implications for business and personnel.
Pro-BYOC supporters see these bright spots: a faster and more convenient way of conducting business; cost reduction; increased employee productivity; stimulation of independent thought and innovation through the invigoration of coding and design; avoidance of the wasted time and exhaustion brought about by having to use duplicate tracking and development tools under pre-BYOC practices; and the ability of personnel to use tools that support innovation, which brings personal satisfaction.
Those in the anti-BYOC camp dread the dark spots. BYOC, they argue, poses a threat to system stability; it also blurs the demarcation between personal and business computing, thus introducing complications to corporate governance, risk management, and compliance with regulations mandated by government and industry. Security teams face potential or actual difficulty finding a balance between enabling end users and maintaining compliance and security best practices.
Each of the opposing camps in the BYOC showdown has a good reason for adopting a particular view. The debate may go on indefinitely, but in the meantime the cloud is there waiting to be tapped through mobile devices. Just like Mother Nature’s cloud, the IT cloud can be bright or foreboding depending on the surrounding atmosphere. It’s up to the opposing parties to come together and think of ways to reap the cloud’s benefits to the maximum without running into occasions for regret.
At the expense of legacy data centers focused on applications, many IT organizations are joining the recent bandwagon headed for virtualization and enticing cloud offerings like ITaaS (IT as a service) that have taken center stage in current technology infrastructures. As a result, these organizations are facing new storage-related challenges brought about by server virtualization, unpredictable heavy workloads, and large-scale consolidation of IT hardware inherent in the new technologies.
The trend toward explosive growth that has replaced stable growth patterns in former computing environments is itself another challenge, and it calls for corresponding update of IT infrastructure, including storage. The slower pace of change that used to characterize IT not too long ago is gone. The speed at which change is now taking place has made it difficult for IT leaders to forecast service levels, to make buy-versus-repurpose decisions, to anticipate the effects of new applications on the response times of older ones, and to determine the organization’s ability to migrate applications as rapidly as they grow.
When it comes to storage, legacy infrastructures are overly complex and have disconnected platforms that result in interoperability issues. The storage environments of some companies are fragmented, resulting in numerous problems and inefficiencies. In addition, many existing storage systems are practically isolated into silos and are excessively rigid. To meet the exponential demands of new virtualization and cloud technologies, storage systems need to be modernized.
Hewlett-Packard has a solution package for this, HP Converged Storage, which can be ideal for the modern environments where virtualization and cloud services have dominance. This package includes a product called HP 3PAR StoreServ built on modern flash-optimized storage architecture. HP says that this solution “delivers sustainable performance for diverse and unpredictable workloads that scales even with extremely high levels of capacity utilization.”
Another product included in the package is HP StoreFabric Storage, which is designed to meet the needs of networking infrastructure. Offering FC (Fibre Channel) connectivity between servers and storage, this product complements HP 3PAR StoreServ.
If you are planning to modernize your storage using the HP solution package described here or an alternative product, in preparation for migration into virtualization and the cloud, you can always seek the assistance of Key4ce IT Professionals for charting your direction.
If your business success relies much on networked IT resources, it’s good to consider virtualization. With virtualization you can abstract applications as well as their related components away from the supporting hardware, giving you a virtual (logical) view of these resources. Almost always, the logical view is very different from the physical one, and it is constructed from surplus network bandwidth, memory, storage, or processing power.
Virtualization allows you to view many computers as a single computing resource, or a single machine as many individual computers. From the storage angle, virtualization can make one large storage asset look like many smaller ones, or the other way around — make many small storage devices appear to be one single device.
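The many-small-devices-as-one idea can be sketched with a toy Python class. The capacities and the simple linear mapping below are illustrative assumptions; real storage virtualization layers are far more sophisticated:

```python
class VirtualVolume:
    """Presents several small storage devices as one logical device (a sketch)."""

    def __init__(self, devices):
        # devices: capacities in GB of the underlying physical disks
        self.devices = devices

    @property
    def capacity(self):
        # The user sees a single pooled capacity, not individual disks.
        return sum(self.devices)

    def locate(self, offset_gb):
        # Map a logical offset to (device index, offset within device),
        # hiding the physical layout from the user.
        for i, size in enumerate(self.devices):
            if offset_gb < size:
                return (i, offset_gb)
            offset_gb -= size
        raise ValueError("offset beyond pooled capacity")

volume = VirtualVolume([500, 500, 1000])
print(volume.capacity)     # 2000 -- three disks appear as one 2 TB device
print(volume.locate(700))  # (1, 200) -- logical GB 700 lives on the second disk
```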
What end results can your business expect from virtualization? It can be reliability and high availability of your IT resources. It can also be scalability, agility, or unified security and management. Or the result can be dramatic improvement of overall system performance.
Virtualization has many facets, and knowing what these are can help you decide how you might implement virtualization in your unique situation. The different aspects of virtualization are: network virtualization, storage virtualization, application virtualization, access virtualization, and processing virtualization.
Network virtualization results in a view of the network (from the user’s perspective) that is different from the physical view; because of this, the user’s computer may see only the systems the administrator allows him to access. This type of virtualization can also make several links appear to be a single link in the network.
In storage virtualization, your hardware and software technology can hide the real location of your storage systems and the device you are actually using to store applications and data. This aspect of virtualization enables other systems to transparently share the same storage devices.
Application virtualization allows application software to run on different operating systems and hardware platforms. More advanced implementations of this technology can restart an application in case of an unplanned outage, spawn another instance of an application that falls below service-level expectations, or balance the workloads of multiple instances of an application.
When you use access virtualization, any authorized device can access any application without the former having to know too much about the latter, or vice versa. The application sees a device it has become familiar with, while the device sees an application it is capable of displaying. With the use of specialized hardware, users may be allowed to share a single client system, or a single user may be allowed to see displays from multiple systems.
Processing virtualization makes it possible to hide physical hardware configuration from system services, applications, or operating systems. It can make one system seem like many, or many systems look like a single resource. With processing virtualization your business can achieve a number of goals: raw performance, agility, high scalability, or system reliability/availability. The technology further allows your business to consolidate multiple systems into one.
Your choice of virtualization technology depends upon what you are trying to accomplish with your networked system. You can explore your virtualization needs more deeply with the help of Key4ce IT professionals.
These days there’s much talk in IT circles about big data. No, it’s not a new phenomenon taking the world by storm. Neither is it about data having morphed into something so huge that the earth may no longer be enough to contain it. Rather, it’s more a case of super-heightened interest in the gigatons of data being migrated from traditional physical storage devices into cloud computing (and storage) facilities.
At what point does data become “big data”? It’s not clear yet. One article I read on the Web says that big data is a term still in the process of “evolving”, and that the term is used to describe an extraordinary quantity of structured, not-so-structured, and entirely unstructured data that can potentially be mined for precious information because these are not just any data — they are enterprise data. Big data is said to have these characteristics: volume (an exceedingly large one), variety of data types (SQL/MySQL/NoSQL/XML datasets, multimedia, SMS, photos, plain text, etc.), and velocity at which processing of such data takes place.
Does a terabyte (a trillion bytes) of data qualify for big data? Not quite. Big data is in the magnitude of petabytes (quadrillions) and exabytes (quintillions). Now that is really big, I should say.
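The magnitudes are easy to check in a couple of lines (using the decimal powers of 1000 the post refers to, not the binary powers of 1024):

```python
# Decimal byte magnitudes.
terabyte = 10 ** 12   # a trillion bytes
petabyte = 10 ** 15   # a quadrillion bytes
exabyte  = 10 ** 18   # a quintillion bytes

print(petabyte // terabyte)  # 1000 terabytes in a petabyte
print(exabyte // terabyte)   # 1000000 terabytes in an exabyte
```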
For years, much of the world’s data have been stashed away in relational databases, which are designed according to an established “schema” and therefore highly structured. But lately there has been a rapid deluge of data that follow either a “schema on the fly” architecture or no schema at all (structureless) and these contribute to the formation of a massive data minefield.
Business enterprises have to mine the data to extract value out of them, and doing this is not only exceedingly costly but also inordinately time consuming. Novel ways of storage and analysis that don’t rely much on data quality or RDBMS (relational database management systems) need to be explored. Some of the new ways being considered are: combining extended metadata with unprocessed data in a huge data pool, teaching the computer system to learn the properties of the data being processed, and using artificial intelligence software to find and analyze repeatable data patterns. Big players (the ones with the big data) hope to find the solution in the much touted cloud platform.
Whatever platform is used in dealing with big data, there is going to be a corresponding increase in demand for highly skilled computer/systems engineers to deploy the required infrastructure and ensure that it works as expected. In addition, there will be a need for more data scientists to do the “dirty work” of extracting data at the minefield and converting them into profitable enterprise information.
Yes, you read it right. Whaling. And it’s different from whale fishing, the controversial business of catching whales for profit taking place in certain parts of the world.
And speaking of fishing, this brings to mind the homophonic term phishing that has become extremely popular — no, notorious — throughout the world. Phishing is a type of computer or network security threat that falls under the larger category of threat called social engineering. You can read more about social engineering and phishing in our KnowledgeBase article Security Measures vs Phishing.
Whaling is a security threat that works almost like phishing, except that in phishing the attack is directed at thousands of users (like a single fishing line thrown into a school of fish) with the hope that a number of them will become actual victims instead of just candidate victims; in contrast, a whaling attack is targeted at one high-profile user (the big “fish”, or whale) who holds the keys to the big secrets under his or her control. The modus operandi is the same; only the count and profile of the targets vary.
Here’s one whaling scenario. John Doe is the CEO of ImaginaryConglomerate, an organization that pays him millions of dollars in salaries and perks. His profile is published on the organization’s website. Joe Scammer, a whaling expert, learns from the ImaginaryConglomerate website that John Doe loves to play golf. Mr. Scammer also learns from other sources that John is not that good at golf and often gets to pay for the beer after games with buddies at the club. From these bits of information, Mr. Scammer crafts a clever whaling message and sends it to Mr. Doe’s computer. For the sake of simplicity, let’s just say the message is worded like this: “Impress the ladies and gentlemen in the fairway with your winning drives and putts. This jealously guarded ebook ‘Golf Secrets PGA Champs Won’t Tell You’ is available to a lucky few only — and for a very limited time … so limited that the offer expires within the next 12 hours. Click this link now.”
Not all people fall for this type of message. But who knows, John Doe just might. After all, being a top executive at ImaginaryConglomerate has not helped him mend his shattered ego at the golf club. Unknown to him, clicking the link redirects him to Mr. Scammer’s website, which will ask for his credit card number (along with other juicy bits of sensitive information) in exchange for a copy of the jealously guarded ebook. Or worse… clicking the link might actually activate malware that has surreptitiously slipped into his computer, compromising the security of his company’s entire system because of his high-level access privileges.
John Doe has just been socially engineered through a technique called whaling.
To avoid falling prey to whaling, it helps to educate high-profile users like John Doe on the existence of whaling and other social engineering attacks. It helps even more to conduct regular security audits on executives’ computers to make sure they are free of malware and other threats that could be triggered by actions suggested in a whaling message.
The first paragraph of this post says that whaling is different from the controversial business of catching whales for profit. On second thought, the two may in fact be similar.
Distributed denial of service (DDoS) is a type of security threat wherein one individual or group — the attacker — intentionally and maliciously lets loose extremely high volumes of Internet traffic into the computer network resources of another — the victim — in order to paralyze those resources, either by slowing down their performance or by halting their operation altogether. The operative phrase is extremely high volumes.
When a network receives traffic volume that’s beyond its capacity to handle, at least one vital part of it if not the entire network itself, is bound to get choked and will no longer be able to perform network services requested by legitimate clients. We can compare the situation to a highway that gets maliciously swamped with thousands of motor vehicles at a particular hour of the day when it is designed to serve regular traffic of only a few hundred in the same time frame. The ensuing traffic jam denies the highway the ability to perform its service of efficiently transporting people, goods, or services from one geographical point to another.
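The highway analogy can be put in code. This toy sketch, with made-up request counts, shows how traffic beyond a link’s designed capacity translates directly into denied service:

```python
def serve(requests, capacity):
    # A link that can handle only `capacity` requests per time frame:
    # everything beyond that is dropped, denying service to some clients.
    served = requests[:capacity]
    dropped = len(requests) - len(served)
    return served, dropped

legitimate = ["user"] * 300            # normal load, within design limits
flood = legitimate + ["bot"] * 5000    # DDoS traffic swamps the same link

_, dropped_normal = serve(legitimate, capacity=500)
_, dropped_attack = serve(flood, capacity=500)
print(dropped_normal, dropped_attack)  # 0 4800
```

Under normal load nothing is dropped; under the flood, thousands of requests (including legitimate ones queued behind the nuisance traffic) never get served.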
DDoS attacks use multiple networked computers infected with malware; the individual compromised machines are often called “zombies”, and collectively they form botnets controlled by servers acting as command centers. This is the “distributed” part of DDoS.
Specific physical targets of DDoS attacks include web services, applications, and firewalls. The victims are usually organizations that are business, political, social or ideological competitors of the attackers.
What makes DDoS attacks particularly troublesome is that there are many categories of them. There is, for instance, the simple attack which floods the target with nuisance traffic (often disguised as legitimate traffic) using a large number of botnets aimed at the weakest network link. The overwhelming presence of unwelcome traffic prevents legitimate traffic from availing of the services of the system under attack. Other categories of attack are DNS (Domain Name System) attacks and HTTP (HyperText Transfer Protocol) attacks, both of which have their own variations.
When the targets of DDoS attacks are commercial establishments, there is almost always a financial loss. Surveys on the effects of such attacks suggest that losses could range from $10,000 to $50,000 to $100,000 per hour of network downtime, depending on the particular type of business. Durations of attacks range from 24 hours or more, to days, to weeks. All these figures indicate very plainly that DDoS attacks can hurt the pockets of business enterprises in a big way and, consequently, the national economy.
But financial drain is not the only worry that confronts victims. There is also serious disruption of customer service and damage to brand reputation.
Can DDoS attacks be banished from the land and save victims from untold worries?
Network security experts say that there is no way DDoS attacks can be eliminated; they can only be mitigated. This means that financial losses from DDoS attacks are bound to be incurred, and the best that businesses can do is control the damage.
To guard against DDoS, organizations that rely heavily on network services should fully understand their present strengths and weaknesses as far as network security is concerned. For best results they can partner with a DDoS protection specialist, or alternatively with IT specialists who have a very good handle on security.