The three main challenges I see are:
- SSD and PCI Flash memory with "zero" seek time,
- affordable Petabyte HDD storage, and
- object-based storage replacing "direct attach" devices.
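To put the first of these in perspective, here is a back-of-envelope comparison of random-read latency. The figures are rough, typical-of-the-era assumptions, not measurements:

```python
# Back-of-envelope random-read latency comparison (rough assumed figures, not measurements).
avg_seek_ms = 8.0                          # assumed average seek on a 7200 rpm SATA drive
half_rotation_ms = 0.5 * 60_000 / 7200     # average rotational delay, ~4.2 ms at 7200 rpm
hdd_read_ms = avg_seek_ms + half_rotation_ms

flash_read_ms = 0.1                        # ~100 microseconds for a flash random read

print(f"HDD   random read ~ {hdd_read_ms:.1f} ms")
print(f"Flash random read ~ {flash_read_ms:.2f} ms")
print(f"Ratio             ~ {hdd_read_ms / flash_read_ms:.0f}x")
```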
Each of these changes long-standing assumptions and raises new questions:
- single-record access time is no longer dominated by disk rotations, so the old rotation-centric 'optimisations' are large, costly, slow and irrelevant,
- the whole "write region" can be held in fast memory, changing cache requirements and design,
- Petabyte storage allows "never delete" datasets, which pose new problems:
- how does old and new data get physically organised? [a partitioning sketch follows this list]
- what logical representations can be used to reduce queries to minimal collections?
- how does the one datastore support conflicting usage types? [real-time transactions vs data warehouse]
- How are changes to Data Dictionaries supported over time?
- common DB formats are necessary as the lifetime of data will cover multiple products and their versions.
- Filesystems and Databases have to use the same primitives and use common tools for backups, snapshots and archives.
- As do higher order functions/facilities:
- compression, de-duplication, transparent provisioning, Access Control and Encryption
- Data Durability and Reliability [RAID + geo-replication]
- How is security managed over time with unchanging datasets?
- How are Performance Analysis and 'Tuning' performed?
- Can Petabyte datasets be restored or migrated at all?
- DBs must continue running without data loss or performance degradation as the underlying storage and compute elements are changed or re-arranged.
- How is expired data 'cleaned' whilst respecting/enforcing any legal caveats or injunctions?
- What data are new Applications tested against?
- Just a subset of "full production"? [doesn't allow Sizing or Performance Testing]
- Testing and Developing against "live production" data is either extremely unwise [unintended changes/damage] or a massive security hole. But when there is only the One DB, what to do?
- What does DB roll-back and recovery mean now? What actions should be expected?
- Is "roll-back" or reversion allowable or supportable in this new world?
- Can data really be deleted in a "never delete" dataset?
- Is the Accounting notion of "journal entries" necessary? [a sketch follows this list]
- What happens when logical inconsistencies appear in geo-diverse DB copies?
- can they be detected? [one detection approach is sketched after this list]
- can they ever be resolved?
- How do these never-delete DBs interface with, or support, corporate Document and Knowledge Management systems?
- Should summaries ever be made and stored automatically under the many privacy and legal data-retention laws, regulations and policies in force?
- How are conflicting multi-jurisdiction issues resolved for datasets with wide geo-coverage?
- How are organisation mergers accomplished?
- Who owns what data when an organisation is de-merged?
- Who is responsible for curating important data when an organisation disbands?
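On the physical-organisation and "minimal collections" questions above: the usual answer today is time-range partitioning, where new data lands in a hot partition (ideally on the fast flash tier), old partitions become effectively immutable, and a query's time predicate prunes away every partition it cannot touch. A minimal in-memory sketch of the idea; the month-per-partition scheme and class names are illustrative assumptions, not any particular product's design:

```python
from collections import defaultdict
from datetime import date

class TimePartitionedTable:
    """Rows are physically grouped by month; a query with a date range
    only scans the partitions that overlap that range."""
    def __init__(self):
        self._partitions = defaultdict(list)          # (year, month) -> rows

    def insert(self, when: date, row: dict):
        self._partitions[(when.year, when.month)].append((when, row))

    def query(self, start: date, end: date):
        # Partition pruning: months outside the range are never read at all.
        hit = [p for p in self._partitions
               if (start.year, start.month) <= p <= (end.year, end.month)]
        for p in hit:
            for when, row in self._partitions[p]:
                if start <= when <= end:
                    yield row

table = TimePartitionedTable()
table.insert(date(2009, 1, 15), {"id": 1})
table.insert(date(2012, 6, 3), {"id": 2})
print(list(table.query(date(2012, 1, 1), date(2012, 12, 31))))   # only the 2012 partition is scanned
```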
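On the "journal entries" question: in a never-delete store, "deleting" plausibly becomes appending a reversing entry and filtering it out at read time, exactly as accountants have always done. A minimal sketch under that assumption; the `JournalledStore` class is illustrative, not a real engine:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Entry:
    """One immutable journal entry; corrections are new entries, never edits."""
    key: str
    value: object
    reversing: bool = False                # True marks a reversing ('delete') entry
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class JournalledStore:
    """Append-only store: nothing is ever removed, reads filter the journal."""
    def __init__(self):
        self._journal: list[Entry] = []

    def put(self, key, value):
        self._journal.append(Entry(key, value))

    def reverse(self, key):
        # The accounting-style 'delete': append a reversing entry instead of erasing.
        self._journal.append(Entry(key, None, reversing=True))

    def current(self, key):
        # Latest non-reversed view of the key; None if it was logically 'deleted'.
        for e in reversed(self._journal):
            if e.key == key:
                return None if e.reversing else e.value
        return None

    def history(self, key):
        # The full audit trail survives, which is what legal retention usually wants.
        return [e for e in self._journal if e.key == key]

store = JournalledStore()
store.put("invoice:42", {"amount": 100})
store.reverse("invoice:42")               # 'deleted' logically, retained physically
print(store.current("invoice:42"))        # None
print(len(store.history("invoice:42")))   # 2 -- both entries are still there
```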
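On detecting inconsistencies between geo-diverse copies: one established approach (Merkle-tree style anti-entropy, as used by Dynamo-family stores) is to exchange compact digests of each replica rather than the data itself, and only reconcile the ranges whose digests disagree. A toy sketch of the digest-comparison step, assuming each replica can be viewed as a key-to-value mapping:

```python
import hashlib
import zlib

def bucket_digests(replica: dict, buckets: int = 16) -> dict:
    """Summarise a replica as `buckets` digests so two sites can compare
    a handful of hashes instead of shipping the whole dataset."""
    acc = {b: hashlib.sha256() for b in range(buckets)}
    for key in sorted(replica):                          # sorted => deterministic digests
        b = zlib.crc32(key.encode()) % buckets           # stable bucket choice across sites
        acc[b].update(f"{key}={replica[key]!r};".encode())
    return {b: h.hexdigest() for b, h in acc.items()}

def diverging_buckets(site_a: dict, site_b: dict, buckets: int = 16) -> list:
    """Bucket numbers whose contents differ between two replicas;
    only those buckets need key-by-key reconciliation."""
    da, db = bucket_digests(site_a, buckets), bucket_digests(site_b, buckets)
    return [b for b in range(buckets) if da[b] != db[b]]

# Two copies that have quietly drifted apart:
sydney = {"cust:1": "alice", "cust:2": "bob"}
london = {"cust:1": "alice", "cust:2": "robert"}
print(diverging_buckets(sydney, london))   # the bucket holding cust:2 differs
```

Detection is the easier half; deciding which copy is "right" remains an application-level judgement, which is why the resolution question above stays open.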
Redesign and adaptation are needed at three levels:
- Logical Data layout, query language and Application interface.
- Physical to Logical mapping and supporting DB engines.
- Systems Configuration, Operations and Admin.
They also have to embrace the integration of multiple disparate data sources/streams, as laid out in the "G2 Strategy" Jerry Gregoire created for Dell in 1999:
- Everything should be scalable through the addition of servers.
- The principal application interface should be a web browser.
- Key programming with Java or ActiveX-type languages.
- Message brokers used for application interfacing (see the sketch below).
- Technology selection on an application-by-application basis.
- Databases should be interchangeable.
- Extend the life of legacy systems by wrapping them in a new interface.
- Utilize "off-the-shelf" systems where appropriate.
- In-house development should rely on object-based technology: new applications should be made up of proven object "puzzle pieces".
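The message-broker point is worth a concrete illustration: applications publish to named topics instead of calling each other directly, so databases, legacy systems and new front ends can be swapped without the others noticing. A toy in-process sketch of that decoupling; a real deployment would use an actual broker product (RabbitMQ, Kafka and the like), and the `Broker` class here is purely illustrative:

```python
from collections import defaultdict

class Broker:
    """Toy in-process message broker: publishers and subscribers only
    know topic names, never each other."""
    def __init__(self):
        self._subscribers = defaultdict(list)      # topic -> list of handler callbacks

    def subscribe(self, topic: str, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict):
        for handler in self._subscribers[topic]:
            handler(message)

broker = Broker()

# The 'legacy' order system and the new web front end never import each other:
broker.subscribe("orders.created", lambda m: print("legacy ERP booked", m["id"]))
broker.subscribe("orders.created", lambda m: print("web UI notified for", m["id"]))

broker.publish("orders.created", {"id": "D-1001"})   # both consumers react independently
```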