Data Cleaning

by Venkatesh Ganti, Anish Das Sarma

Estimated delivery 3-12 business days

Format Paperback

Condition Brand New

Publisher Description

Data warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous upstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges while maintaining large data warehouses is that of ensuring the quality of data in the data warehouse remains high. The process of maintaining high data quality is commonly referred to as data cleaning. In this book, we first discuss the goals of data cleaning. Often, the goals of data cleaning are not well defined and could mean different solutions in different scenarios. Toward clarifying these goals, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common data cleaning tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that could be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing. The basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts which leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.

Author Biography

Venky Ganti is the co-founder and CTO of Alation Inc, where he is developing technology to effectively search, understand, and analyze structured and semi-structured data. Prior to Alation, he was a member of the Google AdWords engineering team for a few years. He helped develop the Dynamic Search Ads (DSA) product, whose goal is to completely automate the configuration and maintenance of AdWords campaigns based on an advertiser's website and a few configuration parameters. The main technical challenge is to mine for appropriate keywords and automatically create high quality ads which match the accuracy and quality of manually configured campaigns. Prior to Google, Venky was a senior researcher at Microsoft Research (MSR). While at MSR, he worked extensively on data cleaning and integration technologies. Some of the technologies he helped develop in this context are now part of Microsoft SQL Server Integration Services, the ETL platform of Microsoft SQL Server. He also worked on leveraging rich structured databases on products, movies, people, etc., to enrich user experience for web search. Some of the technologies he helped develop are now part of the Bing product search. He has a Ph.D. in database systems and data mining from the University of Wisconsin-Madison.

Anish Das Sarma is currently a Senior Research Scientist at Google (since May 2010), before which he was a Research Scientist at Yahoo (August 2009–April 2010). Prior to joining Yahoo research, Anish did his Ph.D. in Computer Science at Stanford University, advised by Prof. Jennifer Widom. Anish received a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology (IIT) Bombay in 2004, and an M.S. in Computer Science from Stanford University in 2006. Anish is a recipient of the Microsoft Graduate Fellowship, a Stanford University School of Engineering fellowship, and the IIT-Bombay Dr. Shankar Dayal Sharma Gold Medal. Anish has written over 40 technical papers and filed over 10 patents. He is associate editor of SIGMOD Record, has served on the thesis committee of a Stanford Ph.D. student, and has served on numerous program committees. Two SIGMOD papers and one VLDB paper co-authored by Anish were selected among the best papers of the conference, with invitations to journals. While at Stanford, Anish co-founded Shout Velocity, a social tweet ranking system that was named a top-50 fbFund Finalist for most promising upcoming start-up ideas.

Details

  • ISBN 3031007697
  • ISBN-13 9783031007699
  • Title Data Cleaning
  • Author Venkatesh Ganti, Anish Das Sarma
  • Format Paperback
  • Year 2013
  • Pages 69
  • Publisher Springer International Publishing AG

About Us

Grand Eagle Retail is the ideal place for all your shopping needs! With fast shipping, low prices, friendly service and over 1,000,000 in-stock items, you're bound to find what you want, at a price you'll love!

Shipping & Delivery Times

Shipping is FREE to any address in the USA.

Please view eBay estimated delivery times at the top of the listing. Deliveries are made by either USPS or Courier. We are unable to deliver faster than stated.

International deliveries will take 1-6 weeks.

NOTE: We are unable to offer combined shipping for multiple items purchased. This is because our items are shipped from different locations.

Returns

If you wish to return an item, please consult our Returns Policy below:

Please contact Customer Services and request "Return Authorisation" before you send your item back to us. Unauthorised returns will not be accepted.

Returns must be postmarked within 4 business days of authorisation and must be in resellable condition.

Returns are shipped at the customer's risk. We cannot take responsibility for items which are lost or damaged in transit.

For purchases where a shipping charge was paid, there will be no refund of the original shipping charge.

Additional Questions

If you have any questions, please feel free to contact us.