Data Science Technologies: An Overview
Not sure where to start?
Python is one of the world’s most popular programming languages. It is production-ready, meaning it has the capacity to be a single tool that integrates with every part of your workflow. So whether you want to build a web application or a machine learning model, Python can get you there!
- General-purpose programming language (can be used to make anything)
- Widely considered one of the most accessible programming languages to read and learn
- The language of choice for cutting-edge machine learning and AI applications
- Commonly used for putting models “in production”
- Straightforward to deploy, with good support for reproducibility
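As a small illustration of the general-purpose point, the sketch below uses only Python's standard library to parse a tiny dataset and summarize it. The city names and temperatures are invented for the example, and `load_temperatures` is a hypothetical helper, not part of any library.

```python
import csv
import io
import statistics

# Hypothetical sample data, invented for illustration only.
RAW_CSV = """city,temperature_c
Lisbon,21.5
Oslo,9.0
Cairo,30.5
"""

def load_temperatures(text):
    """Parse CSV text into a dict mapping city name to temperature."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["city"]: float(row["temperature_c"]) for row in reader}

temperatures = load_temperatures(RAW_CSV)

# Summarize the parsed values with the built-in statistics module.
mean_temp = statistics.mean(list(temperatures.values()))
warmest = max(temperatures, key=temperatures.get)

print(f"Mean temperature: {mean_temp:.1f} C")
print(f"Warmest city: {warmest}")
```

The same language could then serve these numbers from a web application or feed them into a model, which is the sense in which Python can act as a single tool across a workflow.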
R has been used primarily in academia and research, but in recent years, enterprise usage has rapidly expanded. Built specifically for working with data, R provides an intuitive interface to the most advanced statistical methods available today.
- Built specifically for data analysis and visualization
- Traditionally used by statisticians and academic researchers
- The language of choice for cutting-edge statistics
- A vast collection of community-contributed packages
- Rapid prototyping of data-driven apps and dashboards
Much of the world’s raw data lives in organized collections of tables called relational databases. Data analysts and data scientists must know how to wrangle and extract data from these databases using SQL.
- Useful for every organization that stores information in databases
- One of the most in-demand skills in business
- Used to access, query, and extract structured data that has been organized in a formatted repository such as a database
- Its scope includes data query, data manipulation, data definition, and data access control
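To make the scope listed above more concrete, here is a minimal sketch that runs SQL from Python through the standard-library `sqlite3` module. The `employees` table and its rows are hypothetical. It shows data definition, data manipulation, and data query. Access control statements such as `GRANT` are left out because SQLite does not support them.

```python
import sqlite3

# In-memory SQLite database; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")

# Data definition: create a table.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)

# Data manipulation: insert rows, then update one of them.
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Ada", 5200.0), ("Grace", 6100.0), ("Linus", 4800.0)],
)
conn.execute(
    "UPDATE employees SET salary = salary + 200 WHERE name = ?", ("Linus",)
)

# Data query: select the rows that match a condition.
high_earners = conn.execute(
    "SELECT name, salary FROM employees WHERE salary >= ? ORDER BY salary DESC",
    (5000.0,),
).fetchall()

print(high_earners)
conn.close()
```

The same statements would look much the same against a server-based database; mainly the connection step and some dialect details would differ.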
Data scientists, analysts, and engineers must constantly interact with databases, which can store a vast amount of information in tables without slowing down performance. You can use SQL to query data from databases and model different phenomena in your data and the relationships between them.
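One way to picture relationships between tables is a join. The sketch below again uses Python's built-in `sqlite3` module with two hypothetical tables, `customers` and `orders`, linked by a `customer_id` column, and totals the orders for each customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables; orders.customer_id points at customers.id.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders (customer_id, amount)
        VALUES (1, 120.0), (1, 80.0), (2, 45.5);
""")

# Join the tables on the shared key, then aggregate per customer.
totals = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(totals)
conn.close()
```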
MICROSOFT SQL SERVER
- Commercial relational database management system (RDBMS), built and maintained by Microsoft
- Available on Windows and Linux operating systems
POSTGRESQL
- Free and open-source RDBMS, maintained by the PostgreSQL Global Development Group and its community
ORACLE DATABASE
- The most popular RDBMS, used by 97% of Fortune 100 companies
- Requires knowledge of PL/SQL, an extension of SQL, to access and query data
Spreadsheets are used across the business world to transform mountains of raw data into clear insights by organizing, analyzing, and storing data in tables. Microsoft Excel and Google Sheets are the most popular spreadsheet programs. Both offer a flexible structure in which data is entered into the cells of a table.
GOOGLE SHEETS
- Free for users
- Allows collaboration between users via link sharing and permissions
- Statistical analysis and visualization must be done manually
MICROSOFT EXCEL
- Requires a paid license
- Less convenient than Google Sheets for collaboration
- Contains built-in functions for statistical analysis and visualization
Business intelligence tools
Business intelligence (BI) tools make data discovery accessible for all skill levels—not just advanced analytics professionals. They are one of the simplest ways to work with data, providing the tools to collect data in one place, gain insight into what will move the needle, forecast outcomes, and much more.
Tableau is data visualization software that works like a supercharged Microsoft Excel. Its user-friendly drag-and-drop functionality makes it simple for anyone to access and analyze data and to create highly impactful visualizations.
- Widely used business intelligence (BI) and analytics software trusted by companies like Amazon, Experian, and Unilever
- User-friendly drag-and-drop functionality
- Supports multiple data sources, including Microsoft Excel, Oracle, Microsoft SQL Server, Google Analytics, and Salesforce
MICROSOFT POWER BI
Microsoft Power BI allows users to connect and transform raw data, add calculated columns and measures, create simple visualizations, and combine them to create interactive reports.
- Web-based tool that provides real-time data access
- User-friendly drag-and-drop functionality
- Leverages existing Microsoft systems like Azure, SQL Server, and Excel
The shell provides a command-line interface that allows you to control your computer’s operating system with just a few keystrokes. Sometimes called “the universal glue of programming,” it helps users combine existing programs in new ways, automate repetitive tasks, and run programs on clusters and clouds that may be halfway around the world.
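The idea of combining existing programs can be sketched in Python with the standard-library `subprocess` module, which plays the role of a shell pipe here. The two "programs" are one-line Python scripts standing in for real command-line tools, an assumption made so the sketch runs on any system with Python installed.

```python
import subprocess
import sys

# Stand-ins for two existing programs, written as one-line Python scripts.
# In a real shell these might be tools such as `cat` and `sort`.
produce = [sys.executable, "-c", "print('pear'); print('apple'); print('fig')"]
sort_lines = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.writelines(sorted(sys.stdin))",
]

# Run the first program and capture what it prints.
first = subprocess.run(produce, capture_output=True, text=True, check=True)

# Feed that output to the second program, much as `produce | sort_lines`
# would in a shell.
second = subprocess.run(
    sort_lines, input=first.stdout, capture_output=True, text=True, check=True
)

sorted_fruit = second.stdout.split()
print(sorted_fruit)
```

A real shell pipe runs both programs at the same time and streams data between them. This sketch runs them one after the other, which is simpler but holds the whole intermediate output in memory.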
Scala is a hybrid object-oriented and functional programming language popular for large-scale applications and data engineering infrastructure. Favored by companies like Netflix, Airbnb, and Morgan Stanley, Scala improves productivity, application scalability, and reliability.
Version control is one of the power tools of programming. It allows you to keep track of what you did and when, undo any changes you decide you don’t want, and collaborate at scale with other people. Git is a modern version control tool that is very popular with data scientists and software developers. It helps you get more done in less time and with less pain.