SQL Interview Questions

Uploaded by

Dharani Dharani
Copyright
© All Rights Reserved

1. Data Pipeline: Manages the flow of data from collection to storage
destinations like data lakes or warehouses.
2. Database, Schema, Table:
- Database: Stores and manages structured data.
- Schema: Defines the structure and rules of a database.
- Table: Organizes data in rows and columns within a database.
3. ETL vs. ELT:
- ETL: Extract, transform, then load data into a system.
- ELT: Extract, load data into a system, then transform it.
4. Data Lake vs. Data Warehouse vs. Data Mart:
- Data Lake: Stores large volumes of raw data.
- Data Warehouse: Optimized for querying structured data.
- Data Mart: Focused subset of a data warehouse for specific functions.
5. Batch vs. Stream Processing:
- Batch: Processes data in scheduled chunks.
- Stream: Processes data in real-time as it arrives.
6. Data Quality: Ensures data meets standards for its intended use.
7. Data Modeling: Designs data organization for efficient analysis.
8. Data Orchestration: Coordinates data movement and integration across
systems.
9. Data Lineage: Tracks data’s journey and transformations throughout its
lifecycle.
10. Git: Manages code collaboration and tracks changes.
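
To make item 3 (ETL vs. ELT) concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (users_etl, users_raw, users_elt) and the sample rows are made up for illustration; a real pipeline would use a warehouse and a proper loader.

```python
import sqlite3

raw_rows = [("  Alice ", "10"), ("BOB", "20")]  # made-up source records

conn = sqlite3.connect(":memory:")

# ETL: transform in application code first, then load the cleaned rows.
conn.execute("CREATE TABLE users_etl (name TEXT, amount INTEGER)")
cleaned = [(name.strip().lower(), int(amount)) for name, amount in raw_rows]
conn.executemany("INSERT INTO users_etl VALUES (?, ?)", cleaned)

# ELT: load the raw rows as-is, then transform inside the database with SQL.
conn.execute("CREATE TABLE users_raw (name TEXT, amount TEXT)")
conn.executemany("INSERT INTO users_raw VALUES (?, ?)", raw_rows)
conn.execute(
    """
    CREATE TABLE users_elt AS
    SELECT lower(trim(name)) AS name, CAST(amount AS INTEGER) AS amount
    FROM users_raw
    """
)

etl_rows = conn.execute("SELECT name, amount FROM users_etl ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT name, amount FROM users_elt ORDER BY name").fetchall()
print(etl_rows == elt_rows)
conn.close()
```

Both routes end with the same cleaned table; the difference is only where the transformation runs.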

Here are some tricky SQL interview questions!

1. Find the second-highest salary in a table without using LIMIT or TOP.

A: SELECT MAX(salary) FROM employees WHERE salary NOT IN (SELECT MAX(salary) FROM employees)
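
A quick way to check the second-highest-salary idea is to run it against a tiny in-memory SQLite table. This is only a sketch: the employees table and its rows are assumed sample data, and the inner filter uses `<` instead of NOT IN, which should behave the same way here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (id, salary) VALUES (?, ?)",
    [(1, 300), (2, 500), (3, 500), (4, 400)],  # the top salary appears twice
)

# Highest salary that is strictly below the overall maximum.
second_highest = conn.execute(
    """
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
    """
).fetchone()[0]

print(second_highest)
conn.close()
```

Ties at the top are skipped, so the duplicated 500 does not count as the second-highest value.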

2. Write a SQL query to find all employees who earn more than their managers.

A: SELECT e1.* FROM employees e1 JOIN employees e2 ON e1.manager_id = e2.id WHERE e1.salary > e2.salary
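
The self-join for "earns more than their manager" can be tried out as below. The column names (id, name, salary, manager_id) and the four sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", 900, None),  # top of the hierarchy, no manager
        (2, "Ben", 1000, 1),     # earns more than Asha
        (3, "Chen", 800, 1),
        (4, "Dee", 1200, 2),     # earns more than Ben
    ],
)

# e1 is the employee, e2 is that employee's manager.
rows = conn.execute(
    """
    SELECT e1.name
    FROM employees e1
    JOIN employees e2 ON e1.manager_id = e2.id
    WHERE e1.salary > e2.salary
    ORDER BY e1.id
    """
).fetchall()
names = [row[0] for row in rows]

print(names)
conn.close()
```

Because this is an inner join, employees without a manager drop out of the result automatically.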

3. Find the duplicate rows in a table without using GROUP BY.

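A: SELECT * FROM table_name a WHERE EXISTS (SELECT 1 FROM table_name b WHERE
b.column_name = a.column_name AND b.rowid <> a.rowid)
(rowid exists in Oracle and SQLite; in other databases, compare the primary key instead.)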
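
One way to find duplicates with no GROUP BY at all is a correlated EXISTS that looks for another row with the same value. The sketch below assumes a hypothetical contacts table with an email column and relies on SQLite's implicit rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT)")
conn.executemany(
    "INSERT INTO contacts (email) VALUES (?)",
    [("a@x.com",), ("b@x.com",), ("a@x.com",), ("c@x.com",)],
)

# A row is a duplicate if some other row (different rowid) has the same email.
duplicates = conn.execute(
    """
    SELECT a.rowid, a.email
    FROM contacts a
    WHERE EXISTS (
        SELECT 1 FROM contacts b
        WHERE b.email = a.email AND b.rowid <> a.rowid
    )
    ORDER BY a.rowid
    """
).fetchall()

print(duplicates)
conn.close()
```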

4. Write a SQL query to find the top 10% of earners in a table.

A: SELECT * FROM employees WHERE salary > (SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY salary) FROM employees)
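
PERCENTILE_CONT is not available in every engine (a default SQLite build, for example, does not include it), so this sketch swaps in a different technique: NTILE(10) splits the rows into ten equal buckets by salary, and bucket 1 is the top 10%. It assumes SQLite 3.25 or newer for window functions, and the employees rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (id, salary) VALUES (?, ?)",
    [(i, i * 100) for i in range(1, 21)],  # 20 rows: salaries 100..2000
)

# Bucket 1 of 10, ordered by salary descending, holds the top tenth of rows.
top_decile = [
    row[0]
    for row in conn.execute(
        """
        SELECT salary FROM (
            SELECT salary, NTILE(10) OVER (ORDER BY salary DESC) AS bucket
            FROM employees
        )
        WHERE bucket = 1
        ORDER BY salary DESC
        """
    )
]

print(top_decile)
conn.close()
```

Note that NTILE counts rows rather than interpolating a cutoff value, so results can differ slightly from the percentile approach when salaries tie.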

5. Find the cumulative sum of a column in a table.

A: SELECT column_name, SUM(column_name) OVER (ORDER BY rowid) AS running_total FROM table_name
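
A running total with SUM(...) OVER (ORDER BY ...) can be sketched like this. The sales table and its amounts are assumed sample data, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20), (3, 5)])

# Each row gets the sum of all amounts up to and including itself, ordered by id.
running = conn.execute(
    """
    SELECT id, amount, SUM(amount) OVER (ORDER BY id) AS running_total
    FROM sales
    ORDER BY id
    """
).fetchall()

print(running)
conn.close()
```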

6. Write a SQL query to find all employees who have never taken a leave.

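A: SELECT * FROM employees e WHERE NOT EXISTS (SELECT 1 FROM leaves l WHERE l.employee_id = e.id)
(NOT EXISTS is safer than NOT IN here: if leaves.employee_id contains a NULL, the NOT IN version returns no rows.)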
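
A common follow-up to this question is how NOT IN behaves when the subquery returns a NULL. The sketch below compares NOT IN with NOT EXISTS on assumed sample data where one leaves row has a NULL employee_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE leaves (employee_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Asha"), (2, "Ben"), (3, "Chen")])
conn.executemany("INSERT INTO leaves VALUES (?)", [(1,), (None,)])  # one NULL employee_id

# NOT IN: comparing against a NULL yields UNKNOWN, so no row passes the filter.
not_in_rows = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM employees WHERE id NOT IN (SELECT employee_id FROM leaves) ORDER BY id"
    )
]

# NOT EXISTS: only checks whether a matching row exists, so NULLs do no harm.
not_exists_rows = [
    r[0]
    for r in conn.execute(
        """
        SELECT e.id FROM employees e
        WHERE NOT EXISTS (SELECT 1 FROM leaves l WHERE l.employee_id = e.id)
        ORDER BY e.id
        """
    )
]

print(not_in_rows, not_exists_rows)
conn.close()
```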

7. Find the difference between the current row and the next row in a table.

A: SELECT *, column_name - LEAD(column_name) OVER (ORDER BY rowid) AS diff FROM table_name
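
Here is a small sketch of the LEAD approach, using a hypothetical readings table. It assumes SQLite 3.25 or newer; the last row has no next row, so its difference comes back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 10), (2, 15), (3, 12)])

# LEAD(value) looks one row ahead in id order; subtract it from the current value.
diffs = conn.execute(
    """
    SELECT id, value - LEAD(value) OVER (ORDER BY id) AS diff
    FROM readings
    ORDER BY id
    """
).fetchall()

print(diffs)
conn.close()
```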

8. Write a SQL query to find all departments with more than one employee.

A: SELECT department FROM employees GROUP BY department HAVING COUNT(*) > 1
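
The GROUP BY ... HAVING pattern can be verified with a few rows. The department values below are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "eng"), (2, "eng"), (3, "ops")],
)

# HAVING filters groups after aggregation; WHERE could not see COUNT(*).
busy_departments = [
    row[0]
    for row in conn.execute(
        """
        SELECT department
        FROM employees
        GROUP BY department
        HAVING COUNT(*) > 1
        ORDER BY department
        """
    )
]

print(busy_departments)
conn.close()
```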

9. Find the maximum value of a column for each group without using GROUP BY.

A: SELECT DISTINCT group_column, MAX(column_name) OVER (PARTITION BY group_column) AS max_value FROM table_name
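
A per-group maximum without GROUP BY is usually done with a window function: MAX(...) OVER (PARTITION BY ...) computes the maximum for each group, and DISTINCT collapses the repeated rows. The sketch assumes SQLite 3.25 or newer and made-up employees data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("eng", 100), ("eng", 150), ("ops", 90), ("ops", 70)],
)

# The window function repeats each department's maximum on every row of that
# department; DISTINCT then keeps one row per department.
max_per_group = conn.execute(
    """
    SELECT DISTINCT department,
           MAX(salary) OVER (PARTITION BY department) AS max_salary
    FROM employees
    ORDER BY department
    """
).fetchall()

print(max_per_group)
conn.close()
```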

10. Write a SQL query to find all employees who have taken more than 3 leaves in a
month.

A: SELECT * FROM employees WHERE id IN (SELECT employee_id FROM leaves GROUP BY employee_id HAVING COUNT(*) > 3)
To count leaves per month rather than overall, also group by the month of the leave date.
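
To cover the "in a month" part of the question, the leaves have to be grouped by employee and by month. The sketch below assumes a hypothetical leave_date column stored as ISO text (YYYY-MM-DD) and uses SQLite's strftime to extract the month; other databases have their own date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE leaves (employee_id INTEGER, leave_date TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO leaves VALUES (?, ?)",
    [
        # Employee 1: four leaves in January 2024.
        (1, "2024-01-02"), (1, "2024-01-09"), (1, "2024-01-16"), (1, "2024-01-23"),
        # Employee 2: four leaves in total, but only two in any single month.
        (2, "2024-01-05"), (2, "2024-01-12"), (2, "2024-02-02"), (2, "2024-02-09"),
    ],
)

# Group by employee and year-month, keep groups with more than 3 leaves.
frequent = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM employees
        WHERE id IN (
            SELECT employee_id
            FROM leaves
            GROUP BY employee_id, strftime('%Y-%m', leave_date)
            HAVING COUNT(*) > 3
        )
        ORDER BY id
        """
    )
]

print(frequent)
conn.close()
```

Grouping only by employee_id would also flag employee 2, whose four leaves are spread over two months.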

These questions are designed to test your SQL skills, including your ability to write
efficient queries, think creatively, and solve complex problems.

🎯 Data Engineering ≠ Just SQL Queries & ETL Pipelines! 🎯


Data Engineering is a vast field, and expertise grows with depth!

💡 Here's how to elevate your skills and master the real world of Data Engineering:
🔹 1. Data Ingestion & Integration
▪️Batch Processing: Apache NiFi, Airflow, AWS Batch
▪️Real-Time Streaming: Kafka, Kinesis, Pulsar
▪️Data Connectors: Kafka Connect, Debezium, Flume
▪️Message Queues: RabbitMQ, ActiveMQ

🔹 2. Data Transformation & Processing
▪️ETL vs ELT
▪️Frameworks: Apache Spark (RDDs, DataFrames), Apache Beam
▪️Libraries: Pandas, Koalas (now the pandas API on Spark), Dask
▪️Big Data Tools: PySpark, Scala, Delta Lake, Hudi, Iceberg

🔹 3. Data Storage
▪️Relational Databases: PostgreSQL, MySQL, Amazon RDS
▪️NoSQL: MongoDB, DynamoDB, Cassandra
▪️Data Lakes: S3, GCS, Azure Blob
▪️File Formats: Parquet and ORC (columnar), Avro (row-based)
▪️Distributed Storage & Cloud Warehouses: HDFS, Snowflake, BigQuery

🔹 4. Data Modeling
▪️Star Schema vs Snowflake Schema
▪️Denormalization Strategies
▪️Fact & Dimension Tables
▪️Slowly Changing Dimensions (SCD)
▪️OLAP vs OLTP
▪️Schema Evolution

🔹 5. Big Data Frameworks
▪️Core Tools: Apache Hadoop (HDFS, YARN, MapReduce)
▪️Apache Spark (SQL, Streaming, MLlib)
▪️Apache Flink
▪️Elasticsearch
▪️Apache Hive, Impala

🔹 6. Orchestration & Automation
▪️Apache Airflow (DAGs, Operators)
▪️Workflow Automation: Cloud Composer, Step Functions
▪️Cron Jobs & Scheduling

🔹 7. Data Quality & Validation
▪️Data Profiling: Great Expectations, Deequ
▪️Data Lineage: DataHub, Amundsen
▪️Validation: Pytest, TDD for Data
▪️Anomaly Detection, Null Handling, Deduplication

🔹 8. Data Security & Governance
▪️Encryption: At-Rest, In-Transit
▪️Access Control: IAM, ACLs
▪️Compliance: GDPR, HIPAA
▪️Data Masking, Auditing, Monitoring

🔹 9. Cloud & Infrastructure
▪️AWS: S3, EMR, Glue, Redshift, Athena
▪️Google Cloud: BigQuery, Dataflow, Dataproc
▪️Azure: Data Factory, Synapse Analytics
