At a time when Herb Sutter announced that the free lunch is over ("The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software"), concurrency became part of our everyday life. A big change is coming to Java: Project Loom, and with it new terms such as "virtual threads", "continuations" and "structured concurrency". If you've been wondering what they will change in our daily work, whether it's worth rewriting your Tomcat-based application on top of super-efficient reactive Netty, or whether to wait for Project Loom, this presentation is for you.
I will talk about Project Loom and the new possibilities related to virtual threads and structured concurrency: how it works, what can be achieved, and the impact on performance.
14. Concurrency: throughput (tasks / time unit)
Schedule multiple largely independent tasks onto a set of computational resources
Parallelism: latency (time unit)
Speed up a task by splitting it into sub-tasks and exploiting multiple processing units
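As a minimal illustration of the parallelism side (one task split into sub-tasks across processing units), a parallel stream in Java divides a single summation across cores via the common Fork-Join pool. The class name and numbers here are my own, not from the deck:

```java
import java.util.stream.LongStream;

public class SumDemo {
    // Parallelism: speed up ONE task (summing 1..n) by splitting it into
    // sub-tasks that run on multiple cores via the common Fork-Join pool.
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000)); // prints 500000500000
    }
}
```

Concurrency, by contrast, would schedule many such independent tasks; parallelism here shortens the latency of one.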
21. Thread
• Unit of work
• Too few of them available
• Requires a lot of resources
• Hard to manage (Reactive toys 😎)
• Not enough knowledge
• Lazy programmers
25. Platform Thread
• ~1 ms to schedule a thread
• Big memory consumption: 2 MB of stack
• Expensive
• OS thread
• Task switching requires a switch to the kernel: ~100 µs (depends on the OS)
• Scheduling is a compromise for all usages. Bad cache locality
29. Virtual Thread
• Lighter threads
• Less memory usage
• Fastest blocking code*
• No more platform threads
• Is not a GC root
• CPU cache misses are possible
• Pay-as-you-go stacks (size 200-300 bytes) stored in the heap
• Scales to 1M+ on commodity hardware
• Clean stack traces
• Your old code just works
• Readable sequential code
• The natural unit of scheduling for operating systems
30. Virtual Thread
• Cheap to create
• Cheap to destroy
• Cheap to block
https://cojestgrane24.wyborcza.pl/cjg24/Warszawa/1,30,33618,Smerfy-live-on-stage-na-Torwarze.html
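The three "cheap" claims can be sanity-checked with a short sketch (Java 21+; the class name and task count are my own, not from the deck): one virtual thread per task, a hundred thousand of them, with no pool sizing to think about.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class CheapThreads {
    // Submit n trivial tasks, one virtual thread per task, and report
    // how many ran. close() on the executor waits for all tasks to finish.
    static int run(int n) {
        AtomicInteger done = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                executor.submit(done::incrementAndGet);
            }
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(run(100_000)); // prints 100000
    }
}
```

Doing the same with platform threads would allocate ~2 MB of stack per thread; virtual threads make this pattern routine.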
31. Purpose?
• Mostly intended for writing I/O applications
• servers
• message brokers
• Higher concurrency, if the system has the additional resources concurrency needs:
• available connections in a connection pool
• sufficient memory to serve the increased load
• Increased efficiency for short-cycle tasks
32. Virtual Thread isn’t for
• Non-realtime kernels primarily employ time-sharing when the CPU is at 100%
• Tasks that run for a long time
• CPU-bound tasks*
33. “Virtual threads are not an execution resource, but a business logic object, like a string.”
34.
final Thread thread1 = Thread
    .ofPlatform()
    .unstarted(() -> System.out.println("Hello from " + Thread.currentThread()));
final Thread thread2 = Thread
    .ofVirtual()
    .unstarted(() -> System.out.println("Hello from " + Thread.currentThread()));
thread1.start();
thread2.start();

Hello from Thread[#22,Thread-0,5,main]
Hello from VirtualThread[#23]/runnable@ForkJoinPool-1-worker-1
35. Fast forward to today
• Virtual thread = user-mode thread
• Scheduled by the JVM, not the OS
• A virtual thread is an instance of java.lang.Thread
• A platform thread is also an instance of java.lang.Thread, but implemented the “traditional” way: a thin wrapper around an OS thread
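Since both kinds are instances of java.lang.Thread, the API-level distinction is Thread.isVirtual() (Java 21+). A tiny check, with class and variable names of my own choosing:

```java
public class KindCheck {
    // Build one thread of each kind and report whether each is virtual.
    static boolean[] kinds() {
        Thread platform = Thread.ofPlatform().unstarted(() -> {});
        Thread virtual = Thread.ofVirtual().unstarted(() -> {});
        // Both are java.lang.Thread; only isVirtual() tells them apart.
        return new boolean[] { platform.isVirtual(), virtual.isVirtual() };
    }

    public static void main(String[] args) {
        boolean[] k = kinds();
        System.out.println(k[0]); // prints false (platform thread)
        System.out.println(k[1]); // prints true (virtual thread)
    }
}
```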
37. How are virtual threads implemented?
• Built on continuations, a lower-level construct of the JVM
• A virtual thread wraps a task in a continuation
• FIFO mode
• M:N threading model
43. A scheduler assigns continuations to CPU cores, replacing a paused one with another that's ready to run, and ensuring that a continuation that is ready to resume will eventually be assigned to a CPU core.
47. Copy Terminology
• Freeze: suspend a continuation and unmount it by copying frames from the OS thread stack → the continuation object
• Thaw: mount a suspended continuation by copying frames from the continuation object → the OS thread stack
50.
private static void enter(Continuation c, boolean isContinue) {
    // This method runs in the "entry frame".
    // A yield jumps to this method's caller as if returning from this method.
    try {
        c.enter0();
    } finally {
        c.finish();
    }
}

private void enter0() {
    target.run();
}
67. I/O
• The java.nio.channels classes — SocketChannel, ServerSocketChannel and DatagramChannel — were retrofitted to become virtual-thread-friendly. When their synchronous operations, such as read and write, are performed on a virtual thread, only non-blocking I/O is used under the covers.
• “Old” I/O networking — java.net.Socket, ServerSocket and DatagramSocket — has been reimplemented in Java on top of NIO, so it immediately benefits from NIO’s virtual-thread-friendliness.
• DNS lookups by the getHostName, getCanonicalHostName and getByName methods of java.net.InetAddress (and other classes that use them) are still delegated to the operating system, which only provides an OS-thread-blocking API. Alternatives are being explored.
• Process pipes will similarly be made virtual-thread-friendly, except maybe on Windows, where this requires a greater effort.
• Console I/O has also been retrofitted.
• Http(s)URLConnection and the implementation of TLS/SSL were changed to rely on j.u.c locks and avoid pinning.
• File I/O is problematic. Internally, the JDK uses buffered I/O for files, which always reports available bytes even when a read will block. On Linux, we plan to use io_uring for asynchronous file I/O, and in the meantime we’re using the ForkJoinPool.ManagedBlocker mechanism to smooth over blocking file I/O operations by adding more OS threads to the worker pool when a worker is blocked.
80. Project Synergies
• Data more local than ever
• Less reason to manually share data across thread pools
• The same data is now private in a per-request model
• GC when the thread terminates
• The virtual thread stack object itself is thread-local
98. ConcurrentHashMap#computeIfAbsent
“Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.”
99.
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;

public class CHMPinning {
    public static void main(String... args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000; i++) {
            int finalI = i;
            Thread.startVirtualThread(() -> map.computeIfAbsent(finalI % 3, key -> {
                try {
                    Thread.sleep(2_000);
                } catch (InterruptedException e) {
                    throw new CancellationException("interrupted");
                }
                return finalI;
            }));
        }
        long time = System.nanoTime();
        try {
            Thread.startVirtualThread(() -> System.out.println("Hi, I'm an innocent virtual thread")).join();
        } finally {
            time = System.nanoTime() - time;
            System.out.printf("time = %dms%n", (time / 1_000_000));
        }
        System.out.println("map = " + map);
    }
}
100.
private static final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

private static String refresh(String key) {
    try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {
        scope.fork(() -> UUID.randomUUID().toString());
        scope.join();
        return scope.result();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

public static void main(String[] args) throws Exception {
    var cpus = Runtime.getRuntime().availableProcessors();
    List<Future<String>> fl = new ArrayList<>();
    try (var es = Executors.newVirtualThreadPerTaskExecutor()) {
        for (int i = 0; i < cpus; ++i)
            fl.add(es.submit(() -> cache.computeIfAbsent("foo", k -> refresh(k))));
    }
    for (var f : fl)
        System.out.println(f.get());
}
108. Future work
• BlockingQueue
• Structured Concurrency
• Use io_uring for asynchronous file I/O
• Object.wait()
• Concurrent collection review
109. Takeaways
• Nothing is changed 😃
• A virtual thread is a java.lang.Thread — in code, at runtime, in the debugger and in the profiler
• Lighter threads
• Pay-as-you-go stacks (size 200-300 bytes) stored in the heap
• Scales to 1M+ on commodity hardware
• Clean stack traces
• Your old code just works
• Readable sequential code
• The natural unit of scheduling for operating systems
• A virtual thread is not a wrapper around an OS thread, but a Java entity.
• Creating a virtual thread is cheap — have millions, and don’t pool them!
• Blocking a virtual thread is cheap — be synchronous!
• No language changes are needed.
• Pluggable schedulers offer the flexibility of asynchronous programming.
110. Takeaways
• Move to simpler blocking/synchronous code
• Migrate tasks to virtual threads, not platform threads to virtual threads
• Use Semaphores or similar to limit concurrency
• Try not to cache expensive objects in ThreadLocals
• Avoid pinning
• Avoid reusing
• Avoid pooling
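The "use Semaphores to limit concurrency" advice can be sketched as follows (Java 21+; the class name, task counts and sleep duration are my own, not from the deck): permits bound how many tasks do work at once, while the virtual threads themselves remain unpooled and unlimited.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class LimitedConcurrency {
    // Run `tasks` virtual threads, but allow at most `limit` of them inside
    // the guarded section at once; returns the peak concurrency observed.
    static int maxObserved(int tasks, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    permits.acquireUninterruptibly();
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stand-in for blocking I/O
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                });
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println(maxObserved(200, 10) <= 10); // prints true
    }
}
```

This replaces the role a fixed-size thread pool used to play: the pool capped concurrency as a side effect, whereas here the Semaphore states the limit explicitly.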